[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19064#discussion_r135446631
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -22,6 +22,9 @@ import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.types.DataType
 
+// Trait for identifying user-defined functions.
--- End diff --

```Scala
/**
 * Common base trait for user-defined functions, including ScalaUDF, ScalaUDAF, PythonUDF,
 * HiveSimpleUDF, HiveGenericUDF, and HiveUDAFFunction.
 */
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery ...

2017-08-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19050#discussion_r135443202
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +99,11 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
   Project(p.output, Filter(newCond.get, inputPlan))
   }
+
+  rewrotePlan transform {
+    case j @ Join(left, right, _, _) if !j.duplicateResolved =>
+      j.copy(right = DedupQueryAttributesInPlans.dedupRight(left, right))
--- End diff --

@hvanhovell We may not be able to revise `duplicateResolved`.

Even though a LeftSemi join outputs only the left side of the join, we still 
need `duplicateResolved` to be false when the left and right sides share 
duplicate attributes.

Otherwise, if there is a join condition, it will be pushed down to the left 
side of the join, because all attribute references in the condition appear to 
belong to one side. That changes the join results.
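
The pushdown concern above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not Catalyst code: `Attr`, `EqualTo`, `canPushToLeft`, and this `dedupRight` are hypothetical stand-ins, and attributes are compared only by name and expression ID.

```scala
// Toy model, not Catalyst: an attribute is identified by its name and expression ID.
final case class Attr(name: String, exprId: Long)

// An equality join condition that references two attributes.
final case class EqualTo(left: Attr, right: Attr) {
  def references: Set[Attr] = Set(left, right)
}

object PushdownSketch {
  // A condition looks pushable to the left side when every attribute it
  // references is produced by the left side.
  def canPushToLeft(cond: EqualTo, leftOutput: Set[Attr]): Boolean =
    cond.references.subsetOf(leftOutput)

  // Hypothetical dedup step: give each right-side attribute that collides
  // with the left side a fresh expression ID.
  def dedupRight(leftOutput: Set[Attr], rightOutput: Seq[Attr], firstFreshId: Long): Seq[Attr] =
    rightOutput.zipWithIndex.map { case (a, i) =>
      if (leftOutput.contains(a)) a.copy(exprId = firstFreshId + i) else a
    }
}
```

With the same attribute on both sides, a condition such as `EqualTo(a, a)` passes `canPushToLeft`, which is the situation described above. After `dedupRight` assigns the right side a fresh ID, the condition references both sides again and no longer looks pushable.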





[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...

2017-08-27 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/19064

[SPARK-21848][SQL] Add trait UDFType to identify user-defined functions

## What changes were proposed in this pull request?

Add trait UDFType to identify user-defined functions.
UDFs can be expensive. In the optimizer, we may need to avoid executing a UDF 
multiple times.
E.g.
```scala
table.select(UDF as 'a).select('a, '(a + 1))
```
If the UDF is expensive in this case, the optimizer should not collapse the 
projects into
```scala
table.select(UDF as 'a, (UDF+1) as '(a+1))
```

Currently, UDF classes such as PythonUDF and HiveGenericUDF are not defined in 
catalyst. 
This PR adds a new trait to make it easier to identify user-defined 
functions.
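
As a rough illustration of the idea, here is a minimal sketch of a marker trait and a check an optimizer rule could run before duplicating an expression. It uses a toy expression tree rather than Catalyst's `Expression`; `Expr`, `Column`, `Literal`, `Add`, `ToyUDF`, and `containsUDF` are hypothetical names, and only `UDFType` comes from the PR title.

```scala
// Toy expression tree, not Catalyst's Expression hierarchy.
sealed trait Expr { def children: Seq[Expr] }

// Marker trait: it has no members and only identifies user-defined functions.
trait UDFType

final case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Literal(value: Int) extends Expr { val children: Seq[Expr] = Nil }
final case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

// Hypothetical UDF node that mixes in the marker trait.
final case class ToyUDF(name: String, args: Seq[Expr]) extends Expr with UDFType {
  val children: Seq[Expr] = args
}

object UDFCheck {
  // True if the expression or any of its descendants is a user-defined function.
  def containsUDF(e: Expr): Boolean = e match {
    case _: UDFType => true
    case other => other.children.exists(containsUDF)
  }
}
```

A rule that collapses adjacent projections could call `containsUDF` on an aliased expression and skip the collapse when it returns true, because inlining the alias would evaluate the UDF once per reference.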


## How was this patch tested?

Unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark UDFType

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19064


commit 5f508ab0b1bf91f7a857adabbefea399c8a71005
Author: Wang Gengliang 
Date:   2017-08-26T13:15:15Z

add trait UDFType to identify UDF classes







[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19050
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81173/
Test FAILed.





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19050
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19050
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19050
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81172/
Test FAILed.





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19050
  
**[Test build #81172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81172/testReport)**
 for PR 19050 at commit 
[`121ad5a`](https://github.com/apache/spark/commit/121ad5aa77e5bf251a539b5aa4761c392f2a0db8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439310
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
 s"vectorized reader"))
 }
   }
+
+  test("create read-only batch") {
--- End diff --

How about `create a columnar batch from Arrow column vectors` or something similar?





[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439372
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
 s"vectorized reader"))
 }
   }
+
+  test("create read-only batch") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector1 = ArrowUtils.toArrowField("int1", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector1.allocateNew()
+    val mutator1 = vector1.getMutator()
+    val vector2 = ArrowUtils.toArrowField("int2", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector2.allocateNew()
+    val mutator2 = vector2.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator1.setSafe(i, i)
+      mutator2.setSafe(i + 1, i)
+    }
+    mutator1.setNull(10)
+    mutator1.setValueCount(11)
+    mutator2.setNull(0)
+    mutator2.setValueCount(11)
+
+    val columnVectors = Seq(new ArrowColumnVector(vector1), new ArrowColumnVector(vector2))
+
+    val schema = StructType(Seq(StructField("int1", IntegerType), StructField("int2", IntegerType)))
+    val batch = new ColumnarBatch(schema, columnVectors.toArray[ColumnVector], 11)
+    batch.setNumRows(11)
+
+    assert(batch.numCols() == 2)
+    assert(batch.numRows() == 11)
+
+    val rowIter = batch.rowIterator().asScala
+    rowIter.zipWithIndex.foreach { case (row, i) =>
+      if (i == 10) {
+        assert(row.isNullAt(0))
+      } else {
+        assert(row.getInt(0) == i)
+      }
+      if (i == 0) {
+        assert(row.isNullAt(1))
+      } else {
+        assert(row.getInt(1) == i - 1)
+      }
+    }
+
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
+    }
+
+    columnVectors.foreach(_.close())
--- End diff --

We can use `batch.close()` here.





[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439793
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala ---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll {
 }
   }
 
+  test("roundtrip payloads") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector = ArrowUtils.toArrowField("int", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector.allocateNew()
+    val mutator = vector.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator.setSafe(i, i)
+    }
+    mutator.setNull(10)
+    mutator.setValueCount(11)
+
+    val schema = StructType(Seq(StructField("int", IntegerType)))
+
+    val batch = new ColumnarBatch(schema, Array[ColumnVector](new ArrowColumnVector(vector)), 11)
--- End diff --

Btw, do we need to use `ColumnarBatch` for this test?
I guess we can simply create `Iterator[InternalRow]` and use it.





[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135438683
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala ---
@@ -111,6 +125,66 @@ private[sql] object ArrowConverters {
   }
 
   /**
+   * Maps Iterator from ArrowPayload to InternalRow. Returns a pair containing the row iterator
+   * and the schema from the first batch of Arrow data read.
+   */
+  private[sql] def fromPayloadIterator(
+      payloadIter: Iterator[ArrowPayload],
+      context: TaskContext): ArrowRowIterator = {
+    val allocator =
+      ArrowUtils.rootAllocator.newChildAllocator("fromPayloadIterator", 0, Long.MaxValue)
+
+    new ArrowRowIterator {
+      private var reader: ArrowFileReader = null
+      private var schemaRead = StructType(Seq.empty)
+      private var rowIter = if (payloadIter.hasNext) nextBatch() else Iterator.empty
--- End diff --

We can simply put `Iterator.empty` here.
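
A minimal sketch of why starting from `Iterator.empty` can be enough, assuming `hasNext` is the place that loads the next batch whenever the current one is exhausted. `BatchedRowIterator` is a hypothetical stand-in for the anonymous `ArrowRowIterator` in the diff, with plain `Seq` batches in place of Arrow payloads.

```scala
// Flattens an iterator of batches into an iterator of rows, loading batches lazily.
final class BatchedRowIterator[A](batches: Iterator[Seq[A]]) extends Iterator[A] {
  // Start empty: the first batch is loaded by the first hasNext call,
  // so no eager read is needed at construction time.
  private var rowIter: Iterator[A] = Iterator.empty

  override def hasNext: Boolean = {
    // Skip over exhausted (or empty) batches until a row is available.
    while (!rowIter.hasNext && batches.hasNext) {
      rowIter = batches.next().iterator
    }
    rowIter.hasNext
  }

  override def next(): A = {
    if (!hasNext) throw new NoSuchElementException("no more rows")
    rowIter.next()
  }
}
```

With this shape the constructor does no I/O, and an empty input needs no special case, since `hasNext` simply returns false.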





[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...

2017-08-27 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439857
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala ---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll {
 }
   }
 
+  test("roundtrip payloads") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector = ArrowUtils.toArrowField("int", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
--- End diff --

Should the `allocator` and the `vector` be closed at the end of this test?
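
One way to make the cleanup hard to forget is a small loan-pattern helper, sketched below. It is not part of Spark or Arrow: `withResource` and `Tracked` are hypothetical, and `Tracked` only stands in for an allocator or a vector so the example runs without Arrow on the classpath.

```scala
object Resources {
  // Runs the body with the resource and always closes it afterwards,
  // even if the body throws.
  def withResource[R <: AutoCloseable, T](resource: R)(body: R => T): T =
    try body(resource) finally resource.close()
}

// Hypothetical stand-in for an allocator or vector; records whether close() ran.
final class Tracked(val name: String) extends AutoCloseable {
  var closed = false
  override def close(): Unit = closed = true
}
```

Nesting the vector inside the allocator, as in `withResource(allocator) { _ => withResource(vector) { _ => ... } }`, closes the vector first and the allocator last, so the vector's buffers are released before the allocator is closed.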





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19050
  
**[Test build #81173 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81173/testReport)**
 for PR 19050 at commit 
[`bf07e2a`](https://github.com/apache/spark/commit/bf07e2ab338f2c030b78279111a6e431997ba13b).





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81170/
Test PASSed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81170 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81170/testReport)**
 for PR 18581 at commit 
[`41369cf`](https://github.com/apache/spark/commit/41369cf26fcdd20708168a78c0ca35b614f83f77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19050
  
**[Test build #81172 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81172/testReport)**
 for PR 19050 at commit 
[`121ad5a`](https://github.com/apache/spark/commit/121ad5aa77e5bf251a539b5aa4761c392f2a0db8).





[GitHub] spark pull request #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery ...

2017-08-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19050#discussion_r135434713
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +99,11 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
   Project(p.output, Filter(newCond.get, inputPlan))
   }
+
+  rewrotePlan transform {
+    case j @ Join(left, right, _, _) if !j.duplicateResolved =>
+      j.copy(right = DedupQueryAttributesInPlans.dedupRight(left, right))
--- End diff --

@hvanhovell Because predicate subqueries are rewritten into left semi/anti 
joins, which don't have duplicate outputs. I think you mean correlated scalar 
subqueries, which are rewritten into left outer joins. Is that right?





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81171/
Test PASSed.





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19029
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19029
  
**[Test build #81171 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81171/testReport)**
 for PR 19029 at commit 
[`c40eba3`](https://github.com/apache/spark/commit/c40eba38d82893d5604aa66ec9037df706da712d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19062: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-08-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19062
  
LGTM





[GitHub] spark pull request #19062: [SPARK-21845] [SQL] Make codegen fallback of expr...

2017-08-27 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19062#discussion_r135430732
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -370,8 +373,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
 try {
   GeneratePredicate.generate(expression, inputSchema)
 } catch {
-  case e @ (_: JaninoRuntimeException | _: CompileException)
-  if sqlContext == null || sqlContext.conf.wholeStageFallback =>
--- End diff --

Better to put this comment in 
https://github.com/apache/spark/pull/19062/files#diff-b9f96d092fb3fea76bcf75e016799678R57?





[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...

2017-08-27 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19018
  
ping @felixcheung We can make all R tests for trees deterministic (not only 
random trees). Let's leave the other problems to a separate PR. It would be 
great to fix this soon. Thanks!





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19029
  
**[Test build #81171 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81171/testReport)**
 for PR 19029 at commit 
[`c40eba3`](https://github.com/apache/spark/commit/c40eba38d82893d5604aa66ec9037df706da712d).





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81170 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81170/testReport)**
 for PR 18581 at commit 
[`41369cf`](https://github.com/apache/spark/commit/41369cf26fcdd20708168a78c0ca35b614f83f77).





[GitHub] spark pull request #19049: [WEB-UI]Add the 'master' column to identify the t...

2017-08-27 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/19049#discussion_r135427302
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js ---
@@ -136,6 +136,16 @@ $(document).ready(function() {
 (attempt.hasOwnProperty("attemptId") ? attempt["attemptId"] + "/" : "") + "logs";
   attempt["durationMillisec"] = attempt["duration"];
   attempt["duration"] = formatDuration(attempt["duration"]);
+  var idStr = id.toString();
+  if(idStr.indexOf("application_") > -1) {
+    attempt["master"] = "yarn";
+  } else if(idStr.indexOf("app-") > -1) {
+    attempt["master"] = "standalone";
+  } else if(idStr.indexOf("local-") > -1) {
+    attempt["master"] = "local";
+  } else {
+    attempt["master"] = "mesos";

I determine the master from the application ID, not from the application name. 
The format of the ID is determined by the type of resource manager.
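
For comparison, here is the same ID-based mapping as a small Scala sketch. It is only an illustration: `MasterFromAppId` and `infer` are hypothetical names, the prefixes are the ones from the diff, and two choices differ from the diff on purpose. It matches on the start of the ID rather than anywhere in it, and it returns `None` for an unrecognized ID instead of defaulting to "mesos".

```scala
object MasterFromAppId {
  // Prefixes taken from the diff above; the first matching prefix wins.
  private val prefixes: Seq[(String, String)] = Seq(
    "application_" -> "yarn",
    "app-" -> "standalone",
    "local-" -> "local"
  )

  // Returns None for unrecognized IDs rather than guessing a cluster manager.
  def infer(appId: String): Option[String] =
    prefixes.collectFirst {
      case (prefix, master) if appId.startsWith(prefix) => master
    }
}
```

Returning an `Option` leaves it to the caller to decide what to display for IDs that match none of the known prefixes.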





[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19061
  
Hi, @vanzin.
Could you review this when you have some time? I'm wondering whether this is 
implemented the way you expected. Please let me know if there is 
anything more to do. Thanks!





[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...

2017-08-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19031
  
ok, I'll close for now. Thanks!





[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...

2017-08-27 Thread maropu
Github user maropu closed the pull request at:

https://github.com/apache/spark/pull/19031





[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...

2017-08-27 Thread logannc
Github user logannc commented on the issue:

https://github.com/apache/spark/pull/18945
  
Sorry for the delay. Things got busy and now there is the storm in Houston. 
Will update this per these suggestions soon.





[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning

2017-08-27 Thread mdespriee
Github user mdespriee commented on the issue:

https://github.com/apache/spark/pull/17461
  
I updated the example following your suggestion. It's more consistent with 
LDAExample this way.





[GitHub] spark issue #18837: [Spark-20812][Mesos] Add secrets support to the dispatch...

2017-08-27 Thread ArtRand
Github user ArtRand commented on the issue:

https://github.com/apache/spark/pull/18837
  
Hello @vanzin, thanks for the review. I added `.toSequence` to the new 
configuration specs, certainly a nice solution to parsing on the fly. Please 
let me know if there is anything else that needs changing. 





[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/18610#discussion_r135418170
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -226,6 +246,12 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 
 if (($(solver) == Auto &&
   numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) 
== Normal) {
+
+  if (isSet(initialModel)) {
+logWarning("Initial model will be ignored if fitting by normal 
solver. " +
--- End diff --

The initial model is a pretty important parameter. Having set it, users 
would expect it to take effect, and they may overlook the warning among the 
overwhelming Spark logs.
Maybe we can move the parameter check to `transformSchema` and throw an 
exception.
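
A rough sketch of the fail-fast idea, written without any Spark dependency. The `Solver` objects, the `validate` signature, and the `maxNumFeatures` argument are hypothetical stand-ins for the estimator's params and `WeightedLeastSquares.MAX_NUM_FEATURES`; only the solver condition is copied from the diff above.

```scala
object InitialModelCheck {
  // Hypothetical stand-ins for the solver param values.
  sealed trait Solver
  case object Auto extends Solver
  case object Normal extends Solver
  case object LBFGS extends Solver

  // Same condition as in the diff: the normal solver is used when it is
  // requested explicitly, or when "auto" is set and the feature count is small.
  def usesNormalSolver(solver: Solver, numFeatures: Int, maxNumFeatures: Int): Boolean =
    (solver == Auto && numFeatures <= maxNumFeatures) || solver == Normal

  // Throws IllegalArgumentException instead of logging a warning.
  def validate(solver: Solver, initialModelSet: Boolean,
               numFeatures: Int, maxNumFeatures: Int): Unit =
    require(
      !(initialModelSet && usesNormalSolver(solver, numFeatures, maxNumFeatures)),
      "initialModel is ignored by the normal solver; unset it or choose another solver")
}
```

One caveat with this sketch: the `Auto` branch needs the number of features, which may not be known yet when only the schema is checked, so a real check might have to cover just the explicit `Normal` case up front.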





[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/18610#discussion_r135418289
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -72,6 +72,22 @@ private[regression] trait LinearRegressionParams extends 
PredictorParams
 }
 
 /**
+ * Params for linear regression.
+ */
+private[regression] trait LinearRegressionParams extends 
LinearRegressionModelParams
--- End diff --

It may be cleaner to just move the param `initialModel` into 
LinearRegression, so we don't have to touch the class hierarchy.





[GitHub] spark issue #7842: [SPARK-8542][MLlib]PMML export for Decision Trees

2017-08-27 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/7842
  
@coderxiang what is the plan for this?





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16992
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81169/
Test PASSed.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16992
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16992
  
**[Test build #81169 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81169/testReport)**
 for PR 16992 at commit 
[`3d6c80b`](https://github.com/apache/spark/commit/3d6c80b4857b2b776b55516ea5e699e0e470b4a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16992
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16992
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81168/
Test FAILed.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16992
  
**[Test build #81168 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81168/testReport)**
 for PR 16992 at commit 
[`77cfb03`](https://github.com/apache/spark/commit/77cfb03a82966412e6468edbff358415197c8aaa).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19055: [SPARK-21839][SQL] Support SQL config for ORC compressio...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19055
  
Hi, @gatorsmile .
Could you review this ORC configuration PR?





[GitHub] spark pull request #19063: [SPARK-21846][TEST] Reduce the number of shuffle ...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/19063





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19063
  
I'm closing this PR because the numbers are different from what I expected.





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19060
  
For Parquet, I can find 
[TestInputOutputFormat.java](https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/test/java/parquet/hadoop/example/TestInputOutputFormat.java).
 Parquet also has a test case which is very specific to the impl, too.

What do you mean by `e2e test case not specific to the impl.` exactly? 
Sorry, but could you provide an example?





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19063
  
I saw that, too. Right. It becomes meaningless. Some are reduced but the 
others increase. I'm trying another approach in this PR and JIRA. I will 
update more.





[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-08-27 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/18610
  
Just to confirm, so we have agreed that the initialModel should be of type 
[T <: Model[T]] rather than a String type (path to the saved model)? Sorry I 
didn't find the related discussion.





[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-08-27 Thread JohnHBrock
Github user JohnHBrock commented on the issue:

https://github.com/apache/spark/pull/18610
  
Is there any more work to do before this can get merged?





[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19031
  
This is an internal conf. For the advanced users, we do not encourage them 
to disable it. If they want to disable it, they can simply set it to a number 
above 8000. Thus, setting `maxLinesPerFunction` to `-1` is not needed, IMO.





[GitHub] spark pull request #19062: [SPARK-21845] [SQL] Make codegen fallback of expr...

2017-08-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19062#discussion_r135416594
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -54,6 +54,9 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
   @transient
   final val sqlContext = 
SparkSession.getActiveSession.map(_.sqlContext).orNull
 
+  // whether we should fallback when hitting compilation errors caused by 
codegen
+  private val codeGenFallBack = sqlContext == null || 
sqlContext.conf.codegenFallback
--- End diff --

I see





[GitHub] spark issue #18961: [SPARK-21746][SQL]there is an java.lang.IllegalArgumentE...

2017-08-27 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18961
  
@dongjoon-hyun @cloud-fan Do you have any suggestions?






[GitHub] spark issue #19062: [SPARK-21845] [SQL] Make codegen fallback of expressions...

2017-08-27 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/19062
  
+1,LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread erenavsarogullari
Github user erenavsarogullari commented on the issue:

https://github.com/apache/spark/pull/16992
  
Hi @squito,

Thanks for reviewing this patch. It is ready for re-review / merge.





[GitHub] spark pull request #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler...

2017-08-27 Thread erenavsarogullari
Github user erenavsarogullari commented on a diff in the pull request:

https://github.com/apache/spark/pull/16992#discussion_r135415731
  
--- Diff: docs/job-scheduling.md ---
@@ -235,7 +235,7 @@ properties:
   of the cluster. By default, each pool's `minShare` is 0.
 
 The pool properties can be set by creating an XML file, similar to 
`conf/fairscheduler.xml.template`,
-and setting a `spark.scheduler.allocation.file` property in your
+and either setting `fairscheduler.xml` into classpath or a 
`spark.scheduler.allocation.file` property in your
--- End diff --

Addressed.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16992
  
**[Test build #81169 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81169/testReport)**
 for PR 16992 at commit 
[`3d6c80b`](https://github.com/apache/spark/commit/3d6c80b4857b2b776b55516ea5e699e0e470b4a9).





[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18966#discussion_r135415687
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -769,16 +769,27 @@ class CodegenContext {
   foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): 
String = {
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
+val defaultMaxLines = 100
+val maxLines = if (SparkEnv.get != null) {
+  
SparkEnv.get.conf.getInt("spark.sql.codegen.expressions.maxCodegenLinesPerFunction",
--- End diff --

This does not follow what we do for the other SQLConf entries. I am also 
wondering whether we should just put it into `StaticSQLConf`. Let me check 
with others.





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16992
  
**[Test build #81168 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81168/testReport)**
 for PR 16992 at commit 
[`77cfb03`](https://github.com/apache/spark/commit/77cfb03a82966412e6468edbff358415197c8aaa).





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r135415452
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 ---
@@ -32,7 +32,9 @@ import 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
  * in that file.
  */
--- End diff --

Add `@param`?





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19063
  
- org.apache.spark.sql.hive 10 min
- org.apache.spark.sql.hive.client  6 min 20 sec
- org.apache.spark.sql.hive.execution   28 min
- org.apache.spark.sql.hive.orc 2 min 1 sec 
- org.apache.spark.sql.hive.thriftserver3 min 11 sec

Changing it from 5 to 3 does not help, right?





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19060
  
I mean, how about Parquet and the others? Do they have the e2e test cases 
in their projects?





[GitHub] spark pull request #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateO...

2017-08-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19029#discussion_r135411403
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -439,8 +439,9 @@ private[ml] object WeightedLeastSquares {
 
 /**
  * Weighted population standard deviation of labels.
+ * We prevent variance from negative value caused by numerical error.
--- End diff --

I'm not strongly against this, but this is really an implementation detail 
and not relevant to the caller. The value is nonnegative by definition.
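
For context, a small standalone sketch of the kind of guard being documented. It assumes the variance is computed as E[x^2] - E[x]^2, which can come out slightly negative through floating-point rounding; the object name and signature are hypothetical and this is not the code in `WeightedLeastSquares`.

```scala
object WeightedStd {
  // Weighted population standard deviation computed as sqrt(E[x^2] - E[x]^2).
  def apply(values: Seq[Double], weights: Seq[Double]): Double = {
    require(values.length == weights.length, "values and weights must align")
    val wSum = weights.sum
    require(wSum > 0.0, "total weight must be positive")
    val mean = values.zip(weights).map { case (x, w) => w * x }.sum / wSum
    val meanSq = values.zip(weights).map { case (x, w) => w * x * x }.sum / wSum
    // Rounding can push the difference just below zero; clamp before sqrt
    // so the result is never NaN.
    math.sqrt(math.max(meanSq - mean * mean, 0.0))
  }
}
```

Without the `math.max` clamp, a tiny negative difference would make `math.sqrt` return NaN, which is the failure mode the added doc line alludes to.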





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19029
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81167/
Test PASSed.





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19029
  
**[Test build #81167 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81167/testReport)**
 for PR 19029 at commit 
[`21e7ff7`](https://github.com/apache/spark/commit/21e7ff7ea65da1c03b32445405d2bd55346db096).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81166/
Test PASSed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81166 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81166/testReport)**
 for PR 18581 at commit 
[`08233e6`](https://github.com/apache/spark/commit/08233e654072b1f117926b813459f3e2bf6b8a55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19029
  
**[Test build #81167 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81167/testReport)**
 for PR 19029 at commit 
[`21e7ff7`](https://github.com/apache/spark/commit/21e7ff7ea65da1c03b32445405d2bd55346db096).





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19063
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19063
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81163/
Test PASSed.





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19063
  
**[Test build #81163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81163/testReport)**
 for PR 19063 at commit 
[`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81165/
Test FAILed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81165/testReport)**
 for PR 18581 at commit 
[`cbed415`](https://github.com/apache/spark/commit/cbed41534ff7a6a7219542ec282fcee4bdf67a67).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81166 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81166/testReport)**
 for PR 18581 at commit 
[`08233e6`](https://github.com/apache/spark/commit/08233e654072b1f117926b813459f3e2bf6b8a55).





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81164/testReport)**
 for PR 18581 at commit 
[`9b1bf10`](https://github.com/apache/spark/commit/9b1bf108e61f6af69331a5d4052b53a47d34bf71).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81164/
Test FAILed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18581
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81165 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81165/testReport)**
 for PR 18581 at commit 
[`cbed415`](https://github.com/apache/spark/commit/cbed41534ff7a6a7219542ec282fcee4bdf67a67).





[GitHub] spark pull request #19059: [SS] - Avoid using `return` inside `CachedKafkaCo...

2017-08-27 Thread YuvalItzchakov
Github user YuvalItzchakov commented on a diff in the pull request:

https://github.com/apache/spark/pull/19059#discussion_r135405801
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -125,8 +131,11 @@ private[kafka010] case class CachedKafkaConsumer private(
   toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, untilOffset)
   }
 }
-resetFetchedData()
-null
+
+if (isFetchComplete) consumerRecord else {
--- End diff --

Done





[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581
  
**[Test build #81164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81164/testReport)**
 for PR 18581 at commit 
[`9b1bf10`](https://github.com/apache/spark/commit/9b1bf108e61f6af69331a5d4052b53a47d34bf71).





[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-08-27 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18966
  
ping @gatorsmile 





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-08-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r135405534
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
  * in that file.
  */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
--- End diff --

OK, I will change this, but I should say that this way looks incorrect to me;
this behaviour should be discussed and possibly updated in the near future.





[GitHub] spark issue #18928: [SPARK-21696][SS]Fix a potential issue that may generate...

2017-08-27 Thread YuvalItzchakov
Github user YuvalItzchakov commented on the issue:

https://github.com/apache/spark/pull/18928
  
Right. We've had some problems reading snapshots after executors died from
OOM. I hope this does the trick :)

Thanks.





[GitHub] spark pull request #19058: SPARK-21843:testNameNote should be "(minNumPostSh...

2017-08-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19058





[GitHub] spark issue #19058: SPARK-21843:testNameNote should be "(minNumPostShufflePa...

2017-08-27 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19058
  
Merged to master





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19063
  
**[Test build #81163 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81163/testReport)**
 for PR 19063 at commit 
[`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19063
  
Retest this please





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19063
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81162/
Test FAILed.





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19063
  
Merged build finished. Test FAILed.





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19063
  
**[Test build #81162 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81162/testReport)**
 for PR 19063 at commit 
[`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19060
  
If you agree, I will try to write more code here as a POC.





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19060
  
Parquet is the same. We can use `unhandledFilters` for PPD.
I think that the other text-based data sources (TEXT/CSV/JSON) don't
support PPD.





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19063
  
**[Test build #81162 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81162/testReport)**
 for PR 19063 at commit 
[`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).





[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19063
  
Retest this please.





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19060
  
How about Parquet and the others?





[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-08-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19060
  
To my knowledge, this is the highest-level API in the ORC GitHub repository.
```scala
val reader = new OrcInputFormat[OrcStruct]().createRecordReader(split, attemptContext)
... reader.nextKeyValue()
... reader.getCurrentValue
... row.getFieldValue(0).asInstanceOf[IntWritable].get
```

Actually, I have one idea. If we support 
[unhandledFilters](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L237)
 in data source testing via a SQLConf, we can do this at the Spark level, as in the 
*value limit* case. What do you think about that? May I try that way?
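
For readers unfamiliar with `unhandledFilters`, here is a minimal standalone sketch of the contract a test could rely on. It assumes simplified, hypothetical stand-ins (`Filter`, `Relation`, `EqualityOnlyRelation`, `handledFilters`) instead of Spark's real classes, and it does not show how the SQLConf switch proposed above would be wired in.

```scala
// Standalone sketch: these types are simplified, hypothetical stand-ins,
// not Spark's org.apache.spark.sql.sources classes.
object UnhandledFiltersSketch {
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter

  trait Relation {
    // Returns the filters this source cannot evaluate by itself.
    // The conservative default reports every filter as unhandled.
    def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters
  }

  // Hypothetical source that can evaluate equality predicates only.
  object EqualityOnlyRelation extends Relation {
    override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
      filters.filter {
        case _: EqualTo => false
        case _          => true
      }
  }

  // Filters a test could expect the source to have handled itself:
  // everything the relation did not report back as unhandled.
  def handledFilters(relation: Relation, filters: Array[Filter]): Seq[Filter] = {
    val unhandled = relation.unhandledFilters(filters).toSet
    filters.filterNot(unhandled.contains).toSeq
  }
}
```

A data source test could then compare `handledFilters` against the predicates it expects to be pushed down, without going through a format-specific reader.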





[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18581#discussion_r135403912
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
  * in that file.
  */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
--- End diff --

So far, following Hive is the safest. If users complain about it, we can 
behave differently from Hive with a new SQLConf.
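
To make the discussion concrete, the sketch below shows one way an optional separator such as `lineSeparator: Option[String]` could be interpreted. The object and method names are hypothetical, and the `None` default (splitting on `\r\n`, `\r`, or `\n`) is an assumption for illustration; it is not meant to reproduce the behaviour of Hive or of this pull request.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of Spark or Hadoop.
object LineSeparatorSketch {
  // None: split on "\r\n", "\r", or "\n" (assumed default for this sketch).
  // Some(sep): split only on the literal separator, with regex characters quoted.
  def splitLines(text: String, lineSeparator: Option[String]): Seq[String] = {
    val regex = lineSeparator match {
      case Some(sep) => Pattern.quote(sep)
      case None      => "\r\n|\r|\n"
    }
    // A limit of -1 keeps trailing empty strings instead of dropping them.
    text.split(regex, -1).toSeq
  }
}
```

With `Some("|")`, for example, newlines inside a record are kept as data, which is the kind of behavioural difference a SQLConf could later control.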




