[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19064#discussion_r135446631

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -22,6 +22,9 @@ import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.types.DataType

+// Trait for identifying user-defined functions.
--- End diff --

```Scala
/**
 * Common base trait for user-defined functions, including ScalaUDF, ScalaUDAF, PythonUDF,
 * HiveSimpleUDF, HiveGenericUDF, and HiveUDAFFunction.
 */
```

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery ...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19050#discussion_r135443202

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +99,11 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
         val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
         Project(p.output, Filter(newCond.get, inputPlan))
     }
+
+    rewrotePlan transform {
+      case j @ Join(left, right, _, _) if !j.duplicateResolved =>
+        j.copy(right = DedupQueryAttributesInPlans.dedupRight(left, right))
--- End diff --

@hvanhovell We may not be able to revise `duplicateResolved`. Even though a LeftSemi join outputs only the left side of the join, we still need `duplicateResolved` to be false if there are duplicate attributes between the left and right sides. Otherwise, if there is a condition, the condition will be pushed down to the left side of the join, because all attribute references in the condition appear to belong to one side. That changes the join results.
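The pushdown concern in this comment can be illustrated with a minimal, Spark-free sketch. Everything in it (`Attr`, `referencesOnly`, `dedupRightIds`) is a hypothetical stand-in written for illustration, not Catalyst's API: attributes are identified by id, and a condition is treated as one-sided when all of its referenced ids appear in that side's output.

```scala
object DuplicateAttributeSketch {
  // Hypothetical, simplified attribute: identity is the id, not the name.
  final case class Attr(name: String, id: Int)

  // True when every attribute id referenced by the condition appears in the
  // given side's output, i.e. the condition looks one-sided and pushable.
  def referencesOnly(condition: Set[Attr], sideOutput: Set[Attr]): Boolean =
    condition.map(_.id).subsetOf(sideOutput.map(_.id))

  // Gives fresh ids to right-side attributes whose ids collide with the left side.
  // (Illustrative only; not the dedupRight method from the pull request.)
  def dedupRightIds(left: Seq[Attr], right: Seq[Attr]): Seq[Attr] = {
    val leftIds = left.map(_.id).toSet
    var nextId = (left ++ right).map(_.id).foldLeft(0)((m, i) => math.max(m, i)) + 1
    right.map { a =>
      if (leftIds.contains(a.id)) {
        val fresh = a.copy(id = nextId)
        nextId += 1
        fresh
      } else a
    }
  }
}
```

With `left = Seq(Attr("a", 1))` and `right = Seq(Attr("a", 1))`, a condition comparing the two `a` columns references only id 1, so `referencesOnly` reports it as left-only. After `dedupRightIds`, the right attribute gets id 2 and the same condition references both sides.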
[GitHub] spark pull request #19064: [SPARK-21848][SQL] Add trait UDFType to identify ...
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/19064

[SPARK-21848][SQL] Add trait UDFType to identify user-defined functions

## What changes were proposed in this pull request?

Add trait UDFType to identify user-defined functions.

UDFs can be expensive, so the optimizer may need to avoid executing a UDF multiple times. E.g.
```scala
table.select(UDF as 'a).select('a, '(a + 1))
```
If the UDF is expensive in this case, the optimizer should not collapse the projects into
```scala
table.select(UDF as 'a, (UDF+1) as '(a+1))
```

Currently, UDF classes such as PythonUDF and HiveGenericUDF are not defined in catalyst.
This PR adds a new trait to make it easier to identify user-defined functions.

## How was this patch tested?

Unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark UDFType

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19064

commit 5f508ab0b1bf91f7a857adabbefea399c8a71005
Author: Wang Gengliang
Date:   2017-08-26T13:15:15Z

    add trait UDFType to identify UDF classes
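The cost argument in this description can be shown with a minimal sketch in plain Scala, without Spark. The names `expensiveUdf`, `stacked`, and `collapsed` are hypothetical; a call counter stands in for the cost of the UDF.

```scala
object CollapseProjectSketch {
  // Counts how many times the stand-in "UDF" is invoked.
  var calls = 0

  // Hypothetical stand-in for an expensive user-defined function.
  def expensiveUdf(x: Int): Int = { calls += 1; x * 2 }

  // Two stacked projections: the UDF runs once per row and its result is reused.
  def stacked(rows: Seq[Int]): Seq[(Int, Int)] =
    rows.map(expensiveUdf).map(a => (a, a + 1))

  // Collapsed projection: the alias is inlined, so the UDF runs twice per row.
  def collapsed(rows: Seq[Int]): Seq[(Int, Int)] =
    rows.map(x => (expensiveUdf(x), expensiveUdf(x) + 1))
}
```

Both forms return the same rows, but on three input rows the stacked form calls the function three times and the collapsed form six times, which is the duplication the proposed trait is meant to help the optimizer avoid.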
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81173/ Test FAILed.
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Merged build finished. Test FAILed.
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Merged build finished. Test FAILed.
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19050 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81172/ Test FAILed.
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19050

**[Test build #81172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81172/testReport)** for PR 19050 at commit [`121ad5a`](https://github.com/apache/spark/commit/121ad5aa77e5bf251a539b5aa4761c392f2a0db8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439310

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
         s"vectorized reader"))
     }
   }
+
+  test("create read-only batch") {
--- End diff --

`create a columnar batch from Arrow column vectors` or something?
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439372

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
         s"vectorized reader"))
     }
   }
+
+  test("create read-only batch") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector1 = ArrowUtils.toArrowField("int1", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector1.allocateNew()
+    val mutator1 = vector1.getMutator()
+    val vector2 = ArrowUtils.toArrowField("int2", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector2.allocateNew()
+    val mutator2 = vector2.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator1.setSafe(i, i)
+      mutator2.setSafe(i + 1, i)
+    }
+    mutator1.setNull(10)
+    mutator1.setValueCount(11)
+    mutator2.setNull(0)
+    mutator2.setValueCount(11)
+
+    val columnVectors = Seq(new ArrowColumnVector(vector1), new ArrowColumnVector(vector2))
+
+    val schema = StructType(Seq(StructField("int1", IntegerType), StructField("int2", IntegerType)))
+    val batch = new ColumnarBatch(schema, columnVectors.toArray[ColumnVector], 11)
+    batch.setNumRows(11)
+
+    assert(batch.numCols() == 2)
+    assert(batch.numRows() == 11)
+
+    val rowIter = batch.rowIterator().asScala
+    rowIter.zipWithIndex.foreach { case (row, i) =>
+      if (i == 10) {
+        assert(row.isNullAt(0))
+      } else {
+        assert(row.getInt(0) == i)
+      }
+      if (i == 0) {
+        assert(row.isNullAt(1))
+      } else {
+        assert(row.getInt(1) == i - 1)
+      }
+    }
+
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
+    }
+
+    columnVectors.foreach(_.close())
--- End diff --

We can use `batch.close()` here.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439793

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala ---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll {
     }
   }

+  test("roundtrip payloads") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector = ArrowUtils.toArrowField("int", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector.allocateNew()
+    val mutator = vector.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator.setSafe(i, i)
+    }
+    mutator.setNull(10)
+    mutator.setValueCount(11)
+
+    val schema = StructType(Seq(StructField("int", IntegerType)))
+
+    val batch = new ColumnarBatch(schema, Array[ColumnVector](new ArrowColumnVector(vector)), 11)
--- End diff --

Btw, do we need to use `ColumnarBatch` for this test? I guess we can simply create `Iterator[InternalRow]` and use it.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135438683

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala ---
@@ -111,6 +125,66 @@ private[sql] object ArrowConverters {
   }

   /**
+   * Maps Iterator from ArrowPayload to InternalRow. Returns a pair containing the row iterator
+   * and the schema from the first batch of Arrow data read.
+   */
+  private[sql] def fromPayloadIterator(
+      payloadIter: Iterator[ArrowPayload],
+      context: TaskContext): ArrowRowIterator = {
+    val allocator =
+      ArrowUtils.rootAllocator.newChildAllocator("fromPayloadIterator", 0, Long.MaxValue)
+
+    new ArrowRowIterator {
+      private var reader: ArrowFileReader = null
+      private var schemaRead = StructType(Seq.empty)
+      private var rowIter = if (payloadIter.hasNext) nextBatch() else Iterator.empty
--- End diff --

We can simply put `Iterator.empty` here.
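One way to read this suggestion is sketched below in plain Scala, with no Arrow or Spark types. It assumes that `hasNext` is where the next batch gets loaded once the current one is exhausted, which the diff excerpt does not show; `flattenBatches` and its batch type are hypothetical stand-ins.

```scala
object LazyBatchIteratorSketch {
  // Flattens an iterator of batches into an iterator of rows,
  // loading each batch only when rows are actually requested.
  def flattenBatches[A](batches: Iterator[Seq[A]]): Iterator[A] = new Iterator[A] {
    // Start empty: the first call to hasNext pulls the first batch.
    private var rowIter: Iterator[A] = Iterator.empty

    override def hasNext: Boolean = {
      // Skip over exhausted or empty batches until a row is found or input ends.
      while (!rowIter.hasNext && batches.hasNext) {
        rowIter = batches.next().iterator
      }
      rowIter.hasNext
    }

    override def next(): A = {
      if (!hasNext) throw new NoSuchElementException("no more rows")
      rowIter.next()
    }
  }
}
```

Under that assumption, initializing with an empty iterator behaves the same as eagerly loading the first batch, and it avoids doing work in the constructor.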
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18787#discussion_r135439857

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala ---
@@ -1629,6 +1632,39 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll {
     }
   }

+  test("roundtrip payloads") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector = ArrowUtils.toArrowField("int", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
--- End diff --

Should the `allocator` and the `vector` be closed at the end of this test?
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19050 **[Test build #81173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81173/testReport)** for PR 19050 at commit [`bf07e2a`](https://github.com/apache/spark/commit/bf07e2ab338f2c030b78279111a6e431997ba13b).
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81170/ Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18581

**[Test build #81170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81170/testReport)** for PR 18581 at commit [`41369cf`](https://github.com/apache/spark/commit/41369cf26fcdd20708168a78c0ca35b614f83f77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery should ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19050 **[Test build #81172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81172/testReport)** for PR 19050 at commit [`121ad5a`](https://github.com/apache/spark/commit/121ad5aa77e5bf251a539b5aa4761c392f2a0db8).
[GitHub] spark pull request #19050: [SPARK-21835][SQL][WIP] RewritePredicateSubquery ...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19050#discussion_r135434713

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -98,6 +99,11 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
         val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
         Project(p.output, Filter(newCond.get, inputPlan))
     }
+
+    rewrotePlan transform {
+      case j @ Join(left, right, _, _) if !j.duplicateResolved =>
+        j.copy(right = DedupQueryAttributesInPlans.dedupRight(left, right))
--- End diff --

@hvanhovell Because predicate subqueries are rewritten into left semi/anti joins, which don't have duplicate outputs. I think you mean correlated scalar subqueries, which are rewritten into left outer joins. Is that right?
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81171/ Test PASSed.
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19029 Merged build finished. Test PASSed.
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19029

**[Test build #81171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81171/testReport)** for PR 19029 at commit [`c40eba3`](https://github.com/apache/spark/commit/c40eba38d82893d5604aa66ec9037df706da712d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19062: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19062 LGTM
[GitHub] spark pull request #19062: [SPARK-21845] [SQL] Make codegen fallback of expr...
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19062#discussion_r135430732

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -370,8 +373,7 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
     try {
       GeneratePredicate.generate(expression, inputSchema)
     } catch {
-      case e @ (_: JaninoRuntimeException | _: CompileException)
-          if sqlContext == null || sqlContext.conf.wholeStageFallback =>
--- End diff --

Better to put this comment in https://github.com/apache/spark/pull/19062/files#diff-b9f96d092fb3fea76bcf75e016799678R57?
[GitHub] spark issue #19018: [SPARK-21801][SPARKR][TEST] unit test randomly fail with...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19018 ping @felixcheung We can make all R tests for trees deterministic (not only random trees) and leave the other problems to a separate PR. It would be great to fix this soon. Thanks!
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19029 **[Test build #81171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81171/testReport)** for PR 19029 at commit [`c40eba3`](https://github.com/apache/spark/commit/c40eba38d82893d5604aa66ec9037df706da712d).
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81170 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81170/testReport)** for PR 18581 at commit [`41369cf`](https://github.com/apache/spark/commit/41369cf26fcdd20708168a78c0ca35b614f83f77).
[GitHub] spark pull request #19049: [WEB-UI]Add the 'master' column to identify the t...
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/19049#discussion_r135427302

--- Diff: core/src/main/resources/org/apache/spark/ui/static/historypage.js ---
@@ -136,6 +136,16 @@ $(document).ready(function() {
           (attempt.hasOwnProperty("attemptId") ? attempt["attemptId"] + "/" : "") + "logs";
         attempt["durationMillisec"] = attempt["duration"];
         attempt["duration"] = formatDuration(attempt["duration"]);
+        var idStr = id.toString();
+        if(idStr.indexOf("application_") > -1) {
+          attempt["master"] = "yarn";
+        } else if(idStr.indexOf("app-") > -1) {
+          attempt["master"] = "standalone";
+        } else if(idStr.indexOf("local-") > -1) {
+          attempt["master"] = "local";
+        } else {
+          attempt["master"] = "mesos";
--- End diff --

I am judging by the ID, not by the name. The ID format is determined by the type of resource manager.
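The ID-based mapping in this diff can be restated as a small pure function, sketched here in Scala rather than the JavaScript of the original so that it is easy to test in isolation. The object and method names are hypothetical; the substrings and the `mesos` fallback are taken directly from the diff.

```scala
object MasterFromAppIdSketch {
  // Mirrors the substring checks from the diff, in the same order.
  // Anything that matches none of them falls through to "mesos".
  def inferMaster(appId: String): String =
    if (appId.contains("application_")) "yarn"
    else if (appId.contains("app-")) "standalone"
    else if (appId.contains("local-")) "local"
    else "mesos"
}
```

Note that because the checks are substring matches with a catch-all branch, any ID that matches none of the three patterns is labeled `mesos`, whether or not it actually came from Mesos.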
[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19061 Hi, @vanzin. Could you review this when you have some time? I'm wondering whether this is implemented the way you expected. Please let me know if there is anything more to do. Thanks!
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19031 ok, I'll close for now. Thanks!
[GitHub] spark pull request #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable m...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/19031
[GitHub] spark issue #18945: Add option to convert nullable int columns to float colu...
Github user logannc commented on the issue: https://github.com/apache/spark/pull/18945 Sorry for the delay. Things got busy and now there is the storm in Houston. Will update this per these suggestions soon.
[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning
Github user mdespriee commented on the issue: https://github.com/apache/spark/pull/17461 I updated the example following your suggestion. It's more consistent with LDAExample this way.
[GitHub] spark issue #18837: [Spark-20812][Mesos] Add secrets support to the dispatch...
Github user ArtRand commented on the issue: https://github.com/apache/spark/pull/18837 Hello @vanzin, thanks for the review. I added `.toSequence` to the new configuration specs, certainly a nice solution to parsing on the fly. Please let me know if there is anything else that needs changing.
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r135418170 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -226,6 +246,12 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String if (($(solver) == Auto && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == Normal) { + + if (isSet(initialModel)) { +logWarning("Initial model will be ignored if fitting by normal solver. " + --- End diff -- The initial model is a pretty important parameter. Users who set it would expect it to take effect, and they may miss the warning in the overwhelming Spark logs. Maybe we can move the parameter check to `transformSchema` and throw an exception.
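Editor's sketch of the fail-fast idea in the comment above. This is a minimal, Spark-free illustration, not the code in the PR: `RegressionParams`, `ParamValidation`, and the `MaxNumFeaturesForNormal` value are hypothetical stand-ins, and it assumes the number of features is already known when validation runs, which may not hold inside a real `transformSchema`.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's actual classes.
sealed trait Solver
case object Auto extends Solver
case object Normal extends Solver
case object LBFGS extends Solver

final case class RegressionParams(
    solver: Solver,
    numFeatures: Int,
    initialModel: Option[Array[Double]])

object ParamValidation {
  // Assumed threshold, standing in for WeightedLeastSquares.MAX_NUM_FEATURES.
  val MaxNumFeaturesForNormal = 4096

  // Same condition as in the quoted diff: the normal solver is used when it is
  // requested explicitly, or when Auto is set and the feature count is small.
  def usesNormalSolver(p: RegressionParams): Boolean =
    p.solver == Normal || (p.solver == Auto && p.numFeatures <= MaxNumFeaturesForNormal)

  // Fail fast with IllegalArgumentException instead of logging a warning
  // that can get lost in the logs.
  def validate(p: RegressionParams): Unit =
    require(
      !(p.initialModel.isDefined && usesNormalSolver(p)),
      "initialModel is not supported when fitting with the normal solver")
}
```

With this shape, `ParamValidation.validate(RegressionParams(Normal, 10, Some(Array(1.0))))` throws, while the same call with `LBFGS` returns normally.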
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r135418289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -72,6 +72,22 @@ private[regression] trait LinearRegressionParams extends PredictorParams } /** + * Params for linear regression. + */ +private[regression] trait LinearRegressionParams extends LinearRegressionModelParams --- End diff -- It may be cleaner if we just move the param `initialModel` into LinearRegression, so we don't have to touch the class hierarchy.
[GitHub] spark issue #7842: [SPARK-8542][MLlib]PMML export for Decision Trees
Github user skonto commented on the issue: https://github.com/apache/spark/pull/7842 @coderxiang what is the plan for this?
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81169/ Test PASSed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Merged build finished. Test PASSed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #81169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81169/testReport)** for PR 16992 at commit [`3d6c80b`](https://github.com/apache/spark/commit/3d6c80b4857b2b776b55516ea5e699e0e470b4a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Merged build finished. Test FAILed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16992 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81168/ Test FAILed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #81168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81168/testReport)** for PR 16992 at commit [`77cfb03`](https://github.com/apache/spark/commit/77cfb03a82966412e6468edbff358415197c8aaa). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19055: [SPARK-21839][SQL] Support SQL config for ORC compressio...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19055 Hi, @gatorsmile . Could you review this ORC configuration PR?
[GitHub] spark pull request #19063: [SPARK-21846][TEST] Reduce the number of shuffle ...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/19063
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19063 I'm closing this PR because the numbers are different than what I expected before.
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 For Parquet, I can find [TestInputOutputFormat.java](https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/test/java/parquet/hadoop/example/TestInputOutputFormat.java). Parquet also has a test case which is very specific to the impl, too. What do you mean by `e2e test case not specific to the impl.` exactly? Sorry, but could you provide an example?
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19063 I saw that, too. Right. It becomes meaningless. Some are reduced but others increase. I'm trying another approach in this PR and JIRA. I will update more.
[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/18610 Just to confirm, so we have agreed that the initialModel should be of type [T <: Model[T]] rather than a String type (path to the saved model)? Sorry I didn't find the related discussion.
[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...
Github user JohnHBrock commented on the issue: https://github.com/apache/spark/pull/18610 Is there any more work to do before this can get merged?
[GitHub] spark issue #19031: [SPARK-21603][SQL][FOLLOW-UP] Use -1 to disable maxLines...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19031 This is an internal conf. For the advanced users, we do not encourage them to disable it. If they want to disable it, they can simply set it to a number above 8000. Thus, setting `maxLinesPerFunction` to `-1` is not needed, IMO.
[GitHub] spark pull request #19062: [SPARK-21845] [SQL] Make codegen fallback of expr...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19062#discussion_r135416594 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -54,6 +54,9 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ @transient final val sqlContext = SparkSession.getActiveSession.map(_.sqlContext).orNull + // whether we should fallback when hitting compilation errors caused by codegen + private val codeGenFallBack = sqlContext == null || sqlContext.conf.codegenFallback --- End diff -- I see
[GitHub] spark issue #18961: [SPARK-21746][SQL]there is an java.lang.IllegalArgumentE...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18961 @dongjoon-hyun @cloud-fan Do you have any suggestions?
[GitHub] spark issue #19062: [SPARK-21845] [SQL] Make codegen fallback of expressions...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/19062 +1,LGTM
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user erenavsarogullari commented on the issue: https://github.com/apache/spark/pull/16992 Hi @squito, Thanks for reviewing this patch. It is ready for re-review / merge.
[GitHub] spark pull request #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler...
Github user erenavsarogullari commented on a diff in the pull request: https://github.com/apache/spark/pull/16992#discussion_r135415731 --- Diff: docs/job-scheduling.md --- @@ -235,7 +235,7 @@ properties: of the cluster. By default, each pool's `minShare` is 0. The pool properties can be set by creating an XML file, similar to `conf/fairscheduler.xml.template`, -and setting a `spark.scheduler.allocation.file` property in your +and either setting `fairscheduler.xml` into classpath or a `spark.scheduler.allocation.file` property in your --- End diff -- Addressed.
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #81169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81169/testReport)** for PR 16992 at commit [`3d6c80b`](https://github.com/apache/spark/commit/3d6c80b4857b2b776b55516ea5e699e0e470b4a9).
[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r135415687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -769,16 +769,27 @@ class CodegenContext { foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = { val blocks = new ArrayBuffer[String]() val blockBuilder = new StringBuilder() +val defaultMaxLines = 100 +val maxLines = if (SparkEnv.get != null) { + SparkEnv.get.conf.getInt("spark.sql.codegen.expressions.maxCodegenLinesPerFunction", --- End diff -- This is not following what we are doing for the other SQLConf. I am also thinking if we should just put it into `StaticSQLConf`. Let me check it with others.
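Editor's sketch of the line-count-based splitting that the quoted diff is configuring. It is a rough, Spark-free illustration under stated assumptions, not the body of `splitExpressions`: `SplitSketch`, `maxLines`, and `split` are hypothetical names, and a plain `Map` stands in for whichever conf object ends up holding the setting.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSketch {
  // Mirrors `defaultMaxLines` in the quoted diff.
  val DefaultMaxLines = 100

  // Hypothetical lookup: read the limit from a config map, falling back to the default.
  def maxLines(conf: Map[String, String], key: String): Int =
    conf.get(key).map(_.toInt).getOrElse(DefaultMaxLines)

  // Group code snippets into blocks so that each block stays within `limit` lines.
  // A single snippet longer than `limit` still becomes a block of its own.
  def split(expressions: Seq[String], limit: Int): Seq[String] = {
    val blocks = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var lines = 0
    for (code <- expressions) {
      val n = code.count(_ == '\n') + 1
      if (lines > 0 && lines + n > limit) {
        // Adding this snippet would exceed the limit, so close the current block first.
        blocks += current.toString
        current.clear()
        lines = 0
      }
      current.append(code).append('\n')
      lines += n
    }
    if (lines > 0) blocks += current.toString
    blocks.toSeq
  }
}
```

Under this sketch, a very large limit puts all snippets into one block, which is the effect the "set it to a number above 8000" remark elsewhere in this thread relies on.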
[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16992 **[Test build #81168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81168/testReport)** for PR 16992 at commit [`77cfb03`](https://github.com/apache/spark/commit/77cfb03a82966412e6468edbff358415197c8aaa).
[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r135415452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala --- @@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl * in that file. */ --- End diff -- Add `@parameter`?
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19063 - org.apache.spark.sql.hive 10 min - org.apache.spark.sql.hive.client 6 min 20 sec - org.apache.spark.sql.hive.execution 28 min - org.apache.spark.sql.hive.orc 2 min 1 sec - org.apache.spark.sql.hive.thriftserver 3 min 11 sec Changing it from 5 to 3 does not help, right?
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 I mean, how about Parquet and the others? Do they have the e2e test cases in their projects?
[GitHub] spark pull request #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateO...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19029#discussion_r135411403 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -439,8 +439,9 @@ private[ml] object WeightedLeastSquares { /** * Weighted population standard deviation of labels. + * We prevent variance from negative value caused by numerical error. --- End diff -- I'm not so against this, but this is really an implementation detail and not relevant to the caller. It's a value that is by definition nonnegative.
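Editor's sketch of the clamping that the quoted doc comment describes. Computing a variance as E[x^2] - E[x]^2 can produce a tiny negative number through floating-point rounding, and taking its square root would then give NaN. The code below is an illustration only: `WeightedStats` is a hypothetical name, and the formula is assumed rather than taken from `WeightedLeastSquares`.

```scala
object WeightedStats {
  // Weighted population variance via E[x^2] - E[x]^2, clamped at zero so that
  // rounding error cannot make the result negative.
  def variance(values: Seq[Double], weights: Seq[Double]): Double = {
    require(values.length == weights.length, "values and weights must have the same length")
    val wSum = weights.sum
    require(wSum > 0.0, "total weight must be positive")
    val mean = values.zip(weights).map { case (x, w) => x * w }.sum / wSum
    val meanSq = values.zip(weights).map { case (x, w) => x * x * w }.sum / wSum
    math.max(meanSq - mean * mean, 0.0)
  }

  // Weighted population standard deviation; never NaN thanks to the clamp above.
  def std(values: Seq[Double], weights: Seq[Double]): Double =
    math.sqrt(variance(values, weights))
}
```

The clamp stays inside the helper, which matches the point made in the comment: callers only see a value that is nonnegative by definition.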
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19029 Merged build finished. Test PASSed.
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81167/ Test PASSed.
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19029 **[Test build #81167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81167/testReport)** for PR 19029 at commit [`21e7ff7`](https://github.com/apache/spark/commit/21e7ff7ea65da1c03b32445405d2bd55346db096). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81166/ Test PASSed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81166/testReport)** for PR 18581 at commit [`08233e6`](https://github.com/apache/spark/commit/08233e654072b1f117926b813459f3e2bf6b8a55). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19029: [SPARK-21818][ML][MLLIB] Fix bug of MultivariateOnlineSu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19029 **[Test build #81167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81167/testReport)** for PR 19029 at commit [`21e7ff7`](https://github.com/apache/spark/commit/21e7ff7ea65da1c03b32445405d2bd55346db096).
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19063 Merged build finished. Test PASSed.
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19063 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81163/ Test PASSed.
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19063 **[Test build #81163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81163/testReport)** for PR 19063 at commit [`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81165/ Test FAILed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81165/testReport)** for PR 18581 at commit [`cbed415`](https://github.com/apache/spark/commit/cbed41534ff7a6a7219542ec282fcee4bdf67a67). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test FAILed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81166/testReport)** for PR 18581 at commit [`08233e6`](https://github.com/apache/spark/commit/08233e654072b1f117926b813459f3e2bf6b8a55).
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81164/testReport)** for PR 18581 at commit [`9b1bf10`](https://github.com/apache/spark/commit/9b1bf108e61f6af69331a5d4052b53a47d34bf71). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81164/ Test FAILed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18581 Merged build finished. Test FAILed.
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81165/testReport)** for PR 18581 at commit [`cbed415`](https://github.com/apache/spark/commit/cbed41534ff7a6a7219542ec282fcee4bdf67a67).
[GitHub] spark pull request #19059: [SS] - Avoid using `return` inside `CachedKafkaCo...
Github user YuvalItzchakov commented on a diff in the pull request: https://github.com/apache/spark/pull/19059#discussion_r135405801
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -125,8 +131,11 @@ private[kafka010] case class CachedKafkaConsumer private(
 toFetchOffset = getEarliestAvailableOffsetBetween(toFetchOffset, untilOffset)
 }
 }
-resetFetchedData()
-null
+
+if (isFetchComplete) consumerRecord else {
--- End diff --
Done
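For readers following the diff above, here is a minimal, self-contained sketch of the general pattern the PR title refers to: replacing an early `return` inside a loop with a completion flag, so that the result is the value of the final `if`/`else` expression. Only the names `isFetchComplete` and `consumerRecord` come from the diff; `FetchLoopSketch`, `Attempt`, and `fetch` are hypothetical stand-ins, and this is not the actual `CachedKafkaConsumer` code.

```scala
object FetchLoopSketch {
  // Hypothetical stand-in for one fetch attempt: Some(record) when the wanted
  // record was fetched, None when the attempt has to be retried.
  type Attempt = Option[String]

  // Loops until a record is found or the attempts run out, without `return`.
  def fetch(attempts: Iterator[Attempt]): String = {
    var consumerRecord: String = null
    var isFetchComplete = false
    while (attempts.hasNext && !isFetchComplete) {
      attempts.next() match {
        case Some(record) =>
          consumerRecord = record
          isFetchComplete = true
        case None => // nothing fetched in this attempt; try the next one
      }
    }
    // The outcome of the loop is the value of this final expression.
    if (isFetchComplete) consumerRecord else null
  }
}
```

Calling `FetchLoopSketch.fetch(Iterator(None, Some("a")))` stops at the first successful attempt; when every attempt is `None`, the method falls through to `null`, mirroring the `null` result visible in the removed lines of the diff.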
[GitHub] spark issue #18581: [SPARK-21289][SQL][ML] Supports custom line separator fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18581 **[Test build #81164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81164/testReport)** for PR 18581 at commit [`9b1bf10`](https://github.com/apache/spark/commit/9b1bf108e61f6af69331a5d4052b53a47d34bf71).
[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18966 ping @gatorsmile
[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r135405534
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 * in that file.
 */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
--- End diff --
OK, I will change this, but I should say that this approach looks incorrect to me; the behaviour should be discussed and possibly updated in the near future.
[GitHub] spark issue #18928: [SPARK-21696][SS]Fix a potential issue that may generate...
Github user YuvalItzchakov commented on the issue: https://github.com/apache/spark/pull/18928 Right. We've had some problems reading snapshots after executors died from OOM; I hope this does the trick :) Thanks.
[GitHub] spark pull request #19058: SPARK-21843:testNameNote should be "(minNumPostSh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19058
[GitHub] spark issue #19058: SPARK-21843:testNameNote should be "(minNumPostShufflePa...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19058 Merged to master
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19063 **[Test build #81163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81163/testReport)** for PR 19063 at commit [`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19063 Retest this please
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81162/ Test FAILed.
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19063 Merged build finished. Test FAILed.
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19063 **[Test build #81162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81162/testReport)** for PR 19063 at commit [`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 If you agree, I will try to write more code here as a POC.
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060 Parquet is the same. We can use `unhandledFilters` for PPD (predicate pushdown). I think the other text-based data sources (TEXT/CSV/JSON) don't support PPD.
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19063 **[Test build #81162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81162/testReport)** for PR 19063 at commit [`557b0c6`](https://github.com/apache/spark/commit/557b0c656aa919c35c7db0832dc2bd276f0cac03).
[GitHub] spark issue #19063: [SPARK-21846][TEST] Reduce the number of shuffle partiti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19063 Retest this please.
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19060 How about Parquet and the others?
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060
In the ORC GitHub repository, to my knowledge, this is the highest-level API.
```
val reader = new OrcInputFormat[OrcStruct]().createRecordReader(split, attemptContext)
...
reader.nextKeyValue()
...
reader.getCurrentValue
...
row.getFieldValue(0).asInstanceOf[IntWritable].get
```
Actually, I have an idea. If we support [unhandledFilters](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L237) in data source testing through a SQLConf, we can handle this at the Spark level, as in the *value limit* case. What do you think about that? May I try it that way?
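As an illustration of the `unhandledFilters` idea, the sketch below models one possible reading of the proposal in plain Scala: a relation reports which pushed-down filters it cannot evaluate, and a test switch (standing in for a SQLConf entry) makes it report every filter as unhandled so that the engine re-evaluates all of them. The types `Filter`, `EqualTo`, `GreaterThan`, and `SourceUnderTest`, and the flag `pushDownEnabled`, are simplified, hypothetical stand-ins and not Spark's classes; only the method shape `unhandledFilters(filters: Array[Filter]): Array[Filter]` follows `BaseRelation` in `interfaces.scala`.

```scala
object UnhandledFiltersSketch {
  // Simplified, hypothetical stand-ins for Spark's
  // org.apache.spark.sql.sources.Filter hierarchy.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter

  // `pushDownEnabled` is an assumed test switch standing in for a SQLConf
  // entry. When it is false, the source claims to handle no filter at all,
  // so every predicate must be applied again after the scan.
  final case class SourceUnderTest(pushDownEnabled: Boolean) {
    // Same shape as BaseRelation.unhandledFilters: returns the filters that
    // the engine still has to evaluate itself.
    def unhandledFilters(filters: Array[Filter]): Array[Filter] =
      if (pushDownEnabled) filters.filterNot(canHandle) else filters

    // In this sketch the source can evaluate only equality predicates.
    private def canHandle(f: Filter): Boolean = f match {
      case _: EqualTo => true
      case _          => false
    }
  }
}
```

With `pushDownEnabled = true`, an `EqualTo` filter is treated as handled by the source and only the `GreaterThan` filter is returned; with `pushDownEnabled = false`, the input array comes back unchanged.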
[GitHub] spark pull request #18581: [SPARK-21289][SQL][ML] Supports custom line separ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18581#discussion_r135403912
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala ---
@@ -32,7 +32,9 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 * in that file.
 */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
--- End diff --
So far, following Hive is the safest option. If users complain about it, we can add a new SQLConf that lets us behave differently from Hive.