[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84839/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19862 Merged build finished. Test FAILed.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84839/testReport)** for PR 19862 at commit [`e40c2f1`](https://github.com/apache/spark/commit/e40c2f138a8640487a18665e2caf62fce1ce5c8a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84840/testReport)** for PR 19862 at commit [`80231ab`](https://github.com/apache/spark/commit/80231ab670d5bf1640fad3a9741b6315dba9d1bb).
[GitHub] spark pull request #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle rea...
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/19862#discussion_r156581645

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
    @@ -159,6 +154,12 @@ public boolean hasNext() {
           @Override
           public UnsafeRow next() {
             try {
    +          if (!alreadyCalculated) {
    +            while (inputIterator.hasNext()) {
    +              insertRow(inputIterator.next());
    +            }
    +            alreadyCalculated = true;
    +          }
               sortedIterator.loadNext();
    --- End diff --

Yes, you are right. I now update `sortedIterator` after inserting the rows. Since the inner class can only access a final field from the outer scope, I used an array to hold it. Is there a better solution?
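One possible answer to the question above, offered only as a sketch and not as the code in the PR: if the iterator is a named class rather than an anonymous one, the sorted iterator can be an ordinary mutable field, so no single-element array is needed. `LazySortingIterator` and its use of `Integer` rows are hypothetical stand-ins for the real sorter and `UnsafeRow`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a sorter whose sorting is deferred until first use. */
final class LazySortingIterator implements Iterator<Integer> {
  private final Iterator<Integer> input;    // unsorted upstream rows
  private Iterator<Integer> sorted = null;  // plain mutable field, set on first use

  LazySortingIterator(Iterator<Integer> input) {
    this.input = input;
  }

  // Drain and sort the input exactly once, the first time a result is needed.
  private void ensureSorted() {
    if (sorted == null) {
      List<Integer> buffer = new ArrayList<>();
      while (input.hasNext()) {
        buffer.add(input.next());
      }
      Collections.sort(buffer);
      sorted = buffer.iterator();
    }
  }

  @Override
  public boolean hasNext() {
    ensureSorted();
    return sorted.hasNext();
  }

  @Override
  public Integer next() {
    ensureSorted();
    return sorted.next();
  }
}
```

Nothing is read from the input until `hasNext()` or `next()` is first called, which is the deferral the diff aims for.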
[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19962 LGTM, pending Jenkins.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84839/testReport)** for PR 19862 at commit [`e40c2f1`](https://github.com/apache/spark/commit/e40c2f138a8640487a18665e2caf62fce1ce5c8a).
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19257
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19257 Thanks! Merged to master.
[GitHub] spark issue #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can break when...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19257 LGTM except for a few style comments. We can merge it and fix them in a follow-up PR. Thanks!
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r156580049

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
    @@ -602,6 +602,37 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
           )
         }

    +  test("SPARK-22042 ReorderJoinPredicates can break when child's partitioning is not decided") {
    +    withTable("bucketed_table", "table1", "table2") {
    +      df.write.format("parquet").saveAsTable("table1")
    +      df.write.format("parquet").saveAsTable("table2")
    +      df.write.format("parquet").bucketBy(8, "j", "k").saveAsTable("bucketed_table")
    +
    +      withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
    +        checkAnswer(
    +          sql("""
    +            |SELECT ab.i, ab.j, ab.k, c.i, c.j, c.k
    +            |FROM (
    +            |  SELECT a.i, a.j, a.k
    +            |  FROM bucketed_table a
    +            |  JOIN table1 b
    +            |  ON a.i = b.i
    +            |) ab
    +            |JOIN table2 c
    +            |ON ab.i = c.i
    +            |""".stripMargin),
    +          sql("""
    +            |SELECT a.i, a.j, a.k, c.i, c.j, c.k
    +            |FROM bucketed_table a
    +            |JOIN table1 b
    +            |ON a.i = b.i
    +            |JOIN table2 c
    +            |ON a.i = c.i
    +            |""".stripMargin))
    --- End diff --

Please follow the style of the other test cases.
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r156579879

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         operator.withNewChildren(children)
       }

    +  /**
    +   * When the physical operators are created for JOIN, the ordering of join keys is based on order
    +   * in which the join keys appear in the user query. That might not match with the output
    +   * partitioning of the join node's children (thus leading to extra sort / shuffle being
    +   * introduced). This rule will change the ordering of the join keys to match with the
    +   * partitioning of the join nodes' children.
    +   */
    +  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
    +    def reorderJoinKeys(
    --- End diff --

We prefer not to use embedded (nested) functions.
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r156579907

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         operator.withNewChildren(children)
       }

    +  /**
    +   * When the physical operators are created for JOIN, the ordering of join keys is based on order
    +   * in which the join keys appear in the user query. That might not match with the output
    +   * partitioning of the join node's children (thus leading to extra sort / shuffle being
    +   * introduced). This rule will change the ordering of the join keys to match with the
    +   * partitioning of the join nodes' children.
    +   */
    +  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
    +    def reorderJoinKeys(
    +        leftKeys: Seq[Expression],
    +        rightKeys: Seq[Expression],
    +        leftPartitioning: Partitioning,
    +        rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
    +
    +      def reorder(expectedOrderOfKeys: Seq[Expression],
    +                  currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
    --- End diff --

Please fix the indentation here.
[GitHub] spark pull request #19257: [SPARK-22042] [SQL] ReorderJoinPredicates can bre...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19257#discussion_r156579889

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
    @@ -248,13 +252,83 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
         operator.withNewChildren(children)
       }

    +  /**
    +   * When the physical operators are created for JOIN, the ordering of join keys is based on order
    +   * in which the join keys appear in the user query. That might not match with the output
    +   * partitioning of the join node's children (thus leading to extra sort / shuffle being
    +   * introduced). This rule will change the ordering of the join keys to match with the
    +   * partitioning of the join nodes' children.
    +   */
    +  def reorderJoinPredicates(plan: SparkPlan): SparkPlan = {
    --- End diff --

This method should be `private`.
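The doc comment quoted in the diffs above describes reordering join keys so they line up with the children's partitioning. The following is a simplified sketch of that idea, not the PR's Scala implementation: keys are plain strings instead of Catalyst expressions, and the names `JoinKeyReorderSketch`, `Reordered`, and `reorder` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch: join keys are plain strings instead of Catalyst expressions. */
final class JoinKeyReorderSketch {

  /** Result pair: the reordered left keys and their matching right keys. */
  static final class Reordered {
    final List<String> leftKeys;
    final List<String> rightKeys;

    Reordered(List<String> leftKeys, List<String> rightKeys) {
      this.leftKeys = leftKeys;
      this.rightKeys = rightKeys;
    }
  }

  /**
   * Reorders leftKeys to follow expectedLeftOrder (for example, the bucketing
   * columns of the left child) and permutes rightKeys identically so that every
   * left/right key pair stays aligned. Falls back to the original order when
   * the expected order does not cover exactly the same keys.
   */
  static Reordered reorder(
      List<String> leftKeys, List<String> rightKeys, List<String> expectedLeftOrder) {
    if (leftKeys.size() != rightKeys.size()
        || expectedLeftOrder.size() != leftKeys.size()) {
      return new Reordered(leftKeys, rightKeys);
    }
    List<String> remainingLeft = new ArrayList<>(leftKeys);
    List<String> remainingRight = new ArrayList<>(rightKeys);
    List<String> newLeft = new ArrayList<>();
    List<String> newRight = new ArrayList<>();
    for (String expected : expectedLeftOrder) {
      int idx = remainingLeft.indexOf(expected);
      if (idx < 0) {
        // The expected key is not among the join keys: keep the query order.
        return new Reordered(leftKeys, rightKeys);
      }
      newLeft.add(remainingLeft.remove(idx));
      newRight.add(remainingRight.remove(idx));
    }
    return new Reordered(newLeft, newRight);
  }
}
```

For a query that joins on `(k, j)` against a table bucketed by `(j, k)`, the sketch would return the keys as `(j, k)` on both sides, which is the situation where an extra shuffle could be avoided.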
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19932 thanks, merging to master!
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] add init-container bootstrappi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19954 **[Test build #84838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84838/testReport)** for PR 19954 at commit [`1a74521`](https://github.com/apache/spark/commit/1a74521c3f114a9774598738daef5489c6fa8bae).
[GitHub] spark pull request #19894: [SPARK-22700][ML] Bucketizer.transform incorrectl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19894
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19960 Thank you, @HyukjinKwon and @gatorsmile.
[GitHub] spark issue #19894: [SPARK-22700][ML] Bucketizer.transform incorrectly drops...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19894 LGTM, thanks! Merged to master.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19950 Since `VectorWithNorm` and `TreePoint` do not override the `equals` method, we cannot directly use `===` to compare the objects. `LabeledPoint` is a case class, so its `equals` method is supplied automatically.
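The point about `equals` can be illustrated with a Java analogue (the comment itself concerns Scala classes, so this is only a parallel, and `PlainPoint`, `ValuePoint`, and `EqualityCheck` are hypothetical names). A class that does not override `equals` compares by identity, so two instances with the same contents are reported as unequal and a test has to compare fields explicitly. A class with value-based `equals`, similar to what a Scala case class generates, can be compared directly.

```java
import java.util.Arrays;

/** Hypothetical class that does not override equals: comparison is by identity. */
final class PlainPoint {
  final double[] features;

  PlainPoint(double[] features) {
    this.features = features;
  }
}

/** Hypothetical class with value-based equals, as a Scala case class would generate. */
final class ValuePoint {
  final double label;

  ValuePoint(double label) {
    this.label = label;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof ValuePoint
        && Double.compare(((ValuePoint) other).label, label) == 0;
  }

  @Override
  public int hashCode() {
    return Double.hashCode(label);
  }
}

final class EqualityCheck {
  /** Field-by-field comparison, needed when equals is not overridden. */
  static boolean sameFeatures(PlainPoint a, PlainPoint b) {
    return Arrays.equals(a.features, b.features);
  }
}
```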
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84828/ Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Merged build finished. Test PASSed.
[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19962 **[Test build #84837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84837/testReport)** for PR 19962 at commit [`3922ff4`](https://github.com/apache/spark/commit/3922ff4625aba951884c3f780782c8a4675aff06).
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19950 **[Test build #84828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84828/testReport)** for PR 19950 at commit [`024d835`](https://github.com/apache/spark/commit/024d835d4ed00f384b2f221c36c3edc656031a65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84836/testReport)** for PR 19862 at commit [`57550fb`](https://github.com/apache/spark/commit/57550fbd0c42c1616dee0197af6dedbd57a8da89).
[GitHub] spark pull request #19947: [SPARK-22759] [SQL] Filters can be combined iff b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19947
[GitHub] spark pull request #19952: [SPARK-21322][SQL][followup] support histogram in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19952
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19932 Merged build finished. Test PASSed.
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19952 thanks, merging to master!
[GitHub] spark issue #19947: [SPARK-22759] [SQL] Filters can be combined iff both are...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19947 Thanks! Merged to master.
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84829/ Test PASSed.
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19932 **[Test build #84829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19960
[GitHub] spark issue #19947: [SPARK-22759] [SQL] Filters can be combined iff both are...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19947 LGTM
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84834/ Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Merged build finished. Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84834/testReport)** for PR 19811 at commit [`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19960 Thanks! Merged to master.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19960 LGTM
[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19953 LGTM pending Jenkins
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19862 Merged build finished. Test FAILed.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84835/testReport)** for PR 19862 at commit [`012c9ee`](https://github.com/apache/spark/commit/012c9ee61d03c0e8fa8dff1a7a84e0adcda2c67c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84835/ Test FAILed.
[GitHub] spark issue #19862: [SPARK-22671][SQL] Make SortMergeJoin shuffle read less ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19862 **[Test build #84835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84835/testReport)** for PR 19862 at commit [`012c9ee`](https://github.com/apache/spark/commit/012c9ee61d03c0e8fa8dff1a7a84e0adcda2c67c).
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84827/ Test PASSed.
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19952 Merged build finished. Test PASSed.
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19952 **[Test build #84827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84827/testReport)** for PR 19952 at commit [`4e35c43`](https://github.com/apache/spark/commit/4e35c43957cf27b105c8f6b8ff19621aac540098).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84834/testReport)** for PR 19811 at commit [`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19811 Jenkins, retest this please
[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19962 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84826/ Test PASSed.
[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19962 Merged build finished. Test PASSed.
[GitHub] spark issue #19962: [SPARK-22767][SQL] use ctx.addReferenceObj in InSet and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19962 **[Test build #84826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84826/testReport)** for PR 19962 at commit [`b8c0689`](https://github.com/apache/spark/commit/b8c068934d31f7ccacbc3b20cb2810bc67ccecd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19953 **[Test build #84833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84833/testReport)** for PR 19953 at commit [`84a3ed3`](https://github.com/apache/spark/commit/84a3ed3e0f69485645bc92c471c35cfbfab7ffa2).
[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19953 LGTM
[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19778#discussion_r156569281

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -224,22 +224,52 @@ case class Elt(children: Seq[Expression])
       override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val index = indexExpr.genCode(ctx)
         val strings = stringExprs.map(_.genCode(ctx))
    +    val indexVal = ctx.freshName("index")
    +    val stringVal = ctx.freshName("stringVal")
         val assignStringValue = strings.zipWithIndex.map { case (eval, index) =>
           s"""
             case ${index + 1}:
    -          ${ev.value} = ${eval.isNull} ? null : ${eval.value};
    +          ${eval.code}
    +          $stringVal = ${eval.isNull} ? null : ${eval.value};
               break;
           """
    -    }.mkString("\n")
    -    val indexVal = ctx.freshName("index")
    -    val stringArray = ctx.freshName("strings");
    +    }

    -    ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s"""
    -      final int $indexVal = ${index.value};
    -      UTF8String ${ev.value} = null;
    -      switch ($indexVal) {
    -        $assignStringValue
    +    val cases = ctx.buildCodeBlocks(assignStringValue)
    +    val codes = if (cases.length == 1) {
    +      s"""
    +        UTF8String $stringVal = null;
    +        switch ($indexVal) {
    +          ${cases.head}
    +        }
    +       """
    +    } else {
    +      var prevFunc = "null"
    +      for (c <- cases.reverse) {
    +        val funcName = ctx.freshName("eltFunc")
    +        val funcBody = s"""
    +         private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int $indexVal) {
    --- End diff --

Ah, good catch! We should fix it with `splitExpressionsWithCurrentInputs`.
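The diff above splits one large generated `switch` into several chained helper methods so that no single method grows past the JVM's 64KB bytecode limit. A rough sketch of that chaining idea follows, under stated assumptions: it groups cases by count rather than by generated code size, and `SwitchSplitterSketch`, `split`, and the emitted `eltFunc0`, `eltFunc1`, ... names are hypothetical, not Spark's codegen API.

```java
import java.util.ArrayList;
import java.util.List;

final class SwitchSplitterSketch {
  /**
   * Groups `case` snippets into blocks of at most maxCasesPerMethod and emits
   * the source of one helper method per block. Each helper handles its own
   * cases and delegates any other index to the next helper; the last helper
   * returns null. The returned list starts with the entry-point method.
   */
  static List<String> split(List<String> caseSnippets, int maxCasesPerMethod) {
    if (maxCasesPerMethod <= 0) {
      throw new IllegalArgumentException("maxCasesPerMethod must be positive");
    }
    List<String> methods = new ArrayList<>();
    int blockCount = (caseSnippets.size() + maxCasesPerMethod - 1) / maxCasesPerMethod;
    for (int b = 0; b < blockCount; b++) {
      int from = b * maxCasesPerMethod;
      int to = Math.min(from + maxCasesPerMethod, caseSnippets.size());
      // Unhandled indices fall through to the next helper, or to null at the end.
      String fallback = (b + 1 < blockCount)
          ? "return eltFunc" + (b + 1) + "(index);"
          : "return null;";
      StringBuilder src = new StringBuilder();
      src.append("private String eltFunc").append(b).append("(int index) {\n");
      src.append("  switch (index) {\n");
      for (String c : caseSnippets.subList(from, to)) {
        src.append("    ").append(c).append('\n');
      }
      src.append("    default: ").append(fallback).append('\n');
      src.append("  }\n}\n");
      methods.add(src.toString());
    }
    return methods;
  }
}
```

With five case snippets and a limit of two per method, the sketch emits three helpers, and a lookup that misses in `eltFunc0` is forwarded to `eltFunc1` and then `eltFunc2`.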
[GitHub] spark issue #19855: [SPARK-22662] [SQL] Failed to prune columns after rewrit...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/19855 @maropu Good to know, thanks!
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Merged build finished. Test FAILed.
[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19963 Merged build finished. Test PASSed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84830/ Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84830/testReport)** for PR 19811 at commit [`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19963 **[Test build #84831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84831/testReport)** for PR 19963 at commit [`7bf74d2`](https://github.com/apache/spark/commit/7bf74d2eaa8521932737ff6a24172f776c75b16a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19950 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84824/ Test FAILed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Merged build finished. Test FAILed.
[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19963 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84831/ Test PASSed.
[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19950 **[Test build #84824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84824/testReport)** for PR 19950 at commit [`183868c`](https://github.com/apache/spark/commit/183868cd2a572470c512e92b212b3bc775af562f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/19953 @vanzin @gatorsmile @cloud-fan Thanks for the comments. I decided to display a warning message for each unrecognized event/property, and to add a debug message with the original content of the event log.
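The behaviour described in this comment (warn on each unrecognized event and keep the original line available for debugging) might look roughly like the sketch below. It is not the Spark History Server code: the `Name:id` line format, the `Event` types, and `replay` are hypothetical stand-ins chosen to keep the example self-contained, and warnings are returned to the caller rather than sent to a logger.

```scala
object EventLogSketch {
  // Hypothetical event types; real Spark listener events carry far more data.
  sealed trait Event
  final case class JobStart(id: Int) extends Event
  final case class JobEnd(id: Int) extends Event

  // Parses one line of an assumed "Name:id" format.
  // Anything unrecognized becomes a Left holding a warning that embeds
  // the original line, instead of throwing and aborting the replay.
  def parse(line: String): Either[String, Event] =
    line.split(":", 2) match {
      case Array(name, id) =>
        (name, scala.util.Try(id.toInt).toOption) match {
          case ("JobStart", Some(n)) => Right(JobStart(n))
          case ("JobEnd", Some(n))   => Right(JobEnd(n))
          case _ => Left(s"Unrecognized event or property; original line: $line")
        }
      case _ => Left(s"Malformed line; original line: $line")
    }

  // Replays all lines, keeping known events and collecting warnings for the rest.
  def replay(lines: Seq[String]): (Seq[Event], Seq[String]) = {
    val parsed = lines.map(parse)
    (parsed.collect { case Right(e) => e }, parsed.collect { case Left(w) => w })
  }
}
```

Returning the warnings as data keeps the sketch free of any logging dependency; a real implementation would presumably route them through its logger at warning and debug level.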
[GitHub] spark issue #19953: [SPARK-22763][Core]SHS: Ignore unknown events and parse ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19953 **[Test build #84832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84832/testReport)** for PR 19953 at commit [`a3aca2e`](https://github.com/apache/spark/commit/a3aca2ef98bf2116f90565282bf24730f264b6b3).
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84822/ Test PASSed.
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19952 Merged build finished. Test PASSed.
[GitHub] spark issue #19952: [SPARK-21322][SQL][followup] support histogram in filter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19952 **[Test build #84822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84822/testReport)** for PR 19952 at commit [`8fe0c49`](https://github.com/apache/spark/commit/8fe0c4991b90781a7017de4938705bbc32244dc6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - L...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19963 **[Test build #84831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84831/testReport)** for PR 19963 at commit [`7bf74d2`](https://github.com/apache/spark/commit/7bf74d2eaa8521932737ff6a24172f776c75b16a).
[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST] Make ML testsuite support...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19843
[GitHub] spark pull request #19963: [SPARK-20849][DOC][FOLLOWUP] Document R DecisionT...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19963 [SPARK-20849][DOC][FOLLOWUP] Document R DecisionTree - Link Classification Example ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/18067, only the regression example was linked. This PR links the decision tree classification example to the doc. ping @felixcheung ## How was this patch tested? Local build of the docs. ![default](https://user-images.githubusercontent.com/7322292/33922857-9b00fdd0-e008-11e7-92c2-85a3de52ea8f.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark r_examples Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19963.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19963 commit 988cf18aff70fca7a75c1b8f72a73d01d0976c19 Author: Zheng RuiFeng Date: 2017-12-13T04:04:49Z create pr commit 7bf74d2eaa8521932737ff6a24172f776c75b16a Author: Zheng RuiFeng Date: 2017-12-13T04:49:37Z update pr
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84830/testReport)** for PR 19811 at commit [`96fa044`](https://github.com/apache/spark/commit/96fa0441b5f6422784bd60b9c2a1b46d8781).
[GitHub] spark issue #19843: [SPARK-22644][ML][TEST] Make ML testsuite support Struct...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19843 Merging with master. Thanks @WeichenXu123 and @MrBago!
[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19020 Merged build finished. Test PASSed.
[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84817/ Test PASSed.
[GitHub] spark pull request #19778: [SPARK-22550][SQL] Fix 64KB JVM bytecode limit pr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19778#discussion_r156566161 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -224,22 +224,52 @@ case class Elt(children: Seq[Expression]) override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val index = indexExpr.genCode(ctx) val strings = stringExprs.map(_.genCode(ctx)) +val indexVal = ctx.freshName("index") +val stringVal = ctx.freshName("stringVal") val assignStringValue = strings.zipWithIndex.map { case (eval, index) => s""" case ${index + 1}: - ${ev.value} = ${eval.isNull} ? null : ${eval.value}; + ${eval.code} + $stringVal = ${eval.isNull} ? null : ${eval.value}; break; """ -}.mkString("\n") -val indexVal = ctx.freshName("index") -val stringArray = ctx.freshName("strings"); +} -ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" - final int $indexVal = ${index.value}; - UTF8String ${ev.value} = null; - switch ($indexVal) { -$assignStringValue +val cases = ctx.buildCodeBlocks(assignStringValue) +val codes = if (cases.length == 1) { + s""" +UTF8String $stringVal = null; +switch ($indexVal) { + ${cases.head} +} + """ +} else { + var prevFunc = "null" + for (c <- cases.reverse) { +val funcName = ctx.freshName("eltFunc") +val funcBody = s""" + private UTF8String $funcName(InternalRow ${ctx.INPUT_ROW}, int $indexVal) { --- End diff -- Looks like this splitting doesn't prevent the case in wholestage codegen?
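The chaining idea in the quoted diff (split the `switch` cases into blocks, give each block its own function, and let each function fall back to the one built before it via `prevFunc`) can be illustrated with a runtime analogue. This is only a sketch under assumptions: it uses plain closures instead of generated Java source, and `EltSplitSketch`, `buildChain`, and `chunkSize` are hypothetical names that do not appear in the pull request.

```scala
object EltSplitSketch {
  // Builds a lookup over 1-based indices by chaining small handlers.
  // Each handler covers at most `chunkSize` cases and delegates everything
  // else to the handler built before it, similar in spirit to `prevFunc`.
  def buildChain(strings: Seq[String], chunkSize: Int): Int => String = {
    require(chunkSize > 0, "chunkSize must be positive")
    // Innermost fallback: an index that matches no case yields null.
    val fallback: Int => String = _ => null
    strings.zipWithIndex
      .grouped(chunkSize)
      .toList
      .reverse // build the last block first so the first block ends up outermost
      .foldLeft(fallback) { (next, chunk) =>
        val cases = chunk.map { case (s, i) => (i + 1) -> s }.toMap
        index => cases.getOrElse(index, next(index))
      }
  }
}
```

For example, `EltSplitSketch.buildChain(Seq("a", "b", "c"), 2)` produces a function in which the outer handler covers indices 1 and 2 and the inner handler covers index 3. Keeping each handler small is the point of the split: no single unit has to hold every case.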
[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19020 **[Test build #84817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84817/testReport)** for PR 19020 at commit [`4304b6e`](https://github.com/apache/spark/commit/4304b6e0e939a658d38c2ef70de569bfcf76139b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84820/ Test PASSed.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84820/testReport)** for PR 16578 at commit [`1936c9b`](https://github.com/apache/spark/commit/1936c9b2e4cf4008e5ee7282c6371fc0ca0535bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AggregateFieldExtractionPushdownSuite extends SchemaPruningTest ` * `class JoinFieldExtractionPushdownSuite extends SchemaPruningTest `
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r156564056 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -205,67 +207,21 @@ class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext { } } - test("debugging test") { -val df = denseData(Nil) -val c = df.col("features") -val c1 = metrics("mean").summary(c) -val res = df.select(c1) -intercept[SparkException] { - compare(res, Seq.empty) -} - } - - test("basic error handling") { -val df = denseData(Nil) -val c = df.col("features") -val res = df.select(metrics("mean").summary(c), mean(c)) -intercept[SparkException] { - compare(res, Seq.empty) -} - } + testExample("single element", Seq((Vectors.dense(0.0, 1.0, 2.0), 2.0))) - test("no element, working metrics") { -val df = denseData(Nil) -val c = df.col("features") -val res = df.select(metrics("count").summary(c), count(c)) -compare(res, Seq(Row(0L), 0L)) - } + testExample("multiple elements (dense)", +Seq( + (Vectors.dense(-1.0, 0.0, 6.0), 0.5), + (Vectors.dense(3.0, -3.0, 0.0), 2.8), + (Vectors.dense(1.0, -3.0, 0.0), 0.0) +) + ) - val singleElem = Seq(0.0, 1.0, 2.0) - testExample("single element", Seq(singleElem), ExpectedMetrics( -mean = singleElem, -variance = Seq(0.0, 0.0, 0.0), -count = 1, -numNonZeros = Seq(0, 1, 1), -max = singleElem, -min = singleElem, -normL1 = singleElem, -normL2 = singleElem - )) - - testExample("two elements", Seq(Seq(0.0, 1.0, 2.0), Seq(0.0, -1.0, -2.0)), ExpectedMetrics( -mean = Seq(0.0, 0.0, 0.0), -// TODO: I have a doubt about these values, they are not normalized. 
-variance = Seq(0.0, 2.0, 8.0), -count = 2, -numNonZeros = Seq(0, 2, 2), -max = Seq(0.0, 1.0, 2.0), -min = Seq(0.0, -1.0, -2.0), -normL1 = Seq(0.0, 2.0, 4.0), -normL2 = Seq(0.0, math.sqrt(2.0), math.sqrt(2.0) * 2.0) - )) - - testExample("dense vector input", -Seq(Seq(-1.0, 0.0, 6.0), Seq(3.0, -3.0, 0.0)), --- End diff -- Why did you remove the tests against ground-truth values?
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r156564200 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/SummarizerSuite.scala --- @@ -19,149 +19,165 @@ package org.apache.spark.ml.stat import org.scalatest.exceptions.TestFailedException -import org.apache.spark.{SparkException, SparkFunSuite} +import org.apache.spark.SparkFunSuite import org.apache.spark.ml.linalg.{Vector, Vectors} import org.apache.spark.ml.util.TestingUtils._ import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors} import org.apache.spark.mllib.stat.{MultivariateOnlineSummarizer, Statistics} import org.apache.spark.mllib.util.MLlibTestSparkContext import org.apache.spark.sql.{DataFrame, Row} -import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema class SummarizerSuite extends SparkFunSuite with MLlibTestSparkContext { import testImplicits._ import Summarizer._ import SummaryBuilderImpl._ - private case class ExpectedMetrics( - mean: Seq[Double], - variance: Seq[Double], - count: Long, - numNonZeros: Seq[Long], - max: Seq[Double], - min: Seq[Double], - normL2: Seq[Double], - normL1: Seq[Double]) - /** - * The input is expected to be either a sparse vector, a dense vector or an array of doubles - * (which will be converted to a dense vector) - * The expected is the list of all the known metrics. + * The input is expected to be either a sparse vector, a dense vector. * - * The tests take an list of input vectors and a list of all the summary values that - * are expected for this input. They currently test against some fixed subset of the - * metrics, but should be made fuzzy in the future. + * The tests take an list of input vectors, and compare results with + * `mllib.stat.MultivariateOnlineSummarizer`. They currently test against some fixed subset + * of the metrics, but should be made fuzzy in the future. 
*/ - private def testExample(name: String, input: Seq[Any], exp: ExpectedMetrics): Unit = { + private def testExample(name: String, inputVec: Seq[(Vector, Double)]): Unit = { -def inputVec: Seq[Vector] = input.map { - case x: Array[Double @unchecked] => Vectors.dense(x) - case x: Seq[Double @unchecked] => Vectors.dense(x.toArray) - case x: Vector => x - case x => throw new Exception(x.toString) +val summarizer = { + val _summarizer = new MultivariateOnlineSummarizer + inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v._1), v._2)) + _summarizer } -val summarizer = { +val summarizerWithoutWeight = { val _summarizer = new MultivariateOnlineSummarizer - inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v))) + inputVec.foreach(v => _summarizer.add(OldVectors.fromML(v._1))) _summarizer } // Because the Spark context is reset between tests, we cannot hold a reference onto it. def wrappedInit() = { - val df = inputVec.map(Tuple1.apply).toDF("features") - val col = df.col("features") - (df, col) + val df = inputVec.toDF("features", "weight") + val featuresCol = df.col("features") + val weightCol = df.col("weight") + (df, featuresCol, weightCol) } registerTest(s"$name - mean only") { - val (df, c) = wrappedInit() - compare(df.select(metrics("mean").summary(c), mean(c)), Seq(Row(exp.mean), summarizer.mean)) + val (df, c, weight) = wrappedInit() + compare(df.select(metrics("mean").summary(c, weight), mean(c, weight)), +Seq(Row(summarizer.mean), summarizer.mean)) } -registerTest(s"$name - mean only (direct)") { - val (df, c) = wrappedInit() - compare(df.select(mean(c)), Seq(exp.mean)) +registerTest(s"$name - mean only w/o weight") { + val (df, c, _) = wrappedInit() + compare(df.select(metrics("mean").summary(c), mean(c)), +Seq(Row(summarizerWithoutWeight.mean), summarizerWithoutWeight.mean)) } registerTest(s"$name - variance only") { - val (df, c) = wrappedInit() - compare(df.select(metrics("variance").summary(c), variance(c)), -Seq(Row(exp.variance), 
summarizer.variance)) + val (df, c, weight) = wrappedInit() --- End diff -- nit: ```weight``` can be abbreviated to ```w```.
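The quoted test now checks each metric twice, once with the weight column and once without it. The difference between the two means is plain arithmetic, sketched below as a minimal, Spark-free example. `WeightedMeanSketch` and its methods are hypothetical helpers, not part of `Summarizer` or `MultivariateOnlineSummarizer`, and the sketch assumes all rows have the same length.

```scala
object WeightedMeanSketch {
  // Column-wise weighted mean of equally sized rows: sum(w_i * x_i) / sum(w_i).
  def weightedMean(rows: Seq[(Array[Double], Double)]): Array[Double] = {
    require(rows.nonEmpty, "need at least one row")
    val dim = rows.head._1.length
    val totalWeight = rows.map(_._2).sum
    require(totalWeight > 0.0, "total weight must be positive")
    val sums = new Array[Double](dim)
    // Accumulate the weighted sum of every component.
    for ((x, w) <- rows; j <- 0 until dim) sums(j) += w * x(j)
    sums.map(_ / totalWeight)
  }

  // The unweighted mean is the special case where every weight is 1.0.
  def mean(rows: Seq[Array[Double]]): Array[Double] =
    weightedMean(rows.map(r => (r, 1.0)))
}
```

With the three rows from the "multiple elements (dense)" example in the related diff, the row with weight 0.0 contributes nothing to the weighted mean but still counts in the unweighted one. Cases like that are a reason to cover both paths.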
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19960 Merged build finished. Test PASSed.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19960 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84823/ Test PASSed.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19960 **[Test build #84823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84823/testReport)** for PR 19960 at commit [`a32da5f`](https://github.com/apache/spark/commit/a32da5fdffd0c8d19d9d777864b48f810c0b149e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19959: [SPARK-22766] Install R linter package in spark l...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/19959#discussion_r156564046 --- Diff: dev/lint-r.R --- @@ -27,10 +27,11 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) { # Installs lintr from Github in a local directory. # NOTE: The CRAN's version is too old to adapt to our rules. if ("lintr" %in% row.names(installed.packages()) == FALSE) { --- End diff -- Why does the specific Rcpp version matter?
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84818/ Test PASSed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19811 Merged build finished. Test PASSed.
[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19811 **[Test build #84818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84818/testReport)** for PR 19811 at commit [`8efa0b4`](https://github.com/apache/spark/commit/8efa0b47f5c25db84e379a4c41e82c735707a5a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19943: [SPARK-16060][SQL] Support Vectorized ORC Reader
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19943 Also cc @kiszk, this question also applies to the table cache reader. We should think more about whether to use a wrapper or write to a Spark column vector.
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19932 **[Test build #84829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84829/testReport)** for PR 19932 at commit [`b80c8f3`](https://github.com/apache/spark/commit/b80c8f39ede82bc805352a5abeb5d7ec0dcb8df8).
[GitHub] spark issue #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19932 retest this please
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19960 Merged build finished. Test PASSed.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19960 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84821/ Test PASSed.
[GitHub] spark issue #19960: [SPARK-19809][SQL][TEST][FOLLOWUP] Move the test case to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19960 **[Test build #84821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84821/testReport)** for PR 19960 at commit [`1d5dd76`](https://github.com/apache/spark/commit/1d5dd768dbb56a6e84bd0494c55423668895a0ff). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19745 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84816/ Test FAILed.
[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19745 Merged build finished. Test FAILed.
[GitHub] spark issue #19745: [SPARK-2926][Core][Follow Up] Sort shuffle reader for Sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19745 **[Test build #84816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84816/testReport)** for PR 19745 at commit [`fe9394e`](https://github.com/apache/spark/commit/fe9394eadf8ea51af2b2cb41b5b42981fa600752). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.