[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20964 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20964 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2194/ Test PASSed.
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20964 **[Test build #89174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89174/testReport)** for PR 20964 at commit [`592670c`](https://github.com/apache/spark/commit/592670c90975d904605864f3168eff00c2befa5d).
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20964 I rebased off of master because of the merge warning in the last tests. I did not have to resolve any conflicts. I'll merge this once tests pass.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test PASSed.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2193/ Test PASSed.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89173/testReport)** for PR 21007 at commit [`deacb17`](https://github.com/apache/spark/commit/deacb17816c21811efd630e91cee4c30d421eb36).
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21007 retest this please
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2192/ Test PASSed.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #89172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89172/testReport)** for PR 20345 at commit [`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).
[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20871 Merged build finished. Test PASSed.
[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20871 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2191/ Test PASSed.
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20964 **[Test build #4151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4151/testReport)** for PR 20964 at commit [`6dff94e`](https://github.com/apache/spark/commit/6dff94e87ea77b971c97ca64ea7a06eacfbc7110).

* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20874 Merged build finished. Test PASSed.
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2190/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2189/ Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21021 Merged build finished. Test PASSed.
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2188/ Test PASSed.
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21031 Merged build finished. Test PASSed.
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21021 **[Test build #89170 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89170/testReport)** for PR 21021 at commit [`0203920`](https://github.com/apache/spark/commit/020392024f19cbc1ce172051643e15dade7e7b19).
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21031 **[Test build #89169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89169/testReport)** for PR 21031 at commit [`d405d8a`](https://github.com/apache/spark/commit/d405d8a37c6e46340f49ff449b8a5aeca80de751).
[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20871 **[Test build #89171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89171/testReport)** for PR 20871 at commit [`a7665fd`](https://github.com/apache/spark/commit/a7665fd64875e1343796c8923126d715b5376808).
[GitHub] spark issue #21031: [SPARK-23923][SQL] Add cardinality function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21031 retest this please
[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21021 retest this please
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20874 **[Test build #89168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89168/testReport)** for PR 20874 at commit [`9ef19df`](https://github.com/apache/spark/commit/9ef19dfcde9dc84f494bff5f03a56db840741496).
[GitHub] spark issue #20871: [SPARK-23762][SQL] UTF8StringBuffer uses MemoryBlock
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20871 retest this please
[GitHub] spark issue #20874: [SPARK-23763][SQL] OffHeapColumnVector uses MemoryBlock
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20874 retest this please
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89167/ Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #89167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89167/testReport)** for PR 20695 at commit [`21edbcd`](https://github.com/apache/spark/commit/21edbcde7a1277030baca58105092e734483006f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test FAILed.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89163/ Test FAILed.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89163/testReport)** for PR 21007 at commit [`deacb17`](https://github.com/apache/spark/commit/deacb17816c21811efd630e91cee4c30d421eb36).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SQLTestUtils(object):`
  * `class ReusedSQLTestCase(ReusedPySparkTestCase, SQLTestUtils):`
  * `class QueryExecutionListenerTests(unittest.TestCase, SQLTestUtils):`
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2187/ Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #89167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89167/testReport)** for PR 20695 at commit [`21edbcd`](https://github.com/apache/spark/commit/21edbcde7a1277030baca58105092e734483006f).
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21009 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89161/ Test FAILed.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21009 Merged build finished. Test FAILed.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21009 **[Test build #89161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89161/testReport)** for PR 21009 at commit [`2b5db56`](https://github.com/apache/spark/commit/2b5db564e73e7919a0ac9783c44f3378291a72b8).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89162/ Test FAILed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Merged build finished. Test FAILed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21018 **[Test build #89162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89162/testReport)** for PR 21018 at commit [`313f44b`](https://github.com/apache/spark/commit/313f44b389eb67f6ae96d0c159599813f8070c48).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89160/ Test PASSed.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test PASSed.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89160/testReport)** for PR 21026 at commit [`e6abd22`](https://github.com/apache/spark/commit/e6abd22807a728fe878043cfd30dea791d51575f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test PASSed.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89159/ Test PASSed.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89159/testReport)** for PR 21026 at commit [`ce39905`](https://github.com/apache/spark/commit/ce3990537c682089d31a6dbd3f2ee96e3ced7bcd).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21035 Merged build finished. Test PASSed.
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21035 **[Test build #89166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89166/testReport)** for PR 21035 at commit [`ffa05dc`](https://github.com/apache/spark/commit/ffa05dcccdfb4d509741e0ca6b38c17915028eb2).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89166/ Test PASSed.
[GitHub] spark issue #21036: [SPARK-23958][CORE] HadoopRdd filters empty files to avo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21036 Can one of the admins verify this patch?
[GitHub] spark pull request #21036: [SPARK-23958][CORE] HadoopRdd filters empty files...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/21036

[SPARK-23958][CORE] HadoopRdd filters empty files to avoid generating empty tasks that affect the performance of the Spark computing performance.

## What changes were proposed in this pull request?

HadoopRDD filters out empty files (files whose length is zero) so that it does not generate empty tasks, which hurt Spark's computing performance.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-23958

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21036.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21036

commit e4ccdf913157b45f11efe8b8900d1f805d853278
Author: guoxiaolong
Date: 2018-04-11T02:48:51Z

[SPARK-23958][CORE] HadoopRdd filters empty files to avoid generating empty tasks that affect the performance of the Spark computing performance.
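As a rough illustration of the idea in this PR, the sketch below drops zero-length splits before they would become tasks. It is plain Scala and does not touch Hadoop's `InputSplit` API: `SplitInfo` and `EmptySplitFilter.nonEmptySplits` are hypothetical names, and the sketch assumes each split reports its length in bytes.

```scala
// Hypothetical stand-in for a Hadoop input split: only the fields this sketch needs.
final case class SplitInfo(path: String, length: Long)

object EmptySplitFilter {
  // Keep only splits that contain data; a zero-length split would
  // otherwise become a task that reads nothing.
  def nonEmptySplits(splits: Seq[SplitInfo]): Seq[SplitInfo] =
    splits.filter(_.length > 0)

  def main(args: Array[String]): Unit = {
    val splits = Seq(
      SplitInfo("/data/part-00000", 1024L),
      SplitInfo("/data/part-00001", 0L), // empty file
      SplitInfo("/data/part-00002", 512L)
    )
    val kept = nonEmptySplits(splits)
    // One task per remaining split.
    println(s"tasks before: ${splits.size}, after: ${kept.size}")
  }
}
```

Filtering before partitions are built means the task count shrinks along with the split count. Whether every zero-length split can be skipped safely may depend on the input format, so treat this as an illustration rather than the patch's actual code.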
[GitHub] spark issue #20980: [SPARK-23589][SQL] ExternalMapToCatalyst should support ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20980 ping
[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20757 ping
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21035 Merged build finished. Test PASSed.
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2186/ Test PASSed.
[GitHub] spark issue #21023: [SPARK-23949] makes && supports the function of predicat...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/21023 @gatorsmile Thank you very much! Could you help me review this PR?
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21035 **[Test build #89166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89166/testReport)** for PR 21035 at commit [`ffa05dc`](https://github.com/apache/spark/commit/ffa05dcccdfb4d509741e0ca6b38c17915028eb2).
[GitHub] spark issue #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21035 cc @ueshin, could you take a look when you are available?
[GitHub] spark pull request #21035: [SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually tes...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/21035

[SPARK-23847][FOLLOWUP][PYTHON][SQL] Actually test [desc|asc]_nulls_[first|last] functions

## What changes were proposed in this pull request?

There was a mistake in `tests.py`: the `assertEquals` calls were missing, so these functions were not actually being checked.

## How was this patch tested?

Fixed tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-23847

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21035.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21035

commit ffa05dcccdfb4d509741e0ca6b38c17915028eb2
Author: hyukjinkwon
Date: 2018-04-11T02:18:52Z

Actually test [desc|asc]_nulls_[first|last] functions
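The four functions under test sort a column in ascending or descending order while placing nulls explicitly first or last. The following plain-Scala sketch models that ordering over `Option[Int]`, with `None` standing in for null. `NullOrdering.sortWithNulls` is a hypothetical helper for illustration, not part of Spark's API.

```scala
object NullOrdering {
  // Sort present values ascending or descending, then put the nulls (None)
  // either before or after them.
  def sortWithNulls(xs: Seq[Option[Int]], ascending: Boolean, nullsFirst: Boolean): Seq[Option[Int]] = {
    val (nulls, present) = xs.partition(_.isEmpty)
    val values = present.flatten.sorted
    val ordered = (if (ascending) values else values.reverse).map(Option(_))
    if (nullsFirst) nulls ++ ordered else ordered ++ nulls
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Some(2), None, Some(1), Some(3))
    println(sortWithNulls(data, ascending = true, nullsFirst = true))   // asc_nulls_first
    println(sortWithNulls(data, ascending = true, nullsFirst = false))  // asc_nulls_last
    println(sortWithNulls(data, ascending = false, nullsFirst = true))  // desc_nulls_first
    println(sortWithNulls(data, ascending = false, nullsFirst = false)) // desc_nulls_last
  }
}
```

Checking all four combinations against an explicit expected sequence is exactly the kind of assertion that was missing before this fix.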
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89158/ Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20984 Merged build finished. Test PASSed.
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21019 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2185/ Test PASSed.
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21019 Merged build finished. Test PASSed.
[GitHub] spark issue #20984: [SPARK-23875][SQL] Add IndexedSeq wrapper for ArrayData
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20984 **[Test build #89158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89158/testReport)** for PR 20984 at commit [`a77128f`](https://github.com/apache/spark/commit/a77128f910eca1e0ced20257fa94ddaef513eae1).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180616339

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
@@ -145,4 +161,55 @@ class JoinOptimizationSuite extends PlanTest {
     }
     assert(broadcastChildren.size == 1)
   }
+
+  test("SPARK-23172 skip projections when flattening joins") {
+    val x = testRelation.subquery('x)
+    val y = testRelation1.subquery('y)
+    val z = testRelation.subquery('z)
+    val joined = x.join(z, Inner, Some($"x.b" === $"z.b")).select($"x.a", $"z.a", $"z.c")
+      .join(y, Inner, Some($"y.d" === $"z.a")).analyze
+    val expectedTables = joined.collectLeaves().map { case p => (p, Inner) }
+    val expectedConditions = joined.collect { case Join(_, _, _, Some(conditions)) => conditions }
+    testExtractInnerJoins(joined, Some((expectedTables, expectedConditions)))
+  }
+
+  test("SPARK-23172 reorder joins with projections") {
--- End diff --

ok
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180616142

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
@@ -116,7 +127,12 @@ class JoinOptimizationSuite extends PlanTest {
     )
     queryAnswers foreach { queryAnswerPair =>
-      val optimized = Optimize.execute(queryAnswerPair._1.analyze)
+      val optimized = Optimize.execute(queryAnswerPair._1.analyze) match {
+        // `ReorderJoin` may add `Project` to keep the same order of output attributes.
+        // So, we drop a top `Project` for tests.
+        case project: Project => project.child
--- End diff --

yea, great suggestion and I think so. I'll try to fix.
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21019 **[Test build #89165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89165/testReport)** for PR 21019 at commit [`685124a`](https://github.com/apache/spark/commit/685124a11b789af2a42b4978e25ed404b2a15176).
[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21019 Jenkins, retest this please
[GitHub] spark issue #20998: [SPARK-23888][CORE] speculative task should not run on a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20998 **[Test build #89164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89164/testReport)** for PR 20998 at commit [`2ed9584`](https://github.com/apache/spark/commit/2ed958418ff182bca0a3af1bf35999130312e78f).
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180615479

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -172,17 +174,20 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper {
     case Filter(filterCondition, j @ Join(left, right, _: InnerLike, joinCondition)) =>
       val (plans, conditions) = flattenJoin(j)
       (plans, conditions ++ splitConjunctivePredicates(filterCondition))
-
+    case p @ Project(_, j @ Join(_, _, _: InnerLike, _))
+        // Keep flattening joins when the project has attributes only
+        if p.projectList.forall(_.isInstanceOf[Attribute]) =>
+      flattenJoin(j)
     case _ => (Seq((plan, parentJoinType)), Seq.empty)
   }
-  def unapply(plan: LogicalPlan): Option[(Seq[(LogicalPlan, InnerLike)], Seq[Expression])]
-  = plan match {
-    case f @ Filter(filterCondition, j @ Join(_, _, joinType: InnerLike, _)) =>
-      Some(flattenJoin(f))
-    case j @ Join(_, _, joinType, _) =>
-      Some(flattenJoin(j))
-    case _ => None
+  def unapply(plan: LogicalPlan): Option[(Seq[(LogicalPlan, InnerLike)], Seq[Expression])] = {
+    val (plans, conditions) = flattenJoin(plan)
+    if (plans.size > 1) {
--- End diff --

aha, sounds good to me. Thanks!
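The new `Project` case keeps flattening through a projection only when every entry of its `projectList` is a bare attribute, and the new `unapply` matches only when flattening yields more than one plan. A minimal sketch of that idea on a toy plan tree, assuming hypothetical `Plan`/`Rel`/`Join`/`Project`/`Attr`/`Alias` types (not Catalyst classes), every join being inner, and join conditions modeled as plain strings:

```scala
object FlattenSketch {
  // Hypothetical stand-ins for Catalyst nodes, for illustration only.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  sealed trait Plan
  final case class Rel(name: String) extends Plan
  // Every Join in this toy model is treated as an inner join.
  final case class Join(left: Plan, right: Plan, cond: Option[String]) extends Plan
  final case class Project(projectList: Seq[Expr], child: Plan) extends Plan

  // Flatten a left-deep tree of joins into its input plans and join conditions.
  def flattenJoin(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
    case Join(left, right, cond) =>
      val (plans, conds) = flattenJoin(left)
      (plans :+ right, conds ++ cond.toList)
    // Look through a projection only if it merely selects attributes.
    case Project(projectList, j: Join) if projectList.forall(_.isInstanceOf[Attr]) =>
      flattenJoin(j)
    case other =>
      (Seq(other), Seq.empty[String])
  }

  // Match only when flattening found at least two plans to reorder.
  def unapply(plan: Plan): Option[(Seq[Plan], Seq[String])] = {
    val (plans, conds) = flattenJoin(plan)
    if (plans.size > 1) Some((plans, conds)) else None
  }
}
```

For example, `Join(Project(Seq(Attr("x.a"), Attr("z.a")), Join(Rel("x"), Rel("z"), Some("x.b = z.b"))), Rel("y"), Some("y.d = z.a"))` flattens to `x`, `z`, `y` plus both conditions, roughly the shape of the "skip projections when flattening joins" test; an `Alias` in the project list stops the flattening at the `Project`.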
[GitHub] spark issue #20998: [SPARK-23888][CORE] speculative task should not run on a...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20998 Hi, @squito. Thanks for the review and comments.
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180615326

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
@@ -59,12 +75,7 @@ class JoinOptimizationSuite extends PlanTest {
           (noCartesian, seq_pair._2)
         }
       }
-      testExtractCheckCross(plan, expectedNoCross)
-    }
-
-    def testExtractCheckCross
-        (plan: LogicalPlan, expected: Option[(Seq[(LogicalPlan, InnerLike)], Seq[Expression])]) {
-      assert(ExtractFiltersAndInnerJoins.unapply(plan) === expected)
--- End diff --

If we have multiple conditions, we need to compare them as a `Set`.
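The point about `Set` is that the order in which several join conditions are collected depends on how the tree is traversed, so an order-sensitive `===` on a `Seq` can fail even when the same conditions are present. A minimal sketch of an order-insensitive check, with conditions modeled as strings; the helper name `sameConditions` is hypothetical and not the suite's actual code:

```scala
object ConditionCompare {
  // Order-insensitive comparison of extracted join conditions.
  // Converting to a Set also ignores duplicates, which is harmless for
  // conjunctive predicates (a AND a is equivalent to a).
  def sameConditions(actual: Seq[String], expected: Seq[String]): Boolean =
    actual.toSet == expected.toSet
}
```

With this check, `Seq("x.b = z.b", "y.d = z.a")` and `Seq("y.d = z.a", "x.b = z.b")` compare equal, while plain `Seq` equality treats them as different.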
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Merged build finished. Test PASSed.
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2184/
[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21007 **[Test build #89163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89163/testReport)** for PR 21007 at commit [`deacb17`](https://github.com/apache/spark/commit/deacb17816c21811efd630e91cee4c30d421eb36).
[GitHub] spark pull request #20998: [SPARK-23888][CORE] speculative task should not r...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/20998#discussion_r180614873

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -880,6 +880,59 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(manager.resourceOffer("execB", "host2", ANY).get.index === 3)
   }
+  test("speculative task should not run on a given host where another attempt " +
+    "is already running on") {
+    sc = new SparkContext("local", "test")
+    sched = new FakeTaskScheduler(
+      sc, ("execA", "host1"), ("execB", "host2"))
+    val taskSet = FakeTask.createTaskSet(1,
+      Seq(TaskLocation("host1", "execA"), TaskLocation("host2", "execB")))
+    val clock = new ManualClock
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock = clock)
+    // let task0.0 run on host1
+    assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL).get.index == 0)
+    val info1 = manager.taskAttempts(0)(0)
+    assert(info1.running === true)
+    assert(info1.host === "host1")
+    // long time elapse, and task0.0 is still running,
+    // so we launch a speculative task0.1 on host2
+    clock.advance(1000)
+    manager.speculatableTasks += 0
+    assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL).get.index === 0)
+    val info2 = manager.taskAttempts(0)(0)
+    assert(info2.running === true)
+    assert(info2.host === "host2")
+    assert(manager.speculatableTasks.size === 0)
+    // now, task0 has two copies running on host1, host2 separately,
+    // so we can not launch a speculative task on any hosts.
+    manager.speculatableTasks += 0
+    assert(manager.resourceOffer("execA", "host1", PROCESS_LOCAL) === None)
+    assert(manager.resourceOffer("execB", "host2", PROCESS_LOCAL) === None)
+    assert(manager.speculatableTasks.size === 1)
+    // after a long long time, task0.0 failed, and task0.0 can not re-run since
+    // there's already a running copy.
+    clock.advance(1000)
+    info1.finishTime = clock.getTimeMillis()
--- End diff --

nice suggestion.
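The test above pins down one scheduling rule: a speculative copy of a task must not be offered to a host that already runs an attempt of that task, and once every candidate host runs a copy, the task stays in the speculatable set. A toy model of that check, much simpler than `TaskSetManager`; the `runningHosts`, `speculatable`, `launch` and `offerSpeculative` names are hypothetical:

```scala
import scala.collection.mutable

object SpeculationSketch {
  // Hypothetical toy state: task index -> hosts currently running an attempt.
  val runningHosts = mutable.Map.empty[Int, mutable.Set[String]]
  // Tasks flagged for speculative re-execution.
  val speculatable = mutable.Set.empty[Int]

  def launch(task: Int, host: String): Unit =
    runningHosts.getOrElseUpdate(task, mutable.Set.empty[String]) += host

  // Offer a host; launch and return a speculatable task that is not already
  // running on that host, or None if every candidate is already there.
  def offerSpeculative(host: String): Option[Int] = {
    val candidate = speculatable.find(t => !runningHosts.get(t).exists(_.contains(host)))
    candidate.foreach { t =>
      speculatable -= t
      launch(t, host)
    }
    candidate
  }
}
```

Replaying the test's scenario: after `launch(0, "host1")` and marking task 0 speculatable, an offer from `host1` returns `None`, an offer from `host2` returns `Some(0)`, and once task 0 runs on both hosts, further offers from either host return `None` while task 0 stays speculatable.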
[GitHub] spark pull request #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpa...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21007#discussion_r180614280

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/TestQueryExecutionListener.scala ---
@@ -0,0 +1,45 @@
+/*
--- End diff --

I think it's possible. I just took a look; however, would you mind if I kept a separate one as is, specifically for the Python tests? Maybe I am worrying too much, but I am a bit hesitant to depend on a class defined in a test suite.
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r180613972

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -84,19 +84,50 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
   }
+  // Extract a list of logical plans to be joined for join-order comparisons.
+  // Since `ExtractFiltersAndInnerJoins` handles left-deep trees only, this function have
+  // the same strategy to extract the plan list.
+  private[optimizer] def extractLeftDeepInnerJoins(plan: LogicalPlan)
+    : Seq[LogicalPlan] = plan match {
+    case j @ Join(left, right, _: InnerLike, _) => right +: extractLeftDeepInnerJoins(left)
+    case p @ Project(_, j @ Join(_, _, _: InnerLike, _)) => extractLeftDeepInnerJoins(j)
+    case _ => Seq(plan)
+  }
+
+  private def checkSameJoinOrder(plan1: LogicalPlan, plan2: LogicalPlan): Boolean = {
--- End diff --

ok
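`extractLeftDeepInnerJoins` walks only down the left spine, prepending each right child, so the result lists the joined plans from the last join back to the first, and it looks through a `Project` sitting directly on a join (in this diff, without inspecting the project list). A toy restatement of that recursion, assuming hypothetical `Plan`/`Rel`/`Join`/`Project` types rather than Catalyst classes:

```scala
object LeftDeepSketch {
  // Hypothetical stand-ins for Catalyst nodes; all joins are treated as inner.
  sealed trait Plan
  final case class Rel(name: String) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan

  // Mirrors the recursion in the diff: emit the right child, then recurse left.
  def extractLeftDeep(plan: Plan): Seq[Plan] = plan match {
    case Join(left, right) => right +: extractLeftDeep(left)
    case Project(_, j: Join) => extractLeftDeep(j)
    case other => Seq(other)
  }
}
```

For `((a JOIN b) projected) JOIN c` this yields `c, b, a`; a join nested on the right side, as in `a JOIN (b JOIN c)`, is kept as a single element, which is why the comment in the diff restricts the strategy to left-deep trees.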
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2183/
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21018 Merged build finished. Test PASSed.
[GitHub] spark issue #21018: [SPARK-23880][SQL] Do not trigger any jobs for caching d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21018 **[Test build #89162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89162/testReport)** for PR 21018 at commit [`313f44b`](https://github.com/apache/spark/commit/313f44b389eb67f6ae96d0c159599813f8070c48).
[GitHub] spark pull request #20963: [SPARK-23849][SQL] Tests for the samplingRatio op...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20963#discussion_r180610436

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2127,4 +2127,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       assert(df.schema === expectedSchema)
     }
   }
+
+  test("SPARK-23849: schema inferring touches less data if samplingRation < 1.0") {
+    val predefinedSample = Set[Int](2, 8, 15, 27, 30, 34, 35, 37, 44, 46,
+      57, 62, 68, 72)
+    withTempPath { path =>
+      val writer = Files.newBufferedWriter(Paths.get(path.getAbsolutePath),
+        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW)
+      for (i <- 0 until 100) {
+        if (predefinedSample.contains(i)) {
+          writer.write(s"""{"f1":${i.toString}}""" + "\n")
+        } else {
+          writer.write(s"""{"f1":${(i.toDouble + 0.1).toString}}""" + "\n")
+        }
+      }
+      writer.close()
+
+      val ds = spark.read.option("samplingRatio", 0.1).json(path.getCanonicalPath)
--- End diff --

^ This is based on actual experience. There was a similar case where a test broke because of the number of partitions, and it took me a while to debug it: https://issues.apache.org/jira/browse/SPARK-13728
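One way a hard-coded sample such as `predefinedSample` becomes tied to the partition count is per-partition sampling: if each partition draws from its own generator derived from a fixed seed, which rows survive depends on how rows are split into partitions, not only on the seed. A toy model of that effect; the `sampleRows` helper is hypothetical, and seeding partition `i` with `seed + i` is an assumption made for illustration, not a description of Spark's sampler:

```scala
import scala.util.Random

object PartitionSamplingSketch {
  // Split rows into contiguous chunks ("partitions") and Bernoulli-sample each one.
  // Assumption for illustration: partition i draws from new Random(seed + i).
  def sampleRows(rows: Seq[Int], numPartitions: Int, fraction: Double, seed: Long): List[Int] = {
    val chunkSize = math.max(1, math.ceil(rows.size.toDouble / numPartitions).toInt)
    rows.grouped(chunkSize).zipWithIndex.flatMap { case (part, idx) =>
      val rng = new Random(seed + idx)
      part.filter(_ => rng.nextDouble() < fraction)
    }.toList
  }
}
```

With the same seed and partition count the sample is reproducible, but `sampleRows(0 until 100, 1, 0.1, 1L)` and `sampleRows(0 until 100, 4, 0.1, 1L)` will typically select different rows, which is why pinning the partition count, with a comment explaining it, keeps such a test stable.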
[GitHub] spark issue #21017: [SPARK-23748][SS] Fix SS continuous process doesn't supp...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21017 @jose-torres @tdas, would you please help review this? Thanks!
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21009 **[Test build #89161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89161/testReport)** for PR 21009 at commit [`2b5db56`](https://github.com/apache/spark/commit/2b5db564e73e7919a0ac9783c44f3378291a72b8).
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Merged build finished. Test PASSed.
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89157/
[GitHub] spark issue #21026: [SPARK-23951][SQL] Use actual java class instead of stri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21026 **[Test build #89157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89157/testReport)** for PR 21026 at commit [`8ab0931`](https://github.com/apache/spark/commit/8ab09312527275a33e547c354e62870bc9f86c51).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21009 I ran the tests locally, and they all pass.
[GitHub] spark issue #21009: [SPARK-23905][SQL] Add UDF weekday
Github user yucai commented on the issue: https://github.com/apache/spark/pull/21009 Jenkins, retest this please
[GitHub] spark pull request #20963: [SPARK-23849][SQL] Tests for the samplingRatio op...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20963#discussion_r180605579

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2127,4 +2127,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       assert(df.schema === expectedSchema)
     }
   }
+
+  test("SPARK-23849: schema inferring touches less data if samplingRation < 1.0") {
+    val predefinedSample = Set[Int](2, 8, 15, 27, 30, 34, 35, 37, 44, 46,
+      57, 62, 68, 72)
+    withTempPath { path =>
+      val writer = Files.newBufferedWriter(Paths.get(path.getAbsolutePath),
+        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW)
+      for (i <- 0 until 100) {
+        if (predefinedSample.contains(i)) {
+          writer.write(s"""{"f1":${i.toString}}""" + "\n")
+        } else {
+          writer.write(s"""{"f1":${(i.toDouble + 0.1).toString}}""" + "\n")
+        }
+      }
+      writer.close()
+
+      val ds = spark.read.option("samplingRatio", 0.1).json(path.getCanonicalPath)
--- End diff --

yup, please set the appropriate numbers. I think it's fine as long as there are some comments, so that we can read and fix the test if it breaks.
[GitHub] spark pull request #21015: [SPARK-23944][ML] Add the set method for the two ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21015
[GitHub] spark issue #21015: [SPARK-23944][ML] Add the set method for the two LSHMode...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/21015 LGTM. Merging with master. Thanks @ludatabricks!
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 @wangmiao1981 Do let me know if you're too busy now to resume this; I know it's been a long time. Thanks!
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20964 **[Test build #4151 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4151/testReport)** for PR 20964 at commit [`6dff94e`](https://github.com/apache/spark/commit/6dff94e87ea77b971c97ca64ea7a06eacfbc7110).
[GitHub] spark issue #20964: [SPARK-22883] ML test for StructuredStreaming: spark.ml....
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20964 Thanks! I'll rerun tests since they are stale and merge after they pass.
[GitHub] spark issue #21013: [WIP][SPARK-23874][SQL][PYTHON] Upgrade Arrow and pyarro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21013

> do you suggest we should test with the minimum version and current version of pyarrow?

yup.