[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65220/consoleFull)** for PR 14971 at commit [`d3dcb56`](https://github.com/apache/spark/commit/d3dcb564509fd2a32a3fadefb811495affaaa466). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14947: [SPARK-17388][SQL] Support for inferring type date/times...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14947 **[Test build #65219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65219/consoleFull)** for PR 14947 at commit [`cda9d7a`](https://github.com/apache/spark/commit/cda9d7a3daea3b13398b20fadf06be4d8620f493).
[GitHub] spark issue #14947: [SPARK-17388][SQL] Support for inferring type date/times...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14947 Hi @davies, it seems you made some changes related to this before. Could you please take a look?
[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14388 @viirya Any progress on this?
[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15049 cc @davies @andreweduffy @rdblue
[GitHub] spark issue #15049: [SPARK-17310][SQL] Add an option to disable record-level...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15049 **[Test build #65218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65218/consoleFull)** for PR 15049 at commit [`7b2e27e`](https://github.com/apache/spark/commit/7b2e27e5e6510679323def779cf0c2f99b195adc).
[GitHub] spark pull request #15049: [SPARK-17310][SQL] Add an option to disable recor...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15049

[SPARK-17310][SQL] Add an option to disable record-level filter in Parquet-side

## What changes were proposed in this pull request?

There is a concern that Spark-side codegen row-by-row filtering might be faster than Parquet's in general, due to the type-boxing and virtual function calls that Spark's implementation avoids. So this PR adds an option to enable/disable record-by-record filtering on the Parquet side. This was also discussed in https://github.com/apache/spark/pull/14671.

## How was this patch tested?

Manual benchmarks were performed. I generated a billion (1,000,000,000) records and tested equality comparisons concatenated with `OR`, with the number of filters ranging from 5 to 30. Spark-side filtering is indeed faster in this test case, and the gap grows as the filter tree becomes larger. The details are below:

**Code**

```scala
test("Parquet-side filter vs Spark-side filter - record by record") {
  withTempPath { path =>
    val N = 1000 * 1000 * 1000
    val df = spark.range(N).toDF("a")
    df.write.parquet(path.getAbsolutePath)
    val benchmark = new Benchmark("Parquet-side vs Spark-side", N)
    Seq(5, 10, 20, 30).foreach { num =>
      val filterExpr = (0 to num).map(i => s"a = $i").mkString(" OR ")
      benchmark.addCase(s"Parquet-side filter - number of filters [$num]", 3) { _ =>
        withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> false.toString,
            SQLConf.PARQUET_RECORD_FILTER_ENABLED.key -> true.toString) {
          // We should strip Spark-side filter to compare correctly.
          stripSparkFilter(
            spark.read.parquet(path.getAbsolutePath).filter(filterExpr)).count()
        }
      }
      benchmark.addCase(s"Spark-side filter - number of filters [$num]", 3) { _ =>
        withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> false.toString,
            SQLConf.PARQUET_RECORD_FILTER_ENABLED.key -> false.toString) {
          spark.read.parquet(path.getAbsolutePath).filter(filterExpr).count()
        }
      }
    }
    benchmark.run()
  }
}
```

**Result**

```
Parquet-side vs Spark-side:                    Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
Parquet-side filter - number of filters [5]          4268 / 4367       234.3           4.3       0.8X
Spark-side filter - number of filters [5]            3709 / 3741       269.6           3.7       0.9X
Parquet-side filter - number of filters [10]         5673 / 5727       176.3           5.7       0.6X
Spark-side filter - number of filters [10]           3588 / 3632       278.7           3.6       0.9X
Parquet-side filter - number of filters [20]         8024 / 8440       124.6           8.0       0.4X
Spark-side filter - number of filters [20]           3912 / 3946       255.6           3.9       0.8X
Parquet-side filter - number of filters [30]        11936 / 12041       83.8          11.9       0.3X
Spark-side filter - number of filters [30]           3929 / 3978       254.5           3.9       0.8X
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17310

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15049.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15049

commit 7b2e27e5e6510679323def779cf0c2f99b195adc
Author: hyukjinkwon
Date: 2016-09-11T04:34:21Z
Add an option to disable record-level filter in Parquet-side
[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15048 **[Test build #65217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65217/consoleFull)** for PR 15048 at commit [`da7deed`](https://github.com/apache/spark/commit/da7deed2e1e9e350affcee909159a200a4b7d5b8).
[GitHub] spark pull request #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15048

[SPARK-17409] [SQL] Do Not Optimize Query in CTAS More Than Once

### What changes were proposed in this pull request?

As explained in https://github.com/apache/spark/pull/14797:

> Some analyzer rules have assumptions about logical plans, and the optimizer may break those assumptions. We should not pass an optimized query plan into QueryExecution (it will be analyzed again), otherwise we may hit some weird bugs. For example, we have a rule for decimal calculation that promotes the precision before binary operations, using PromotePrecision as a placeholder to indicate that the rule should not apply twice. But an optimizer rule removes this placeholder, breaking the assumption; the rule is then applied twice, causing wrong results.

We should not optimize the query in CTAS more than once. For example:

```Scala
spark.range(99, 101).createOrReplaceTempView("tab1")
val sqlStmt = "SELECT id, cast(id as long) * cast('1.0' as decimal(38, 18)) as num FROM tab1"
sql(s"CREATE TABLE tab2 USING PARQUET AS $sqlStmt")
checkAnswer(spark.table("tab2"), sql(sqlStmt))
```

Before this PR, the results do not match:

```
== Results ==
!== Correct Answer - 2 ==   == Spark Answer - 2 ==
![100,100.00]               [100,null]
 [99,99.00]                 [99,99.00]
```

After this PR, the results match:

```
+---+------+
|id |num   |
+---+------+
|99 |99.00 |
|100|100.00|
+---+------+
```

In this PR, we do not treat the `query` in CTAS as a child. Thus, the `query` will not be optimized when optimizing the CTAS statement. However, we still need to analyze it to normalize and verify the CTAS in the Analyzer. We do this in the analyzer rule `PreprocessDDL`, because so far only this rule needs the analyzed plan of the `query`.

### How was this patch tested?

Added a test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark ctasOptimized

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15048.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15048

commit f7941e846c5ed42a4453518500fbf4938f3f1032
Author: gatorsmile
Date: 2016-09-11T04:09:02Z
fix

commit 3a203f920abf742b2f2ab344d0231f992d8e5355
Author: gatorsmile
Date: 2016-09-11T04:20:39Z
Merge remote-tracking branch 'upstream/master' into ctasOptimized

commit da7deed2e1e9e350affcee909159a200a4b7d5b8
Author: gatorsmile
Date: 2016-09-11T04:38:07Z
one more test case
[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15044 **[Test build #65216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65216/consoleFull)** for PR 15044 at commit [`2b22d12`](https://github.com/apache/spark/commit/2b22d128ef4c51643cd4dcdbe17a1f3d28362a90).
[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14788 Hi @cloud-fan and @hvanhovell, could I ask you to take another look?
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65215/
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Merged build finished. Test PASSed.
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13758 **[Test build #65215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65215/consoleFull)** for PR 13758 at commit [`80a9038`](https://github.com/apache/spark/commit/80a90385f469e7bce24f456467de3ac6821b771a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13758 **[Test build #65215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65215/consoleFull)** for PR 13758 at commit [`80a9038`](https://github.com/apache/spark/commit/80a90385f469e7bce24f456467de3ac6821b771a).
[GitHub] spark issue #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing info ex...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15040 @cloud-fan: cc'ing you since you have a lot of context about bucketing in Spark. I am looking for early feedback on the approach of this change. I have included details in the PR description.
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 @rxin: can you recommend someone to review this PR?
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15047 Merged build finished. Test PASSed.
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15047 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65214/
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15047 **[Test build #65214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65214/consoleFull)** for PR 15047 at commit [`c898f5a`](https://github.com/apache/spark/commit/c898f5af10ead29416fec7fee49de5c37a7f48cb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class HiveHash(children: Seq[Expression], seed: Int) extends HashExpression[Int]`
[GitHub] spark pull request #14644: [MESOS] Enable GPU support with Mesos
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/14644#discussion_r78283698

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -103,6 +103,7 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
   private val stateLock = new ReentrantLock
   val extraCoresPerExecutor = conf.getInt("spark.mesos.extra.cores", 0)
+  val maxGpus = conf.getInt("spark.mesos.gpus.max", 0)
--- End diff --

@klueska I don't think you need to autodiscover anything; the concept is similar to max cpus in the scheduler.

@tnachen I think there should be some logic checking the current total against the configured max GPUs, as in the cpusMax case, and I don't see any. I expect offers to be split; in that case we need to check the sum of the assigned GPUs against the max, right?
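The accounting being asked for can be sketched in a few lines. This is a hypothetical illustration, not code from the PR: the function name and parameters are invented, and it only mirrors the cpus-max-style logic the reviewer describes (never acquire more GPUs from an offer than the configured max allows, given what is already assigned).

```scala
// Hypothetical helper mirroring the cpus-max accounting described above.
// offeredGpus: GPUs in the current Mesos offer
// totalGpusAcquired: GPUs already assigned across accepted offers
// maxGpus: the configured cap (e.g. "spark.mesos.gpus.max")
def gpusToAcquire(offeredGpus: Int, totalGpusAcquired: Int, maxGpus: Int): Int =
  math.max(0, math.min(offeredGpus, maxGpus - totalGpusAcquired))
```

With a cap of 3 and 1 GPU already assigned, an offer of 4 GPUs yields at most 2 more; once the cap is reached, further offers yield 0.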
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15047 **[Test build #65214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65214/consoleFull)** for PR 15047 at commit [`c898f5a`](https://github.com/apache/spark/commit/c898f5af10ead29416fec7fee49de5c37a7f48cb).
[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15047

[SPARK-17495] [SQL] Add Hash capability semantically equivalent to Hive's

## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-17495

Spark internally uses Murmur3Hash for partitioning. This is different from the hash used by Hive, so for queries that use bucketing, running the same query on both engines leads to different results. We want backward compatibility so that users can switch parts of their applications across the engines without observing regressions.

This PR includes `HiveHash`, `HiveHashFunction`, and `HiveHasher`, which mimic Hive's hashing at https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638

I am intentionally not introducing any usages of this hash function in the rest of the code, to keep this PR small. My eventual goal is to have Hive bucketing support in Spark. Once this PR gets in, I will make the hash function pluggable in the relevant areas (e.g. `HashPartitioning`'s `partitionIdExpression` has Murmur3 hardcoded: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala#L265).

## How was this patch tested?

Added `HiveHashSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17495_hive_hash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15047.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15047

commit c898f5af10ead29416fec7fee49de5c37a7f48cb
Author: Tejas Patil
Date: 2016-09-10T02:59:24Z
Add Hashing capability equivalent to Hive
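For intuition about the kind of hashing being mimicked, here is a minimal sketch, not the PR's actual code: `HiveHasherSketch` and its method names are invented, and it only reflects the general shape of Hive-style hashing (primitives hashed Java-style, field hashes folded with a 31 multiplier) rather than the full `ObjectInspectorUtils` semantics for every type.

```scala
// Hypothetical sketch of Hive-style hashing (names invented; not the real HiveHasher).
object HiveHasherSketch {
  def hashInt(i: Int): Int = i                            // an int hashes to itself
  def hashLong(l: Long): Int = (l ^ (l >>> 32)).toInt     // Java Long.hashCode-style fold
  def hashString(s: String): Int =                        // Java String.hashCode-style
    s.foldLeft(0)((h, c) => 31 * h + c.toInt)

  // Combine field hashes of a row with the 31-multiplier fold.
  def hashRow(fields: Seq[Any]): Int =
    fields.foldLeft(0) { (h, f) =>
      31 * h + (f match {
        case i: Int    => hashInt(i)
        case l: Long   => hashLong(l)
        case s: String => hashString(s)
        case null      => 0
      })
    }
}
```

Because both engines would fold fields the same way, a row hashed on either side lands in the same bucket, which is the backward-compatibility property the PR is after.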
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14912 I am wondering whether it makes more sense to maintain multiple semantically equivalent predicate sets for each `Filter`. In your example, we have both `(a > 10 || b > 2) && (a > 10 || c == 3)` and `(a > 10) || (b > 2 && c == 3)`. If we also consider predicate transitivity inference and predicate simplification at the same time, we could have multiple semantically equivalent predicate sets. Then we have more chances to push down the predicates.
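The equivalence between the two forms in that example comes from distributing OR over AND. A toy sketch, using a hypothetical `Pred` ADT rather than Catalyst's `Expression`, shows how the disjunctive form rewrites into the conjunctive one, giving an optimizer a second predicate set to draw pushdown candidates from:

```scala
// Hypothetical predicate ADT (not Catalyst's Expression tree).
sealed trait Pred
case class Atom(s: String) extends Pred
case class And(l: Pred, r: Pred) extends Pred
case class Or(l: Pred, r: Pred) extends Pred

// Distribute OR over AND until no OR has an AND child (conjunctive normal form).
def toCnf(p: Pred): Pred = p match {
  case Or(l, r) => (toCnf(l), toCnf(r)) match {
    case (And(a, b), c) => And(toCnf(Or(a, c)), toCnf(Or(b, c)))
    case (a, And(b, c)) => And(toCnf(Or(a, b)), toCnf(Or(a, c)))
    case (a, b)         => Or(a, b)
  }
  case And(l, r) => And(toCnf(l), toCnf(r))
  case atom => atom
}
```

Applied to `(a > 10) || (b > 2 && c == 3)`, this yields `(a > 10 || b > 2) && (a > 10 || c == 3)`, whose top-level conjuncts can be pushed down independently. (Note that CNF expansion can blow up exponentially, which is one reason an optimizer might keep only a bounded set of equivalent forms.)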
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15046 cc @yhuai @cloud-fan
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15046 Merged build finished. Test PASSed.
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15046 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65213/
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15046 **[Test build #65213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65213/consoleFull)** for PR 15046 at commit [`4ab1b8a`](https://github.com/apache/spark/commit/4ab1b8a45c9a8b9ed1f7ee85202eddf397235df4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user mateiz commented on the issue: https://github.com/apache/spark/pull/14956 Cool, thanks for improving the PIC test.
[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14842 Merged build finished. Test PASSed.
[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14842 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65212/
[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14842 **[Test build #65212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65212/consoleFull)** for PR 14842 at commit [`8e5a223`](https://github.com/apache/spark/commit/8e5a223806e02f00759e250d704f2d248e9f9e41). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15015: [SPARK-16445][MLlib][SparkR] Fix @return descript...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15015
[GitHub] spark issue #15015: [SPARK-16445][MLlib][SparkR] Fix @return description for...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/15015 Thanks @keypointt - Merging into master
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15046 **[Test build #65213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65213/consoleFull)** for PR 15046 at commit [`4ab1b8a`](https://github.com/apache/spark/commit/4ab1b8a45c9a8b9ed1f7ee85202eddf397235df4).
[GitHub] spark pull request #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data So...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15046 [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources without Extending SchemaRelationProvider ### What changes were proposed in this pull request? For data sources that do not extend `SchemaRelationProvider`, we expect users not to specify a schema when creating a table. If a schema is supplied by the user, an exception is thrown. Since Spark 2.1, for any data source, to avoid inferring the schema every time, we store the schema in the metastore catalog. Thus, when reading a cataloged data source table, the schema can be read from the metastore catalog. In this case, we also get an exception. For example, ```Scala sql( s""" |CREATE TABLE relationProvierWithSchema |USING org.apache.spark.sql.sources.SimpleScanSource |OPTIONS ( | From '1', | To '10' |) """.stripMargin) spark.table(tableName).show() ``` ``` org.apache.spark.sql.sources.SimpleScanSource does not allow user-specified schemas.; ``` This PR fixes the above issue. When building a data source, we introduce a flag `isSchemaFromUsers` to indicate whether the schema was really supplied by the user. If true, we throw an exception. Otherwise, we call the `createRelation` method of `RelationProvider` to generate the `BaseRelation`, which contains the actual schema. ### How was this patch tested? Added a few test cases. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark tempViewCases Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15046.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15046 commit 17c2d50c9d4788aaa68f7f57fc873762940a8e9d Author: gatorsmile Date: 2016-09-10T15:31:26Z fix commit 00a49fe60f86775e19f038791a766195d506087a Author: gatorsmile Date: 2016-09-10T15:41:18Z clean commit 335e0d6d5a19b30ec000db8d935869e006dd81e7 Author: gatorsmile Date: 2016-09-10T15:42:11Z clean commit 4ab1b8a45c9a8b9ed1f7ee85202eddf397235df4 Author: gatorsmile Date: 2016-09-10T16:12:26Z add one more test case
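The dispatch the PR description outlines can be sketched in isolation. This is a hypothetical, simplified model of the described behavior, not Spark's actual resolution code: the names `isSchemaFromUsers`, `providerSupportsSchema`, and `resolveRelation` here are illustrative stand-ins for the real internals.

```scala
// Hedged sketch of the resolution logic described in the PR: only reject
// a schema that the user explicitly specified; a schema read back from
// the metastore catalog falls through to the provider, which produces
// the relation (and thus the actual schema) itself.
sealed trait Relation { def schema: String }
case class BaseRelation(schema: String) extends Relation

def resolveRelation(
    providerSupportsSchema: Boolean, // i.e. extends SchemaRelationProvider
    schema: Option[String],
    isSchemaFromUsers: Boolean): Relation = {
  schema match {
    case Some(s) if providerSupportsSchema =>
      // Provider accepts a supplied schema: use it directly.
      BaseRelation(s)
    case Some(_) if isSchemaFromUsers =>
      // Schema came from the user but the provider cannot accept one.
      throw new IllegalArgumentException(
        "data source does not allow user-specified schemas")
    case _ =>
      // Schema came from the catalog (or none was given): let the
      // provider infer/produce it, as RelationProvider.createRelation does.
      BaseRelation("inferred")
  }
}
```

The point of the fix, per the description, is the middle case: before the change, any non-empty schema triggered the exception, including one Spark itself had persisted in the metastore.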
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65211/ Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #65211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65211/consoleFull)** for PR 14452 at commit [`23e2dc8`](https://github.com/apache/spark/commit/23e2dc865eef690eb273cc69888ca577eaa603a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14842 **[Test build #65212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65212/consoleFull)** for PR 14842 at commit [`8e5a223`](https://github.com/apache/spark/commit/8e5a223806e02f00759e250d704f2d248e9f9e41).
[GitHub] spark pull request #15039: [SPARK-17447] Performance improvement in Partitio...
Github user codlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15039#discussion_r78278899 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -55,14 +55,16 @@ object Partitioner { * We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD. */ def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { -val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.length).reverse -for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) { - return r.partitioner.get -} -if (rdd.context.conf.contains("spark.default.parallelism")) { - new HashPartitioner(rdd.context.defaultParallelism) +val rdds = (Seq(rdd) ++ others) +val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0)) --- End diff -- First time committing, but I enjoy the process; I have updated.
[GitHub] spark pull request #15039: [SPARK-17447] Performance improvement in Partitio...
Github user codlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15039#discussion_r78278876 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -55,14 +55,16 @@ object Partitioner { * We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD. */ def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { -val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.length).reverse -for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) { - return r.partitioner.get -} -if (rdd.context.conf.contains("spark.default.parallelism")) { - new HashPartitioner(rdd.context.defaultParallelism) +val rdds = (Seq(rdd) ++ others) +val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0)) --- End diff -- @srowen Thank you, I will learn a lot about code style.
[GitHub] spark pull request #15039: [SPARK-17447] Performance improvement in Partitio...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15039#discussion_r78278727 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -55,14 +55,16 @@ object Partitioner { * We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD. */ def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { -val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.length).reverse -for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) { - return r.partitioner.get -} -if (rdd.context.conf.contains("spark.default.parallelism")) { - new HashPartitioner(rdd.context.defaultParallelism) +val rdds = (Seq(rdd) ++ others) +val hashPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0)) --- End diff -- hasPartitioner, not hashPartitioner. You should copy the code I provided.
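The refactoring under review replaces the imperative loop with a filter over the RDDs' partitioners. A minimal self-contained sketch of that shape, using the `hasPartitioner` name srowen suggests (with toy stand-in types rather than Spark's real `RDD` and `HashPartitioner`, and omitting the `spark.default.parallelism` branch), might look like:

```scala
// Toy stand-ins for Spark's types, for illustration only.
case class Partitioner(numPartitions: Int)
case class FakeRDD(partitioner: Option[Partitioner], numPartitions: Int)

def defaultPartitioner(rdd: FakeRDD, others: FakeRDD*): Partitioner = {
  val rdds = Seq(rdd) ++ others
  // Keep only RDDs that already have a usable partitioner.
  val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
  if (hasPartitioner.nonEmpty) {
    // Reuse the existing partitioner from the RDD with the most partitions,
    // replacing the original sortBy(...).reverse-then-loop formulation.
    hasPartitioner.maxBy(_.numPartitions).partitioner.get
  } else {
    // Fall back to a new partitioner sized by the largest input RDD.
    Partitioner(rdds.map(_.numPartitions).max)
  }
}
```

The performance point of the change is avoiding a full sort of the inputs when a single pass (filter plus `maxBy`) suffices.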
[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/14842 Retest please
[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Do you mind if I ask whether this works with the vectorized Parquet reader too? I know the normal Parquet reader uses `SpecificMutableRow`, but IIRC the Parquet vectorized reader relies on `ColumnarBatch`, which does not use `SpecificMutableRow`.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Merged build finished. Test PASSed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65210/ Test PASSed.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #65210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65210/consoleFull)** for PR 14956 at commit [`b5aaec9`](https://github.com/apache/spark/commit/b5aaec9a398fc4ac0754efb1e14345c3464acd49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Shouldn't we change the reading path for Parquet rather than changing the target row, to avoid per-record type dispatch? Also, it seems to be a Parquet-specific issue, but I wonder whether making changes in the row is a good approach. I remember my PR to support upcasting in the schema for Parquet, https://github.com/apache/spark/pull/14215, which I decided to close in favor of a better approach. I haven't taken a close look yet, but I will and leave some comments.
[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65209/ Test PASSed.
[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14737 Merged build finished. Test PASSed.
[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14737 **[Test build #65209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65209/consoleFull)** for PR 14737 at commit [`5163a51`](https://github.com/apache/spark/commit/5163a51a81ea509bd76b3452fa33fb83078c279e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452 **[Test build #65211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65211/consoleFull)** for PR 14452 at commit [`23e2dc8`](https://github.com/apache/spark/commit/23e2dc865eef690eb273cc69888ca577eaa603a2).
[GitHub] spark pull request #12575: [SPARK-14803][SQL][Optimizer] A bug in EliminateS...
Github user sun-rui closed the pull request at: https://github.com/apache/spark/pull/12575
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15045 Why bother saying 'specified' or 'default' at all, though? It's probably even more informative to state that HashPartitioner doesn't work, no matter what the source. If the user specified HashPartitioner, that's clear. If they didn't, they'll still recognize that the other half of the message is relevant: something doesn't like their array keys.
[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14956#discussion_r78277526 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala --- @@ -395,7 +395,7 @@ object PowerIterationClustering extends Logging { val points = v.mapValues(x => Vectors.dense(x)).cache() val model = new KMeans() .setK(k) - .setSeed(0L) + .setSeed(5L) --- End diff -- I got the tests to pass reliably by simply making the two sets of points generated in this test both contain 10 points, not 10 and 40. Balancing them made the issue go away. As to why the paper 'works', I'm actually not clear it does. It does not actually just k-means cluster the values. They say they run 100 clusterings and take the most common cluster assignment. It's a little ambiguous what this means, but may be the source of difference. AFAICT the current PIC test does present a situation that PIC clustering won't get right, often, if it uses straight k-means internally.
[GitHub] spark issue #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with better ch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14956 **[Test build #65210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65210/consoleFull)** for PR 14956 at commit [`b5aaec9`](https://github.com/apache/spark/commit/b5aaec9a398fc4ac0754efb1e14345c3464acd49).
[GitHub] spark pull request #14956: [SPARK-17389] [ML] [MLLIB] KMeans speedup with be...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14956#discussion_r78277099 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala --- @@ -395,7 +395,7 @@ object PowerIterationClustering extends Logging { val points = v.mapValues(x => Vectors.dense(x)).cache() val model = new KMeans() .setK(k) - .setSeed(0L) + .setSeed(5L) --- End diff -- Sorry Matei, I kind of missed your point. Yes, it's a bit more strange to be changing a seed in a non-test file. Same reasoning, but I agree. There's a seed here to begin with for determinism, but it probably shouldn't matter. I think I understand the problem, and I think it's the test. k-means is used to cluster 1D data. The test case generates two concentric circles of points of radius 1 and 4, which are intended to form k=2 separate clusters in the derived values that are clustered by k-means internally. That's even clear from looking at the similarities plotted: ![rplot](https://cloud.githubusercontent.com/assets/822522/18410522/2de05f76-775d-11e6-8e75-07d3d31e5cae.png) While it's clear what the clustering is supposed to be, it's not actually the lowest-cost k-means clustering. Many clusterings do find the 'wrong' better clustering, which is one that would include a few of the leftmost elements of the right group into the left one. Many other clusterings get the 'right' answer, which is a big local minimum but not optimal. In fact, k-means|| init seems to do worse than random here exactly because it's less likely to find that local minimum. I don't think the choice of radii matters here, since the resulting values above are basically invariant. I'm going to have to read the paper more to understand what the difference is here. It's not quite the k-means change here, and we can make this test pass easily by either:
- Setting the seed back to 0 and fixing init steps = 5 for this use of k-means, because that happens to work. Then this implementation doesn't change at all; it just means it does more work to make the test pass.
- Setting the seed to, say, 5 to get this to pass, on the theory that the choice of seed still doesn't seem to matter per se, and 5 is no worse than 0.
Obviously I want to understand a little more about how this is ever supposed to work in PIC, though it ends up being a slightly different issue.
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Merged build finished. Test FAILed.
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65208/ Test FAILed.
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13758 **[Test build #65208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65208/consoleFull)** for PR 13758 at commit [`8639319`](https://github.com/apache/spark/commit/863931994a2f24936b5312f7a8f79ae8204d57b1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65207/ Test PASSed.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15045 Merged build finished. Test PASSed.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15045 **[Test build #65207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65207/consoleFull)** for PR 15045 at commit [`6520854`](https://github.com/apache/spark/commit/6520854c565b87c80bd96a26b9b2aaefa0c5f752).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15045 Merged build finished. Test PASSed.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65206/
[GitHub] spark issue #15045: [Spark Core][MINOR] fix "default partitioner cannot part...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15045 **[Test build #65206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65206/consoleFull)** for PR 15045 at commit [`d423b41`](https://github.com/apache/spark/commit/d423b4165a0b778852a76cf8d04615ec2465c4d0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15039: [SPARK-17447] Performance improvement in Partitio...
Github user codlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15039#discussion_r78276638

Diff: core/src/main/scala/org/apache/spark/Partitioner.scala

```diff
@@ -55,14 +55,15 @@ object Partitioner {
   * We use two method parameters (rdd, others) to enforce callers passing at least 1 RDD.
   */
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
-   val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.length).reverse
-   for (r <- bySize if r.partitioner.isDefined && r.partitioner.get.numPartitions > 0) {
-     return r.partitioner.get
+   val rdds = Seq(rdd) ++ others
+   val filteredRdds = rdds.filter( _.partitioner.exists(_.numPartitions > 0 ))
```

@srowen Thank you very much. I am new to Spark but very interested in it, and I have fixed my code style. Thanks.
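The refactor under review can be sketched without Spark's full RDD machinery. `SimplePartitioner` and `MiniRDD` below are hypothetical stand-ins (not Spark's actual classes), kept only to show the filter-then-pick logic that replaces the sort-and-return loop:

```scala
// Hypothetical, minimal stand-ins for the two things defaultPartitioner
// consults on an RDD: its partition count and its optional partitioner.
case class SimplePartitioner(numPartitions: Int)
case class MiniRDD(numPartitions: Int, partitioner: Option[SimplePartitioner])

// Sketch of the refactored logic: keep only RDDs that carry a usable
// partitioner, then reuse the partitioner of the largest such RDD;
// otherwise fall back to a new partitioner sized by the largest RDD.
def defaultPartitioner(rdd: MiniRDD, others: MiniRDD*): SimplePartitioner = {
  val rdds = Seq(rdd) ++ others
  val withPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
  if (withPartitioner.nonEmpty)
    withPartitioner.maxBy(_.numPartitions).partitioner.get
  else
    SimplePartitioner(rdds.map(_.numPartitions).max)
}
```

Either way, an existing partitioner with zero partitions is skipped, which is the behavior both versions of the Spark code preserve.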
[GitHub] spark issue #15022: [SPARK-17465] [Spark Core] Inappropriate memory manageme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15022 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65200/
[GitHub] spark issue #15022: [SPARK-17465] [Spark Core] Inappropriate memory manageme...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15022 Merged build finished. Test FAILed.
[GitHub] spark issue #15022: [SPARK-17465] [Spark Core] Inappropriate memory manageme...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15022 **[Test build #65200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65200/consoleFull)** for PR 15022 at commit [`6b11fe8`](https://github.com/apache/spark/commit/6b11fe8d07728e9add07d8df5845658f9fef3e60).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65204/
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14969 Merged build finished. Test PASSed.
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14969 **[Test build #65204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65204/consoleFull)** for PR 14969 at commit [`c725891`](https://github.com/apache/spark/commit/c7258916b8f34cc31edcb7033e783d990a3fa769).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14737: [SPARK-17171][WEB UI] DAG will list all partitions in th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14737 **[Test build #65209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65209/consoleFull)** for PR 14737 at commit [`5163a51`](https://github.com/apache/spark/commit/5163a51a81ea509bd76b3452fa33fb83078c279e).
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13758 **[Test build #65208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65208/consoleFull)** for PR 13758 at commit [`8639319`](https://github.com/apache/spark/commit/863931994a2f24936b5312f7a8f79ae8204d57b1).
[GitHub] spark issue #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14894 Merged build finished. Test PASSed.
[GitHub] spark issue #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65205/
[GitHub] spark issue #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14894 **[Test build #65205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65205/consoleFull)** for PR 14894 at commit [`cd8e9a4`](https://github.com/apache/spark/commit/cd8e9a4ddf704f2d01df870cc898af53e62a9d2f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15042: [SPARK-17449] [Documentation] [Relation between heartbea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15042 **[Test build #3254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3254/consoleFull)** for PR 15042 at commit [`83031c4`](https://github.com/apache/spark/commit/83031c4d285db633c6468ef8471810765f62c0be).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65203/
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14969 Merged build finished. Test PASSed.
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14969 **[Test build #65203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65203/consoleFull)** for PR 14969 at commit [`4dda55c`](https://github.com/apache/spark/commit/4dda55ca2f614228fbd6f926fd201073894a8abf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix partitionBy error message
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15045 **[Test build #65207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65207/consoleFull)** for PR 15045 at commit [`6520854`](https://github.com/apache/spark/commit/6520854c565b87c80bd96a26b9b2aaefa0c5f752).
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15041 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65202/
[GitHub] spark issue #15045: [Spark Core][MINOR] fix partitionBy error message
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15045 Oh, there are 5 similar messages. I checked the others: they may use the default partitioner, so I updated their message to "Specified or default partitioner...". But the one in `partitionBy` must be set by the user and can't be the default, because the `partitionBy` API lets the user specify how the RDD should be partitioned. In the other cases, the `partitioner` parameter is optional, and if the user doesn't specify it, it falls back to the default `HashPartitioner`. I've now updated the code. Thanks!
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15041 Merged build finished. Test PASSed.
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15041 **[Test build #65202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65202/consoleFull)** for PR 15041 at commit [`e7c6b16`](https://github.com/apache/spark/commit/e7c6b1625b6a67cfab958f64f5238811d5a39640).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix partitionBy error message
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15045 There are 5 instances of this check in the file -- they should all be handled the same way. I'm not sure this message is accurate either, because some code paths lead to these methods when HashPartitioner is used as a default. Just say that HashPartitioner can't be used? Refactor one check method for this?
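As background for why these checks exist at all (my illustration, not from the thread): JVM arrays compare by reference and hash by identity, so any hashCode-based partitioner, including the default `HashPartitioner`, would scatter equal-content array keys across arbitrary partitions. Plain Scala shows the trap:

```scala
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

// On the JVM, arrays use reference equality: same contents, but a != b.
// A partitioner keyed on hashCode/equals therefore cannot group equal
// array keys together, which is why Spark rejects them outright.
assert(a != b)

// Content equality must be requested explicitly.
assert(a.sameElements(b))
```

This is also why the error message wording matters: the restriction applies to every hash-based partitioner, not just the default one.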
[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14961 @a-roberts are you in a position to add this change to this PR as an experiment? I can try it on the side too. I can't seem to reproduce the failure locally, even when fully rebuilding the project with a newer netty.
[GitHub] spark issue #15045: [Spark Core][MINOR] fix partitionBy error message
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15045 **[Test build #65206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65206/consoleFull)** for PR 15045 at commit [`d423b41`](https://github.com/apache/spark/commit/d423b4165a0b778852a76cf8d04615ec2465c4d0).
[GitHub] spark pull request #15045: [Spark Core][MINOR] fix partitionBy error message
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/15045 [Spark Core][MINOR] fix partitionBy error message

## What changes were proposed in this pull request?

To avoid confusing users, it is better to change the `PairRDDFunctions.partitionBy` error message from `Default partitioner cannot partition array keys.` to `Specified partitioner cannot partition array keys.`

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark fix_partitionBy_error_message

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15045.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15045

commit d423b4165a0b778852a76cf8d04615ec2465c4d0
Author: WeichenXu
Date: 2016-09-08T12:24:06Z

    fix partitionBy error message
[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...
Github user lins05 commented on the issue: https://github.com/apache/spark/pull/15043 Did a simple test and it does fix the bug. One interesting thing is that while records.count() returns a smaller number than the actual count, the Spark UI still shows the correct record count; in my test case it's 2999808 vs. 30. ![screen shot 2016-09-10 at 5 57 38 pm](https://cloud.githubusercontent.com/assets/717363/18409696/70a407e0-7780-11e6-9f22-7b55c24b0595.png) ![screen shot 2016-09-10 at 5 58 26 pm](https://cloud.githubusercontent.com/assets/717363/18409697/75f5254e-7780-11e6-98fd-5cae496f7c22.png)
[GitHub] spark issue #14671: [SPARK-17091][SQL] ParquetFilters rewrite IN to OR of Eq
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14671 Thanks for confirming this. I will work on this.
[GitHub] spark issue #15028: [SPARK-17336][PYSPARK] Fix appending multiple times to P...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15028 I think the current behavior might be worse on that dimension ... you might get several different versions of things at once on the classpath, not just redundant copies.
[GitHub] spark issue #14969: [SPARK-17406][WEB UI] limit timeline executor events
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14969 **[Test build #65204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65204/consoleFull)** for PR 14969 at commit [`c725891`](https://github.com/apache/spark/commit/c7258916b8f34cc31edcb7033e783d990a3fa769).
[GitHub] spark issue #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14894 **[Test build #65205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65205/consoleFull)** for PR 14894 at commit [`cd8e9a4`](https://github.com/apache/spark/commit/cd8e9a4ddf704f2d01df870cc898af53e62a9d2f).
[GitHub] spark issue #14894: [SPARK-17330] [SPARK UT] Clean up spark-warehouse in UT
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14894 Jenkins test this please
[GitHub] spark pull request #14985: [SPARK-17396][core] Share the task support betwee...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14985
[GitHub] spark issue #14985: [SPARK-17396][core] Share the task support between Union...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14985 Merged to master/2.0