[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16696 cc @cloud-fan @gatorsmile please review --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16696 **[Test build #71963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71963/testReport)** for PR 16696 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71962/testReport)** for PR 16695 at commit

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97701247 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with

[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-24 Thread wzhfy
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16696 [SPARK-19350] [SQL] Cardinality estimation of Limit and Sample ## What changes were proposed in this pull request? Before this pr, LocalLimit/GlobalLimit/Sample propagates the same row count

[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16661 BTW, it may be nice to add a `SymmetricMatrix` class, since symmetric matrices are widely used in the computation of covariance/concurrence/etc.

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700863 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with

[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-24 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97700450 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -274,25 +277,28 @@ private[spark] object JettyUtils extends Logging {

[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-24 Thread sarutak
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/16582 OK, it's reasonable.

[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97700702 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -272,6 +277,10 @@ class GaussianMixture private ( }

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with

[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-24 Thread sarutak
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/16582#discussion_r97700650 --- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala --- @@ -337,17 +350,20 @@ private[spark] object JettyUtils extends Logging {

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700670 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -90,25 +95,100 @@ trait BaseLimitExec extends UnaryExecNode with

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-01-24 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r97700568 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -230,6 +230,21 @@ case object SinglePartition

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71962/testReport)** for PR 16695 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71961/testReport)** for PR 16695 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16695 Merged build finished. Test FAILed.

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16695 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71961/ Test FAILed.

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71961/testReport)** for PR 16695 at commit

[GitHub] spark issue #16502: Branch 2.1

2017-01-24 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16502 Mistakenly opened? Please close it!

[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...

2017-01-24 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16656 cc @zsxwing

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test PASSed.

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71951/ Test PASSed.

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71951/testReport)** for PR 16650 at commit

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 Thank you for your time reviewing this PR.

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16572 **[Test build #71960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71960/testReport)** for PR 16572 at commit

[GitHub] spark pull request #15505: [SPARK-18890][CORE] Move task serialization from ...

2017-01-24 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15505#discussion_r97695455 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -602,6 +619,21 @@ class

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71957/ Test FAILed.

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Merged build finished. Test FAILed.

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71957/testReport)** for PR 16686 at commit

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97694229 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -223,7 +228,10 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97694210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,66 +117,72 @@ trait CheckAnalysis extends

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread leifwalsh
Github user leifwalsh commented on the issue: https://github.com/apache/spark/pull/15821 The next iteration of this for perf would likely involve generating the arrow batches on executors and having the driver use the new streaming arrow format to just forward this to python. In our

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97690998 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala --- @@ -0,0 +1,370 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97690962 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala --- @@ -0,0 +1,370 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97692374 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -311,7 +271,7 @@ private[kafka010] class

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97690727 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97690665 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97691654 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97690588 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97691519 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the

[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...

2017-01-24 Thread QQshu1
Github user QQshu1 commented on the issue: https://github.com/apache/spark/pull/16494 @imatiach-msft Excuse me, I have two questions. 1. Why does this issue only happen if we use "sc.setCheckpointDir(path)"? 2. You say "LDA fails with a ClassCastException when run on

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71954/ Test PASSed.

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16686 Merged build finished. Test PASSed.

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71954/testReport)** for PR 16686 at commit

[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16478 **[Test build #71958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71958/testReport)** for PR 16478 at commit

[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71959/testReport)** for PR 15505 at commit

[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-01-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16478 also cc @cloud-fan @hvanhovell

[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16691 **[Test build #71956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71956/testReport)** for PR 16691 at commit

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71957/testReport)** for PR 16686 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71953/testReport)** for PR 16695 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16695 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71953/ Test FAILed.

[GitHub] spark pull request #16687: [SPARK-19343][DStreams] Do once optimistic checkp...

2017-01-24 Thread uncleGen
Github user uncleGen closed the pull request at: https://github.com/apache/spark/pull/16687

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16695 Merged build finished. Test FAILed.

[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...

2017-01-24 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16691 retest this please.

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)** for PR 16677 at commit

[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16686 **[Test build #71954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71954/testReport)** for PR 16686 at commit

[GitHub] spark issue #16695: [SPARK-19277][yarn] Localize topology scripts inside Had...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16695 **[Test build #71953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71953/testReport)** for PR 16695 at commit

[GitHub] spark pull request #16695: [SPARK-19277][yarn] Localize topology scripts ins...

2017-01-24 Thread vanzin
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/16695 [SPARK-19277][yarn] Localize topology scripts inside Hadoop configuration. Hadoop has some configurations that can be used to run an external script that prints out information about cluster

[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71952/testReport)** for PR 16605 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 also cc @cloud-fan and @hvanhovell

[GitHub] spark pull request #16673: [SPARK-19330][DStreams] Also show tooltip for suc...

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16673

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15821 On a personal note, those benchmarks certainly look very exciting (<3 max of with arrow less than min of without arrow) :) It certainly seems it would probably be worth the review bandwidth

[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue

2017-01-24 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16660 @gmoehler `ExampleBaseClass` overrides `toString` impl. That is redundant.

[GitHub] spark issue #16465: [SPARK-19064][PySpark]Fix pip installing of sub componen...

2017-01-24 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16465 Sounds like a plan, I think this should probably be on the 2.1 branch as well so I'll go bug someone who has done backports to make sure I do that part right :)

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 Here are some rough benchmarks done locally on a machine with 16GB mem and 8 cores, using Spark config defaults and taken from 50 trials of calling `toPandas()` with and without Arrow enabled:

[GitHub] spark issue #16685: [SPARK-19335] Introduce insert, update, and upsert comma...

2017-01-24 Thread ilganeli
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16685 @xwu0226 Thanks for the comments, I've reviewed your submission and commented here https://github.com/apache/spark/pull/16692. Specifically in response to your comments: 1) We did not

[GitHub] spark issue #16692: [SPARK-19335] Introduce UPSERT feature to SPARK

2017-01-24 Thread ilganeli
Github user ilganeli commented on the issue: https://github.com/apache/spark/pull/16692 Hi, all - thanks for this submission. Overall it's a very clean implementation and I like it a lot. There's obviously a large amount of effort that went into developing this. The main issue with

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97676523 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -223,7 +228,10 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97676592 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,66 +117,72 @@ trait CheckAnalysis extends

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97676813 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -163,7 +163,12 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 Note the way the plans inside subqueries are not treated as part of the tree traversal is a common problem. Besides this problem, another was reported in SPARK-19093. Also the way Spark needs to

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15821 This has been updated after integrating changes made with @icexelloss and @wesm. There has been good progress made and it would be great if others could take a look and review/test this out.

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97675872 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,66 +117,72 @@ trait CheckAnalysis extends

[GitHub] spark issue #16687: [SPARK-19343][DStreams] Do once optimistic checkpoint be...

2017-01-24 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16687 Thanks for your contribution. However, I don't see any strong point to add this. This change may introduce other issues: - The failure or stopping StreamingContext should rarely

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #71951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71951/testReport)** for PR 16650 at commit

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16572 Note that the diff is easier to read using GitHub's `w=1` flag: https://github.com/apache/spark/pull/16572/files?w=1

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test FAILed.

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread jsoltren
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/16650 The error was: `ERROR: Error fetching remote repo 'origin' hudson.plugins.git.GitException: Failed to fetch from https://github.com/apache/spark.git` I don't think that my

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71950/ Test FAILed.

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #71950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71950/testReport)** for PR 15821 at commit

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16650 retest this please

[GitHub] spark issue #15821: [SPARK-13534][WIP][PySpark] Using Apache Arrow to increa...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #71950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71950/testReport)** for PR 15821 at commit

[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...

2017-01-24 Thread anabranch
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/16138 @cloud-fan The error I see is the one in this [test

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71949/ Test FAILed.

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test FAILed.

[GitHub] spark pull request #16681: [SPARK-19334][SQL]Fix the code injection vulnerab...

2017-01-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16681

[GitHub] spark issue #16681: [SPARK-19334][SQL]Fix the code injection vulnerability r...

2017-01-24 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16681 LGTM. Merging to master. Thanks!

[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-24 Thread jsoltren
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/16650 My solution for the `org.apache.spark.deploy.StandaloneDynamicAllocationSuite.kill all executors on localhost` failure was to add a Boolean argument to killExecutorsOnHost specifying if killed

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97657263 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala --- @@ -0,0 +1,167 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97626627 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala --- @@ -0,0 +1,376 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97627081 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala --- @@ -0,0 +1,376 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97661376 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97665946 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97628344 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala --- @@ -0,0 +1,376 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97629645 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaProvider.scala --- @@ -28,19 +28,27 @@ import

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97657985 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala --- @@ -0,0 +1,167 @@ +/* + * Licensed to

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97664854 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97666273 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the

[GitHub] spark pull request #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-01-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16686#discussion_r97637198 --- Diff: external/kafka-0-10-sql/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister --- @@ -1 +1 @@
