[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...

2017-10-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19412 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19443 I think it's okay to wait for a few days more and for other committers who might support or like this idea before closing this. I won't stay against. Providing more compelling reasons

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19444 **[Test build #82509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82509/testReport)** for PR 19444 at commit

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143250260 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82515/testReport)** for PR 18732 at commit

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18664 If we all agree on the necessity of a design doc first, I can create a Jira and we can make progress there. What do you all think? @BryanCutler @gatorsmile @HyukjinKwon ---

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19448 **[Test build #82517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82517/testReport)** for PR 19448 at commit

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19448 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82517/ Test PASSed. ---

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19448 **[Test build #82517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82517/testReport)** for PR 19448 at commit

[GitHub] spark pull request #19444: [SPARK-22214][SQL] Refactor the list hive partiti...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19444#discussion_r143235124 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -638,12 +638,14 @@ private[hive] class HiveClientImpl(

[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18460 I see. There is a typo in PR title: `comparision`. Will review the fix soon. --- - To unsubscribe, e-mail:

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143243338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143243228 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -1015,6 +1020,10 @@ object DateTimeUtils {

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82514/ Test PASSed. ---

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/19449 [SPARK-22219][SQL] Refactor code to get a value for "spark.sql.codegen.comments" ## What changes were proposed in this pull request? This PR refactors code to get a value for

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread zivanfi
Github user zivanfi commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143257840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark issue #18229: [SPARK-20691][CORE] Difference between Storage Memory as...

2017-10-06 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18229 gentle ping @mkesselaers --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143263694 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19438 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19438 Maybe we can run some of the major test suites locally and update all the results. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19444 **[Test build #82519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82519/testReport)** for PR 19444 at commit

[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparision should respec...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143238804 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -100,6 +101,17 @@ object TypeCoercion {

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19424 **[Test build #82513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82513/testReport)** for PR 19424 at commit

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 The baseline should be (as said above): Internal optimisation should not introduce any behaviour change, and we are discouraged to change the previous behaviour unless it has bugs in general.

[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143245229 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -230,6 +230,13 @@ case class AlterTableSetPropertiesCommand(

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143252186 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class

[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...

2017-10-06 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19077 gentle ping @jerryshao for review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 I am okay with proceeding separately for dealing with timezone, and matching the behaviour with Arrow to the existing behaviour without Arrow here with respect to timezone. Less sure

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143232991 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19438 **[Test build #82512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82512/testReport)** for PR 19438 at commit

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19438 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82512/ Test FAILed. ---

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19444 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143242256 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #82516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82516/testReport)** for PR 18931 at commit

[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18966 **[Test build #82518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82518/testReport)** for PR 18966 at commit

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143272791 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 Here the answers to your questions @gatorsmile , please tell me if I need to elaborate more deeply. This conf controls how many inner classes are generated. A big value means that we will have

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19444 LGTM except a minor comment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143246203 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/TimestampTableTimeZone.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to

[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19449 **[Test build #82520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82520/testReport)** for PR 19449 at commit

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 Yup, I admit there could be some exceptions (there have been actually) but that should still be the baseline we should basically pursue. Probably, we could treat this Arrow optimisation as an

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19447 Instead of simply introducing a new conf, we need to answer three questions; otherwise, this is not useful to the users. - What is the perf impact of this conf? - When should users

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143232506 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class

[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-06 Thread jsnowacki
Github user jsnowacki commented on the issue: https://github.com/apache/spark/pull/19443 The only other reason for Python I can think of, if the above are not compelling enough, is that the issue with function not having call-by-column-name option is that we'll get the error only at

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82513/ Test PASSed. ---

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143244400 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -266,6 +267,10 @@ final class DataFrameWriter[T] private[sql](ds:

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19447 We still need to add a test case. We also should capture the exception and issue a better one; otherwise, users will not know what they should do when they hit such a confusing error.

[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82516/ Test FAILed. ---

[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19250#discussion_r143243769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1213,6 +1213,71 @@ case class

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19424 **[Test build #82514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82514/testReport)** for PR 19424 at commit

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82515/ Test PASSed. ---

[GitHub] spark pull request #19450: [SPARK-22218] spark shuffle services fails to upd...

2017-10-06 Thread tgravescs
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/19450 [SPARK-22218] spark shuffle services fails to update secret on app re-attempts This patch fixes application re-attempts when running spark on yarn using the external shuffle service with

[GitHub] spark issue #19450: [SPARK-22218] spark shuffle services fails to update sec...

2017-10-06 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/19450 ping @vanzin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19450: [SPARK-22218] spark shuffle services fails to update sec...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19450 **[Test build #82521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82521/testReport)** for PR 19450 at commit

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-06 Thread steveloughran
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/19448 [SPARK-22217] [SQL] ParquetFileFormat to support arbitrary OutputCommitters ## What changes were proposed in this pull request? `ParquetFileFormat` to relax its requirement of output

[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r143232229 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +60,15 @@ class

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19448 + @rdblue --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18966 **[Test build #82518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82518/testReport)** for PR 18966 at commit

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18664 > The baseline should be (as said above): Internal optimisation should not introduce any behaviour change, and we are discouraged to change the previous behaviour unless it has bugs in general.

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18664 I agree. I think some high level document describing these differences so we can discuss it. I think we should be more careful about Arrow-version behavior before releasing support for timestamp

[GitHub] spark issue #19372: [SPARK-22156][MLLIB] Fix update equation of learning rat...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19372 **[Test build #3943 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3943/testReport)** for PR 19372 at commit

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19448 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19424 **[Test build #82513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82513/testReport)** for PR 19424 at commit

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19438 **[Test build #82512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82512/testReport)** for PR 19438 at commit

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-06 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19438#discussion_r143203632 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ImputerSuite.scala --- @@ -43,7 +43,7 @@ class ImputerSuite extends SparkFunSuite with

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18732 Hi All, I think all comments should be addressed at this point, except for the naming comment from @rxin. If I missed something or if there is anything else you want me to address,

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143226714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143204826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,13 @@ class

[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19370 I saw the same problem on github.com a couple of days ago. I think something is broken with the github integration but not sure how that is managed. ---

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19447 Could you add a test case for this? Since it's configurable, you can set a small number and catch some exceptions? --- -

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143223820 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -934,6 +934,15 @@ object SQLConf { .intConf

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143226823 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19447 Do you mean Spark will work with small value like 1000? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143202186 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,13 @@ class CodegenContext {

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19444 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82509/ Test PASSed. ---

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19444 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143207681 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark pull request #19090: [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in W...

2017-10-06 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19090 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19090: [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows ...

2017-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19090 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19444: [SPARK-22214][SQL] Refactor the list hive partitions cod...

2017-10-06 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19444 cc @gatorsmile @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19442: [SPARK-8515][ML][WIP] Improve ML Attribute API

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19442 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82511/ Test PASSed. ---

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19082 I'm also curious the performance of the master + #18931 on q66. Based on @maropu's benchmark before, q66 has much performance improvement. ---

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143217286 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,13 @@ class CodegenContext

[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r143228054 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -73,25 +73,37 @@ case class

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19082 TPC-DS benchmark just shows the perf of one of the Spark workloads. To avoid introducing the unexpected regressions, we have to be very careful when merging this PR and

[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82508/ Test PASSed. ---

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19447 [SPARK-22215][SQL] Add configuration to set the threshold for generated class ## What changes were proposed in this pull request? SPARK-18016 introduced an arbitrary threshold for the

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143198047 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143213000 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-10-06 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r143216395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -944,6 +945,24 @@ class CodegenContext {

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19082 sure! I'll check two more; - master + #18931 - master + this pr + #18931 --- - To unsubscribe, e-mail:

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-06 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143221337 --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala --- @@ -105,4 +108,108 @@ object Utils { def

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-06 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r143221450 --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala --- @@ -105,4 +108,108 @@ object Utils { def

[GitHub] spark pull request #19412: [SPARK-22142][BUILD][STREAMING] Move Flume suppor...

2017-10-06 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19412 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143215289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,13 @@ class

  1   2   3   4   >