[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16818 **[Test build #72432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72432/testReport)** for PR 16818 at commit

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16820 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #72434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72434/testReport)** for PR 16819 at commit

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.

[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16809 @cloud-fan The new behavior looks reasonable to me, unless users are expecting to keep the original cached data. I went over the change history. I found @sameeragarwal did this in

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16820 **[Test build #72445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72445/testReport)** for PR 16820 at commit

[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16680#discussion_r99583137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -170,32 +111,21 @@ class CSVFileFormat

[GitHub] spark pull request #16803: [SPARK-19458][SQL]load hive jars from local repo ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16803#discussion_r99583052 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -480,7 +479,12 @@ object SparkSubmit extends CommandLineUtils {

[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16803 Adding a new option `spark.jars.repositories` affects more than loading hive jars, right?

[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-06 Thread robbinspg
Github user robbinspg commented on the issue: https://github.com/apache/spark/pull/16751 Sorry, I've been away for the w/end. Yes we use maven for our test runs. Looks like you have it under control. Thanks

[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16803 **[Test build #72428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72428/testReport)** for PR 16803 at commit

[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72428/ Test PASSed. ---

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r99572391 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala --- @@ -232,19 +232,40 @@ class Word2VecModel private[ml] ( @Since("1.5.0")

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72442/testReport)** for PR 16787 at commit

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16792 **[Test build #72441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72441/testReport)** for PR 16792 at commit

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread yangw1234
Github user yangw1234 commented on the issue: https://github.com/apache/spark/pull/16820 @mengxr @rxin

[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818 cc @srowen

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16792 Merged build finished. Test PASSed.

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16792 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72441/ Test PASSed. ---

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16792 **[Test build #72441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72441/testReport)** for PR 16792 at commit

[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16803 Please rename it to [SPARK-19458][BUILD][SQL]load hive jars from local repo which has downloaded

[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16626 **[Test build #72447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72447/testReport)** for PR 16626 at commit

[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16680 **[Test build #72446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72446/testReport)** for PR 16680 at commit

[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2017-02-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16387 @samkum Any update?

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72433/testReport)** for PR 16787 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72433/ Test FAILed. ---

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16792 **[Test build #72439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72439/testReport)** for PR 16792 at commit

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16815 **[Test build #3558 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3558/testReport)** for PR 16815 at commit

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16815 **[Test build #72436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72436/testReport)** for PR 16815 at commit

[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16819 **[Test build #72434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72434/testReport)** for PR 16819 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72438/testReport)** for PR 16787 at commit

[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16818 Merged build finished. Test PASSed.

[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72432/ Test PASSed. ---

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread windpiger
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16787 retest this please

[GitHub] spark pull request #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and exten...

2017-02-06 Thread zero323
Github user zero323 commented on a diff in the pull request: https://github.com/apache/spark/pull/16792#discussion_r99572943 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1272,16 +1272,18 @@ def replace(self, to_replace, value, subset=None): """Returns a new

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16815 **[Test build #72435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72435/testReport)** for PR 16815 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72442/testReport)** for PR 16787 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72442/ Test FAILed. ---

[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16747 (FWIW, I am OK but just worried if it might be supposed to be internal type, maybe in the future)

[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16818 @uncleGen I think we should limit this to allowing long values for range frames only; row frames should not get larger than `1 << 31 + 1`. The reason for this is that we also need to be able to

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72448/ Test FAILed. ---

[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r99585661 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72448/testReport)** for PR 16787 at commit

[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r99585644 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends

[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-02-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16751 Pardon me, but is there anywhere else keeping track of the build break with SBT? It's been failing for a while in master:

[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16810#discussion_r99561230 --- Diff: docs/building-spark.md --- @@ -63,57 +63,30 @@ with Maven profile settings and so on like the direct Maven build. Example: This will

[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16810 **[Test build #72437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72437/testReport)** for PR 16810 at commit

[GitHub] spark issue #16738: [SPARK-19398] Change one misleading log in TaskSetManage...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16738 **[Test build #3556 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3556/testReport)** for PR 16738 at commit

[GitHub] spark issue #16789: [SPARK-19444][ML][Documentation] Fix imports not being p...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16789 **[Test build #3557 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3557/testReport)** for PR 16789 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72440/testReport)** for PR 16787 at commit

[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Merged build finished. Test PASSed.

[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72434/ Test PASSed. ---

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72448/testReport)** for PR 16787 at commit

[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16269 LGTM pending test

[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16809 Found the design doc: https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts=574f717f# > An alternative is to support a new command REFRESH path that

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16820 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72445/ Test FAILed. ---

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16820 **[Test build #72445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72445/testReport)** for PR 16820 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 @sujith71955 Thanks for the test! The test number looks promising!

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16820 Merged build finished. Test FAILed.

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72443 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72443/testReport)** for PR 16787 at commit

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16820 @yangw1234 could you also check if we need to do this for whole stage code generation? ...and you really need to add tests.

[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...

2017-02-06 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16820 ok to test

[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16626 retest this please

[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16386 Sorry, I missed the ping. Will review it tonight.

[GitHub] spark pull request #16787: [WIP][SPARK-19448][SQL]optimize some duplication ...

2017-02-06 Thread windpiger
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r99588025 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -841,5 +841,6 @@ private[client] class Shim_v1_2 extends

[GitHub] spark issue #16818: [SPARK-19451][SQL][Core] Underlying integer overflow in ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16818 @hvanhovell Thanks for your suggestions, it is just what I failed to notice or consider.

[GitHub] spark issue #16816: Code style improvement

2017-02-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16816 @zhoucen please close this PR and read http://spark.apache.org/contributing.html

[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16817 **[Test build #72430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72430/testReport)** for PR 16817 at commit

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread sujith71955
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/16677 @viirya I tested the above-mentioned approach with sample data; it improved performance almost 3X. Please find the test report: Total No of Executors = 3 Total

[GitHub] spark issue #16803: [SPARK-19458][SQL]load hive jars from local repo which h...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16803 Merged build finished. Test PASSed.

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72443/ Test FAILed. ---

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72443/testReport)** for PR 16787 at commit

[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16747 It seems there are several ones here and there. Maybe https://github.com/apache/spark/pull/15751#issuecomment-258518577 is related too because it is about supporting reading/writing out that

[GitHub] spark issue #16787: [WIP][SPARK-19448][SQL]optimize some duplication functio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72449/testReport)** for PR 16787 at commit

[GitHub] spark pull request #16816: Code style improvement

2017-02-06 Thread zhoucen
Github user zhoucen closed the pull request at: https://github.com/apache/spark/pull/16816

[GitHub] spark pull request #16815: [SPARK-19407][SS] defaultFS is used FileSystem.ge...

2017-02-06 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16815#discussion_r99556423 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala --- @@ -47,7 +47,7 @@ object StreamMetadata extends Logging

[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16819 I don't think this is a necessary change. Already, you can't ask for more resources than the cluster has; the cluster won't grant them. Capping it here means the app can't use more resources if the

[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16747 CC @cloud-fan for https://github.com/apache/spark/pull/13008#r62947902 and @yhuai for https://github.com/apache/spark/pull/8597#r38769233 as they might be what you're referring to?

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16792 Merged build finished. Test FAILed.

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72438/ Test FAILed. ---

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16792 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72439/ Test FAILed. ---

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16792 **[Test build #72439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72439/testReport)** for PR 16792 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72440/ Test FAILed. ---

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72438/testReport)** for PR 16787 at commit

[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72440/testReport)** for PR 16787 at commit

[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16680 (I am fine with changing the name only for CSV ones for now as well. I would appreciate if you confirm please)

[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16815 retest this please.

[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16817 Merged build finished. Test PASSed.

[GitHub] spark issue #16817: [SPARK-17213][SQL][FOLLOWUP] Re-enable Parquet filter te...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72430/ Test PASSed. ---

[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...

2017-02-06 Thread yangw1234
GitHub user yangw1234 opened a pull request: https://github.com/apache/spark/pull/16820 [SPARK-19471] AggregationIterator does not initialize the generated result projection before using it ## What changes were proposed in this pull request? When AggregationIterator

[GitHub] spark issue #16789: [SPARK-19444][ML][Documentation] Fix imports not being p...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16789 **[Test build #3557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3557/testReport)** for PR 16789 at commit

[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16269 **[Test build #72444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72444/testReport)** for PR 16269 at commit

[GitHub] spark pull request #16269: [SPARK-19080][SQL] simplify data source analysis

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16269#discussion_r99578597 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -374,25 +375,24 @@ case class

[GitHub] spark issue #16680: [SPARK-16101][SQL] Refactoring CSV schema inference path...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16680 @cloud-fan, I just mainly resembled ones in JSON datasource and I am pretty sure you knew this when you added some comments. But let me just rebase this as is for now just in case maybe you are

[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16680#discussion_r99583374 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -39,22 +37,76 @@ private[csv] object

[GitHub] spark pull request #16680: [SPARK-16101][SQL] Refactoring CSV schema inferen...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16680#discussion_r99583376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the
