[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19086 Sure, current behavior is hive behavior. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19086 Please hold it. It means it is a behavior change. Let me consider it more.

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19086 yes, correct

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19086 > use db2; > alter table db1.t2 rename to t1; After this PR, it is renamed to `db2.t1`, right? Before this PR, it is renamed to `db1.t1`, right?
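
To make the question above concrete, here is a minimal Python sketch of the two resolution rules being compared. It is not Spark's `SessionCatalog` code: the function name `resolve_rename_target` and its parameters are hypothetical, and only the unqualified-target case from the example is modelled.

```python
from typing import Tuple


def resolve_rename_target(
    source_db: str,
    new_table: str,
    current_db: str,
    use_current_db: bool,
) -> Tuple[str, str]:
    """Return the (database, table) pair an unqualified rename target resolves to.

    Hypothetical model only:
    use_current_db=False -> target stays in the source table's database
    use_current_db=True  -> target is placed in the session's current database
    """
    target_db = current_db if use_current_db else source_db
    return (target_db, new_table)


# use db2; alter table db1.t2 rename to t1;
before = resolve_rename_target("db1", "t1", current_db="db2", use_current_db=False)
after = resolve_rename_target("db1", "t1", current_db="db2", use_current_db=True)
print(before)  # ('db1', 't1')
print(after)   # ('db2', 't1')
```

With `use db2` in effect, the first rule keeps the table in `db1` and the second moves it to `db2`, which is the difference the thread calls a behavior change.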

[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-03 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19115 Please create a new PR against the master branch and close this one. If the issue doesn't exist in the master branch, then consider backporting that fix to the 2.2 branch.

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136747432 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -569,46 +569,51 @@ class SessionCatalog(

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136747159 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -418,6 +439,42 @@ abstract class

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17014 **[Test build #81372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81372/testReport)** for PR 17014 at commit

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17014 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81372/ Test FAILed.

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17014 Merged build finished. Test FAILed.

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136746833 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala --- @@ -89,6 +89,8 @@ case class ExpandExec(

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17014 **[Test build #81372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81372/testReport)** for PR 17014 at commit

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136746468 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala --- @@ -177,6 +177,8 @@ case class SortExec( """.stripMargin.trim

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19086 Merged build finished. Test FAILed.

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19086 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81370/ Test FAILed.

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19086 **[Test build #81370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81370/testReport)** for PR 19086 at commit

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136745765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -569,46 +569,51 @@ class SessionCatalog(

[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19116 cc @srowen and @vanzin, could you take a look please when you have some time?

[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136745412 --- Diff: scalastyle-config.xml --- @@ -268,10 +268,7 @@ This file is divided into 3 sections: - -^Override$ -

[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136745459 --- Diff: project/plugins.sbt --- @@ -7,8 +7,7 @@ addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.1.0") // sbt 1.0.0 support:

[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19116#discussion_r136745559 --- Diff: project/SparkBuild.scala --- @@ -163,14 +163,15 @@ object SparkBuild extends PomBuild { val configUrlV =

[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19086#discussion_r136745597 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -569,46 +569,51 @@ class SessionCatalog(

[GitHub] spark issue #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19116 **[Test build #81371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81371/testReport)** for PR 19116 at commit

[GitHub] spark pull request #19116: [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0.

2017-09-03 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/19116 [SPARK-21903][BUILD] Upgrade scalastyle to 1.0.0. ## What changes were proposed in this pull request? 1.0.0 fixes an issue with import order, explicit type for public methods, line

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18902 hmm... that's interesting. So I found a performance gap between DataFrame codegen aggregation and simple RDD aggregation. I will discuss this with the SQL team later. Thanks!

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18902 @WeichenXu123 No, I only cache the DataFrame. And the RDD-Version is [here](https://github.com/apache/spark/pull/18902/commits/8daffc9007c65f04e005ffe5dcfbeca634480465). I use the same

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 How about the other popular open source projects? Do you know which projects are using Univocity 2.5?

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17014 @zhengruifeng `KMeans` is regarded as a bugfix (SPARK-21799) because the double-cache issue was introduced in 2.2 and causes a perf regression. Other algos also have the same issue, but the issue

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19113 Any performance measure from 2.2 to 2.5?

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18865 Yea, it is and that's what we discussed - https://github.com/apache/spark/pull/18865#discussion_r131887972. In that way, I was thinking

[GitHub] spark pull request #18869: [SPARK-21654][SQL] Complement SQL predicates expr...

2017-09-03 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18869

[GitHub] spark issue #18869: [SPARK-21654][SQL] Complement SQL predicates expression ...

2017-09-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18869 Thanks @HyukjinKwon @gatorsmile

[GitHub] spark issue #18869: [SPARK-21654][SQL] Complement SQL predicates expression ...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18869 LGTM

[GitHub] spark issue #18869: [SPARK-21654][SQL] Complement SQL predicates expression ...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18869 Thanks! Merging to master.

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18865 But isn't this behavior, where the `_corrupt_record` content depends on the selected JSON fields, by design in the first place?

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18865 > the usage can cause weird results too > `_corrupt_record` should return all the records that Spark SQL fail to parse I think another point should be, this issue still exists

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136743435 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortExec.scala --- @@ -177,6 +177,8 @@ case class SortExec( """.stripMargin.trim

[GitHub] spark issue #19115: [SPARK-21882][CORE] OutputMetrics doesn't count written ...

2017-09-03 Thread awarrior
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19115 @jerryshao hi~ I have modified this PR. But this patch only works in 2.2.0 (some changes apply now). I want to confirm whether I need to create a new PR. Thanks!

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19113 Merged build finished. Test PASSed.

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19113 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81368/ Test PASSed.

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19113 **[Test build #81368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81368/testReport)** for PR 19113 at commit

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18865 BTW, this change should be put into the migration guide of Spark SQL.

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18865 From the viewpoint of end users of Spark, `dfFromFile.select($"_corrupt_record").show()` might not return all the expected records. `_corrupt_record` should return all the records that

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136742515 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala --- @@ -89,6 +89,8 @@ case class ExpandExec(

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136742331 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -149,14 +149,146 @@ trait CodegenSupport extends

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136742282 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -149,14 +149,146 @@ trait CodegenSupport extends

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136742019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -149,14 +149,146 @@ trait CodegenSupport extends

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-09-03 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r136741637 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -149,14 +149,146 @@ trait CodegenSupport extends

[GitHub] spark issue #19050: [SPARK-21835][SQL] RewritePredicateSubquery should not p...

2017-09-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19050 ping @cloud-fan @hvanhovell This blocks #18956 from going forward; can you help review this change? Thanks.

[GitHub] spark issue #18869: [SPARK-21654][SQL] Complement SQL predicates expression ...

2017-09-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18869 @HyukjinKwon @gatorsmile Any more comments on this? The added tests should be enough.

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136740379 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -534,4 +534,132 @@ class InsertIntoHiveTableSuite extends QueryTest

[GitHub] spark pull request #19114: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
Github user awarrior closed the pull request at: https://github.com/apache/spark/pull/19114

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136739979 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -534,4 +534,132 @@ class InsertIntoHiveTableSuite extends QueryTest

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136739964 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -534,4 +534,132 @@ class InsertIntoHiveTableSuite extends QueryTest

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136739990 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -534,4 +534,132 @@ class InsertIntoHiveTableSuite extends QueryTest

[GitHub] spark issue #18902: [SPARK-21690][ML] one-pass imputer

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18902 +1 for using the DataFrame-based version. @zhengruifeng One thing I want to confirm: I checked your testing code, and both the RDD-based version and the DataFrame-based version will

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19086 **[Test build #81370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81370/testReport)** for PR 19086 at commit

[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19086 @gatorsmile I updated; let me know if there are still unresolved comments. Thanks again for the review.

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18865 The use case @HyukjinKwon provided looks pretty fair. The corrupt record is the whole line, which doesn't follow the JSON format. It is somewhat different from the corrupt record case where some json

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136739126 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to

[GitHub] spark pull request #19079: [SPARK-21859][CORE] Fix SparkFiles.get failed on ...

2017-09-03 Thread lgrcyanny
Github user lgrcyanny closed the pull request at: https://github.com/apache/spark/pull/19079

[GitHub] spark pull request #19102: [SPARK-21859][CORE] Fix SparkFiles.get failed on ...

2017-09-03 Thread lgrcyanny
Github user lgrcyanny commented on a diff in the pull request: https://github.com/apache/spark/pull/19102#discussion_r136738525 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -496,7 +496,7 @@ object SparkSubmit extends CommandLineUtils with Logging

[GitHub] spark issue #19115: Update PairRDDFunctions.scala

2017-09-03 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19115 @awarrior please follow the [doc](https://spark.apache.org/contributing.html) to submit a patch. You need to change the PR title, like other PRs, by adding the JIRA id and component tag.

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136738263 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -241,11 +241,21 @@ query : ctes? queryNoWith

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136738182 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,50 @@ class AstBuilder(conf: SQLConf)

[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms

2017-09-03 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17014 @WeichenXu123 @jkbradley I am curious why `ml.KMeans` is special enough that it needs a separate PR

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81369/ Test PASSed.

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538 Merged build finished. Test PASSed.

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #81369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81369/testReport)** for PR 18538 at commit

[GitHub] spark issue #19115: Update PairRDDFunctions.scala

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19115 Can one of the admins verify this patch?

[GitHub] spark issue #19114: Update PairRDDFunctions.scala

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19114 Can one of the admins verify this patch?

[GitHub] spark pull request #19115: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
GitHub user awarrior opened a pull request: https://github.com/apache/spark/pull/19115 Update PairRDDFunctions.scala [https://issues.apache.org/jira/browse/SPARK-21882](url) You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request #17014: [SPARK-18608][ML] Fix double-caching in ML algori...

2017-09-03 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/17014#discussion_r136737427 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -304,16 +304,14 @@ class KMeans @Since("1.5.0") ( override

[GitHub] spark pull request #19114: Update PairRDDFunctions.scala

2017-09-03 Thread awarrior
GitHub user awarrior opened a pull request: https://github.com/apache/spark/pull/19114 Update PairRDDFunctions.scala [https://issues.apache.org/jira/browse/SPARK-21882](url) You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request #18957: [SPARK-21744][CORE] Add retry logic for new broad...

2017-09-03 Thread caneGuy
Github user caneGuy closed the pull request at: https://github.com/apache/spark/pull/18957

[GitHub] spark issue #19111: [SPARK-21801][SPARKR][TEST][WIP] set random seed for pre...

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19111 I found that `NaiveBayes` can also fail. I found it here: #18538. Hope this works! https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81316/console ```

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538 **[Test build #81369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81369/testReport)** for PR 18538 at commit

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-09-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18538 Jenkins, test this please.

[GitHub] spark issue #19079: [SPARK-21859][CORE] Fix SparkFiles.get failed on driver ...

2017-09-03 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19079 Please close this PR @lgrcyanny thanks!

[GitHub] spark issue #19113: [SPARK-20978][SQL] Bump up Univocity version to 2.5.4

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19113 **[Test build #81368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81368/testReport)** for PR 19113 at commit

[GitHub] spark pull request #19113: [SPARK-20978][SQL] Bump up Univocity version to 2...

2017-09-03 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/19113 [SPARK-20978][SQL] Bump up Univocity version to 2.5.4 ## What changes were proposed in this pull request? There was a bug in Univocity Parser that causes the issue in SPARK-20978.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Merged build finished. Test FAILed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81367/ Test FAILed.

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17862 **[Test build #81367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975 Merged build finished. Test PASSed.

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81366/ Test PASSed.

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/18975 There is a difference in Hive's semantics vs what this PR is doing. In Hive, the query execution writes to a staging location and the destination location is cleared + re-populated after the

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975 **[Test build #81366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81366/testReport)** for PR 18975 at commit

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136727120 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136727065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala --- @@ -0,0 +1,81 @@ +/* + *

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726991 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -178,11 +179,50 @@ class AstBuilder(conf: SQLConf)

[GitHub] spark issue #19090: [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows ...

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19090 Thanks for the thorough testing. Yeah, looks fine. I will take a look a few more times myself.

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726921 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1509,4 +1509,86 @@ class SparkSqlAstBuilder(conf:

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136726866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1509,4 +1509,86 @@ class SparkSqlAstBuilder(conf:

[GitHub] spark pull request #19112: [SPARK-21901][SS] Define toString for StateOperat...

2017-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19112#discussion_r136726682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -177,11 +179,11 @@ class SourceProgress protected[sql](

[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

2017-09-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18865 A use case might be: ``` echo '{"field": 1 {"field" 2} {"field": 3}' >/tmp/sample.json ``` ```scala val file = "/tmp/sample.json" val dfFromFile =

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Merged build finished. Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81365/ Test PASSed.

[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...

2017-09-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060 **[Test build #81365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81365/testReport)** for PR 19060 at commit
