[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 So would it be more practical to benchmark the case in which some constant and some non-constant column vectors are used together? And compare it with the original case in which all columns

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 I see. My question is this: suppose, for example, we create 2 column vectors, one constant and one not. Because we will not re-use the column vectors, their constant flag is fixed and does not change. As

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13439 What I meant is that if in one process you have some invocation of the function that would hit the true branch, and some other invocation of the function that would hit the false branch, the

[GitHub] spark pull request #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread squito
Github user squito closed the pull request at: https://github.com/apache/spark/pull/13548 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13548 **[Test build #60150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60150/consoleFull)** for PR 13548 at commit

[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13548 **[Test build #3069 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3069/consoleFull)** for PR 13548 at commit

[GitHub] spark pull request #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread squito
GitHub user squito reopened a pull request: https://github.com/apache/spark/pull/13548 [DO NOT MERGE] lots of blacklist testing making jenkins run the scheduler tests a lot You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-07 Thread NarineK
Github user NarineK commented on the issue: https://github.com/apache/spark/pull/12836 Thank you for the quick responses @sun-rui and @shivaram . Here is how the `dataframe.queryExecution.toString` printout starts: == Parsed Logical Plan ==

[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13413 **[Test build #60149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60149/consoleFull)** for PR 13413 at commit

[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-07 Thread techaddict
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13413 @maropu Thanks for the review, addressed all the comments

[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-07 Thread techaddict
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/13413#discussion_r66192955 --- Diff: python/pyspark/sql/tests.py --- @@ -1481,17 +1481,7 @@ def test_list_functions(self): spark.sql("CREATE DATABASE some_db")

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 Besides, I just wrote this test according to other tests in `ColumnarBatchBenchmark` that benchmark on-heap, off-heap column vector access. I was thinking it might be enough. If not, any else need

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 hmm, but as the flag is set, I think it will not be changed?

[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13526 could we "rewind"/undo the append for the key and change it to a map that inserts new values and key? so remove one append and replace it with another operation?

[GitHub] spark pull request #13547: Update KafkaWordCount.scala

2016-06-07 Thread ShreyasFadnavis
Github user ShreyasFadnavis closed the pull request at: https://github.com/apache/spark/pull/13547

[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression

2016-06-07 Thread kamalcoursera
Github user kamalcoursera commented on the issue: https://github.com/apache/spark/pull/10706 Hi Davies, Could you please shed more light on the status of correlated but non-scalar subqueries in the Spark 2.0 release? I would appreciate it if you could summarize any other restrictions.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13439 I am not sure if you are really testing it correctly -- your benchmark is most likely just testing how well the CPU does branch prediction when the flag is always true or false.

[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread koertkuipers
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13526 the tricky part with that is that (ds: Dataset[(K, V)]).groupBy(_._1).mapValues(_._2) should return a KeyValueGroupedDataset[K, V] On Tue, Jun 7, 2016 at 8:22 PM, Wenchen Fan

[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Merged build finished. Test PASSed.

[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13549 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60148/ Test PASSed.

[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60148/consoleFull)** for PR 13549 at commit

[GitHub] spark issue #13552: [SPARK-15813] Use past tense for the cancel container re...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13552 Can one of the admins verify this patch?

[GitHub] spark pull request #13552: [SPARK-15813] Use past tense for the cancel conta...

2016-06-07 Thread peterableda
GitHub user peterableda opened a pull request: https://github.com/apache/spark/pull/13552 [SPARK-15813] Use past tense for the cancel container request message ## What changes were proposed in this pull request? Use past tense for the cancel container request message as it is

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13543 Merged build finished. Test PASSed.

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13543 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60146/ Test PASSed.

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13543 **[Test build #60146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60146/consoleFull)** for PR 13543 at commit

[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13550 @marymwu this has been fixed in https://github.com/apache/spark/commit/09b3c56c91831b3e8d909521b8f3ffbce4eb0395. Could you close this PR?

[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...

2016-06-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545 What do you think about `dropDuplicates`? 1. ds.select("_1", "_2", "_3").dropDuplicates(Seq("_1", "_2")).orderBy("_1", "_2").show() 2. ds.select("_1", "_2", "_3").dropDuplicates("_1",

[GitHub] spark pull request #13551: merge original repository

2016-06-07 Thread AllenShi
Github user AllenShi closed the pull request at: https://github.com/apache/spark/pull/13551

[GitHub] spark pull request #13551: merge original repository

2016-06-07 Thread AllenShi
GitHub user AllenShi opened a pull request: https://github.com/apache/spark/pull/13551 merge original repository ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please

[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13548 Merged build finished. Test FAILed.

[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13548 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60138/ Test FAILed.

[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13548 **[Test build #60138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60138/consoleFull)** for PR 13548 at commit

[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13550 It would be nicer if this PR follows https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and has a test.

[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13550 Can one of the admins verify this patch?

[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13371 cc @rxin Can you also take a look at this? It has been waiting for a while too. Thanks!

[GitHub] spark pull request #13550: SPARK-15755

2016-06-07 Thread marymwu
GitHub user marymwu opened a pull request: https://github.com/apache/spark/pull/13550 SPARK-15755 JIRA Issue: https://issues.apache.org/jira/browse/SPARK-15755 java.lang.NullPointerException when run spark 2.0 setting

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 @rxin hmm, I just think that if we can improve it by just adding a conditional check, it might be worth doing. As for the performance hit, this is the benchmark for on-heap and off-heap column vectors

[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-07 Thread zhonghaihua
Github user zhonghaihua commented on the issue: https://github.com/apache/spark/pull/12258 @vanzin my JIRA username is `iward`. Thanks a lot.

[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13549 **[Test build #60148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60148/consoleFull)** for PR 13549 at commit

[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/13549 @marmbrus

[GitHub] spark pull request #13549: Added support for sorting after streaming aggrega...

2016-06-07 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13549#discussion_r66182722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -123,27 +159,6 @@ object

[GitHub] spark pull request #13549: Added support for sorting after streaming aggrega...

2016-06-07 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/13549 Added support for sorting after streaming aggregation with complete mode ## What changes were proposed in this pull request? When the output mode is complete, then the output of a streaming

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60147/ Test PASSed.

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13544 **[Test build #60147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60147/consoleFull)** for PR 13544 at commit

[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66182476 --- Diff: R/pkg/R/mllib.R --- @@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) { invisible(x) }

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13544 Merged build finished. Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13439 Merged build finished. Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60141/ Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13439 **[Test build #60141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60141/consoleFull)** for PR 13439 at commit

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60145/ Test PASSed.

[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13300 @pjfanning we are now focusing on bug fixes and stability fixes rather than adding new features.

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13540 Merged build finished. Test PASSed.

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13540 **[Test build #60145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60145/consoleFull)** for PR 13540 at commit

[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13545 For API design it would be better to be very conservative, because we cannot remove APIs. There is always value in adding something, but there is also a cost to maintenance and user experience (too

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60144/ Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13439 @viirya this is still a pretty major change for unclear benefits. There might be other more important things that need more eyes on...

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13542 **[Test build #60144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60144/consoleFull)** for PR 13542 at commit

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13542 Merged build finished. Test PASSed.

[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...

2016-06-07 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13545#discussion_r66181659 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2262,6 +2275,19 @@ class Dataset[T] private[sql]( def distinct(): Dataset[T]

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13439 Wouldn't this hurt performance even more due to the extra branch?

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13544 **[Test build #60147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60147/consoleFull)** for PR 13544 at commit

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13439 Merged build finished. Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60140/ Test PASSed.

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60143/ Test PASSed.

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13544 Merged build finished. Test PASSed.

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13439 **[Test build #60140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60140/consoleFull)** for PR 13439 at commit

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13544 **[Test build #60143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60143/consoleFull)** for PR 13544 at commit

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13543 **[Test build #60146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60146/consoleFull)** for PR 13543 at commit

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13540 **[Test build #60145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60145/consoleFull)** for PR 13540 at commit

[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13540 Thanks @BryanCutler @MechCoder @MLnick for the review. I just updated the PR to make it a property. Regarding the pyspark docs, I think there's an umbrella jira to parity scala mllib and pyspark

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13544 @rxin a small problem: in `HiveContext` there is a method `refreshTable` for refreshing the metadata of a Hive table. Now, using the new SparkSession API with Hive support, the method is

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13544 **[Test build #60143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60143/consoleFull)** for PR 13544 at commit

[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13542 **[Test build #60144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60144/consoleFull)** for PR 13542 at commit

[GitHub] spark pull request #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] upd...

2016-06-07 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/12938#discussion_r66177599 --- Diff: python/pyspark/ml/classification.py --- @@ -183,7 +191,7 @@ def getThresholds(self): If :py:attr:`thresholds` is set, return its value.

[GitHub] spark issue #13189: [SPARK-14670][SQL] allow updating driver side sql metric...

2016-06-07 Thread yhuai
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13189 Seems it is fine to not have metrics when we use hiveResultString.

[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66177097 --- Diff: R/pkg/R/mllib.R --- @@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) { invisible(x) }

[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12938 **[Test build #60139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60139/consoleFull)** for PR 12938 at commit

[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60139/ Test FAILed.

[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12938 Merged build finished. Test FAILed.

[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13526 A possible approach may be to just keep the function given by `mapValues`, and apply it before calling the function given by `mapGroups`. By doing this, we at least won't make the performance worse,
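
The deferred-function idea in this comment can be modelled roughly in plain Python. This is only a sketch under assumptions, not Spark's Scala implementation: `GroupedValues`, `map_values`, and `map_groups` are hypothetical names standing in for `KeyValueGroupedDataset`, `mapValues`, and `mapGroups`.

```python
from itertools import groupby


class GroupedValues:
    """Toy stand-in for a grouped dataset; not Spark's KeyValueGroupedDataset."""

    def __init__(self, rows, key_fn, value_fn=lambda row: row):
        self._rows = list(rows)
        self._key_fn = key_fn
        self._value_fn = value_fn  # deferred; nothing is applied yet

    def map_values(self, fn):
        # Only remember fn (composed with any earlier one); no data is touched.
        prev = self._value_fn
        return GroupedValues(self._rows, self._key_fn, lambda row: fn(prev(row)))

    def map_groups(self, fn):
        # Apply the stored value function right before handing values to fn.
        ordered = sorted(self._rows, key=self._key_fn)
        return [
            fn(key, [self._value_fn(row) for row in group])
            for key, group in groupby(ordered, key=self._key_fn)
        ]
```

For example, `GroupedValues([("a", 1), ("b", 2), ("a", 3)], key_fn=lambda r: r[0]).map_values(lambda r: r[1]).map_groups(lambda k, vs: (k, sum(vs)))` groups on the original rows and only extracts the values when the groups are consumed.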

[GitHub] spark issue #12824: [SPARK-15046] When running hive-thriftserver with yarn o...

2016-06-07 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/12824 @tgravescs the problem is this code in Client.scala: sparkConf.set(TOKEN_RENEWAL_INTERVAL, renewalInterval) That will write the value to the config with the `ms` suffix. I think
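
The comment is cut off, so the exact failure is not shown here. One plausible consequence, assumed purely for illustration, is that a reader expecting a bare number cannot parse a value that was written with the `ms` suffix. The helpers `parse_millis` and `naive_parse` below are hypothetical and are not part of Spark's configuration API.

```python
def parse_millis(raw):
    """Parse a duration in milliseconds, tolerating an optional `ms` suffix."""
    text = raw.strip()
    if text.endswith("ms"):
        text = text[: -len("ms")]
    return int(text)


def naive_parse(raw):
    # A reader that expects a bare number raises ValueError on a suffixed value.
    return int(raw)
```

Under this assumption, the writer and the reader of the setting have to agree on whether the unit suffix is part of the stored string.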

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 @rxin I've updated this to a simpler approach that doesn't introduce new classes. The main change is to check whether the current vector is constant and do the suitable data access. Please take a
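
A minimal sketch of the check described here, written in Python rather than the Java of the actual patch: `IntColumnVector` and its methods are hypothetical names, and the single-slot storage for constant vectors is an assumption based on the pull request title about reducing memory.

```python
class IntColumnVector:
    """Toy column of ints; a hypothetical stand-in, not Spark's ColumnVector."""

    def __init__(self, capacity, is_constant=False):
        self.capacity = capacity
        self.is_constant = is_constant  # fixed at construction, never flipped
        # Assumption: a constant vector needs storage for a single value only.
        self._data = [0] * (1 if is_constant else capacity)

    def _slot(self, row_id):
        if not 0 <= row_id < self.capacity:
            raise IndexError(row_id)
        # The extra branch discussed in the thread: every access checks the flag.
        return 0 if self.is_constant else row_id

    def put_int(self, row_id, value):
        self._data[self._slot(row_id)] = value

    def get_int(self, row_id):
        return self._data[self._slot(row_id)]

    def allocated_slots(self):
        return len(self._data)
```

The trade-off raised in the thread is visible in `_slot`: the constant case saves storage, but every read and write pays for one more conditional.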

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13543 **[Test build #60142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60142/consoleFull)** for PR 13543 at commit

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60142/ Test FAILed.

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13543 Merged build finished. Test FAILed.

[GitHub] spark issue #13534: [SPARK-15789][SQL] Allow reserved keywords in most place...

2016-06-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13534 LGTM, merging to master and 2.0, thanks!

[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13543 **[Test build #60142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60142/consoleFull)** for PR 13543 at commit

[GitHub] spark issue #13189: [SPARK-14670][SQL] allow updating driver side sql metric...

2016-06-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13189 `QueryExecution.hiveResultString` will call `SparkPlan.executeCollect` without setting an execution id. This method is only used in tests; should we just stop reporting metrics for this case, or

[GitHub] spark pull request #13439: [SPARK-15701][SQL] Constant ColumnVector only nee...

2016-06-07 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13439#discussion_r66174085 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -70,26 +71,106 @@ public long nullsNativeAddress()

[GitHub] spark pull request #13534: [SPARK-15789][SQL] Allow reserved keywords in mos...

2016-06-07 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13534

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13439 **[Test build #60141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60141/consoleFull)** for PR 13439 at commit

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 The latest benchmark is run individually for each type of column vector. As stated in `ColumnarBatchBenchmark`, it is hard to reason about the JIT. If we put these 4 cases together to run the benchmark,

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13439 **[Test build #60140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60140/consoleFull)** for PR 13439 at commit

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13439 Benchmark again on new change: Environment: Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

[GitHub] spark issue #13495: [SPARK-15751][MLLIB][PYSPARK] Add generateAssociationRul...

2016-06-07 Thread zjffdu
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/13495 \cc @yanboliang

[GitHub] spark issue #13530: [SPARK-14279][BUILD] Pick the spark version from pom

2016-06-07 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13530 @dhruve could you close the PR? The bot doesn't do it automatically for backports. thx

[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-06-07 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/9207 So I guess I'm wondering what our plans for PMML look like - I'm happy to update this or go in the direction @MLnick suggested if that's what we want?

[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12938 **[Test build #60139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60139/consoleFull)** for PR 12938 at commit

[GitHub] spark pull request #13335: [SPARK-15580][SQL]Add ContinuousQueryInfo to make...

2016-06-07 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13335
