date:20160607

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
So would it be more practical to benchmark the case in which some constant and 
some non-constant column vectors are used together, and compare it with the 
original case in which no column has this extra branch (i.e., without this 
path)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
I see. My question is this: suppose we create two column vectors, one constant 
and one not. Because we do not re-use column vectors, each vector's constant 
flag is fixed and never changes. Since they are two different instances, will 
the problem you describe still happen? That is, if `getInt` of the first 
(constant) vector is called and then `getInt` of the second (non-constant) 
vector is called, will performance degrade?



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13439
  
What I meant is that if in one process you have some invocation of the 
function that would hit the true branch, and some other invocation of the 
function that would hit the false branch, the performance is going to be worse. 
Google "branch prediction" for more information.

Basically you can't measure the overhead of an extra branch in practice by 
running a benchmark in which the flag is either always false or always true.
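
A minimal sketch of the three access patterns under discussion, in plain Python. `IntVector`, `get_int`, and `sum_columns` are hypothetical names, not Spark's `ColumnVector` API. Python's interpreter overhead hides CPU branch prediction, so the timings only illustrate the shape of the workloads; an actual measurement would need the same three workloads in a JVM benchmark.

```python
import timeit


class IntVector:
    """Hypothetical stand-in for a column vector with a 'constant' flag."""

    def __init__(self, values, is_constant=False):
        self.is_constant = is_constant
        # A constant vector stores a single value for every row.
        self.values = values[:1] if is_constant else values

    def get_int(self, row_id):
        # The extra branch under discussion: every call checks the flag.
        if self.is_constant:
            return self.values[0]
        return self.values[row_id]


def sum_columns(vectors, num_rows):
    """Read every row of every vector through the same get_int code path."""
    total = 0
    for row_id in range(num_rows):
        for vector in vectors:
            total += vector.get_int(row_id)
    return total


NUM_ROWS = 1000
plain = IntVector(list(range(NUM_ROWS)))
const = IntVector([7], is_constant=True)

workloads = {
    "all_plain": [plain, plain],      # flag is always false
    "all_constant": [const, const],   # flag is always true
    "mixed": [const, plain],          # flag alternates between calls
}

if __name__ == "__main__":
    for name, vectors in workloads.items():
        seconds = timeit.timeit(lambda: sum_columns(vectors, NUM_ROWS), number=20)
        print(f"{name}: {seconds:.4f}s")
```

The point of the `mixed` workload is that the same `get_int` branch sees both outcomes within one run, which is the case an always-true or always-false benchmark never exercises.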



[GitHub] spark pull request #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread squito

Github user squito closed the pull request at:

https://github.com/apache/spark/pull/13548



[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13548
  
**[Test build #60150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60150/consoleFull)**
 for PR 13548 at commit 
[`41b7b79`](https://github.com/apache/spark/commit/41b7b79b366aa3ebbd5e7796e0d3f703250e51cf).



[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13548
  
**[Test build #3069 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3069/consoleFull)**
 for PR 13548 at commit 
[`41b7b79`](https://github.com/apache/spark/commit/41b7b79b366aa3ebbd5e7796e0d3f703250e51cf).



[GitHub] spark pull request #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread squito

GitHub user squito reopened a pull request:

https://github.com/apache/spark/pull/13548

[DO NOT MERGE] lots of blacklist testing

making jenkins run the scheduler tests a lot

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark blacklist_extra_tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13548.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13548


commit 0ee2f1fe487e4f7defb7a4bc53ab3d69d16c9173
Author: Imran Rashid 
Date:   2016-06-06T13:26:34Z

increase test timeouts

commit 270a038a20d8f1e2604636f00498fc4dcacc178a
Author: Imran Rashid 
Date:   2016-06-06T14:40:54Z

for delay scheduling to work, the mock backend has to periodically revive 
all offers

commit 4dc8711993c69fd852da92597473b6852eaa2e21
Author: Imran Rashid 
Date:   2016-06-06T14:51:37Z

cleanup state before notifying job waiter; stop things to clean up a bunch 
of threads

commit 7f4e9eb41e3276e4e91f8f262b4e3e25a28e8e7c
Author: Imran Rashid 
Date:   2016-06-07T22:14:13Z

repeat tests a lot to check for flakiness

commit f562c6658efca4c2fc505e4ea906eb78a3901a0d
Author: Imran Rashid 
Date:   2016-06-07T22:23:22Z

Merge branch 'master' into blacklist_extra_tests

commit 5bc48f23324a754e695535e036cf3759c0dfb040
Author: Imran Rashid 
Date:   2016-06-07T22:23:39Z

Revert "[SPARK-15783][CORE] still some flakiness in these blacklist tests 
so ignore for now"

This reverts commit 36d3dfa59a1ec0af6118e0667b80e9b7628e2cb6.

commit 41b7b79b366aa3ebbd5e7796e0d3f703250e51cf
Author: Imran Rashid 
Date:   2016-06-08T05:25:56Z

tone it down a bit





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-07 Thread NarineK

Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Thank you for the quick responses, @sun-rui and @shivaram.

Here is how the `dataframe.queryExecution.toString` printout starts:

== Parsed Logical Plan ==
'SerializeFromObject [if (assertnotnull(input[0, org.apache.spark.sql.Row, 
true], top level row object).isNullAt) null else 
getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top 
level row object), 0, a, IntegerType) AS a#13651, if (assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object).isNullAt) null else 
getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top 
level row object), 1, avg, DoubleType) AS avg#13652]
+- 'MapPartitionsInR [88, 10, 0, 0, 0, 2, 0, 3, 2, 3, 0, 2, 3, 0, 0, 0, 6, 
3, 0, 0, 4, 2, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 6, 115, 114, 99, 114, 101, 102, 
0, 0, 3, 13, 0, 0, 0, 8, 0, 0, 8, 84, 0, 0, 0, 5, 0, 0, 8, 86, 0, 0, 0, 5, 0, 
0, 0, 5, 0, 0, 0, 5, 0, 0, 8, 84, 0, 0, 8, 86, 0, 0, 4, 2, 0, 0, 0, 1, 0, 4, 0, 
9, 0, 0, 0, 7, 115, 114, 99, 102, 105, 108, 101, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 
0, -14, 0, 0, 0, -2, 0, 0, 0, 19, 0, 0, 0, 29, 0, 0, 0, -2, 0, 0, 0, -2, 0, 0, 
0, -2, 0, 0, 4 .  

It is very possible that the large array is the serialized R function.
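
One way to check that guess is to decode the leading bytes of the array. The sketch below copies the first bytes from the printout above; the helper `printable_runs` is a hypothetical name. It relies on the assumption that the array is in R's XDR serialization format, where the stream starts with `X\n` followed by big-endian version integers.

```python
import struct

# First bytes of the array printed in the plan above (copied from the message).
prefix = [88, 10, 0, 0, 0, 2, 0, 3, 2, 3, 0, 2, 3, 0, 0, 0, 6, 3, 0, 0,
          4, 2, 0, 0, 0, 1, 0, 4, 0, 9, 0, 0, 0, 6,
          115, 114, 99, 114, 101, 102, 0, 0, 3, 13]


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long."""
    runs, current = [], []
    for b in data:
        if 32 <= b < 127:
            current.append(chr(b))
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


header = bytes(prefix[:2])                                   # format marker
format_version = struct.unpack(">i", bytes(prefix[2:6]))[0]  # big-endian int
writer_version = tuple(prefix[7:10])                         # major, minor, patch
strings = printable_runs(prefix)

print(header, format_version, writer_version, strings)
```

The bytes decode to the marker `X\n`, format version 2, a writer version of 3.2.3, and the string `srcref`. The later run `115, 114, 99, 102, 105, 108, 101` in the printout spells `srcfile` in the same way. Both are attribute names R attaches to functions that keep their source, which is consistent with the array being a serialized R function.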




[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13413
  
**[Test build #60149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60149/consoleFull)**
 for PR 13413 at commit 
[`5145e53`](https://github.com/apache/spark/commit/5145e533b9d722e1597fd820e2ada776314738ae).



[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-07 Thread techaddict

Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13413
  
@maropu Thanks for the review, addressed all the comments



[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-07 Thread techaddict

Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/13413#discussion_r66192955
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1481,17 +1481,7 @@ def test_list_functions(self):
         spark.sql("CREATE DATABASE some_db")
         functions = dict((f.name, f) for f in spark.catalog.listFunctions())
         functionsDefault = dict((f.name, f) for f in spark.catalog.listFunctions("default"))
-        self.assertTrue(len(functions) > 200)
-        self.assertTrue("+" in functions)
-        self.assertTrue("like" in functions)
-        self.assertTrue("month" in functions)
-        self.assertTrue("to_unix_timestamp" in functions)
-        self.assertTrue("current_database" in functions)
-        self.assertEquals(functions["+"], Function(
-            name="+",
-            description=None,
-            className="org.apache.spark.sql.catalyst.expressions.Add",
-            isTemporary=True))
+        self.assertEquals(len(functions), 0)
--- End diff --

These are already tested below.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
Besides, I wrote this test following the other tests in 
`ColumnarBatchBenchmark` that benchmark on-heap and off-heap column vector 
access. I thought it might be enough. If not, what else needs to be tested?



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
Hmm, but once the flag is set, I think it will not change?



[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread koertkuipers

Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13526
  
could we "rewind"/undo the append for the key and change it to a map that 
inserts new values and key? so remove one append and replace it with another 
operation?



[GitHub] spark pull request #13547: Update KafkaWordCount.scala

2016-06-07 Thread ShreyasFadnavis

Github user ShreyasFadnavis closed the pull request at:

https://github.com/apache/spark/pull/13547



[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression

2016-06-07 Thread kamalcoursera

Github user kamalcoursera commented on the issue:

https://github.com/apache/spark/pull/10706
  
Hi Davies,

Could you please shed more light on the status of correlated but non-scalar 
subqueries in the Spark 2.0 release? I would appreciate it if you could 
summarize any other restrictions.


**Query:**

Select
  runon as runon,
  case
    when key in (Select key from sqltesttable where group = 'vowels') then 'vowels'
    else 'consonants'
  end as group,
  key as key,
  someint as someint
from sqltesttable;

**Error:**

Error in SQL statement: AnalysisException: Predicate sub-queries can only 
be used in a Filter: Project [runon#4031 AS runon#4026,CASE WHEN 
predicate-subquery#4027 [(key#4033 = key#4037)] THEN vowels ELSE consonants END 
AS group#4028,key#4033 AS key#4029,someint#4034 AS someint#4030]
:  +- SubqueryAlias predicate-subquery#4027 [(key#4033 = key#4037)]
: +- Project [key#4037]
:+- Filter (group#4036 = vowels)
:   +- MetastoreRelation default, sqltesttable, None
+- MetastoreRelation default, sqltesttable, None
;



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13439
  
I am not sure if you are really testing it correctly -- your benchmark is 
most likely just testing how well the CPU does branch prediction when the 
flag is always true or false.




[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread koertkuipers

Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/13526
  
the tricky part with that is that (ds: Dataset[(K,
V)]).groupBy(_._1).mapValues(_._2) should return a
KeyValueGroupedDataset[K, V]

On Tue, Jun 7, 2016 at 8:22 PM, Wenchen Fan wrote:

> A possible approach may be to just keep the function given by mapValues, and
> apply it before calling the function given by mapGroups. By doing this,
> we at least won't make the performance worse, as the underlying plan
> doesn't change.




[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13549
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13549
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60148/
Test PASSed.



[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13549
  
**[Test build #60148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60148/consoleFull)**
 for PR 13549 at commit 
[`a287a9a`](https://github.com/apache/spark/commit/a287a9a6955b58609722947b3480bc5578d0b37d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13552: [SPARK-15813] Use past tense for the cancel container re...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13552
  
Can one of the admins verify this patch?



[GitHub] spark pull request #13552: [SPARK-15813] Use past tense for the cancel conta...

2016-06-07 Thread peterableda

GitHub user peterableda opened a pull request:

https://github.com/apache/spark/pull/13552

[SPARK-15813] Use past tense for the cancel container request message

## What changes were proposed in this pull request?
Use past tense for the cancel container request message, as it is logged 
after the newly updated `Driver requested a total number of $requestedTotal 
executor(s)` message.

## How was this patch tested?
This is a trivial change




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peterableda/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13552


commit 4472e2d6db3f8496ba4164a5dd8380665fded135
Author: Peter Ableda 
Date:   2016-06-08T03:31:58Z

Use past tense for the cancel container request message





[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13543
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13543
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60146/
Test PASSed.



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13543
  
**[Test build #60146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60146/consoleFull)**
 for PR 13543 at commit 
[`adcaaab`](https://github.com/apache/spark/commit/adcaaab52f711e9c9b0ad2f7fe7db374c7399064).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13550
  
@marymwu this has been fixed in 
https://github.com/apache/spark/commit/09b3c56c91831b3e8d909521b8f3ffbce4eb0395.

Could you close this PR?



[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...

2016-06-07 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
What do you think about `dropDuplicates`?

1. ds.select("_1", "_2", "_3").dropDuplicates(Seq("_1", 
"_2")).orderBy("_1", "_2").show()
2. ds.select("_1", "_2", "_3").dropDuplicates("_1", "_2").orderBy("_1", 
"_2").show()

I think the second is more consistent with the others, `select` and 
`orderBy`.
Do you dislike this one too?
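
The two call styles above can coexist behind one entry point. The sketch below is a plain Python illustration over a list of dicts, not Spark's `Dataset` API; `drop_duplicates` is a hypothetical helper that accepts either a single sequence of column names or the names as separate arguments.

```python
def drop_duplicates(rows, *cols):
    """Keep the first row for each distinct combination of the given columns.

    Accepts drop_duplicates(rows, ["_1", "_2"]) as well as
    drop_duplicates(rows, "_1", "_2"). With no columns, whole rows are compared.
    """
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = tuple(cols[0])
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in cols) if cols else tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"_1": 1, "_2": "a", "_3": 10},
    {"_1": 1, "_2": "a", "_3": 20},
    {"_1": 2, "_2": "b", "_3": 30},
]
by_seq = drop_duplicates(rows, ["_1", "_2"])    # style 1: one sequence argument
by_varargs = drop_duplicates(rows, "_1", "_2")  # style 2: varargs
print(by_seq == by_varargs)
```

Both calls keep the same rows, so the varargs form is purely a convenience at the call site.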



[GitHub] spark pull request #13551: merge original repository

2016-06-07 Thread AllenShi

Github user AllenShi closed the pull request at:

https://github.com/apache/spark/pull/13551



[GitHub] spark pull request #13551: merge original repository

2016-06-07 Thread AllenShi

GitHub user AllenShi opened a pull request:

https://github.com/apache/spark/pull/13551

merge original repository

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/AllenShi/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13551


commit 1181b6df4798690a7049218e41f54cfdb4027bfb
Author: Allen Shi 
Date:   2015-05-22T13:38:56Z

Merge pull request #1 from apache/master

merge author's change

commit 1192d8d7b41b5e54de3fa2a51a5000a7bb3f82a7
Author: Allen Shi 
Date:   2015-06-25T17:23:19Z

Merge pull request #2 from apache/master

merge remote repo into my fork

commit d4f3fe50dded40ffa8a30f2b1d4c4566e0dceaaf
Author: Allen Shi 
Date:   2015-07-04T15:33:36Z

Merge pull request #3 from apache/master

pull to local





[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13548
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13548
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60138/
Test FAILed.



[GitHub] spark issue #13548: [DO NOT MERGE] lots of blacklist testing

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13548
  
**[Test build #60138 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60138/consoleFull)**
 for PR 13548 at commit 
[`5bc48f2`](https://github.com/apache/spark/commit/5bc48f23324a754e695535e036cf3759c0dfb040).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13550
  
It would be nicer if this PR followed 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and 
included a test.



[GitHub] spark issue #13550: SPARK-15755

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13550
  
Can one of the admins verify this patch?



[GitHub] spark issue #13371: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13371
  
cc @rxin Can you also take a look at this? It has been waiting for a while 
too. Thanks!



[GitHub] spark pull request #13550: SPARK-15755

2016-06-07 Thread marymwu

GitHub user marymwu opened a pull request:

https://github.com/apache/spark/pull/13550

SPARK-15755

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-15755

java.lang.NullPointerException when running Spark 2.0 with 
spark.serializer=org.apache.spark.serializer.KryoSerializer

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marymwu/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13550


commit a2f43c2f59b461a37947a5696198a4aa7339579d
Author: Dongyang DY2 Tang 
Date:   2016-06-08T01:37:13Z

fix bug: java.lang.NullPointerException when run spark 2.0 setting 
spark.serializer=org.apache.spark.serializer.KryoSerializer





[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
@rxin hmm, I just think that if we can improve it by just adding a conditional 
check, it might be worth doing.

As for the performance hit, this is the benchmark for on-heap and off-heap 
column vectors before this patch:

On Heap:

ColumnVector R/W:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
On Heap                     39 /   47          1.1          946.8        1.0X

ColumnVector R/W:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
On Heap                     41 /   46          1.0          995.5        1.0X

Off Heap:

ColumnVector R/W:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
Off Heap                    65 /   75          0.6         1598.2        1.0X

ColumnVector R/W:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
Off Heap                    63 /   74          0.7         1532.5        1.0X

It looks like the performance is not noticeably hurt.

But if you still have concerns about this, we can close this.
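
As an illustration of the branch being discussed, here is a minimal Java sketch of a column vector with a fixed constant flag. It is not Spark's `ColumnVector`: the class `ConstantFlagVectorSketch` and its methods `constant`, `of`, and `storedValues` are hypothetical names, and it assumes the constant case stores a single value that every row reads back.

```java
// Hypothetical sketch, not Spark's ColumnVector: a vector whose constant
// flag is fixed at construction time and checked on every read.
final class ConstantFlagVectorSketch {
    private final int[] data;          // a single element when the vector is constant
    private final boolean isConstant;  // never changes after construction

    private ConstantFlagVectorSketch(int[] data, boolean isConstant) {
        this.data = data;
        this.isConstant = isConstant;
    }

    // Assumed factory: one stored value stands in for every row.
    static ConstantFlagVectorSketch constant(int value) {
        return new ConstantFlagVectorSketch(new int[] {value}, true);
    }

    // Assumed factory: one stored value per row.
    static ConstantFlagVectorSketch of(int... values) {
        return new ConstantFlagVectorSketch(values.clone(), false);
    }

    // The extra branch under discussion: constant vectors ignore rowId.
    int getInt(int rowId) {
        return isConstant ? data[0] : data[rowId];
    }

    // Number of values actually held in memory.
    int storedValues() {
        return data.length;
    }

    public static void main(String[] args) {
        ConstantFlagVectorSketch c = constant(7);
        ConstantFlagVectorSketch v = of(1, 2, 3);
        long sum = 0;
        // Read a constant and a non-constant vector in the same loop.
        for (int row = 0; row < 3; row++) {
            sum += c.getInt(row) + v.getInt(row);
        }
        System.out.println("sum=" + sum + ", stored in constant vector=" + c.storedValues());
    }
}
```

Because each instance's flag never changes after construction, the branch in `getInt` always goes the same way for a given vector. Whether that still holds up when constant and non-constant vectors are read in the same loop is what a benchmark would need to measure.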





[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-07 Thread zhonghaihua

Github user zhonghaihua commented on the issue:

https://github.com/apache/spark/pull/12258
  
@vanzin my JIRA username is `iward`. Thanks a lot.



[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13549
  
**[Test build #60148 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60148/consoleFull)**
 for PR 13549 at commit 
[`a287a9a`](https://github.com/apache/spark/commit/a287a9a6955b58609722947b3480bc5578d0b37d).



[GitHub] spark issue #13549: Added support for sorting after streaming aggregation wi...

2016-06-07 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/13549
  
@marmbrus 



[GitHub] spark pull request #13549: Added support for sorting after streaming aggrega...

2016-06-07 Thread tdas

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/13549#discussion_r66182722
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala ---
@@ -123,27 +159,6 @@ object UnsupportedOperationChecker {
 case _ =>
   }
 }
-
-// Checks related to aggregations
-val aggregates = plan.collect { case a @ Aggregate(_, _, _) if a.isStreaming => a }
--- End diff --

This is moved above to make sure that output-mode-related failures occur 
before other failures, as those failures are more fundamental.



[GitHub] spark pull request #13549: Added support for sorting after streaming aggrega...

2016-06-07 Thread tdas

GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/13549

Added support for sorting after streaming aggregation with complete mode

## What changes were proposed in this pull request?

When the output mode is complete, the output of a streaming aggregation 
contains the complete aggregates every time. Within an incremental execution, 
this is no different from a batch dataset, so other non-streaming operations 
should be supported on it. In this PR, I am just adding support for sorting, 
as it is commonly useful functionality. Support for other operations will 
come later.

## How was this patch tested?
Additional unit tests.
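
The reasoning in the description above can be sketched in plain Java without Spark. This is only a simulation under stated assumptions: `CompleteModeSketch` and `processBatch` are hypothetical names, a running count per key stands in for the streaming aggregation, and each call plays the role of one trigger. Because every trigger has the complete aggregates available, sorting them is an ordinary batch sort.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a streaming count emitted in complete mode.
final class CompleteModeSketch {
    // Aggregation state carried across micro-batches.
    private final Map<String, Long> counts = new HashMap<>();

    // Folds one micro-batch into the running counts, then returns the
    // complete result sorted by count (descending) and key (ascending).
    List<Map.Entry<String, Long>> processBatch(List<String> batch) {
        for (String key : batch) {
            counts.merge(key, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> complete = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            complete.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        // The full aggregate table is present, so this is a plain batch sort.
        complete.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return complete;
    }

    public static void main(String[] args) {
        CompleteModeSketch query = new CompleteModeSketch();
        System.out.println(query.processBatch(Arrays.asList("a", "b", "a")));
        System.out.println(query.processBatch(Arrays.asList("b", "b")));
    }
}
```

An output mode that emitted only new or changed rows would not have the whole table at hand on each trigger, which is why this sketch only models the complete case.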

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-15812

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13549.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13549


commit a287a9a6955b58609722947b3480bc5578d0b37d
Author: Tathagata Das 
Date:   2016-06-08T01:51:38Z

Added support for sorting after streaming aggregation with complete mode





[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13544
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60147/
Test PASSed.



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13544
  
**[Test build #60147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60147/consoleFull)**
 for PR 13544 at commit 
[`e86119e`](https://github.com/apache/spark/commit/e86119e5ea59d1cafc4f3fc1884b1ae5044adf8f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66182476
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

I think people have been trying to be consistent within the given R source 
file, but I totally agree that there are many different forms in use and it 
would be nice to have a single format for all R sources.








[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13544
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13439
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60141/
Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13439
  
**[Test build #60141 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60141/consoleFull)**
 for PR 13439 at commit 
[`2226efc`](https://github.com/apache/spark/commit/2226efca5172e67a09f0972ef5ba110f7abce800).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13540
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60145/
Test PASSed.



[GitHub] spark issue #13300: [SPARK-15463][SQL] support creating dataframe out of Dat...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13300
  
@pjfanning we are now focusing on bug fixes and stability fixes rather than 
adding new features. 



[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13540
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13540
  
**[Test build #60145 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60145/consoleFull)**
 for PR 13540 at commit 
[`d1c00da`](https://github.com/apache/spark/commit/d1c00da30601ca70f9f6385d532b5834aedb3182).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13545
  
For API design it would be better to be very conservative, because we 
cannot remove APIs. There is always value in adding something, but there is 
also a cost to maintenance and user experience (too many methods showing up).




[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13542
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60144/
Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13439
  
@viirya this is still a pretty major change for unclear benefits. There 
might be other more important things that need more eyes on...




[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13542
  
**[Test build #60144 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60144/consoleFull)**
 for PR 13542 at commit 
[`1fc8a30`](https://github.com/apache/spark/commit/1fc8a309c697a121ac076c58e8106a825187b926).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13542
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...

2016-06-07 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13545#discussion_r66181659
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2262,6 +2275,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Returns a new [[Dataset]] that contains only the unique rows from this [[Dataset]], considering
+   * only the subset of columns. This is an alias for `dropDuplicates(cols)`.
+   *
+   * Note that, equality checking is performed directly on the encoded representation of the data
+   * and thus is not affected by a custom `equals` function defined on `T`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def distinct(cols: String*): Dataset[T] = dropDuplicates(cols)
--- End diff --

let's not have this.
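
For context, the following plain-Java sketch shows what dropping duplicates on a subset of columns means and why the proposed `distinct(cols)` would be a pure alias. It does not use Spark: rows are modeled as maps, and `DropDuplicatesSketch`, `row`, `dropDuplicates`, and `distinct` are hypothetical helpers. Keeping the first row seen for each key is an assumption of the sketch, not a statement about Spark's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of de-duplication on a subset of columns.
final class DropDuplicatesSketch {
    // Builds a row from alternating column names and values.
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put((String) keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    // Keeps the first row seen for each distinct combination of the given columns.
    static List<Map<String, Object>> dropDuplicates(List<Map<String, Object>> rows, String... cols) {
        Set<List<Object>> seen = new HashSet<>();
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            List<Object> key = new ArrayList<>();
            for (String c : cols) {
                key.add(r.get(c));
            }
            if (seen.add(key)) {
                out.add(r);
            }
        }
        return out;
    }

    // The alias form: a second name that adds no behavior of its own.
    static List<Map<String, Object>> distinct(List<Map<String, Object>> rows, String... cols) {
        return dropDuplicates(rows, cols);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
            row("name", "a", "age", 1),
            row("name", "a", "age", 2),
            row("name", "b", "age", 1));
        System.out.println(dropDuplicates(rows, "name"));
        System.out.println(distinct(rows, "name", "age"));
    }
}
```

Since the alias only forwards its arguments, it adds an entry to the public surface without adding capability, which is the trade-off the review comment weighs.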



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13439
  
Wouldn't this hurt performance even more due to the extra branch?




[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13544
  
**[Test build #60147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60147/consoleFull)**
 for PR 13544 at commit 
[`e86119e`](https://github.com/apache/spark/commit/e86119e5ea59d1cafc4f3fc1884b1ae5044adf8f).



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13439
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60140/
Test PASSed.



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13544
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60143/
Test PASSed.



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13544
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13439
  
**[Test build #60140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60140/consoleFull)**
 for PR 13439 at commit 
[`07ef523`](https://github.com/apache/spark/commit/07ef523af03809837d1b73c3c8db56504f244fab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13544
  
**[Test build #60143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60143/consoleFull)**
 for PR 13544 at commit 
[`20bff4b`](https://github.com/apache/spark/commit/20bff4b6ae551d4a367dcc57e56bb1df2fc63bad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13543
  
**[Test build #60146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60146/consoleFull)**
 for PR 13543 at commit 
[`adcaaab`](https://github.com/apache/spark/commit/adcaaab52f711e9c9b0ad2f7fe7db374c7399064).



[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13540
  
**[Test build #60145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60145/consoleFull)**
 for PR 13540 at commit 
[`d1c00da`](https://github.com/apache/spark/commit/d1c00da30601ca70f9f6385d532b5834aedb3182).



[GitHub] spark issue #13540: [SPARK-15788][PYSPARK][ML] PySpark IDFModel missing "idf...

2016-06-07 Thread zjffdu

Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13540
  
Thanks @BryanCutler @MechCoder @MLnick for the review. I just updated the PR 
to make it a property. Regarding the PySpark docs, I think there's an umbrella 
JIRA for parity between Scala MLlib and PySpark MLlib; we can create a sub-task there. 



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13544
  
@rxin 
A small question:
`HiveContext` has a `refreshTable` method for refreshing the metadata 
of a Hive table.
With the new SparkSession API with Hive support, that method has been removed.
Does this mean users of the new SparkSession API no longer need to refresh the 
metadata of Hive tables?



[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13544
  
**[Test build #60143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60143/consoleFull)**
 for PR 13544 at commit 
[`20bff4b`](https://github.com/apache/spark/commit/20bff4b6ae551d4a367dcc57e56bb1df2fc63bad).



[GitHub] spark issue #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in the sp...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13542
  
**[Test build #60144 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60144/consoleFull)**
 for PR 13542 at commit 
[`1fc8a30`](https://github.com/apache/spark/commit/1fc8a309c697a121ac076c58e8106a825187b926).



[GitHub] spark pull request #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] upd...

2016-06-07 Thread MLnick

Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/12938#discussion_r66177599
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -183,7 +191,7 @@ def getThresholds(self):
 If :py:attr:`thresholds` is set, return its value.
 Otherwise, if :py:attr:`threshold` is set, return the equivalent 
thresholds for binary
 classification: (1-threshold, threshold).
-If neither are set, throw an error.
--- End diff --

ok - we can revisit a change to behaviour in a separate PR
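
For reference, the lookup order described in the docstring under discussion can be sketched in plain Python. This is a minimal stand-in and not the PySpark implementation: the function name `resolve_thresholds` and its keyword arguments are hypothetical, and raising an error when neither value is set simply follows the wording of the docstring line shown in the diff, which is the behaviour that may be revisited separately.

```python
def resolve_thresholds(thresholds=None, threshold=None):
    """Hypothetical helper mirroring the docstring's lookup order."""
    # 1. An explicitly set `thresholds` wins.
    if thresholds is not None:
        return list(thresholds)
    # 2. Otherwise derive the binary-classification pair from `threshold`.
    if threshold is not None:
        return [1.0 - threshold, threshold]
    # 3. Assumption: neither set is treated as an error, as the docstring says.
    raise ValueError("neither thresholds nor threshold is set")
```

With these assumptions, `resolve_thresholds(threshold=0.25)` yields the pair `(1 - threshold, threshold)`, while an explicit `thresholds` list is returned unchanged.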



[GitHub] spark issue #13189: [SPARK-14670][SQL] allow updating driver side sql metric...

2016-06-07 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13189
  
Seems it is fine to not have metrics when we use hiveResultString.



[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66177097
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,11 +197,10 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
-#'
 #' Makes predictions from a generalized linear model produced by glm() or 
spark.glm(),
 #' similarly to R's predict().
 #'
+#' @title predict
--- End diff --

Shouldn't we follow a single convention for defining the title, either 
using ```@title``` or using the first sentence in the description?  @shivaram 
do you have a preference?  Looking at the Github history, I see lots of SparkR 
contributors do both.



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12938
  
**[Test build #60139 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60139/consoleFull)**
 for PR 12938 at commit 
[`7b634b6`](https://github.com/apache/spark/commit/7b634b63be315a5f5830b0a4190a42567e6d9c92).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  implicit class AttributeSeq(val attrs: Seq[Attribute]) extends 
Serializable `



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60139/
Test FAILed.



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12938
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13526: [SPARK-15780][SQL] Support mapValues on KeyValueGroupedD...

2016-06-07 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13526
  
A possible approach may be to just keep the function given by `mapValues` and 
apply it before calling the function given by `mapGroups`. By doing this, we at 
least won't make the performance worse, as the underlying plan doesn't change.
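
A rough sketch of that idea, written in plain Python rather than against the Dataset API. Everything here is hypothetical (the `GroupedSketch` class and its `map_values` / `map_groups` methods are illustrative names only); it just shows the value function being remembered and composed, then applied right before the group function runs, while the grouping itself is left untouched.

```python
class GroupedSketch:
    """Hypothetical stand-in for a key/value grouped dataset (not Spark code)."""

    def __init__(self, groups, value_fn=lambda v: v):
        self._groups = groups        # dict: key -> list of raw values
        self._value_fn = value_fn    # deferred function collected by map_values

    def map_values(self, fn):
        # Only remember the function; the underlying grouping does not change.
        prev = self._value_fn
        return GroupedSketch(self._groups, lambda v: fn(prev(v)))

    def map_groups(self, fn):
        # Apply the stored value function lazily, just before the user function.
        return [fn(key, (self._value_fn(v) for v in values))
                for key, values in self._groups.items()]


grouped = GroupedSketch({"a": [1, 2], "b": [3]})
result = grouped.map_values(lambda v: v * 10).map_groups(
    lambda key, values: (key, sum(values)))
```

Because `map_values` returns a new wrapper around the same groups, chaining several calls only grows the composed function, which is the sense in which the plan stays the same.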



[GitHub] spark issue #12824: [SPARK-15046] When running hive-thriftserver with yarn o...

2016-06-07 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/12824
  
@tgravescs the problem is this code in Client.scala:

sparkConf.set(TOKEN_RENEWAL_INTERVAL, renewalInterval)

That will write the value to the config with the `ms` suffix. I think it 
would be better to move the `TOKEN_RENEWAL_INTERVAL` definition to the core 
module's config list, so that the code that reads it can also just use the 
constant instead of using its own parsing code. It shouldn't need a default 
either, since that config is set when it's needed (which is why the config entry 
is defined as optional).
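
To illustrate the suggestion, here is a small Python sketch of a shared, optional config entry. It is not Spark's config API: the `OptionalTimeEntry` class, its methods, and the key string are all made up for the example. The point is only that the writer and the reader go through the same definition, so the `ms` suffix is handled in one place and no default is required.

```python
class OptionalTimeEntry:
    """Hypothetical typed config entry holding a duration in milliseconds."""

    def __init__(self, key):
        self.key = key

    def write(self, conf, millis):
        # Stored as a string with an explicit unit suffix, e.g. "86400000ms".
        conf[self.key] = f"{millis}ms"

    def read(self, conf):
        # Optional entry: there is no default, so an unset key yields None.
        raw = conf.get(self.key)
        if raw is None:
            return None
        if not raw.endswith("ms"):
            raise ValueError(f"expected a value in ms, got {raw!r}")
        return int(raw[:-2])


# The key string below is a placeholder, not the real Spark config name.
TOKEN_RENEWAL_INTERVAL = OptionalTimeEntry("example.token.renewal.interval")

conf = {}
unset = TOKEN_RENEWAL_INTERVAL.read(conf)      # nothing written yet
TOKEN_RENEWAL_INTERVAL.write(conf, 86400000)
interval = TOKEN_RENEWAL_INTERVAL.read(conf)   # parsed back through the same entry
```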



[GitHub] spark issue #13439: [SPARK-15701][SQL] Modify ColumnVector to reduce memory ...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
@rxin I've updated this to a simpler approach that doesn't introduce new 
classes. The main change is to check whether the current vector is constant 
and access the data accordingly. Please take a look. Thanks.
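
The shape of that check can be sketched as follows. This is an illustrative Python model, not the Java `ColumnVector` code from the PR: the class `IntColumnSketch` and its method names are hypothetical. It assumes a constant column keeps a single backing slot and that every accessor branches on a fixed `is_constant` flag.

```python
class IntColumnSketch:
    """Hypothetical int column; not the Spark ColumnVector class."""

    def __init__(self, capacity, is_constant=False):
        self.capacity = capacity
        self.is_constant = is_constant
        # A constant column stores one slot regardless of capacity.
        self._data = [0] * (1 if is_constant else capacity)

    def put_int(self, row_id, value):
        self._data[0 if self.is_constant else row_id] = value

    def get_int(self, row_id):
        # The extra branch: constant columns always read slot 0.
        return self._data[0 if self.is_constant else row_id]

    def slots(self):
        return len(self._data)


constant = IntColumnSketch(4096, is_constant=True)
constant.put_int(0, 7)

regular = IntColumnSketch(4096)
regular.put_int(10, 3)
```

Here `constant.get_int(1234)` reads the single stored value, while `regular.get_int(10)` indexes into the full array; the memory saving comes from `slots()` being 1 instead of the capacity.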



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13543
  
**[Test build #60142 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60142/consoleFull)**
 for PR 13543 at commit 
[`6f29181`](https://github.com/apache/spark/commit/6f29181d92141db2270a90c2315c0399060bc7d0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13543
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60142/
Test FAILed.



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13543
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13534: [SPARK-15789][SQL] Allow reserved keywords in most place...

2016-06-07 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13534
  
LGTM, merging to master and 2.0, thanks!



[GitHub] spark issue #13543: [SPARK-15806] [Documentation] update doc for SPARK_MASTE...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13543
  
**[Test build #60142 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60142/consoleFull)**
 for PR 13543 at commit 
[`6f29181`](https://github.com/apache/spark/commit/6f29181d92141db2270a90c2315c0399060bc7d0).



[GitHub] spark issue #13189: [SPARK-14670][SQL] allow updating driver side sql metric...

2016-06-07 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13189
  
`QueryExecution.hiveResultString` calls `SparkPlan.executeCollect` 
without setting an execution id. This method is only used in tests; should we 
just stop reporting metrics for this case, or create an execution id in 
`hiveResultString`?



[GitHub] spark pull request #13439: [SPARK-15701][SQL] Constant ColumnVector only nee...

2016-06-07 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13439#discussion_r66174085
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
@@ -70,26 +71,106 @@ public long nullsNativeAddress() {
   public void close() {
   }
 
+  // Spilt this function out since it is the slow path.
--- End diff --

Not changed. This just adds back the code that was moved to new classes in the 
previous commit.



[GitHub] spark pull request #13534: [SPARK-15789][SQL] Allow reserved keywords in mos...

2016-06-07 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13534


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13439
  
**[Test build #60141 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60141/consoleFull)**
 for PR 13439 at commit 
[`2226efc`](https://github.com/apache/spark/commit/2226efca5172e67a09f0972ef5ba110f7abce800).



[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
The latest benchmark was run individually for each type of column vector. As 
stated in `ColumnarBatchBenchmark`, it is hard to reason about the JIT. If we 
run these 4 cases together in one benchmark, the numbers seem inaccurate and 
look odd.



[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13439
  
**[Test build #60140 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60140/consoleFull)**
 for PR 13439 at commit 
[`07ef523`](https://github.com/apache/spark/commit/07ef523af03809837d1b73c3c8db56504f244fab).



[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-07 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13439
  
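Benchmark again on new change:

Environment:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz

OnHeap, Not Constant:

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
On Heap, Not Constant             42 /   49         1.0        1020.3       1.0X

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
On Heap, Not Constant             41 /   46         1.0         989.0       1.0X

OnHeap, Constant:

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
On Heap, Constant                 28 /   33         1.5         674.2       1.0X

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
On Heap, Constant                 27 /   33         1.5         658.4       1.0X

OffHeap, Not Constant:

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
Off Heap, Not Constant            63 /   73         0.6        1547.3       1.0X

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
Off Heap, Not Constant            68 /   74         0.6        1663.5       1.0X

OffHeap, Constant:

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
Off Heap, Constant                27 /   33         1.5         662.5       1.0X

ColumnVector R/W:          Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------
Off Heap, Constant                27 /   33         1.5         657.1       1.0X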
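
Since each case was run on its own, every table reports `1.0X` relative to itself. A quick way to compare the cases is to take ratios of the per-row times by hand. The Python snippet below is only a reading aid using the figures quoted above; the dictionary layout and the helper names `mean` and `speedup` are assumptions made for the example, and averaging two runs is a rough summary rather than a rigorous comparison.

```python
# Per-row times in ns, copied from the figures above (two runs per case).
per_row_ns = {
    ("on_heap", "not_constant"): [1020.3, 989.0],
    ("on_heap", "constant"): [674.2, 658.4],
    ("off_heap", "not_constant"): [1547.3, 1663.5],
    ("off_heap", "constant"): [662.5, 657.1],
}


def mean(values):
    return sum(values) / len(values)


def speedup(memory_mode):
    """Ratio of mean non-constant time to mean constant time."""
    return (mean(per_row_ns[(memory_mode, "not_constant")])
            / mean(per_row_ns[(memory_mode, "constant")]))


on_heap_speedup = speedup("on_heap")
off_heap_speedup = speedup("off_heap")
```

On these numbers the constant case comes out at roughly 1.5 times faster per row on heap and roughly 2.4 times faster off heap.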





[GitHub] spark issue #13495: [SPARK-15751][MLLIB][PYSPARK] Add generateAssociationRul...

2016-06-07 Thread zjffdu

Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13495
  
\cc @yanboliang 



[GitHub] spark issue #13530: [SPARK-14279][BUILD] Pick the spark version from pom

2016-06-07 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/13530
  
@dhruve could you close the PR? The bot doesn't do it automatically for 
backports. thx



[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...

2016-06-07 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/9207
  
So I guess I'm wondering what our plans for PMML look like. I'm happy to 
update this or go in the direction @MLnick suggested if that's what we want.



[GitHub] spark issue #12938: [SPARK-15162][SPARK-15164][PySpark][DOCS][ML] update som...

2016-06-07 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12938
  
**[Test build #60139 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60139/consoleFull)**
 for PR 12938 at commit 
[`7b634b6`](https://github.com/apache/spark/commit/7b634b63be315a5f5830b0a4190a42567e6d9c92).



[GitHub] spark pull request #13335: [SPARK-15580][SQL]Add ContinuousQueryInfo to make...

2016-06-07 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13335


