[GitHub] spark pull request #14097: [MINOR][Streaming][Docs] Minor changes on kinesis...

2016-07-10 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14097#discussion_r70186200
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -9,7 +9,7 @@ Here we explain how to configure Spark Streaming to receive data from Kinesis.
 
  Configuring Kinesis
 
-A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards per the following
+A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards following
--- End diff --

I see... I'll revert it to the original.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14104
  
I fixed my comment above, sorry. They _do_ block the calling thread, but 
actions triggered in different threads on one RDD do not block each other. The 
first sentence isn't correct. My suggestion is to omit the table, not all of 
the text you added. 
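The distinction here (each action blocks its own calling thread, yet actions triggered from different threads don't block one another) can be sketched with plain Scala concurrency primitives. This is a made-up illustration with no Spark APIs; `action()` below stands in for an RDD action:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch, no Spark involved: each "action" blocks the thread
// that runs it, but two actions issued from different threads do not block
// each other.
val latch = new CountDownLatch(2)

def action(): Long = {
  latch.countDown()                  // signal that this action has started
  latch.await(5, TimeUnit.SECONDS)   // wait (briefly) for the other action
  42L                                // stand-in for an action's result
}

val a = Future(action())   // "thread 1" triggers an action
val b = Future(action())   // "thread 2" triggers another action
assert(Await.result(a, 10.seconds) == 42L)
assert(Await.result(b, 10.seconds) == 42L)
```

With at least two worker threads available, the latch opens only because both actions are in flight at once; the timeout on `latch.await` merely keeps the sketch from hanging on a single-core machine.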





[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14127
  
You'll need to update the deps files. More importantly what are the 
potential incompatible changes that could affect Spark?





[GitHub] spark pull request #14097: [MINOR][Streaming][Docs] Minor changes on kinesis...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14097#discussion_r70186026
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -9,7 +9,7 @@ Here we explain how to configure Spark Streaming to receive data from Kinesis.
 
  Configuring Kinesis
 
-A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards per the following
+A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards following
--- End diff --

Yes, but that's only removing "following" and I don't think that's an 
improvement.





[GitHub] spark pull request #14097: [MINOR][Streaming][Docs] Minor changes on kinesis...

2016-07-10 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14097#discussion_r70185756
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -9,7 +9,7 @@ Here we explain how to configure Spark Streaming to receive data from Kinesis.
 
  Configuring Kinesis
 
-A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards per the following
+A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or more shards following
--- End diff --

excuse me for my English...
kinesis stream can be set (...) per the guide?





[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14127
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14127
  
**[Test build #62060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62060/consoleFull)** for PR 14127 at commit [`ce31dda`](https://github.com/apache/spark/commit/ce31ddade28039aafceb7ab1516b8f79bd2948f1).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62060/
Test FAILed.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62059/
Test PASSed.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14127
  
**[Test build #62060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62060/consoleFull)** for PR 14127 at commit [`ce31dda`](https://github.com/apache/spark/commit/ce31ddade28039aafceb7ab1516b8f79bd2948f1).





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14014
  
**[Test build #62059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62059/consoleFull)** for PR 14014 at commit [`ddde8c6`](https://github.com/apache/spark/commit/ddde8c6240e7f43f82cbcffa7e31ce445246817c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14127: [SPARK-15467][build] update janino version to 3.0...

2016-07-10 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/14127

[SPARK-15467][build] update janino version to 3.0.0

## What changes were proposed in this pull request?

This PR updates the version of the Janino compiler from 2.7.8 to 3.0.0. The new version fixes [a Janino issue](https://github.com/janino-compiler/janino/issues/1), which in turn resolves [a Spark issue](https://issues.apache.org/jira/browse/SPARK-15467) that throws a Java exception.

## How was this patch tested?

Manually tested using the program in [the JIRA entry](https://issues.apache.org/jira/browse/SPARK-15467).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-15467

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14127


commit ce31ddade28039aafceb7ab1516b8f79bd2948f1
Author: Kazuaki Ishizaki 
Date:   2016-07-10T17:32:06Z

update janino version to 3.0.0







[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation

2016-07-10 Thread phalodi
Github user phalodi commented on the issue:

https://github.com/apache/spark/pull/14104
  
@srowen yeah, I agree with you that they behave like non-async actions, but we should still list them, because they are most useful in real-life applications where a single application runs multiple jobs for different users. Listing them makes users more familiar with async actions and shows which actions are async; I think that's helpful. Still, if you think it's not a valid point, just tell me and I'll change the pull request to add only a reference to async actions. But I want users to know about them, because when I first looked I was surprised that async actions are not mentioned anywhere in the documentation.





[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation

2016-07-10 Thread phalodi
Github user phalodi commented on the issue:

https://github.com/apache/spark/pull/14104
  
@srowen yeah, you are right that it's not blocking the calling thread, but regular actions execute sequentially, right? An async action returns a future, so it runs on a different thread and does not run sequentially, which makes it fully non-blocking. For example: if I have an action that contains a transformation like `map`, and inside the `map` I have blocking code, how does that work, does it block or not?
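The async-action distinction raised here can be sketched with plain Scala futures, with no Spark involved; `asyncAction` below is a made-up stand-in for something like Spark's `FutureAction`. A blocking call inside the submitted work ties up the worker thread that executes it, not the caller holding the future:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for an async action: the blocking work
// (Thread.sleep here, or blocking code inside a map function) blocks the
// worker thread that executes it, while the calling thread stays free.
def asyncAction(): Future[Int] = Future {
  Thread.sleep(100) // blocking code inside the "action"
  10
}

val f = asyncAction()                   // returns immediately
val caller = 1 + 1                      // the calling thread keeps working
val result = Await.result(f, 5.seconds) // caller blocks only when it asks
assert(caller == 2 && result == 10)
```

So blocking code inside the work itself still blocks *some* thread, just not the one that triggered the action.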






[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62058/
Test PASSed.





[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14034
  
**[Test build #62058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62058/consoleFull)** for PR 14034 at commit [`01137dc`](https://github.com/apache/spark/commit/01137dcf739e75be31ba8836e342537f66971aa3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13704
  
@cloud-fan could you please review this? As you pointed out, I also changed the code related to `cast`. I added benchmark results, too.





[GitHub] spark issue #14081: [SPARK-16403][Examples] Cleanup to remove unused imports...

2016-07-10 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/14081
  
Thanks for the review Sean!
On Jul 10, 2016 4:29 AM, "Sean Owen" wrote:

> I'm going to merge this to master only if nobody has further comments.






[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14014
  
**[Test build #62059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62059/consoleFull)** for PR 14014 at commit [`ddde8c6`](https://github.com/apache/spark/commit/ddde8c6240e7f43f82cbcffa7e31ce445246817c).





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62057/
Test PASSed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13704
  
**[Test build #62057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62057/consoleFull)** for PR 13704 at commit [`677d81e`](https://github.com/apache/spark/commit/677d81e8d066cf74f7a86ae61c17dbbd2d74dde6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13778
  
yea, so whatever the data type is (Python UDT or normal SQL type), on the Java side there is no difference: the data is converted to the correct format by the pickler. That's why I think it may be possible to just pass the corresponding SQL type of the Python UDT to the Java side.

My only concern is that sometimes we use the schema of the Java DataFrame as the schema on the Python side. If we don't pass the Python UDT to the Java side, the UDT information will be lost. @viirya do you mind giving it a try? Thanks!
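The concern about lost UDT information can be illustrated abstractly. The types below are made up for the sketch and are not Spark's classes; the point is only that once a user-defined type is replaced by its underlying SQL type at the language boundary, the UDT identity cannot be recovered on the way back:

```scala
// Hypothetical sketch, not Spark code: a UDT wraps an underlying "SQL type".
sealed trait DataType
case object DoubleType extends DataType
final case class StructType(fields: Map[String, DataType]) extends DataType
final case class UserDefinedType(name: String, sqlType: DataType) extends DataType

// The suggestion under discussion: send only the underlying SQL type across.
def toJavaSide(dt: DataType): DataType = dt match {
  case UserDefinedType(_, sql) => sql // UDT identity is dropped here
  case other                   => other
}

val point = UserDefinedType("PointUDT",
  StructType(Map("x" -> DoubleType, "y" -> DoubleType)))
val onJavaSide = toJavaSide(point)

// Reading the schema back from the "Java side" no longer mentions the UDT:
assert(onJavaSide == StructType(Map("x" -> DoubleType, "y" -> DoubleType)))
```

This is exactly the round-trip that breaks when the Python side later rebuilds its schema from the Java DataFrame's schema.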





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14014
  
retest this please





[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14034
  
LGTM except some style comments, thanks for working on it!





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183976
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with 
SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case 
g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n 
${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: 
${sizesGlobalLimit.head}")
+
+val sizesLocalLimit = df.queryExecution.analyzed.collect { case l: 
LocalLimit =>
+  l.statistics.sizeInBytes
+}
+assert(sizesLocalLimit.size === 1, s"Size wrong for:\n 
${df.queryExecution}")
+assert(sizesLocalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: 
${sizesLocalLimit.head}")
+  }
+}
+  }
+
+  test("estimates the size of a limit 0 on outer join") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  val df1 = spark.table("test")
+  val df2 = spark.table("test").limit(0)
+  val df = df1.join(df2, Seq("k"), "left")
+
+  val sizes = df.queryExecution.analyzed.collect { case g: Join =>
+g.statistics.sizeInBytes
+  }
+
+  assert(sizes.size === 1, s"Size wrong for:\n ${df.queryExecution}")
--- End diff --

how about `number of Join nodes is wrong`





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183958
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,20 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
--- End diff --

this may be more readable:
```
limitExpr match {
  case e if !e.foldable => fail(...)
  case e if e.dataType != IntegerType =>
    fail(s"the limit expression must be int type, but got ${e.dataType.simpleString}")
  case e if e.eval().asInstanceOf[Int] < 0 => fail(...)
  case e => // OK
}
```





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183882
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala ---
@@ -31,4 +33,46 @@ class StatisticsSuite extends QueryTest with 
SharedSQLContext {
   spark.sessionState.conf.autoBroadcastJoinThreshold)
   }
 
+  test("estimates the size of limit") {
+withTempTable("test") {
+  Seq(("one", 1), ("two", 2), ("three", 3), ("four", 4)).toDF("k", "v")
+.createOrReplaceTempView("test")
+  Seq((0, 1), (1, 24), (2, 48)).foreach { case (limit, expected) =>
+val df = sql(s"""SELECT * FROM test limit $limit""")
+
+val sizesGlobalLimit = df.queryExecution.analyzed.collect { case 
g: GlobalLimit =>
+  g.statistics.sizeInBytes
+}
+assert(sizesGlobalLimit.size === 1, s"Size wrong for:\n 
${df.queryExecution}")
+assert(sizesGlobalLimit.head === BigInt(expected),
+  s"expected exact size 24 for table 'test', got: 
${sizesGlobalLimit.head}")
--- End diff --

why we hardcode `24` here?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183835
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -660,18 +660,51 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 
   test("limit") {
 checkAnswer(
-  sql("SELECT * FROM testData LIMIT 10"),
+  sql("SELECT * FROM testData LIMIT 9 + 1"),
   testData.take(10).toSeq)
 
 checkAnswer(
-  sql("SELECT * FROM arrayData LIMIT 1"),
+  sql("SELECT * FROM arrayData LIMIT CAST(1 AS Integer)"),
   arrayData.collect().take(1).map(Row.fromTuple).toSeq)
 
 checkAnswer(
   sql("SELECT * FROM mapData LIMIT 1"),
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("non-foldable expressions in LIMIT") {
+val e = intercept[AnalysisException] {
+  sql("SELECT * FROM testData LIMIT key > 3")
+}.getMessage
+assert(e.contains("The argument to the LIMIT clause must evaluate to a 
constant value. " +
+  "Limit:(testdata.`key` > 3)"))
+  }
+
+  test("Limit: unable to evaluate and cast expressions in limit clauses to Int") {
--- End diff --

We should also update the test name; we don't try to cast the limit expression to int type.





[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14034
  
**[Test build #62058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62058/consoleFull)**
 for PR 14034 at commit 
[`01137dc`](https://github.com/apache/spark/commit/01137dcf739e75be31ba8836e342537f66971aa3).





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14124
  
```
val rdd = spark.sparkContext.makeRDD(Seq("{\"a\" : 1}", "{\"a\" : null}"))
val schema = StructType(StructField("a", IntegerType, nullable = false) :: 
Nil)
val df = spark.read.schema(schema).json(rdd)
df.printSchema()
```

When the user-specified schema is not nullable but the data contains nulls, the null value in the result becomes `0`. This looks like a bug, right?





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70183276
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+if (!limitExpr.foldable) {
+  failAnalysis(
+"The argument to the LIMIT clause must evaluate to a constant value. " +
+s"Limit:${limitExpr.sql}")
+}
+limitExpr.eval() match {
+  case o: Int if o >= 0 => // OK
+  case o: Int => failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
+  case o => failAnalysis(
+s"""number_rows in limit clause cannot be cast to integer:\"$o\".""")
--- End diff --

Thanks!
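For readers skimming the thread, the validation in the quoted `checkLimitClause` can be summarized with a small sketch (plain Python standing in for the Scala analyzer code; the function and its signature are illustrative, not Spark's API):

```python
# Toy analogue of the quoted checkLimitClause logic -- not Spark's actual code.
# `foldable` stands in for Expression.foldable, `value` for Expression.eval().
def check_limit_clause(value, foldable=True, sql="<expr>"):
    if not foldable:
        raise ValueError(
            "The argument to the LIMIT clause must evaluate to a constant "
            "value. Limit:" + sql)
    if not isinstance(value, int):
        raise ValueError(
            'number_rows in limit clause cannot be cast to integer:"%s".' % value)
    if value < 0:
        raise ValueError(
            "number_rows in limit clause must be equal to or greater than 0. "
            "number_rows:%d" % value)
    return value
```

The three error branches mirror the three cases discussed in this review: a non-foldable expression, a non-integer constant, and a negative row count.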





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70183088
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1493,7 +1495,62 @@ def test_infer_schema(self):
 self.assertTrue(m, self.sm1)
 else:
raise ValueError("Expected a matrix but got type %r" % type(m))
+class MultiVariateGaussianTests(PySparkTestCase):
+def test_univariate(self) :
+x1=Vectors.dense([0.0])
+x2=Vectors.dense([1.5])
+
+mu = Vectors.dense([0.0])
+sigma1= DenseMatrix(1, 1, [1.0])
+dist1= MultivariateGaussian(mu, sigma1)
+
+self.assertAlmostEqual(dist1.pdf(x1),0.39894, 5)
+self.assertAlmostEqual(dist1.pdf(x2),0.12952, 5)
+
+sigma2= DenseMatrix(1, 1, [4.0])
+dist2= MultivariateGaussian(mu, sigma2)
+
+self.assertAlmostEqual(dist2.pdf(x1),0.19947, 5)
+self.assertAlmostEqual(dist2.pdf(x2),0.15057, 5)
+
+def test_multivariate(self) :
+x1=Vectors.dense([0.0, 0.0])
+x2=Vectors.dense([1.0, 1.0])
+
+mu = Vectors.dense([0.0, 0.0])
+sigma1= DenseMatrix(2, 2, [1.0, 0.0, 0.0, 1.0])
+dist1= MultivariateGaussian(mu, sigma1)
+
+self.assertAlmostEqual(dist1.pdf(x1),0.159154, 5)
+self.assertAlmostEqual(dist1.pdf(x2),0.05855, 5)
+
+sigma2= DenseMatrix(2, 2, [4.0, -1.0, -1.0, 2.0])
+dist2= MultivariateGaussian(mu, sigma2)
+
+self.assertAlmostEqual(dist2.pdf(x1),0.060155, 5)
+self.assertAlmostEqual(dist2.pdf(x2),0.0339717, 5)
+
+def test_multivariate_degenerate(self) :
+x1=Vectors.dense([0.0, 0.0])
+x2=Vectors.dense([1.0, 1.0])
+
+mu = Vectors.dense([0.0, 0.0])
+sigma1= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+dist1= MultivariateGaussian(mu, sigma1)
+
+self.assertAlmostEqual(dist1.pdf(x1),0.11254, 5)
+self.assertAlmostEqual(dist1.pdf(x2),0.068259, 5)
+
+def test_SPARK_11302(self) :
+x=Vectors.dense([629, 640, 1.7188, 618.19])
 
+mu = Vectors.dense([1055.3910505836575, 1070.489299610895, 
1.39020554474708, 1040.5907503867697])
+sigma= DenseMatrix(4, 4, [166769.00466698944, 169336.6705268059, 
12.820670788921873, 164243.93314092053,
+  169336.6705268059, 172041.5670061245, 21.62590020524533, 
166678.01075856484,
+  12.820670788921873, 21.62590020524533, 0.872524191943962, 
4.283255814732373,
+  164243.93314092053, 166678.01075856484, 4.283255814732373, 
161848.9196719207])
+dist= MultivariateGaussian(mu, sigma)
--- End diff --

Please format the code better (e.g. two elements in each row).
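Incidentally, the expected densities hard-coded in these tests can be cross-checked independently with a few lines of numpy (a minimal sketch for the non-singular case only; Spark's implementation also handles a singular covariance via the eigendecomposition path):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density at x, assuming sigma is non-singular."""
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    diff = x - mu
    k = mu.size
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm)

# Constants from test_univariate / test_multivariate above:
assert round(mvn_pdf([0.0], [0.0], [[1.0]]), 5) == 0.39894
assert round(mvn_pdf([1.5], [0.0], [[1.0]]), 5) == 0.12952
assert round(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)), 5) == 0.15915
assert round(mvn_pdf([1.0, 1.0], [0.0, 0.0], np.eye(2)), 5) == 0.05855
```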





[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14111
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14111
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62056/
Test PASSed.





[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14111
  
**[Test build #62056 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62056/consoleFull)**
 for PR 14111 at commit 
[`1d7bd3c`](https://github.com/apache/spark/commit/1d7bd3c50b57c517888ebbaea4d0db9eadcf78e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70183009
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
+>>> sigma= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+>>> x = Vectors.dense([1.0, 1.0])
+>>> m = MultivariateGaussian(mu, sigma)
+>>> m.pdf(x)
+0.0682586811486
+
+"""
+
+def __init__(self, mu, sigma):
+"""
+__init__(self, mu, sigma)
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+mu and sigma must be instances of DenseVector and DenseMatrix 
respectively.
+
+"""
+
+
+assert (isinstance(mu, DenseVector)), "mu must be a DenseVector 
Object"
+assert (isinstance(sigma, DenseMatrix)), "sigma must be a 
DenseMatrix Object"
+
+sigma_shape=sigma.toArray().shape
+assert (sigma_shape[0]==sigma_shape[1]) , "Covariance matrix must 
be square"
+assert (sigma_shape[0]==mu.size) , "Mean vector length must match 
covariance matrix size"
+
+# initialize eagerly precomputed attributes
+
+self.mu=mu
+
+# storing sigma as numpy.ndarray
+# furthur calculations are done ndarray only
+self.sigma=sigma.toArray()
+
+
+# initialize attributes to be computed later
+
+self.prec_U = None
+self.log_det_cov = None
+
+# compute distribution dependent constants
+self.__calculateCovarianceConstants()
+
+
+def pdf(self,x):
+"""
+Returns density of this multivariate Gaussian at a point given by 
Vector x
+"""
+assert (isinstance(x, Vector)), "x must be of Vector Type"
+return float(self.__pdf(x))
+
+def logpdf(self,x):
+"""
+Returns the log-density of this multivariate Gaussian at a point 
given by Vector x
+"""
+assert (isinstance(x, Vector)), "x must be of Vector Type"
+return float(self.__logpdf(x))
+
+def __calculateCovarianceConstants(self):
+"""
+Calculates distribution dependent components used for the density 
function
+based on scipy multivariate library
+refer 
https://github.com/scipy/scipy/blob/master/scipy/stats/_multivariate.py
+tested with precision of 9 significant digits(refer testcase)
+
+
+"""
+
+try :
+# pre-processing input parameters
+# throws ValueError with invalid inputs
+self.dim, self.mu, self.sigma = 
self.__process_parameters(None, self.mu, self.sigma)
+
+# return the eigenvalues and eigenvectors 
+# of a Hermitian or symmetric matrix.
+# s =  eigen values
+# u = eigen vectors
+s, u = np.linalg.eigh(self.sigma)
+
+#Singular values are considered to be non-zero only if 
+#they exceed a tolerance based on machine precision, matrix 
size, and
+#relation to the m

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70183002
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
+>>> sigma= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+>>> x = Vectors.dense([1.0, 1.0])
+>>> m = MultivariateGaussian(mu, sigma)
+>>> m.pdf(x)
+0.0682586811486
+
+"""
+
+def __init__(self, mu, sigma):
+"""
+__init__(self, mu, sigma)
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+mu and sigma must be instances of DenseVector and DenseMatrix 
respectively.
+
+"""
+
+
+assert (isinstance(mu, DenseVector)), "mu must be a DenseVector 
Object"
+assert (isinstance(sigma, DenseMatrix)), "sigma must be a 
DenseMatrix Object"
+
+sigma_shape=sigma.toArray().shape
+assert (sigma_shape[0]==sigma_shape[1]) , "Covariance matrix must 
be square"
+assert (sigma_shape[0]==mu.size) , "Mean vector length must match 
covariance matrix size"
+
+# initialize eagerly precomputed attributes
+
+self.mu=mu
--- End diff --

Insert one space before and after the `=`.





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70182998
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
+>>> sigma= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+>>> x = Vectors.dense([1.0, 1.0])
+>>> m = MultivariateGaussian(mu, sigma)
+>>> m.pdf(x)
+0.0682586811486
+
+"""
+
+def __init__(self, mu, sigma):
+"""
+__init__(self, mu, sigma)
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+mu and sigma must be instances of DenseVector and DenseMatrix 
respectively.
+
+"""
+
+
+assert (isinstance(mu, DenseVector)), "mu must be a DenseVector 
Object"
--- End diff --

Too many blank lines above.





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70182979
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
--- End diff --

`Vectors` is not imported in this module.





[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-07-10 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13248#discussion_r70182969
  
--- Diff: python/pyspark/ml/stat/distribution.py ---
@@ -0,0 +1,267 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.ml.linalg import DenseVector, DenseMatrix, Vector
+import numpy as np
+
+__all__ = ['MultivariateGaussian']
+
+
+
+class MultivariateGaussian():
+"""
+This class provides basic functionality for a Multivariate Gaussian 
(Normal) Distribution. In
+ the event that the covariance matrix is singular, the density will be 
computed in a
+reduced dimensional subspace under which the distribution is supported.
+(see 
[[http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Degenerate_case]])
+
+mu The mean vector of the distribution
+sigma The covariance matrix of the distribution
+
+
+>>> mu = Vectors.dense([0.0, 0.0])
+>>> sigma= DenseMatrix(2, 2, [1.0, 1.0, 1.0, 1.0])
+>>> x = Vectors.dense([1.0, 1.0])
+>>> m = MultivariateGaussian(mu, sigma)
+>>> m.pdf(x)
+0.0682586811486
--- End diff --

To run the doctest, I think we need to call `doctest.testmod()` explicitly, like other modules do. Check [mllib/util.py](https://github.com/apache/spark/blob/v2.0.0-rc2/python/pyspark/mllib/util.py#L509-L528).

We also need to add this module to the `python_test_goals` of the `pyspark_ml` module object in [dev/sparktestsupport/modules.py](https://github.com/apache/spark/blob/v2.0.0-rc2/dev/sparktestsupport/modules.py#L401-L411).
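For reference, the boilerplate those modules use looks roughly like this (a minimal self-contained sketch; the real pyspark `_test()` helpers also create a SparkContext and place it in the doctest globals):

```python
import doctest
import sys


def add(a, b):
    """Toy function standing in for the real docstring examples.

    >>> add(2, 3)
    5
    """
    return a + b


def _test():
    # doctest.testmod() collects and runs every ">>>" example in this
    # module's docstrings, returning (failed, attempted) counts.
    results = doctest.testmod(verbose=False)
    if results.failed:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
```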





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13704
  
**[Test build #62057 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62057/consoleFull)**
 for PR 13704 at commit 
[`677d81e`](https://github.com/apache/spark/commit/677d81e8d066cf74f7a86ae61c17dbbd2d74dde6).





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast null value...

2016-07-10 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
@HyukjinKwon hi. The explanation above is intended to help reviewers better understand how the regression was introduced. Regarding whether `StringType` should be ignored or not, I don't have a strong preference :)





[GitHub] spark pull request #14118: [SPARK-16462][SPARK-16460][SQL] Make CSV cast nul...

2016-07-10 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14118#discussion_r70182426
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -238,59 +238,55 @@ private[csv] object CSVTypeCast {
   nullable: Boolean = true,
   options: CSVOptions = CSVOptions()): Any = {
 
-castType match {
-  case _: ByteType => if (datum == options.nullValue && nullable) null 
else datum.toByte
-  case _: ShortType => if (datum == options.nullValue && nullable) 
null else datum.toShort
-  case _: IntegerType => if (datum == options.nullValue && nullable) 
null else datum.toInt
-  case _: LongType => if (datum == options.nullValue && nullable) null 
else datum.toLong
-  case _: FloatType =>
-if (datum == options.nullValue && nullable) {
-  null
-} else if (datum == options.nanValue) {
-  Float.NaN
-} else if (datum == options.negativeInf) {
-  Float.NegativeInfinity
-} else if (datum == options.positiveInf) {
-  Float.PositiveInfinity
-} else {
-  Try(datum.toFloat)
-
.getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).floatValue())
-}
-  case _: DoubleType =>
-if (datum == options.nullValue && nullable) {
-  null
-} else if (datum == options.nanValue) {
-  Double.NaN
-} else if (datum == options.negativeInf) {
-  Double.NegativeInfinity
-} else if (datum == options.positiveInf) {
-  Double.PositiveInfinity
-} else {
-  Try(datum.toDouble)
-
.getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).doubleValue())
-}
-  case _: BooleanType => datum.toBoolean
-  case dt: DecimalType =>
-if (datum == options.nullValue && nullable) {
-  null
-} else {
+if (datum == options.nullValue && nullable && (!castType.isInstanceOf[StringType])) {
--- End diff --

> ... why StringType is excluded?

Hi @HyukjinKwon, it's just to stay consistent with what we did in `spark-csv` for 1.6. Actually I don't have a strong preference here -- maybe we should not ignore `StringType`? @rxin, could you share some thoughts? Thanks!
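To make the trade-off concrete, here is a toy model of the two behaviors being debated (plain Python, purely illustrative -- not the Spark or `spark-csv` code; the `string_keeps_null_token` flag name is made up):

```python
# Toy model of the question: should the configured null token (e.g. "NULL")
# become null for string columns too, or stay a literal string?
def cast_csv_field(datum, target_type, null_value="NULL", nullable=True,
                   string_keeps_null_token=True):
    is_string = target_type is str
    if datum == null_value and nullable and not (is_string and string_keeps_null_token):
        return None  # mirrors the consolidated branch in the diff above
    if target_type is int:
        return int(datum)
    if target_type is float:
        return float(datum)
    return datum

# spark-csv 1.6 behavior: string columns keep the token verbatim...
assert cast_csv_field("NULL", str) == "NULL"
# ...while non-string columns become null:
assert cast_csv_field("NULL", int) is None
# The alternative under discussion: treat the token as null everywhere.
assert cast_csv_field("NULL", str, string_keeps_null_token=False) is None
```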






[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62054/
Test PASSed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
 for PR 14124 at commit 
[`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14111: [SPARK-16456][SQL] Reuse the uncorrelated scalar subquer...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14111
  
**[Test build #62056 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62056/consoleFull)**
 for PR 14111 at commit 
[`1d7bd3c`](https://github.com/apache/spark/commit/1d7bd3c50b57c517888ebbaea4d0db9eadcf78e5).





[GitHub] spark pull request #14122: [SPARK-16470][ML][Optimizer] Check linear regress...

2016-07-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14122#discussion_r70181416
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -327,6 +327,11 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 throw new SparkException(msg)
   }
 
+  if (!state.actuallyConverged) {
+logWarning("LinearRegression training fininshed but the result is 
not converged, " +
--- End diff --

@srowen Done. thanks~





[GitHub] spark issue #14122: [SPARK-16470][ML][Optimizer] Check linear regression tra...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14122
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14122: [SPARK-16470][ML][Optimizer] Check linear regression tra...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62055/
Test PASSed.





[GitHub] spark issue #14122: [SPARK-16470][ML][Optimizer] Check linear regression tra...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14122
  
**[Test build #62055 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62055/consoleFull)**
 for PR 14122 at commit 
[`36dadb2`](https://github.com/apache/spark/commit/36dadb2a51f0e20d0999d46f06d9fc05d6e0e5ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14068: [SPARK-16469] enhanced simulate multiply

2016-07-10 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/14068
  
LGTM, just a minor nit! Thanks for this PR





[GitHub] spark pull request #14068: [SPARK-16469] enhanced simulate multiply

2016-07-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/14068#discussion_r70181303
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -426,16 +426,19 @@ class BlockMatrix @Since("1.3.0") (
   partitioner: GridPartitioner): (BlockDestinations, 
BlockDestinations) = {
 val leftMatrix = blockInfo.keys.collect() // blockInfo should already 
be cached
 val rightMatrix = other.blocks.keys.collect()
+
+val rightCounterpartsHelper = 
rightMatrix.groupBy(_._1).mapValues(_.map(_._2))
 val leftDestinations = leftMatrix.map { case (rowIndex, colIndex) =>
-  val rightCounterparts = rightMatrix.filter(_._1 == colIndex)
-  val partitions = rightCounterparts.map(b => 
partitioner.getPartition((rowIndex, b._2)))
-  ((rowIndex, colIndex), partitions.toSet)
+  ((rowIndex, colIndex), rightCounterpartsHelper.getOrElse(colIndex, 
Array()).map(b =>
+partitioner.getPartition((rowIndex, b))).toSet)
 }.toMap
+
+val leftCounterpartsHelper = 
leftMatrix.groupBy(_._2).mapValues(_.map(_._1))
 val rightDestinations = rightMatrix.map { case (rowIndex, colIndex) =>
-  val leftCounterparts = leftMatrix.filter(_._2 == rowIndex)
-  val partitions = leftCounterparts.map(b => 
partitioner.getPartition((b._1, colIndex)))
-  ((rowIndex, colIndex), partitions.toSet)
+  ((rowIndex, colIndex), leftCounterpartsHelper.getOrElse(rowIndex, 
Array()).map(b =>
--- End diff --

ditto





[GitHub] spark pull request #14068: [SPARK-16469] enhanced simulate multiply

2016-07-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/14068#discussion_r70181291
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -426,16 +426,19 @@ class BlockMatrix @Since("1.3.0") (
   partitioner: GridPartitioner): (BlockDestinations, 
BlockDestinations) = {
 val leftMatrix = blockInfo.keys.collect() // blockInfo should already 
be cached
 val rightMatrix = other.blocks.keys.collect()
+
+val rightCounterpartsHelper = 
rightMatrix.groupBy(_._1).mapValues(_.map(_._2))
 val leftDestinations = leftMatrix.map { case (rowIndex, colIndex) =>
-  val rightCounterparts = rightMatrix.filter(_._1 == colIndex)
-  val partitions = rightCounterparts.map(b => 
partitioner.getPartition((rowIndex, b._2)))
-  ((rowIndex, colIndex), partitions.toSet)
+  ((rowIndex, colIndex), rightCounterpartsHelper.getOrElse(colIndex, 
Array()).map(b =>
--- End diff --

nit: for readability could you assign this to a variable instead of 
inlining it?
In addition, for multi-line expressions
```scala
.map { b =>
  blah
}
```
is more preferred
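[Editor's note] The optimization under review — replacing a per-block linear `filter` over the other matrix's block list with a `groupBy` lookup table built once up front — can be illustrated with a minimal plain-Scala sketch (toy block coordinates, partitioner call omitted; names are illustrative, not the actual Spark code):

```scala
object SimulateMultiplySketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical block coordinates (rowIndex, colIndex) of two block matrices.
    val leftMatrix  = Array((0, 0), (0, 1), (1, 0), (1, 1))
    val rightMatrix = Array((0, 0), (1, 0), (1, 1))

    // Before: scan rightMatrix once per left block -- O(left * right).
    val slow = leftMatrix.map { case (rowIndex, colIndex) =>
      ((rowIndex, colIndex), rightMatrix.filter(_._1 == colIndex).map(_._2).toSet)
    }.toMap

    // After: build the lookup table once, then each block is a map lookup.
    val rightCounterpartsHelper = rightMatrix.groupBy(_._1).mapValues(_.map(_._2))
    val fast = leftMatrix.map { case (rowIndex, colIndex) =>
      val counterparts = rightCounterpartsHelper.getOrElse(colIndex, Array.empty[Int])
      ((rowIndex, colIndex), counterparts.toSet)
    }.toMap

    assert(slow == fast)  // same destinations, fewer passes over the block list
  }
}
```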





[GitHub] spark issue #14122: [SPARK-16470][ML][Optimizer] Check linear regression tra...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14122
  
**[Test build #62055 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62055/consoleFull)**
 for PR 14122 at commit 
[`36dadb2`](https://github.com/apache/spark/commit/36dadb2a51f0e20d0999d46f06d9fc05d6e0e5ef).





[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...

2016-07-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13778
  
Oh, I mean they should be serialized/deserialized by pickler.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62054/consoleFull)**
 for PR 14124 at commit 
[`3980681`](https://github.com/apache/spark/commit/39806815fbbef2aafb32d3173c23386fcfbc5edf).





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62053/
Test FAILed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
 for PR 14124 at commit 
[`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14122: [SPARK-16470][ML][Optimizer] Check linear regress...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14122#discussion_r70180443
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -327,6 +327,11 @@ class LinearRegression @Since("1.3.0") 
(@Since("1.3.0") override val uid: String
 throw new SparkException(msg)
   }
 
+  if (!state.actuallyConverged) {
+logWarning("LinearRegression training fininshed but the result is 
not converged, " +
--- End diff --

LGTM though you could use string interpolation here. Maybe slightly better 
as "Linear regression training finished but the result is not converged 
because: ..."
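[Editor's note] The string-interpolation variant suggested here might look like the following sketch (`reason` stands in for whatever diagnostic the optimizer state exposes; it is not an actual field name):

```scala
val reason = "max iterations reached"  // hypothetical diagnostic
logWarning(s"Linear regression training finished but the result is not converged because: $reason")
```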





[GitHub] spark issue #14068: [SPARK-16469] enhanced simulate multiply

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14068
  
LGTM but CC @brkyvz 





[GitHub] spark pull request #14104: [SPARK-16438] Add Asynchronous Actions documentat...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14104#discussion_r70180403
  
--- Diff: docs/programming-guide.md ---
@@ -1099,6 +1099,34 @@ for details.
 
 
 
+ Asynchronous Actions
+Spark provide asynchronous actions to execute two or more actions 
concurrently, these actions are execute asynchronously without blocking each 
other. 
--- End diff --

"without blocking each other": actions never block each other anyway. They 
don't block the calling thread.





[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14104
  
Mostly this just duplicates the non-async documentation. I think it's worth 
mentioning the existence of the async methods with a pointer to the API docs, 
but, I don't think this table adds value.





[GitHub] spark pull request #14104: [SPARK-16438] Add Asynchronous Actions documentat...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14104#discussion_r70180404
  
--- Diff: docs/programming-guide.md ---
@@ -1099,6 +1099,34 @@ for details.
 
 
 
+ Asynchronous Actions
+Spark provide asynchronous actions to execute two or more actions 
concurrently, these actions are execute asynchronously without blocking each 
other. 
+The following table lists some of the asynchronous actions supported by 
Spark. Refer to the RDD API doc 
([Scala](api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions),[Java](api/java/org/apache/spark/rdd/AsyncRDDActions.html))
+
+
+Asynchronous ActionMeaning
+
+   collectAsync() 
+   Returns a future for retrieving all the elements of the dataset as 
an array at the driver program. This is usually useful after a filter or other 
operation that returns a sufficiently small subset of the data. 
+
+
+   countAsync() 
+   Returns a future for counting the number of elements in the RDD. 

+
+
+   foreachAsync(func) 
+   Applies a function f to all elements of this RDD. 
--- End diff --

This isn't specific to the async version.
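[Editor's note] A minimal usage sketch of the async actions being documented (`countAsync` and `collectAsync` are real `AsyncRDDActions` methods returning a `FutureAction`; the sketch assumes a live `SparkContext` named `sc`):

```scala
import scala.concurrent.Await
import scala.concurrent.duration._

val rdd = sc.parallelize(1 to 1000)

// Submit two jobs without waiting on either.
val countF   = rdd.countAsync()                        // FutureAction[Long]
val collectF = rdd.filter(_ % 100 == 0).collectAsync() // FutureAction[Seq[Int]]

// Block for the results only when they are actually needed.
val n     = Await.result(countF, 1.minute)
val small = Await.result(collectF, 1.minute)
```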





[GitHub] spark issue #14054: [SPARK-16226] [SQL] Weaken JDBC isolation level to avoid...

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14054
  
I'm going to merge this to `master` only if I don't hear back soon. however 
if anyone feels moderately strongly about adding a config param I can do that. 
It could even be done later.





[GitHub] spark issue #14081: [SPARK-16403][Examples] Cleanup to remove unused imports...

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14081
  
I'm going to merge this to master only if nobody has further comments.





[GitHub] spark pull request #14097: [MINOR][Streaming][Docs] Minor changes on kinesis...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14097#discussion_r70180299
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -9,7 +9,7 @@ Here we explain how to configure Spark Streaming to receive 
data from Kinesis.
 
  Configuring Kinesis
 
-A Kinesis stream can be set up at one of the valid Kinesis endpoints with 
1 or more shards per the following
+A Kinesis stream can be set up at one of the valid Kinesis endpoints with 
1 or more shards following
--- End diff --

The sentence was correct before but now is ungrammatical





[GitHub] spark pull request #14115: [SPARK-16459][SQL] Prevent dropping current datab...

2016-07-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14115#discussion_r70180202
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -49,6 +49,8 @@ class SessionCatalog(
 hadoopConf: Configuration) extends Logging {
   import CatalogTypes.TablePartitionSpec
 
+  val DEFAULT_DATABASE = "default"
--- End diff --

Nit: this should be in an `object SessionCatalog`?





[GitHub] spark issue #14126: 123update

2016-07-10 Thread ystop
Github user ystop commented on the issue:

https://github.com/apache/spark/pull/14126
  
I want to delete it. How can I do that?





[GitHub] spark pull request #14126: 123update

2016-07-10 Thread ystop
Github user ystop closed the pull request at:

https://github.com/apache/spark/pull/14126





[GitHub] spark issue #14126: 123update

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14126
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14126: 123update

2016-07-10 Thread ystop
GitHub user ystop opened a pull request:

https://github.com/apache/spark/pull/14126

123update

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ystop/spark my_change

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14126


commit d139d4b67861dce393fa32216f80c10e7498e955
Author: ystop 
Date:   2016-07-10T09:09:43Z

update







[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14086
  
Hm, it seems like it should be another `SaveMode` if anything. Its 
semantics would be identical to `Overwrite`, I guess, for other non-JDBC 
sources. CC @yhuai ?

However... is it not possible to just TRUNCATE when the schema hasn't 
changed, and DROP/CREATE when it has? That seems like the best solution if it's 
feasible.
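[Editor's note] The TRUNCATE-when-possible idea can be sketched with plain JDBC (the helper, its parameters, and the DDL strings are illustrative only, not Spark's actual implementation):

```scala
import java.sql.Connection

def overwriteTable(conn: Connection, table: String,
                   schemaUnchanged: Boolean, createDdl: String): Unit = {
  val stmt = conn.createStatement()
  try {
    if (schemaUnchanged) {
      // Cheap path: keep the table (and its indexes/grants), drop only the rows.
      stmt.executeUpdate(s"TRUNCATE TABLE $table")
    } else {
      // Schema changed: recreate the table from scratch.
      stmt.executeUpdate(s"DROP TABLE $table")
      stmt.executeUpdate(createDdl)
    }
  } finally {
    stmt.close()
  }
}
```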





[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14013
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14013
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62052/
Test PASSed.





[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14013
  
**[Test build #62052 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62052/consoleFull)**
 for PR 14013 at commit 
[`b942dca`](https://github.com/apache/spark/commit/b942dcaca9cfa52f57e8f5034ca83f74c33da14f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62053/consoleFull)**
 for PR 14124 at commit 
[`adae8de`](https://github.com/apache/spark/commit/adae8de39ffcec8ca3785c1123da900a457691c1).





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62051/
Test FAILed.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14121: [MINOR][ML] update comment where is inconsistent with co...

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14121
  
OK





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14014
  
**[Test build #62051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62051/consoleFull)** for PR 14014 at commit [`ddde8c6`](https://github.com/apache/spark/commit/ddde8c6240e7f43f82cbcffa7e31ce445246817c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14125: New Feature!

2016-07-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14125
  
There's enough wrong with this PR that you need to close it, then read 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark





[GitHub] spark issue #14125: New Feature!

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14125
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14125: New Feature!

2016-07-10 Thread tianxuan911
GitHub user tianxuan911 opened a pull request:

https://github.com/apache/spark/pull/14125

New Feature!

The slaves configuration file in the conf directory now supports custom ssh parameters.

Before this commit, an entry could only be a username and IP, such as:
localhost
name@localhost

After this commit, an entry can also carry ssh options, for example:

-p 23229 name@localhost

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tianxuan911/spark 
1.6-newfeature-slaves-support-ssh-param-custom

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14125


commit a007d8c9f161f7c95516d2be8cf19a9e14a8b4fa
Author: 田旋 
Date:   2016-07-10T09:48:59Z

New Feature!
The slaves configuration file in the conf directory now supports custom ssh parameters.

Before this commit, an entry could only be a username and IP, such as:
localhost
name@localhost

After this commit, an entry can also carry ssh options, for example:

-p 23229 name@localhost







[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62050/
Test FAILed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13704
  
**[Test build #62050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62050/consoleFull)** for PR 13704 at commit [`43ced15`](https://github.com/apache/spark/commit/43ced1576f31a202c1c514c4bfe28e0ad0d4c964).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PrimitiveArrayBenchmark extends BenchmarkBase `





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62049/
Test FAILed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14124
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14124
  
**[Test build #62049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62049/consoleFull)** for PR 14124 at commit [`a917678`](https://github.com/apache/spark/commit/a917678886779f236b1feffa23a11529ce67e97c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62048/
Test PASSed.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14014
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...

2016-07-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14014
  
**[Test build #62048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62048/consoleFull)** for PR 14014 at commit [`35b85ae`](https://github.com/apache/spark/commit/35b85ae16d8f0b777a954930ecb95ef6a4ec828d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14123
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62047/
Test PASSed.





[GitHub] spark issue #14123: [SPARK-16471] [SQL] Remove Hive-specific CreateHiveTable...

2016-07-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14123
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r70178966
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,21 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+    if (!limitExpr.foldable) {
+      failAnalysis(
+        "The argument to the LIMIT clause must evaluate to a constant value. " +
+          s"Limit:${limitExpr.sql}")
+    }
+    limitExpr.eval() match {
+      case o: Int if o >= 0 => // OK
+      case o: Int => failAnalysis(
+        s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
+      case o => failAnalysis(
+        s"""number_rows in limit clause cannot be cast to integer:\"$o\".""")
+    }
+  }
--- End diff --

`cannot be cast to integer` -> `must be integer`? e.g. byte is castable to 
int.
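The validation under discussion can be sketched as a standalone snippet. Note that `Expr`, `failAnalysis`, and `AnalysisFailure` below are simplified stand-ins invented for illustration, not Spark's actual Catalyst `Expression` or `CheckAnalysis` types:

```scala
// Sketch of LIMIT-clause validation: reject non-constant expressions,
// negative row counts, and non-integer row counts.
object LimitCheckSketch {
  // Stand-in for Spark's AnalysisException.
  final case class AnalysisFailure(msg: String) extends RuntimeException(msg)

  // Stand-in for Catalyst's Expression:
  //   foldable = evaluates to a constant; value = its evaluated result.
  final case class Expr(foldable: Boolean, value: Any, sql: String)

  private def failAnalysis(msg: String): Nothing = throw AnalysisFailure(msg)

  def checkLimitClause(limitExpr: Expr): Unit = {
    if (!limitExpr.foldable) {
      failAnalysis(
        "The argument to the LIMIT clause must evaluate to a constant value. " +
          s"Limit:${limitExpr.sql}")
    }
    limitExpr.value match {
      case o: Int if o >= 0 => // OK
      case o: Int =>
        failAnalysis(s"number_rows in limit clause must be equal to or greater than 0. number_rows:$o")
      case o =>
        failAnalysis(s"number_rows in limit clause must be integer: $o")
    }
  }
}
```

With this sketch, `checkLimitClause(Expr(foldable = true, 10, "10"))` passes, while a negative count, a non-Int value, or a non-foldable expression each raise an `AnalysisFailure`.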




