[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15358
  
**[Test build #66370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66370/consoleFull)**
 for PR 15358 at commit 
[`d3cc470`](https://github.com/apache/spark/commit/d3cc47025df10012940f281af5db94c90fc83917).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15358
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66370/
Test PASSed.





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15358
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15258
  
**[Test build #66374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66374/consoleFull)**
 for PR 15258 at commit 
[`01cb666`](https://github.com/apache/spark/commit/01cb6664ea9ea2da7bc861432c19e3ac14ede524).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15258
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66374/
Test FAILed.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15258
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14828: [SPARK-17258][SQL] Parse scientific decimal liter...

2016-10-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14828#discussion_r81912268
  
--- Diff: sql/core/src/test/resources/sql-tests/results/literals.sql.out ---
@@ -197,9 +197,9 @@ select .e3
 -- !query 20
 select 1E309, -1E309
 -- !query 20 schema
-struct
+struct<1E+309:decimal(1,-309),-1E+309:decimal(1,-309)>
--- End diff --

Can the scale of a decimal type be less than 0?
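For context, `java.math.BigDecimal` (which backs Spark's `Decimal`) does
permit a negative scale, meaning the unscaled value is multiplied by a
positive power of ten. A minimal Scala illustration:

    val d = new java.math.BigDecimal("1E+309")
    assert(d.precision == 1)  // one significant digit
    assert(d.scale == -309)   // value = unscaled (1) * 10^309

So `decimal(1,-309)` is representable at the BigDecimal level; whether
`DecimalType` should expose a negative scale is the open question here.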





[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15357
  
**[Test build #66371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66371/consoleFull)**
 for PR 15357 at commit 
[`45e46a9`](https://github.com/apache/spark/commit/45e46a969919c3fb184a3678764fa094054d223a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15357
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66371/
Test PASSed.





[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15357
  
Merged build finished. Test PASSed.





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #66380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66380/consoleFull)**
 for PR 11119 at commit 
[`95bf12f`](https://github.com/apache/spark/commit/95bf12f352f085587ff7772ffcd4ccdf9f7f084b).





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66372 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66372/consoleFull)**
 for PR 15354 at commit 
[`5f185e3`](https://github.com/apache/spark/commit/5f185e36aba86865e2cae772351e90fb8bec6492).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66372/
Test FAILed.





[GitHub] spark pull request #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMean...

2016-10-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15342#discussion_r81913935
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -558,6 +475,7 @@ object KMeans {
* Trains a k-means model using specified parameters and the default 
values for unspecified.
*/
   @Since("0.8.0")
+  @deprecated("Use train method without 'runs'", "2.1.0")
--- End diff --

Yes, though there's no alternative to those with the same arguments. We 
could add another overload and deprecate the others. I'm OK with that too; it 
just felt a little gross to add yet more.
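A new overload without `runs` could look roughly like this (a sketch only;
the exact signature to add was still under discussion):

    @Since("2.1.0")
    def train(
        data: RDD[Vector],
        k: Int,
        maxIterations: Int,
        initializationMode: String,
        seed: Long): KMeansModel = {
      new KMeans().setK(k)
        .setMaxIterations(maxIterations)
        .setInitializationMode(initializationMode)
        .setSeed(seed)
        .run(data)
    }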





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15342
  
That's right. `k` seems like the requested number of centroids, which may 
not match the actual number in corner cases. What about just documenting that 
more?
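For example, the param doc could be expanded along these lines (suggested
wording, not the merged text):

    /**
     * Number of clusters to create (k).
     *
     * @note It is possible for fewer than k clusters to be returned,
     * for example, if there are fewer than k distinct points to cluster.
     */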





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15342
  
Otherwise updated to reflect all the other review comments, thanks.





[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-10-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14531
  
IIUC, before 2.0 we used Hive to run CREATE TABLE LIKE, and Hive doesn't 
copy the table properties. So this PR actually fixes a regression introduced in 
2.0; I think we should keep this behaviour.

cc @rxin @yhuai 
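To make the behavior in question concrete, a hypothetical session:

    spark.sql("CREATE TABLE src (id INT) TBLPROPERTIES ('prop1' = 'value1')")
    spark.sql("CREATE TABLE dst LIKE src")
    // Should dst inherit 'prop1'? Per the discussion above: before 2.0,
    // Hive executed CREATE TABLE LIKE and did not copy table properties;
    // 2.0 started copying them.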





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15354
  
Interesting. Is it unicode-only for Python 3.4? I will look into this more deeply.





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15342
  
**[Test build #66381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66381/consoleFull)**
 for PR 15342 at commit 
[`ebbb852`](https://github.com/apache/spark/commit/ebbb852ad69f3a0a9c9facbd4a55ecce0eb86df0).





[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14087
  
**[Test build #66376 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66376/consoleFull)**
 for PR 14087 at commit 
[`ecdf653`](https://github.com/apache/spark/commit/ecdf6539c8c19da3f019601309993fde634d6c22).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66376/
Test PASSed.





[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14087
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14151
  
**[Test build #66375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66375/consoleFull)**
 for PR 14151 at commit 
[`e263b15`](https://github.com/apache/spark/commit/e263b1508a77424b371a0796ea4f9c05bc1c0121).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HadoopFileWholeTextReader(file: PartitionedFile, conf: 
Configuration)`





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14151
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14151
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66375/
Test PASSed.





[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-10-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14531
  
@cloud-fan are you sure Hive doesn't copy the table properties? How would 
@sitalkedia's case work if it does not copy?





[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15072
  
retest this please





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-10-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15358
  
Can you also put the after-the-fix behavior in the description?






[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15072
  
**[Test build #66382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66382/consoleFull)**
 for PR 15072 at commit 
[`e27fe51`](https://github.com/apache/spark/commit/e27fe5187818e34ed6b8279327f5dab90b663ec7).





[GitHub] spark pull request #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE an...

2016-10-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15358#discussion_r81918715
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -703,4 +703,17 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * Masking credentials in the option lists. For example, in the sql plan 
explain output
+   * for JDBC data sources.
+   */
+  def maskCredentials(options: Map[String, String]): Map[String, String] = 
{
+options.map {
--- End diff --

we should consolidate this code path with the one above. Otherwise the two 
will diverge over time. Perhaps add them to some utils class?
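One possible consolidation, as a sketch (the object name and the key list are
illustrative assumptions, not Spark's actual code):

    object CredentialMaskingUtils {
      // Option keys whose values should not be echoed in EXPLAIN/DESC
      // output (illustrative list).
      private val sensitiveKeys = Set("password", "user", "url")

      def maskCredentials(options: Map[String, String]): Map[String, String] =
        options.map {
          case (key, _) if sensitiveKeys.contains(key.toLowerCase) => key -> "###"
          case entry => entry
        }
    }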





[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15262
  
**[Test build #66373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66373/consoleFull)**
 for PR 15262 at commit 
[`3a1cd22`](https://github.com/apache/spark/commit/3a1cd221402f4ade6b496996b81665ad19ce3e86).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class AddTextHDFSFileData(content: String, src: Path, tmp: 
File, fs: FileSystem)`





[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15262
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15262
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66373/
Test PASSed.





[GitHub] spark pull request #15350: [SPARK-17778][Tests]Mock SparkContext to reduce m...

2016-10-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15350#discussion_r81919017
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -107,7 +107,8 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 rpcEnv = RpcEnv.create("test", "localhost", 0, conf, securityMgr)
 conf.set("spark.driver.port", rpcEnv.address.port.toString)
 
-sc = new SparkContext("local", "test", conf)
+sc = mock(classOf[SparkContext])
--- End diff --

might want to comment on the reason for this change too
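For instance, something like this would capture the motivation (a sketch
using Mockito's `mock`/`when`; exactly which methods need stubbing depends
on what the suite actually touches):

    import org.mockito.Mockito.{mock, when}

    // A mocked SparkContext avoids starting a real scheduler, UI, and
    // event loops for every test, which is what inflates memory usage;
    // the suite only needs a handle plus its conf.
    sc = mock(classOf[SparkContext])
    when(sc.getConf).thenReturn(conf)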





[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15072
  
I don't have a better idea either, so this LGTM

cc @liancheng do you have any ideas?





[GitHub] spark issue #15354: [SPARK-17764][SQL][WIP] Add `to_json` supporting to conv...

2016-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15354
  
Ah, this was a known issue 





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66383 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66383/consoleFull)**
 for PR 15354 at commit 
[`26fc01f`](https://github.com/apache/spark/commit/26fc01f5e8373133fdcf0dba951a6061fd65492b).





[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-10-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14531
  
I suspect @sitalkedia built his application against 2.0 and it broke in 
2.0.1. @sitalkedia, is that true?

@gatorsmile can you double check that Hive doesn't copy the table 
properties?





[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15360
  
**[Test build #66378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66378/consoleFull)**
 for PR 15360 at commit 
[`0ad7c88`](https://github.com/apache/spark/commit/0ad7c8837d0ef860e398349652f7589870358c14).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15360
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15360
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66378/
Test PASSed.





[GitHub] spark issue #15350: [SPARK-17778][Tests]Mock SparkContext to reduce memory u...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15350
  
Does this not defeat some of the purpose of testing, if it isn't using an 
actual SparkContext?





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #66380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66380/consoleFull)**
 for PR 11119 at commit 
[`95bf12f`](https://github.com/apache/spark/commit/95bf12f352f085587ff7772ffcd4ccdf9f7f084b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Merged build finished. Test PASSed.





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66380/
Test PASSed.





[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15351
  
**[Test build #66379 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66379/consoleFull)**
 for PR 15351 at commit 
[`5ced339`](https://github.com/apache/spark/commit/5ced339c1d6fd64c3bdfcb2af3522dc88ede8d85).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15351
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66379/
Test PASSed.





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15342
  
**[Test build #66381 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66381/consoleFull)**
 for PR 15342 at commit 
[`ebbb852`](https://github.com/apache/spark/commit/ebbb852ad69f3a0a9c9facbd4a55ecce0eb86df0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15351
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15342
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66381/
Test PASSed.





[GitHub] spark issue #15342: [SPARK-11560] [SPARK-3261] [MLLIB] Optimize KMeans imple...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15342
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15353
  
Hm, it does seem to me like this should be fixed at the source. I'm not 
sure when it would be desirable to render a newline as literally `\n` -- where 
is this escaped?

The rest of the changes are not bad but not related. I think touching up 
surrounding code is OK but this is touching unrelated code. Neutral on that.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/15258
  
retest this please.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/15258
  
Jenkins, retest this please





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15258
  
**[Test build #66384 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66384/consoleFull)**
 for PR 15258 at commit 
[`01cb666`](https://github.com/apache/spark/commit/01cb6664ea9ea2da7bc861432c19e3ac14ede524).





[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15355
  
Good point. I don't see a separate config for the 0.10 module in 
`dev/sparktestsupport/modules.py`, though there is one for the 0.8 module. That 
could be a better way forward.





[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-05 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15324
  
I'm against merging ```theta``` and ```sigma``` together; they should be two 
individual variables of the model. On the question of whether GaussianNB should 
be part of the ```NaiveBayes``` estimator, I'm ambivalent between the two 
opinions, but I'd prefer to add another variable ```sigma``` to the model. 
When it's a Gaussian NB model, both ```theta``` and ```sigma``` are effective; 
otherwise, only ```theta``` is valid. We keep a private constructor for all 
models, so adding another variable is not a big issue. I'm open to hearing 
others' thoughts. cc @jkbradley 
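Roughly the shape being proposed (an illustrative sketch, not the final API):

    import org.apache.spark.ml.linalg.Matrix

    class NaiveBayesModel private (
        val theta: Matrix,  // meaningful for all model types
        val sigma: Matrix)  // Gaussian variances; only meaningful when
                            // modelType == "gaussian"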





[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.

2016-10-05 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15359
  
LGTM





[GitHub] spark pull request #15332: [SPARK-10364][SQL] Support Parquet logical type T...

2016-10-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15332#discussion_r81931879
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 ---
@@ -362,7 +363,15 @@ private void readLongBatch(int rowId, int num, 
ColumnVector column) throws IOExc
 if (column.dataType() == DataTypes.LongType ||
 DecimalType.is64BitDecimalType(column.dataType())) {
   defColumn.readLongs(
-  num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
+  num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
--- End diff --

@dilipbiswal Per our offline discussion, I think you should add 
`TimestampType` support for `INT64` in `decodeDictionaryIds`. To test it, a 
test case mixing dictionary-encoded and non-dictionary-encoded values is 
needed.





[GitHub] spark pull request #15332: [SPARK-10364][SQL] Support Parquet logical type T...

2016-10-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15332#discussion_r81932694
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
 ---
@@ -362,7 +363,15 @@ private void readLongBatch(int rowId, int num, 
ColumnVector column) throws IOExc
 if (column.dataType() == DataTypes.LongType ||
 DecimalType.is64BitDecimalType(column.dataType())) {
   defColumn.readLongs(
-  num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
+  num, column, rowId, maxDefLevel, (VectorizedValuesReader) 
dataColumn);
--- End diff --

I've tested the following test case:

test("SPARK-10634 timestamp written and read as INT64 - 
TIMESTAMP_MILLIS") {
  val data = (1 to 1000).map { i =>
if (i < 500) {
  Row(new java.sql.Timestamp(10))
} else {
  Row(new java.sql.Timestamp(i))
}
  }
  val schema = StructType(List(StructField("time", TimestampType, 
false)).toArray)
  withSQLConf(ParquetOutputFormat.DICTIONARY_PAGE_SIZE -> "64",
  ParquetOutputFormat.PAGE_SIZE -> "128") {
withSQLConf(SQLConf.PARQUET_INT64_AS_TIMESTAMP_MILLIS.key -> 
"true") {
  withTempPath { file =>
val df = spark.createDataFrame(sparkContext.parallelize(data), 
schema)
df.coalesce(1).write.parquet(file.getCanonicalPath)
("true" :: Nil).foreach { vectorized =>
  withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> 
vectorized) {
val df2 = spark.read.parquet(file.getCanonicalPath)
checkAnswer(df2, df.collect().toSeq)
  }
}
  }
}
  }
}

It will cause an exception:

[info]  org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 
in stage 3.0 (TID 4, localhost): java.lang.UnsupportedOperationException: 
Unimplemented type: TimestampType
[info]  at 
org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:256)
[info]  at 
org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:177)
[info]  at 
org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)









[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81933097
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,239 @@
+---
+layout: global
+title: Structured Streaming + Kafka Integration Guide (Kafka broker 
version 0.10.0 or higher)
+---
+
+Structured Streaming integration for Kafka 0.10 to poll data from Kafka.
+
+### Linking
+For Scala/Java applications using SBT/Maven project definitions, link your 
application with the following artifact:
+
+groupId = org.apache.spark
+artifactId = spark-sql-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
+version = {{site.SPARK_VERSION_SHORT}}
+
+For Python applications, you need to add this above library and its 
dependencies when deploying your
+application. See the [Deploying](#deploying) subsection below.
+
+### Creating a Kafka Source Stream
+
+
+
+
+// Subscribe to 1 topic
+val ds1 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1")
+  .load()
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+// Subscribe to multiple topics
+val ds2 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1,topic2")
+  .load()
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+// Subscribe to a pattern
+val ds3 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribePattern", "topic.*")
+  .load()
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+
+
+
+// Subscribe to 1 topic
+Dataset<Row> ds1 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1")
+  .load();
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+// Subscribe to multiple topics
+Dataset<Row> ds2 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1,topic2")
+  .load();
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+// Subscribe to a pattern
+Dataset<Row> ds3 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribePattern", "topic.*")
+  .load();
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+
+
+
+# Subscribe to 1 topic
+ds1 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribe", "topic1") \
+  .load()
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+# Subscribe to multiple topics
+ds2 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribe", "topic1,topic2") \
+  .load()
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+# Subscribe to a pattern
+ds3 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribePattern", "topic.*") \
+  .load()
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+
+
+
+Each row in the source has the following schema:
+
+  Column         Type
+  -------------  ------
+  key            binary
+  value          binary
+  topic          string
+  partition      int
+  offset         long
+  timestamp      long
+  timestampType  int
+
+The following options should be set for the Kafka source.
--- End diff --

nit: should --> must be
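
As a concrete end-to-end check of the quoted guide, here is a minimal hedged sketch that wires such a source into a running query and prints each batch to the console (the broker address and topic name are illustrative assumptions, not values from the docs):

```scala
// Assumed values: a broker at localhost:9092 and an existing topic "topic1".
val query = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console") // print each micro-batch to stdout
  .start()

query.awaitTermination()
```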



[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81926721
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
 ---
@@ -0,0 +1,396 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.kafka.clients.consumer.{Consumer, KafkaConsumer}
+import 
org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.SparkContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.streaming._
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.UninterruptibleThread
+
+/**
+ * A [[Source]] that uses Kafka's own [[KafkaConsumer]] API to read data from Kafka. The design
+ * for this source is as follows.
+ *
+ * - The [[KafkaSourceOffset]] is the custom [[Offset]] defined for this source that contains
+ *   a map of TopicPartition -> offset. Note that this offset is 1 + (available offset). For
+ *   example if the last record in a Kafka topic "t", partition 2 is offset 5, then
+ *   KafkaSourceOffset will contain TopicPartition("t", 2) -> 6. This is done to keep it
+ *   consistent with the semantics of `KafkaConsumer.position()`.
+ *
+ * - The [[ConsumerStrategy]] class defines which Kafka topics and partitions should be read
+ *   by this source. These strategies directly correspond to the different consumption options
+ *   in . This class is designed to return a configured [[KafkaConsumer]] that is used by the
+ *   [[KafkaSource]] to query for the offsets. See the docs on
+ *   [[org.apache.spark.sql.kafka010.KafkaSource.ConsumerStrategy]] for more details.
+ *
+ * - The [[KafkaSource]] is written to do the following.
+ *
+ *   - As soon as the source is created, the pre-configured KafkaConsumer returned by the
+ *     [[ConsumerStrategy]] is used to query the initial offsets that this source should
+ *     start reading from. This is used to create the first batch.
+ *
+ *   - `getOffset()` uses the KafkaConsumer to query the latest available offsets, which are
+ *     returned as a [[KafkaSourceOffset]].
+ *
+ *   - `getBatch()` returns a DF that reads from the 'start offset' until the 'end offset'
+ *     for each partition. The end offset is excluded to be consistent with the semantics of
+ *     [[KafkaSourceOffset]] and `KafkaConsumer.position()`.
+ *
+ *   - The DF returned is based on [[KafkaSourceRDD]], which is constructed such that the
+ *     data from a Kafka topic + partition is consistently read by the same executors across
+ *     batches, and cached KafkaConsumers in the executors can be reused efficiently. See the
+ *     docs on [[KafkaSourceRDD]] for more details.
+ *
+ * Zero data loss is not guaranteed when topics are deleted. If zero data loss is critical,
+ * the user must make sure all messages in a topic have been processed when deleting a topic.
+ *
+ * There is a known issue caused by KAFKA-1894: a query using KafkaSource may fail to stop.
+ * To avoid this issue, make sure to stop the query before stopping the Kafka brokers, and
+ * do not use incorrect broker addresses.
+ */
+private[kafka010] case class KafkaSource(
+sqlContext: SQLContext,
+consumerStrategy: ConsumerStrategy,
+executorKafkaParams: ju.Map[String, Object],
+sourceOptions: Map[String, String],
+metadataPath: String,
+failOnDataLoss: Boolean)
+  extends Source with Logging {
+
+  private val sc = sqlContext.sparkContext
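
To make the offset convention in the quoted scaladoc concrete, a small hedged illustration (the values are made up):

```scala
import org.apache.kafka.common.TopicPartition

// If the last available record in topic "t", partition 2 is offset 5, then
// KafkaSourceOffset stores 6 (1 + available offset), matching
// KafkaConsumer.position(). A batch covers the half-open range [start, end).
val tp = new TopicPartition("t", 2)
val start = 3L                     // first offset included in the batch
val end = 6L                       // value recorded in KafkaSourceOffset
val offsetsRead = start until end  // reads offsets 3, 4, 5; offset 6 excluded
```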
   

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81930482
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
 ---

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81927094
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
 ---

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81932096
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala
 ---
@@ -0,0 +1,422 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.util.Random
+
+import org.apache.kafka.clients.producer.RecordMetadata
+import org.scalatest.BeforeAndAfter
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.execution.streaming._
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+
+abstract class KafkaSourceTest extends StreamTest with SharedSQLContext {
+
+  protected var testUtils: KafkaTestUtils = _
+
+  override val streamingTimeout = 30.seconds
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+testUtils = new KafkaTestUtils
+testUtils.setup()
+  }
+
+  override def afterAll(): Unit = {
+if (testUtils != null) {
+  testUtils.teardown()
+  testUtils = null
+  super.afterAll()
+}
+  }
+
+  protected def makeSureGetOffsetCalled = AssertOnQuery { q =>
+    // Because KafkaSource's initialPartitionOffsets is set lazily, we need to make sure
+    // its "getOffset" is called before pushing any data. Otherwise, because of the race
+    // condition, we don't know which data should be fetched when `startingOffset` is `latest`.
+q.processAllAvailable()
+true
+  }
+
+  /**
+   * Add data to Kafka.
+   *
+   * `topicAction` can be used to run actions for each topic before 
inserting data.
+   */
+  case class AddKafkaData(topics: Set[String], data: Int*)
+(implicit ensureDataInMultiplePartition: Boolean = false,
+  concurrent: Boolean = false,
+  message: String = "",
+  topicAction: (String, Option[Int]) => Unit = (_, _) => {}) extends 
AddData {
+
+override def addData(query: Option[StreamExecution]): (Source, Offset) 
= {
+  if (query.get.isActive) {
+// Make sure no Spark job is running when deleting a topic
+query.get.processAllAvailable()
+  }
+
+  val existingTopics = testUtils.getAllTopicsAndPartitionSize().toMap
+  val newTopics = topics.diff(existingTopics.keySet)
+  for (newTopic <- newTopics) {
+topicAction(newTopic, None)
+  }
+  for (existingTopicPartitions <- existingTopics) {
+topicAction(existingTopicPartitions._1, 
Some(existingTopicPartitions._2))
+  }
+
+      // Read all topics again in case some topics are deleted.
+  val allTopics = testUtils.getAllTopicsAndPartitionSize().toMap.keys
+  require(
+query.nonEmpty,
+"Cannot add data when there is no query for finding the active 
kafka source")
+
+  val sources = query.get.logicalPlan.collect {
+case StreamingExecutionRelation(source, _) if 
source.isInstanceOf[KafkaSource] =>
+  source.asInstanceOf[KafkaSource]
+  }
+  if (sources.isEmpty) {
+throw new Exception(
+  "Could not find Kafka source in the StreamExecution logical plan 
to add data to")
+  } else if (sources.size > 1) {
+throw new Exception(
+  "Could not select the Kafka source in the StreamExecution 
logical plan as there" +
+"are multiple Kafka sources:\n\t" + sources.mkString("\n\t"))
+  }
+  val kafkaSource = sources.head
+  val topic = topics.toSeq(Random.nextInt(topics.size))
+  val sentMetadata = testUtils.sendMessages(topic, data.map { 
_.toString }.toArray)
+
+      def metadataToStr(m: (String, RecordMetadata)): String = {
+        s"Sent ${m._1} to partition ${m._2.partition()}, offset ${m._2.offset()}"
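
For context, a hedged sketch of how this helper might be driven from a test body; `mapped` is a hypothetical Dataset derived from the Kafka source, and `testStream` and `CheckAnswer` come from the `StreamTest` base class:

```scala
// Hypothetical usage: names other than makeSureGetOffsetCalled and
// AddKafkaData are illustrative assumptions.
testStream(mapped)(
  makeSureGetOffsetCalled,              // force initial offsets to be resolved
  AddKafkaData(Set("topic1"), 1, 2, 3), // publish three records to Kafka
  CheckAnswer(2, 3, 4)                  // assuming `mapped` adds 1 to each value
)
```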
  

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81933155
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -0,0 +1,239 @@
+---
+layout: global
+title: Structured Streaming + Kafka Integration Guide (Kafka broker 
version 0.10.0 or higher)
+---
+
+Structured Streaming integration for Kafka 0.10 to poll data from Kafka.
+
+### Linking
+For Scala/Java applications using SBT/Maven project definitions, link your 
application with the following artifact:
+
+groupId = org.apache.spark
+artifactId = spark-sql-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
+version = {{site.SPARK_VERSION_SHORT}}
+
+For Python applications, you need to add the above library and its dependencies when
+deploying your application. See the [Deploying](#deploying) subsection below.
+
+### Creating a Kafka Source Stream
+
+
+
+
+// Subscribe to 1 topic
+val ds1 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1")
+  .load()
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+// Subscribe to multiple topics
+val ds2 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1,topic2")
+  .load()
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+// Subscribe to a pattern
+val ds3 = spark
+  .readStream
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribePattern", "topic.*")
+  .load()
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+  .as[(String, String)]
+
+
+
+
+// Subscribe to 1 topic
+Dataset<Row> ds1 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1")
+  .load();
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+// Subscribe to multiple topics
+Dataset<Row> ds2 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribe", "topic1,topic2")
+  .load();
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+// Subscribe to a pattern
+Dataset<Row> ds3 = spark
+  .readStream()
+  .format("kafka")
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
+  .option("subscribePattern", "topic.*")
+  .load();
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");
+
+
+
+
+# Subscribe to 1 topic
+ds1 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribe", "topic1") \
+  .load()
+ds1.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+# Subscribe to multiple topics
+ds2 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribe", "topic1,topic2") \
+  .load()
+ds2.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+# Subscribe to a pattern
+ds3 = spark \
+  .readStream \
+  .format("kafka") \
+  .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
+  .option("subscribePattern", "topic.*") \
+  .load()
+ds3.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
+
+
+
+
+Each row in the source has the following schema:
+
+  Column         Type
+  -------------  ------
+  key            binary
+  value          binary
+  topic          string
+  partition      int
+  offset         long
+  timestamp      long
+  timestampType  int
+
+
+The following options should be set for the Kafka source.
+
+
+  Option: subscribe
+  Value: a comma-separated list of topics
+  Meaning: the list of topics to subscribe to. Only one of "subscribe" and
+  "subscribePattern" can be specified for the Kafka source.
+
+  Option: subscribePattern
+  Value: a Java regex string
+  Meaning: the pattern used to subscribe to topics. Only one of "subscribe" and
+  "subscribePattern" can be specified for the Kafka source.
+
  

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81933327
  
--- Diff: docs/structured-streaming-kafka-integration.md ---

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-10-05 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r81930531
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
 ---

[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2016-10-05 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15324
  
I tend to make GaussianNB a special `modelType` option in the current NB. However, there are significant differences:
1. the 'theta' matrix is used to store means
2. an extra `sigma` matrix is needed to store variances
3. param `smoothing` has no effect, according to [sklearn's implementation](http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB).

To keep the current NB design, I think there are two ways to include GaussianNB:
1. merge `theta` and `sigma` together (you are all against this)
2. update `NaiveBayesModel` to include `sigma` as an extra matrix, where `sigma` is only meaningful when using GaussianNB

I'm open on this issue, and I will update this PR when we come to a conclusion.
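
For illustration, a hedged sketch of what option 2 could look like; this is a hypothetical shape, not the actual Spark API:

```scala
import org.apache.spark.ml.linalg.{Matrix, Vector}

// Option 2 sketch: `sigma` is populated only for the Gaussian model type,
// and `theta` then stores per-class means rather than log-probabilities.
class GaussianAwareNBModelSketch(
    val pi: Vector,             // log of class priors
    val theta: Matrix,          // means (gaussian) or log conditional probabilities
    val sigma: Option[Matrix])  // per-class variances; None for other model types
```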







[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15346
  
**[Test build #66385 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66385/consoleFull)**
 for PR 15346 at commit 
[`ba48a08`](https://github.com/apache/spark/commit/ba48a086c17c370385bbf5f03e9c8022324adda6).





[GitHub] spark issue #15341: [SPARK-17768] [CORE] Small (Sum,Count,Mean)Evaluator pro...

2016-10-05 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15341
  
Possibly @MLnick would have some thoughts on this one. I know there's a lot 
going on here but mostly it's deletion and comments. The math changes are 
probably best proven by the unit tests. I think the tests showed the current 
behavior is definitely off.

CCing @mateiz just because he was apparently the last person to touch this code, but that was on the initial import years ago.





[GitHub] spark pull request #15346: [SPARK-17741][SQL] Grammar to parse top level and...

2016-10-05 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15346#discussion_r81936312
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -593,6 +593,14 @@ colTypeList
 ;
 
 colType
+: identifier dataType (COMMENT STRING)?
+;
+
+complexColTypeList
+: complexColType (',' complexColType)*
+;
+
+complexColType
 : identifier ':'? dataType (COMMENT STRING)?
--- End diff --

Perhaps we should follow the Hive way:
```struct: STRUCT<col_name : col_type [COMMENT col_comment], ...>```
It makes the colon required, and also supports comments for struct cols.
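
If it helps, a hedged sketch of the colon syntax going through the Catalyst data-type parser (assuming the Spark 2.x `CatalystSqlParser` API; the type string itself is what this PR proposes to accept):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// With the proposed grammar, the colon form should parse to the same
// StructType as the colon-less form.
val dt = CatalystSqlParser.parseDataType("struct<col1: string, col2: int>")
assert(dt == new StructType().add("col1", StringType).add("col2", IntegerType))
```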





[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...

2016-10-05 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/15346
  
@hvanhovell Thank you for your suggestion! I've addressed your comments and amended the test cases. Thanks!





[GitHub] spark pull request #15346: [SPARK-17741][SQL] Grammar to parse top level and...

2016-10-05 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15346#discussion_r81937555
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
 ---
@@ -67,9 +86,133 @@ class SparkSqlParserSuite extends PlanTest {
   DescribeFunctionCommand(FunctionIdentifier("bar", database = None), 
isExtended = true))
 assertEqual("describe function foo.bar",
   DescribeFunctionCommand(
-FunctionIdentifier("bar", database = Option("foo")), isExtended = 
false))
+FunctionIdentifier("bar", database = Some("foo")), isExtended = 
false))
 assertEqual("describe function extended f.bar",
-  DescribeFunctionCommand(FunctionIdentifier("bar", database = 
Option("f")), isExtended = true))
+  DescribeFunctionCommand(FunctionIdentifier("bar", database = 
Some("f")), isExtended = true))
+  }
+
+  private def createTableUsing(
+  table: String,
+  database: Option[String] = None,
+  tableType: CatalogTableType = CatalogTableType.MANAGED,
+  storage: CatalogStorageFormat = CatalogStorageFormat.empty,
+  schema: StructType = new StructType,
+  provider: Option[String] = Some("parquet"),
+  partitionColumnNames: Seq[String] = Seq.empty,
+  bucketSpec: Option[BucketSpec] = None,
+  mode: SaveMode = SaveMode.ErrorIfExists,
+  query: Option[LogicalPlan] = None): CreateTable = {
+CreateTable(
+  CatalogTable(
+identifier = TableIdentifier(table, database),
+tableType = tableType,
+storage = storage,
+schema = schema,
+provider = provider,
+partitionColumnNames = partitionColumnNames,
+bucketSpec = bucketSpec
+  ), mode, query
+)
+  }
+
+  private def createTempViewUsing(
+  table: String,
+  database: Option[String] = None,
+  schema: Option[StructType] = None,
+  replace: Boolean = true,
+  provider: String = "parquet",
+  options: Map[String, String] = Map.empty): LogicalPlan = {
+CreateTempViewUsing(TableIdentifier(table, database), schema, replace, 
provider, options)
+  }
+
+  private def createTable(
+  table: String,
+  database: Option[String] = None,
+  tableType: CatalogTableType = CatalogTableType.MANAGED,
+  storage: CatalogStorageFormat = CatalogStorageFormat.empty.copy(
+inputFormat = HiveSerDe.sourceToSerDe("textfile").get.inputFormat,
+outputFormat = 
HiveSerDe.sourceToSerDe("textfile").get.outputFormat),
+  schema: StructType = new StructType,
+  provider: Option[String] = Some("hive"),
+  partitionColumnNames: Seq[String] = Seq.empty,
+  comment: Option[String] = None,
+  mode: SaveMode = SaveMode.ErrorIfExists,
+  query: Option[LogicalPlan] = None): CreateTable = {
+CreateTable(
+  CatalogTable(
+identifier = TableIdentifier(table, database),
+tableType = tableType,
+storage = storage,
+schema = schema,
+provider = provider,
+partitionColumnNames = partitionColumnNames,
+comment = comment
+  ), mode, query
+)
   }
 
+  test("create table - schema") {
+assertEqual("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING)",
+  createTable(
+table = "my_tab",
+schema = (new StructType)
+  .add("a", IntegerType, nullable = true, "test")
+  .add("b", StringType)
+  )
+)
+assertEqual("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING) " +
+  "PARTITIONED BY (c INT, d STRING COMMENT 'test2')",
+  createTable(
+table = "my_tab",
+schema = (new StructType)
+  .add("a", IntegerType, nullable = true, "test")
+  .add("b", StringType)
+  .add("c", IntegerType)
+  .add("d", StringType, nullable = true, "test2"),
+partitionColumnNames = Seq("c", "d")
+  )
+)
+assertEqual("CREATE TABLE my_tab(id BIGINT, nested STRUCT)",
+  createTable(
+table = "my_tab",
+schema = (new StructType)
+  .add("id", LongType)
+  .add("nested", (new StructType)
+.add("col1", StringType)
+.add("col2", IntegerType)
+  )
+  )
+)
+// Partitioning by a StructType should be accepted by `SparkSqlParser` but will fail
+// an analysis rule in `AnalyzeCreateTable`.
+assertEqual("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING) " +
+  "PARTITIONED BY (nested STRUCT<col1: STRING, col2: INT>)",
+  createTable(
+table = "my_tab",
+schema = (new

[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15072
  
**[Test build #66382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66382/consoleFull)**
 for PR 15072 at commit 
[`e27fe51`](https://github.com/apache/spark/commit/e27fe5187818e34ed6b8279327f5dab90b663ec7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15072
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15072: [SPARK-17123][SQL] Use type-widened encoder for DataFram...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15072
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66382/
Test PASSed.





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66383/consoleFull)**
 for PR 15354 at commit 
[`26fc01f`](https://github.com/apache/spark/commit/26fc01f5e8373133fdcf0dba951a6061fd65492b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15354
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66383/
Test PASSed.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15258
  
**[Test build #66384 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66384/consoleFull)**
 for PR 15258 at commit 
[`01cb666`](https://github.com/apache/spark/commit/01cb6664ea9ea2da7bc861432c19e3ac14ede524).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15258
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66384/
Test PASSed.





[GitHub] spark pull request #15361: [SPARK-17765][SQL] Support for writing out user-d...

2016-10-05 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/15361

[SPARK-17765][SQL] Support for writing out user-defined type in ORC 
datasource

## What changes were proposed in this pull request?

`OrcStruct` is being created based on the string from `DataType.catalogString`. For a user-defined type, it seems it returns `sqlType.simpleString` for `catalogString` by default[1]. However, during type dispatching to match the output with the schema, it tries to cast the type to, for example, `StructType`[2].

So, running the codes below (`MyDenseVector` was borrowed[3]) :

```scala
val data = Seq((1, new UDT.MyDenseVector(Array(0.25, 2.25, 4.25))))
val udtDF = data.toDF("id", "vectors")
udtDF.write.orc("/tmp/test.orc")
```

ends up throwing an exception as below:

```
java.lang.ClassCastException: org.apache.spark.sql.UDT$MyDenseVectorUDT 
cannot be cast to org.apache.spark.sql.types.ArrayType
at 
org.apache.spark.sql.hive.HiveInspectors$class.wrapperFor(HiveInspectors.scala:381)
at 
org.apache.spark.sql.hive.orc.OrcSerializer.wrapperFor(OrcFileFormat.scala:164)
...
```


[1]https://github.com/apache/spark/blob/dfdcab00c7b6200c22883baa3ebc5818be09556f/sql/catalyst/src/main/scala/org/apache/spark/sql/types/UserDefinedType.scala#L95

[2]https://github.com/apache/spark/blob/d2dc8c4a162834818190ffd82894522c524ca3e5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala#L326

[3]https://github.com/apache/spark/blob/2bfed1a0c5be7d0718fd574a4dad90f4f6b44be7/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala#L38-L70
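
One possible direction, as a hedged sketch rather than a description of this patch, is to unwrap a UDT to its underlying `sqlType` before the type dispatch in `wrapperFor`:

```scala
import org.apache.spark.sql.types.{DataType, UserDefinedType}

// Sketch: recursively replace a UDT with the sqlType it is stored as, so the
// existing StructType/ArrayType/... dispatch only ever sees built-in types.
def unwrapUDT(dataType: DataType): DataType = dataType match {
  case udt: UserDefinedType[_] => unwrapUDT(udt.sqlType)
  case other => other
}
```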

## How was this patch tested?

Unit tests in `OrcQuerySuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-17765

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15361


commit 948a5ca6204460bda0eeefc85ca326c626a707f8
Author: hyukjinkwon 
Date:   2016-10-05T10:27:54Z

Support for writing out user-defined type in ORC datasource







[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15361
  
@yhuai and @liancheng Would you mind reviewing this, please?





[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15361
  
**[Test build #66386 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66386/consoleFull)**
 for PR 15361 at commit 
[`948a5ca`](https://github.com/apache/spark/commit/948a5ca6204460bda0eeefc85ca326c626a707f8).





[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15346
  
**[Test build #66385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66385/consoleFull)**
 for PR 15346 at commit 
[`ba48a08`](https://github.com/apache/spark/commit/ba48a086c17c370385bbf5f03e9c8022324adda6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15346
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15346: [SPARK-17741][SQL] Grammar to parse top level and nested...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66385/



[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15361
  
**[Test build #66386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66386/consoleFull)** for PR 15361 at commit [`948a5ca`](https://github.com/apache/spark/commit/948a5ca6204460bda0eeefc85ca326c626a707f8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66386/



[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15361
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14918
  
**[Test build #66388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66388/consoleFull)** for PR 14918 at commit [`8b85d63`](https://github.com/apache/spark/commit/8b85d63e0498def276bdcb945986ce3776ec6531).
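
For readers following SPARK-17360: the patch itself is not shown in this thread, but the title points at `createDataFrame` accepting a Python generator directly. A minimal PySpark sketch of that usage, under that assumption (the column names and data here are made up for illustration):

```python
# Sketch only: assumes the patch lets createDataFrame consume a
# generator the same way it consumes a list of tuples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("generator-demo").getOrCreate()

def rows():
    # Lazily yield (id, value) tuples instead of building a list first.
    for i in range(3):
        yield (i, str(i))

df = spark.createDataFrame(rows(), ["id", "value"])
df.show()
```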



[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14963
  
**[Test build #66387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66387/consoleFull)** for PR 14963 at commit [`0ab6ef5`](https://github.com/apache/spark/commit/0ab6ef580f41af25df36a6a2470d3abde21e78c9).
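
The dev-script changes for SPARK-16992 are not shown here either, but the title suggests isolating `pylint` and `pep8` in a virtualenv so lint runs use pinned tools. A rough, hypothetical Python sketch of that idea (the env path and the `python/pyspark` target are assumptions, not the PR's actual code):

```python
# Hypothetical sketch: create a throwaway virtualenv, install the lint
# tools into it, and run the pinned pep8 against the PySpark sources.
import subprocess
import venv

ENV_DIR = ".lint-env"  # assumed location for the lint environment

venv.create(ENV_DIR, with_pip=True)
subprocess.check_call([ENV_DIR + "/bin/pip", "install", "pylint", "pep8"])
subprocess.check_call([ENV_DIR + "/bin/pep8", "python/pyspark"])
```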



[GitHub] spark issue #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and import...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14567
  
**[Test build #66389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66389/consoleFull)** for PR 14567 at commit [`4a3cde7`](https://github.com/apache/spark/commit/4a3cde78f52565d9ea6dee0cc34540702a2ce1fb).



[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14918
  
**[Test build #66388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66388/consoleFull)** for PR 14918 at commit [`8b85d63`](https://github.com/apache/spark/commit/8b85d63e0498def276bdcb945986ce3776ec6531).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14918
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14918
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66388/



[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-10-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14912
  
ping @cloud-fan @hvanhovell @srinathshankar Can you take a look?
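
For context on the feature being fixed (the diff is not part of this thread), a generic PySpark illustration of predicate pushdown; the Parquet path and the `year` column are made-up examples, not taken from the PR:

```python
# Generic illustration of predicate pushdown, not the PR's actual fix.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/tmp/events")  # hypothetical dataset
filtered = df.filter(df["year"] == 2016)

# With pushdown working, the physical plan lists the predicate under
# PushedFilters, so the Parquet reader skips non-matching data instead
# of Spark filtering every row after the scan.
filtered.explain(True)
```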

