[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...

2015-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/9840#discussion_r4624
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -124,17 +124,46 @@ object ScalaReflection extends ScalaReflection {
   path: Option[Expression]): Expression = ScalaReflectionLock.synchronized {
 
 /** Returns the current path with a sub-field extracted. */
-def addToPath(part: String): Expression = path
-  .map(p => UnresolvedExtractValue(p, expressions.Literal(part)))
-  .getOrElse(UnresolvedAttribute(part))
+def addToPath(part: String, dataType: DataType): Expression = {
+  val newPath = path
+    .map(p => UnresolvedExtractValue(p, expressions.Literal(part)))
+    .getOrElse(UnresolvedAttribute(part))
+  castToExpectedType(newPath, dataType)
+}
 
 /** Returns the current path with a field at ordinal extracted. */
-def addToPathOrdinal(ordinal: Int, dataType: DataType): Expression = path
-  .map(p => GetInternalRowField(p, ordinal, dataType))
-  .getOrElse(BoundReference(ordinal, dataType, false))
+def addToPathOrdinal(ordinal: Int, dataType: DataType): Expression = {
+  val newPath = path
+    .map(p => GetStructField(p, new StructField("", dataType), ordinal))
+    .getOrElse(BoundReference(ordinal, dataType, false))
+  castToExpectedType(newPath, dataType)
+}
 
 /** Returns the current path or `BoundReference`. */
-def getPath: Expression = path.getOrElse(BoundReference(0, schemaFor(tpe).dataType, true))
+def getPath: Expression = {
+  val dataType = schemaFor(tpe).dataType
+  path.getOrElse(castToExpectedType(BoundReference(0, dataType, true), dataType))
+}
+
+/**
+ * When we build the `fromRowExpression` for an encoder, we set up a lot of "unresolved" stuff
+ * and lost the required data type, which may lead to runtime error if the real type doesn't
+ * match the encoder's schema.
+ * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type
+ * is [a: int, b: long], then we will hit runtime error and say that we can't construct class
+ * `Data` with int and long, because we lost the information that `b` should be a string.
+ *
+ * This method help us "remember" the require data type by adding a `Cast`.  Note that we don't
+ * need to add `Cast` for struct type because there must be `UnresolvedExtractValue` or
+ * `GetStructField` wrapping it.
+ *
+ * TODO: this only works if the real type is compatible with the encoder's schema, we should
+ * also handle error cases.
--- End diff --

When you say compatibility, do you mean something like type promotion? Have we 
defined such type promotion rules in Spark? 
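To make the question concrete, here is a toy model of the idea in the doc comment: the encoder remembers the type it expects for each field, and a cast is added only when the incoming type differs. The types `DType`, `Expr`, `Column` and `CastTo` are hypothetical stand-ins, not Catalyst's `DataType` or `Cast`, and the `isWidening` rule (numeric widening plus anything-to-string) is an assumed example of "compatible", not a description of Spark's actual rules.

```scala
// Toy model only: hypothetical stand-ins for Catalyst's DataType and Cast.
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object DoubleT extends DType
case object StringT extends DType

sealed trait Expr { def dataType: DType }
final case class Column(name: String, dataType: DType) extends Expr
final case class CastTo(child: Expr, dataType: DType) extends Expr

object TypeCompat {
  // Assumed promotion order among numeric types.
  private val numericRank: Map[DType, Int] = Map(IntT -> 0, LongT -> 1, DoubleT -> 2)

  // "Compatible" under this toy rule: same type, numeric widening,
  // or any type rendered as a string.
  def isWidening(from: DType, to: DType): Boolean =
    from == to || to == StringT ||
      (numericRank.contains(from) && numericRank.contains(to) &&
        numericRank(from) <= numericRank(to))

  // Analogue of castToExpectedType: wrap in a cast only when needed, and
  // report incompatible types up front instead of failing at construction time.
  def castToExpected(e: Expr, expected: DType): Either[String, Expr] =
    if (e.dataType == expected) Right(e)
    else if (isWidening(e.dataType, expected)) Right(CastTo(e, expected))
    else Left(s"cannot safely cast ${e.dataType} to $expected")
}
```

Under this rule, the doc comment's example (`b` declared `String` but stored as `long`) gets a cast, while a `long` arriving for an `Int` field is rejected early. Whether such narrowing should be allowed is exactly what the question above is asking.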


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9840#issuecomment-158735601
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9840#issuecomment-158735602
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46486/





[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9840#issuecomment-158735571
  
**[Test build #46486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46486/consoleFull)**
 for PR 9840 at commit 
[`8d6a6ff`](https://github.com/apache/spark/commit/8d6a6ffd1a048f3941bb6e5f36e3f84755fc9760).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * ` * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type`





[GitHub] spark pull request: [SPARK-11778][SQL]:add regression test

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9890#issuecomment-158716152
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11778][SQL]:add regression test

2015-11-21 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/9890

[SPARK-11778][SQL]:add regression test

Fix regression test for SPARK-11778.
 @marmbrus
Could you please take a look?
Thank you very much!!


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-11778-regression-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9890


commit cb04f55d79560393c454361807e60dbdb94640c4
Author: Huaxin Gao 
Date:   2015-11-22T03:14:02Z

[SPARK-11778][SQL]:add regression test







[GitHub] spark pull request: [WIP] [SPARK-11327] [MESOS] Dispatcher does no...

2015-11-21 Thread jayv
Github user jayv commented on the pull request:

https://github.com/apache/spark/pull/9752#issuecomment-158715518
  
I will get to it on Monday.

- Jo Voordeckers


On Sat, Nov 21, 2015 at 2:31 PM, Iulian Dragos 
wrote:

> @jayv  will you have time to update this PR?






[GitHub] spark pull request: [SPARK-11856][SQL] add type cast if the real t...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9840#issuecomment-158714817
  
**[Test build #46486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46486/consoleFull)**
 for PR 9840 at commit 
[`8d6a6ff`](https://github.com/apache/spark/commit/8d6a6ffd1a048f3941bb6e5f36e3f84755fc9760).





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158705760
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158705761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46485/





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158705743
  
**[Test build #46485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46485/consoleFull)**
 for PR 9889 at commit 
[`c135e1f`](https://github.com/apache/spark/commit/c135e1fefd9b621e8c073d71913cc3f45af7b308).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11881][SQL] Fix for postgresql fetchsiz...

2015-11-21 Thread mariusvniekerk
Github user mariusvniekerk commented on the pull request:

https://github.com/apache/spark/pull/9861#issuecomment-158701845
  
Not entirely sure why this causes NPEs in some of the unit 
tests...





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158698890
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158698891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46484/





[GitHub] spark pull request: [SPARK-11905] [SQL] Support Persist/Cache and ...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9889#issuecomment-158698821
  
**[Test build #46485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46485/consoleFull)**
 for PR 9889 at commit 
[`c135e1f`](https://github.com/apache/spark/commit/c135e1fefd9b621e8c073d71913cc3f45af7b308).





[GitHub] spark pull request: [SPARK-11905] Support Persist/Cache and Unpers...

2015-11-21 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/9889

[SPARK-11905] Support Persist/Cache and Unpersist in Dataset APIs

Persist and Unpersist exist in both the RDD and DataFrame APIs. I think they 
are just as important in the Dataset API. I'm not sure whether my implementation is 
acceptable, or whether you have a different plan for these functions. 

Please provide your opinions. @marmbrus @rxin @cloud-fan 

Thank you very much!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark persistDS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9889


commit f0616711a5721ae65e1db1954453a5f862aaa8c6
Author: gatorsmile 
Date:   2015-11-22T01:06:06Z

Support Persist/Cache and Unpersist in DataSet APIs

commit 88d5e9d3f779c11d60d123df0d218bd94fd21f0c
Author: gatorsmile 
Date:   2015-11-22T01:07:03Z

Merge remote-tracking branch 'upstream/master' into top







[GitHub] spark pull request: [SPARK-11837] [EC2] python3 compatibility for ...

2015-11-21 Thread mortada
Github user mortada commented on the pull request:

https://github.com/apache/spark/pull/9797#issuecomment-158695890
  
@JoshRosen Jenkins seemed to have failed again, but this PR should be good 
to go 





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158694089
  
Sorry about the failure, can we re-test please?





[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9888#issuecomment-158690246
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread dtolpin
GitHub user dtolpin opened a pull request:

https://github.com/apache/spark/pull/9888

[SPARK-11904] [PySpark] reduceByKeyAndWindow does not require checkpointing 
when invFunc is None

When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, 
slidesize)` is equivalent to

 reduceByKey(func).window(winsize, slidesize).reduceByKey(func)

and no checkpoint is necessary. The corresponding Scala code does exactly 
that, but the Python code always creates a windowed stream with obligatory 
checkpointing. The patch fixes this. 
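The equivalence can be checked on plain collections. The sketch below is a simulation, not the DStream or PySpark API: a stream is modeled as a list of micro-batches of `(key, value)` pairs, the window is counted in batches rather than durations, and it slides by one batch. `recompute` follows the `reduceByKey(func).window(...).reduceByKey(func)` path and carries no state between windows; `incremental` updates the previous window's result with an inverse function, and that carried-over result is the kind of state that needs checkpointing. All names here are illustrative.

```scala
// Simulation on Scala collections; window measured in batches, slide = 1 batch.
object WindowedReduce {
  type Batch = Seq[(String, Int)]

  // reduceByKey within a single batch.
  def reduceByKey(batch: Batch, f: (Int, Int) => Int): Map[String, Int] =
    batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Combine two per-key results with f.
  def merge(a: Map[String, Int], b: Map[String, Int], f: (Int, Int) => Int): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v)) }

  // invFunc = None: reduce each batch, then reduce the batches inside each window.
  // Every output depends only on the batches in its window; nothing is carried over.
  def recompute(batches: Seq[Batch], window: Int, f: (Int, Int) => Int): Seq[Map[String, Int]] = {
    val reduced = batches.map(reduceByKey(_, f))
    reduced.indices.map { i =>
      reduced.slice(math.max(0, i - window + 1), i + 1)
        .foldLeft(Map.empty[String, Int])(merge(_, _, f))
    }
  }

  // With an inverse function: add the entering batch and "subtract" the leaving one.
  // Each output is derived from the previous one, i.e. running state.
  def incremental(batches: Seq[Batch], window: Int,
                  f: (Int, Int) => Int, inv: (Int, Int) => Int): Seq[Map[String, Int]] = {
    val reduced = batches.map(reduceByKey(_, f))
    val states = reduced.indices.scanLeft(Map.empty[String, Int]) { (prev, i) =>
      val added = merge(prev, reduced(i), f)
      if (i >= window)
        reduced(i - window).foldLeft(added) { case (acc, (k, v)) => acc.updated(k, inv(acc(k), v)) }
      else added
    }
    // Drop the initial empty state, and keys whose value fell back to zero
    // (valid here only because the example uses positive counts with + and -).
    states.tail.map(_.filter(_._2 != 0))
  }
}
```

With addition as `func` and subtraction as the inverse, both functions produce the same per-window counts; the difference is that `incremental` needs the previous result to compute the next one, while `recompute` does not.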

I do not know how to unit-test this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dtolpin/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9888


commit 6730f72d2d9aa2c535abc9719e589369cc7b4cdb
Author: David Tolpin 
Date:   2015-11-21T23:22:31Z

invFunc=None does not require checkpointing

reduceByKeyAndWindow(func, None, window_size, slide_size) is equivalent to 
reduceByKey(func).window(window_size, slide_size).reduceByKey(func) and should 
not require checkpointing.







[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread dtolpin
Github user dtolpin closed the pull request at:

https://github.com/apache/spark/pull/9887





[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread dtolpin
Github user dtolpin commented on the pull request:

https://github.com/apache/spark/pull/9887#issuecomment-158689771
  
I did something wrong with rebasing, will remove and redo.





[GitHub] spark pull request: [MLLIB][SPARK-7615, SPARK-7617, SPARK-7618]: A...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6245#issuecomment-158689542
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158689503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46483/





[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158689502
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158689476
  
**[Test build #2095 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2095/consoleFull)**
 for PR 9543 at commit 
[`92cb677`](https://github.com/apache/spark/commit/92cb6779dbf15fdb59c7e00fbd2296fe79f1141d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158689474
  
**[Test build #46483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46483/consoleFull)**
 for PR 9264 at commit 
[`a8ba899`](https://github.com/apache/spark/commit/a8ba89977e7d749ada73228adb8744098309017f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `trait JobSubmitter `
   * `class ComplexFutureAction[T](run : JobSubmitter => Future[T])`





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158689288
  
**[Test build #2095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2095/consoleFull)**
 for PR 9543 at commit 
[`92cb677`](https://github.com/apache/spark/commit/92cb6779dbf15fdb59c7e00fbd2296fe79f1141d).





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9880





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9880#discussion_r45552122
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala ---
@@ -36,11 +37,13 @@ import org.apache.spark.sql.execution.QueryExecution
  * making this change to the class hierarchy would break some function signatures. As such, this
  * class should be considered a preview of the final API.  Changes will be made to the interface
  * after Spark 1.6.
+ *
+ * @since 1.6.0
  */
 @Experimental
-class GroupedDataset[K, T] private[sql](
+class GroupedDataset[K, V] private[sql](
 kEncoder: Encoder[K],
-tEncoder: Encoder[T],
+tEncoder: Encoder[V],
--- End diff --

Yeah, that's a good catch. Will fix it in my next PR.






[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9887#issuecomment-158688910
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11904] [PySpark] reduceByKeyAndWindow d...

2015-11-21 Thread dtolpin
GitHub user dtolpin opened a pull request:

https://github.com/apache/spark/pull/9887

[SPARK-11904] [PySpark] reduceByKeyAndWindow does not require checkpointing 
when invFunc is None

When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, 
slidesize)` is equivalent to

 reduceByKey(func).window(winsize, slidesize).reduceByKey(func)

and no checkpoint is necessary. The corresponding Scala code does exactly 
that, but the Python code always creates a windowed stream with obligatory 
checkpointing. The patch fixes this. 

I do not know how to unit-test this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dtolpin/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9887


commit 3f777e10abc68fc0c389c8bf55ad56c7c33ea095
Author: David Tolpin 
Date:   2015-11-17T20:37:21Z

invFunc=none work properly with python's reduceByKeyAndWindow

commit e76be0123aca50f055595fb6acb64b7defa981cf
Author: David Tolpin 
Date:   2015-11-19T11:44:12Z

added unit test for reduceByKeyAndWindow with invFunc=None

commit ba8baa9814fd104b107341037a2c1f6b33df0c16
Author: David Tolpin 
Date:   2015-11-21T22:49:21Z

reduceByKeyAndWindow with invFunc=None does require checkpointing

commit d54b880cc96462d418b2155db733881750d31365
Author: David Tolpin 
Date:   2015-11-21T22:50:50Z

merged with upstream







[GitHub] spark pull request: [SPARK-5337][Mesos][Standalone] respect spark....

2015-11-21 Thread dragos
Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/8610#discussion_r45551887
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -639,10 +640,11 @@ private[deploy] class Master(
 // in the queue, then the second app, etc.
 for (app <- waitingApps if app.coresLeft > 0) {
   val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
+  val coreNumPerTask = app.desc.coresPerTask
--- End diff --

nitpick: `coreNumPerTask` sounds weird. `coresPerTask` is what you've used 
everywhere else, and the name above is `coresPerExecutor`.





[GitHub] spark pull request: [WIP] [SPARK-11327] [MESOS] Dispatcher does no...

2015-11-21 Thread dragos
Github user dragos commented on the pull request:

https://github.com/apache/spark/pull/9752#issuecomment-158687388
  
@jayv will you have time to update this PR?






[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...

2015-11-21 Thread dragos
Github user dragos commented on the pull request:

https://github.com/apache/spark/pull/4027#issuecomment-158686949
  
@tnachen I think this trade-off has been discussed in [this 
comment](https://github.com/apache/spark/pull/4027#issuecomment-92553493) and 
the following three. Since there are so many comments, here's a summary:

- both standalone and Yarn are using a fixed number of executor cores, so 
it is more user-friendly to behave in the same way
- the downside is that some CPUs wouldn't be utilized this way (example: 10 
free cores, `spark.executor.cores = 3`, ==> 3 executors launched, 1 core not 
used)
- `spark.executor.cores` is optional, so when not set we can still grab all 
cores. Would a `max` value make sense here?
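
The arithmetic behind the second bullet can be written out as a short sketch. The function name `plan_fixed_executors` is hypothetical and this is not the Mesos backend's code; it only assumes that, with a fixed `spark.executor.cores`, every executor gets exactly that many cores and leftovers stay idle, while an unset value lets a single executor take every offered core.

```python
from typing import Optional, Tuple


def plan_fixed_executors(free_cores: int, cores_per_executor: Optional[int]) -> Tuple[int, int]:
    """Return (executors launched, cores left unused) for one resource offer.

    Hypothetical model of the trade-off discussed above:
    - cores_per_executor set  -> only whole executors of that size are launched
    - cores_per_executor None -> one executor grabs all free cores
    """
    if free_cores <= 0:
        return 0, 0
    if cores_per_executor is None:
        return 1, 0
    if cores_per_executor <= 0:
        raise ValueError("cores_per_executor must be positive")
    return divmod(free_cores, cores_per_executor)


print(plan_fixed_executors(10, 3))     # -> (3, 1): 3 executors, 1 idle core
print(plan_fixed_executors(10, None))  # -> (1, 0): spark.executor.cores not set
```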

I tend to agree with @pwendell and @andrewor14 but I don't want to push 
back if you guys discussed this previously and changed your minds (I just went 
through the whole thread again and I didn't find anything).

Still to do:

- [ ] decide on `spark.executor.cores` or having a max value instead
- [ ] one 
[comment](https://github.com/apache/spark/pull/4027#discussion_r27091956) that 
wasn't addressed, related to config names.
- [ ] I still need to try this on a real Mesos cluster, won't be able to do 
it before Monday.





[GitHub] spark pull request: [SPARK-6990] [Build] Add Java linting script; ...

2015-11-21 Thread dskrvk
Github user dskrvk commented on the pull request:

https://github.com/apache/spark/pull/9867#issuecomment-158683916
  
Added some more commits so that new changes are in line with the style 
guide.





[GitHub] spark pull request: [SPARK-6990] [Build] Add Java linting script; ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9867#issuecomment-158683705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46482/





[GitHub] spark pull request: [SPARK-6990] [Build] Add Java linting script; ...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9867#issuecomment-158683704
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-6990] [Build] Add Java linting script; ...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9867#issuecomment-158683645
  
**[Test build #46482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46482/consoleFull)**
 for PR 9867 at commit 
[`7a49ad7`](https://github.com/apache/spark/commit/7a49ad708d35bfd1ba7027bb07ae016b2057987c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public abstract static class PrefixComputer`
   * `abstract class Aggregator[-I, B, O] extends Serializable`





[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158682569
  
**[Test build #46483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46483/consoleFull)**
 for PR 9264 at commit 
[`a8ba899`](https://github.com/apache/spark/commit/a8ba89977e7d749ada73228adb8744098309017f).





[GitHub] spark pull request: [SPARK-11859][Mesos] SparkContext accepts inva...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9886#issuecomment-158680586
  
LGTM, though I tend to agree there's a little risk here in making something 
that shouldn't work actually not work.





[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-21 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-158680579
  
I've come up with a reusable way to make use of semaphores to control 
timing of tasks during unit tests. Please see the `Smuggle` class and let me 
know what you think.
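
For readers unfamiliar with the technique, here is a generic sketch of gating a task with a semaphore so that a test can observe it mid-flight. It uses plain Python `threading` and is not the `Smuggle` class from the PR, whose API is not shown here; `gated_task` and `observe_gated_task` are hypothetical names.

```python
import threading
from typing import List, Tuple


def gated_task(gate: threading.Semaphore, started: threading.Event, results: List[str]) -> None:
    """A task that signals it has started, then blocks until the test opens the gate."""
    started.set()
    gate.acquire()
    results.append("done")


def observe_gated_task(timeout: float = 5.0) -> Tuple[List[str], List[str]]:
    """Run the task and snapshot its results while it is blocked and after release."""
    gate = threading.Semaphore(0)  # zero permits: the task blocks on acquire()
    started = threading.Event()
    results: List[str] = []
    worker = threading.Thread(target=gated_task, args=(gate, started, results))
    worker.start()
    if not started.wait(timeout):
        raise RuntimeError("task never started")
    while_blocked = list(results)  # deterministic: the task cannot finish yet
    gate.release()                 # let exactly one blocked acquire() proceed
    worker.join(timeout)
    return while_blocked, list(results)


print(observe_gated_task())
```

The test, not the scheduler, decides when the task may finish, so assertions about the "still running" state do not depend on timing.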





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9880#issuecomment-158679126
  
LGTM aside from one minor variable naming nit.





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9880#discussion_r45550214
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedDataset.scala 
---
@@ -36,11 +37,13 @@ import org.apache.spark.sql.execution.QueryExecution
  * making this change to the class hierarchy would break some function 
signatures. As such, this
  * class should be considered a preview of the final API.  Changes will be 
made to the interface
  * after Spark 1.6.
+ *
+ * @since 1.6.0
  */
 @Experimental
-class GroupedDataset[K, T] private[sql](
+class GroupedDataset[K, V] private[sql](
 kEncoder: Encoder[K],
-tEncoder: Encoder[T],
+tEncoder: Encoder[V],
--- End diff --

Should this variable be renamed to `vEncoder`?





[GitHub] spark pull request: [SPARK-11628][SQL] support column datatype of ...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9612#issuecomment-158678513
  
@cloud-fan I have added a few tests per your suggestion. Do they look good 
to you?





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158678073
  
@rxin Thanks, Reynold! Somehow no test was triggered. Not sure why.





[GitHub] spark pull request: [SPARK-10864] [Web UI] app name is hidden if w...

2015-11-21 Thread ajbozarth
Github user ajbozarth commented on the pull request:

https://github.com/apache/spark/pull/9874#issuecomment-158673063
  
Here are a few screenshots. I included a before shot, an after shot at the same 
width as the before, an after shot just after the text wraps, and an after shot at the minimum width.

![new-min-width](https://cloud.githubusercontent.com/assets/13952758/11319993/836d56ec-9040-11e5-9418-b8d75362ab0d.png)

![new-just-wrapped](https://cloud.githubusercontent.com/assets/13952758/11319994/8389f946-9040-11e5-9218-d985837693d8.png)

![new-same-width](https://cloud.githubusercontent.com/assets/13952758/11319995/838aac10-9040-11e5-91af-639767d555f0.png)

![before](https://cloud.githubusercontent.com/assets/13952758/11319996/838b918e-9040-11e5-9dc0-4695965def20.png)









[GitHub] spark pull request: [SPARK-6990] [Build] Add Java linting script; ...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9867#issuecomment-158671998
  
**[Test build #46482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46482/consoleFull)**
 for PR 9867 at commit 
[`7a49ad7`](https://github.com/apache/spark/commit/7a49ad708d35bfd1ba7027bb07ae016b2057987c).





[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...

2015-11-21 Thread tnachen
Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/4027#issuecomment-158668110
  
@andrewor14 I've updated the patch now. Originally you suggested that I look 
at deploy/master.scala and try to reuse the same configurations, like 
spark.executor.cores. But in the end, spark.executor.cores refers to a fixed 
number of cores used to launch each Spark executor, whereas in this case we're 
trying to specify the maximum number of cores that can be used to launch your 
coarse-grained executor/worker. The Mesos scheduler will launch each executor 
with between 1 and the maximum number of cores, and will launch at most the 
"max executors per slave" number of executors per slave.

So I think having a spark.mesos.coarse.executor.cores.max or something 
similar still makes sense. What do you think?
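
The proposed behaviour can be modelled with a short sketch, assuming the scheduler greedily sizes each executor at `min(max cores, remaining cores)` and stops at the per-slave executor limit. The greedy order, the function name and both parameters are assumptions for illustration; `spark.mesos.coarse.executor.cores.max` is only a name proposed in this thread.

```python
from typing import List


def plan_capped_executors(offered_cores: int, max_cores_per_executor: int,
                          max_executors_per_slave: int) -> List[int]:
    """Return the core count of each executor launched from one slave's offer.

    Each executor gets between 1 and max_cores_per_executor cores, and at most
    max_executors_per_slave executors are launched on the slave.
    """
    if max_cores_per_executor < 1 or max_executors_per_slave < 0:
        raise ValueError("invalid limits")
    sizes: List[int] = []
    remaining = offered_cores
    while remaining >= 1 and len(sizes) < max_executors_per_slave:
        size = min(max_cores_per_executor, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(plan_capped_executors(10, 3, 4))  # last executor takes the leftover core
print(plan_capped_executors(10, 3, 2))  # executor limit reached first
```

Compared with a fixed per-executor size, the last executor here absorbs the leftover core instead of leaving it idle.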





[GitHub] spark pull request: [SPARK-11859][Mesos] SparkContext accepts inva...

2015-11-21 Thread dragos
Github user dragos commented on the pull request:

https://github.com/apache/spark/pull/9886#issuecomment-158667956
  
@andrewor14 I wonder if we shouldn't first warn about this, and defer the 
actual failure until 2.0. There might be people relying on this loophole. If I 
understand correctly, people could connect using `zk://` URLs if they didn't go 
through spark-submit (spark-shell or hard-coded). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158662488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46481/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158662487
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158662450
  
**[Test build #46481 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46481/consoleFull)**
 for PR 9885 at commit 
[`76943f8`](https://github.com/apache/spark/commit/76943f8eed0d0d8a236b4cb33f69b8128ccf5890).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class Aggregator[-I, B, O] extends Serializable`





[GitHub] spark pull request: [SPARK-9301] [SQL] Add collect_set and collect...

2015-11-21 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/9526#issuecomment-158662413
  
Thanks.
And when I compile with Hive, is there a way to do something like this?
"select id, collect_list(table.*) as data from table group by id"

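To make the intended result shape concrete, here is a plain-Python sketch of what such a query would produce: one entry per `id`, with all matching rows collected into a list (duplicates kept, as with `collect_list`) or with distinct values of one column (as with `collect_set`). It says nothing about whether Spark or Hive accepts `table.*` inside `collect_list`; `collect_rows` and `collect_set_values` are hypothetical helpers, and Spark does not guarantee the order of collected elements.

```python
from typing import Any, Dict, Hashable, List, Set


def collect_rows(rows: List[Dict[str, Any]], key: str) -> Dict[Hashable, List[Dict[str, Any]]]:
    """GROUP BY `key`, collecting every full row (duplicates kept)."""
    grouped: Dict[Hashable, List[Dict[str, Any]]] = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(dict(row))
    return grouped


def collect_set_values(rows: List[Dict[str, Any]], key: str, column: str) -> Dict[Hashable, Set[Any]]:
    """GROUP BY `key`, collecting the distinct values of one column."""
    grouped: Dict[Hashable, Set[Any]] = {}
    for row in rows:
        grouped.setdefault(row[key], set()).add(row[column])
    return grouped


table = [
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 1, "v": "a"},
]
print(collect_rows(table, "id"))
print(collect_set_values(table, "id", "v"))
```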




[GitHub] spark pull request: [SPARK-11880][Windows][Spark Submit] bin/load-...

2015-11-21 Thread toddwan
Github user toddwan commented on the pull request:

https://github.com/apache/spark/pull/9863#issuecomment-158661692
  
I guess users on the Windows platform seldom touch `spark-env.cmd`, and have 
plenty of workarounds if they run into this issue.





[GitHub] spark pull request: [SPARK-9301] [SQL] Add collect_set and collect...

2015-11-21 Thread nburoojy
Github user nburoojy commented on the pull request:

https://github.com/apache/spark/pull/9526#issuecomment-158661662
  
This is a wrapper around the Hive collect fns. Try compiling with `-Phive 
-Phive-thriftserver`





[GitHub] spark pull request: [SPARK-11859][Mesos] SparkContext accepts inva...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9886#issuecomment-158661212
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11859][Mesos] SparkContext accepts inva...

2015-11-21 Thread toddwan
GitHub user toddwan opened a pull request:

https://github.com/apache/spark/pull/9886

[SPARK-11859][Mesos] SparkContext accepts invalid Master URLs in the form 
zk://host:port for a multi-master Mesos cluster using ZooKeeper

* According to the doc below and the validation logic in 
[SparkSubmit.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L231),
 the master URL for a Mesos cluster should always start with `mesos://` 

http://spark.apache.org/docs/latest/running-on-mesos.html
`The Master URLs for Mesos are in the form mesos://host:5050 for a 
single-master Mesos cluster, or mesos://zk://host:2181 for a multi-master Mesos 
cluster using ZooKeeper.`


* However, 
[SparkContext.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2749)
 does not enforce this rule and accepts master URLs in the form `zk://host:port`

* For master URLs in the form `zk://host:port`, the valid form should be 
`mesos://zk://host:port`

* This PR restricts the validation in `SparkContext.scala`, so that only 
Mesos master URLs prefixed with `mesos://` are accepted.

* This PR also updates the corresponding unit test. 
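
The stricter rule can be sketched with a single regular expression. This is an illustration in Python, not the Scala code in `SparkContext.scala`; the pattern and the helper `parse_mesos_master` are assumptions that simply encode the rule quoted above: the URL must start with `mesos://`, and whatever follows (`host:port` or `zk://host:port`) is passed on.

```python
import re

# Hypothetical pattern: everything after the mandatory "mesos://" prefix is captured.
MESOS_MASTER = re.compile(r"^mesos://(.+)$")


def parse_mesos_master(url: str) -> str:
    """Return the part of a Mesos master URL after 'mesos://', or raise ValueError."""
    match = MESOS_MASTER.match(url)
    if match is None:
        raise ValueError(f"not a valid Mesos master URL: {url!r}")
    return match.group(1)


for candidate in ["mesos://host:5050", "mesos://zk://host:2181", "zk://host:2181"]:
    try:
        print(candidate, "->", parse_mesos_master(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```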

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/toddwan/spark S11859

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9886


commit c213acc1f5728a2cb8dc1cc68f83e13692c6eb5e
Author: toddwan 
Date:   2015-11-21T16:21:40Z

SparkContext accepts invalid Master URLs like 'zk://host:port' for a 
multi-master Mesos cluster using ZooKeeper







[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158659615
  
**[Test build #46481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46481/consoleFull)**
 for PR 9885 at commit 
[`76943f8`](https://github.com/apache/spark/commit/76943f8eed0d0d8a236b4cb33f69b8128ccf5890).





[GitHub] spark pull request: [SPARK-9026] Refactor SimpleFutureAction.onCom...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7385#issuecomment-158658397
  
**[Test build #2093 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2093/consoleFull)**
 for PR 7385 at commit 
[`17edbcd`](https://github.com/apache/spark/commit/17edbcd06086b6a8cad922b4c535eb2a6265b2e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/9885#discussion_r45547407
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -84,6 +84,8 @@ class VectorAssembler(override val uid: String)
 val numAttrs = 
group.numAttributes.getOrElse(first.getAs[Vector](index).size)
 Array.fill(numAttrs)(NumericAttribute.defaultAttr)
   }
+case otherType =>
--- End diff --

will do
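
The point of the new `case otherType =>` branch is that the type dispatch becomes total: an unsupported column type fails with a clear message instead of falling through the match (in Scala, a value matched by no case raises a `MatchError` at runtime). A Python sketch of the same idea follows; the type names, the helper `num_assembled_attributes` and the error text are assumptions, not VectorAssembler's actual code.

```python
from typing import Optional

# Assumed set of scalar types that each contribute one output slot.
NUMERIC_TYPES = {"double", "float", "int", "long", "short", "byte"}


def num_assembled_attributes(col_type: str, vector_size: Optional[int] = None) -> int:
    """Number of output slots contributed by one input column (hypothetical model)."""
    if col_type in NUMERIC_TYPES or col_type == "boolean":
        return 1
    if col_type == "vector":
        if vector_size is None:
            raise ValueError("vector size unknown: no metadata and no first row")
        return vector_size
    # Catch-all branch, analogous to `case otherType =>` in the diff.
    raise TypeError(f"unsupported input column type: {col_type}")


print(num_assembled_attributes("double"))
print(num_assembled_attributes("vector", 3))
```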





[GitHub] spark pull request: [SPARK-3580][CORE] Add Consistent Method To Ge...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9767#issuecomment-158652339
  
**[Test build #2094 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2094/consoleFull)**
 for PR 9767 at commit 
[`2324016`](https://github.com/apache/spark/commit/23240166994bee9f855472650d043e8015862e89).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11796] [test-maven] [WIP] Fixing httpcl...

2015-11-21 Thread markgrover
Github user markgrover commented on the pull request:

https://github.com/apache/spark/pull/9876#issuecomment-158651416
  
OK, I will do that. I am afk for some time but will take care of this when
I am back. Thanks.
On Nov 20, 2015 10:12 PM, "Josh Rosen"  wrote:

> My hunch is that you'll have to declare an explicit dependency on those
> drivers, since I think that there's a difference between Maven and SBT
> handling of test-jars' transitive dependencies. See sql/pom.xml for the
> drivers; if the versions are just hardcoded there, you might want to lift
> them up to the root pom's dependencyManagement when also declaring the
> dep. in the docker subproject.






[GitHub] spark pull request: [SPARK-3580][CORE] Add Consistent Method To Ge...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9767#issuecomment-158650509
  
**[Test build #2094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2094/consoleFull)**
 for PR 9767 at commit 
[`2324016`](https://github.com/apache/spark/commit/23240166994bee9f855472650d043e8015862e89).





[GitHub] spark pull request: [SPARK-3580][CORE] Add Consistent Method To Ge...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9767#issuecomment-158650446
  
I think this looks good.





[GitHub] spark pull request: [SPARK-11880][Windows][Spark Submit] bin/load-...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9863#issuecomment-158650306
  
**[Test build #2092 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2092/consoleFull)**
 for PR 9863 at commit 
[`094e3e3`](https://github.com/apache/spark/commit/094e3e3395d426f9deca81a576091a10391fae77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158649804
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46480/
Test PASSed.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158649802
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158649771
  
**[Test build #46480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46480/consoleFull)**
 for PR 9885 at commit 
[`1dde108`](https://github.com/apache/spark/commit/1dde1087fd6e02534724b0d4df560686eebcbcc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class Aggregator[-I, B, O] extends Serializable`





[GitHub] spark pull request: [SPARK-9301] [SQL] Add collect_set and collect...

2015-11-21 Thread maver1ck
Github user maver1ck commented on the pull request:

https://github.com/apache/spark/pull/9526#issuecomment-158649521
  
Hi,
How can I run this?
Spark 1.6.0-preview1
Compiled with:
mvn -e -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

I'm trying it like this:
from pyspark.sql.functions import collect_list
df.select(collect_list(df.client_id))

But I'm getting:
Py4JJavaError: An error occurred while calling o42.select.
: org.apache.spark.sql.AnalysisException: undefined function collect_list;
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65





[GitHub] spark pull request: [SPARK-11898] [MLlib] Use broadcast for the gl...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9878#issuecomment-158649425
  
I think this looks good.





[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158649259
  
Thanks. I will update this later.





[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158649170
  
Yeah, the point appears to be to match Hive's, so it has to be documented 
and tested as such. Otherwise a generic unspecified 'hash' function doesn't 
help much.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9885#discussion_r45546783
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -84,6 +84,8 @@ class VectorAssembler(override val uid: String)
 val numAttrs = group.numAttributes.getOrElse(first.getAs[Vector](index).size)
 Array.fill(numAttrs)(NumericAttribute.defaultAttr)
   }
+case otherType =>
--- End diff --

Seems OK to me. Nit: you can use `s"... $otherType"` interpolation.





[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2015-11-21 Thread josephlijia
Github user josephlijia commented on the pull request:

https://github.com/apache/spark/pull/1297#issuecomment-158647538
  
When we looked up a single key-value pair with IndexedRDD, we found that it was 
even slower than with an ordinary RDD. We used 100, keys in our experiment. When 
we tested it with IndexedRDDPartition, it was faster than an ordinary RDD. I look 
forward to your answer. Thank you.





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9885#issuecomment-158645412
  
**[Test build #46480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46480/consoleFull)**
 for PR 9885 at commit 
[`1dde108`](https://github.com/apache/spark/commit/1dde1087fd6e02534724b0d4df560686eebcbcc8).





[GitHub] spark pull request: [SPARK-11902] [ML] Unhandled case in VectorAss...

2015-11-21 Thread BenFradet
GitHub user BenFradet opened a pull request:

https://github.com/apache/spark/pull/9885

[SPARK-11902]  [ML] Unhandled case in VectorAssembler#transform

There is an unhandled case in the transform method of VectorAssembler: an input 
column may not have one of the supported types (DoubleType, NumericType, 
BooleanType or VectorUDT).

So if you try to transform a column of StringType, you get a cryptic 
"scala.MatchError: StringType".

This PR fixes this by throwing a SparkException when an unknown column type is 
encountered.
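To illustrate the kind of fix described above, here is a minimal, dependency-free sketch. It assumes a hypothetical `ColType` ADT standing in for Spark's `DataType`, and `IllegalArgumentException` standing in for `SparkException`; the error message wording is also my own, not the PR's. The point is only that a trailing catch-all case turns a `scala.MatchError` into an explicit, descriptive exception (using `s"..."` interpolation).

```scala
// Hypothetical, dependency-free stand-ins for Spark's column data types.
sealed trait ColType
case object DoubleCol extends ColType
case object BooleanCol extends ColType
case object VectorCol extends ColType
case object StringCol extends ColType

object AssemblerSketch {
  // Number of output attributes contributed by one input column.
  // IllegalArgumentException stands in for SparkException in this sketch.
  def numAttrs(colType: ColType, vectorSize: Int): Int = colType match {
    case DoubleCol | BooleanCol => 1
    case VectorCol => vectorSize
    // Catch-all: without it, StringCol would surface as a scala.MatchError.
    case otherType =>
      throw new IllegalArgumentException(s"Data type $otherType is not supported.")
  }
}
```

With this shape, `AssemblerSketch.numAttrs(StringCol, 0)` fails with a message naming the offending type instead of a bare `MatchError`.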

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BenFradet/spark SPARK-11902

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9885


commit 1dde1087fd6e02534724b0d4df560686eebcbcc8
Author: BenFradet 
Date:   2015-11-21T13:52:31Z

throwing exception when dealing with an unhandled column type in 
VectorAssembler







[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158641892
  
Hmm, do we need to make the hash function result consistent with Hive's?





[GitHub] spark pull request: [SPARK-11551][DOC][Example]Replace example cod...

2015-11-21 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9735#issuecomment-158640370
  
@somideshmukh Do you still have time on this? I can help if you are busy. 
Pls let me know. :)





[GitHub] spark pull request: [SPARK-9026] Refactor SimpleFutureAction.onCom...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7385#issuecomment-158639432
  
**[Test build #2093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2093/consoleFull)**
 for PR 7385 at commit 
[`17edbcd`](https://github.com/apache/spark/commit/17edbcd06086b6a8cad922b4c535eb2a6265b2e3).





[GitHub] spark pull request: [SPARK-4424] Remove spark.driver.allowMultiple...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9865#issuecomment-158638957
  
+1





[GitHub] spark pull request: [SPARK-11880][Windows][Spark Submit] bin/load-...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9863#issuecomment-158638902
  
**[Test build #2092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2092/consoleFull)**
 for PR 9863 at commit 
[`094e3e3`](https://github.com/apache/spark/commit/094e3e3395d426f9deca81a576091a10391fae77).





[GitHub] spark pull request: [SPARK-11880][Windows][Spark Submit] bin/load-...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9863#issuecomment-158637912
  
LGTM, though how did this ever work, then?





[GitHub] spark pull request: [SPARK-10864] [Web UI] app name is hidden if w...

2015-11-21 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9874#issuecomment-158636924
  
This is probably fine but yeah would be good to see screenshots. 





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9880#issuecomment-158634116
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9880#issuecomment-158634117
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46477/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11899][SQL] API audit for GroupedDatase...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9880#issuecomment-158634093
  
**[Test build #46477 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46477/consoleFull)**
 for PR 9880 at commit 
[`aacd652`](https://github.com/apache/spark/commit/aacd6524da1f6b7e2c99a5fb351086147ac545b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11889] [SQL] Fix type inference for Gro...

2015-11-21 Thread dragos
Github user dragos commented on a diff in the pull request:

https://github.com/apache/spark/pull/9870#discussion_r45545095
  
--- Diff: repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala ---
@@ -315,6 +315,30 @@ class ReplSuite extends SparkFunSuite {
 }
   }
 
+  test("Datasets agg type-inference") {
+val output = runInterpreter("local",
+  """
+|import org.apache.spark.sql.functions._
+|import org.apache.spark.sql.Encoder
+|import org.apache.spark.sql.expressions.Aggregator
+|import org.apache.spark.sql.TypedColumn
+|/** An `Aggregator` that adds up any numeric type returned by the given function. */
+|class SumOf[I, N : Numeric](f: I => N) extends Aggregator[I, N, N] with Serializable {
+|  val numeric = implicitly[Numeric[N]]
+|  override def zero: N = numeric.zero
+|  override def reduce(b: N, a: I): N = numeric.plus(b, f(a))
+|  override def merge(b1: N,b2: N): N = numeric.plus(b1, b2)
+|  override def finish(reduction: N): N = reduction
+|}
+|
+|def sum[I, N : Numeric : Encoder](f: I => N): TypedColumn[I, N] = new SumOf(f).toColumn
+|val ds = Seq((1, 1, 2L), (1, 2, 3L), (1, 3, 4L), (2, 1, 5L)).toDS()
+|ds.groupBy(_._1).agg(sum(_._2), sum(_._3)).collect()
--- End diff --
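For readers unfamiliar with the `zero`/`reduce`/`merge`/`finish` contract used by `SumOf` in the test above, here is a Spark-free sketch. `MiniAggregator` and `MiniAggregatorDemo` are hypothetical names that only mirror the method shapes of `org.apache.spark.sql.expressions.Aggregator`; there are no encoders, no serialization and no `toColumn`. The `run` helper roughly imitates how partial results are combined: fold each "partition" with `reduce`, then combine the partials with `merge`.

```scala
// Hypothetical stand-in mirroring the zero/reduce/merge/finish shape of
// Spark's Aggregator[-I, B, O]; not Spark's actual class.
trait MiniAggregator[-I, B, O] {
  def zero: B
  def reduce(b: B, a: I): B
  def merge(b1: B, b2: B): B
  def finish(reduction: B): O
}

// Same logic as the SumOf in the diff, minus Spark-specific parts.
class SumOf[I, N: Numeric](f: I => N) extends MiniAggregator[I, N, N] {
  private val numeric = implicitly[Numeric[N]]
  def zero: N = numeric.zero
  def reduce(b: N, a: I): N = numeric.plus(b, f(a))
  def merge(b1: N, b2: N): N = numeric.plus(b1, b2)
  def finish(reduction: N): N = reduction
}

object MiniAggregatorDemo {
  // Fold each partition with reduce, then combine partial results with merge.
  def run[I, B, O](agg: MiniAggregator[I, B, O], partitions: Seq[Seq[I]]): O = {
    val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
    agg.finish(partials.foldLeft(agg.zero)(agg.merge))
  }
}
```

For the rows with key `1` from the test data, split across two partitions, summing `_._2` gives `6` and summing `_._3` gives `9L`.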

I can't reproduce the difference. It won't infer it in a standalone program 
[either](https://gist.github.com/dragos/7c2d3ec962ee2e6862f3).

As I mentioned in our conversation, it's a chicken-and-egg problem: type 
inference is guided by the expected type, but if the method is overloaded, the 
expected type is not known, and the argument type is what guides overload 
resolution. It works in simple cases, when the overloads have different 
arities, but with varargs that's no longer the case.
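A minimal, Spark-free sketch of the effect described above. All names (`once`, `run`, `OverloadInferenceSketch`) are hypothetical. Both `run` overloads accept a single function argument (the second via varargs), so arity alone cannot select one, and the function literal `_ + 1` gets no expected type. The commented-out call is expected to be rejected with a "missing parameter type" error; exact behavior may vary across Scala versions. Annotating the lambda's parameter type is the usual workaround.

```scala
object OverloadInferenceSketch {
  // Not overloaded: the expected type Int => Int tells the compiler the
  // type of `_` in the function literal.
  def once(f: Int => Int): Int = f(1)

  // Overloaded pair: both alternatives accept one function argument
  // (the second through varargs), so arity does not disambiguate.
  def run(f: Int => Int): Int = f(1)
  def run(f: String => Int, extra: String*): Int = f("a" + extra.mkString)

  val a: Int = once(_ + 1)                  // compiles: parameter type inferred
  // val b: Int = run(_ + 1)                // expected error: missing parameter type
  val c: Int = run((x: Int) => x + 1)       // annotation selects the first overload
  val d: Int = run((s: String) => s.length) // annotation selects the varargs overload
}
```

This mirrors why `agg(sum(_._2), sum(_._3))` may need explicit types when `agg` has both fixed-arity and varargs overloads.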





[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9884#issuecomment-158627226
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9884#issuecomment-158627228
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46479/
Test PASSed.





[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9884#issuecomment-158626876
  
**[Test build #46479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46479/consoleFull)**
 for PR 9884 at commit 
[`a4b72a0`](https://github.com/apache/spark/commit/a4b72a0310372567509050845e77ffd517e13ce8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class RFormula(override val uid: String)`
   * `class VectorAttributeRewriterWriter(instance: VectorAttributeRewriter) extends MLWriter`





[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158625782
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46475/
Test FAILed.





[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158625781
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8233][SQL] Add misc function hash

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9883#issuecomment-158625768
  
**[Test build #46475 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46475/consoleFull)**
 for PR 9883 at commit 
[`afe626c`](https://github.com/apache/spark/commit/afe626158691ec500a0653a4907d281044bb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11871] Add save/load for MLPC

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9854#issuecomment-158624340
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46478/
Test PASSed.





[GitHub] spark pull request: [SPARK-11871] Add save/load for MLPC

2015-11-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9854#issuecomment-158624338
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11871] Add save/load for MLPC

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9854#issuecomment-158624215
  
**[Test build #46478 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46478/consoleFull)**
 for PR 9854 at commit 
[`881428f`](https://github.com/apache/spark/commit/881428f263b9c7ecdb353afd0c86a6512c45ac99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class MultilayerPerceptronClassificationModelWriter(`





[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9884#issuecomment-158620275
  
**[Test build #46479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46479/consoleFull)**
 for PR 9884 at commit 
[`a4b72a0`](https://github.com/apache/spark/commit/a4b72a0310372567509050845e77ffd517e13ce8).





[GitHub] spark pull request: [SPARK-11871] Add save/load for MLPC

2015-11-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9854#issuecomment-158617009
  
**[Test build #46478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46478/consoleFull)**
 for PR 9854 at commit 
[`881428f`](https://github.com/apache/spark/commit/881428f263b9c7ecdb353afd0c86a6512c45ac99).





[GitHub] spark pull request: [SPARK-11871] Add save/load for MLPC

2015-11-21 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9854#issuecomment-158616947
  
Please ignore my first comment; I have split the tests for the model and the 
classifier into separate tests.




