[GitHub] spark pull request: [SPARK-6223][SQL] Fix build warning- enable im...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4948#issuecomment-77841961
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6223][SQL] Fix build warning- enable im...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4948#issuecomment-77855093
  
As I mentioned, I don't think it's efficient to try to make changes like
this one line at a time. There are a number of warnings like this, and other
build warnings in general, that we can resolve as one logical change. I have
already created a set of fixes for this and will open a PR.

https://github.com/apache/spark/pull/4908
https://github.com/apache/spark/pull/4900





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4950#discussion_r26036340
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -199,12 +199,12 @@ object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
     assert(formatVersion == thisFormatVersion)
     val rank = (metadata \ "rank").extract[Int]
     val userFeatures = sqlContext.parquetFile(userPath(path))
-      .map { case Row(id: Int, features: Seq[Double]) =>
-        (id, features.toArray)
+      .map { case Row(id: Int, features: Seq[_]) =>
+        (id, features.asInstanceOf[Seq[Double]].toArray)
--- End diff --

Strangely, this is how the Scala compiler wanted it. It doesn't like
matching on a type with generics, since type arguments are erased at runtime.
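
For readers following along, a minimal standalone sketch of the erasure issue, independent of Spark (the `sum` helper is purely illustrative):

```scala
object ErasureSketch extends App {
  def sum(xs: Any): Double = xs match {
    // "case ds: Seq[Double] => ds.sum" draws a compiler warning: the type
    // argument Double is unchecked, since it is eliminated by erasure and
    // the runtime check can only see Seq[_]. Hence the match-then-cast form:
    case ds: Seq[_] => ds.asInstanceOf[Seq[Double]].sum
    case _          => 0.0
  }

  println(sum(Seq(1.0, 2.0, 3.0))) // 6.0
}
```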





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4950#issuecomment-77856603
  
  [Test build #28392 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28392/consoleFull)
 for   PR 4950 at commit 
[`c67985b`](https://github.com/apache/spark/commit/c67985b01538a8e4ede806ce7e7b23af7a985a65).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6195] [SQL] Adds in-memory column type ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4938#issuecomment-77860204
  
  [Test build #28391 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28391/consoleFull)
 for   PR 4938 at commit 
[`e08ab5b`](https://github.com/apache/spark/commit/e08ab5bc376cd67b79bc3eb195ec2a4302df2e37).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6177][MLlib] LDA should check partition...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4899#discussion_r26038573
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala ---
@@ -174,6 +174,7 @@ object LDAExample {
 
 // Get dataset of document texts
 // One document per line in each text file.
+// One partition per text file. Consider using coalesce as necessary.
--- End diff --

As long as we're documenting this, let's edit this a bit more. It's not 
guaranteed that there will be a partition per text file. I'd say something more 
like:

If the input consists of many small files, this can result in a large 
number of small partitions, which can degrade performance. In this case, 
consider using coalesce() to create fewer, larger partitions.
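
A minimal sketch of that advice; the input path and the target partition count of 16 are illustrative, not from the example:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("CoalesceSketch").setMaster("local[*]"))
  // Reading many small files can produce many small partitions.
  val corpus = sc.textFile("/data/docs/*.txt")
  println(s"before: ${corpus.partitions.length} partitions")
  // coalesce() merges them into fewer, larger partitions without a shuffle.
  val compacted = corpus.coalesce(16)
  println(s"after: ${compacted.partitions.length} partitions")
  sc.stop()
}
```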





[GitHub] spark pull request: [EC2] [SPARK-6188] Instance types can be misla...

2015-03-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4916





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-09 Thread haiyangsea
Github user haiyangsea commented on a diff in the pull request:

https://github.com/apache/spark/pull/4929#discussion_r26027360
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -287,6 +282,20 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
   mapData.collect().take(1).map(Row.fromTuple).toSeq)
   }
 
+  test("CTE feature") {
+    checkAnswer(
+      sql("with q1 as (select * from testData limit 10) select * from q1"),
+      testData.take(10).toSeq)
+
+    checkAnswer(
+      sql("""
+        |with q1 as (select * from testData where key= '5'),
--- End diff --

Updated, thank you for your review.





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4950#discussion_r26036401
  
--- Diff: 
external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaRDDSuite.java
 ---
@@ -19,23 +19,19 @@
 
 import java.io.Serializable;
 import java.util.HashMap;
-import java.util.HashSet;
-import java.util.Arrays;
-
-import org.apache.spark.SparkConf;
 
 import scala.Tuple2;
 
-import junit.framework.Assert;
-
 import kafka.common.TopicAndPartition;
 import kafka.message.MessageAndMetadata;
 import kafka.serializer.StringDecoder;
 
+import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.Function;
 
+import org.junit.Assert;
--- End diff --

I did a little extra cleanup in these test files that wasn't, strictly
speaking, related to a warning, like importing the modern JUnit classes.





[GitHub] spark pull request: [SPARK-6224][SQL] Also collect NamedExpression...

2015-03-09 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/4949





[GitHub] spark pull request: [SPARK-6200] [SQL] Add a manager for dialects

2015-03-09 Thread haiyangsea
Github user haiyangsea commented on a diff in the pull request:

https://github.com/apache/spark/pull/4939#discussion_r26028022
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -208,6 +209,87 @@ case class DescribeCommand(
 }
 
 /**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class ShowDialectsCommand(
+    isExtended: Boolean,
+    isCurrent: Boolean) extends RunnableCommand {
--- End diff --

Updated, thank you for your review.





[GitHub] spark pull request: [SPARK-3830][MLlib] Implement genetic algorith...

2015-03-09 Thread epahomov
Github user epahomov closed the pull request at:

https://github.com/apache/spark/pull/2731





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77836711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28388/
Test PASSed.





[GitHub] spark pull request: [SPARK-3830][MLlib] Implement genetic algorith...

2015-03-09 Thread epahomov
Github user epahomov commented on the pull request:

https://github.com/apache/spark/pull/2731#issuecomment-77836797
  
My PR is too old for the current architecture, and I have already found a lot
to improve in it. I'll do better and resubmit.





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77836705
  
  [Test build #28388 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28388/consoleFull)
 for   PR 4947 at commit 
[`48ab7f9`](https://github.com/apache/spark/commit/48ab7f984c75bcb8bfa9eec6330c67d9592b356e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6195] [SQL] Adds in-memory column type ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4938#issuecomment-77848860
  
  [Test build #28391 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28391/consoleFull)
 for   PR 4938 at commit 
[`e08ab5b`](https://github.com/apache/spark/commit/e08ab5bc376cd67b79bc3eb195ec2a4302df2e37).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6224][SQL] Also collect NamedExpression...

2015-03-09 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/4949

[SPARK-6224][SQL] Also collect NamedExpressions in PhysicalOperation

Currently in `PhysicalOperation`, only `Alias` expressions are collected.
Similarly, other `NamedExpression`s can be collected for substitution.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 collect_namedexpr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4949


commit cee75657aa30239c094b2b7d7671815b4adac5eb
Author: Liang-Chi Hsieh vii...@gmail.com
Date:   2015-03-09T12:57:12Z

Also collect NamedExpressions in PhysicalOperation.







[GitHub] spark pull request: [SPARK-6224][SQL] Also collect NamedExpression...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4949#issuecomment-77848861
  
  [Test build #28390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28390/consoleFull)
 for   PR 4949 at commit 
[`cee7565`](https://github.com/apache/spark/commit/cee75657aa30239c094b2b7d7671815b4adac5eb).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6224][SQL] Also collect NamedExpression...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4949#issuecomment-77853437
  
  [Test build #28390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28390/consoleFull)
 for   PR 4949 at commit 
[`cee7565`](https://github.com/apache/spark/commit/cee75657aa30239c094b2b7d7671815b4adac5eb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/4950

SPARK-6225 [CORE] [SQL] [STREAMING] Resolve most build warnings, 1.3.0 
edition

Resolve javac, scalac warnings of various types -- deprecations, Scala 
lang, unchecked cast, etc.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-6225

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4950


commit c67985b01538a8e4ede806ce7e7b23af7a985a65
Author: Sean Owen so...@cloudera.com
Date:   2015-03-09T13:49:53Z

Resolve javac, scalac warnings of various types -- deprecations, Scala 
lang, unchecked cast, etc.







[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-77834974
  
  [Test build #28389 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28389/consoleFull)
 for   PR 4929 at commit 
[`0d56af4`](https://github.com/apache/spark/commit/0d56af4b80f0dc775cffcf400d882d5888ca717f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6223][SQL] Fix build warning- enable im...

2015-03-09 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/4948

[SPARK-6223][SQL] Fix build warning- enable implicit value 
scala.language.existentials visible



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark add_scala.language.existentials

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4948


commit 919ca8cda851efd6a35daaa8d4fb12dc22fdc749
Author: Vinod K C vinod...@huawei.com
Date:   2015-03-09T14:38:19Z

Fix Build warning- enable implicit value scala.language.existentials visible







[GitHub] spark pull request: [SPARK-6095] [MLLIB] Support model save/load i...

2015-03-09 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/4911#issuecomment-77854374
  
@mengxr Yes, it makes sense. I will try to implement the save/load operation
in Python to do the same thing as in Scala.





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-77844025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28389/
Test PASSed.





[GitHub] spark pull request: [SPARK-6199] [SQL] Support CTE in HiveContext ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4929#issuecomment-77844015
  
  [Test build #28389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28389/consoleFull)
 for   PR 4929 at commit 
[`0d56af4`](https://github.com/apache/spark/commit/0d56af4b80f0dc775cffcf400d882d5888ca717f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class With(child: LogicalPlan, subQueries: Map[String, Subquery]) extends UnaryNode`






[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4950#discussion_r26036283
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1104,7 +1104,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   if (!fs.exists(hadoopPath)) {
     throw new FileNotFoundException(s"Added file $hadoopPath does not exist.")
   }
-  val isDir = fs.isDirectory(hadoopPath)
--- End diff --

In case you're wondering: no, this wasn't one of those things deprecated in
Hadoop 2.x; it was deprecated as far back as 1.0.4!





[GitHub] spark pull request: [SPARK-6223][SQL] Fix build warning- enable im...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4948#discussion_r26036226
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.sources
 
 import scala.language.implicitConversions
+import scala.language.existentials
--- End diff --

I know it's trivial, but this is not even ordered correctly. This was a
change I included in a PR I was working on this weekend and just submitted:
https://github.com/apache/spark/pull/4948/files
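
For reference, a correctly ordered version, assuming the usual alphabetical ordering within an import group:

```scala
import scala.language.existentials
import scala.language.implicitConversions
```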





[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77861051
  
  [Test build #28393 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28393/consoleFull)
 for   PR 4951 at commit 
[`dce7055`](https://github.com/apache/spark/commit/dce70553cb0e5c25d1bb0a415929eb5066af964a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26033358
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

The original message (the available and requested sizes) in `KryoException`
is useful too. Would it be better to include the original message as well?





[GitHub] spark pull request: [SPARK-6195] [SQL] Adds in-memory column type ...

2015-03-09 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4938#discussion_r26034326
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala ---
@@ -107,24 +110,28 @@ private[sql] class GenericColumnAccessor(buffer: ByteBuffer)
   with NullableColumnAccessor
 
 private[sql] object ColumnAccessor {
-  def apply(buffer: ByteBuffer): ColumnAccessor = {
+  def apply(dataType: DataType, buffer: ByteBuffer): ColumnAccessor = {
 val dup = buffer.duplicate().order(ByteOrder.nativeOrder)
-    // The first 4 bytes in the buffer indicate the column type.
-    val columnTypeId = dup.getInt()
-
-    columnTypeId match {
-      case INT.typeId       => new IntColumnAccessor(dup)
-      case LONG.typeId      => new LongColumnAccessor(dup)
-      case FLOAT.typeId     => new FloatColumnAccessor(dup)
-      case DOUBLE.typeId    => new DoubleColumnAccessor(dup)
-      case BOOLEAN.typeId   => new BooleanColumnAccessor(dup)
-      case BYTE.typeId      => new ByteColumnAccessor(dup)
-      case SHORT.typeId     => new ShortColumnAccessor(dup)
-      case STRING.typeId    => new StringColumnAccessor(dup)
-      case DATE.typeId      => new DateColumnAccessor(dup)
-      case TIMESTAMP.typeId => new TimestampColumnAccessor(dup)
-      case BINARY.typeId    => new BinaryColumnAccessor(dup)
-      case GENERIC.typeId   => new GenericColumnAccessor(dup)
+
+    // The first 4 bytes in the buffer indicate the column type. This field is not used now,
+    // because we always know the data type of the column ahead of time.
+    dup.getInt()
--- End diff --

This call has a side effect; we still need to call it to consume the 4 bytes.
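
A standalone sketch of that side effect, with illustrative values:

```scala
import java.nio.ByteBuffer

object ByteBufferSketch extends App {
  // Write a 4-byte "type id" header followed by a payload int.
  val buf = ByteBuffer.allocate(8).putInt(42).putInt(7)
  buf.flip()
  buf.getInt() // value discarded, but the position still advances by 4 bytes
  println(buf.getInt()) // 7 -- subsequent reads correctly start after the header
}
```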





[GitHub] spark pull request: [SPARK-6224][SQL] Also collect NamedExpression...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4949#issuecomment-77853448
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28390/
Test FAILed.





[GitHub] spark pull request: [SPARK-6195] [SQL] Adds in-memory column type ...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4938#issuecomment-77860216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28391/
Test PASSed.





[GitHub] spark pull request: [SPARK-6195] [SQL] Adds in-memory column type ...

2015-03-09 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4938#discussion_r26034375
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala ---
@@ -107,24 +110,28 @@ private[sql] class GenericColumnAccessor(buffer: ByteBuffer)
   with NullableColumnAccessor
 
 private[sql] object ColumnAccessor {
-  def apply(buffer: ByteBuffer): ColumnAccessor = {
+  def apply(dataType: DataType, buffer: ByteBuffer): ColumnAccessor = {
 val dup = buffer.duplicate().order(ByteOrder.nativeOrder)
-    // The first 4 bytes in the buffer indicate the column type.
-    val columnTypeId = dup.getInt()
-
-    columnTypeId match {
-      case INT.typeId       => new IntColumnAccessor(dup)
-      case LONG.typeId      => new LongColumnAccessor(dup)
-      case FLOAT.typeId     => new FloatColumnAccessor(dup)
-      case DOUBLE.typeId    => new DoubleColumnAccessor(dup)
-      case BOOLEAN.typeId   => new BooleanColumnAccessor(dup)
-      case BYTE.typeId      => new ByteColumnAccessor(dup)
-      case SHORT.typeId     => new ShortColumnAccessor(dup)
-      case STRING.typeId    => new StringColumnAccessor(dup)
-      case DATE.typeId      => new DateColumnAccessor(dup)
-      case TIMESTAMP.typeId => new TimestampColumnAccessor(dup)
-      case BINARY.typeId    => new BinaryColumnAccessor(dup)
-      case GENERIC.typeId   => new GenericColumnAccessor(dup)
+
+    // The first 4 bytes in the buffer indicate the column type. This field is not used now,
+    // because we always know the data type of the column ahead of time.
+    dup.getInt()
--- End diff --

However, we can remove this line after removing the whole column type ID 
stuff.





[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread yinxusen
GitHub user yinxusen opened a pull request:

https://github.com/apache/spark/pull/4951

[SPARK-5986][MLLib] Add save/load for k-means

This PR adds save/load for K-means as described in SPARK-5986. Python 
version will be added in another PR.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yinxusen/spark SPARK-5986

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4951


commit dce70553cb0e5c25d1bb0a415929eb5066af964a
Author: Xusen Yin yinxu...@gmail.com
Date:   2015-03-09T14:12:59Z

add save/load for k-means for SPARK-5986







[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread levkhomich
Github user levkhomich commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26041766
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

Sure, you can check an example of the stack trace
[here](http://pastebin.com/VSb2gisk).





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77874158
  
  [Test build #28397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28397/consoleFull)
 for   PR 4947 at commit 
[`0f7a947`](https://github.com/apache/spark/commit/0f7a947ac9de8ef66511b78822809aa414cf3ea7).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6201] [SQL] promote string and do widen...

2015-03-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4945#discussion_r26044965
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -220,6 +220,22 @@ trait HiveTypeCoercion {
 b.makeCopy(Array(newLeft, newRight))
   }.getOrElse(b)  // If there is no applicable conversion, leave 
expression unchanged.
   }
+
+    // Also widen types for InExpressions.
+    case q: LogicalPlan => q transformExpressions {
+      // Skip nodes whose children have not been resolved yet.
+      case e if !e.childrenResolved => e
+
+      case i @ In(a, b) if b.exists(_.dataType != a.dataType) =>
+        b.map(_.dataType).foldLeft(None: Option[DataType])((r, c) => r match {
+          case None => Some(c)
+          case Some(dt) => findTightestCommonType(dt, c)
+        }) match {
+          // If there is no applicable conversion, leave expression unchanged.
+          case None => i.makeCopy(Array(a, b))
--- End diff --

Leave it as `i` instead of `i.makeCopy(..)`? Or throw an exception?
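
For reference, a standalone sketch of what the widening fold in the diff computes, with a toy `commonType` standing in for Catalyst's `findTightestCommonType`:

```scala
object WidenSketch extends App {
  sealed trait DT
  case object IntT extends DT
  case object LongT extends DT
  case object StrT extends DT

  // Toy stand-in for findTightestCommonType.
  def commonType(a: DT, b: DT): Option[DT] = (a, b) match {
    case (x, y) if x == y              => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case _                             => None
  }

  // Same fold shape as in the diff: a final None means "no applicable
  // conversion", which is the case under discussion here.
  def widest(types: Seq[DT]): Option[DT] =
    types.foldLeft(None: Option[DT]) {
      case (None, c)     => Some(c)
      case (Some(dt), c) => commonType(dt, c)
    }

  println(widest(Seq(IntT, LongT)))       // Some(LongT)
  println(widest(Seq(IntT, StrT, LongT))) // Some(LongT): a mid-fold None restarts the fold, as in the diff
}
```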





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread levkhomich
Github user levkhomich commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26038649
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

The original exception is preserved as the `cause`, so it is printed anyway.
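
A standalone sketch of why that works; the `WrappedException` class here is a hypothetical stand-in for `SparkException`:

```scala
object CauseSketch extends App {
  class WrappedException(msg: String, cause: Throwable) extends Exception(msg, cause)

  try {
    // Stand-in for the KryoException with its available/required-size details.
    throw new RuntimeException("Buffer overflow. Available: 0, required: 64")
  } catch {
    case e: RuntimeException =>
      // Passing e as the cause keeps its message in the stack trace,
      // printed as a "Caused by: ..." line below the wrapper.
      new WrappedException("Serialization failed: Kryo buffer overflow.", e)
        .printStackTrace()
  }
}
```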





[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77861968
  
  [Test build #28394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28394/consoleFull)
 for   PR 4951 at commit 
[`b144216`](https://github.com/apache/spark/commit/b144216f741776fdfe4c8e95d63650bd46c659d5).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4044 [CORE] Thriftserver fails to start ...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4873#issuecomment-77863414
  
Sorry to bug @pwendell again, but I think you may also be familiar with this
script. I went to the extreme and removed the check for Hive jars entirely.
Datanucleus goes on the classpath if it exists, full stop. This also resolves
the JAR issue. But is there a reason that's a bad idea? Like, if I didn't build
with Hive, but Datanucleus is lying around, does that cause a problem?





[GitHub] spark pull request: [SPARK-6198][SQL] Support select current_data...

2015-03-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4926#discussion_r26041674
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala 
---
@@ -179,7 +179,12 @@ private[hive] case class HiveGenericUdf(funcWrapper: HiveFunctionWrapper, childr
 })
   i += 1
 }
-    unwrap(function.evaluate(deferedObjects), returnInspector)
+
+    if (function.getUdfName().endsWith("UDFCurrentDB")) {
--- End diff --

Can you explain why you think returning a `null` is more reasonable than
executing the `UDFCurrentDB`? It seems it no longer throws an exception in
Hive 0.14:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-exec/0.14.0/org/apache/hadoop/hive/ql/udf/generic/UDFCurrentDB.java/





[GitHub] spark pull request: [SPARK-6201] [SQL] promote string and do widen...

2015-03-09 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4945#discussion_r26044586
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -269,6 +285,14 @@ trait HiveTypeCoercion {
         i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType))))
       case i @ In(a, b) if a.dataType == TimestampType && b.forall(_.dataType == DateType) =>
         i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType))))
+      case i @ In(a, b) if a.dataType == StringType
+          && b.exists(_.dataType.isInstanceOf[NumericType]) =>
+        i.makeCopy(Array(Cast(a, DoubleType), b))
+      case i @ In(a, b) if b.exists(_.dataType == StringType)
+          && a.dataType.isInstanceOf[NumericType] =>
+        i.makeCopy(Array(a, b.map(_.dataType match {
+          case StringType => Cast(a, DoubleType)
--- End diff --

Won't this cause a MatchError for non-string element types? Something like:
```scala
  case StringType => Cast(a, DoubleType)
  case x => x
```





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread levkhomich
Github user levkhomich commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26044200
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

@srowen @viirya I've squashed the corresponding change.





[GitHub] spark pull request: [SPARK-4734][Streaming]limit the file Dstream ...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3597#issuecomment-77874151
  
Mind closing this PR? I do not think this change is right for Spark.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-77879123
  
  [Test build #28395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28395/consoleFull)
 for   PR 4602 at commit 
[`7fa6e0d`](https://github.com/apache/spark/commit/7fa6e0d3e3cf83072e4dcf37fe24a89bdf0f8da1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Explode(child: Expression)`






[GitHub] spark pull request: SPARK-4044 [CORE] Thriftserver fails to start ...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4873#issuecomment-77880845
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28396/
Test PASSed.





[GitHub] spark pull request: [Build] SPARK-2614: (2nd patch) Create a spark...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1611#issuecomment-77875648
  
Mind closing this PR?





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4950#issuecomment-77875679
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28392/
Test PASSed.





[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/4805#issuecomment-77882744
  
As it stands now, no offsets are stored by Spark unless you're
checkpointing.  Does it really make sense to have an option to
automatically store offsets in Kafka, but not store offsets in the
checkpoint?  Failure recovery in that case depends on user-provided
starting offsets (or starting at the beginning / end of the log).  If
someone has the sophistication to get offsets from Kafka in order to
provide them as a starting point, they probably have the sophistication to
save offsets to Kafka themselves in the job.

If offsets are only being sent to Kafka when they are also stored in the
checkpoint, then does sending offsets to Kafka in compute() also make
sense?  Yes, you can lag behind, but those offsets are in the queue to get
processed at least once.

I'm not 100% sure of the answer to this; it's more a question of desired
behavior, but that's why I brought it up.



On Mon, Mar 9, 2015 at 12:14 AM, Saisai Shao notificati...@github.com
wrote:

 Hi @koeninger https://github.com/koeninger , would you please review
 this again? Thanks a lot; I appreciate your time.

 Here I still keep using the HashMap for the Time -> offset mapping;
 since checkpoint data will only be updated when checkpointing is enabled, I
 hope this can also work even without checkpointing enabled.

 And I still use StreamingListener to update the offset, for the reason
 mentioned before.

 Besides, I updated the configuration name; not sure if it is suitable.

 Thanks a lot.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/4805#issuecomment-77801344.







[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-77862472
  
@pwendell @rxin I'd like to merge this, and while I'm all but sure the API 
change question is OK, I'd feel better if a maintainer could give it a look.





[GitHub] spark pull request: [SPARK-3188][MLLIB]: Add Robust Regression Alg...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2110#issuecomment-77870526
  
I think this contribution may have timed out, along with
https://github.com/apache/spark/pull/2096 . They're probably good
implementations, but I am not clear on whether this will be taken forward to
be part of Spark. In any event, it doesn't merge cleanly and is not
necessarily written for the new ML pipelines API. Does anyone else have an
opinion on whether this should be closed out, or needs to be revived?





[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/4805#discussion_r26048829
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -84,6 +83,11 @@ class DirectKafkaInputDStream[
 
   protected var currentOffsets = fromOffsets
 
+  // Map to manage the time -> topic/partition+offset
+  private val offsetMap = new mutable.HashMap[Time, Map[TopicAndPartition, Long]]()
+  // Add to the listener bus for job completion hook
+  context.addStreamingListener(new DirectKafkaStreamingListener)
+
   @tailrec
   protected final def latestLeaderOffsets(retries: Int): 
Map[TopicAndPartition, LeaderOffset] = {
--- End diff --

Is there a reason to even add the streaming listener if the configuration 
option isn't turned on?  If the config option isn't on, couldn't you skip the 
listener and skip adding / removing items to the offset map altogether?
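
Something along these lines is what I have in mind (the config key name is
just illustrative, not an existing Spark setting):

```
// Only register the listener and do the offset bookkeeping when the
// feature is actually enabled.
private val commitOffsets =
  context.sparkContext.getConf.getBoolean("spark.streaming.kafka.commitOffsets", false)

if (commitOffsets) {
  context.addStreamingListener(new DirectKafkaStreamingListener)
}
```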





[GitHub] spark pull request: [SPARK-6191] [EC2] Generalize ability to downl...

2015-03-09 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/4919#issuecomment-77883455
  
Yeah, if @JoshRosen (who wrote the original `setup_boto()` function) can't 
take a look, maybe @shivaram can give this a look.





[GitHub] spark pull request: [SPARK-6191] [EC2] Generalize ability to downl...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4919#issuecomment-77869609
  
Obviously I'd like to get another actual active EC2 user to review this,
but the principle looks fine. This is refactoring the boto-specific mechanism
to be general, and at the moment it does not change behavior.





[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2096#issuecomment-77870482
  
I think this contribution may have timed out, along with
https://github.com/apache/spark/pull/2110 . They're probably good
implementations, but I am not clear whether this will be taken forward to be
part of Spark. In any event, it no longer merges cleanly and is not necessarily
written for the new ML pipelines API. Does anyone else have an opinion on
whether this should be closed out, or needs to be revived?





[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77878863
  
  [Test build #28394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28394/consoleFull)
 for   PR 4951 at commit 
[`b144216`](https://github.com/apache/spark/commit/b144216f741776fdfe4c8e95d63650bd46c659d5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable 
with Serializable `






[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77878875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28394/
Test PASSed.





[GitHub] spark pull request: SPARK-4044 [CORE] Thriftserver fails to start ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4873#issuecomment-77880824
  
  [Test build #28396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28396/consoleFull)
 for   PR 4873 at commit 
[`18b53a0`](https://github.com/apache/spark/commit/18b53a01cdaf471580497c81629625173194b62d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-4044 [CORE] Thriftserver fails to start ...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4873#issuecomment-77863875
  
  [Test build #28396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28396/consoleFull)
 for   PR 4873 at commit 
[`18b53a0`](https://github.com/apache/spark/commit/18b53a01cdaf471580497c81629625173194b62d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6223][SQL] Fix build warning- enable im...

2015-03-09 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/4948





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26041428
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: 
KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-kryo.writeClassAndObject(output, t)
+try {
+  kryo.writeClassAndObject(output, t)
+} catch {
+  case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+    throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

The cause stack trace / message would be printed by `printStackTrace`. It 
would not become part of the message from this new `SparkException`. Net-net I 
think it wouldn't hurt to just add additional info to the new `SparkException` 
message if it's deemed useful.
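
A quick stand-alone illustration of the distinction, e.g. in a spark-shell
(not the PR's code, just demonstrating `getMessage` vs. `printStackTrace`):

```
import com.esotericsoftware.kryo.KryoException
import org.apache.spark.SparkException

val cause = new KryoException("Buffer overflow. Available: 0, required: 4")
val wrapped = new SparkException("Serialization failed: Kryo buffer overflow", cause)

// getMessage returns only the wrapper's own message...
println(wrapped.getMessage)
// ...while printStackTrace also prints the cause in a "Caused by:" section.
wrapped.printStackTrace()
```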





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77874328
  
LGTM. I'll wait a bit longer for more comments.





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4950#issuecomment-77875660
  
  [Test build #28392 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28392/consoleFull)
 for   PR 4950 at commit 
[`c67985b`](https://github.com/apache/spark/commit/c67985b01538a8e4ede806ce7e7b23af7a985a65).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-77879139
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28395/
Test PASSed.





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-77862925
  
  [Test build #28395 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28395/consoleFull)
 for   PR 4602 at commit 
[`7fa6e0d`](https://github.com/apache/spark/commit/7fa6e0d3e3cf83072e4dcf37fe24a89bdf0f8da1).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-6198][SQL] Support select current_data...

2015-03-09 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4926#issuecomment-77866817
  
`SELECT 1` doesn't seem to work in Hive 0.12; it was probably introduced in
Hive 0.13. See: https://issues.apache.org/jira/browse/HIVE-4144





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4947#discussion_r26041118
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: 
KryoSerializer) extends Serializ
 
   override def serialize[T: ClassTag](t: T): ByteBuffer = {
 output.clear()
-kryo.writeClassAndObject(output, t)
+try {
+  kryo.writeClassAndObject(output, t)
+} catch {
+  case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+    throw new SparkException("Serialization failed: Kryo buffer overflow. To avoid this, " +
--- End diff --

But as the Exception's Constructor Detail 
(http://docs.oracle.com/javase/7/docs/api/java/lang/Exception.html#Exception(java.lang.String,%20java.lang.Throwable)
 states, 

 Note that the detail message associated with cause is not automatically 
incorporated in this exception's detail message.

Are we sure that it will be printed?







[GitHub] spark pull request: [Build] SPARK-3624: Failed to find Spark assem...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/2477#issuecomment-77875745
  
Mind closing this PR?





[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77877303
  
  [Test build #28393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28393/consoleFull)
 for   PR 4951 at commit 
[`dce7055`](https://github.com/apache/spark/commit/dce70553cb0e5c25d1bb0a415929eb5066af964a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable 
with Serializable `






[GitHub] spark pull request: [SPARK-5986][MLLib] Add save/load for k-means

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4951#issuecomment-77877321
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28393/
Test PASSed.





[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/4805#discussion_r26048624
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -118,6 +123,7 @@ class DirectKafkaInputDStream[
   context.sparkContext, kafkaParams, currentOffsets, untilOffsets, 
messageHandler)
 
     currentOffsets = untilOffsets.map(kv => kv._1 -> kv._2.offset)
+    offsetMap += ((validTime, currentOffsets))
--- End diff --

Don't all mutations of the offsetMap need to be synchronized?
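
i.e. something like the following, with the same lock guarding the removal
path in the listener (just a sketch):

```
// The job-generator thread writes here while the listener thread
// reads/removes entries, so both sides need the same lock.
offsetMap.synchronized {
  offsetMap += ((validTime, currentOffsets))
}
```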





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77892155
  
  [Test build #28397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28397/consoleFull)
 for   PR 4947 at commit 
[`0f7a947`](https://github.com/apache/spark/commit/0f7a947ac9de8ef66511b78822809aa414cf3ea7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics(JavaModelWrapper):`






[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method evaluat...

2015-03-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4906#discussion_r26056478
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -69,6 +74,42 @@ class GradientBoostedTrees(private val boostingStrategy: 
BoostingStrategy)
   case _ =>
     throw new IllegalArgumentException(s"$algo is not supported by the gradient boosting.")
 }
+baseLearners = fitGradientBoostingModel.trees
+baseLearnerWeights = fitGradientBoostingModel.treeWeights
+fitGradientBoostingModel
+  }
+
+  /**
+   * Method to compute error or loss for every iteration of gradient 
boosting.
+   * @param data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @param loss: evaluation metric that defaults to boostingStrategy.loss
+   * @return an array with index i having the losses or errors for the 
ensemble
+   * containing trees 1 to i + 1
+   */
+  def evaluateEachIteration(
--- End diff --

This method should be implemented in the model, not in the estimator.  
There's no need to make a duplicate of the model in the estimator class.  (We 
try to keep estimator classes stateless except for parameter values so that 
they remain lightweight types.)

This change will require a bit of refactoring, so I'll hold off on more 
comments until then.





[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-03-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r26061359
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/StatusJsonHandler.scala ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status
+
+import javax.servlet.http.{HttpServletResponse, HttpServlet, HttpServletRequest}
+
+import com.fasterxml.jackson.annotation.JsonInclude
+import com.fasterxml.jackson.databind.{SerializationFeature, ObjectMapper}
+import org.apache.spark.status.api.ApplicationInfo
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.ui.exec.ExecutorsJsonRoute
+import org.apache.spark.ui.jobs.{AllJobsJsonRoute, OneStageJsonRoute, AllStagesJsonRoute}
+import org.apache.spark.ui.storage.{AllRDDJsonRoute, RDDJsonRoute}
+import org.eclipse.jetty.servlet.{ServletHolder, ServletContextHandler}
+
+import scala.util.matching.Regex
+
+import org.apache.spark.{Logging, SecurityManager}
+import org.apache.spark.deploy.history.{OneApplicationJsonRoute, AllApplicationsJsonRoute}
+
+
+/**
+ * get the response for one endpoint in the json status api.
+ *
+ * Implementations only need to return the objects that are to be converted to json -- the framework
+ * will convert to json via jackson
+ */
+private[spark] trait StatusJsonRoute[T] {
+  def renderJson(request: HttpServletRequest): T
+}
+
+private[spark] class JsonRequestHandler(uiRoot: UIRoot, securityManager: SecurityManager) extends Logging {
+  def route(req: HttpServletRequest): Option[StatusJsonRoute[_]] = {
+    specs.collectFirst { case (pattern, route) if pattern.pattern.matcher(req.getPathInfo()).matches() =>
+      route
+    }
+  }
+
+  private val noSlash = "[^/]"
+
+  private val specs: IndexedSeq[(Regex, StatusJsonRoute[_])] = IndexedSeq(
+    "/applications/?".r -> new AllApplicationsJsonRoute(uiRoot),
+    s"/applications/$noSlash+/?".r -> new OneApplicationJsonRoute(uiRoot),
+    s"/applications/$noSlash+/jobs/?".r -> new AllJobsJsonRoute(this),
+    s"/applications/$noSlash+/executors/?".r -> new ExecutorsJsonRoute(this),
+    s"/applications/$noSlash+/stages/?".r -> new AllStagesJsonRoute(this),
+    s"/applications/$noSlash+/stages/$noSlash+/?".r -> new OneStageJsonRoute(this),
+    s"/applications/$noSlash+/storage/rdd/?".r -> new AllRDDJsonRoute(this),
+    s"/applications/$noSlash+/storage/rdd/$noSlash+/?".r -> new RDDJsonRoute(this)
+  )
+
+  private val jsonMapper = {
+    val t = new ObjectMapper()
+    t.registerModule(com.fasterxml.jackson.module.scala.DefaultScalaModule)
+    t.enable(SerializationFeature.INDENT_OUTPUT)
+    t.setSerializationInclusion(JsonInclude.Include.NON_NULL)
+    t
+  }
+
+  val jsonContextHandler = {
+
+    //TODO throw out all the JettyUtils stuff, so I can set the response status code, etc.
+    val servlet = new HttpServlet {
+      override def doGet(request: HttpServletRequest, response: HttpServletResponse) {
+        if (securityManager.checkUIViewPermissions(request.getRemoteUser)) {
+          response.setContentType("text/json;charset=utf-8")
+          route(request) match {
+            case Some(jsonRoute) =>
+              response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+              try {
+                val responseObj = jsonRoute.renderJson(request)
+                val result = jsonMapper.writeValueAsString(responseObj)
+                response.setStatus(HttpServletResponse.SC_OK)
+                response.getWriter.println(result)
+              } catch {
+                case iae: IllegalArgumentException =>
+                  response.setStatus(HttpServletResponse.SC_BAD_REQUEST)
+                  response.getOutputStream.print(iae.getMessage())
+              }
+            case None =>
+              println("no match for path: " + 

[GitHub] spark pull request: [Build] SPARK-2614: (2nd patch) Create a spark...

2015-03-09 Thread tzolov
Github user tzolov closed the pull request at:

https://github.com/apache/spark/pull/1611





[GitHub] spark pull request: [SPARK-6201] [SQL] promote string and do widen...

2015-03-09 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4945#discussion_r26054740
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -269,6 +285,14 @@ trait HiveTypeCoercion {
 i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType
   case i @ In(a, b) if a.dataType == TimestampType  
b.forall(_.dataType == DateType) =
 i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType
+  case i @ In(a, b) if a.dataType == StringType
+ b.exists(_.dataType.isInstanceOf[NumericType]) =
+i.makeCopy(Array(Cast(a, DoubleType), b))
+  case i @ In(a, b) if b.exists(_.dataType == StringType)
+ a.dataType.isInstanceOf[NumericType] =
+i.makeCopy(Array(a, b.map(_.dataType match{
+  case StringType = Cast(a, DoubleType)
--- End diff --

Same as above.





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77892679
  
Is this not needed for `serializeStream` as well?





[GitHub] spark pull request: [SPARK-5682] Reuse hadoop encrypted shuffle al...

2015-03-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4491#issuecomment-77904109
  
Hi @kellyzly ,

Renaming the PR sounds fine. But I see that the PR still has the old code. 
Are you planning on having the updated code up here soon? Otherwise, as @srowen 
suggests, we should close this, and you can open a new PR when you've addressed 
the issues with the current approach.





[GitHub] spark pull request: [Build] SPARK-3624: Failed to find Spark assem...

2015-03-09 Thread tzolov
Github user tzolov commented on the pull request:

https://github.com/apache/spark/pull/2477#issuecomment-77895022
  
I'm closing this PR as this functionality is deprecated.





[GitHub] spark pull request: [Build] SPARK-2614: (2nd patch) Create a spark...

2015-03-09 Thread tzolov
Github user tzolov commented on the pull request:

https://github.com/apache/spark/pull/1611#issuecomment-77895099
  
I'm closing this PR as this functionality is deprecated.





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-03-09 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4634#discussion_r26056981
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala 
---
@@ -233,18 +235,44 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
   def combineByKey[C](createCombiner: JFunction[V, C],
 mergeValue: JFunction2[C, V, C],
 mergeCombiners: JFunction2[C, C, C],
-partitioner: Partitioner): JavaPairRDD[K, C] = {
+partitioner: Partitioner,
+mapSideCombine: Boolean,
+serializer: Serializer): JavaPairRDD[K, C] = {
--- End diff --

Looks OK. It would be better to add `serializer` to the doc if possible.

Also, style-wise, can you indent the function parameters 4 spaces?





[GitHub] spark pull request: [SPARK-6186] [EC2] Make Tachyon version config...

2015-03-09 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/4901#discussion_r26060596
  
--- Diff: ec2/spark_ec2.py ---
@@ -872,9 +886,13 @@ def deploy_files(conn, root_dir, opts, master_nodes, 
slave_nodes, modules):
 if . in opts.spark_version:
 # Pre-built Spark deploy
 spark_v = get_validate_spark_version(opts.spark_version, 
opts.spark_git_repo)
+tachyon_v = get_tachyon_version(spark_v)
 else:
 # Spark-only custom deploy
 spark_v = %s|%s % (opts.spark_git_repo, opts.spark_version)
+tachyon_v = 
+print Deploy spark via git hash, Tachyon won't be set up
--- End diff --

`Deploy spark` -> `Deploying Spark`





[GitHub] spark pull request: [SPARK-6186] [EC2] Make Tachyon version config...

2015-03-09 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/4901#issuecomment-77911329
  
Thanks @uronce-cc -- the change looks good to me but for the minor comment
inline.
@nchammas -- any other comments?





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-03-09 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4634#discussion_r26056992
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala 
---
@@ -233,18 +235,44 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
   def combineByKey[C](createCombiner: JFunction[V, C],
 mergeValue: JFunction2[C, V, C],
 mergeCombiners: JFunction2[C, C, C],
-partitioner: Partitioner): JavaPairRDD[K, C] = {
+partitioner: Partitioner,
+mapSideCombine: Boolean,
+serializer: Serializer): JavaPairRDD[K, C] = {
 implicit val ctag: ClassTag[C] = fakeClassTag
 fromRDD(rdd.combineByKey(
   createCombiner,
   mergeValue,
   mergeCombiners,
-  partitioner
+  partitioner,
+  mapSideCombine,
+  serializer
 ))
   }
 
   /**
-   * Simplified version of combineByKey that hash-partitions the output RDD.
+   * Generic function to combine the elements for each key using a custom set of aggregation
+   * functions. Turns a JavaPairRDD[(K, V)] into a result of type JavaPairRDD[(K, C)], for a
+   * "combined type" C. Note that V and C can be different -- for example, one might group an
+   * RDD of type (Int, Int) into an RDD of type (Int, List[Int]). Users provide three
+   * functions:
+   *
+   * - `createCombiner`, which turns a V into a C (e.g., creates a one-element list)
+   * - `mergeValue`, to merge a V into a C (e.g., adds it to the end of a list)
+   * - `mergeCombiners`, to combine two C's into a single one.
+   *
+   * In addition, users can control the partitioning of the output RDD. This method automatically
+   * uses map-side aggregation in shuffling the RDD.
+   */
+  def combineByKey[C](createCombiner: JFunction[V, C],
+mergeValue: JFunction2[C, V, C],
--- End diff --

4 space indent here





[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-03-09 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4142#issuecomment-77899873
  
Ping.





[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-03-09 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-77903115
  
Serializer seems ok to add.

One thing I am not sure about is the mapSideCombine thing -- I'm never a 
fan of that parameter even though I added it myself, for the following reasons:

1. mapSideCombine is a MR term used in Hive that doesn't mean much outside 
of MR. A more proper name is partialAggregation.
2. The underlying implementation should be able to avoid partial
aggregation if it finds that partial aggregation is expensive (i.e. after
trying an initial batch of records, check whether the hash table size is less
than a specific threshold). It is one of the things we can easily auto-tune;
see the sketch below.
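
To sketch the second point (the sample size and threshold here are purely
illustrative, not proposed defaults):

```
// Try combining an initial sample of records; if the combiner map is not
// actually shrinking the data, skip map-side work for the rest.
def shouldPartiallyAggregate[K, V](sample: Seq[(K, V)], maxRatio: Double = 0.9): Boolean = {
  val distinctKeys = sample.map(_._1).distinct.size
  distinctKeys.toDouble / sample.size < maxRatio
}
```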






[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-03-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r26061003
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/StatusJsonHandler.scala ---
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status
+
+import javax.servlet.http.{HttpServletResponse, HttpServlet, HttpServletRequest}
+
+import com.fasterxml.jackson.annotation.JsonInclude
+import com.fasterxml.jackson.databind.{SerializationFeature, ObjectMapper}
+import org.apache.spark.status.api.ApplicationInfo
+import org.apache.spark.ui.SparkUI
+import org.apache.spark.ui.exec.ExecutorsJsonRoute
+import org.apache.spark.ui.jobs.{AllJobsJsonRoute, OneStageJsonRoute, AllStagesJsonRoute}
+import org.apache.spark.ui.storage.{AllRDDJsonRoute, RDDJsonRoute}
+import org.eclipse.jetty.servlet.{ServletHolder, ServletContextHandler}
+
+import scala.util.matching.Regex
+
+import org.apache.spark.{Logging, SecurityManager}
+import org.apache.spark.deploy.history.{OneApplicationJsonRoute, AllApplicationsJsonRoute}
+
+
+/**
+ * get the response for one endpoint in the json status api.
+ *
+ * Implementations only need to return the objects that are to be converted to json -- the framework
+ * will convert to json via jackson
+ */
+private[spark] trait StatusJsonRoute[T] {
+  def renderJson(request: HttpServletRequest): T
+}
+
+private[spark] class JsonRequestHandler(uiRoot: UIRoot, securityManager: SecurityManager) extends Logging {
+  def route(req: HttpServletRequest): Option[StatusJsonRoute[_]] = {
+    specs.collectFirst { case (pattern, route) if pattern.pattern.matcher(req.getPathInfo()).matches() =>
+      route
+    }
+  }
+
+  private val noSlash = "[^/]"
+
+  private val specs: IndexedSeq[(Regex, StatusJsonRoute[_])] = IndexedSeq(
+    "/applications/?".r -> new AllApplicationsJsonRoute(uiRoot),
+    s"/applications/$noSlash+/?".r -> new OneApplicationJsonRoute(uiRoot),
+    s"/applications/$noSlash+/jobs/?".r -> new AllJobsJsonRoute(this),
+    s"/applications/$noSlash+/executors/?".r -> new ExecutorsJsonRoute(this),
+    s"/applications/$noSlash+/stages/?".r -> new AllStagesJsonRoute(this),
+    s"/applications/$noSlash+/stages/$noSlash+/?".r -> new OneStageJsonRoute(this),
+    s"/applications/$noSlash+/storage/rdd/?".r -> new AllRDDJsonRoute(this),
+    s"/applications/$noSlash+/storage/rdd/$noSlash+/?".r -> new RDDJsonRoute(this)
+  )
+
+  private val jsonMapper = {
+    val t = new ObjectMapper()
+    t.registerModule(com.fasterxml.jackson.module.scala.DefaultScalaModule)
+    t.enable(SerializationFeature.INDENT_OUTPUT)
+    t.setSerializationInclusion(JsonInclude.Include.NON_NULL)
+    t
+  }
+
+  val jsonContextHandler = {
+
+    //TODO throw out all the JettyUtils stuff, so I can set the response status code, etc.
+    val servlet = new HttpServlet {
+      override def doGet(request: HttpServletRequest, response: HttpServletResponse) {
+        if (securityManager.checkUIViewPermissions(request.getRemoteUser)) {
+          response.setContentType("text/json;charset=utf-8")
+          route(request) match {
+            case Some(jsonRoute) =>
+              response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+              try {
+                val responseObj = jsonRoute.renderJson(request)
+                val result = jsonMapper.writeValueAsString(responseObj)
+                response.setStatus(HttpServletResponse.SC_OK)
+                response.getWriter.println(result)
+              } catch {
+                case iae: IllegalArgumentException =>
+                  response.setStatus(HttpServletResponse.SC_BAD_REQUEST)
+                  response.getOutputStream.print(iae.getMessage())
+              }
+            case None =>
+              println("no match for path: " + 

[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-03-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r26061127
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/ApplicationInfo.scala ---
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.api
+
+case class ApplicationInfo(
--- End diff --

I agree; I think a single file would be clearer.





[GitHub] spark pull request: [Build] SPARK-3624: Failed to find Spark assem...

2015-03-09 Thread tzolov
Github user tzolov closed the pull request at:

https://github.com/apache/spark/pull/2477





[GitHub] spark pull request: [SPARK-6201] [SQL] promote string and do widen...

2015-03-09 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4945#discussion_r26054354
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -269,6 +285,14 @@ trait HiveTypeCoercion {
     i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType))))
   case i @ In(a, b) if a.dataType == TimestampType
       && b.forall(_.dataType == DateType) =>
     i.makeCopy(Array(Cast(a, StringType), b.map(Cast(_, StringType))))
+  case i @ In(a, b) if a.dataType == StringType
+      && b.exists(_.dataType.isInstanceOf[NumericType]) =>
+    i.makeCopy(Array(Cast(a, DoubleType), b))
--- End diff --

As I've commented on the JIRA ticket, this is not the behavior of Hive. 
Hive actually converts the numerics in the constant set into strings.
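
In other words, the Hive-compatible direction would be roughly (a sketch only,
not the exact rule):

```
case i @ In(a, b) if a.dataType == StringType
    && b.exists(_.dataType.isInstanceOf[NumericType]) =>
  // Cast the numeric constants to strings instead of casting the
  // string column to double.
  i.makeCopy(Array(a, b.map {
    case e if e.dataType.isInstanceOf[NumericType] => Cast(e, StringType)
    case other => other
  }))
```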





[GitHub] spark pull request: [SPARK-6025] [MLlib] Add helper method evaluat...

2015-03-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4906#discussion_r26056473
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala ---
@@ -69,6 +74,42 @@ class GradientBoostedTrees(private val boostingStrategy: 
BoostingStrategy)
   case _ =>
     throw new IllegalArgumentException(s"$algo is not supported by the gradient boosting.")
 }
+baseLearners = fitGradientBoostingModel.trees
+baseLearnerWeights = fitGradientBoostingModel.treeWeights
+fitGradientBoostingModel
+  }
+
+  /**
+   * Method to compute error or loss for every iteration of gradient 
boosting.
+   * @param data: RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @param loss: evaluation metric that defaults to boostingStrategy.loss
+   * @return an array with index i having the losses or errors for the 
ensemble
+   * containing trees 1 to i + 1
+   */
+  def evaluateEachIteration(
+      data: RDD[LabeledPoint],
+      loss: Loss = boostingStrategy.loss): Array[Double] = {
+
+    val algo = boostingStrategy.treeStrategy.algo
+    val remappedData = algo match {
+      case Classification => data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
+      case _ => data
+    }
+    val initialTree = baseLearners(0)
+    val evaluationArray = Array.fill(numIterations)(0.0)
+
+    // Initial weight is 1.0
+    var predictionRDD = remappedData.map(i => initialTree.predict(i.features))
+    evaluationArray(0) = loss.computeError(remappedData, predictionRDD)
+
+    (1 until numIterations).map { nTree =>
--- End diff --

This does numIterations maps, broadcasting the model numIterations times.  
I'd recommend using a broadcast variable for the model to make sure it's only 
sent once.

You could keep the current approach pretty much as-is, but it does 
numIterations actions, so it's a bit inefficient.  You could optimize it by 
using only 1 map, but that would require modifying the computeError method as 
follows:
* computeError could be overloaded to take (prediction: Double, datum: 
LabeledPoint).  This could replace the computeError method you implemented.
* Here, in evaluateEachIteration, you could call predictionRDD.map, and 
within the map, for each data point, you could evaluate each tree on the data 
point, compute the prediction from each iteration via a cumulative sum, and 
then calling computeError on each prediction.
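
Roughly, reusing the names from the diff above and assuming the hypothetical
`computeError(prediction: Double, datum: LabeledPoint)` overload:

```
val broadcastTrees = remappedData.context.broadcast(baseLearners)
val broadcastWeights = remappedData.context.broadcast(baseLearnerWeights)
val numPoints = remappedData.count()

// One pass over the data: keep a running weighted sum of tree predictions
// per point, recording the loss after each iteration, then average.
val evaluationArray = remappedData.map { point =>
  var prediction = 0.0
  broadcastTrees.value.zip(broadcastWeights.value).map { case (tree, weight) =>
    prediction += tree.predict(point.features) * weight
    loss.computeError(prediction, point)  // hypothetical overload
  }
}.reduce { (a, b) => a.zip(b).map { case (x, y) => x + y } }
 .map(_ / numPoints)
```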





[GitHub] spark pull request: [SPARK-6191] [EC2] Generalize ability to downl...

2015-03-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4919#issuecomment-77905979
  
This seems fine to me.  I guess the alternatives would be 

1. storing the libraries in our source tree, which is a bad option for 
several reasons, including licensing, file size, upgradability, etc.
2. requiring the users to install the libraries themselves using a `pip` 
requirements file, but that adds another dependency on pip

I think that this is fine for now.  As part of our binary release packaging 
scripts, we could download and include these archives so that only users who 
build from source will need to perform these downloads.





[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-03-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r26060515
  
--- Diff: 
core/src/test/scala/org/apache/spark/status/JsonRequestHandlerTest.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status
+
+import org.apache.spark.JobExecutionStatus
+import org.apache.spark.status.api.StageStatus
+import org.scalatest.{Matchers, FunSuite}
+
+class JsonRequestHandlerTest extends FunSuite with Matchers {
--- End diff --

This should be named `JsonRequestHandlerSuite` to be consistent with the 
`*Suite` naming convention that we use for our tests.





[GitHub] spark pull request: [SPARK-6087][CORE] Provide actionable exceptio...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4947#issuecomment-77892167
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28397/
Test PASSed.





[GitHub] spark pull request: [SPARK-6228] [network] Move SASL classes from ...

2015-03-09 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4953#issuecomment-77946908
  
It's strictly a code move, from child to parent module. Although I've never
been that familiar with this code, I understand the motivation: to use it from
the other child module, which seems sound. I'll let it stay open for a day or
two in case there are other thoughts; if not, I think this can merge.





[GitHub] spark pull request: SPARK-6225 [CORE] [SQL] [STREAMING] Resolve mo...

2015-03-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4950#discussion_r26078523
  
--- Diff: 
external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaRDDSuite.java
 ---
@@ -19,23 +19,19 @@
 
 import java.io.Serializable;
 import java.util.HashMap;
-import java.util.HashSet;
-import java.util.Arrays;
-
-import org.apache.spark.SparkConf;
 
 import scala.Tuple2;
 
-import junit.framework.Assert;
-
 import kafka.common.TopicAndPartition;
 import kafka.message.MessageAndMetadata;
 import kafka.serializer.StringDecoder;
 
+import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.Function;
 
+import org.junit.Assert;
--- End diff --

Organize imports as long as you're at it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079948
  
--- Diff: bin/spark-sql ---
@@ -43,15 +46,12 @@ function usage {
   echo
   echo "CLI options:"
   "$FWDIR"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
+  exit $2
--- End diff --

```
exit "$2"
```
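
For context on the suggestion: with an unquoted `exit $2`, an empty second
argument makes the command collapse to a bare `exit`, which silently reuses
the status of whatever ran last. A minimal bash sketch of the difference (not
from the PR; the `unquoted` and `quoted` helpers are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical demo of quoting the argument to exit.

unquoted() {
  false        # sets $? to 1
  exit $2      # $2 is unset: this collapses to a bare `exit`,
               # so the subshell silently returns 1
}

quoted() {
  false
  exit "$2"    # $2 is unset: this is `exit ""`, which fails loudly
               # with "numeric argument required"
}

(unquoted); echo "unquoted -> $?"            # unquoted -> 1
(quoted) 2>/dev/null; echo "quoted -> $?"    # a nonzero usage error (2 in bash)
```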


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079926
  
--- Diff: bin/spark-sql ---
@@ -25,12 +25,15 @@ set -o posix
 
 # NOTE: This exact class name is matched downstream by SparkSubmit.
 # Any changes need to be reflected there.
-CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
+export CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
 
 # Figure out where Spark is installed
-FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
 
 function usage {
+  if [ -n $1 ]; then
+    echo $1
--- End diff --

```
echo "$1"
```
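
The quotes matter here because an unquoted expansion is word-split and
glob-expanded before `echo` ever sees it. A minimal sketch (not from the PR;
the `msg` variable is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical demo: unquoted expansion is word-split and glob-expanded;
# the quoted form prints the value verbatim.

msg='two   spaces   then a glob: *'

echo $msg     # runs of spaces collapse, and * lists the current directory
echo "$msg"   # printed exactly as stored
```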


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079987
  
--- Diff: bin/spark-submit ---
@@ -17,58 +17,18 @@
 # limitations under the License.
 #
 
-# NOTE: Any changes in this file must be reflected in SparkSubmitDriverBootstrapper.scala!
-
-export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
-ORIG_ARGS=("$@")
-
-# Set COLUMNS for progress bar
-export COLUMNS=`tput cols`
-
-while (($#)); do
-  if [ "$1" = "--deploy-mode" ]; then
-    SPARK_SUBMIT_DEPLOY_MODE=$2
-  elif [ "$1" = "--properties-file" ]; then
-    SPARK_SUBMIT_PROPERTIES_FILE=$2
-  elif [ "$1" = "--driver-memory" ]; then
-    export SPARK_SUBMIT_DRIVER_MEMORY=$2
-  elif [ "$1" = "--driver-library-path" ]; then
-    export SPARK_SUBMIT_LIBRARY_PATH=$2
-  elif [ "$1" = "--driver-class-path" ]; then
-    export SPARK_SUBMIT_CLASSPATH=$2
-  elif [ "$1" = "--driver-java-options" ]; then
-    export SPARK_SUBMIT_OPTS=$2
-  elif [ "$1" = "--master" ]; then
-    export MASTER=$2
-  fi
-  shift
-done
-
-if [ -z "$SPARK_CONF_DIR" ]; then
-  export SPARK_CONF_DIR="$SPARK_HOME/conf"
-fi
-DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
-if [ "$MASTER" == "yarn-cluster" ]; then
-  SPARK_SUBMIT_DEPLOY_MODE="cluster"
+SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+
+# Only define a usage function if an upstream script hasn't done so.
+if ! type -t usage >/dev/null 2>&1; then
+  usage() {
+    if [ -n $1 ]; then
+      echo $1
+    fi
+    $SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit --help
+    exit $2
--- End diff --

```
exit "$2"
```
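
A note on the `type -t usage` guard in the new code above: `type -t` prints
`function` (and succeeds) when a `usage` function is already defined, and
prints nothing (and fails) when it is not, so the fallback is installed only
when needed. A standalone sketch (not from the PR; the fallback body and its
default exit status are hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical demo of the define-only-if-missing guard pattern.

if ! type -t usage >/dev/null 2>&1; then
  usage() {
    [ -n "$1" ] && echo "$1" >&2   # optional message first
    echo "Usage: $0 [options]" >&2
    exit "${2:-1}"                 # default to exit status 1
  }
fi

type -t usage   # prints: function
```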


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079978
  
--- Diff: bin/spark-submit ---
@@ -17,58 +17,18 @@
 # limitations under the License.
 #
 
-# NOTE: Any changes in this file must be reflected in SparkSubmitDriverBootstrapper.scala!
-
-export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
-ORIG_ARGS=("$@")
-
-# Set COLUMNS for progress bar
-export COLUMNS=`tput cols`
-
-while (($#)); do
-  if [ "$1" = "--deploy-mode" ]; then
-    SPARK_SUBMIT_DEPLOY_MODE=$2
-  elif [ "$1" = "--properties-file" ]; then
-    SPARK_SUBMIT_PROPERTIES_FILE=$2
-  elif [ "$1" = "--driver-memory" ]; then
-    export SPARK_SUBMIT_DRIVER_MEMORY=$2
-  elif [ "$1" = "--driver-library-path" ]; then
-    export SPARK_SUBMIT_LIBRARY_PATH=$2
-  elif [ "$1" = "--driver-class-path" ]; then
-    export SPARK_SUBMIT_CLASSPATH=$2
-  elif [ "$1" = "--driver-java-options" ]; then
-    export SPARK_SUBMIT_OPTS=$2
-  elif [ "$1" = "--master" ]; then
-    export MASTER=$2
-  fi
-  shift
-done
-
-if [ -z "$SPARK_CONF_DIR" ]; then
-  export SPARK_CONF_DIR="$SPARK_HOME/conf"
-fi
-DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
-if [ "$MASTER" == "yarn-cluster" ]; then
-  SPARK_SUBMIT_DEPLOY_MODE="cluster"
+SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+
+# Only define a usage function if an upstream script hasn't done so.
+if ! type -t usage >/dev/null 2>&1; then
+  usage() {
+    if [ -n $1 ]; then
+      echo $1
--- End diff --

```
echo "$1"
```
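
One subtlety behind this pattern: the guard can only find a `usage` function
defined by an upstream script (such as bin/spark-sql) if that function is
visible in the child process, which in bash requires exporting it, e.g. with
`export -f`. A minimal sketch (not from the PR):

```bash
#!/usr/bin/env bash
# Hypothetical demo: a shell function is invisible to a child bash
# process until it is exported with `export -f`.

usage() { echo "usage from the parent script"; }

bash -c 'type -t usage || echo "not inherited"'   # prints: not inherited
export -f usage
bash -c 'type -t usage && usage'   # prints: function, then the parent message
```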


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


