[GitHub] spark pull request: [SPARK-5097][SQL] Address DataFrame code revie...

2015-01-28 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4241

[SPARK-5097][SQL] Address DataFrame code review feedback.

Also removed the literal implicit transformation since it is pretty scary 
for API design. Instead, created a new lit method for creating literals.
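
To illustrate the trade-off described above, here is a minimal Python sketch. It is not Spark's implementation: the `Column` class and this `lit` helper are hypothetical stand-ins, and rejecting bare values in `__add__` is an assumption made only to show what explicit literal construction looks like next to a silent implicit conversion.

```python
class Column:
    """Hypothetical stand-in for a DataFrame column expression."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        # Assumption for illustration: a bare Python value is rejected
        # instead of being converted to a literal behind the scenes.
        if not isinstance(other, Column):
            raise TypeError("wrap plain values with lit() first")
        return Column("({} + {})".format(self.expr, other.expr))


def lit(value):
    """Wrap a plain value as a literal column expression (hypothetical)."""
    return Column(repr(value))


age = Column("age")
print((age + lit(1)).expr)  # prints "(age + 1)"
```

With an explicit helper, the point where a plain value becomes an expression is visible at the call site.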

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark df-docupdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4241


commit 22acd4c04dfb3859a7ced2056243b77fe2e3ad95
Author: Reynold Xin 
Date:   2015-01-28T07:59:36Z

[SPARK-5097][SQL] Address DataFrame code review feedback.

Also removed the literal implicit transformation since it is pretty scary 
for API design. Instead, created a new lit method for creating literals.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71795022
  
  [Test build #26215 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26215/consoleFull)
 for   PR 3732 at commit 
[`a2fdd4e`](https://github.com/apache/spark/commit/a2fdd4e1dca7067ae3fc7a76efde50851980fece).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5097][SQL] Address DataFrame code revie...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4241#issuecomment-71795018
  
  [Test build #26214 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26214/consoleFull)
 for   PR 4241 at commit 
[`04dd442`](https://github.com/apache/spark/commit/04dd44290505922964687a295aab8955aedff10f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71795107
  
  [Test build #26213 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26213/consoleFull)
 for   PR 3732 at commit 
[`c37832b`](https://github.com/apache/spark/commit/c37832bc3a48493639b7a74d3277c11349942526).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71795113
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26213/





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3951#discussion_r23671034
  
--- Diff: python/pyspark/mllib/tests.py ---
@@ -179,10 +179,27 @@ def test_classification(self):
 self.assertTrue(dt_model.predict(features[2]) <= 0)
 self.assertTrue(dt_model.predict(features[3]) > 0)
 
+rf_model = \
--- End diff --

Similarly, this should be

~~~
rf_model = RandomForest.trainClassifier(
    rdd, numClasses=2, categoricalFeaturesInfo=categoricalFeaturesInfo, numTrees=100)
~~~






[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3951#discussion_r23671027
  
--- Diff: examples/src/main/python/mllib/gradient_boosted_trees.py ---
@@ -0,0 +1,82 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Gradient boosted Trees classification and regression using MLlib.
+"""
+
+import sys
+
+from pyspark.context import SparkContext
+from pyspark.mllib.tree import GradientBoostedTrees
+from pyspark.mllib.util import MLUtils
+
+
+def testClassification(trainingData, testData):
+    # Train a GradientBoostedTrees model.
+    #  Empty categoricalFeaturesInfo indicates all features are continuous.
+    model = GradientBoostedTrees.trainClassifier(trainingData,
+                                                 categoricalFeaturesInfo={},
+                                                 numIterations=30,
+                                                 maxDepth=4)
--- End diff --

For the code style, we don't chop down arguments in method calls. For 
example: 
https://github.com/apache/spark/blob/master/python/pyspark/mllib/tree.py#L137

So this should be 

~~~
model = GradientBoostedTrees.trainClassifier(trainingData, categoricalFeaturesInfo={},
                                             numIterations=30, maxDepth=4)
~~~

or 

~~~
model = GradientBoostedTrees.trainClassifier(
    trainingData, categoricalFeaturesInfo={}, numIterations=30, maxDepth=4)
~~~





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3951#discussion_r23671080
  
--- Diff: python/pyspark/mllib/tree.py ---
@@ -383,6 +381,132 @@ def trainRegressor(cls, data, categoricalFeaturesInfo, numTrees, featureSubsetSt
   featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
 
 
+class GradientBoostedTreesModel(TreeEnsembleModel):
+    """
+    .. note:: Experimental
+
+    Represents a gradient-boosted tree model.
+    """
+
+
+class GradientBoostedTrees(object):
--- End diff --

This class needs a docstring.





[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext

2015-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4207#issuecomment-71795859
  
@yanbohappy thanks for the discussion.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71796119
  
  [Test build #26215 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26215/consoleFull)
 for   PR 3732 at commit 
[`a2fdd4e`](https://github.com/apache/spark/commit/a2fdd4e1dca7067ae3fc7a76efde50851980fece).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71796124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26215/





[GitHub] spark pull request: [SPARK-4284] BinaryClassificationMetrics preci...

2015-01-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3933#issuecomment-71796333
  
@Lewuathe Do you mind closing this PR? If you want to update the official 
document, we can do that in another PR. Thanks for the discussion!





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71796516
  
  [Test build #26216 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26216/consoleFull)
 for   PR 3732 at commit 
[`0ed0fdc`](https://github.com/apache/spark/commit/0ed0fdc13ec043e16058128011428445a62c7581).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2015-01-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3382#issuecomment-71796637
  
@fjiang6 It seems that we don't really have a scalable solution at this 
time. Do you mind closing this PR? It would be great if you can maintain this 
k-medoids implementation as a package. For example: 
http://spark-packages.org/package/26





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71796968
  
  [Test build #26217 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26217/consoleFull)
 for   PR 4216 at commit 
[`5879286`](https://github.com/apache/spark/commit/5879286885326fed6c77ab381fa3130b1146ec00).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-28 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-71797016
  
ping





[GitHub] spark pull request: [SPARK-5262] [SPARK-5244] [SQL] add coalesce i...

2015-01-28 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/4057#issuecomment-71797084
  
ping @yhuai 





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71797429
  
@vanzin I understand your concern. Although this patch adds many lines, its 
scope is actually limited only to standalone cluster mode, and the default 
submit behavior is actually unchanged. The idea is to provide only an 
alternative to the existing submission gateway, not a replacement. Also, it 
should be fairly straightforward to test this, as we only need to test one 
mode. I will be sure to iterate on review comments promptly so as not to 
delay the release.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71797621
  
  [Test build #26210 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26210/consoleFull)
 for   PR 3732 at commit 
[`f0005b1`](https://github.com/apache/spark/commit/f0005b166a705f7b1c52960b72c4ff29d010e5ff).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71797625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26210/





[GitHub] spark pull request: [SPARK-4809] Rework Guava library shading.

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3658





[GitHub] spark pull request: [SPARK-5154] [PySpark] [Streaming] Kafka strea...

2015-01-28 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3715#discussion_r23672024
  
--- Diff: make-distribution.sh ---
@@ -188,6 +188,7 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 # Copy jars
 cp "$SPARK_HOME"/assembly/target/scala*/*assembly*hadoop*.jar "$DISTDIR/lib/"
 cp "$SPARK_HOME"/examples/target/scala*/spark-examples*.jar "$DISTDIR/lib/"
+cp "$SPARK_HOME"/external/kafka/scala*/*kafka*assembly*.jar "$DISTDIR/lib/"
--- End diff --

We will publish the jar to maven central. It's fine IMO in this patch to 
recommend downloading the published jar. And later on we can just suggest 
linking against the maven coordinates. But I'd like to avoid actually including 
this JAR in the binary distribution.





[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4238#issuecomment-71798309
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26211/





[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4238#issuecomment-71798301
  
  [Test build #26211 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26211/consoleFull)
 for   PR 4238 at commit 
[`24ed322`](https://github.com/apache/spark/commit/24ed3223f96ec8a2c93fe01f51e846b3e8d92c54).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672436
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+    Compress(vector, indices)
--- End diff --

Since `Compress` is only used in `ChiSqSelector`, shall we move its 
implementation here?
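
As a rough illustration of what would be moved, the following Python sketch assumes `Compress` simply keeps the vector entries at the selected indices. The `compress` function name and the use of plain lists are assumptions; the actual Scala implementation in the PR is not shown in this thread and may treat sparse vectors differently.

```python
def compress(values, indices):
    """Return the entries of `values` at the positions listed in `indices`.

    Hypothetical helper: assumes a dense vector represented as a list.
    """
    return [values[i] for i in indices]


features = [0.5, 1.2, 0.0, 3.4]
selected = compress(features, [0, 3])
print(selected)  # prints [0.5, 3.4]
```

A helper this small, used by a single class, is a plausible candidate for living next to its only caller.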





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672440
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+    Compress(vector, indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Creates a ChiSquared feature selector.
+ */
+@Experimental
+object ChiSqSelector {
+
+  /**
+   * Returns a ChiSquared feature selector.
+   *
+   * @param data data used to compute the Chi Squared statistic.
+   * @param numTopFeatures number of features that selector will select
+   *   (ordered by statistic value descending)
+   */
+  def fit(data: RDD[LabeledPoint], numTopFeatures: Int): ChiSqSelectorModel = {
+    val (_, indices) = Statistics.chiSqTest(data).zipWithIndex.sortBy{ case(res, index) =>
--- End diff --

It might be easier to read if we use multiple lines:

~~~
val indices = Statistics.chiSqTest(data)
  .zipWithIndex
  .sortBy(-_._1.statistic)
  .take(numTopFeatures)
  .map(_._2)
  .toArray
~~~
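
For readers more familiar with Python, the same pipeline can be sketched with plain lists. This is only an illustration: `ChiSqResult` and `top_feature_indices` are hypothetical names, and the input values are invented for the example rather than produced by `Statistics.chiSqTest`.

```python
from collections import namedtuple

# Hypothetical stand-in for a chi-squared test result; only the
# `statistic` field matters for the ranking.
ChiSqResult = namedtuple("ChiSqResult", ["statistic"])


def top_feature_indices(results, num_top_features):
    """Return indices of the features with the largest test statistics."""
    indexed = list(enumerate(results))                  # like zipWithIndex
    indexed.sort(key=lambda pair: -pair[1].statistic)   # sort descending
    top = indexed[:num_top_features]                    # like take
    return [index for index, _ in top]                  # keep only indices


results = [ChiSqResult(2.0), ChiSqResult(9.5), ChiSqResult(0.3), ChiSqResult(4.1)]
print(top_feature_indices(results, 2))  # prints [1, 3]
```

Writing one step per line, as in the Scala suggestion, makes each stage of the ranking easy to read and to change independently.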





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672433
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
--- End diff --

It should be okay to remove this line and use `Vector` directly.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672442
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+Compress(vector, indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Creates a ChiSquared feature selector.
+ */
+@Experimental
+object ChiSqSelector {
+
+  /**
+   * Returns a ChiSquared feature selector.
+   *
+   * @param data data used to compute the Chi Squared statistic.
+   * @param numTopFeatures number of features that selector will select
+   *   (ordered by statistic value descending)
+   */
+  def fit(data: RDD[LabeledPoint], numTopFeatures: Int): ChiSqSelectorModel = {
+val (_, indices) = Statistics.chiSqTest(data).zipWithIndex.sortBy{ case(res, index) =>
+  -res.statistic}.take(numTopFeatures).unzip
+new ChiSqSelectorModel(indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Filters features in a given vector
+ */
+@Experimental
+object Compress {
+  /**
+   * Returns a vector with features filtered
+   * @param features vector
+   * @param indexes indexes of features to filter
+   */
+  def apply(features: Vector, indexes: IndexedSeq[Int]): Vector = {
+val (values, _) =
+  features.toArray.zipWithIndex.filter { case (value, index) =>
+indexes.contains(index)}.unzip
--- End diff --

This is not efficient: 1) `toArray` creates a dense array, and 2) 
`indexes.contains` takes O(n) time per lookup. We can handle sparsity in a 
separate PR. For 2), we can do this:

~~~
val values = features.toArray
Vectors.dense(indexes.map(i => values(i)).toArray)
~~~
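
A minimal sketch of the lookup-cost point above, using plain Scala arrays as a stand-in for MLlib vectors so that it runs without Spark. The names `CompressSketch`, `selectByFilter`, and `selectByIndex` are hypothetical and not part of the PR.

```scala
object CompressSketch {
  // Filter-based approach: scan every feature and test membership in `indexes`.
  // `contains` on a sequence is a linear scan, so the total cost is
  // O(numFeatures * numSelected).
  def selectByFilter(values: Array[Double], indexes: IndexedSeq[Int]): Array[Double] =
    values.zipWithIndex.collect { case (v, i) if indexes.contains(i) => v }

  // Direct lookup: one array access per selected index, O(numSelected).
  def selectByIndex(values: Array[Double], indexes: IndexedSeq[Int]): Array[Double] =
    indexes.map(i => values(i)).toArray

  def main(args: Array[String]): Unit = {
    val values = Array(8.0, 7.0, 0.0)
    val indexes = IndexedSeq(0, 2) // assumed to be in ascending order
    println(selectByFilter(values, indexes).mkString(","))
    println(selectByIndex(values, indexes).mkString(","))
  }
}
```

The two functions agree only when `indexes` is in ascending order without duplicates. Direct lookup returns values in the order of `indexes`, so a caller that needs the original feature order would have to sort the indices first.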
  





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672451
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.scalatest.FunSuite
+
+class ChiSqSelectorSuite extends FunSuite with MLlibTestSparkContext {
+
+  lazy val labeledDiscreteData = sc.parallelize(
+Seq( new LabeledPoint(0.0, Vectors.dense(Array(8.0, 7.0, 0.0))),
--- End diff --

Remove space after `(`.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672435
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
--- End diff --

Maybe it is better to use `Array[Int]`, which is Java compatible.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672438
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+Compress(vector, indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Creates a ChiSquared feature selector.
+ */
+@Experimental
+object ChiSqSelector {
--- End diff --

Make `ChiSqSelector` a class and `numTopFeatures` its parameter. This 
should be quite similar to, e.g., IDF: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/IDF.scala#L42
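
A rough sketch of the suggested shape, written against plain Scala collections so that it runs without Spark. `SelectorSketch`, `SelectorModelSketch`, and the `statistics` argument are hypothetical stand-ins: the real `fit` would take an `RDD[LabeledPoint]` and compute the statistics itself.

```scala
// Stand-in for the fitted model: keeps the selected feature indices.
class SelectorModelSketch(val indices: Array[Int]) {
  def transform(values: Array[Double]): Array[Double] =
    indices.map(i => values(i))
}

// The selector is a class, and `numTopFeatures` is a constructor parameter
// rather than an argument to `fit`.
class SelectorSketch(val numTopFeatures: Int) {
  // `statistics(i)` is assumed to hold the test statistic of feature i.
  def fit(statistics: Array[Double]): SelectorModelSketch = {
    val top = statistics.zipWithIndex
      .sortWith((a, b) => a._1 > b._1) // largest statistic first
      .take(numTopFeatures)
      .map(_._2)
    // Sorting the kept indices is an assumption made here so that
    // transformed vectors preserve the original feature order.
    new SelectorModelSketch(top.sorted)
  }
}
```

With this shape, usage reads as `new SelectorSketch(2).fit(stats).transform(features)`, which mirrors the configure-then-fit pattern the comment points to.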





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672450
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.scalatest.FunSuite
+
+class ChiSqSelectorSuite extends FunSuite with MLlibTestSparkContext {
+
+  lazy val labeledDiscreteData = sc.parallelize(
--- End diff --

This may be risky because `sc` is initialized in `beforeAll`. We can either 
move this definition into the test closure or initialize it in `beforeAll`.
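
A small sketch of the initialization-order concern, using hypothetical stand-ins (`FakeContext`, `FakeContextFixture`, `SuiteSketch`) instead of the real Spark test classes. It assumes, as the comment states, that the context field stays unset until `beforeAll` runs.

```scala
// Stand-in for a SparkContext; `parallelize` just returns its input.
class FakeContext {
  def parallelize(data: Seq[Double]): Seq[Double] = data
}

// Stand-in for a test mixin that creates the context in beforeAll().
trait FakeContextFixture {
  var sc: FakeContext = null // unset until beforeAll() runs
  def beforeAll(): Unit = { sc = new FakeContext }
}

class SuiteSketch extends FakeContextFixture {
  // Works only if first accessed after beforeAll(); an earlier access
  // dereferences the null `sc` and throws.
  lazy val lazyData: Seq[Double] = sc.parallelize(Seq(8.0, 7.0, 0.0))

  // Alternative from the review: build the data inside the test body,
  // which only runs once the fixture has been set up.
  def testBody(): Seq[Double] = {
    val data = sc.parallelize(Seq(8.0, 7.0, 0.0))
    data
  }
}
```

The `lazy val` is not wrong by itself; the risk is that nothing prevents it from being touched before the fixture is ready, whereas data created inside the test body cannot be.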





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672445
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.scalatest.FunSuite
--- End diff --

Organize imports into groups: 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672452
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.scalatest.FunSuite
+
+class ChiSqSelectorSuite extends FunSuite with MLlibTestSparkContext {
+
+  lazy val labeledDiscreteData = sc.parallelize(
+Seq( new LabeledPoint(0.0, Vectors.dense(Array(8.0, 7.0, 0.0))),
+  new LabeledPoint(1.0, Vectors.dense(Array(0.0, 9.0, 6.0))),
+  new LabeledPoint(1.0, Vectors.dense(Array(0.0, 9.0, 8.0))),
+  new LabeledPoint(2.0, Vectors.dense(Array(8.0, 9.0, 5.0)))
+), 2)
+
+  /*
+   *  Contingency tables
+   *  feature0 = {8.0, 0.0}
+   *  class  0 1 2
+   *8.0||1|0|1|
+   *0.0||0|2|0|
+   *
+   *  feature1 = {7.0, 9.0}
+   *  class  0 1 2
+   *7.0||1|0|0|
+   *9.0||0|2|1|
+   *
+   *  feature2 = {0.0, 6.0, 8.0, 5.0}
+   *  class  0 1 2
+   *0.0||1|0|0|
+   *6.0||0|1|0|
+   *8.0||0|1|0|
+   *5.0||0|0|1|
+   *
+   *  Use chi-squared calculator from Internet
+   */
+
+  test("ChiSqSelector transform test") {
+val preFilteredData =
+  Set( new LabeledPoint(0.0, Vectors.dense(Array(0.0))),
--- End diff --

Remove space after `(`.





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71799001
  
  [Test build #26218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26218/consoleFull) for PR 4216 at commit [`efa5e18`](https://github.com/apache/spark/commit/efa5e1815d7113881213e1effb50349b2cbc8479).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...

2015-01-28 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23672454
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.scalatest.FunSuite
+
+class ChiSqSelectorSuite extends FunSuite with MLlibTestSparkContext {
+
+  lazy val labeledDiscreteData = sc.parallelize(
+Seq( new LabeledPoint(0.0, Vectors.dense(Array(8.0, 7.0, 0.0))),
+  new LabeledPoint(1.0, Vectors.dense(Array(0.0, 9.0, 6.0))),
+  new LabeledPoint(1.0, Vectors.dense(Array(0.0, 9.0, 8.0))),
+  new LabeledPoint(2.0, Vectors.dense(Array(8.0, 9.0, 5.0)))
+), 2)
+
+  /*
+   *  Contingency tables
+   *  feature0 = {8.0, 0.0}
+   *  class  0 1 2
+   *8.0||1|0|1|
+   *0.0||0|2|0|
+   *
+   *  feature1 = {7.0, 9.0}
+   *  class  0 1 2
+   *7.0||1|0|0|
+   *9.0||0|2|1|
+   *
+   *  feature2 = {0.0, 6.0, 8.0, 5.0}
+   *  class  0 1 2
+   *0.0||1|0|0|
+   *6.0||0|1|0|
+   *8.0||0|1|0|
+   *5.0||0|0|1|
+   *
+   *  Use chi-squared calculator from Internet
+   */
+
+  test("ChiSqSelector transform test") {
+val preFilteredData =
+  Set( new LabeledPoint(0.0, Vectors.dense(Array(0.0))),
+new LabeledPoint(1.0, Vectors.dense(Array(6.0))),
+new LabeledPoint(1.0, Vectors.dense(Array(8.0))),
+new LabeledPoint(2.0, Vectors.dense(Array(5.0)))
+  )
+val model = ChiSqSelector.fit(labeledDiscreteData, 1)
+val filteredData = labeledDiscreteData.map(lp =>
+  new LabeledPoint(lp.label, model.transform(lp.features))).collect().toSet
--- End diff --

Minor: if the expression does not fit on a single line, it is common in Spark 
to use the following style:

~~~
val filteredData = labeledDiscreteData.map { lp =>
  new LabeledPoint(lp.label, model.transform(lp.features))
}.collect().toSet
~~~





[GitHub] spark pull request: [SPARK-5437] Fix DriverSuite and SparkSubmitSu...

2015-01-28 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4230#discussion_r23672455
  
--- Diff: core/src/test/scala/org/apache/spark/DriverSuite.scala ---
@@ -28,31 +28,30 @@ import org.apache.spark.util.Utils
 
 class DriverSuite extends FunSuite with Timeouts {
 
+  // Regression test for SPARK-530: "Spark driver process doesn't exit after finishing"
--- End diff --

I just moved this line, but I can move it there too





[GitHub] spark pull request: [SPARK-5447][SQL] Replaced reference to Schema...

2015-01-28 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4242

[SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.

and

[SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark sqlCleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4242


commit 728c017349959a1628035a2a298ac30eb44d466b
Author: Reynold Xin 
Date:   2015-01-28T08:40:53Z

[SPARK-5447][SQL] Replaced reference to SchemaRDD with DataFrame.

Also

[SPARK-5448][SQL] Make CacheManager a concrete class and field in SQLContext







[GitHub] spark pull request: [SPARK-5447][SQL] Replaced reference to Schema...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4242#issuecomment-71799528
  
  [Test build #26219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26219/consoleFull) for PR 4242 at commit [`6545c42`](https://github.com/apache/spark/commit/6545c42410282d01d53aa3697d6ff432f43cad1c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SQL] Implement Describe Table for SQLContext

2015-01-28 Thread OopsOutOfMemory
Github user OopsOutOfMemory commented on the pull request:

https://github.com/apache/spark/pull/4207#issuecomment-71799906
  
@yanbohappy thanks for the discussion and the work efforts : )





[GitHub] spark pull request: [SPARK-5444][Network]Add a retry to deal with ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4240#issuecomment-71800459
  
  [Test build #26212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26212/consoleFull) for PR 4240 at commit [`cc926d2`](https://github.com/apache/spark/commit/cc926d2d4f737dd76a9fa593c0f93b183d2ca21f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5444][Network]Add a retry to deal with ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4240#issuecomment-71800465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26212/
Test PASSed.





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-71800548
  
  [Test build #26220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26220/consoleFull) for PR 3951 at commit [`56f6c97`](https://github.com/apache/spark/commit/56f6c974b13cdbf8a3935305728ec32f8eea77d6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...

2015-01-28 Thread kul
GitHub user kul opened a pull request:

https://github.com/apache/spark/pull/4243

[SPARK-5426][SQL] Add SparkSQL Java API helper methods.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kul/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4243


commit b51b656f5397d36b7bb5e446e271c1714b662046
Author: kul 
Date:   2015-01-28T08:47:56Z

[SPARK-5426][SQL] Add SparkSQL Java API helper methods.







[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4243#issuecomment-71801054
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4239#issuecomment-71801048
  
I agree that 5xx means server error, and is not an appropriate response. 
But assuming a value of -1 doesn't seem like a great solution -- how about 
returning `400 Bad Request`?
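
One way to picture the suggestion, as a sketch that is independent of any web framework: validate the argument up front and map a missing or malformed value to a client error instead of substituting -1 or letting an exception surface as a 500. `StatusSketch`, `statusFor`, and the parameter name passed to it are hypothetical.

```scala
import scala.util.Try

object StatusSketch {
  // Returns the HTTP status code for a request with the given query parameters.
  // `requiredParam` names an integer argument the page cannot do without.
  def statusFor(params: Map[String, String], requiredParam: String): Int =
    params.get(requiredParam) match {
      case None                               => 400 // missing argument: client error
      case Some(v) if Try(v.toInt).isFailure  => 400 // malformed argument: client error
      case Some(_)                            => 200 // argument present and well formed
    }
}
```

Under this sketch a 5xx status stays reserved for failures on the server side, which matches the reading of 5xx given in the comment.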





[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4232#issuecomment-71801249
  
@ScrapCodes See https://issues.apache.org/jira/browse/SPARK-5428 - it is to 
ensure `assembly` is built last.





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71802645
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26217/
Test FAILed.





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71802635
  
  [Test build #26217 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26217/consoleFull)
 for   PR 4216 at commit 
[`5879286`](https://github.com/apache/spark/commit/5879286885326fed6c77ab381fa3130b1146ec00).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class MasterStateResponse(`
  * `   *   (4) the main class for the child`
  * `  case class BoundPortsResponse(actorPort: Int, webUIPort: Int, 
stablePort: Option[Int])`






[GitHub] spark pull request: [SPARK-5097][SQL] Address DataFrame code revie...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4241#issuecomment-71803672
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26214/
Test FAILed.





[GitHub] spark pull request: [SPARK-5097][SQL] Address DataFrame code revie...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4241#issuecomment-71803663
  
  [Test build #26214 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26214/consoleFull)
 for   PR 4241 at commit 
[`04dd442`](https://github.com/apache/spark/commit/04dd44290505922964687a295aab8955aedff10f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread wangxiaojing
Github user wangxiaojing commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71803946
  
@srowen @andrewor14 This change can affect users who monitor the driver's 
metrics. The property is used for the metrics value.
e.g. ``, parser error: Unescaped '<' not allowed in attributes values.






[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71803978
  
  [Test build #26216 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26216/consoleFull)
 for   PR 3732 at commit 
[`0ed0fdc`](https://github.com/apache/spark/commit/0ed0fdc13ec043e16058128011428445a62c7581).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4508] [SQL] build native date type to c...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3732#issuecomment-71803990
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26216/
Test PASSed.





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71805342
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26218/
Test FAILed.





[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-71805326
  
  [Test build #26218 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26218/consoleFull)
 for   PR 4216 at commit 
[`efa5e18`](https://github.com/apache/spark/commit/efa5e1815d7113881213e1effb50349b2cbc8479).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class MasterStateResponse(`
  * `   *   (4) the main class for the child`
  * `  case class BoundPortsResponse(actorPort: Int, webUIPort: Int, 
stablePort: Option[Int])`






[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...

2015-01-28 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4232#discussion_r23675195
  
--- Diff: pom.xml ---
@@ -105,6 +104,8 @@
 external/zeromq
 examples
 repl
+
--- End diff --

Can you reference the JIRA number (SPARK-5428) here?





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71806726
  
@sarutak I still don't get the purpose of this change, and as you say it 
not only potentially broke metrics, but seems to have. Although maybe you can 
fix it forward, is this a good change?





[GitHub] spark pull request: [SPARK-5426][SQL] Add SparkSQL Java API helper...

2015-01-28 Thread kul
Github user kul commented on the pull request:

https://github.com/apache/spark/pull/4243#issuecomment-71807950
  
Looking into it further, it seems that even in Scala one has to go through 
`.rdd` for normal Spark operations, since functions like `filter` are being 
overridden for the query DSL. So `.toJavaRDD` is consistent with its Scala 
counterpart `.rdd`.
But functions that are not overridden, e.g. `map`, still behave like 
normal Spark operations, which is a bit inconsistent.





[GitHub] spark pull request: [SPARK-5450][GraphX] Add APIs to save a graph ...

2015-01-28 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/4244

[SPARK-5450][GraphX] Add APIs to save a graph as a SequenceFile and load it

As the size of the input data increases, building a Graph via
GraphLoader.edgeListFile() or RDD transformations takes a lot of
processing time.
APIs to save a Graph as an object file are useful in those cases.
The operations are currently slow because of the default
serializer (Java serialization), so efficient serialization
is left as future work.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SaveGraphAPISpike

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4244


commit 42abf9334ddae759783495bd8017907c9e6bdfea
Author: Takeshi Yamamuro 
Date:   2015-01-28T09:34:20Z

Add APIs to save a graph as a SequenceFile and load it







[GitHub] spark pull request: [SPARK-5450][GraphX] Add APIs to save a graph ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4244#issuecomment-71808734
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-71809380
  
  [Test build #26220 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26220/consoleFull)
 for   PR 3951 at commit 
[`56f6c97`](https://github.com/apache/spark/commit/56f6c974b13cdbf8a3935305728ec32f8eea77d6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TreeEnsembleModel(JavaModelWrapper):`
  * `class DecisionTreeModel(JavaModelWrapper):`
  * `class RandomForestModel(TreeEnsembleModel):`
  * `class GradientBoostedTreesModel(TreeEnsembleModel):`
  * `class GradientBoostedTrees(object):`






[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-71809389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26220/
Test PASSed.





[GitHub] spark pull request: [SPARK-5447][SQL] Replaced reference to Schema...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4242#issuecomment-71809558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26219/
Test FAILed.





[GitHub] spark pull request: [SPARK-5447][SQL] Replaced reference to Schema...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4242#issuecomment-71809543
  
  [Test build #26219 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26219/consoleFull)
 for   PR 4242 at commit 
[`6545c42`](https://github.com/apache/spark/commit/6545c42410282d01d53aa3697d6ff432f43cad1c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UDFRegistration(sqlContext: SQLContext) extends Logging `






[GitHub] spark pull request: SPARK-3290 [GRAPHX] No unpersist callls in SVD...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4234#issuecomment-71809831
  
@ankurdave Ah of course. Hm, the thing is there is a loop of 
transformations here, so in theory you'd have to save up all the RDD references 
and unpersist at the very end -- but even that may be insufficient as the 
return value isn't even materialized before it's returned. Also, operations 
like `outerJoinVertices` seem to `cache()` vertices internally too. It's a bit 
like the old days of memory management!

What's the right thing here? don't `cache()` in this method at all since 
lower layers do some of that management? force materialization?
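
One of the options raised above, forcing materialization before releasing the previous step, can be illustrated with a toy model. The sketch below uses no Spark API: `ToyDataset` and its methods are hypothetical stand-ins that only record which datasets are "cached", and it does not model caching done internally by lower layers such as `outerJoinVertices`.

```python
class ToyDataset:
    """Toy stand-in for a cached dataset; it only records cache state."""

    live = set()  # names of datasets currently held in the pretend cache

    def __init__(self, name):
        self.name = name

    def cache(self):
        ToyDataset.live.add(self.name)
        return self

    def materialize(self):
        # In Spark this would be an action that forces evaluation.
        return self

    def unpersist(self):
        ToyDataset.live.discard(self.name)
        return self


def iterate(steps):
    """Run `steps` iterations, keeping only the latest result cached."""
    current = ToyDataset("step-0").cache().materialize()
    for i in range(1, steps + 1):
        nxt = ToyDataset("step-%d" % i).cache()
        nxt.materialize()    # force the new result before dropping its input
        current.unpersist()  # the previous step is no longer needed
        current = nxt
    return current
```

In this model only the final step remains cached when the loop ends, and the returned value is already materialized, which is the property the comment notes is missing when the result is returned lazily.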





[GitHub] spark pull request: [SPARK-5415] bump sbt to version to 0.13.7

2015-01-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4211#issuecomment-71810049
  
Sure, let's do it.





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71810090
  
Sorry, I didn't notice that this change affects specific metrics sinks.
I still think this change is needed. The identifier is used in the metrics 
name, so without this change the metrics name for the driver differs 
between local mode and every other mode, which confuses monitoring.
Also, regardless of this change, the parse issue could occur because the 
identifier `` has been used for `spark.executor.id` in local mode even 
before this change.

I'll fix this issue by renaming the identifier from `` to `driver`.





[GitHub] spark pull request: [SPARK-5415] bump sbt to version to 0.13.7

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4211





[GitHub] spark pull request: [SPARK-5434] [EC2] Preserve spaces in EC2 path

2015-01-28 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4224#discussion_r23677054
  
--- Diff: ec2/spark-ec2 ---
@@ -20,6 +20,6 @@
 
 # Preserve the user's CWD so that relative paths are passed correctly to 
 #+ the underlying Python script.
-SPARK_EC2_DIR="$(dirname $0)"
+SPARK_EC2_DIR="$(dirname "$0")"
--- End diff --

jw - what does it mean to nest quotes inside of other quotes like this? 
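
For context on the question above: in POSIX shells the text inside `$( ... )` is parsed as its own command, so the inner quotes around `$0` pair with each other and the outer quotes wrap the result of the substitution. The practical effect is that a path containing spaces reaches `dirname` as one argument. The sketch below approximates that word splitting with Python's `shlex.split`; the path is a made-up example, and `shlex` only emulates shell tokenization and does not run a shell.

```python
import os.path
import shlex

script_path = "/opt/my tools/ec2/spark-ec2"  # hypothetical path with a space

# Old form, dirname $0: the unquoted expansion is split on whitespace.
unquoted_args = shlex.split("dirname " + script_path)

# New form, dirname "$0": the quoted expansion stays a single argument.
quoted_args = shlex.split('dirname "%s"' % script_path)

print(unquoted_args)
print(quoted_args)
print(os.path.dirname(quoted_args[1]))
```

With the unquoted form `dirname` receives two arguments, neither of which is the real script path; with the quoted form it receives the full path and can return its directory.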





[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/4245

[HOTFIX] Rename the identifier of driver from `` to `driver`

This change is related to #3812.

The identifier of driver is defined as `` but this identifier is 
used for metrics name and some metrics sinks cannot parse `<` and `>`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark hotfix-driver-identifier

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4245


commit af59bf2199ec3b1a8fc6b6bb4d82427bf08f366a
Author: Kousuke Saruta 
Date:   2015-01-28T09:45:25Z

Rename the identifier of driver from "" to "driver" because the 
identifier is used for metrics name







[GitHub] spark pull request: [SPARK-4284] BinaryClassificationMetrics preci...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3933





[GitHub] spark pull request: SPARK-5049: ParquetTableScan always prepends t...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3870





[GitHub] spark pull request: SPARK-4660: Use correct class loader in JavaSe...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4114





[GitHub] spark pull request: [SPARK-2572] Delete the local dir on executor ...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1480





[GitHub] spark pull request: [GRAPHX] Spark 3789 - Python Bindings for Grap...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4205





[GitHub] spark pull request: [SPARK-4510][MLlib]: Add k-medoids Partitionin...

2015-01-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3382





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71810650
  
OK. Can this be fixed by just escaping the value in the XML output? because 
if it's `` elsewhere then it seems like it would also break the output 
elsewhere.
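
Escaping at output time, as proposed above, might look like the following. This is a sketch in Python using the standard library, not code from Spark or from any metrics sink; the identifier value and the `metric` element are placeholders chosen only because the value contains `<` and `>`.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

identifier = "<example-id>"  # placeholder value containing '<' and '>'

# quoteattr escapes markup characters and wraps the result in quotes.
element_text = "<metric name=%s/>" % quoteattr(identifier)

# The escaped document parses, and the parser hands back the original value.
parsed = ET.fromstring(element_text)
print(element_text)
print(parsed.get("name"))
```

Because the parser returns the original string, the identifier itself would not need to change; only the code that writes the XML does.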





[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4245#issuecomment-71810738
  
  [Test build #26221 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26221/consoleFull)
 for   PR 4245 at commit 
[`af59bf2`](https://github.com/apache/spark/commit/af59bf2199ec3b1a8fc6b6bb4d82427bf08f366a).
 * This patch merges cleanly.





[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4245#issuecomment-71810748
  
This could provide a usability regression for people since the name now 
appears differently in the UI. Is there any way we can intercept this somewhere 
downstream and not change it here?





[GitHub] spark pull request: [SPARK-1825] Make Windows Spark client work fi...

2015-01-28 Thread aniketbhatnagar
Github user aniketbhatnagar commented on the pull request:

https://github.com/apache/spark/pull/3943#issuecomment-71810814
  
@tsudukim Would you have time this week to look at this please? I was 
hoping for this fix to make it in 1.2.1!





[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...

2015-01-28 Thread tzolov
Github user tzolov commented on a diff in the pull request:

https://github.com/apache/spark/pull/4232#discussion_r23677265
  
--- Diff: pom.xml ---
@@ -105,6 +104,8 @@
 external/zeromq
 examples
 repl
+
--- End diff --

Sure, I will add the JIRA reference in the comment and update the 
pull request.





[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/4245#issuecomment-71811080
  
How about escaping specific characters like `<` or `>` in each sink instead 
of changing the identifier?





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71811250
  
So, how about escaping the specific characters like `<` or `>` in each sink 
instead of changing the identifier?





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71811339
  
  [Test build #26222 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26222/consoleFull)
 for   PR 4014 at commit 
[`ccb71e3`](https://github.com/apache/spark/commit/ccb71e30b8a65fcea5d0d57865bdf5928ef9a534).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5428]: Declare the 'assembly' module at...

2015-01-28 Thread tzolov
Github user tzolov commented on a diff in the pull request:

https://github.com/apache/spark/pull/4232#discussion_r23677757
  
--- Diff: pom.xml ---
@@ -105,6 +104,8 @@
 external/zeromq
 examples
 repl
+
--- End diff --

done





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71815425
  
Maybe we can resolve the parse issue by updating the `com.codahale.metrics` 
library from `3.0.0` to `3.1.0`. `3.1.0` sanitizes `<` and `>`.





[GitHub] spark pull request: [SQL][SPARK-5453] Use property 'mapreduce.inpu...

2015-01-28 Thread saucam
GitHub user saucam opened a pull request:

https://github.com/apache/spark/pull/4246

[SQL][SPARK-5453] Use property 'mapreduce.input.pathFilter.class' to set a 
custom filter class for input files

This PR adds support for using a custom filter class for input files for 
queries. We can re-use the existing property in hive-site.xml for this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/saucam/spark hive_site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4246


commit 53e86c88890932f40502ab1c81647e321ba8
Author: Yash Datta 
Date:   2015-01-28T10:43:21Z

SPARK-5453: Use property 'mapreduce.input.pathFilter.class' to set a custom 
filter class for input files







[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-28 Thread catap
Github user catap commented on the pull request:

https://github.com/apache/spark/pull/4239#issuecomment-71815969
  
Ok, the patch was updated.





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71816219
  
Already proposed at https://issues.apache.org/jira/browse/SPARK-5413. 
Perhaps we can close this issue and PRs then if that's the way forward.





[GitHub] spark pull request: [SQL][SPARK-5453] Use property 'mapreduce.inpu...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4246#issuecomment-71816217
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4239#discussion_r23679494
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
@@ -32,7 +32,12 @@ private[ui] class JobPage(parent: JobsTab) extends 
WebUIPage("job") {
 
   def render(request: HttpServletRequest): Seq[Node] = {
 listener.synchronized {
-  val jobId = request.getParameter("id").toInt
+  val paramaterId = request.getParameter("id")
--- End diff --

Nit: paramater -> parameter





[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4239#discussion_r23679510
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
@@ -32,7 +32,12 @@ private[ui] class JobPage(parent: JobsTab) extends 
WebUIPage("job") {
 
   def render(request: HttpServletRequest): Seq[Node] = {
 listener.synchronized {
-  val jobId = request.getParameter("id").toInt
+  val paramaterId = request.getParameter("id")
+  if (paramaterId.equals("")) {
--- End diff --

It is `null` if not specified, right?





[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-28 Thread catap
Github user catap commented on a diff in the pull request:

https://github.com/apache/spark/pull/4239#discussion_r23679953
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala ---
@@ -32,7 +32,12 @@ private[ui] class JobPage(parent: JobsTab) extends 
WebUIPage("job") {
 
   def render(request: HttpServletRequest): Seq[Node] = {
 listener.synchronized {
-  val jobId = request.getParameter("id").toInt
+  val paramaterId = request.getParameter("id")
+  if (paramaterId.equals("")) {
--- End diff --

Nope, it returns "" when not specified.
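
The two comments above disagree on whether a missing `id` arrives as `null` or as `""`. The Servlet API documents `getParameter` as returning `null` when the parameter does not exist, while a request such as `?id=` yields an empty string, so a defensive parse can simply treat both the same way. The following is a minimal sketch only; `JobIdParam` and `parse` are hypothetical names and are not taken from the patch under review.

```scala
import scala.util.Try

// Hypothetical helper, not part of the PR: parse an "id" query parameter
// that may be null (absent), empty, or non-numeric.
object JobIdParam {
  def parse(raw: String): Option[Int] =
    Option(raw)                            // null   -> None
      .map(_.trim)
      .filter(_.nonEmpty)                  // ""     -> None
      .flatMap(s => Try(s.toInt).toOption) // "abc"  -> None
}

object JobIdParamDemo extends App {
  println(JobIdParam.parse(null))  // None
  println(JobIdParam.parse(""))    // None
  println(JobIdParam.parse("42"))  // Some(42)
  println(JobIdParam.parse("abc")) // None
}
```

With this shape the page could render an error message for `None` instead of failing on `.toInt`, regardless of which of the two values the container actually hands back.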





[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-71817474
  
OK, let's wait for the PR to be merged.





[GitHub] spark pull request: [MLLIB] SPARK-4846: throw a RuntimeException a...

2015-01-28 Thread jinntrance
GitHub user jinntrance opened a pull request:

https://github.com/apache/spark/pull/4247

[MLLIB] SPARK-4846: throw a RuntimeException and give users hints to 
increase the minCount

When vocabSize*vectorSize is larger than Int.MaxValue/8, we throw a 
RuntimeException, because in that case allocating memory to serialize the 
arrays syn0Global and syn1Global would definitely throw an OOM. syn0Global and 
syn1Global are float arrays, and serializing them should need a byte array of 
more than 8*(size of syn0Global).
Also, if we catch an OOM even when vocabSize*vectorSize is less than 
Int.MaxValue/8, we should give users hints to increase the minCount or decrease 
the vectorSize.
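
The size check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the patch: `Word2VecSizeCheck`, `requireFits`, and the wording of the message are hypothetical, and only the `Int.MaxValue/8` threshold and the hint come from the description.

```scala
// Hypothetical helper illustrating the check; names and message are assumptions.
object Word2VecSizeCheck {
  def requireFits(vocabSize: Int, vectorSize: Int): Unit = {
    // Multiply as Long so the product itself cannot overflow Int.
    val cells: Long = vocabSize.toLong * vectorSize
    val limit: Int = Int.MaxValue / 8
    if (cells > limit) {
      throw new RuntimeException(
        s"vocabSize * vectorSize = $cells exceeds Int.MaxValue / 8 = $limit; " +
          "increase minCount or decrease vectorSize.")
    }
  }
}
```

Doing the multiplication in `Long` matters here: with `Int` arithmetic a large vocabulary could wrap around to a small or negative product and slip past the comparison.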


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinntrance/spark w2v-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4247


commit 57eeae0e2c4ff1b768b043ecce6096322c4cfdf5
Author: Joseph J.C. Tang 
Date:   2015-01-27T03:30:13Z

throw a RuntimeException and give users hints regarding the 
vectorSize&minCount







[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/4245#issuecomment-71817663
  
I found that this issue is resolved by updating the `com.codahale.metrics` 
library from `3.0.0` to `3.1.0`, which is proposed in #4209. So I'll close this PR.





[GitHub] spark pull request: [HOTFIX] Rename the identifier of driver from ...

2015-01-28 Thread sarutak
Github user sarutak closed the pull request at:

https://github.com/apache/spark/pull/4245





[GitHub] spark pull request: [MLLIB] SPARK-4846: throw a RuntimeException a...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4247#issuecomment-71817862
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [MLLIB] SPARK-4846: When the vocabulary size i...

2015-01-28 Thread jinntrance
Github user jinntrance commented on the pull request:

https://github.com/apache/spark/pull/3697#issuecomment-71817963
  
This PR was replaced by PR #4247 




