date:20150130

[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1767#issuecomment-72234437
  
  [Test build #26409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26409/consoleFull)
 for   PR 1767 at commit 
[`1c35298`](https://github.com/apache/spark/commit/1c35298bfd3bea5b8eeba6bb4804b3fe74ff7fd9).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1767#issuecomment-72234451
  
  [Test build #26409 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26409/consoleFull)
 for   PR 1767 at commit 
[`1c35298`](https://github.com/apache/spark/commit/1c35298bfd3bea5b8eeba6bb4804b3fe74ff7fd9).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1767#issuecomment-72234457
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26409/
Test FAILed.



[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4261#issuecomment-72230800
  
  [Test build #26403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26403/consoleFull)
 for   PR 4261 at commit 
[`cf167ce`](https://github.com/apache/spark/commit/cf167cea9457e933b1b8ed5f0eff708e6535ef99).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JDBCUtils `
  * `  logWarning(s"Couldn't find class $driver", e);`
  * `  implicit class JDBCDataFrame(rdd: DataFrame) `




[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...

2015-01-30 Thread Liuchang0812

Github user Liuchang0812 commented on the pull request:

https://github.com/apache/spark/pull/4162#issuecomment-72235154
  
ubuntu@ip-172-31-24-113:~/spark/ec2$ ../dev/lint-python 
PEP 8 checks passed.





[GitHub] spark pull request: SPARK-5400 [MLlib] Changed name of GaussianMix...

2015-01-30 Thread tgaloppo

GitHub user tgaloppo opened a pull request:

https://github.com/apache/spark/pull/4290

SPARK-5400 [MLlib] Changed name of GaussianMixtureEM to GaussianMixture

Decoupling the model and the algorithm


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgaloppo/spark spark-5400

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4290


commit d8480761d98119a37b10a43a24bc6720e0e6eb87
Author: Travis Galoppo tjg2...@columbia.edu
Date:   2015-01-30T15:20:55Z

SPARK-5400 Changed name of GaussianMixtureEM to GaussianMixture to separate 
model from algorithm





[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread zapletal-martin

Github user zapletal-martin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3519#discussion_r23841288
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala ---
@@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.regression
+
+import java.io.Serializable
+import java.util.Arrays.binarySearch
+
+import org.apache.spark.api.java.{JavaDoubleRDD, JavaRDD}
+import org.apache.spark.rdd.RDD
+
+/**
+ * Regression model for Isotonic regression
+ *
+ * @param features Array of features.
+ * @param labels Array of labels associated to the features at the same index.
+ */
+class IsotonicRegressionModel (
+features: Array[Double],
+val labels: Array[Double])
+  extends Serializable {
+
+  /**
+   * Predict labels for provided features
+   * Using a piecewise constant function
+   *
+   * @param testData features to be labeled
+   * @return predicted labels
+   */
+  def predict(testData: RDD[Double]): RDD[Double] =
+testData.map(predict)
+
+  /**
+   * Predict labels for provided features
+   * Using a piecewise constant function
+   *
+   * @param testData features to be labeled
+   * @return predicted labels
+   */
+  def predict(testData: JavaRDD[java.lang.Double]): JavaDoubleRDD =
+JavaDoubleRDD.fromRDD(predict(testData.rdd.asInstanceOf[RDD[Double]]))
+
+  /**
+   * Predict a single label
+   * Using a piecewise constant function
+   *
+   * @param testData feature to be labeled
+   * @return predicted label
+   */
+  def predict(testData: Double): Double = {
+val result = binarySearch(features, testData)
+
+val index =
+  if (result == -1) {
--- End diff --

As for the special singularity case, I believe this requires further 
consideration. Currently we sort the input to PAV by feature only, so the 
order of multiple data points with the same feature is undefined.

Consider a case where the features are 1, 2, 2, 3 and the labels are 
1, 4, 2, 5 in the first case and 1, 2, 4, 5 in the second. In the first case 
the result of PAV would be 1, 3, 3, 5, but in the second case 1, 2, 4, 5.

The same applies to `IsotonicRegressionModel` with boundaries 1, 2, 2, 3 and 
predictions 1, 4, 2, 5 in the first case and 1, 2, 4, 5 in the second. The 
first model would return predict(1.5)=2.5 and predict(2.5)=3.5, but the second 
would return 1.5 and 4.5 respectively for the same input values.

I suggest sorting the input by feature and then by label when features are 
equal. The same would apply to the model. Both PAV and the predictions for 
values between boundaries would then be deterministic. The prediction for a 
boundary with multiple values would remain non-deterministic (it is based on 
`java.util.Arrays.binarySearch()`, which in this case returns one of the 
correct results but does not specify which).
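
To make the example above concrete, here is a minimal sketch in plain Scala 
(no Spark). It is not the code in this PR: `TieBreakSketch`, `pav` and 
`sortInput` are hypothetical names, and the PAV below assumes unit weights.

```scala
object TieBreakSketch {
  /** Pool adjacent violators over labels already ordered by feature (unit weights assumed). */
  def pav(labels: Seq[Double]): Seq[Double] = {
    // Each block holds (sum of labels, number of points); its fitted value is sum / count.
    val blocks = scala.collection.mutable.ArrayBuffer.empty[(Double, Int)]
    for (y <- labels) {
      blocks += ((y, 1))
      // Merge the last two blocks while the earlier one has the larger mean.
      while (blocks.length > 1 && {
        val (s1, n1) = blocks(blocks.length - 2)
        val (s2, n2) = blocks(blocks.length - 1)
        s1 / n1 > s2 / n2
      }) {
        val (s2, n2) = blocks.remove(blocks.length - 1)
        val (s1, n1) = blocks.remove(blocks.length - 1)
        blocks += ((s1 + s2, n1 + n2))
      }
    }
    blocks.toSeq.flatMap { case (s, n) => Seq.fill(n)(s / n) }
  }

  /** Sort (feature, label) pairs by feature, breaking ties by label. */
  def sortInput(points: Seq[(Double, Double)]): Seq[(Double, Double)] =
    points.sortWith((a, b) => a._1 < b._1 || (a._1 == b._1 && a._2 < b._2))
}
```

With labels ordered 1, 4, 2, 5 `pav` pools the two middle points into 3, 3, 
while the order 1, 2, 4, 5 is left unchanged. Running `sortInput` first maps 
both orderings of the tied points to the same sequence, so the fit no longer 
depends on the input order.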



[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-72228772
  
  [Test build #26401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26401/consoleFull)
 for   PR 3798 at commit 
[`0090553`](https://github.com/apache/spark/commit/0090553eba09240b6ad4cf508ea33503705b12d9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class DeterministicKafkaInputDStreamCheckpointData extends DStreamCheckpointData(this) `
  * `class KafkaCluster(val kafkaParams: Map[String, String]) extends Serializable `
  * `  case class LeaderOffset(host: String, port: Int, offset: Long)`
  * `class KafkaRDDPartition(`




[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...

2015-01-30 Thread avulanov

Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1484#issuecomment-72231302
  
@mengxr I'll do the updates today



[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-72228780
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26401/
Test PASSed.



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2015-01-30 Thread tgravescs

GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/4292

[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure 
hdfs

This was previously https://github.com/apache/spark/pull/2676

https://issues.apache.org/jira/browse/SPARK-3778

This affects anyone trying to access secure HDFS with something like:
val lines = {
val hconf = new Configuration()
hconf.set("mapred.input.dir", "mydir")
hconf.set("textinputformat.record.delimiter", "\003432\n")
sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
}

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-3788

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4292


commit cf3b45337a1fb1da6492779709b2bf213bccbb16
Author: Thomas Graves tgra...@apache.org
Date:   2014-10-06T14:53:29Z

newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn





[GitHub] spark pull request: [SPARK-5472][SQL] A JDBC data source for Spark...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4261#issuecomment-72230810
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26403/
Test PASSed.



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4216#discussion_r23867076
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -121,6 +122,17 @@ private[spark] class Master(
 throw new SparkException("spark.deploy.defaultCores must be positive")
   }
 
+  // Alternative application submission gateway that is stable across Spark versions
+  private val restServerEnabled = conf.getBoolean("spark.master.rest.enabled", true)
+  private val restServer =
+if (restServerEnabled) {
+  val port = conf.getInt("spark.master.rest.port", 17077)
--- End diff --

I made this 6066. Let me know if you have any objections



[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-30 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r23879189
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel}
+import org.apache.spark.ml.param.{Params, IntParam, ParamMap}
+import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+
+
+/**
+ * A simple example demonstrating how to write your own learning algorithm using Estimator,
+ * Transformer, and other abstractions.
+ * This mimics [[org.apache.spark.ml.classification.LogisticRegression]].
+ * Run with
+ * {{{
+ * bin/run-example ml.DeveloperApiExample
+ * }}}
+ */
+object DeveloperApiExample {
+
+  def main(args: Array[String]) {
+val conf = new SparkConf().setAppName("DeveloperApiExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+import sqlContext._
+
+// Prepare training data.
+val training = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
+  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))))
+
+// Create a LogisticRegression instance.  This instance is an Estimator.
+val lr = new MyLogisticRegression()
+// Print out the parameters, documentation, and any default values.
+println("MyLogisticRegression parameters:\n" + lr.explainParams() + "\n")
+
+// We may set parameters using setter methods.
+lr.setMaxIter(10)
+
+// Learn a LogisticRegression model.  This uses the parameters stored in lr.
+val model = lr.fit(training)
+
+// Prepare test data.
+val test = sparkContext.parallelize(Seq(
+  LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
+  LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
+  LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))))
+
+// Make predictions on test data.
+val sumPredictions: Double = model.transform(test)
+  .select("features", "label", "prediction")
+  .collect()
+  .map { case Row(features: Vector, label: Double, prediction: Double) =>
+prediction
+  }.sum
+assert(sumPredictions == 0.0,
+  "MyLogisticRegression predicted something other than 0, even though all weights are 0!")
+
+sc.stop()
+  }
+}
+
+/**
+ * Example of defining a parameter trait for a user-defined type of [[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not want it to be private.
+ */
+private trait MyLogisticRegressionParams extends ClassifierParams {
+
+  /**
+   * Param for max number of iterations
+   *
+   * NOTE: The usual way to add a parameter to a model or algorithm is to include:
+   *   - val myParamName: ParamType
+   *   - def getMyParamName
+   *   - def setMyParamName
--- End diff --

Is the setter missing in this example, or is it auto-generated somehow?



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72252929
  
  [Test build #26417 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26417/consoleFull)
 for   PR 3519 at commit 
[`3da56e5`](https://github.com/apache/spark/commit/3da56e530276a2ff7104993da893fe04e124392d).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72284082
  
  [Test build #26434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26434/consoleFull)
 for   PR 3519 at commit 
[`e3c0e44`](https://github.com/apache/spark/commit/e3c0e442ab591731c322ec9cc78530a7665a00b9).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5498][SPARK-SQL]fix bug when query the ...

2015-01-30 Thread jeanlyn

GitHub user jeanlyn opened a pull request:

https://github.com/apache/spark/pull/4289

[SPARK-5498][SPARK-SQL]fix bug when query the data when partition schema 
does not match table schema

In Hive, the schema of a partition may differ from the table schema. When we 
use spark-sql to query data from a partition whose schema differs from the 
table schema, we get the exception described in the 
[jira](https://issues.apache.org/jira/browse/SPARK-5498). For example:
1. Take a look at the schemas of the partition and the table:

```sql
DESCRIBE partition_test PARTITION (dt='1');
id      int     None
name    string  None
dt      string  None

# Partition Information
# col_name  data_type   comment

dt      string  None
```
```sql
DESCRIBE partition_test;
OK
id      bigint  None
name    string  None
dt      string  None

# Partition Information
# col_name  data_type   comment

dt      string  None
```
2. Run the SQL:
```sql
SELECT * FROM partition_test where dt='1';
```
We get the cast exception `java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to 
org.apache.spark.sql.catalyst.expressions.MutableInt`.
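
One plausible reading of this exception, sketched below with hypothetical 
stand-in classes rather than Spark's actual `MutableLong` and `MutableInt`: 
the row slot is allocated from the table schema (`bigint`) while the code 
filling it assumes the partition schema (`int`). All names in the sketch are 
assumptions made for illustration.

```scala
object CastMismatchSketch {
  // Hypothetical stand-ins for typed mutable row slots; not Spark's classes.
  sealed trait MutableSlot
  final class MutableIntSlot(var value: Int) extends MutableSlot
  final class MutableLongSlot(var value: Long) extends MutableSlot

  /** Allocates a slot from a declared column type. */
  def slotFor(dataType: String): MutableSlot = dataType match {
    case "int"    => new MutableIntSlot(0)
    case "bigint" => new MutableLongSlot(0L)
    case other    => throw new IllegalArgumentException(s"unsupported type: $other")
  }

  /** A writer that assumes the slot was allocated for an `int` column. */
  def writeInt(slot: MutableSlot, v: Int): Unit =
    slot.asInstanceOf[MutableIntSlot].value = v
}
```

Calling `writeInt(slotFor("bigint"), 1)` throws a `ClassCastException`, 
whereas `writeInt(slotFor("int"), 1)` succeeds, which mirrors the mismatch 
between the two `DESCRIBE` outputs above.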

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeanlyn/spark schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4289


commit adfc7defb278667d0c27c6128b00339bb8d52bb1
Author: jeanlyn jeanly...@gmail.com
Date:   2015-01-30T13:48:21Z

SPARK-5498:fix bug when query the data when partition schema does not match 
table schema





[GitHub] spark pull request: [SPARK-4001][MLlib] adding parallel FP-Growth ...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-72232842
  
  [Test build #26407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26407/consoleFull)
 for   PR 2847 at commit 
[`ec21f7d`](https://github.com/apache/spark/commit/ec21f7dfcad6191e0c2d6d7fd93ac77012098e6c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23858269
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
 // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
 if (isYarnCluster) {
   childMainClass = "org.apache.spark.deploy.yarn.Client"
-  if (args.primaryResource != SPARK_INTERNAL) {
-childArgs += ("--jar", args.primaryResource)
+  if (args.isPython) {// yarn-cluster mode for python application
+  val primaryResourceLocalPath = new Path(args.primaryResource)
+childArgs += ("--primaryResource", primaryResourceLocalPath.getName)
+val pyFilesLocalNames:String = if (args.pyFiles != null) {
+  args.pyFiles.split(",").map { p => (new Path(p)).getName }.mkString(",")
--- End diff --

Also, it seems like if the primary resource is a jar, it isn't truncated 
with getName.  Is there a reason this needs to be different for a python file?
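
For reference, the truncation in question keeps only the last component of 
each path. A minimal sketch, assuming local paths and using 
`java.nio.file.Paths` as a stand-in for the Hadoop `Path.getName` call in the 
diff; `BaseNameSketch` and `baseNames` are hypothetical names:

```scala
import java.nio.file.Paths

object BaseNameSketch {
  /** Reduces each path in a comma-separated list to its final component. */
  def baseNames(commaSeparated: String): String =
    commaSeparated
      .split(",")
      .map(p => Paths.get(p).getFileName.toString) // drop the directory part
      .mkString(",")
}
```

A jar passed through `--jar` keeps its full path, while a value processed this 
way would not, which is the asymmetry being asked about.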



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23878978
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -431,6 +458,155 @@ object SparkSubmit {
   }
 }
 
+/** Provides utility functions to be used inside SparkSubmit. */
+private[spark] object SparkSubmitUtils extends Logging {
+
+  // Directories for caching downloads through ivy and storing the jars when maven coordinates are
+  // supplied to spark-submit
+  private var PACKAGES_DIRECTORY: File = null
+
+  /**
+   * Represents a Maven Coordinate
+   * @param groupId the groupId of the coordinate
+   * @param artifactId the artifactId of the coordinate
+   * @param version the version of the coordinate
+   */
+  private[spark] case class MavenCoordinate(groupId: String, artifactId: String, version: String)
+
+  /**
+   * Resolves any dependencies that were supplied through maven coordinates
+   * @param coordinates Comma-delimited string of maven coordinates
+   * @param remoteRepos Comma-delimited string of remote repositories other than maven central
+   * @param ivyPath The path to the local ivy repository
+   * @return The comma-delimited path to the jars of the given maven artifacts including their
+   * transitive dependencies
+   */
+   */
+  private[spark] def resolveMavenCoordinates(
+  coordinates: String,
+  remoteRepos: String,
+  ivyPath: String,
+  isTest: Boolean = false): String = {
--- End diff --

Also, what do you think about returning a Seq of paths and leaving it up to 
the caller to join them with commas?
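
Roughly what that could look like, as a sketch only: `resolvedJarPaths`, its 
parameters and the file-name layout are made up for illustration, and no real 
Ivy resolution happens here.

```scala
object ResolvePathsSketch {
  /** Hypothetical resolver: returns one path per jar instead of a pre-joined string. */
  def resolvedJarPaths(coordinates: Seq[String], cacheDir: String): Seq[String] =
    coordinates.map { coordinate =>
      // Expects "groupId:artifactId:version"; the resulting file name is an invented layout.
      val Array(groupId, artifactId, version) = coordinate.split(":")
      s"$cacheDir/${groupId}_$artifactId-$version.jar"
    }
}
```

The caller that needs the comma-delimited form would call 
`resolvedJarPaths(...).mkString(",")`, while other callers can work with the 
individual paths directly.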



[GitHub] spark pull request: [SPARK-5498][SPARK-SQL]fix bug when query the ...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4289#issuecomment-72206551
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4294#issuecomment-72268532
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26426/
Test FAILed.



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72248433
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26414/
Test FAILed.



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4216#discussion_r23876918
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/SubmitRestProtocolMessage.scala
 ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.rest
+
+import com.fasterxml.jackson.annotation._
+import com.fasterxml.jackson.annotation.JsonAutoDetect.Visibility
+import com.fasterxml.jackson.annotation.JsonInclude.Include
+import com.fasterxml.jackson.databind.ObjectMapper
+import org.json4s.JsonAST._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.util.Utils
+
+/**
+ * An abstract message exchanged in the REST application submission protocol.
+ *
+ * This message is intended to be serialized to and deserialized from JSON in the exchange.
+ * Each message can either be a request or a response and consists of three common fields:
+ *   (1) the action, which fully specifies the type of the message
+ *   (2) the Spark version of the client / server
+ *   (3) an optional message
+ */
+@JsonInclude(Include.NON_NULL)
+@JsonAutoDetect(getterVisibility = Visibility.ANY, setterVisibility = Visibility.ANY)
+@JsonPropertyOrder(alphabetic = true)
+abstract class SubmitRestProtocolMessage {
+  private val messageType = Utils.getFormattedClassName(this)
+  protected val action: String = messageType
+  protected val sparkVersion: SubmitRestProtocolField[String]
+  protected val message = new SubmitRestProtocolField[String]("message")
+
+  // Required for JSON de/serialization and not explicitly used
+  private def getAction: String = action
+  private def setAction(s: String): this.type = this
+
+  // Intended for the user and not for JSON de/serialization, which expects more specific keys
+  @JsonIgnore
+  def getSparkVersion: String
+  @JsonIgnore
+  def setSparkVersion(s: String): this.type
+
+  def getMessage: String = message.toString
+  def setMessage(s: String): this.type = setField(message, s)
+
+  /**
+   * Serialize the message to JSON.
+   * This also ensures that the message is valid and its fields are in the expected format.
+   */
+  def toJson: String = {
+    validate()
+    val mapper = new ObjectMapper
+    pretty(parse(mapper.writeValueAsString(this)))
--- End diff --

great



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-01-30 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-72265786
  
@MechCoder  Sorry, there are a lot of PRs out there now, so this may not 
get merged before the code freeze.  It's a good cleanup, though, so I'll 
definitely take a look when I can.  Thanks for your patience.



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72268511
  
@jacek-lewandowski I think Sean meant that you can do
`new Properties().putAll(oldProperties)` instead of cloning.



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23861868
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -172,7 +172,8 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
 }
 
 // Require all python files to be local, so we can add them to the 
PYTHONPATH
-if (isPython) {
+// when yarn-cluster, all python files can be non-local
+if (isPython && !master.equalsIgnoreCase("yarn-cluster")) {
--- End diff --

this is not sufficient. Users can manually specify `--master yarn 
--deploy-mode cluster`
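
A minimal sketch of the broader check this comment asks for, assuming the master and deploy mode are available as plain strings. The object and method names below are hypothetical and are not the actual fields or logic of `SparkSubmitArguments`; the sketch only shows that both `--master yarn-cluster` and `--master yarn --deploy-mode cluster` should be treated as YARN cluster mode.

```scala
// Hypothetical helper, not part of SparkSubmitArguments.
// Assumes `master` is non-null and `deployMode` is None when the flag is absent.
object YarnClusterCheck {
  def isYarnCluster(master: String, deployMode: Option[String]): Boolean = {
    val m = master.toLowerCase
    m == "yarn-cluster" ||
      (m == "yarn" && deployMode.exists(_.equalsIgnoreCase("cluster")))
  }

  // Python files only have to be local when we are not in YARN cluster mode.
  def requireLocalPythonFiles(
      isPython: Boolean,
      master: String,
      deployMode: Option[String]): Boolean =
    isPython && !isYarnCluster(master, deployMode)
}
```

With this shape, the condition in the diff would become a single call that covers both spellings instead of a string comparison against `yarn-cluster` alone.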



[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4294#issuecomment-72268528
  
  [Test build #26426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26426/consoleFull)
 for   PR 4294 at commit 
[`a2f9c06`](https://github.com/apache/spark/commit/a2f9c0695ecd1c5a0ae334bde21740588ab81c29).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait TableScan extends BaseRelation `
  * `trait PrunedScan extends BaseRelation `
  * `trait PrunedFilteredScan extends BaseRelation `
  * `trait CatalystScan extends BaseRelation `
  * `trait InsertableRelation extends BaseRelation `




[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72248428
  
  [Test build #26414 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26414/consoleFull)
 for   PR 4047 at commit 
[`3e0c894`](https://github.com/apache/spark/commit/3e0c8945640523f0747e2145ce5ca7b1d405b4ab).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Tokenizer(sc: SparkContext, stopwordFile: String) extends 
Serializable `




[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23874490
  
--- Diff: bin/windows-utils.cmd ---
@@ -32,7 +32,7 @@ SET opts=\--master\ \--deploy-mode\ \--class\ 
\--name\ \--jars\ \--p
 SET opts=%opts:~1,-1% \--conf\ \--properties-file\ 
\--driver-memory\ \--driver-java-options\
 SET opts=%opts:~1,-1% \--driver-library-path\ \--driver-class-path\ 
\--executor-memory\
 SET opts=%opts:~1,-1% \--driver-cores\ \--total-executor-cores\ 
\--executor-cores\ \--queue\
-SET opts=%opts:~1,-1% \--num-executors\ \--archives\
+SET opts=%opts:~1,-1% \--num-executors\ \--archives\ \--packages\ 
\--repositories\
--- End diff --

Looks like this line is now missing a closing quote?



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread jacek-lewandowski

Github user jacek-lewandowski commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72274441
  
It serializes the object and then deserializes so I suppose this is a deep 
copy. 

For the `stringPropertyNames` - you can, but this will not be a 1:1 copy:

```scala
val parent = new Properties()
parent.setProperty("test1", "A")

val child = new Properties(parent)
child.put("test1", "C")
child.put("test2", "B")

child.getProperty("test1")
child.remove("test1")
child.getProperty("test1")
```
will give you
```
scala> res17: Object = C
scala> res18: String = A
```

When you copy in the way you suggested, there will be `null` after removal.
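
To make the difference concrete, here is a small sketch of a flattened copy built from `stringPropertyNames`. The `PropertiesCopyDemo` object and `flatCopy` helper are hypothetical names used only for illustration, not code from the PR. The copy holds every visible key as its own entry, so it has no defaults to fall back on after a removal.

```scala
import java.util.Properties

// Hypothetical demo object; only java.util.Properties is real API here.
object PropertiesCopyDemo {
  // Flattened copy: every visible key (own entries and defaults)
  // becomes an own entry of the new Properties object.
  def flatCopy(src: Properties): Properties = {
    val copy = new Properties()
    val names = src.stringPropertyNames().iterator()
    while (names.hasNext) {
      val name = names.next()
      copy.setProperty(name, src.getProperty(name))
    }
    copy
  }

  def main(args: Array[String]): Unit = {
    val parent = new Properties()
    parent.setProperty("test1", "A")

    val child = new Properties(parent)
    child.setProperty("test1", "C")
    child.setProperty("test2", "B")

    val copy = flatCopy(child)
    child.remove("test1")
    copy.remove("test1")

    println(child.getProperty("test1")) // falls back to the parent's default
    println(copy.getProperty("test1"))  // no defaults behind the flattened copy
  }
}
```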




[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...

2015-01-30 Thread avulanov

Github user avulanov commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23870580
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+    Compress(vector, indices)
--- End diff --

I thought it would be useful in general for filtering features. Does it make
sense?



[GitHub] spark pull request: [SPARK-3975] Added support for BlockMatrix add...

2015-01-30 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4274#discussion_r23875903
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -246,4 +248,86 @@ class BlockMatrix(
 val localMat = toLocalMatrix()
 new BDM[Double](localMat.numRows, localMat.numCols, localMat.toArray)
   }
+
+  /** Adds two block matrices together. The matrices must have the same size and matching
+    * `rowsPerBlock` and `colsPerBlock` values. If one of the blocks that are being added are
+    * instances of [[SparseMatrix]], the resulting sub matrix will also be a [[SparseMatrix]], even
+    * if it is being added to a [[DenseMatrix]]. If two dense matrices are added, the output will
+    * also be a [[DenseMatrix]].
+    */
+  def add(other: BlockMatrix): BlockMatrix = {
+    require(numRows() == other.numRows(), "Both matrices must have the same number of rows. " +
+      s"A.numRows: ${numRows()}, B.numRows: ${other.numRows()}")
+    require(numCols() == other.numCols(), "Both matrices must have the same number of columns. " +
+      s"A.numCols: ${numCols()}, B.numCols: ${other.numCols()}")
+    if (rowsPerBlock == other.rowsPerBlock && colsPerBlock == other.colsPerBlock) {
+      val addedBlocks = blocks.cogroup(other.blocks, createPartitioner())
+        .map { case ((blockRowIndex, blockColIndex), (a, b)) =>
+          if (a.size > 1 || b.size > 1) {
+            throw new SparkException("There are MatrixBlocks with duplicate indices. Please " +

Put `blockRowIndex` and `blockColIndex` in the message.
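
One way the suggestion could look, as a sketch only: the helper name and the exact wording of the message are hypothetical, since the rest of the original message is cut off in the diff above. The point is simply that the offending block coordinates are interpolated into the text.

```scala
// Hypothetical helper; the name and wording are illustrative only.
object DuplicateBlockMessage {
  def duplicateBlockMessage(blockRowIndex: Int, blockColIndex: Int): String =
    "There are MatrixBlocks with duplicate indices at " +
      s"($blockRowIndex, $blockColIndex). Please remove them."
}
```

Inside the `map` in the diff, the `SparkException` would then be thrown with this string, so a user can see which block is duplicated.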



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23862088
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterArguments.scala
 ---
@@ -48,6 +50,14 @@ class ApplicationMasterArguments(val args: 
Array[String]) {
   userClass = value
   args = tail
 
+case ("--primaryResource") :: value :: tail =>
--- End diff --

this should be `--primary-resource` instead of camel case for consistency
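
A rough sketch of what the renamed flag might look like in a list-based argument parser of the same style. `ArgParseSketch` and `parsePrimaryResource` are hypothetical names and this is not the actual `ApplicationMasterArguments` code; it only recognises the one flag discussed here.

```scala
import scala.annotation.tailrec

// Hypothetical parser that recognises only the flag discussed here.
object ArgParseSketch {
  @tailrec
  def parsePrimaryResource(
      args: List[String],
      found: Option[String] = None): Option[String] =
    args match {
      // Hyphenated form, consistent with the other command-line options.
      case "--primary-resource" :: value :: tail =>
        parsePrimaryResource(tail, Some(value))
      // Skip anything else, including the old camel-case spelling.
      case _ :: tail =>
        parsePrimaryResource(tail, found)
      case Nil =>
        found
    }
}
```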



[GitHub] spark pull request: remove redundant field childOutput from exec...

2015-01-30 Thread kai-zeng

GitHub user kai-zeng opened a pull request:

https://github.com/apache/spark/pull/4291

remove redundant field childOutput from execution.Aggregate, use 
child.output instead



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kai-zeng/spark aggregate-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4291


commit 78658efffb3fe0632a5aafe45997c2ab24791475
Author: kai kaiz...@eecs.berkeley.edu
Date:   2015-01-30T16:10:19Z

remove redundant field childOutput





[GitHub] spark pull request: [SPARK-3975] Added support for BlockMatrix add...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4274#issuecomment-72277417
  
  [Test build #26425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26425/consoleFull)
 for   PR 4274 at commit 
[`ac25783`](https://github.com/apache/spark/commit/ac25783cb125e1eea4728d0933f1295d43d0c442).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-72255456
  
  [Test build #26413 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26413/consoleFull)
 for   PR 4216 at commit 
[`bf696ff`](https://github.com/apache/spark/commit/bf696ff0b7135883e53e5fb275b4afa0db6c4a4a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72261726
  
  [Test build #26416 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26416/consoleFull)
 for   PR 3519 at commit 
[`75eac55`](https://github.com/apache/spark/commit/75eac55d2a168ca3452f08b403187c503cdbb45a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23862118
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -103,11 +104,15 @@ private[spark] class ClientArguments(args: 
Array[String], sparkConf: SparkConf)
   userClass = value
   args = tail
 
+case ("--primaryResource") :: value :: tail =>
--- End diff --

I agree, since we don't ever set this to the main jar if we're not running 
python



[GitHub] spark pull request: [SPARK-5176] The thrift server does not suppor...

2015-01-30 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4137#issuecomment-72280943
  
@andrewor14 Per our offline discussion, some minor work is still required to
make the Thrift server support standalone cluster mode (mainly related to the
`spark-internal` argument). For now, at least, we don't want to add it in
1.3.0. So this PR LGTM.

@tpanningnextcen Thanks for working on this!



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-72255468
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26413/
Test FAILed.



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23878483
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -431,6 +458,155 @@ object SparkSubmit {
   }
 }
 
+/** Provides utility functions to be used inside SparkSubmit. */
+private[spark] object SparkSubmitUtils extends Logging {
+
+  // Directories for caching downloads through ivy and storing the jars 
when maven coordinates are
+  // supplied to spark-submit
+  private var PACKAGES_DIRECTORY: File = null
+
+  /**
+   * Represents a Maven Coordinate
+   * @param groupId the groupId of the coordinate
+   * @param artifactId the artifactId of the coordinate
+   * @param version the version of the coordinate
+   */
+  private[spark] case class MavenCoordinate(groupId: String, artifactId: 
String, version: String)
+
+  /**
+   * Resolves any dependencies that were supplied through maven coordinates
+   * @param coordinates Comma-delimited string of maven coordinates
+   * @param remoteRepos Comma-delimited string of remote repositories 
other than maven central
+   * @param ivyPath The path to the local ivy repository
+   * @return The comma-delimited path to the jars of the given maven 
artifacts including their
+   * transitive dependencies
+   */
+  private[spark] def resolveMavenCoordinates(
+  coordinates: String,
+  remoteRepos: String,
+  ivyPath: String,
+  isTest: Boolean = false): String = {
--- End diff --

Similarly, maybe the configuration of the ChainResolver could be done in 
its own helper method that takes the comma-separated list of remoteRepos and 
returns a resolver.



[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-01-30 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4293#issuecomment-72285133
  
This is good to add. One idea that just came to mind: why don't the
downstream operators inspect whether they need to make copies or not?



[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...

2015-01-30 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/4162#issuecomment-72223944
  
OK, LGTM pending Python style tests.



[GitHub] spark pull request: Don't return `ERROR 500` when have missing arg...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4239#issuecomment-72286357
  
retest this please



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4254#issuecomment-72279293
  
LGTM except minor user guide issues, which will be addressed in SPARK-5503.
I've merged this into master. Thanks for contributing! (Now MLlib depends
on GraphX.)



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23878900
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import org.apache.spark.util.ResetSystemProperties
+import org.scalatest.{Matchers, FunSuite}
+
+class SparkSubmitUtilsSuite extends FunSuite with Matchers with ResetSystemProperties {
+
+  def beforeAll() {
+    System.setProperty("spark.testing", "true")
--- End diff --

If I recall, Maven already sets this property to `true` before running the 
tests.  Is there a reason that we need this (and ResetSystemProperties) here, 
or is it a carry-over from another test suite?



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4254



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23879324
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -431,6 +458,155 @@ object SparkSubmit {
   }
 }
 
+/** Provides utility functions to be used inside SparkSubmit. */
+private[spark] object SparkSubmitUtils extends Logging {
+
+  // Directories for caching downloads through ivy and storing the jars when maven coordinates are
+  // supplied to spark-submit
+  private var PACKAGES_DIRECTORY: File = null
+
+  /**
+   * Represents a Maven Coordinate
+   * @param groupId the groupId of the coordinate
+   * @param artifactId the artifactId of the coordinate
+   * @param version the version of the coordinate
+   */
+  private[spark] case class MavenCoordinate(groupId: String, artifactId: String, version: String)
+
+  /**
+   * Resolves any dependencies that were supplied through maven coordinates
+   * @param coordinates Comma-delimited string of maven coordinates
+   * @param remoteRepos Comma-delimited string of remote repositories other than maven central
+   * @param ivyPath The path to the local ivy repository
+   * @return The comma-delimited path to the jars of the given maven artifacts including their
+   * transitive dependencies
+   */
+  private[spark] def resolveMavenCoordinates(
+      coordinates: String,
+      remoteRepos: String,
+      ivyPath: String,
+      isTest: Boolean = false): String = {
+    if (coordinates == null || coordinates.trim.isEmpty) {
+      ""
+    } else {
+      val artifacts = coordinates.split(",").map { p =>
+        val splits = p.split(":")
+        require(splits.length == 3, s"Provided Maven Coordinates must be in the form " +
+          s"'groupId:artifactId:version'. The coordinate provided is: $p")
+        require(splits(0) != null && splits(0).trim.nonEmpty, s"The groupId cannot be null or " +
+          s"be whitespace. The groupId provided is: ${splits(0)}")
+        require(splits(1) != null && splits(1).trim.nonEmpty, s"The artifactId cannot be null or " +
+          s"be whitespace. The artifactId provided is: ${splits(1)}")
+        require(splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " +
+          s"be whitespace. The version provided is: ${splits(2)}")
+        new MavenCoordinate(splits(0), splits(1), splits(2))
+      }
+      // Default configuration name for ivy
+      val conf = "default"
+      // set ivy settings for location of cache
+      val ivySettings: IvySettings = new IvySettings
+      if (ivyPath == null || ivyPath.trim.isEmpty) {
--- End diff --

Is `ivyPath` acting like an optional value here, since it can be null?  If 
that's the case, it might be nice to use an `Option` to make its optional 
nature more explicit.
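
A minimal sketch of what the `Option`-based handling suggested above might look like. The names `IvyPathSketch`, `normalize` and `packagesDirectory` are hypothetical and not part of the PR, and the sketch assumes the jars end up in a `jars` subdirectory of either the supplied path or a default Ivy user directory.

```scala
import java.io.File

object IvyPathSketch {
  /** Hypothetical helper: null or blank input becomes None, anything else Some(trimmed). */
  def normalize(ivyPath: String): Option[String] =
    Option(ivyPath).map(_.trim).filter(_.nonEmpty)

  /** Hypothetical helper: uses the optional path if present, otherwise the default directory. */
  def packagesDirectory(ivyPath: Option[String], defaultIvyUserDir: File): File =
    new File(ivyPath.map(new File(_)).getOrElse(defaultIvyUserDir), "jars")
}
```

With this shape the null and blank checks happen once, at the boundary where the raw string enters, and the rest of the code only pattern-matches or maps over the `Option`.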



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread brkyvz

Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23879684
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -123,6 +126,7 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
   .orNull
     name = Option(name).orElse(sparkProperties.get("spark.app.name")).orNull
     jars = Option(jars).orElse(sparkProperties.get("spark.jars")).orNull
+    ivyRepoPath = sparkProperties.get("spark.jars.ivy").orNull
--- End diff --

Actually, this is so that it isn't exposed to users as a spark-submit option. 
I still wanted to have it as a configuration, just for the flexibility.



[GitHub] spark pull request: [SPARK-5473] [EC2] Expose SSH failures after s...

2015-01-30 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/4262#issuecomment-72285208
  
I haven't tried out this solution, so I am not exactly sure what gets 
printed (I can do it over the weekend sometime). At a high level, my comment is 
that every attempt that checks if the cluster is `ssh-ready` should print some 
feedback on the screen so the user knows the script is not hung. If that is the 
case, I'm fine with this solution.



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72265197
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26419/
Test FAILed.



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread jacek-lewandowski

Github user jacek-lewandowski commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72272628
  
Look at this simple example:
```scala
import java.util.Properties

val parent = new Properties()
parent.setProperty("test1", "A")

// `parent` becomes the defaults of `child`
val child = new Properties(parent)
child.put("test2", "B")

// putAll copies only the entries set directly on `child`
val copy = new Properties()
copy.putAll(child)

child.getProperty("test1")
child.getProperty("test2")

copy.getProperty("test1")
copy.getProperty("test2")
```
which will result in:
```
scala> res3: String = A
scala> res4: String = B
scala> res5: String = null
scala> res6: String = B
```

In other words: `new Properties(oldProperties)` initialises a new 
`Properties` object with `oldProperties` as its parent (defaults). On the other 
hand, `new Properties().putAll(oldProperties)` copies only those properties which 
were explicitly set and cuts off the whole hierarchy of defaults. Only cloning 
gives you an equivalent object with the hierarchy intact.
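
To illustrate the cloning point, here is a small sketch that contrasts `clone()` with a flattened copy built from `stringPropertyNames()`. The object and method names are hypothetical; the flattened copy is only one possible alternative and is not something proposed in this PR.

```scala
import java.util.Properties

object PropertiesCopySketch {
  /** Builds a child whose defaults are a parent holding test1=A; the child itself holds test2=B. */
  def childWithDefaults(): Properties = {
    val parent = new Properties()
    parent.setProperty("test1", "A")
    val child = new Properties(parent)
    child.setProperty("test2", "B")
    child
  }

  /** Clones the object; the clone keeps referring to the same defaults. */
  def copyWithClone(source: Properties): Properties =
    source.clone().asInstanceOf[Properties]

  /** Resolves every string key, including keys from the defaults, into one flat object. */
  def flatten(source: Properties): Properties = {
    val flat = new Properties()
    val names = source.stringPropertyNames().iterator()
    while (names.hasNext) {
      val key = names.next()
      flat.setProperty(key, source.getProperty(key))
    }
    flat
  }
}
```

Both copies answer `getProperty("test1")` with `A`, but the clone still holds only one entry of its own, while the flattened copy holds both entries and no longer tracks later changes to the parent.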




[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/4216#discussion_r23876895
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/SubmitRestProtocolField.scala 
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.rest
+
+/**
+ * A field used in [[SubmitRestProtocolMessage]]s.
+ */
+class SubmitRestProtocolField[T](val name: String) {
--- End diff --

Sounds fair. An earlier commit actually had exactly what you suggest here. 
I just thought that if we wanted to do extra validation and throw a different 
exception, we could reuse the name here, but this is no longer the case.



[GitHub] spark pull request: [SPARK-5264][SQL] Support `drop temporary tabl...

2015-01-30 Thread OopsOutOfMemory

Github user OopsOutOfMemory commented on a diff in the pull request:

https://github.com/apache/spark/pull/4060#discussion_r23841357
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala ---
@@ -231,6 +241,32 @@ private [sql] case class CreateTempTableUsing(
   }
 }
 
+private[sql] case class DropTable(
+tableName: String,
+isExists: Boolean,
+temporary: Boolean) extends Command
--- End diff --

Hi @liancheng, I think we do need this logical `DropTable`. 
Every statement goes through `ddlParser` first; only if that does not produce 
a plan is the dialect parser tried. 
If I remove this logical `DropTable`, executing `drop table xxx` will always 
use `DropTableCommand` in `SQLContext`. Sorry if I'm wrong.
In `HiveContext`:

     else if (conf.dialect == "hiveql") {
      new SchemaRDD(this, ddlParser(sqlText, false).getOrElse(HiveQl.parseSql(substituted)))
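
The fallback order described above can be sketched in isolation. Everything in this sketch is a hypothetical stand-in: `Plan`, `DropTablePlan`, `DialectPlan`, `ddlParse` and `dialectParse` are illustrative names, not the Spark SQL classes, and the regular expression only recognises a bare `drop table <name>`.

```scala
object ParserFallbackSketch {
  // Stand-ins for logical plans; the real code uses Catalyst plan classes.
  sealed trait Plan
  case class DropTablePlan(table: String) extends Plan
  case class DialectPlan(sql: String) extends Plan

  private val DropTable = """(?i)\s*drop\s+table\s+(\w+)\s*""".r

  /** Hypothetical DDL parser: returns a plan only for statements it recognises. */
  def ddlParse(sql: String): Option[Plan] = sql match {
    case DropTable(name) => Some(DropTablePlan(name))
    case _ => None
  }

  /** Hypothetical dialect parser: handles whatever the DDL parser declined. */
  def dialectParse(sql: String): Plan = DialectPlan(sql)

  /** DDL parser first, dialect parser only as the fallback. */
  def parse(sql: String): Plan = ddlParse(sql).getOrElse(dialectParse(sql))
}
```

If `ddlParse` had no case for `drop table`, the statement would fall through to `dialectParse`, which is the behaviour the comment is concerned about.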




[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23864907
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
     // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
     if (isYarnCluster) {
       childMainClass = "org.apache.spark.deploy.yarn.Client"
-      if (args.primaryResource != SPARK_INTERNAL) {
-        childArgs += ("--jar", args.primaryResource)
+      if (args.isPython) {// yarn-cluster mode for python application
+        val primaryResourceLocalPath = new Path(args.primaryResource)
+        childArgs += ("--primaryResource", primaryResourceLocalPath.getName)
+        val pyFilesLocalNames:String = if (args.pyFiles != null) {
+          args.pyFiles.split(",").map { p => (new Path(p)).getName }.mkString(",")
--- End diff --

@lianhuiwang quick question: are you stripping the path prefix here because 
all the python files in YARN are already found in the working directory? 
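
For reference, the prefix stripping in question can be illustrated with a purely string-based sketch. It is a stand-in for Hadoop's `Path.getName`, not the PR's code; `PyFilesSketch`, `baseName` and `localNames` are hypothetical names, and returning an empty string for a null input is an assumption made here.

```scala
object PyFilesSketch {
  /** Returns everything after the last '/', or the whole string if there is none. */
  def baseName(path: String): String = path.substring(path.lastIndexOf('/') + 1)

  /** Maps a comma-separated list of paths to a comma-separated list of file names. */
  def localNames(pyFiles: String): String =
    Option(pyFiles).map(_.split(",").map(baseName).mkString(",")).getOrElse("")
}
```

For example, `hdfs://nn/user/a.py,/tmp/lib/b.zip` becomes `a.py,b.zip`, which only resolves correctly if those files sit in the process's working directory.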



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread brkyvz

Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/4215#issuecomment-72282474
  
@JoshRosen thank you very much for the time and comments. I'll fix things 
immediately



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread squito

Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4216#discussion_r23875609
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/rest/SubmitRestProtocolField.scala 
---
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.rest
+
+/**
+ * A field used in [[SubmitRestProtocolMessage]]s.
+ */
+class SubmitRestProtocolField[T](val name: String) {
--- End diff --

I think you don't need `name` anymore -- you end up needing to repeat the 
field name a lot, when jackson is now taking care of putting the field name in 
the json.  It looks like it's only used in `assertFieldIsSet`, which is only 
called from `DriverStatusRequest` -- so you could just pass in a message for 
that one case and dry up a lot of the code.



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4254#discussion_r23876362
  
--- Diff: docs/mllib-clustering.md ---
@@ -34,6 +34,26 @@ a given dataset, the algorithm returns the best 
clustering result).
 * *initializationSteps* determines the number of steps in the k-means\|\| 
algorithm.
 * *epsilon* determines the distance threshold within which we consider 
k-means to have converged. 
 
+### Power Iteration Clustering
+
+Power iteration clustering is a scalable and efficient algorithm for 
clustering points given pointwise mutual affinity values.  Internally the 
algorithm:
+
+* accepts a 
[Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph)
 that represents a  normalized pairwise affinity between all input points.
+* calculates the principal eigenvalue and eigenvector
+* Clusters each of the input points according to their principal 
eigenvector component value
+
+Details of this algorithm are found within [Power Iteration Clustering, 
Lin and Cohen]{www.icml2010.org/papers/387.pdf}
--- End diff --

This is not the correct syntax for links in markdown. Use `[text](url)`.



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread brkyvz

Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23879289
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import org.apache.spark.util.ResetSystemProperties
+import org.scalatest.{Matchers, FunSuite}
+
+class SparkSubmitUtilsSuite extends FunSuite with Matchers with ResetSystemProperties {
+
+  def beforeAll() {
+    System.setProperty("spark.testing", "true")
--- End diff --

Carry-over from SparkSubmitSuite. It's my first time writing a core test, I 
can remove it if it's unnecessary.



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4254#discussion_r23876361
  
--- Diff: docs/mllib-clustering.md ---
@@ -34,6 +34,26 @@ a given dataset, the algorithm returns the best 
clustering result).
 * *initializationSteps* determines the number of steps in the k-means\|\| 
algorithm.
 * *epsilon* determines the distance threshold within which we consider 
k-means to have converged. 
 
+### Power Iteration Clustering
+
+Power iteration clustering is a scalable and efficient algorithm for 
clustering points given pointwise mutual affinity values.  Internally the 
algorithm:
+
+* accepts a 
[Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph)
 that represents a  normalized pairwise affinity between all input points.
--- End diff --

This should use a relative path (`api/graphx/`). See the examples in this 
markdown file.



[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-30 Thread mccheah

Github user mccheah commented on the pull request:

https://github.com/apache/spark/pull/4155#issuecomment-72286220
  
Tried running the Streaming CheckpointSuite locally, and it broke because 
of the new CommitDeniedException logic I added. Don't have any ideas as to how 
this happens except that streaming might not be using SparkHadoopWriter in a 
way that is compatible with this design, perhaps...

I don't think I'll be able to take this any further. Feel free to pick 
things up from here, @JoshRosen.



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72286197
  
@JoshRosen who investigated this a bunch for tests



[GitHub] spark pull request: [SPARK-5473] [EC2] Expose SSH failures after s...

2015-01-30 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/4262#issuecomment-72286941
  
You can see example output in [the PR 
description](https://github.com/apache/spark/pull/4262#issue-55856344).

I will look into adding feedback while the script is waiting on the cluster 
to reach a certain state.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread jacek-lewandowski

Github user jacek-lewandowski commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72268328
  
@srowen - unfortunately they are something more: they inherit from 
`Hashtable`, but they also form a hierarchy by referencing a parent `Properties` 
object, which holds the defaults. Since the defaults object is itself a 
`Properties`, it can have its own parent, and so on.
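
A short sketch of such a chain, assuming three levels with one key each (the names `PropertiesChainSketch` and `chain` and the keys `a`, `b`, `c` are made up for illustration). It shows that `getProperty` walks the defaults while the methods inherited from `Hashtable` only see the entries of the object itself.

```scala
import java.util.Properties

object PropertiesChainSketch {
  /** Builds grandparent -> parent -> child, each level holding exactly one key. */
  def chain(): Properties = {
    val grandparent = new Properties()
    grandparent.setProperty("a", "1")
    val parent = new Properties(grandparent)
    parent.setProperty("b", "2")
    val child = new Properties(parent)
    child.setProperty("c", "3")
    child
  }
}
```

On the returned object, `getProperty("a")` finds the grandparent's value and `stringPropertyNames()` lists all three keys, whereas `containsKey("a")` is false and `size()` is 1.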
 



[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...

2015-01-30 Thread tgravescs

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2676#issuecomment-72223705
  
I'll try to bring it up to date today.  I'm out all next week though so if 
you find issues someone else might need to take it over.



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23879094
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -431,6 +458,155 @@ object SparkSubmit {
   }
 }
 
+/** Provides utility functions to be used inside SparkSubmit. */
+private[spark] object SparkSubmitUtils extends Logging {
+
+  // Directories for caching downloads through ivy and storing the jars when maven coordinates are
+  // supplied to spark-submit
+  private var PACKAGES_DIRECTORY: File = null
+
+  /**
+   * Represents a Maven Coordinate
+   * @param groupId the groupId of the coordinate
+   * @param artifactId the artifactId of the coordinate
+   * @param version the version of the coordinate
+   */
+  private[spark] case class MavenCoordinate(groupId: String, artifactId: String, version: String)
+
+  /**
+   * Resolves any dependencies that were supplied through maven coordinates
+   * @param coordinates Comma-delimited string of maven coordinates
+   * @param remoteRepos Comma-delimited string of remote repositories other than maven central
+   * @param ivyPath The path to the local ivy repository
+   * @return The comma-delimited path to the jars of the given maven artifacts including their
+   * transitive dependencies
+   */
+  private[spark] def resolveMavenCoordinates(
+      coordinates: String,
+      remoteRepos: String,
+      ivyPath: String,
+      isTest: Boolean = false): String = {
+    if (coordinates == null || coordinates.trim.isEmpty) {
+      ""
+    } else {
+      val artifacts = coordinates.split(",").map { p =>
+        val splits = p.split(":")
+        require(splits.length == 3, s"Provided Maven Coordinates must be in the form " +
+          s"'groupId:artifactId:version'. The coordinate provided is: $p")
+        require(splits(0) != null && splits(0).trim.nonEmpty, s"The groupId cannot be null or " +
+          s"be whitespace. The groupId provided is: ${splits(0)}")
+        require(splits(1) != null && splits(1).trim.nonEmpty, s"The artifactId cannot be null or " +
+          s"be whitespace. The artifactId provided is: ${splits(1)}")
+        require(splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " +
+          s"be whitespace. The version provided is: ${splits(2)}")
+        new MavenCoordinate(splits(0), splits(1), splits(2))
+      }
+      // Default configuration name for ivy
+      val conf = "default"
+      // set ivy settings for location of cache
+      val ivySettings: IvySettings = new IvySettings
+      if (ivyPath == null || ivyPath.trim.isEmpty) {
+        PACKAGES_DIRECTORY = new File(ivySettings.getDefaultIvyUserDir, "jars")
+      } else {
+        ivySettings.setDefaultCache(new File(ivyPath, "cache"))
+        PACKAGES_DIRECTORY = new File(ivyPath, "jars")
+      }
+      logInfo(s"Ivy Default Cache set to: ${ivySettings.getDefaultCache.getAbsolutePath}")
+      logInfo(s"The jars for the packages stored in: $PACKAGES_DIRECTORY")
+
+      // create a pattern matcher
+      ivySettings.addMatcher(new GlobPatternMatcher)
+
+      // the biblio resolver resolves POM declared dependencies
+      val br: IBiblioResolver = new IBiblioResolver
+      br.setM2compatible(true)
+      br.setUsepoms(true)
+      br.setName("central")
+
+      // We need a chain resolver if we want to check multiple repositories
+      val cr = new ChainResolver
+      cr.setName("list")
+      cr.add(br)
+
+      // Add an exclusion rule for Spark
+      val sparkArtifacts = new ArtifactId(new ModuleId("org.apache.spark", "*"), "*", "*", "*")
+      val sparkDependencyExcludeRule =
+        new DefaultExcludeRule(sparkArtifacts, ivySettings.getMatcher("glob"), null)
+      sparkDependencyExcludeRule.addConfiguration(conf)
+
+      // add any other remote repositories other than maven central
+      if (remoteRepos != null && remoteRepos.trim.nonEmpty) {
+        var i = 1
+        remoteRepos.split(",").foreach { repo =>
+          val brr: IBiblioResolver = new IBiblioResolver
+          brr.setM2compatible(true)
+          brr.setUsepoms(true)
+          brr.setRoot(repo)
+          brr.setName(s"repo-$i")
+          cr.add(brr)
+          logInfo(s"$repo added as a remote repository with the name: ${brr.getName}")
+          i += 1
+        }
+      }
+      ivySettings.addResolver(cr)
+      ivySettings.setDefaultResolver(cr.getName)
+      val ivy = Ivy.newInstance(ivySettings)
+      // Set resolve options to download transitive dependencies as well
+      val resolveOptions = new ResolveOptions
+

[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72198487
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26400/
Test FAILed.



[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...

2015-01-30 Thread avulanov

Github user avulanov commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23861260
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: IndexedSeq[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: linalg.Vector): linalg.Vector = {
+Compress(vector, indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Creates a ChiSquared feature selector.
+ */
+@Experimental
+object ChiSqSelector {
--- End diff --

Done! However, why do you think this is better than having a static function, 
given that this class does nothing but store an integer (same for IDF)?



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3976#issuecomment-72240509
  
This looks like the right approach.  Added some comments inline.  Are you 
able to add a test for this in `YarnClusterSuite`?

Also, one last small thing: in `PythonRunner` are you able to remove the 
reference to Spark submit in the header comment, as this is now used in a 
more general way? 



[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72198477
  
  [Test build #26400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26400/consoleFull)
 for   PR 3519 at commit 
[`e60a34f`](https://github.com/apache/spark/commit/e60a34f3479ce3b642f5941497c3e5c1bbeebdd4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IsotonicRegressionModel (`




[GitHub] spark pull request: [SPARK-5504] [sql] convertToCatalyst should su...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4295#issuecomment-72280766
  
  [Test build #26429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26429/consoleFull)
 for   PR 4295 at commit 
[`6b7276d`](https://github.com/apache/spark/commit/6b7276d44d0d578545f5c543de4167c0569fe4e1).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread andrewor14

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23861241
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -138,8 +140,9 @@ object SparkSubmit {
     (clusterManager, deployMode) match {
       case (MESOS, CLUSTER) =>
         printErrorAndExit("Cluster deploy mode is currently not supported for Mesos clusters.")
-      case (_, CLUSTER) if args.isPython =>
-        printErrorAndExit("Cluster deploy mode is currently not supported for python applications.")
+      case (STANDALONE, CLUSTER) if args.isPython =>
+        printErrorAndExit("Standalone-Cluster deploy mode is currently not supported" +
--- End diff --

Yes, please try to be consistent with the other error messages here.



[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...

2015-01-30 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4220#issuecomment-72273213
  
I see, makes sense, thanks for the details.



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72265084
  
  [Test build #26419 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26419/consoleFull)
 for   PR 4047 at commit 
[`6fd1f71`](https://github.com/apache/spark/commit/6fd1f718ccef2464601256a84e99523c1f7d033f).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...

2015-01-30 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1484#discussion_r23877192
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.feature
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors, Vector}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.stat.Statistics
+import org.apache.spark.rdd.RDD
+
+/**
+ * :: Experimental ::
+ * Chi Squared selector model.
+ *
+ * @param indices list of indices to select (filter)
+ */
+@Experimental
+class ChiSqSelectorModel(indices: Array[Int]) extends VectorTransformer {
+  /**
+   * Applies transformation on a vector.
+   *
+   * @param vector vector to be transformed.
+   * @return transformed vector.
+   */
+  override def transform(vector: Vector): Vector = {
+Compress(vector, indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Creates a ChiSquared feature selector.
+ * @param numTopFeatures number of features that selector will select
+ *   (ordered by statistic value descending)
+ */
+@Experimental
+class ChiSqSelector (val numTopFeatures: Int) {
+
+  /**
+   * Returns a ChiSquared feature selector.
+   *
+   * @param data data used to compute the Chi Squared statistic.
+   */
+  def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
+    val indices = Statistics.chiSqTest(data)
+      .zipWithIndex.sortBy { case(res, _) => -res.statistic }
+      .take(numTopFeatures)
+      .map{ case(_, indices) => indices }
+    new ChiSqSelectorModel(indices)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Filters features in a given vector
+ */
+@Experimental
+object Compress {
+  /**
+   * Returns a vector with features filtered.
+   * Preserves the order of filtered features the same as their indices are stored.
+   * @param features vector
+   * @param filterIndices indices of features to filter
+   */
+  def apply(features: Vector, filterIndices: Array[Int]): Vector = {
+    features match {
+      case SparseVector(size, indices, values) =>
+        val filterMap = filterIndices.zipWithIndex.toMap
--- End diff --

This is slow due to hash map creation and hash lookups. Since both arrays 
are ordered, we can use the one-catch-another approach to extract indices, for 
example,


https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L344

Btw, please use `ArrayBuilder` to build new index/value arrays, which 
doesn't have the boxing/unboxing issues.
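
The suggestion above can be sketched as a single pass with two cursors. This is 
only an illustration, not the code from the PR or from `Vectors.scala`: the 
object and method names (`SparseSelect`, `select`) are hypothetical, and it 
assumes both `indices` and `filterIndices` are sorted in ascending order.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical helper, not part of MLlib.
object SparseSelect {
  /**
   * Keeps the entries of a sparse vector whose index appears in filterIndices.
   * Assumes both index arrays are sorted ascending. Each kept entry is
   * re-indexed to its position within filterIndices.
   */
  def select(
      indices: Array[Int],
      values: Array[Double],
      filterIndices: Array[Int]): (Array[Int], Array[Double]) = {
    // ArrayBuilder is specialized for primitives, so no boxing occurs here.
    val newIndices = ArrayBuilder.make[Int]
    val newValues = ArrayBuilder.make[Double]
    var i = 0 // cursor into indices
    var j = 0 // cursor into filterIndices
    while (i < indices.length && j < filterIndices.length) {
      if (indices(i) == filterIndices(j)) {
        newIndices += j
        newValues += values(i)
        i += 1
        j += 1
      } else if (indices(i) < filterIndices(j)) {
        i += 1 // this stored entry is not selected
      } else {
        j += 1 // this selected feature is an implicit zero
      }
    }
    (newIndices.result(), newValues.result())
  }
}
```

Each cursor only moves forward, so the cost is linear in the combined length of 
the two index arrays, with no hash map to build or query.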



[GitHub] spark pull request: [SPARK-3975] Added support for BlockMatrix add...

2015-01-30 Thread brkyvz

Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/4274#discussion_r23862043
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -237,4 +239,88 @@ class BlockMatrix(
 val localMat = toLocalMatrix()
 new BDM[Double](localMat.numRows, localMat.numCols, localMat.toArray)
   }
+
+  /** Adds two block matrices together. The matrices must have the same size and matching
+    * `rowsPerBlock` and `colsPerBlock` values. */
+  def add(other: BlockMatrix): BlockMatrix = {
+    require(numRows() == other.numRows(), "Both matrices must have the same number of rows. " +
+      s"A.numRows: ${numRows()}, B.numRows: ${other.numRows()}")
+    require(numCols() == other.numCols(), "Both matrices must have the same number of columns. " +
+      s"A.numCols: ${numCols()}, B.numCols: ${other.numCols()}")
+    if (checkPartitioning(other, OperationNames.add)) {
+      val addedBlocks = blocks.cogroup(other.blocks, partitioner).
+        map { case ((blockRowIndex, blockColIndex), (a, b)) =>
+          if (a.isEmpty) {
+            new MatrixBlock((blockRowIndex, blockColIndex), b.head)
+          } else if (b.isEmpty) {
+            new MatrixBlock((blockRowIndex, blockColIndex), a.head)
+          } else {
+            val result = a.head.toBreeze + b.head.toBreeze
+            new MatrixBlock((blockRowIndex, blockColIndex), Matrices.fromBreeze(result))
+          }
+      }
+      new BlockMatrix(addedBlocks, rowsPerBlock, colsPerBlock, numRows(), numCols())
+    } else {
+      throw new SparkException(
+        "Cannot add matrices with non-matching partitioners")
+    }
+  }
+
+  /** Left multiplies this [[BlockMatrix]] to `other`, another [[BlockMatrix]]. The `colsPerBlock`
+    * of this matrix must equal the `rowsPerBlock` of `other`. If `other` contains
+    * [[SparseMatrix]], they will have to be converted to a
+    * [[DenseMatrix]]. This may cause some performance issues until support for multiplying
+    * two sparse matrices is added.
+    */
+  def multiply(other: BlockMatrix): BlockMatrix = {
+    require(numCols() == other.numRows(), "The number of columns of A and the number of rows " +
+      s"of B must be equal. A.numCols: ${numCols()}, B.numRows: ${other.numRows()}. If you " +
+      s"think they should be equal, try setting the dimensions of A and B explicitly while " +
+      s"initializing them.")
+    if (checkPartitioning(other, OperationNames.multiply)) {
+      val resultPartitioner = GridPartitioner(numRowBlocks, other.numColBlocks,
+        math.min(partitioner.numPartitions, other.partitioner.numPartitions))
+      // Each block of A must be multiplied with the corresponding blocks in each column of B.
+      val flatA = blocks.flatMap{ case ((blockRowIndex, blockColIndex), block) =>
+        Array.tabulate(other.numColBlocks)(j => ((blockRowIndex, j, blockColIndex), block))
+      }
+      // Each block of B must be multiplied with the corresponding blocks in each row of A.
+      val flatB = other.blocks.flatMap{ case ((blockRowIndex, blockColIndex), block) =>
+        Array.tabulate(numRowBlocks)(i => ((i, blockColIndex, blockRowIndex), block))
+      }
+      val newBlocks: RDD[MatrixBlock] = flatA.join(flatB, resultPartitioner).
+        map { case ((blockRowIndex, blockColIndex, _), (mat1, mat2)) =>
+          val C = mat2 match {
+            case dense: DenseMatrix => mat1.multiply(dense)
+            case sparse: SparseMatrix => mat1.multiply(sparse.toDense())
+            case _ =>  throw new SparkException(s"Unrecognized matrix type ${mat2.getClass}.")
+          }
+          ((blockRowIndex, blockColIndex), C.toBreeze)
+      }.reduceByKey(resultPartitioner, (a, b) => a + b).mapValues(Matrices.fromBreeze)
--- End diff --

The only problem I see there is that we need to know whether the block is 
on the right or bottom edge to properly initialize a `ZeroValue`.
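
To make the edge case concrete, here is a rough sketch of how the dimensions of 
a zero block could be derived from its grid position. It assumes a regular grid 
in which only the last row and last column of blocks may be smaller than 
`rowsPerBlock` x `colsPerBlock`; `BlockDims` and `blockSize` are hypothetical 
names and do not exist in `BlockMatrix`.

```scala
// Hypothetical helper, not part of BlockMatrix.
object BlockDims {
  /**
   * Returns (rows, cols) of the block at (blockRowIndex, blockColIndex).
   * Blocks on the bottom or right edge may be smaller than the nominal size.
   */
  def blockSize(
      numRows: Long,
      numCols: Long,
      rowsPerBlock: Int,
      colsPerBlock: Int,
      blockRowIndex: Int,
      blockColIndex: Int): (Int, Int) = {
    // Rows/columns remaining from the start of this block to the matrix edge.
    val remainingRows = numRows - blockRowIndex.toLong * rowsPerBlock
    val remainingCols = numCols - blockColIndex.toLong * colsPerBlock
    require(remainingRows > 0 && remainingCols > 0,
      s"Block ($blockRowIndex, $blockColIndex) lies outside the matrix.")
    val rows = math.min(rowsPerBlock.toLong, remainingRows).toInt
    val cols = math.min(colsPerBlock.toLong, remainingCols).toInt
    (rows, cols)
  }
}
```

For a 10 x 7 matrix split into 4 x 3 blocks, the interior block (0, 0) is 4 x 3 
while the bottom-right block (2, 2) is only 2 x 1, so a zero value sized from 
`rowsPerBlock` and `colsPerBlock` alone would not match it.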



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4254#issuecomment-72277872
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26423/
Test PASSed.



[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4215#issuecomment-72283292
  
We should probably mention this feature in the Submitting Applications 
section of the docs: 
https://spark.apache.org/docs/latest/submitting-applications.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-5400 [MLlib] Changed name of GaussianMix...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4290#issuecomment-72246727
  
  [Test build #26412 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26412/consoleFull)
 for   PR 4290 at commit 
[`9c1534c`](https://github.com/apache/spark/commit/9c1534cd1c37953c1c592a2ce419eaee68dd853c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize Scala/Java test...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3564#issuecomment-72281399
  
  [Test build #26430 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26430/consoleFull)
 for   PR 3564 at commit 
[`5ef856d`](https://github.com/apache/spark/commit/5ef856d9e4a6a4eb7d04a3f999a27a41618b1fd9).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3803#issuecomment-72286100
  
  [Test build #26435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26435/consoleFull)
 for   PR 3803 at commit 
[`9a3715a`](https://github.com/apache/spark/commit/9a3715a1e6a71040d234da52bf848b0bb109a591).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4155#issuecomment-72278827
  
  [Test build #26428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26428/consoleFull)
 for   PR 4155 at commit 
[`594e41a`](https://github.com/apache/spark/commit/594e41abecf5a48084608ab20112f884f28fc920).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4001][MLlib] adding parallel FP-Growth ...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-72231925
  
  [Test build #26406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26406/consoleFull)
 for   PR 2847 at commit 
[`93f3280`](https://github.com/apache/spark/commit/93f3280fb1b9897f40b695683824aef619a5b8c2).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4001][MLlib] adding parallel FP-Growth ...

2015-01-30 Thread jackylk

Github user jackylk commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-72232107
  
@mengxr 
I have updated the code according to the comments; please review.



[GitHub] spark pull request: [SPARK-5504] [sql] convertToCatalyst should su...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4295#issuecomment-72280446
  
  [Test build #573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/573/consoleFull)
 for   PR 4295 at commit 
[`6b7276d`](https://github.com/apache/spark/commit/6b7276d44d0d578545f5c543de4167c0569fe4e1).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...

2015-01-30 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1484#issuecomment-72281109
  
@avulanov Please check my inline comments on Compress and the 
Estimator/Model.



[GitHub] spark pull request: [SPARK-4001][MLlib] adding parallel FP-Growth ...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-72232096
  
  [Test build #26406 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26406/consoleFull)
 for   PR 2847 at commit 
[`93f3280`](https://github.com/apache/spark/commit/93f3280fb1b9897f40b695683824aef619a5b8c2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FPGrowthModel (val frequentPattern: Array[(Array[String], 
Long)]) extends Serializable `
  * `class FPTree extends Serializable `
  * `class FPTreeNode(val item: String, var count: Int) extends 
Serializable `




[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3519#issuecomment-72283444
  
  [Test build #26433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26433/consoleFull)
 for   PR 3519 at commit 
[`ded071c`](https://github.com/apache/spark/commit/ded071c51d0669eaedee062692c9accf13233c18).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5504] [sql] convertToCatalyst should su...

2015-01-30 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4295#issuecomment-72283313
  
Awesome, thanks.  LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3976#discussion_r23858127
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -267,10 +277,22 @@ object SparkSubmit {
     // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
     if (isYarnCluster) {
       childMainClass = "org.apache.spark.deploy.yarn.Client"
-      if (args.primaryResource != SPARK_INTERNAL) {
-        childArgs += ("--jar", args.primaryResource)
+      if (args.isPython) {// yarn-cluster mode for python application
+          val primaryResourceLocalPath = new Path(args.primaryResource)
+        childArgs += ("--primaryResource", primaryResourceLocalPath.getName)
+        val pyFilesLocalNames:String = if (args.pyFiles != null) {
+          args.pyFiles.split(",").map { p => (new Path(p)).getName }.mkString(",")
+        } else {
+          null
+        }
+        childArgs += ("--py-files", pyFilesLocalNames.toString)
--- End diff --

No need for `toString`, this is already a string.  Also, can we avoid 
adding this arg at all instead of using null?
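
One possible shape for this, sketched with `Option` so that `--py-files` is 
only appended when there is something to pass. The names `PyFilesArgs`, 
`appendPyFiles` and `baseName` are hypothetical; `baseName` is a plain-string 
stand-in for Hadoop's `new Path(p).getName` so the snippet runs on its own.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of SparkSubmit.
object PyFilesArgs {
  /** Last path component; stands in for Hadoop's Path.getName. */
  private def baseName(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  /** Appends "--py-files <names>" only when pyFiles is non-null and non-blank. */
  def appendPyFiles(childArgs: ArrayBuffer[String], pyFiles: String): Unit = {
    Option(pyFiles).filter(_.trim.nonEmpty).foreach { files =>
      // mkString already yields a String, so no toString is needed.
      val localNames = files.split(",").map(baseName).mkString(",")
      childArgs ++= Seq("--py-files", localNames)
    }
  }
}
```

With this shape a null `pyFiles` leaves `childArgs` untouched, which also 
removes the risk of calling `toString` on null.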


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread brkyvz

Github user brkyvz commented on the pull request:

https://github.com/apache/spark/pull/4215#issuecomment-72283499
  
I will add documentation during the QA period


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...

2015-01-30 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4215#discussion_r23879255
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -431,6 +458,155 @@ object SparkSubmit {
   }
 }
 
+/** Provides utility functions to be used inside SparkSubmit. */
+private[spark] object SparkSubmitUtils extends Logging {
+
+  // Directories for caching downloads through ivy and storing the jars when maven coordinates are
+  // supplied to spark-submit
+  private var PACKAGES_DIRECTORY: File = null
+
+  /**
+   * Represents a Maven Coordinate
+   * @param groupId the groupId of the coordinate
+   * @param artifactId the artifactId of the coordinate
+   * @param version the version of the coordinate
+   */
+  private[spark] case class MavenCoordinate(groupId: String, artifactId: String, version: String)
+
+  /**
+   * Resolves any dependencies that were supplied through maven coordinates
+   * @param coordinates Comma-delimited string of maven coordinates
+   * @param remoteRepos Comma-delimited string of remote repositories other than maven central
+   * @param ivyPath The path to the local ivy repository
+   * @return The comma-delimited path to the jars of the given maven artifacts including their
+   *         transitive dependencies
+   */
+  private[spark] def resolveMavenCoordinates(
+      coordinates: String,
+      remoteRepos: String,
+      ivyPath: String,
+      isTest: Boolean = false): String = {
+    if (coordinates == null || coordinates.trim.isEmpty) {
+      ""
+    } else {
+      val artifacts = coordinates.split(",").map { p =>
+        val splits = p.split(":")
+        require(splits.length == 3, s"Provided Maven Coordinates must be in the form " +
+          s"'groupId:artifactId:version'. The coordinate provided is: $p")
+        require(splits(0) != null && splits(0).trim.nonEmpty, s"The groupId cannot be null or " +
+          s"be whitespace. The groupId provided is: ${splits(0)}")
+        require(splits(1) != null && splits(1).trim.nonEmpty, s"The artifactId cannot be null or " +
+          s"be whitespace. The artifactId provided is: ${splits(1)}")
+        require(splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " +
+          s"be whitespace. The version provided is: ${splits(2)}")
+        new MavenCoordinate(splits(0), splits(1), splits(2))
+      }
+      // Default configuration name for ivy
+      val conf = "default"
--- End diff --

Could you rename this to something like `ivyConfName`?  By itself, `conf` 
is kind of ambiguous to me.
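
For illustration only, the suggested rename could look roughly like the sketch below. It is a simplified, standalone stand-in for the coordinate parsing shown in the diff, not the code from the pull request: the object name `CoordinateSketch` and the method `parse` are hypothetical, and only `MavenCoordinate`, the `"default"` value, and the proposed name `ivyConfName` come from the thread.

```scala
// Hypothetical, simplified sketch; not the implementation from the pull request.
object CoordinateSketch {
  // Mirrors the case class shown in the diff.
  case class MavenCoordinate(groupId: String, artifactId: String, version: String)

  // Renamed from `conf`, as suggested in the review, so the name states
  // which configuration it refers to (the ivy configuration name).
  val ivyConfName = "default"

  // Parses a comma-delimited string of 'groupId:artifactId:version' coordinates.
  def parse(coordinates: String): Seq[MavenCoordinate] = {
    if (coordinates == null || coordinates.trim.isEmpty) {
      Seq.empty
    } else {
      coordinates.split(",").toSeq.map { p =>
        val splits = p.trim.split(":")
        // Reject coordinates that do not have exactly three non-blank parts.
        require(splits.length == 3 && splits.forall(_.trim.nonEmpty),
          s"Expected 'groupId:artifactId:version' but got: $p")
        MavenCoordinate(splits(0), splits(1), splits(2))
      }
    }
  }
}
```

With the longer name, a later reference such as `ivyConfName` is less likely to be read as some other configuration value in the same file.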



[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

2015-01-30 Thread freeman-lab

Github user freeman-lab commented on the pull request:

https://github.com/apache/spark/pull/3803#issuecomment-72285668
  
@JoshRosen I finished the refactored tests and added better handling of the 
`getBytes` based on your suggestion.



[GitHub] spark pull request: remove redundant field childOutput from exec...

2015-01-30 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4291#issuecomment-72240710
  
Hi Kai, mind tagging this [SQL] so it can get properly sorted?



[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4216#issuecomment-72247522
  
  [Test build #26413 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26413/consoleFull)
 for   PR 4216 at commit 
[`bf696ff`](https://github.com/apache/spark/commit/bf696ff0b7135883e53e5fb275b4afa0db6c4a4a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72265194
  
  [Test build #26419 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26419/consoleFull)
 for   PR 4047 at commit 
[`6fd1f71`](https://github.com/apache/spark/commit/6fd1f718ccef2464601256a84e99523c1f7d033f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Tokenizer(sc: SparkContext, stopwordFile: String) extends Serializable`
  * `  class EMOptimizer(`




[GitHub] spark pull request: [SPARK-5486] Added validate method to BlockMat...

2015-01-30 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4279



[GitHub] spark pull request: [SPARK-4259][MLlib]: Add Power Iteration Clust...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4254#issuecomment-72265723
  
  [Test build #26420 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26420/consoleFull)
 for   PR 4254 at commit 
[`f292f31`](https://github.com/apache/spark/commit/f292f31309201ed01186a221824675bed84a6f17).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-30 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-72265733
  
  [Test build #26421 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26421/consoleFull)
 for   PR 4047 at commit 
[`1db89e2`](https://github.com/apache/spark/commit/1db89e2964aee34f9c33300be271dde41d61a782).
 * This patch merges cleanly.


