[GitHub] spark pull request: [SPARK-1161] Add saveAsObjectFile and SparkCon...

2014-05-14 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/755#issuecomment-43131213
  
I see, I think that with this change, we'd promise to keep the pickle 
serializer's output the same in future versions, which should be easy. We can 
always add a new "pickle2" serializer if we find major problems with the 
current one. So I'd suggest changing this to pickleFile and saying that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: Refactor the JAVA example to Java 8 lambda ver...

2014-05-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/777#issuecomment-43156225
  
The problem is users won't be able to build these example programs unless 
they are using Java 8.




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/690#issuecomment-42520178
  
We are incorrectly detecting the `Option`s as repeated types (`Seq`), which 
are not supported in parquet yet.

[Fix here](https://github.com/ash211/spark/pull/1)




[GitHub] spark pull request: [HOTFIX] SPARK-1637: There are some Streaming ...

2014-05-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/673#issuecomment-42387240
  
This test was stuck, so I killed it. Let's try again. Jenkins retest this 
please.

```
[info] - repeatedly failing task that crashes JVM (7 seconds, 56 milliseconds)
[info] - caching (3 seconds, 226 milliseconds)
[info] - caching on disk (3 seconds, 313 milliseconds)
[info] - caching in memory, replicated (3 seconds, 253 milliseconds)
[info] - caching in memory, serialized, replicated (3 seconds, 368 milliseconds)
[info] - caching on disk, replicated (3 seconds, 454 milliseconds)
[info] - caching in memory and disk, replicated (3 seconds, 332 milliseconds)
```




[GitHub] spark pull request: SPARK-1563 Add package-info.java and package.s...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/599#issuecomment-43151776
  
Merged build started. 




[GitHub] spark pull request: SPARK-1563 Add package-info.java and package.s...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/599#issuecomment-43151756
  
 Merged build triggered. 




[GitHub] spark pull request: The org.datanucleus:* should not be packaged i...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/688#issuecomment-42506261
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [SQL] Improve SparkSQL Aggregates

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/683#issuecomment-42489548
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1565 (Addendum): Replace `run-example` w...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/704#issuecomment-42632330
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14838/




[GitHub] spark pull request: [SPARK-1752][MLLIB] Standardize text format fo...

2014-05-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/685#issuecomment-43150558
  
@mateiz @srowen I made the following updates:

1. Use `StringTokenizer` to simplify the logic. It is not deprecated in Java 8 
(http://docs.oracle.com/javase/8/docs/api/java/util/StringTokenizer.html). I 
didn't use Guava's Splitter because it is regex-based.
2. pyspark's `loadLabeledPoints` uses Scala's implementation to avoid 
implementing a parser in pyspark.

So now `saveAsTextFile` works for `RDD[LabeledPoint]` in pyspark, and for 
`RDD[Vector]` and `RDD[LabeledPoint]` in Scala/Java.
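
As a rough illustration of the `StringTokenizer` approach, here is a minimal sketch, assuming the bracket-and-comma text format discussed in this pull request. It is not the parser from the patch; `TokenizeSketch` and `tokenize` are hypothetical names, and the delimiter set `()[],` is an assumption.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ArrayBuffer

object TokenizeSketch {
  // Splits a numeric structure such as "(1.0,[2.0,3.0])" into tokens.
  // Passing `true` as the third argument makes StringTokenizer return
  // each delimiter as its own single-character token.
  def tokenize(s: String): Seq[String] = {
    val tokenizer = new StringTokenizer(s, "()[],", true)
    val tokens = ArrayBuffer.empty[String]
    while (tokenizer.hasMoreTokens()) {
      tokens += tokenizer.nextToken()
    }
    tokens.toSeq
  }

  def main(args: Array[String]): Unit = {
    // Prints the tokens separated by spaces.
    println(tokenize("(1.0,[2.0,3.0])").mkString(" "))
  }
}
```

Because the delimiters come back as tokens, a caller can track nesting of `(`/`[` itself without any regular expressions.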




[GitHub] spark pull request: [SPARK-1157][MLlib] Bug fix: lossHistory shoul...

2014-05-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/582#issuecomment-42623415
  
LGTM. Thanks!




[GitHub] spark pull request: [SPARK-1745] Move interrupted flag from TaskCo...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/675#issuecomment-42491119
  
Merged build started. 




[GitHub] spark pull request: [Docs] Warn about PySpark on YARN on Red Hat

2014-05-14 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/682

[Docs] Warn about PySpark on YARN on Red Hat

In #30, @tgraves and I both ran into an issue when building an assembly jar 
on a Red Hat system: the resulting jar does not properly load the Python files 
that are needed for running PySpark on YARN.

In the medium term, we should figure out what the issue is, since Red Hat 
is used quite commonly. For now, we should at the very least document it so 
people don't run into the same headaches that plagued us for days.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark pyspark-on-yarn-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #682








[GitHub] spark pull request: MLlib documentation fix

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/703#issuecomment-42621179
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1752][MLLIB] Standardize text format fo...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/685#issuecomment-42492518
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1768] History server enhancements.

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/718#issuecomment-42717198
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [MLLIB] SPARK-1682: Add gradient descent w/o s...

2014-05-14 Thread dongwang218
Github user dongwang218 commented on a diff in the pull request:

https://github.com/apache/spark/pull/643#discussion_r12463232
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassification.scala ---
@@ -42,19 +45,33 @@ object BinaryClassification {
 
   object RegType extends Enumeration {
     type RegType = Value
-    val L1, L2 = Value
+    val L1, L2, RDA = Value
+  }
+
+  object Mode extends Enumeration {
+    type Mode = Value
+    val TRAIN, TEST, SPLIT = Value
--- End diff --

`--test` has been added and `--mode` has been removed.




[GitHub] spark pull request: [SPARK-1755] Respect SparkSubmit --name on YAR...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/699#issuecomment-42596163
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14817/




[GitHub] spark pull request: SPARK-1686: keep schedule() calling in the mai...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/639#issuecomment-42707168
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-571: forbid return statements in cleaned...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/717#issuecomment-42715644
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1774] Respect SparkSubmit --jars on YAR...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/710#issuecomment-42704433
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Use numpy directly for matrix multiply.

2014-05-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/687#issuecomment-42512234
  
Thanks. Merged.




[GitHub] spark pull request: [SPARK-1754] [SQL] Add missing arithmetic DSL ...

2014-05-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/689#discussion_r12416135
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
@@ -381,6 +381,30 @@ class ExpressionEvaluationSuite extends FunSuite {
     checkEvaluation(Add(c1, Literal(null, IntegerType)), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), c2), null, row)
     checkEvaluation(Add(Literal(null, IntegerType), Literal(null, IntegerType)), null, row)
+
+    checkEvaluation(-c1, -1, row)
+    checkEvaluation(c1 + c2, 3, row)
+    checkEvaluation(c1 - c2, -1, row)
+    checkEvaluation(c1 * c2, 2, row)
+    checkEvaluation(c1 / c2, 0, row)
+    checkEvaluation(c1 % c2, 1, row)
+  }
+
+  test("BinaryPredicate") {
--- End diff --

The tests above use the DSL too and test the whole truth table for each 
operation.
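
For readers unsure what a full truth table covers once `null` is a possible input, here is a small sketch assuming SQL-style three-valued logic. It is not the suite's code: `TruthTableSketch` and `and3` are hypothetical names, and `None` stands in for `null`.

```scala
object TruthTableSketch {
  // SQL-style three-valued AND: false dominates, otherwise any
  // unknown (None) operand makes the result unknown.
  def and3(l: Option[Boolean], r: Option[Boolean]): Option[Boolean] =
    (l, r) match {
      case (Some(false), _) | (_, Some(false)) => Some(false)
      case (Some(true), Some(true)) => Some(true)
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val values: Seq[Option[Boolean]] = Seq(Some(true), Some(false), None)
    // Enumerate all nine input combinations.
    for (l <- values; r <- values) {
      println(s"$l AND $r = ${and3(l, r)}")
    }
  }
}
```

Enumerating all nine combinations is what distinguishes a truth-table test from a spot check of one or two inputs.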




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/690#issuecomment-42526092
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: fix broken in link in python docs

2014-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/650




[GitHub] spark pull request: SPARK-1565 (Addendum): Replace `run-example` w...

2014-05-14 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/704#discussion_r12462341
  
--- Diff: bin/run-example ---
@@ -49,46 +31,31 @@ fi
 
 if [[ -z $SPARK_EXAMPLES_JAR ]]; then
   echo "Failed to find Spark examples assembly in $FWDIR/lib or $FWDIR/examples/target" >&2
-  echo "You need to build Spark with sbt/sbt assembly before running this program" >&2
+  echo "You need to build Spark before running this program" >&2
   exit 1
 fi
 
+SPARK_EXAMPLES_JAR_REL=${SPARK_EXAMPLES_JAR#$FWDIR/}
 
-# Since the examples JAR ideally shouldn't include spark-core (that dependency should be
-# "provided"), also add our standard Spark classpath, built using compute-classpath.sh.
-CLASSPATH=`$FWDIR/bin/compute-classpath.sh`
-CLASSPATH="$SPARK_EXAMPLES_JAR:$CLASSPATH"
-
-if $cygwin; then
-CLASSPATH=`cygpath -wp $CLASSPATH`
-export SPARK_EXAMPLES_JAR=`cygpath -w $SPARK_EXAMPLES_JAR`
-fi
-
-# Find java binary
-if [ -n "${JAVA_HOME}" ]; then
-  RUNNER="${JAVA_HOME}/bin/java"
-else
-  if [ `command -v java` ]; then
-RUNNER="java"
-  else
-echo "JAVA_HOME is not set" >&2
-exit 1
-  fi
-fi
+EXAMPLE_CLASS=""
+EXAMPLE_ARGS="[]"
+EXAMPLE_MASTER=${MASTER:-""}
 
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-JAVA_OPTS="$SPARK_JAVA_OPTS"
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$FWDIR/conf/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"
+if [ -n "$1" ]; then
+  EXAMPLE_CLASS="$1"
+  shift
 fi
-export JAVA_OPTS
 
-if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
-  echo -n "Spark Command: "
-  echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
-  echo ""
-  echo
+if [ -n "$1" ]; then
+  EXAMPLE_ARGS="$@"
 fi
 
-exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
+echo "NOTE: This script has been replaced with ./bin/spark-submit. Please run:" >&2
+echo
+echo "./bin/spark-submit \\" >&2
--- End diff --

Yes, I completely agree. We don't want the user to have to type out this 
more complicated stuff with the library path and all. Just 
bin/run-example 

In fact, now that all the examples are inside the spark.examples package, we 
can try to make it even simpler. To run SparkPi, one should be able to just say

./bin/run-example SparkPi 

That would be very simple!
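
The name resolution this would need can be sketched as follows. This is only an illustration of the idea, not the script's logic (which is shell); `ExampleNameSketch` and `resolve` are hypothetical names, and the full prefix `org.apache.spark.examples` is an assumption based on the source paths of the examples.

```scala
object ExampleNameSketch {
  // Assumed fully qualified package of the bundled examples.
  val ExamplesPackage = "org.apache.spark.examples"

  // Accepts either a short name ("SparkPi") or a fully qualified one
  // and always returns the fully qualified class name.
  def resolve(name: String): String =
    if (name.startsWith(ExamplesPackage + ".")) name
    else s"$ExamplesPackage.$name"

  def main(args: Array[String]): Unit = {
    println(resolve("SparkPi"))
  }
}
```

Leaving already-qualified names untouched keeps existing invocations working while allowing the shorter form.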





[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...

2014-05-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/714#discussion_r12484987
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala ---
@@ -76,6 +76,16 @@ class ExternalAppendOnlyMap[K, V, C](
   private val maxMemoryThreshold = {
     val memoryFraction = sparkConf.getDouble("spark.shuffle.memoryFraction", 0.3)
     val safetyFraction = sparkConf.getDouble("spark.shuffle.safetyFraction", 0.8)
+
+    if (memoryFraction > 1 && memoryFraction <= 0) {
+      throw new Exception("spark.shuffle.memoryFraction should be between 0 and 1.")
+    }
+
+    if (safetyFraction > 1 && safetyFraction <= 0) {
+      throw new Exception("spark.shuffle.safetyFraction should be between 0 and 1.")
+    }
+
--- End diff --

nit: can you remove all the blank lines that you added?
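
Separately from the blank lines, each quoted condition joins its two comparisons with `&&`, so it can never be true: no value is both greater than 1 and at most 0. A minimal sketch of the range check the error messages describe, assuming the intended valid range is (0, 1]; `FractionCheckSketch` and `requireFraction` are hypothetical names, not part of the patch:

```scala
object FractionCheckSketch {
  // Returns the value unchanged when it lies in (0, 1];
  // throws otherwise. Note the `||`: either bound being
  // violated is enough to reject the value.
  def requireFraction(key: String, value: Double): Double = {
    if (value > 1 || value <= 0) {
      throw new IllegalArgumentException(s"$key should be between 0 and 1.")
    }
    value
  }

  def main(args: Array[String]): Unit = {
    println(requireFraction("spark.shuffle.memoryFraction", 0.3))
  }
}
```

Factoring the check into one helper would also avoid repeating the same condition for each configuration key.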




[GitHub] spark pull request: SPARK-1565, update examples to be used with sp...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/552#issuecomment-42417264
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14772/




[GitHub] spark pull request: [SPARK-1752][MLLIB] Standardize text format fo...

2014-05-14 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/685#discussion_r12502571
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/NumericParser.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.util
+
+import scala.collection.mutable.{ArrayBuffer, ListBuffer}
+
+private[mllib] object NumericTokenizer {
+  val NUMBER = -1
+  val END = -2
+}
+
+import NumericTokenizer._
+
+/**
+ * Simple tokenizer for a numeric structure consisting of three types:
+ *
+ *  - number: a double in Java's floating number format
+ *  - array: an array of numbers stored as `[v0,v1,...,vn]`
+ *  - tuple: a list of numbers, arrays, or tuples stored as `(...)`
+ *
+ * @param s input string
+ * @param start start index
+ * @param end end index
+ */
+private[mllib] class NumericTokenizer(s: String, start: Int, end: Int) {
+
+  /**
+   * Creates a tokenizer for the entire input string.
+   */
+  def this(s: String) = this(s, 0, s.length)
+
+  private var cur = start
+  private var allowComma = false
+  private var _value = Double.NaN
+
+  /**
+   * Returns the most recent parsed number.
+   */
+  def value: Double = _value
+
+  /**
+   * Returns the next token, which could be any of the following:
+   *  - '[', ']', '(', or ')'.
+   *  - [[org.apache.spark.mllib.util.NumericTokenizer#NUMBER]], call value() to get its value.
+   *  - [[org.apache.spark.mllib.util.NumericTokenizer#END]].
+   */
+  def next(): Int = {
+    if (cur < end) {
+      val c = s(cur)
+      c match {
+        case '(' | '[' =>
+          allowComma = false
+          cur += 1
+          c
+        case ')' | ']' =>
+          allowComma = true
+          cur += 1
+          c
+        case ',' =>
+          if (allowComma) {
+            cur += 1
+            allowComma = false
+            next()
+          } else {
+            sys.error("Found a ',' at a wrong location.")
+          }
+        case other => // expecting a number
+          var inNumber = true
+          val sb = new StringBuilder()
+          while (cur < end && inNumber) {
+            val d = s(cur)
+            if (d == ')' || d == ']' || d == ',') {
+              inNumber = false
+            } else {
+              sb.append(d)
+              cur += 1
+            }
+          }
+          _value = sb.toString().toDouble
--- End diff --

Double.parseDouble is probably more efficient
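
A minimal sketch of the suggested change, assuming the tokenizer keeps accumulating characters in a `StringBuilder`. `ParseSketch` and `parseNumber` are hypothetical names, not part of the pull request, and any speed difference is unmeasured here.

```scala
object ParseSketch {
  // Calls the Java parser directly instead of going through the
  // implicit conversion that provides `toDouble` on Scala strings.
  def parseNumber(sb: StringBuilder): Double =
    java.lang.Double.parseDouble(sb.toString())

  def main(args: Array[String]): Unit = {
    val sb = new StringBuilder()
    sb.append("1.5e3")
    println(parseNumber(sb))
  }
}
```

Both forms are expected to return the same value and to throw `NumberFormatException` on malformed input, so the change would only affect how the call is dispatched.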




[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/597#issuecomment-42405181
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/690#issuecomment-42520746
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...

2014-05-14 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/597#discussion_r12407647
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala ---
@@ -88,7 +92,27 @@ object MovieLensALS {
 
     val ratings = sc.textFile(params.input).map { line =>
       val fields = line.split("::")
-      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
+      if (params.implicitPrefs) {
+        /*
+         * MovieLens ratings are on a scale of 1-5:
+         * 5: Must see
+         * 4: Will enjoy
+         * 3: It's okay
+         * 2: Fairly bad
+         * 1: Awful
+         * So we should not recommend a movie if the predicted rating is less than 3.
+         * To map ratings to confidence scores, we use
+         * 5 -> 2.5, 4 -> 1.5, 3 -> 0.5, 2 -> -0.5, 1 -> -1.5. This mappings means unobserved
+         * entries are generally between It's okay and Fairly bad.
+         * The semantics of 0 in this expanded world of non-positive weights
+         * are "the same as never having interacted at all".
+         * It's possible that 0 values are ignored when constructing the sparse representation,
+         * because the 0s are implicit. This would be a problem, at least, a theoretical one.
--- End diff --

Shall we remove lines 109 and 110? MovieLens data does not have `0` ratings.
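
The mapping listed in the quoted comment amounts to subtracting 2.5 from each rating, which is also why no MovieLens rating on the 1-5 scale can map to `0`. A small sketch; `ImplicitPrefsSketch` and `toConfidence` are hypothetical names, not taken from the patch:

```scala
object ImplicitPrefsSketch {
  // Shifts a 1-5 rating so that the midpoint between
  // "It's okay" (3) and "Fairly bad" (2) lands on zero.
  def toConfidence(rating: Double): Double = rating - 2.5

  def main(args: Array[String]): Unit = {
    for (r <- Seq(5.0, 4.0, 3.0, 2.0, 1.0)) {
      println(s"$r -> ${toConfidence(r)}")
    }
  }
}
```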




[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42596705
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42499304
  
 Merged build triggered. 




[GitHub] spark pull request: Include the sbin/spark-config.sh in spark-exec...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/651#issuecomment-42611902
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1460] Returning SchemaRDD instead of no...

2014-05-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/448#issuecomment-42451285
  
Thanks for updating this. I'm merging it.




[GitHub] spark pull request: [SPARK-1778] [SQL] Add 'limit' transformation ...

2014-05-14 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/711

[SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD.

Add `limit` transformation to `SchemaRDD`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-1778

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #711


commit 33169dfbea1fcac0e50f70b647f2542441ae84cb
Author: Takuya UESHIN 
Date:   2014-05-09T07:51:08Z

Add 'limit' transformation to SchemaRDD.






[GitHub] spark pull request: [WIP][SPARK-1776] Have Spark's SBT build read ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/772#issuecomment-43081358
  
Merged build started. 




[GitHub] spark pull request: Enabled incremental build that comes with sbt ...

2014-05-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/525#issuecomment-42404934
  
@markhamstra I was curious whether you are convinced?




[GitHub] spark pull request: SPARK-897: preemptively serialize closures

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/143#issuecomment-42703898
  
Merged build started. 




[GitHub] spark pull request: Include the sbin/spark-config.sh in spark-exec...

2014-05-14 Thread bouk
Github user bouk commented on the pull request:

https://github.com/apache/spark/pull/651#issuecomment-42455404
  
JIRA: https://issues.apache.org/jira/browse/SPARK-1725

The error that happens is `No module named pyspark`




[GitHub] spark pull request: SPARK-1565, update examples to be used with sp...

2014-05-14 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/552#issuecomment-42576140
  
This LGTM. Thanks @ScrapCodes for all the effort!




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42512581
  
Merged build finished. 




[GitHub] spark pull request: L-BFGS Documentation

2014-05-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/702#discussion_r12499609
  
--- Diff: docs/mllib-optimization.md ---
@@ -163,3 +177,100 @@ each iteration, to compute the gradient direction.
 Available algorithms for gradient descent:
 
 * 
[GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+
+### Limited-memory BFGS
+L-BFGS is currently only a low-level optimization primitive in `MLlib`. If 
you want to use L-BFGS in various 
+ML algorithms such as Linear Regression, and Logistic Regression, you have 
to pass the gradient of objective
+function, and updater into optimizer yourself instead of using the 
training APIs like 

+[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
+See the example below. It will be addressed in the next release. 
+
+The L1 regularization by using 

+[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater)
 will not work since the 
+soft-thresholding logic in L1Updater is designed for gradient descent.
+
+The L-BFGS method

+[LBFGS.runLBFGS](api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS)
+has the following parameters:
+
+* `gradient` is a class that computes the gradient of the objective 
function
+being optimized, i.e., with respect to a single training example, at the
+current parameter value. MLlib includes gradient classes for common loss
+functions, e.g., hinge, logistic, least-squares.  The gradient class takes 
as
+input a training example, its label, and the current parameter value. 
+* `updater` is a class originally designed for gradient decent which 
computes 
--- End diff --

Agree. I will move it into the comment in the code.




[GitHub] spark pull request: [FIX] do not load defaults when testing SparkC...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/775#issuecomment-43141644
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14992/




[GitHub] spark pull request: [SPARK-1752][MLLIB] Standardize text format fo...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/685#issuecomment-43141068
  
 Merged build triggered. 




[GitHub] spark pull request: Unify GraphImpl RDDs + other graph load optimi...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/497#issuecomment-42753685
  
 Merged build triggered. 




[GitHub] spark pull request: [FIX] do not load defaults when testing SparkC...

2014-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/775




[GitHub] spark pull request: [SPARK-1752][MLLIB] Standardize text format fo...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/685#issuecomment-43143519
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1833 - Have an empty SparkContext constr...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/774#issuecomment-43127959
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: support leftsemijoin for sparkSQL

2014-05-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/395#issuecomment-42614695
  
Just checking in to see if there is anything I can help with here.  Would 
be cool to have this feature!




[GitHub] spark pull request: L-BFGS Documentation

2014-05-14 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/702#discussion_r12460419
  
--- Diff: docs/mllib-optimization.md ---
@@ -163,3 +177,100 @@ each iteration, to compute the gradient direction.
 Available algorithms for gradient descent:
 
 * 
[GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+
+### Limited-memory BFGS
+L-BFGS is currently only a low-level optimization primitive in `MLlib`. If 
you want to use L-BFGS in various 
+ML algorithms such as Linear Regression, and Logistic Regression, you have 
to pass the gradient of objective
+function, and updater into optimizer yourself instead of using the 
training APIs like 

+[LogisticRegression.LogisticRegressionWithSGD](api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegression).
+See the example below. It will be addressed in the next release. 
+
+The L1 regularization by using 

+[Updater.L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.Updater)
 will not work since the 
--- End diff --

`L1Updater` is not under `Updater`. Should change to

~~~

[L1Updater](api/mllib/index.html#org.apache.spark.mllib.optimization.L1Updater)
~~~




[GitHub] spark pull request: [SPARK-1620] Handle uncaught exceptions in fun...

2014-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/622




[GitHub] spark pull request: The org.datanucleus:* should not be packaged i...

2014-05-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/688#issuecomment-42730490
  
@witgo - is there a jira for this? If so, mind adding the jira number?




[GitHub] spark pull request: Enable repartitioning of graph over different ...

2014-05-14 Thread ankurdave
Github user ankurdave commented on a diff in the pull request:

https://github.com/apache/spark/pull/719#discussion_r12506314
  
--- Diff: 
graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala ---
@@ -78,8 +78,14 @@ class GraphImpl[VD: ClassTag, ED: ClassTag] protected (
 this
   }
 
-  override def partitionBy(partitionStrategy: PartitionStrategy): 
Graph[VD, ED] = {
-val numPartitions = edges.partitions.size
+
+override def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, 
ED] = {
+  val numPartitions = edges.partitions.size
+  partitionBy(partitionStrategy, numPartitions)
+}
+
+override def partitionBy(partitionStrategy: PartitionStrategy, 
numPartitions: Int): Graph[VD, ED] = {
--- End diff --

Indentation




[GitHub] spark pull request: [SQL] Improve SparkSQL Aggregates

2014-05-14 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/683#discussion_r12404117
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
 ---
@@ -86,6 +86,67 @@ abstract class AggregateFunction
   override def newInstance() = makeCopy(productIterator.map { case a: 
AnyRef => a }.toArray)
 }
 
+case class Min(child: Expression) extends PartialAggregate with 
trees.UnaryNode[Expression] {
+  override def references = child.references
+  override def nullable = child.nullable
+  override def dataType = child.dataType
+  override def toString = s"MIN($child)"
+
+  override def asPartial: SplitEvaluation = {
+val partialMin = Alias(Min(child), "PartialMin")()
+SplitEvaluation(Min(partialMin.toAttribute), partialMin :: Nil)
+  }
+
+  override def newInstance() = new MinFunction(child, this)
+}
+
+case class MinFunction(expr: Expression, base: AggregateExpression) 
extends AggregateFunction {
--- End diff --

Good point, though this is not an issue in the code gen version.
On May 7, 2014 2:28 PM, "Reynold Xin"  wrote:

> In
> 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala:
>
> > @@ -86,6 +86,67 @@ abstract class AggregateFunction
> >override def newInstance() = makeCopy(productIterator.map { case a: 
AnyRef => a }.toArray)
> >  }
> >
> > +case class Min(child: Expression) extends PartialAggregate with 
trees.UnaryNode[Expression] {
> > +  override def references = child.references
> > +  override def nullable = child.nullable
> > +  override def dataType = child.dataType
> > +  override def toString = s"MIN($child)"
> > +
> > +  override def asPartial: SplitEvaluation = {
> > +val partialMin = Alias(Min(child), "PartialMin")()
> > +SplitEvaluation(Min(partialMin.toAttribute), partialMin :: Nil)
> > +  }
> > +
> > +  override def newInstance() = new MinFunction(child, this)
> > +}
> > +
> > +case class MinFunction(expr: Expression, base: AggregateExpression) 
extends AggregateFunction {
>
> this is unrelated to this pr - but I just realized the way we are storing
> the aggregation buffer in Spark SQL uses much more memory than needed,
> because there are two extra pointers to expr/base, which is identical for
> every tuple.




[GitHub] spark pull request: [HOTFIX] SPARK-1637: There are some Streaming ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/673#issuecomment-42389413
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1639. Tidy up some Spark on YARN code

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/561#issuecomment-42469134
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42590141
  
Merged build started. 




[GitHub] spark pull request: Converted bang to ask to avoid scary warning w...

2014-05-14 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/708

Converted bang to ask to avoid scary warning when a block is removed

Removing a block through the BlockManager produced scary warning messages in 
the driver:
```
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: 
true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: 
true
2014-05-08 20:16:19,172 WARN BlockManagerMasterActor: Got unknown message: 
true
```

This is because the 
[BlockManagerSlaveActor](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveActor.scala#L44)
 sends back an acknowledgement ("true"), but the BlockManagerMasterActor 
sent the RemoveBlock message as a plain send rather than as ask(), so it 
rejected the received "true" as an unknown message. 
@pwendell
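
The difference between the two send styles can be sketched without Akka. This is a minimal illustration using only the Scala standard library; `SendVsAsk`, `slaveRemoveBlock`, `tell`, and `ask` are hypothetical names, not the actual Spark or Akka APIs. It assumes, as described above, that the slave always replies `true`.

```scala
import scala.concurrent.{Future, Promise}

// Hypothetical stand-ins for illustration only; these are not Spark or Akka classes.
object SendVsAsk {
  // Replies that reach the master's general mailbox with no handler waiting for them.
  val unknownMessages = scala.collection.mutable.ListBuffer.empty[Any]

  // The slave always acknowledges a removal by replying `true` to whoever it was given.
  def slaveRemoveBlock(blockId: String, replyTo: Any => Unit): Unit = replyTo(true)

  // Fire-and-forget: the acknowledgement lands in the general mailbox,
  // where nothing expects it, so it would be logged as an unknown message.
  def tell(blockId: String): Unit =
    slaveRemoveBlock(blockId, reply => unknownMessages += reply)

  // Ask: the acknowledgement completes a future dedicated to this request.
  def ask(blockId: String): Future[Boolean] = {
    val promise = Promise[Boolean]()
    slaveRemoveBlock(blockId, {
      case ack: Boolean => promise.success(ack)
      case other        => promise.failure(new IllegalStateException(s"unexpected reply: $other"))
    })
    promise.future
  }
}
```

With `tell`, every removal leaves a stray `true` behind; with `ask`, the same reply is consumed by the caller's future and never reaches the general mailbox.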





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark bm-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/708.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #708


commit ed4ef151c891825293f2fc596ce7361fc8e6ca3f
Author: Tathagata Das 
Date:   2014-05-09T03:11:59Z

Converted bang to ask to avoid scary warning when a block is removed.






[GitHub] spark pull request: Added SparkGCE Script for Version 0.9.1

2014-05-14 Thread sigmoidanalytics
GitHub user sigmoidanalytics opened a pull request:

https://github.com/apache/spark/pull/681

Added SparkGCE Script for Version 0.9.1

I have added the SparkGCE script in this pull request. Like the 
spark_ec2 script, it reads certain command-line arguments such as the cluster 
name (see the README.md for details), then starts the machines in Google 
Cloud, sets up the network, adds a 500GB empty disk to all machines, generates 
the ssh keys on the master and transfers them to all slaves, installs Java, and 
downloads and configures Spark-v0.9.1/Shark-v0.9.1/Hadoop-v0.9.1. It also 
starts the Shark server automatically. Currently the version is 0.9.1, but I'm 
happy to add support for more versions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sigmoidanalytics/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #681


commit fc228d0bb45b5ecfbc1099b1b2bc9fe0cc3c4855
Author: AkhlD 
Date:   2014-05-07T15:34:26Z

Added SparkGCE Script for Version 0.9.1






[GitHub] spark pull request: [SQL] Improve SparkSQL Aggregates

2014-05-14 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/683

[SQL] Improve SparkSQL Aggregates

* Add native min/max (was using hive before).
* Handle nulls correctly in Avg and Sum.
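
As a rough illustration of what null handling in aggregates usually means, the sketch below assumes standard SQL semantics: nulls are skipped, and an aggregate with no non-null inputs yields null. `Option[Double]` stands in for a nullable column value, and `NullAwareAggregates` is a hypothetical helper, not the Catalyst implementation in this PR.

```scala
// Hypothetical helpers for illustration; None plays the role of SQL NULL.
object NullAwareAggregates {
  // SUM skips nulls and yields null (None) when there is no non-null input.
  def sum(values: Seq[Option[Double]]): Option[Double] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.sum)
  }

  // AVG divides by the number of non-null inputs, not by the total row count.
  def avg(values: Seq[Option[Double]]): Option[Double] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.sum / present.size)
  }

  // MIN and MAX likewise ignore nulls.
  def min(values: Seq[Option[Double]]): Option[Double] =
    values.flatten.reduceOption(_ min _)

  def max(values: Seq[Option[Double]]): Option[Double] =
    values.flatten.reduceOption(_ max _)
}
```

Under these assumptions, averaging the values 1.0, NULL, 3.0 divides by two rather than three.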

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark aggFixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #683


commit 64fe30b0f58e12a139fe53f1b33eb8b45ef6e9a8
Author: Michael Armbrust 
Date:   2014-05-07T20:45:13Z

Improve SparkSQL Aggregates
* Add native min/max (was using hive before).
* Handle nulls correctly in Avg and Sum.






[GitHub] spark pull request: [SPARK-1753] Warn about PySpark on YARN on Red...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/682#issuecomment-42515594
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14805/




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42503964
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1743][MLLIB] add loadLibSVMFile and sav...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/672#issuecomment-42392939
  
 Merged build triggered. 




[GitHub] spark pull request: Modify a typo in monitoring.md

2014-05-14 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/698

Modify a typo in monitoring.md

As I mentioned in SPARK-1765, the word 'JXM' appears in monitoring.md.
I think it's a typo for 'JMX'.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-1765

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/698.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #698


commit bae984363813194f179b07a76847496bd167cb06
Author: Kousuke Saruta 
Date:   2014-05-07T09:18:38Z

modified a typoe in monitoring.md






[GitHub] spark pull request: SPARK-1770: Revert accidental(?) fix

2014-05-14 Thread aarondav
GitHub user aarondav opened a pull request:

https://github.com/apache/spark/pull/716

SPARK-1770: Revert accidental(?) fix

Looks like this change was accidentally committed here:

https://github.com/apache/spark/commit/06b15baab25951d124bbe6b64906f4139e037deb

but the change does not show up in the PR itself (#704).

Besides not being intended to go in with that PR, this change also broke the 
test JavaAPISuite.repartition.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aarondav/spark shufflerand

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #716


commit b1cf70b1aa3745b6a44c6407af904047caf8a5d0
Author: Aaron Davidson 
Date:   2014-05-09T21:08:05Z

SPARK-1770: Revert accidental(?) fix

Looks like this change was accidentally committed here:

https://github.com/apache/spark/commit/06b15baab25951d124bbe6b64906f4139e037deb

but the change does not show up in the PR itself (#704).

Other than not intending to go in with that PR, this also broke the test
JavaApiSuite.repartition.






[GitHub] spark pull request: SPARK-1569 Spark on Yarn, authentication broke...

2014-05-14 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/649#issuecomment-42422825
  
Jenkins, test this please




[GitHub] spark pull request: L-BFGS Documentation

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/702#issuecomment-42624704
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14829/




[GitHub] spark pull request: SPARK-1565 (Addendum): Replace `run-example` w...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/704#issuecomment-42633954
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1708. Add a ClassTag on Serializer and t...

2014-05-14 Thread mateiz
GitHub user mateiz opened a pull request:

https://github.com/apache/spark/pull/700

SPARK-1708. Add a ClassTag on Serializer and things that depend on it

This pull request contains a rebased patch from @heathermiller 
(https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer 
and types that depend on it (Broadcast and AccumulableCollection). Putting 
these in the public API signatures now will allow us to use Scala Pickling for 
serialization down the line without breaking binary compatibility.

One question remaining is whether we also want them on Accumulator -- 
Accumulator is passed as part of a bigger Task or TaskResult object via the 
closure serializer so it doesn't seem super useful to add the ClassTag there. 
Broadcast and AccumulableCollection in contrast were being serialized directly.
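
To make the idea concrete, here is a minimal sketch of a method signature that carries a ClassTag. `ClassTagSketch`, `SketchSerializer`, and `PlainSerializer` are hypothetical names and do not reflect Spark's actual Serializer API; the point is only that the context bound `T: ClassTag` becomes part of the signature, so an implementation can start using the runtime class later without the signature changing.

```scala
import scala.reflect.ClassTag

// Hypothetical, simplified interfaces for illustration; not Spark's Serializer API.
object ClassTagSketch {
  trait SketchSerializer {
    // The context bound makes the compiler supply a ClassTag[T] at every call site.
    def newBuffer[T: ClassTag](size: Int): Array[T]
    def describe[T: ClassTag](value: T): String
  }

  object PlainSerializer extends SketchSerializer {
    // Creating an Array[T] requires the runtime class, which the ClassTag provides.
    def newBuffer[T: ClassTag](size: Int): Array[T] = new Array[T](size)

    // The element type is recoverable at runtime despite erasure.
    def describe[T: ClassTag](value: T): String =
      s"${implicitly[ClassTag[T]].runtimeClass.getName}: $value"
  }
}
```

A call such as `ClassTagSketch.PlainSerializer.newBuffer[Int](3)` compiles only because the tag is available, which is the kind of type information a pickling-based serializer could rely on.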

CC @rxin, @pwendell, @heathermiller

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mateiz/spark spark-1708

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/700.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #700


commit 9d48830dce909634b68bb6e34f928345aeab42c1
Author: Matei Zaharia 
Date:   2014-05-08T21:37:34Z

Add a ClassTag on Serializer and things that depend on it






[GitHub] spark pull request: Proposal: clarify Scala programming guide on c...

2014-05-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/668#issuecomment-42386975
  
Okay I can merge this, thanks!




[GitHub] spark pull request: [SQL] Fix Performance Issue in data type casti...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/679#issuecomment-42402714
  
Merged build started. 




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/636#discussion_r12380922
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -523,6 +504,90 @@ private[spark] class Master(
     }
   }
 
+  private def startMultiExecutorsPerWorker() {
+    // allow user to run multiple executors in the same worker
+    // (within the same worker JVM process)
+    if (spreadOutApps) {
+      for (app <- waitingApps if app.coresLeft > 0) {
+        val coreNumPerExecutor = app.desc.corePerExecutor
+        var usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
+          .filter(_.coresFree > 0).sortBy(_.coresFree).reverse
+        var leftCoreToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
+        val numUsable = usableWorkers.length
+        // Number of cores of each executor assigned to each worker
+        val assigned = Array.fill[ListBuffer[Int]](numUsable)(new ListBuffer[Int])
+        var pos = 0
+        val memoryNotEnoughFlags = new Array[Boolean](numUsable)
+        while (leftCoreToAssign > 0 && memoryNotEnoughFlags.contains(false)) {
--- End diff --

This is an extremely expensive loop - would be better to rewrite it.
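
The comment does not say which part of the loop is expensive. One cost visible in the quoted condition is that `memoryNotEnoughFlags.contains(false)` rescans the whole array on every iteration. A minimal sketch of one possible rewrite, assuming the loop body only ever flips flags from false to true (an assumption, since the body is not shown); `FlagTracker` and `markExhausted` are hypothetical names:

```scala
// Hypothetical helper: tracks how many workers can still accept executors,
// so the loop condition is an O(1) check instead of an O(n) array scan.
final class FlagTracker(n: Int) {
  private val exhausted = new Array[Boolean](n)
  private var remaining = n

  def anyUsable: Boolean = remaining > 0

  // Mark worker i as out of memory; each worker is counted at most once.
  def markExhausted(i: Int): Unit = {
    if (!exhausted(i)) {
      exhausted(i) = true
      remaining -= 1
    }
  }
}
```

Under that assumption the condition would read `leftCoreToAssign > 0 && tracker.anyUsable`.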




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/690#issuecomment-42521008
  
Michael's fix looks good and the test passes now, so unless there's an 
overhaul of this technique coming soon it probably makes sense to merge this in 
(pre-1.0).




[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...

2014-05-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/714#discussion_r12484999
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1045,6 +1045,9 @@ private[spark] object BlockManager extends Logging {
 
   def getMaxMemory(conf: SparkConf): Long = {
     val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
+    if (memoryFraction > 1 && memoryFraction <= 0) {
--- End diff --

(Same in `ExternalAppendOnlyMap`)
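
Independently of what the review thread concluded, the quoted condition `memoryFraction > 1 && memoryFraction <= 0` can never be true, since no number is both greater than 1 and at most 0. A minimal sketch of a check that does fire, keeping the same bounds as the diff; `MemoryFractionCheck` and `isOutOfRange` are hypothetical names, not Spark code:

```scala
object MemoryFractionCheck {
  // Hypothetical helper: true when the fraction lies outside (0, 1].
  // Uses || because the two out-of-range cases are mutually exclusive.
  def isOutOfRange(memoryFraction: Double): Boolean =
    memoryFraction > 1 || memoryFraction <= 0
}
```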




[GitHub] spark pull request: Unify GraphImpl RDDs + other graph load optimi...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/497#issuecomment-42753772
  
Merged build finished. 




[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42630686
  
Merged build started. 




[GitHub] spark pull request: Use numpy directly for matrix multiply.

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/687#issuecomment-42504227
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/690

SPARK-1757 Failing test for saving null primitives with .saveAsParquetFile()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark rdd-parquet-save

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/690.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #690


commit 8f3f28144180ab9e1ea29b15ca1010f40690d0b0
Author: Andrew Ash 
Date:   2014-05-08T06:18:22Z

SPARK-1757 Add failing test for saving SparkSQL Schemas with Option[?] 
fields as parquet






[GitHub] spark pull request: Update RoutingTable.scala

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/647#issuecomment-42470889
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14781/




[GitHub] spark pull request: SPARK-1565, update examples to be used with sp...

2014-05-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/552#discussion_r12389167
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/JavaLogQuery.java ---
@@ -98,15 +99,11 @@ public static Stats extractStats(String line) {
   }
 
   public static void main(String[] args) {
-if (args.length == 0) {
-  System.err.println("Usage: JavaLogQuery  [logFile]");
--- End diff --

Great, thanks.




[GitHub] spark pull request: SPARK-1757 Failing test for saving null primit...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/690#issuecomment-42520853
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42509043
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1644] The org.datanucleus:* should not ...

2014-05-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/688#issuecomment-42748037
  
Okay thanks sounds good. I'll merge this.




[GitHub] spark pull request: [MLLIB] SPARK-1682: Add gradient descent w/o s...

2014-05-14 Thread dongwang218
Github user dongwang218 commented on a diff in the pull request:

https://github.com/apache/spark/pull/643#discussion_r12463263
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala ---
@@ -128,6 +128,45 @@ class L1Updater extends Updater {
 
 /**
  * :: DeveloperApi ::
+ * Updater for Enhanced L1-RDA regularized problems.
+ *  R(w) = ||w||_1
+ * Ignore the existing weights, but use average of gradient to compute new weights
+ * and apply L1 saturated thresholding. The enhanced version has `rho` which results
+ * in even sparse weights.
--- End diff --

this will be another pull request




[GitHub] spark pull request: SPARK-1827. LICENSE and NOTICE files need a re...

2014-05-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/770




[GitHub] spark pull request: bugfix: overflow of graphx Edge compare functi...

2014-05-14 Thread zhpengg
GitHub user zhpengg opened a pull request:

https://github.com/apache/spark/pull/769

bugfix: overflow of graphx Edge compare function



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhpengg/spark bugfix-graphx-edge-compare

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #769


commit 413c2580783c4d33f1bb4f848e9325bd3565d65a
Author: Zhen Peng 
Date:   2014-05-14T08:00:23Z

there maybe a overflow for two Long's substraction
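
The patch itself is not shown in this message. A minimal sketch of the kind of overflow the commit message describes, assuming a comparator that derives its result from the difference of two `Long` ids (`unsafeCompare` is a hypothetical name), next to a comparison-based alternative:

```scala
object LongCompare {
  // Unsafe: the Long difference can wrap around, and toInt drops the high 32 bits.
  def unsafeCompare(a: Long, b: Long): Int = (a - b).toInt

  // Safe: compares the values directly, with no arithmetic that could overflow.
  def safeCompare(a: Long, b: Long): Int = java.lang.Long.compare(a, b)
}
```

For the ids `1L << 32` and `0L`, the subtraction-based version truncates the difference to 0 and reports the two as equal, while the comparison-based version returns a positive result.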






[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42660436
  
Merged build finished. 




[GitHub] spark pull request: Unify GraphImpl RDDs + other graph load optimi...

2014-05-14 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/497#issuecomment-42483761
  
@pwendell Merged and added an upgrade section to the GraphX programming 
guide.




[GitHub] spark pull request: [WIP]SPARK-1706: Allow multiple executors per ...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42654535
  
Merged build started. 




[GitHub] spark pull request: L-BFGS Documentation

2014-05-14 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/702#discussion_r12460330
  
--- Diff: docs/mllib-optimization.md ---
@@ -128,10 +128,24 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n = 1$`, then the algorithm is
 standard SGD. In that case, the step direction depends from the uniformly random sampling of the
 point.
 
+### Limited-memory BFGS
+[Limited-memory BFGS (L-BFGS)](http://en.wikipedia.org/wiki/Limited-memory_BFGS) is an optimization
+algorithm in the family of quasi-Newton methods to solve the optimization problems of the form
+`$\min_{\wv \in\R^d} \; f(\wv)$`. The L-BFGS approximates the objective function locally as a quadratic
+without evaluating the second partial derivatives of the objective function to construct the
+Hessian matrix. The Hessian matrix is approximated by previous gradient evaluations, so there is no
+vertical scalability issue (the number of training features) when computing the Hessian matrix
+explicitly in Newton method. As a result, L-BFGS often achieves rapider convergence compared with
--- End diff --

`Newton's method`




[GitHub] spark pull request: Bug fix of sparse vector conversion

2014-05-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/661#issuecomment-42575036
  
LGTM. Do you mind creating a JIRA for this change?




[GitHub] spark pull request: Fixing typo in als.py

2014-05-14 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/696#issuecomment-42596950
  
LGTM. Merging this.




[GitHub] spark pull request: Synthetic GraphX Benchmark

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/720#issuecomment-42724915
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1745] Move interrupted flag from TaskCo...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/675#issuecomment-42458933
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1706: Allow multiple executors per worke...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/636#issuecomment-42509501
  
Merged build started. 




[GitHub] spark pull request: 【SPARK-1779】add warning when memoryFractio...

2014-05-14 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-42688273
  
ok, updated.




[GitHub] spark pull request: SPARK-1712: TaskDescription instance is too bi...

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/694#issuecomment-42553850
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-1749] Job cancellation when SchedulerBa...

2014-05-14 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/686#discussion_r12408395
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1148,7 +1154,11 @@ private[scheduler] class DAGSchedulerActorSupervisor(dagScheduler: DAGScheduler)
       case x: Exception =>
         logError("eventProcesserActor failed due to the error %s; shutting down SparkContext"
           .format(x.getMessage))
-        dagScheduler.doCancelAllJobs()
+        try {
+          dagScheduler.doCancelAllJobs()
+        } catch {
+          case t: Throwable => logError("DAGScheduler failed to cancel all jobs.", t)
--- End diff --

Who knows just what the SchedulerBackend is going to throw now or in the 
future?  UnsupportedOperationException is handled in 
failJobAndIndependentStages, but if something else is thrown out of the backend 
or doCancelAllJobs fails for any other reason, we'll just log it here and 
continue trying to shut down.
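
The pattern described here, sketched in isolation: attempt a cleanup step, log any failure, and carry on with the rest of the shutdown. `Shutdown` and `bestEffort` are hypothetical names, and the sketch catches only `NonFatal` throwables, whereas the quoted diff catches `Throwable`:

```scala
import scala.util.control.NonFatal

object Shutdown {
  // Runs one cleanup step. Non-fatal failures are logged and swallowed so
  // that later shutdown steps still run; returns whether the step succeeded.
  def bestEffort(label: String)(step: => Unit): Boolean =
    try {
      step
      true
    } catch {
      case NonFatal(e) =>
        Console.err.println(s"$label failed: ${e.getMessage}")
        false
    }
}
```

A caller could chain several such steps and still reach the final stop call even if an earlier step throws `UnsupportedOperationException`.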



