[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1780#issuecomment-51157549
  
QA results for PR 1780:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull





[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-51156997
  
QA tests have started for PR 714. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17932/consoleFull





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-51156861
  
QA results for PR 1775:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17924/consoleFull





[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-51156760
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1779#issuecomment-51156680
  
QA results for PR 1779:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17925/consoleFull





[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1780#discussion_r15797096
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -47,7 +47,9 @@ class KryoSerializer(conf: SparkConf)
   with Logging
   with Serializable {
 
-  private val bufferSize = conf.getInt("spark.kryoserializer.buffer.mb", 2) * 1024 * 1024
+  private val bufferSize =
+    (conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) * 1024 * 1024).toInt
--- End diff --

maybe add a comment `// 64KB`?
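For quick reference, a minimal sketch of the new default with the suggested comment applied; the wrapping object and println are purely illustrative, not part of the PR:

```scala
import org.apache.spark.SparkConf

object KryoBufferSizeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // Same expression as in the diff above, annotated as the review suggests.
    // 0.064 * 1024 * 1024 bytes is roughly 64KB.
    val bufferSize =
      (conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) * 1024 * 1024).toInt // ~64KB
    println(s"Kryo initial buffer size: $bufferSize bytes")
  }
}
```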





[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1707#issuecomment-51156443
  
Thanks for the review.





[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1707#issuecomment-51156435
  
Jenkins actually passed this (see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17919/consoleFull), but a glitch in the reporting script kept it from posting here, so I'm going to merge it.





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796941
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
   .map { case (k, v) => s"-D$k=$v" }
   }
 
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   * This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+  startPort: Int,
+  startService: Int => (T, Int),
+  serviceName: String = "",
+  maxRetries: Int = 3): (T, Int) = {
--- End diff --

sounds good
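For readers skimming the thread, here is a minimal, self-contained sketch of the retry behavior the Scaladoc above describes (increment the port on BindException, up to maxRetries extra attempts). The names and the exception type thrown at the end are illustrative, not the actual Utils implementation:

```scala
import java.net.BindException

object PortRetrySketch {
  def startServiceOnPort[T](
      startPort: Int,
      startService: Int => (T, Int),
      serviceName: String = "",
      maxRetries: Int = 3): (T, Int) = {
    // Try startPort, startPort + 1, ..., startPort + maxRetries, as documented above.
    for (offset <- 0 to maxRetries) {
      val tryPort = startPort + offset
      try {
        return startService(tryPort)
      } catch {
        case _: BindException =>
          println(s"Service '$serviceName' could not bind on port $tryPort.")
      }
    }
    throw new RuntimeException(
      s"Failed to start service '$serviceName' after ${maxRetries + 1} attempts.")
  }
}
```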





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796780
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
 if (!isLocal) {
-  val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+  val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --

There's already a `spark.executor.port`, and these two overlap





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51155814
  
Hey Andrew - overall this looks good. I think ultimately we'll need to lock down a cluster and test this by opening up ports "one by one", but I think this is worth merging with the current coverage. I have some comments, mostly about the docs.





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51155708
  
Alright, merged it. Thanks!





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796709
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
   .map { case (k, v) => s"-D$k=$v" }
   }
 
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   * This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+  startPort: Int,
+  startService: Int => (T, Int),
+  serviceName: String = "",
+  maxRetries: Int = 3): (T, Int) = {
--- End diff --

That seems reasonable to me.





[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1309#issuecomment-51155585
  
QA results for PR 1309:
- This patch PASSES unit tests.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796608
  
--- Diff: docs/spark-standalone.md ---
@@ -311,76 +311,103 @@ configure those ports.
   
   
 Browser
-Standalone Cluster Master
+Master
 8080
 Web UI
-master.ui.port
+master.ui.port / SPARK_MASTER_WEBUI_PORT
 Jetty-based
   
   
 Browser
-Driver
-4040
+Worker
+8081
 Web UI
-spark.ui.port
+worker.ui.port / SPARK_WORKER_WEBUI_PORT
 Jetty-based
   
   
 Browser
-History Server
-18080
+Application
+4040
 Web UI
-spark.history.ui.port
+spark.ui.port
 Jetty-based
   
   
 Browser
-Worker
-8081
+History Server
+18080
 Web UI
-worker.ui.port
+spark.history.ui.port
 Jetty-based
   
   
   
-Application
-Standalone Cluster Master
+Driver / Worker
+Master
 7077
-Submit job to cluster
-spark.driver.port
-Akka-based.  Set to "0" to choose a port randomly
+Submit job to cluster / Join cluster
+SPARK_MASTER_PORT
+Akka-based. Set to "0" to choose a port randomly.
   
   
+Master
 Worker
-Standalone Cluster Master
--- End diff --

This overall could use some reorganization. I'd actually move the ones that 
are not specific to standalone mode to the "Security" page. Also, the new 
options should be listed in the "Networking" section of `docs/configuration.md`.





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796591
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
   .map { case (k, v) => s"-D$k=$v" }
   }
 
+  /**
+   * Attempt to start a service on the given port, or fail after a number of attempts.
+   * Each subsequent attempt uses 1 + the port used in the previous attempt.
+   *
+   * @param startPort The initial port to start the service on.
+   * @param maxRetries Maximum number of retries to attempt.
+   *   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
+   * @param startService Function to start service on a given port.
+   * This is expected to throw java.net.BindException on port collision.
+   * @throws SparkException When unable to start the service after a given number of attempts
+   */
+  def startServiceOnPort[T](
+  startPort: Int,
+  startService: Int => (T, Int),
+  serviceName: String = "",
+  maxRetries: Int = 3): (T, Int) = {
--- End diff --

Part of me is worried that the maxRetries parameter here is an effective cap on the number of concurrent shells/executors/drivers that can run on one machine in restricted firewall mode. Because in Standalone mode each app gets its own set of executors across the cluster, this is also a cap on the number of concurrent applications on a cluster.

What do you think of creating a config option, say spark.ports.maxRetries, that can be set to change this? I might change the default to be a bit higher, like 16 or so.

The way I'd expect network teams to run this, then, is to set spark.ports.maxRetries to say 20, and then open up a range of size 20 starting from each of the relevant ports in the config list you pasted in the summary.
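To make that concrete, a tiny sketch of the firewall arithmetic being proposed. Note that spark.ports.maxRetries is the hypothetical option from this comment, not an existing setting, and the base ports are just examples taken from the standalone docs table quoted earlier in the thread:

```scala
object FirewallRangeSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical value for the proposed spark.ports.maxRetries option.
    val maxRetries = 20
    // Example base ports from the standalone docs (master, master web UI, worker web UI).
    val basePorts = Seq(7077, 8080, 8081)
    basePorts.foreach { p =>
      println(s"Open inbound ports $p through ${p + maxRetries} for this service.")
    }
  }
}
```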





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796416
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
 if (!isLocal) {
-  val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+  val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --

what about just `spark.executor.port` (ala `spark.driver.port`)





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796377
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -30,8 +30,9 @@ object DriverWrapper {
 args.toList match {
   case workerUrl :: mainClass :: extraArgs =>
 val conf = new SparkConf()
+val watcherPort = conf.getInt("spark.worker.watcher.port", 0) // TODO: document this
--- End diff --

this is only ever used within one machine, I'm pretty sure these types of 
ports don't need to be configurable.





[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1781#issuecomment-51154536
  
QA tests have started for PR 1781. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17930/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796365
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -146,6 +146,7 @@ object Client {
 }
 
 val conf = new SparkConf()
+val port = conf.getInt("spark.standalone.client.port", 0) // TODO: document this
--- End diff --

btw - there is a comment below to this effect, but I think there may be an 
akka option where it doesn't need us to open up a server on the client at all. 
Not worth spending time on though for this patch...





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15796327
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -146,6 +146,7 @@ object Client {
 }
 
 val conf = new SparkConf()
+val port = conf.getInt("spark.standalone.client.port", 0) // TODO: document this
--- End diff --

Are you planning to do this?





[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...

2014-08-04 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/1779#issuecomment-51153773
  
+1 I've always used SPARK_MASTER_WEBUI_PORT and SPARK_WORKER_WEBUI_PORT in spark-env.sh; I'd imagine everyone else has been doing the same.





[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...

2014-08-04 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1781

[SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 
32KB.


This can substantially reduce memory usage during shuffle.
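As a rough sketch of why this buffer size matters, assuming the value is applied to each buffered shuffle output stream; the property name comes from the PR title and 32 is the default proposed here, everything else is illustrative:

```scala
import java.io.{BufferedOutputStream, FileOutputStream}
import org.apache.spark.SparkConf

object ShuffleBufferSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // 32KB is the new default this PR proposes. Each buffered shuffle output
    // stream allocates a buffer of this size, so many concurrent writers can
    // add up to a noticeable amount of memory.
    val bufferKb = conf.getInt("spark.shuffle.file.buffer.kb", 32)
    val out = new BufferedOutputStream(
      new FileOutputStream("/tmp/shuffle-buffer-sketch.bin"), bufferKb * 1024)
    try out.write(Array.fill[Byte](16)(0)) finally out.close()
  }
}
```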

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark 
SPARK-2503-spark.shuffle.file.buffer.kb

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1781.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1781


commit 1b8f72b387c10d53b54ae3a2e7eb6b9b72f14d36
Author: Reynold Xin 
Date:   2014-08-05T06:06:02Z

[SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 
32KB.

This can substantially reduce memory usage during shuffle.







[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15796202
  
--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 
 class LogisticRegressionWithSGD(object):
 @classmethod
-def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-"""Train a logistic regression model on the given data."""
+def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+  initialWeights=None, regParam=1.0, regType=None, intercept=False):
+"""
+Train a logistic regression model on the given data.
+
+@param data:              The training data.
+@param iterations:        The number of iterations (default: 100).
+@param step:              The step parameter used in SGD (default: 1.0).
+@param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+@param initialWeights:    The initial weights (default: None).
+@param regParam:          The regularizer parameter (default: 1.0).
+@param regType:           The type of regularizer used for training our model.
+                          Allowed values: "l1" for using L1Updater,
+                          "l2" for using SquaredL2Updater,
+                          "none" for no regularizer (default: "none").
+@param intercept:         Boolean parameter which indicates the use or not
+                          of the augmented representation for training data
+                          (i.e. whether bias features are activated or not).
+"""
 sc = data.context
+if regType is None:
--- End diff --

Ok fair enough

(@mengxr I wasn't suggesting enumerations, just a pattern match on the 
`Option[String]` value as per comment. Don't believe this adds more code or 
complexity, but no strong feelings either way)
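A minimal sketch of the pattern match being discussed, assuming regType arrives from Python as an Option[String]; the updater names come from the docstring in the diff above, and SimpleUpdater stands in for "no regularizer". This is only an illustration, not the code in the PR:

```scala
import org.apache.spark.mllib.optimization.{L1Updater, SimpleUpdater, SquaredL2Updater, Updater}

object RegTypeSketch {
  def chooseUpdater(regType: Option[String]): Updater = regType match {
    case Some("l1")          => new L1Updater()
    case Some("l2")          => new SquaredL2Updater()
    case None | Some("none") => new SimpleUpdater() // no regularization
    case Some(other)         => throw new IllegalArgumentException(s"Invalid regType: $other")
  }
}
```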





[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1780#issuecomment-51153633
  
LGTM





[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...

2014-08-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/1648#discussion_r15796222
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -38,6 +39,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest 
with BeforeAndAfter {
 
   override def beforeAll() {
 TestHive.cacheTables = true
+TestHive.set(SQLConf.SHUFFLE_PARTITIONS, "2")
--- End diff --

We should keep it at 2 to speed up tests ...





[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1779#issuecomment-51153508
  
LGTM pending tests - thanks Andrew. I'm guessing these were simply unused.





[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1507#discussion_r15796149
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -131,7 +122,9 @@ object BlockFetcherIterator {
 val networkSize = blockMessage.getData.limit()
 results.put(new FetchResult(blockId, sizeMap(blockId),
  () => dataDeserialize(blockId, blockMessage.getData, serializer)))
-_remoteBytesRead += networkSize
+// TODO: race conditions can occur here with NettyBlockFetcherIterator
--- End diff --

Also this comment is pretty vague. It would be good if you could elaborate 
on it (what you described in the JIRA itself is good enough)





[GitHub] spark pull request: [SPARK-2379] Fix the bug that streaming's rece...

2014-08-04 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/1694#issuecomment-51153366
  
Well, I have merged this patch already, in an attempt to squeeze it into the 1.1 release. If you open another patch to make the change, I can try to squeeze that in too. Thanks for detecting and fixing this bug!





[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1507#discussion_r15796125
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 
---
@@ -98,19 +105,22 @@ class TaskMetrics extends Serializable {
*/
   var updatedBlocks: Option[Seq[(BlockId, BlockStatus)]] = None
 
-  /** Adds the given ShuffleReadMetrics to any existing shuffle metrics for this task. */
-  def updateShuffleReadMetrics(newMetrics: ShuffleReadMetrics) = synchronized {
-_shuffleReadMetrics match {
-  case Some(existingMetrics) =>
-existingMetrics.shuffleFinishTime = math.max(
-  existingMetrics.shuffleFinishTime, newMetrics.shuffleFinishTime)
-existingMetrics.fetchWaitTime += newMetrics.fetchWaitTime
-existingMetrics.localBlocksFetched += newMetrics.localBlocksFetched
-existingMetrics.remoteBlocksFetched += newMetrics.remoteBlocksFetched
-existingMetrics.remoteBytesRead += newMetrics.remoteBytesRead
-  case None =>
-_shuffleReadMetrics = Some(newMetrics)
+  def createShuffleReadMetricsForDependency(): ShuffleReadMetrics = synchronized {
+val readMetrics = new ShuffleReadMetrics()
+depsShuffleReadMetrics += readMetrics
+readMetrics
+  }
+
+  def mergeShuffleReadMetrics() = synchronized {
--- End diff --

Could you add a brief comment on what this does? (and when this happens)
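For reference, a self-contained sketch of the kind of comment and behavior being asked for, using local stand-in classes so it compiles on its own; the merge semantics here are inferred from the removed updateShuffleReadMetrics code in the diff above and are an assumption, not the PR's actual implementation:

```scala
import scala.collection.mutable.ArrayBuffer

// Local stand-ins for the real ShuffleReadMetrics / TaskMetrics fields shown above.
class ShuffleReadMetricsSketch {
  var shuffleFinishTime: Long = -1
  var fetchWaitTime: Long = 0
  var localBlocksFetched: Int = 0
  var remoteBlocksFetched: Int = 0
  var remoteBytesRead: Long = 0
}

class TaskMetricsSketch {
  private val depsShuffleReadMetrics = ArrayBuffer.empty[ShuffleReadMetricsSketch]
  private var _shuffleReadMetrics: Option[ShuffleReadMetricsSketch] = None

  /** One metrics object per shuffle dependency read by this task. */
  def createShuffleReadMetricsForDependency(): ShuffleReadMetricsSketch = synchronized {
    val readMetrics = new ShuffleReadMetricsSketch()
    depsShuffleReadMetrics += readMetrics
    readMetrics
  }

  /** Aggregates the per-dependency metrics into one task-level value, e.g. at task end. */
  def mergeShuffleReadMetrics(): Unit = synchronized {
    if (depsShuffleReadMetrics.nonEmpty) {
      val merged = new ShuffleReadMetricsSketch()
      for (dep <- depsShuffleReadMetrics) {
        merged.shuffleFinishTime = math.max(merged.shuffleFinishTime, dep.shuffleFinishTime)
        merged.fetchWaitTime += dep.fetchWaitTime
        merged.localBlocksFetched += dep.localBlocksFetched
        merged.remoteBlocksFetched += dep.remoteBlocksFetched
        merged.remoteBytesRead += dep.remoteBytesRead
      }
      _shuffleReadMetrics = Some(merged)
    }
  }
}
```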





[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1778#issuecomment-51153091
  
QA tests have started for PR 1778. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17929/consoleFull





[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1507#discussion_r15796079
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -191,7 +184,7 @@ object BlockFetcherIterator {
 }
   }
  logInfo("Getting " + _numBlocksToFetch + " non-empty blocks out of " +
-(numLocal + numRemote) + " blocks")
+totalBlocks + " blocks")
--- End diff --

Is this ever used other than for logging?





[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1780#issuecomment-51152729
  
QA tests have started for PR 1780. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull





[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1507#discussion_r15796012
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -131,7 +122,9 @@ object BlockFetcherIterator {
 val networkSize = blockMessage.getData.limit()
 results.put(new FetchResult(blockId, sizeMap(blockId),
  () => dataDeserialize(blockId, blockMessage.getData, serializer)))
-_remoteBytesRead += networkSize
+// TODO: race conditions can occur here with NettyBlockFetcherIterator
--- End diff --

Could you add a reference to the JIRA to the comment?





[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...

2014-08-04 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/1780

[SPARK-2856] Decrease initial buffer size for Kryo to 64KB.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark kryo-init-size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1780.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1780


commit 551b935c3db56cb214ebbea6922cc5b6d37d229a
Author: Reynold Xin 
Date:   2014-08-05T05:52:05Z

[SPARK-2856] Decrease initial buffer size for Kryo to 64KB.







[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1778#issuecomment-51152551
  
QA results for PR 1778:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17926/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1777#discussion_r15795950
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -84,7 +84,8 @@ private[spark] class Executor(
   // Initialize Spark environment (using system properties read above)
   private val env = {
 if (!isLocal) {
-  val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
+  val port = conf.getInt("spark.executor.env.port", 0) // TODO: document this
--- End diff --

There is probably a better name for this. I just don't know what else to 
call it.





[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1779#issuecomment-51152374
  
QA tests have started for PR 1779. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17925/consoleFull





[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1778#issuecomment-51152380
  
QA tests have started for PR 1778. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17926/consoleFull





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51152366
  
I see, that makes sense. LGTM





[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...

2014-08-04 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-51152316
  
test this please





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread miccagiann
Github user miccagiann commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795902
  
--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 
 class LogisticRegressionWithSGD(object):
 @classmethod
-def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-"""Train a logistic regression model on the given data."""
+def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+  initialWeights=None, regParam=1.0, regType=None, intercept=False):
+"""
+Train a logistic regression model on the given data.
+
+@param data:              The training data.
+@param iterations:        The number of iterations (default: 100).
+@param step:              The step parameter used in SGD (default: 1.0).
+@param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+@param initialWeights:    The initial weights (default: None).
+@param regParam:          The regularizer parameter (default: 1.0).
+@param regType:           The type of regularizer used for training our model.
+                          Allowed values: "l1" for using L1Updater,
+                          "l2" for using SquaredL2Updater,
+                          "none" for no regularizer (default: "none").
+@param intercept:         Boolean parameter which indicates the use or not
+                          of the augmented representation for training data
+                          (i.e. whether bias features are activated or not).
+"""
 sc = data.context
+if regType is None:
--- End diff --

Xiangrui suggested keeping the Scala code as simple as possible and only throwing the `IllegalArgumentException` from there. I tried pattern matching and creating enumerations, but the result was complicated and I ended up adding more classes to both the Scala and the Python code.





[GitHub] spark pull request: [SPARK-2857] Correct properties to set Master ...

2014-08-04 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/1779

[SPARK-2857] Correct properties to set Master / Worker ports

`master.ui.port` and `worker.ui.port` were never picked up by SparkConf, 
simply because they are not prefixed with "spark." Unfortunately, this is also 
currently the documented way of setting these values.
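A minimal sketch of the underlying behavior: SparkConf only loads JVM system properties whose keys start with "spark.", which is why the documented names above were silently ignored. The spark.-prefixed name used here is purely illustrative, not necessarily the name this PR settles on:

```scala
import org.apache.spark.SparkConf

object SparkPrefixSketch {
  def main(args: Array[String]): Unit = {
    System.setProperty("master.ui.port", "8085")       // ignored: no "spark." prefix
    System.setProperty("spark.master.ui.port", "8085") // hypothetical prefixed name
    val conf = new SparkConf(loadDefaults = true)
    println(conf.contains("master.ui.port"))        // false
    println(conf.contains("spark.master.ui.port"))  // true
  }
}
```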

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark master-worker-port

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1779


commit 4db3d5d30b8a3ae6d186a4595ff6d52b39590200
Author: Andrew Or 
Date:   2014-08-05T05:44:17Z

Stop using configs that don't actually work

commit 8475e95ea2724918fc29c132384c2c4723acf0c4
Author: Andrew Or 
Date:   2014-08-05T05:44:32Z

Update docs to reflect changes in configs







[GitHub] spark pull request: DIMSUM: Dimension Independent Matrix Square us...

2014-08-04 Thread rezazadeh
GitHub user rezazadeh opened a pull request:

https://github.com/apache/spark/pull/1778

DIMSUM: Dimension Independent Matrix Square using Mapreduce

# DIMSUM
Compute all pairs of similar vectors using brute force approach, and also 
DIMSUM sampling approach.

Laying down some notation: we are looking for all pairs of similar columns 
in an m x n matrix whose entries are denoted a_ij, with the i’th row denoted 
r_i and the j’th column denoted c_j. There is an oversampling parameter 
labeled ɣ that should be set to 4 log(n)/s to get provably correct results 
(with high probability), where s is the similarity threshold.

The algorithm is stated with a Map and Reduce, with proofs of correctness 
and efficiency in published papers [1] [2]. The reducer is simply the summation 
reducer. The mapper is more interesting, and is also the heart of the scheme. 
As an exercise, you should try to see why, in expectation, the map-reduce below outputs cosine similarities.


![dimsumv2](https://cloud.githubusercontent.com/assets/3220351/3807272/d1d9514e-1c62-11e4-9f12-3cfdb1d78b3a.png)

[1] Bosagh-Zadeh, Reza and Carlsson, Gunnar (2013), Dimension Independent 
Matrix Square using MapReduce, arXiv:1304.1467

[2] Bosagh-Zadeh, Reza and Goel, Ashish (2012), Dimension Independent 
Similarity Computation, arXiv:1206.2082
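As a point of reference for the brute force baseline mentioned above, a small self-contained sketch that computes all-pairs column cosine similarities on a toy dense matrix (no sampling, no Spark; purely illustrative):

```scala
object BruteForceCosineSketch {
  def main(args: Array[String]): Unit = {
    // Toy m x n matrix (3 rows, 3 columns); cos(c_i, c_j) = (c_i . c_j) / (||c_i|| ||c_j||).
    val rows = Array(
      Array(1.0, 2.0, 0.0),
      Array(0.0, 1.0, 3.0),
      Array(4.0, 0.0, 1.0))
    val n = rows.head.length
    val cols = Array.tabulate(n)(j => rows.map(_(j)))
    def dot(a: Array[Double], b: Array[Double]): Double =
      a.zip(b).map { case (x, y) => x * y }.sum
    def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))
    for (i <- 0 until n; j <- (i + 1) until n) {
      val sim = dot(cols(i), cols(j)) / (norm(cols(i)) * norm(cols(j)))
      println(f"cosine(c$i%d, c$j%d) = $sim%.4f")
    }
  }
}
```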

# Testing

Tests for all invocations included. 
Added magnitude computation to MultivariateStatisticalSummary since it was 
needed. Added a test for this.

Scaling it up now and will report back with results.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rezazadeh/spark dimsumv2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1778


commit 5b8cd7deb3f29d3c2533b01f496f41175471f023
Author: Reza Zadeh 
Date:   2014-08-04T02:19:45Z

Initial files

commit 6bebabb9364eb917dd86acbea4438a9e4d301f18
Author: Reza Zadeh 
Date:   2014-08-04T18:37:31Z

remove changes to MatrixSuite

commit 3726ca97ab184a8d5a9b3c0003d3afa6fd973890
Author: Reza Zadeh 
Date:   2014-08-04T20:47:57Z

Remove MatrixAlgebra

commit 654c4fb1136cfa856fc354b5ddb710758d38948f
Author: Reza Zadeh 
Date:   2014-08-04T21:38:18Z

default methods

commit 502ce526fc8ec84fd2c1f3b2b9a74b07e76c2d65
Author: Reza Zadeh 
Date:   2014-08-04T22:02:36Z

new interface

commit 05e59b8e883fd126dc81707b90aaf1011a2d1ee5
Author: Reza Zadeh 
Date:   2014-08-04T22:59:55Z

Add test

commit 75edb257e33a23f87fa379be597483d12a421626
Author: Reza Zadeh 
Date:   2014-08-05T01:02:33Z

All tests passing!

commit 029aa9c3d71960cb63293d721b96eebb6bdfcfbf
Author: Reza Zadeh 
Date:   2014-08-05T05:12:40Z

javadoc and new test

commit 139c8e1d20274322dfe1c513d6872e47f5eb5138
Author: Reza Zadeh 
Date:   2014-08-05T05:16:23Z

Syntax changes







[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-51152156
  
QA tests have started for PR 1775. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17924/consoleFull





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1777#issuecomment-51152157
  
QA tests have started for PR 1777. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17923/consoleFull





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51152075
  
@javadba Thanks for the detail.
Let me replay the sequence.





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795801
  
--- Diff: python/pyspark/mllib/classification.py ---
@@ -73,11 +73,36 @@ def predict(self, x):
 
 class LogisticRegressionWithSGD(object):
 @classmethod
-def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, initialWeights=None):
-"""Train a logistic regression model on the given data."""
+def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
+  initialWeights=None, regParam=1.0, regType=None, intercept=False):
+"""
+Train a logistic regression model on the given data.
+
+@param data:              The training data.
+@param iterations:        The number of iterations (default: 100).
+@param step:              The step parameter used in SGD (default: 1.0).
+@param miniBatchFraction: Fraction of data to be used for each SGD iteration.
+@param initialWeights:    The initial weights (default: None).
+@param regParam:          The regularizer parameter (default: 1.0).
+@param regType:           The type of regularizer used for training our model.
+                          Allowed values: "l1" for using L1Updater,
+                          "l2" for using SquaredL2Updater,
+                          "none" for no regularizer (default: "none").
+@param intercept:         Boolean parameter which indicates the use or not
+                          of the augmented representation for training data
+                          (i.e. whether bias features are activated or not).
+"""
 sc = data.context
+if regType is None:
--- End diff --

As per above comment, you can just pass `regType` straight through if you 
then wrap the null in `Option` on the Scala/Java side.





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread miccagiann
Github user miccagiann commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795787
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
   stepSize: Double,
   regParam: Double,
   miniBatchFraction: Double,
-  initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+  initialWeightsBA: Array[Byte],
+  regType: String,
+  intercept: Boolean): java.util.List[java.lang.Object] = {
+val SVMAlg = new SVMWithSGD()
+SVMAlg.setIntercept(intercept)
+SVMAlg.optimizer
+  .setNumIterations(numIterations)
+  .setRegParam(regParam)
+  .setStepSize(stepSize)
--- End diff --

Thanks! I am fixing it right now!





[GitHub] spark pull request: [SPARK-2157] Enable tight firewall rules for S...

2014-08-04 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/1777

[SPARK-2157] Enable tight firewall rules for Spark

The goal of this PR is to allow users of Spark to write tight firewall 
rules for their clusters. This is currently not possible because Spark uses 
random ports in many places, notably the communication between executors and 
drivers. The changes in this PR are based on top of @ash211's changes in #1107.

The list covered here may or may not be the complete set of ports needed for 
Spark to operate perfectly. However, as of the latest commit there are no known 
sources of random ports (except in tests). I have not documented a few of the 
more obscure configs.
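
For context, a sketch of what pinning these ports might look like from an 
application, using the property names introduced by the commits listed below; 
the port numbers are arbitrary examples, not recommendations from this PR:

```scala
import org.apache.spark.SparkConf

// Fixed ports make it possible to write tight firewall whitelist rules.
val conf = new SparkConf()
  .setAppName("firewalled-app")
  .set("spark.fileserver.port", "7002")       // HttpFileServer
  .set("spark.broadcast.port", "7003")        // HttpBroadcast
  .set("spark.replClassServer.port", "7004")  // spark-shell class server
  .set("spark.blockManager.port", "7005")     // block manager
```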

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark configure-ports

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1777


commit 1c0981a3f93c3bbb27425d76342c98c8c7d469cf
Author: Andrew Ash 
Date:   2014-06-17T05:09:59Z

Make port in HttpServer configurable

commit 49ee29b49e1275b48d18aef5182dba2937c11358
Author: Andrew Ash 
Date:   2014-06-17T05:31:10Z

SPARK-1174 Add port configuration for HttpFileServer

Uses spark.fileserver.port

commit f34115d59b83163d9542be09eb0c89a87ea89309
Author: Andrew Ash 
Date:   2014-06-17T05:31:52Z

SPARK-1176 Add port configuration for HttpBroadcast

Uses spark.broadcast.port

commit 17c79bbd66708d24c093be5f43e60c61f504d19d
Author: Andrew Ash 
Date:   2014-06-17T06:00:19Z

Add a configuration option for spark-shell's class server

spark.replClassServer.port

commit b80d2fd8e9b27a4d49561d31f100ffbb75393685
Author: Andrew Ash 
Date:   2014-06-17T06:40:32Z

Make Spark's block manager port configurable

spark.blockManager.port

commit c5a05684ace9332077dbf63848d08f39a8b91628
Author: Andrew Ash 
Date:   2014-06-17T08:10:21Z

Fix ConnectionManager to retry with increment

Fails when running master+worker+executor+shell on the same machine.  I 
think
the issue is that both the shell and the executor attempt to start a
ConnectionManager, which causes port conflicts.  Solution is to attempt and
increment on BindExceptions

commit cad16dacb1b7dbac1122b38c2b02fe35f1303a59
Author: Andrew Ash 
Date:   2014-06-17T16:45:59Z

Add fallover increment logic for HttpServer

commit 066dc7ac936cfbf268e6ca7adfa1388f5c4049d6
Author: Andrew Ash 
Date:   2014-06-17T17:08:49Z

Fix up HttpServer port increments

commit 5d84e0e9285aec53aa9c57d64313c0e513e41d30
Author: Andrew Ash 
Date:   2014-06-17T17:43:33Z

Document new port configuration options

- spark.fileserver.port
- spark.broadcast.port
- spark.replClassServer.port
- spark.blockManager.port

commit 9e4ad9628f7ff0f96a3881a1a5aaedcb8be6b80d
Author: Andrew Ash 
Date:   2014-06-17T18:14:08Z

Reformat for style checker

commit 24a4c327c7441e6af6b82dbddacd71c57384dc04
Author: Andrew Ash 
Date:   2014-06-30T00:25:44Z

Remove type on val to match surrounding style

commit 0347aef2b686d1bcc1b8f5c230ba8ff99cbd0691
Author: Andrew Ash 
Date:   2014-06-30T05:26:48Z

Unify port fallback logic to a single place

commit 7c5bdc44df32fb550f375de3518b628fbb360d20
Author: Andrew Ash 
Date:   2014-06-30T05:34:47Z

Fix style issue

commit 038a579a26ffcfc1c5540f28176f236779eef12a
Author: Andrew Ash 
Date:   2014-06-30T07:02:17Z

Trust the server start function to report the port the service started on

commit ec676f4f74b7a8402047fb849b9dca7172cd32f5
Author: Andrew Or 
Date:   2014-08-04T21:46:50Z

Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports

commit 73fbe892794a6f7e4a051401f356c89f4aa7f81f
Author: Andrew Or 
Date:   2014-08-04T22:39:01Z

Move start service logic to Utils

commit 6b550b0681ae8c0394685f6e929c4a14a48d10ec
Author: Andrew Or 
Date:   2014-08-04T23:56:17Z

Assorted fixes

commit ba322807d2e5ed1ce69dae449238a1df16a74ae9
Author: Andrew Or 
Date:   2014-08-05T00:00:31Z

Minor fixes

commit 1d7e40813e6ae98ee5cffb3e9e61807f3a01e941
Author: Andrew Or 
Date:   2014-08-05T00:40:27Z

Treat 0 ports specially + return correct ConnectionManager port

commit 470f38cf3c54941fbbcc358a358cc8a1fe2d6edd
Author: Andrew Or 
Date:   2014-08-05T00:43:24Z

Special case non-"Address already in use" exceptions

commit e111d080b4a7c0103c30b3a6e29c058d0ac4c3d0
Author: Andrew Or 
Date:   2014-08-05T01:46:11Z

Add names for UI services

commit 3f8e51bbb82669b43d7d52ece09ac957b35e994e
Author: Andrew Or 
Date:   2014-08-05T01:46:29Z

Correct erroneous docs...

commit 4d9e6f348cc408064173a91ecf9b509804eadf01
Author: Andrew Or 
Date:   2014-08-05T02:32:31Z

Fix super subtle bug

We were p

[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795778
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
   stepSize: Double,
   regParam: Double,
   miniBatchFraction: Double,
-  initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+  initialWeightsBA: Array[Byte],
+  regType: String,
+  intercept: Boolean): java.util.List[java.lang.Object] = {
+val SVMAlg = new SVMWithSGD()
+SVMAlg.setIntercept(intercept)
+SVMAlg.optimizer
--- End diff --

Also maybe prefer to do the pattern matching on `regType` before this, and 
do something like:

```
val updater = Option(regType) match { 
...
}
optimizer
  .setUpdater(updater)
  .setNumIterations ...
}
```
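
For concreteness, a sketch of that shape end to end, assuming a hypothetical 
`chooseUpdater(regType: String): Updater` helper; the wrapper method below is 
illustrative, not the PR's actual code:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.optimization.Updater

// Resolve the updater first, then configure the optimizer in one chain.
def configureSVM(numIterations: Int, stepSize: Double, regParam: Double,
                 miniBatchFraction: Double, regType: String, intercept: Boolean,
                 chooseUpdater: String => Updater): SVMWithSGD = {
  val alg = new SVMWithSGD()
  alg.setIntercept(intercept)
  alg.optimizer
    .setUpdater(chooseUpdater(regType))
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setStepSize(stepSize)
    .setMiniBatchFraction(miniBatchFraction)  // the setter flagged as missing elsewhere in this review
  alg
}
```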





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795743
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
   stepSize: Double,
   regParam: Double,
   miniBatchFraction: Double,
-  initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+  initialWeightsBA: Array[Byte],
+  regType: String,
+  intercept: Boolean): java.util.List[java.lang.Object] = {
+val SVMAlg = new SVMWithSGD()
+SVMAlg.setIntercept(intercept)
+SVMAlg.optimizer
+  .setNumIterations(numIterations)
+  .setRegParam(regParam)
+  .setStepSize(stepSize)
+if (regType == "l2") {
--- End diff --

Py4j will pass through Python `None` as null (at least it should, if I 
recall), so on the Java side you can wrap that in an `Option` instead of making 
it "none".

So you could do:
```
Option(regType) match {
  case Some("l1") => ...
  case Some("l2") => ...
  case None => ...
}
```
```





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795654
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -363,15 +373,27 @@ class PythonMLLibAPI extends Serializable {
   numIterations: Int,
   stepSize: Double,
   miniBatchFraction: Double,
-  initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+  initialWeightsBA: Array[Byte],
+  regParam: Double,
+  regType: String,
+  intercept: Boolean): java.util.List[java.lang.Object] = {
+val LogRegAlg = new LogisticRegressionWithSGD()
+LogRegAlg.setIntercept(intercept)
+LogRegAlg.optimizer
+  .setNumIterations(numIterations)
+  .setRegParam(regParam)
+  .setStepSize(stepSize)
--- End diff --

miniBatchFraction missing here too





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/1775#discussion_r15795634
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -341,16 +341,26 @@ class PythonMLLibAPI extends Serializable {
   stepSize: Double,
   regParam: Double,
   miniBatchFraction: Double,
-  initialWeightsBA: Array[Byte]): java.util.List[java.lang.Object] = {
+  initialWeightsBA: Array[Byte],
+  regType: String,
+  intercept: Boolean): java.util.List[java.lang.Object] = {
+val SVMAlg = new SVMWithSGD()
+SVMAlg.setIntercept(intercept)
+SVMAlg.optimizer
+  .setNumIterations(numIterations)
+  .setRegParam(regParam)
+  .setStepSize(stepSize)
--- End diff --

You forgot to set miniBatchFraction here





[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1309#issuecomment-51151466
  
QA tests have started for PR 1309. This patch DID NOT merge cleanly! 
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17922/consoleFull





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51151464
  
QA tests have started for PR 1481. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17921/consoleFull





[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1758#issuecomment-51151458
  
QA tests have started for PR 1758. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17920/consoleFull





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-08-04 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-51151346
  
It's too late to get this into 1.1, but I'll try to make it happen in 1.2. We'll 
use this in the Alpine implementation first.





[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1758#issuecomment-51151333
  
Jenkins, retest this please.





[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-04 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1481#issuecomment-51151290
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-2505][MLlib] Weighted Regularizer for G...

2014-08-04 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/1518#issuecomment-51151194
  
This looks promising. FWIW, I support decoupling regularization from the 
raw gradient update and believe it is a good way to go - it will allow various 
update/learning rate schemes (AdaGrad, normalized adaptive gradient, etc.) to be 
applied independently of the regularization.
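
As a rough illustration of the decoupling being advocated (the trait names 
below are invented for this sketch, not MLlib's API):

```scala
// The raw gradient step and the regularization shrinkage are separate pieces,
// so a different step-size scheme (e.g. an AdaGrad-style schedule) can be
// swapped in without touching the regularizer, and vice versa.
trait StepSchedule { def stepSize(iter: Int): Double }
trait Regularizer  { def shrink(weights: Array[Double], amount: Double): Unit }

def sgdStep(weights: Array[Double], gradient: Array[Double], iter: Int, regParam: Double,
            schedule: StepSchedule, regularizer: Regularizer): Unit = {
  val eta = schedule.stepSize(iter)
  var i = 0
  while (i < weights.length) { weights(i) -= eta * gradient(i); i += 1 }  // raw gradient update
  regularizer.shrink(weights, eta * regParam)                             // regularization applied separately
}
```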





[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1707#issuecomment-51150303
  
QA tests have started for PR 1707. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17919/consoleFull





[GitHub] spark pull request: SPARK-2711. Create a ShuffleMemoryManager to t...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1707#issuecomment-51150135
  
test this please





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-5114
  
QA results for PR 1775:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17918/consoleFull





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51149992
  
QA results for PR 1773:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17917/consoleFull





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51149151
  
@ueshin  

I have git clone'd to a completely new area, and I reverted my last commit. 
 

git clone https://github.com/javadba/spark.git strlen2 
cd strlen2
git checkout strlen
git revert 22eddbce6a201c8f5b5c31859ceb972e60657377
 mvn -DskipTests  -Pyarn -Phive -Phadoop-2.3 clean compile package
 mvn  -Pyarn -Phive -Phadoop-2.3 test 
-DwildcardSuites=org.apache.spark.sql.hive.execution.HiveQuerySuite,org.apache.spark.sql.SQLQuerySuite,org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite

I get precisely the same error:

HiveQuerySuite:
21:03:31.120 WARN org.apache.spark.util.Utils: Your hostname, mithril 
resolves to a loopback address: 127.0.1.1; using 10.0.0.33 instead (on 
interface eth0)
21:03:31.121 WARN org.apache.spark.util.Utils: Set SPARK_LOCAL_IP if 
you need to bind to another address
21:03:37.294 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to 
load native-hadoop library for your platform... using builtin-java classes 
where applicable
21:03:40.045 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. 
Setting to 20
21:03:49.464 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. 
Setting to 20
21:03:49.487 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version 
information not found in metastore. hive.metastore.schema.verification is not 
enabled so recording the schema version 0.12.0
21:03:57.157 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. 
Setting to 20
21:03:57.593 WARN com.jolbox.bonecp.BoneCPConfig: Max Connections < 1. 
Setting to 20
- single case
- double case
- case else null
- having no references
- boolean = number
- CREATE TABLE AS runs once
- between
- div
- division
*** RUN ABORTED ***
  java.lang.StackOverflowError:
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
  at 
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)

Now, let's revert the revert: 
git log 
commit db09cd132c2d7e995287eea54f3415726934138c 
Author: Stephen Boesch  
Date:   Mon Aug 4 20:54:24 2014 -0700

  Revert "Use Octet/Char_Len instead of Octet/Char_length due to 
apparent preexisting spark ParserCombinator bug."

This reverts commit 22eddbce6a201c8f5b5c31859ceb972e60657377.
git revert db09cd132c2d7e995287eea54f3415726934138c
mvn  -Pyarn -Phive -Phadoop-2.3 test 
-DwildcardSuites=org.apache.spark.sql.hive.execution.HiveQuerySuite,org.apache.spark.sql.SQLQuerySuite,org.apache.spark.sql.catalyst.expressions.ExpressionEvaluationSuite


Now those three test suites pass again (specifically, HiveQuerySuite did not 
fail).

And, just to be *extra* sure that we can toggle between pass/fail an 
arbitrary number of times: 

commit 602adedc9ca58d99957eb12bd91098ffe904604c
Author: Stephen Boesch 
Date:   Mon Aug 4 21:18:53 2014 -0700

Revert "Revert "Use Octet/Char_Len instead of Octet/Char_length due 
to apparent preexisting spark ParserCombinator bug.""

git revert 602adedc9ca58d99957eb12bd91098ffe904604c

And once again HiveQuerySuite fails with the same error.

So I have clearly established the following: 
the strlen branch on my fork fails with a StackOverflowError (SOF) if we roll 
back the commit that changes OCTET/CHAR_LENGTH -> OCTET/CHAR_LEN. 







[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...

2014-08-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1772





[GitHub] spark pull request: [SPARK-1779] Throw an exception if memory frac...

2014-08-04 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-51147893
  
Hi @andrewor14, the unit tests failed in Spark Streaming; we may need to retest this:

[info] - flume polling test multiple hosts *** FAILED ***
[info]   org.jboss.netty.channel.ChannelException: Failed to bind to: 
localhost/127.0.0.1:56218
[info]   at 
org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-08-04 Thread li-zhihui
Github user li-zhihui commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-51147815
  
Thanks @JoshRosen, sorry I missed the important operation (and I missed 
`FileUtil.chmod(targetFile.getAbsolutePath, "a+x")` too).

I added a new commit.





[GitHub] spark pull request: [SPARK-2379] Fix the bug that streaming's rece...

2014-08-04 Thread joyyoj
Github user joyyoj commented on the pull request:

https://github.com/apache/spark/pull/1694#issuecomment-51147129
  
@tdas Thanks for reminding me to add reportError; I didn't notice your 
reply before. It's a good idea to add it in this patch.





[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...

2014-08-04 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1648#discussion_r15794174
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -38,6 +39,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest 
with BeforeAndAfter {
 
   override def beforeAll() {
 TestHive.cacheTables = true
+TestHive.set(SQLConf.SHUFFLE_PARTITIONS, "2")
--- End diff --

Just a note: we need to remove this setting before merging it.





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-51146931
  
QA tests have started for PR 1775. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17918/consoleFull





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51146932
  
QA tests have started for PR 1773. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17917/consoleFull





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread miccagiann
Github user miccagiann commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-51146861
  
Found the error. It was a typo. Let's see what Jenkins is going to say...





[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1766#issuecomment-51146856
  
QA results for PR 1766:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17914/consoleFull





[GitHub] spark pull request: [Spark 2381] stop the streaming application if...

2014-08-04 Thread joyyoj
Github user joyyoj commented on the pull request:

https://github.com/apache/spark/pull/1693#issuecomment-51146846
  
Ok, I'll do it soon





[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1616#issuecomment-51146817
  
QA results for PR 1616:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17915/consoleFull





[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1715#issuecomment-51146747
  
Sure, that sounds good. The only thing is that if you use the first entry in 
`--jar`, I wouldn't automatically look for a main class in it (there's the 
part that checks the JAR manifest). Instead, make them use `--main-class` in 
that case. Otherwise it's a bit confusing that you give only JARs and it starts 
running some program.





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51146679
  
test this please





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51146519
  
QA results for PR 1773:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17916/consoleFull





[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1648#issuecomment-51146461
  
QA results for PR 1648:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17906/consoleFull





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread javadba
Github user javadba commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51146419
  
@ueshin I repeatably verified that simply changing "OCTET_LEN" to 
"OCTET_LENGTH" ended up causing the SOF.  By "repeatably" I mean:

  Set the 'constant'  val OCTET_LENGTH="OCTET_LENGTH"
  Observe the error
  Change to something like val OCTET_LENGTH="OCTET_LEN" or val 
OCTET_LENGTH="OCTET_LENG"
  Observe the error has gone away
  Rinse, cleanse, repeat

I have been able to demonstrate this multiple times. Now the regression 
tests have been run against the modified and reliable code.

Please re-run your tests in a fresh area. I will do the same, but I am 
hesitant to revert because we have positive test results now with 
the latest commit (as well as my results of the problem before the commit). 





[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...

2014-08-04 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1772#issuecomment-51146442
  
Actually I merged it into master, branch-1.0, and branch-1.1.





[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...

2014-08-04 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/1772#issuecomment-51146388
  
Merging this in master. Thanks.





[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-08-04 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/1313#issuecomment-51146059
  
finally, 





[GitHub] spark pull request: [SPARK-2678][Core] Added "--" to prevent spark...

2014-08-04 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1715#issuecomment-51145806
  
Thanks Matei, Patrick's last few comments have already convinced me to 
remove the "primary" notion from a user's perspective. And yes, 
`spark-internal` can be removed in this way; `spark-shell` and `pyspark-shell` 
can also be removed by checking `--class` in Spark scripts. Internally, to keep 
code that relies on `primaryResource` intact (one example is the cluster deploy mode 
in a standalone cluster), we can still pick the first entry in `--jar` for 
Java/Scala apps and `--main-file` for Python apps as primary.





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51145793
  
Hi @javadba, I tested `org.apache.spark.sql.SQLQuerySuite` and 
`org.apache.spark.sql.hive.execution.HiveQuerySuite` locally, and they worked 
fine even if I reverted the last commit 22eddbc.





[GitHub] spark pull request: [SPARK-2550][MLLIB][APACHE SPARK] Support regu...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1775#issuecomment-51145763
  
QA results for PR 1775:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17912/consoleFull





[GitHub] spark pull request: [SPARK-1022][Streaming] Add Kafka real unit te...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1751#issuecomment-51145617
  
QA results for PR 1751:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17911/consoleFull





[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1772#issuecomment-51145063
  
QA results for PR 1772:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17910/consoleFull





[GitHub] spark pull request: [SPARK-1779] add warning when memoryFraction i...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/714#issuecomment-51144509
  
QA results for PR 714:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17909/consoleFull





[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1309#issuecomment-51144330
  
QA results for PR 1309:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  class Accumulator[T](@transient initialValue: T, param: AccumulatorParam[T], name: Option[String])
  class AccumulableInfo (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17908/consoleFull





[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1313#issuecomment-51143952
  
QA results for PR 1313:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17907/consoleFull





[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1586#issuecomment-51143772
  
QA results for PR 1586:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  case class Length(child: Expression) extends UnaryExpression {
  case class OctetLength(child: Expression, encoding : Expression) extends UnaryExpression

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17902/consoleFull





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51143734
  
(Basically before, in the common case of one key per hash code, we 
allocated a whole new ArrayBuffer for each key, which is 2 Java objects and 
probably around 100 bytes.)
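
As an illustration of the kind of reuse being described (not the PR's actual 
code), assuming a single-threaded consumer:

```scala
import scala.collection.mutable.ArrayBuffer

// Reuse one buffer across keys instead of allocating a fresh ArrayBuffer per
// key; clear() keeps the backing array around, so the common one-key-per-hash
// case stops churning short-lived objects in the young generation.
val reusableBuffer = new ArrayBuffer[Int]

def sumValuesForKey(values: Iterator[Int]): Int = {
  reusableBuffer.clear()
  values.foreach(reusableBuffer += _)
  reusableBuffer.sum
}
```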





[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...

2014-08-04 Thread willb
Github user willb commented on the pull request:

https://github.com/apache/spark/pull/1750#issuecomment-51143752
  
@marmbrus I'll file a JIRA for that and am happy to put it at the front of 
my plate; sounds like a fun problem!





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51143730
  
QA tests have started for PR 1773. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17916/consoleFull





[GitHub] spark pull request: SPARK-2685. Update ExternalAppendOnlyMap to av...

2014-08-04 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1773#issuecomment-51143669
  
@andrewor14 alright, I've pushed a new commit that updates the comments. I 
also made it reuse ArrayBuffers, which should avoid quite a bit of young gen GC.





[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....

2014-08-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1766#issuecomment-51143483
  
QA tests have started for PR 1766. This patch merges cleanly. View 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17914/consoleFull





[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...

2014-08-04 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/1744#issuecomment-51143491
  
I don't see how these test failures might be related to the changes 
introduced in this PR. I see that the issue @JoshRosen called out earlier here 
has been [resolved](https://github.com/apache/spark/pull/1771), so that can't 
be it.

More confusingly, the report ends with this:
```
[info] All tests passed.
[info] Passed: Total 797, Failed 0, Errors 0, Passed 797, Ignored 7
[error] (streaming-flume/test:test) sbt.TestsFailedException: Tests 
unsuccessful
[error] Total time: 3157 s, completed Aug 4, 2014 7:30:52 PM
```

@rxin - Any pointers?




