[GitHub] spark pull request: [SPARK-5068][SQL]fix bug query data when path ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3891#issuecomment-68833142 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25090/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala(override the...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68833371 Finally, please file a JIRA and add it to the PR title. Thanks!
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3820#issuecomment-68834477 Thanks for doing this, I've been getting a ton of requests for this feature! Can you also add this to the sql programming guide?
[GitHub] spark pull request: SPARK-4843 [YARN] Squash ExecutorRunnableUtil ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3696#issuecomment-68834508 LGTM, so I'm going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: SPARK-4843 [YARN] Squash ExecutorRunnableUtil ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3696
[GitHub] spark pull request: [SPARK-5068][SQL]fix bug query data when path ...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3891#issuecomment-68832961 ok to test
[GitHub] spark pull request: [SPARK-5068][SQL]fix bug query data when path ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3891#issuecomment-68833139 [Test build #25090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25090/consoleFull) for PR 3891 at commit [`55636f3`](https://github.com/apache/spark/commit/55636f3764e93e997c15d2de46e7118af34989b2). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5068][SQL]fix bug query data when path ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3891#issuecomment-68833133 [Test build #25090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25090/consoleFull) for PR 3891 at commit [`55636f3`](https://github.com/apache/spark/commit/55636f3764e93e997c15d2de46e7118af34989b2). * This patch merges cleanly.
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala(override the...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68833152 Can you also add a regression test to [`CachedTableSuite`](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala)?
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala(override the...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68833170 ok to test
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala(override the...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3898#discussion_r22511002 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -427,6 +427,13 @@ private[hive] case class MetastoreRelation .getOrElse(sqlContext.defaultSizeInBytes)) } ) + override def sameResult(plan: LogicalPlan): Boolean = { + val new_plan = plan.asInstanceOf[MetastoreRelation]; --- End diff -- What about when it's not a MetastoreRelation? How about:

```scala
plan match {
  case mr: MetastoreRelation =>
    mr.databaseName == databaseName && mr.tableName == tableName
  case _ => false
}
```
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala(override the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68833443 [Test build #25091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25091/consoleFull) for PR 3898 at commit [`8d910aa`](https://github.com/apache/spark/commit/8d910aa45372c0cf0a3b2a4704b0e1a480bbeb1c). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4226: Add support for subqueries in wher...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3888#issuecomment-68833871 This is simpler, but it has several disadvantages compared to the other approach:
- The InSet is collected to the driver and thus could cause OOMs when large
- I don't think it handles correlated subqueries
- The `execute()` involves eager evaluation and breaks RDD lineage
For these reasons I think we should stick to extending the approach taken by the other PR.
[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3820#discussion_r22511350 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -141,6 +142,12 @@ private[sql] trait SQLConf { getConf(PARQUET_BINARY_AS_STRING, false).toBoolean /** + * When set to true, we always treat INT96Values in Parquet files as timestamp. + */ + private[spark] def isParquetINT96AsTimestamp: Boolean = +getConf(PARQUET_INT96_AS_TIMESTAMP, false).toBoolean --- End diff -- We don't really use INT96 for anything else (and I don't think other systems do either?) so maybe this should be true by default?
[GitHub] spark pull request: [SPARK-2165][YARN]add support for setting maxA...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3878#issuecomment-68681027 My opinion is that this is more of a general app property than an AM property, so I'd go for `spark.yarn.maxAppAttempts` instead. That also avoids confusion with the fact that elsewhere we've claimed the `spark.yarn.am.*` namespace for the yarn-client AM.
[GitHub] spark pull request: [Minor][Mllib] Simplify loss function
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/3899 [Minor][Mllib] Simplify loss function This is a minor PR where I think we can simply take the minus of `margin` here, instead of subtracting `margin`. Mathematically they are equal, but the modified equation is the common form of the logistic loss function and so is more readable. It also computes more accurate values, as some quick tests show. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 logit_func Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3899.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3899 commit 2bc5712a6fe374c0821233edab719b58d31ab01b Author: Liang-Chi Hsieh vii...@gmail.com Date: 2015-01-05T09:13:07Z Simplify loss function.
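The algebra behind this simplification can be checked directly: log(1 + e^margin) - margin = log1p(e^(-margin)), and the right-hand form also behaves better numerically for large margins. A quick standalone sketch (illustrative only, not MLlib's actual code; the function names here are mine):

```scala
// Two algebraically equal forms of the logistic loss term.
// oldForm subtracts the margin after the log; newForm takes the minus of
// the margin inside, which is the common form of the logistic loss.
def oldForm(margin: Double): Double = math.log1p(math.exp(margin)) - margin
def newForm(margin: Double): Double = math.log1p(math.exp(-margin))

// They agree for moderate margins...
assert(math.abs(oldForm(2.0) - newForm(2.0)) < 1e-12)

// ...but at margin = 100 the subtraction in oldForm cancels away the
// tiny true value (about 3.7e-44), leaving only rounding noise,
// while newForm preserves it.
assert(math.abs(oldForm(100.0)) < 1e-13)
assert(newForm(100.0) > 0.0 && newForm(100.0) < 1e-40)
```

The equality follows from log(1 + e^m) - m = log((1 + e^m) / e^m) = log(e^(-m) + 1), which is why the two forms can only differ by floating-point error.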
[GitHub] spark pull request: [Minor][Mllib] Simplify loss function
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3899#issuecomment-68686539 [Test build #25053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25053/consoleFull) for PR 3899 at commit [`2bc5712`](https://github.com/apache/spark/commit/2bc5712a6fe374c0821233edab719b58d31ab01b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4631] unit test for MQTT
Github user Bilna commented on the pull request: https://github.com/apache/spark/pull/3844#issuecomment-68686800 @tdas, Thanks
[GitHub] spark pull request: [Minor][Mllib] Simplify loss function
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3899#issuecomment-68687443 +1 looks like a small good improvement.
[GitHub] spark pull request: [SPARK-4631] unit test for MQTT
Github user Bilna commented on a diff in the pull request: https://github.com/apache/spark/pull/3844#discussion_r22453522

--- Diff: external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala ---
@@ -17,31 +17,111 @@
 package org.apache.spark.streaming.mqtt

-import org.scalatest.FunSuite
+import java.net.{URI, ServerSocket}

-import org.apache.spark.streaming.{Seconds, StreamingContext}
+import org.apache.activemq.broker.{TransportConnector, BrokerService}
+import org.apache.spark.util.Utils
+import org.scalatest.{BeforeAndAfter, FunSuite}
+import org.scalatest.concurrent.Eventually
+import scala.concurrent.duration._
+import org.apache.spark.streaming.{Milliseconds, StreamingContext}
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.dstream.ReceiverInputDStream
+import org.eclipse.paho.client.mqttv3._
+import org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence

-class MQTTStreamSuite extends FunSuite {
-
-  val batchDuration = Seconds(1)
+class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter {
+  private val batchDuration = Milliseconds(500)
   private val master: String = "local[2]"
   private val framework: String = this.getClass.getSimpleName
+  private val freePort = findFreePort()
+  private val brokerUri = "//localhost:" + freePort
+  private val topic = "def"
+  private var ssc: StreamingContext = _
+  private val persistenceDir = Utils.createTempDir()
+  private var broker: BrokerService = _
+  private var connector: TransportConnector = _

+  before {
+    ssc = new StreamingContext(master, framework, batchDuration)
+    setupMQTT()
+  }

-  test("mqtt input stream") {
-    val ssc = new StreamingContext(master, framework, batchDuration)
-    val brokerUrl = "abc"
-    val topic = "def"
-
-    // tests the API, does not actually test data receiving
-    val test1: ReceiverInputDStream[String] = MQTTUtils.createStream(ssc, brokerUrl, topic)
-    val test2: ReceiverInputDStream[String] =
-      MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_AND_DISK_SER_2)

+  after {
+    if (ssc != null) {
+      ssc.stop()
+      ssc = null
+    }
+    Utils.deleteRecursively(persistenceDir)
+    tearDownMQTT()
+  }

-    // TODO: Actually test receiving data
+  test("mqtt input stream") {
+    val sendMessage = "MQTT demo for spark streaming"
+    val receiveStream: ReceiverInputDStream[String] =
+      MQTTUtils.createStream(ssc, "tcp:" + brokerUri, topic, StorageLevel.MEMORY_ONLY)
+    var receiveMessage: List[String] = List()
+    receiveStream.foreachRDD { rdd =>
+      if (rdd.collect.length > 0) {
+        receiveMessage = receiveMessage ::: List(rdd.first)
+        receiveMessage
+      }
+    }
+    ssc.start()
+    publishData(sendMessage)
+    eventually(timeout(1 milliseconds), interval(100 milliseconds)) {
+      assert(sendMessage.equals(receiveMessage(0)))
+    }
     ssc.stop()
   }
+
+  private def setupMQTT() {
+    broker = new BrokerService()
+    connector = new TransportConnector()
+    connector.setName("mqtt")
+    connector.setUri(new URI("mqtt:" + brokerUri))
+    broker.addConnector(connector)
+    broker.start()
+  }
+
+  private def tearDownMQTT() {
+    if (broker != null) {
+      broker.stop()
+      broker = null
+    }
+    if (connector != null) {
+      connector.stop()
+      connector = null
+    }
+  }
+
+  private def findFreePort(): Int = {
+    Utils.startServiceOnPort(23456, (trialPort: Int) => {
+      val socket = new ServerSocket(trialPort)
+      socket.close()
+      (null, trialPort)
+    })._2
+  }
+
+  def publishData(data: String): Unit = {
+    var client: MqttClient = null
+    try {
+      val persistence: MqttClientPersistence = new MqttDefaultFilePersistence(persistenceDir.getAbsolutePath)
+      client = new MqttClient("tcp:" + brokerUri, MqttClient.generateClientId(), persistence)
+      client.connect()
+      if (client.isConnected) {
+        val msgTopic: MqttTopic = client.getTopic(topic)
+        val message: MqttMessage = new MqttMessage(data.getBytes("utf-8"))
+        message.setQos(1)
+        message.setRetained(true)
+        for (i <- 0 to 100)
+          msgTopic.publish(message)
--- End diff --

Can you explain what the correction is here? Just to understand what went wrong.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68688402 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25052/ Test FAILed.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68688395 [Test build #25052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25052/consoleFull) for PR 3801 at commit [`b4442c3`](https://github.com/apache/spark/commit/b4442c3538ad462a0a7d39f4b2049ed230e92665). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Minor][Mllib] Simplify loss function
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3899#issuecomment-68693770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25053/ Test PASSed.
[GitHub] spark pull request: [SPARK-2165][YARN]add support for setting maxA...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3878#issuecomment-68693775 @sryza Thanks. That makes sense. @tgravescs What do you think?
[GitHub] spark pull request: [Minor][Mllib] Simplify loss function
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3899#issuecomment-68693757 [Test build #25053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25053/consoleFull) for PR 3899 at commit [`2bc5712`](https://github.com/apache/spark/commit/2bc5712a6fe374c0821233edab719b58d31ab01b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3868#issuecomment-68689798 I am happy to review the code if you take a pass on implementing (2). I can jump in if things get too hairy.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-68690112 [Test build #25054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25054/consoleFull) for PR 3222 at commit [`38efd6d`](https://github.com/apache/spark/commit/38efd6d6df6b53ec1f16490b37f3fb1cc0cce053). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4504] Minor bug fixes in bin/run-exampl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-68692219 [Test build #25055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25055/consoleFull) for PR 3069 at commit [`24905b7`](https://github.com/apache/spark/commit/24905b7793fc1fc8802b370c63caf81671475c31). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4908][SQL] Prevent multiple concurrent ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/3834#issuecomment-68692301 Just for reference, the root cause behind this issue is discussed in [SPARK-4908] [1]. [1]: https://issues.apache.org/jira/browse/SPARK-4908?focusedCommentId=14264469&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14264469
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala
Github user seayi commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68683132 Override the sameResult method to only compare database name and table name.
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala
GitHub user seayi opened a pull request: https://github.com/apache/spark/pull/3898 Update HiveMetastoreCatalog.scala Modify the sameResult method to only compare database name and table name. Previously, after: cache table t1; select count(*) from t1; would read data from memory, but the query below would not; instead it read from HDFS: select count(*) from t1 t; Cached data is keyed by the logical plan and compared with sameResult, so when the table has an alias, the same table's logical plan is not the same logical plan. So the sameResult method is modified to only compare database name and table name. You can merge this pull request into a Git repository by running: $ git pull https://github.com/seayi/spark branch-1.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3898.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3898 commit 8d910aa45372c0cf0a3b2a4704b0e1a480bbeb1c Author: seayi 405078...@qq.com Date: 2015-01-05T09:02:51Z Update HiveMetastoreCatalog.scala: modify the sameResult method to only compare database name and table name, because a cached table queried through an alias produced a different logical plan and missed the cache.
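The cache-miss described above can be sketched with a simplified stand-in (these are not Spark's real classes; `Plan` and `Relation` and their fields are invented here for illustration). The cache is keyed by the logical plan and looked up via `sameResult`, so if the alias participates in the comparison, `t1` and `t1 t` never match:

```scala
// Simplified stand-in for a logical plan node (illustrative only).
abstract class Plan {
  def sameResult(other: Plan): Boolean = this == other
}

// The alias is a field of the plan, so default case-class equality treats
// "t1" and "t1 t" as different plans: a cached-table lookup misses.
case class Relation(database: String, table: String, alias: Option[String]) extends Plan {
  // The proposed fix: compare only database name and table name.
  override def sameResult(other: Plan): Boolean = other match {
    case r: Relation => r.database == database && r.table == table
    case _           => false
  }
}

val cached  = Relation("default", "t1", None)       // cache table t1
val aliased = Relation("default", "t1", Some("t"))  // select ... from t1 t

assert(cached != aliased)           // default equality: cache miss
assert(cached.sameResult(aliased))  // alias-insensitive compare: cache hit
assert(!cached.sameResult(Relation("default", "t2", None)))
```

This is the same shape as the pattern-match suggested in the review: matching on the concrete relation type also keeps the comparison safe when the other plan is not a relation at all.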
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
Github user cleaton commented on the pull request: https://github.com/apache/spark/pull/3868#issuecomment-68678753 @tdas Thank you for the input. Yes, the main purpose of this patch is to make ReceiverTracker graceful by waiting for ssc.sparkContext.runJob(tempRDD, ssc.sparkContext.clean(startReceiver)) to terminate and all receivers to de-register (possibly redundant?). I borrowed the approach used in JobGenerator, and you are right, I forgot to keep timeWhenStopStarted global. The second approach sounds good to me. It would make it easier to follow the shutdown sequence if it is consolidated in one place. As for a unit test, my idea is to create a dummy receiver implementation that blocks on shutdown while still producing a fixed number of records. Do you think you or someone else working more closely with Spark Streaming should take over this patch? It seems it is about deciding which approach is best suited for Spark in the long run. I can still try to provide a unit test for this, though.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-68680860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25051/ Test PASSed.
[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3897#issuecomment-68680857 [Test build #25051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25051/consoleFull) for PR 3897 at commit [`ed906b7`](https://github.com/apache/spark/commit/ed906b7ae16c08521f8e3b4aa3da47fd293abff3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val executorPath = new File(executorSparkHome, s"/bin/spark-class $executorBackendName")` * `command.setValue(s"cd $`
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68683294 [Test build #25052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25052/consoleFull) for PR 3801 at commit [`b4442c3`](https://github.com/apache/spark/commit/b4442c3538ad462a0a7d39f4b2049ed230e92665). * This patch merges cleanly.
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68683258 Can one of the admins verify this patch?
[GitHub] spark pull request: Update HiveMetastoreCatalog.scala
Github user seayi commented on the pull request: https://github.com/apache/spark/pull/3898#issuecomment-68683408 I tested with a Hive table; after modifying the sameResult method it works correctly.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68684233 Pushed some commits addressing most of the feedback, but I'm still struggling to remove that last `Thread.sleep(1000)`. I think that the problem here is that the writing of the checkpoint is asynchronous and without the sleep, we wind up in a state where batch 3 has started processing but has not finished, and the StreamingContext shuts down before a snapshot including batch 3's file info is written. I plan to dig into this tomorrow to see whether this is actually the case.
[GitHub] spark pull request: [SPARK-4991][CORE] Worker should reconnect to ...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/3825#issuecomment-68684212 @JoshRosen, if we want to use the supervision mechanism, we need to add another actor level as parent of the current Master actor. I don't know whether that is suitable.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-68696993 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25054/ Test PASSed.
[GitHub] spark pull request: [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3222#issuecomment-68696985 [Test build #25054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25054/consoleFull) for PR 3222 at commit [`38efd6d`](https://github.com/apache/spark/commit/38efd6d6df6b53ec1f16490b37f3fb1cc0cce053). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AdaGradUpdater(` * `class DBN(val stackedRBM: StackedRBM)` * `class MLP(` * `class MomentumUpdater(val momentum: Double) extends Updater ` * `class RBM(` * `class StackedRBM(val innerRBMs: Array[RBM])` * `case class MinstItem(label: Int, data: Array[Int]) ` * `class MinstDatasetReader(labelsFile: String, imagesFile: String)`
[GitHub] spark pull request: [SPARK-5052] Add common/base classes to fix gu...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3874#issuecomment-68698508 This one always confuses me, but here's what I think I know: The compiled `Optional` in Spark won't have the correct (meaning, matching the Google Guava `Optional`) signatures on its methods since other Guava classes are shaded. It's there for the Spark API to compile against the Guava class in the package that the user app expects. Apps that use the API methods involving `Optional` will be bundling Guava. Spark uses Guava 14, although in theory you can use any version... that still has the very few methods on `Optional` that Spark actually calls, I suppose, because Spark will be using the user app's copy of `Optional` at runtime. You say you tried it and got `java.lang.NoClassDefFoundError: org/apache/spark/Partition` though. That's weird. `Optional` will be in a different classloader (?) but shouldn't refer to Spark classes. Right? If there's a problem it's somewhere in there, since that's where my understanding of how this works seems not to match your experience. Or else, there's something else subtly not-quite-right about how the user app is run here.
[GitHub] spark pull request: [SPARK-4504] Minor bug fixes in bin/run-exampl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-68698622 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25055/ Test PASSed.
[GitHub] spark pull request: [SPARK-4504] Minor bug fixes in bin/run-exampl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-68698616 [Test build #25055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25055/consoleFull) for PR 3069 at commit [`24905b7`](https://github.com/apache/spark/commit/24905b7793fc1fc8802b370c63caf81671475c31). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5036][Graphx]Better support sending par...
Github user shijinkui commented on the pull request: https://github.com/apache/spark/pull/3866#issuecomment-68710366 @ankurdave @rxin
[GitHub] spark pull request: [SPARK-5073] spark.storage.memoryMapThreshold ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3900#issuecomment-68711124 [Test build #25056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25056/consoleFull) for PR 3900 at commit [`fcce2e5`](https://github.com/apache/spark/commit/fcce2e593754fd8bfe389df0c4ddf79cceebdd97). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5073] spark.storage.memoryMapThreshold ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3900#issuecomment-68711139 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25056/ Test PASSed.
[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...
Github user preaudc commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-68702554 Yes, my bad, {{targetDir}} is indeed already a {{File}}. @JoshRosen , how could I fix this, should I create a new pull request, or can this one be reopened?
[GitHub] spark pull request: [SPARK-5073] spark.storage.memoryMapThreshold ...
GitHub user Lewuathe opened a pull request: https://github.com/apache/spark/pull/3900 [SPARK-5073] spark.storage.memoryMapThreshold have two default value Because the page size on most OSes is about 4KB, the default value of spark.storage.memoryMapThreshold is unified to 2 * 4096. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Lewuathe/spark integrate-memoryMapThreshold Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3900.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3900 commit fcce2e593754fd8bfe389df0c4ddf79cceebdd97 Author: lewuathe lewua...@me.com Date: 2015-01-05T12:47:28Z [SPARK-5073] spark.storage.memoryMapThreshold have two default value
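The pattern behind this fix can be sketched in a few lines (illustrative code, not Spark's actual implementation): instead of hard-coding the threshold's default in two places, define the constant once and have every lookup fall back to it.

```scala
// Sketch of unifying a duplicated config default (illustrative names).
object StorageDefaults {
  // ~2 OS pages: most platforms use 4 KB pages.
  val MemoryMapThreshold: Long = 2 * 4096
}

// Every caller resolves the setting through one function, so there is a
// single source of truth for the default value.
def getMemoryMapThreshold(conf: Map[String, String]): Long =
  conf.get("spark.storage.memoryMapThreshold")
    .map(_.toLong)
    .getOrElse(StorageDefaults.MemoryMapThreshold)

val fromDefault = getMemoryMapThreshold(Map.empty)
val fromConf = getMemoryMapThreshold(Map("spark.storage.memoryMapThreshold" -> "4096"))
```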
[GitHub] spark pull request: [SPARK-4631] unit test for MQTT
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3844#discussion_r22459212 --- Diff: external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala ---

```diff
@@ -17,31 +17,111 @@ package org.apache.spark.streaming.mqtt

-import org.scalatest.FunSuite
+import java.net.{URI, ServerSocket}

-import org.apache.spark.streaming.{Seconds, StreamingContext}
+import org.apache.activemq.broker.{TransportConnector, BrokerService}
+import org.apache.spark.util.Utils
+import org.scalatest.{BeforeAndAfter, FunSuite}
+import org.scalatest.concurrent.Eventually
+import scala.concurrent.duration._
+import org.apache.spark.streaming.{Milliseconds, StreamingContext}
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.dstream.ReceiverInputDStream
+import org.eclipse.paho.client.mqttv3._
+import org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence

-class MQTTStreamSuite extends FunSuite {
-
-  val batchDuration = Seconds(1)
+class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter {
+  private val batchDuration = Milliseconds(500)
   private val master: String = "local[2]"
   private val framework: String = this.getClass.getSimpleName
+  private val freePort = findFreePort()
+  private val brokerUri = "//localhost:" + freePort
+  private val topic = "def"
+  private var ssc: StreamingContext = _
+  private val persistenceDir = Utils.createTempDir()
+  private var broker: BrokerService = _
+  private var connector: TransportConnector = _

-  test("mqtt input stream") {
-    val ssc = new StreamingContext(master, framework, batchDuration)
-    val brokerUrl = "abc"
-    val topic = "def"
+  before {
+    ssc = new StreamingContext(master, framework, batchDuration)
+    setupMQTT()
+  }

-    // tests the API, does not actually test data receiving
-    val test1: ReceiverInputDStream[String] = MQTTUtils.createStream(ssc, brokerUrl, topic)
-    val test2: ReceiverInputDStream[String] =
-      MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_AND_DISK_SER_2)
+  after {
+    if (ssc != null) {
+      ssc.stop()
+      ssc = null
+    }
+    Utils.deleteRecursively(persistenceDir)
+    tearDownMQTT()
+  }

-    // TODO: Actually test receiving data
+  test("mqtt input stream") {
+    val sendMessage = "MQTT demo for spark streaming"
+    val receiveStream: ReceiverInputDStream[String] =
+      MQTTUtils.createStream(ssc, "tcp:" + brokerUri, topic, StorageLevel.MEMORY_ONLY)
+    var receiveMessage: List[String] = List()
+    receiveStream.foreachRDD { rdd =>
+      if (rdd.collect.length > 0) {
+        receiveMessage = receiveMessage ::: List(rdd.first)
+        receiveMessage
+      }
+    }
+    ssc.start()
+    publishData(sendMessage)
+    eventually(timeout(10000 milliseconds), interval(100 milliseconds)) {
+      assert(sendMessage.equals(receiveMessage(0)))
+    }
     ssc.stop()
   }
+
+  private def setupMQTT() {
+    broker = new BrokerService()
+    connector = new TransportConnector()
+    connector.setName("mqtt")
+    connector.setUri(new URI("mqtt:" + brokerUri))
+    broker.addConnector(connector)
+    broker.start()
+  }
+
+  private def tearDownMQTT() {
+    if (broker != null) {
+      broker.stop()
+      broker = null
+    }
+    if (connector != null) {
+      connector.stop()
+      connector = null
+    }
+  }
+
+  private def findFreePort(): Int = {
+    Utils.startServiceOnPort(23456, (trialPort: Int) => {
+      val socket = new ServerSocket(trialPort)
+      socket.close()
+      (null, trialPort)
+    })._2
+  }
+
+  def publishData(data: String): Unit = {
+    var client: MqttClient = null
+    try {
+      val persistence: MqttClientPersistence =
+        new MqttDefaultFilePersistence(persistenceDir.getAbsolutePath)
+      client = new MqttClient("tcp:" + brokerUri, MqttClient.generateClientId(), persistence)
+      client.connect()
+      if (client.isConnected) {
+        val msgTopic: MqttTopic = client.getTopic(topic)
+        val message: MqttMessage = new MqttMessage(data.getBytes("utf-8"))
+        message.setQos(1)
+        message.setRetained(true)
+        for (i <- 0 to 100)
+          msgTopic.publish(message)
```

--- End diff --

```
for (...) {
  msgTopic.publish(message)
}
```

Such a code block should either be on one line or within braces.
[GitHub] spark pull request: [SPARK-5073] spark.storage.memoryMapThreshold ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3900#issuecomment-68703864 [Test build #25056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25056/consoleFull) for PR 3900 at commit [`fcce2e5`](https://github.com/apache/spark/commit/fcce2e593754fd8bfe389df0c4ddf79cceebdd97). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4504][Examples] fix run-example failure...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3377#issuecomment-68706124 There was already a PR for this: https://github.com/apache/spark/pull/3069 But it seems to be fixing a different root cause, namely that the assemblies generated by SBT and Maven are named differently. I am not sure whether this one also addresses that problem?
[GitHub] spark pull request: [SPARK-4504] Minor bug fixes in bin/run-exampl...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3069#issuecomment-68706026 I think this has been superseded by the discussion in https://github.com/apache/spark/pull/3377 ?
[GitHub] spark pull request: [STREAMING] SPARK-4986 Wait for receivers to d...
Github user cleaton commented on the pull request: https://github.com/apache/spark/pull/3868#issuecomment-68707568 OK sounds great. :+1: I can prepare an implementation of (2). Bit busy now, but I think I can have something to review in a week. Any specific unit test you can suggest for me to take a look at? The existing receiver tests? Thanks :)
[GitHub] spark pull request: [SPARK-4644][Core] Implement skewed join
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3505#discussion_r22460587 --- Diff: core/src/main/scala/org/apache/spark/rdd/SkewedJoinRDD.scala ---

```diff
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rdd
+
+import java.io.{ObjectOutputStream, IOException}
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+import org.apache.spark.serializer.Serializer
+import org.apache.spark.shuffle.ShuffleHandle
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection._
+
+private[spark] sealed trait JoinType[K, L, R, PAIR <: Product2[_, _]] extends Serializable {
+
+  def flatten(i: Iterator[(K, (Iterable[Chunk[L]], Iterable[Chunk[R]]))]): Iterator[(K, PAIR)]
+
+}
+
+private[spark] object JoinType {
+
+  def inner[K, L, R] = new JoinType[K, L, R, (L, R)] {
+
+    override def flatten(i: Iterator[(K, (Iterable[Chunk[L]], Iterable[Chunk[R]]))]) =
+      i flatMap {
+        case (key, pair) => {
+          if (pair._1.size > pair._2.size) {
+            yieldPair(pair._1, pair._2, key, (p1: L, p2: R) => (p1, p2))
+          } else {
+            yieldPair(pair._2, pair._1, key, (p2: R, p1: L) => (p1, p2))
+          }
+        }
+      }
+  }
+
+  def leftOuter[K, L, R] = new JoinType[K, L, R, (L, Option[R])] {
+
+    override def flatten(i: Iterator[(K, (Iterable[Chunk[L]], Iterable[Chunk[R]]))]) =
+      i flatMap {
+        case (key, pair) => {
+          if (pair._2.size == 0) {
+            for (chunk <- pair._1.iterator;
+                 v <- chunk
+            ) yield (key, (v, None)): (K, (L, Option[R]))
+          }
+          else if (pair._1.size > pair._2.size) {
+            yieldPair(pair._1, pair._2, key, (p1: L, p2: R) => (p1, Some(p2)))
+          } else {
+            yieldPair(pair._2, pair._1, key, (p2: R, p1: L) => (p1, Some(p2)))
+          }
+        }
+      }
+  }
+
+  def rightOuter[K, L, R] = new JoinType[K, L, R, (Option[L], R)] {
+
+    override def flatten(i: Iterator[(K, (Iterable[Chunk[L]], Iterable[Chunk[R]]))]) =
+      i flatMap {
+        case (key, pair) => {
+          if (pair._1.size == 0) {
+            for (chunk <- pair._2.iterator;
+                 v <- chunk
+            ) yield (key, (None, v)): (K, (Option[L], R))
+          }
+          else if (pair._1.size > pair._2.size) {
+            yieldPair(pair._1, pair._2, key, (p1: L, p2: R) => (Some(p1), p2))
+          } else {
+            yieldPair(pair._2, pair._1, key, (p2: R, p1: L) => (Some(p1), p2))
+          }
+        }
+      }
+  }
+
+  def fullOuter[K, L, R] = new JoinType[K, L, R, (Option[L], Option[R])] {
+
+    override def flatten(i: Iterator[(K, (Iterable[Chunk[L]], Iterable[Chunk[R]]))]) =
+      i flatMap {
+        case (key, pair) => {
+          if (pair._1.size == 0) {
+            for (chunk <- pair._2.iterator;
+                 v <- chunk
+            ) yield (key, (None, Some(v))): (K, (Option[L], Option[R]))
+          }
+          else if (pair._2.size == 0) {
+            for (chunk <- pair._1.iterator;
+                 v <- chunk
+            ) yield (key, (Some(v), None)): (K, (Option[L], Option[R]))
+          }
+          else if (pair._1.size > pair._2.size) {
+            yieldPair(pair._1, pair._2, key, (p1: L, p2: R) => (Some(p1), Some(p2)))
+          } else {
+            yieldPair(pair._2, pair._1, key, (p2: R, p1: L) => (Some(p1), Some(p2)))
+          }
+        }
+      }
+  }
+
+  private def yieldPair[K, OUT, IN, PAIR <: Product2[_, _]](
+      outer: Iterable[Chunk[OUT]], inner: Iterable[Chunk[IN]], key: K, toPair: (OUT, IN) => PAIR) =
+    for (
+      outerChunk <- outer.iterator;
+      innerChunk <- inner.iterator;
```
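The per-key flatten in the diff above can be illustrated with a self-contained sketch (plain `Seq` in place of the PR's `Chunk`, illustrative names): for each key, every left value is paired with every right value, and the side that drives the outer loop is chosen by size, as `JoinType.inner` does.

```scala
// Self-contained sketch of an inner-join flatten over pre-grouped values
// (illustrative; the real PR streams Chunk-ed, spillable collections).
def innerJoinFlatten[K, L, R](
    grouped: Iterator[(K, (Seq[L], Seq[R]))]): Iterator[(K, (L, R))] =
  grouped.flatMap { case (key, (lefts, rights)) =>
    // Choose the larger side as the outer loop, mirroring the size check
    // in the diff's JoinType.inner.
    if (lefts.size > rights.size)
      for (l <- lefts.iterator; r <- rights) yield (key, (l, r))
    else
      for (r <- rights.iterator; l <- lefts) yield (key, (l, r))
  }

val joined = innerJoinFlatten(Iterator(
  ("a", (Seq(1, 2), Seq("x"))),
  ("b", (Seq(3), Seq.empty[String]))  // no right values: key dropped, as in an inner join
)).toList
```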
[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-68733624 @andrewor14 What to do now? @vanzin @sryza @tgravescs Someone has any better idea?
[GitHub] spark pull request: [SPARK-5006][Deploy]spark.port.maxRetries does...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3841#issuecomment-68734238 @andrewor14 Could you take a look?
[GitHub] spark pull request: [SPARK-5057]Add more details in log when using...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3875#issuecomment-68734038 @JoshRosen Then is it ok?
[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-68741897 @dbtsai Just back from vacation too :) I used my old implementation of the matrix form of back-propagation and made sure that it properly uses the matrix stride in Breeze. Also, I optimized rolling the parameters into a vector, combined with an in-place update of the cumulative sum.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3638#discussion_r22491893 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

```diff
@@ -256,15 +256,21 @@ private[spark] class TaskSchedulerImpl(
         val execId = shuffledOffers(i).executorId
         val host = shuffledOffers(i).host
         if (availableCpus(i) >= CPUS_PER_TASK) {
-          for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
-            tasks(i) += task
-            val tid = task.taskId
-            taskIdToTaskSetId(tid) = taskSet.taskSet.id
-            taskIdToExecutorId(tid) = execId
-            executorsByHost(host) += execId
-            availableCpus(i) -= CPUS_PER_TASK
-            assert(availableCpus(i) >= 0)
-            launchedTask = true
+          try {
+            for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
+              tasks(i) += task
+              val tid = task.taskId
+              taskIdToTaskSetId(tid) = taskSet.taskSet.id
+              taskIdToExecutorId(tid) = execId
+              executorsByHost(host) += execId
+              availableCpus(i) -= CPUS_PER_TASK
+              assert(availableCpus(i) >= 0)
+              launchedTask = true
+            }
+          } catch {
+            case e: TaskNotSerializableException => {
```

--- End diff --

What about scenarios where you have multiple concurrent jobs (e.g. in an environment like Databricks Cloud, Spark Jobserver, etc)? I agree that the job associated with this task set is doomed, but other jobs should still be able to make progress and those jobs' task sets might still be schedulable.
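The scheduling concern in this review can be sketched without Spark's types (all names below are illustrative): catching a per-task-set serialization failure should abort only that task set, while other concurrently running jobs' task sets keep receiving offers.

```scala
// Illustrative stand-in for Spark's TaskNotSerializableException.
class NotSerializableTaskSet extends RuntimeException

// Offer resources to each task set in turn; a failure in one set aborts
// only that set, and scheduling continues for the others.
def scheduleAll(taskSets: Seq[(String, () => String)]): (List[String], List[String]) = {
  var launched = List.empty[String]
  var aborted = List.empty[String]
  taskSets.foreach { case (name, offer) =>
    try launched ::= offer()
    catch { case _: NotSerializableTaskSet => aborted ::= name } // abort just this set
  }
  (launched.reverse, aborted.reverse)
}

val (launched, aborted) = scheduleAll(Seq(
  ("jobA", () => "taskA"),
  ("jobB", () => throw new NotSerializableTaskSet), // doomed task set
  ("jobC", () => "taskC")                           // still makes progress
))
```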
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-68787775 @yu-iskw @rnowling, I asked @freeman-lab to make one pass on this PR. Let's ping him :)
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-68787801 [Test build #25064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25064/consoleFull) for PR 3638 at commit [`b2a430d`](https://github.com/apache/spark/commit/b2a430d9f3fc8dac3c2a20aab6bd07bae8f17691). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68787824 This looks like a legitimate test failure. The AMPLab webserver is having some issues today, so here's a different link to reach the same test result: https://hadrian.ist.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25060/testReport/
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user ksakellis commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68788875 Yes it is. Not sure why I changed the # of cores between the two commits in the unit test - weird. Anyway, it has been fixed.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68789356 [Test build #25065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25065/consoleFull) for PR 3711 at commit [`776d743`](https://github.com/apache/spark/commit/776d743c16cf3506b5c26983836c90066c365ee7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3711#discussion_r22495998 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorDetails.scala ---
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler.cluster
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Stores information about an executor to pass from the scheduler to SparkListeners.
+ */
+@DeveloperApi
+class ExecutorDetails(
--- End diff --
Should this be `ExecutorInfo`, consistent with `TaskInfo` and `StageInfo`, etc.?
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68798860 [Test build #25065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25065/consoleFull) for PR 3711 at commit [`776d743`](https://github.com/apache/spark/commit/776d743c16cf3506b5c26983836c90066c365ee7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class SparkListenerAdapter implements SparkListener ` * `case class SparkListenerExecutorAdded(executorId: String, executorDetails: ExecutorDetails)` * `case class SparkListenerExecutorRemoved(executorId: String, executorDetails: ExecutorDetails)` * `class ExecutorDetails(`
[GitHub] spark pull request: [SPARK-5073] spark.storage.memoryMapThreshold ...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3900#issuecomment-68801129 Agree with 2MB with the caveat that this could cause some slowdown for the other code path (reading cache blocks from disk). However, memory mapping frequently can be dangerous (it was actually due to the JVM dying with a SIGBUS error that we added the threshold in the shuffle code path), which seems like a worse problem to have than the possibility of an undetermined slowdown.
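The trade-off described here can be sketched in a few lines (a simplified, self-contained illustration; the threshold constant and method are hypothetical, loosely modeled on the `spark.storage.memoryMapThreshold` idea, not Spark's actual DiskStore code): blocks below the threshold are read into an ordinary heap buffer, and only large blocks are memory-mapped, since mapping many small segments can exhaust address space or crash the JVM.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapThresholdSketch {
    // Hypothetical analogue of spark.storage.memoryMapThreshold (2 MB here).
    static final long MEMORY_MAP_THRESHOLD = 2 * 1024 * 1024;

    // Small segments: plain buffered read. Large segments: memory-map them,
    // amortizing the mapping cost over a big region.
    static ByteBuffer readBlock(Path file, long offset, long length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            if (length < MEMORY_MAP_THRESHOLD) {
                ByteBuffer buf = ByteBuffer.allocate((int) length);
                channel.position(offset);
                while (buf.hasRemaining() && channel.read(buf) >= 0) { }
                buf.flip();
                return buf;
            } else {
                return channel.map(MapMode.READ_ONLY, offset, length);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3, 4});
        ByteBuffer b = readBlock(tmp, 0, 4);  // 4 bytes: below threshold, heap buffer
        System.out.println(b.remaining());
        Files.delete(tmp);
    }
}
```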
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22491068 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -319,102 +318,141 @@ class CheckpointSuite extends TestSuiteBase {
   // failure, are re-processed or not.
   test("recovery with file input stream") {
     // Set up the streaming context and input streams
+    val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
     val testDir = Utils.createTempDir()
-    var ssc = new StreamingContext(master, framework, Seconds(1))
-    ssc.checkpoint(checkpointDir)
-    val fileStream = ssc.textFileStream(testDir.toString)
-    // Making value 3 take large time to process, to ensure that the master
-    // shuts down in the middle of processing the 3rd batch
-    val mappedStream = fileStream.map(s => {
-      val i = s.toInt
-      if (i == 3) Thread.sleep(2000)
-      i
-    })
-
-    // Reducing over a large window to ensure that recovery from master failure
-    // requires reprocessing of all the files seen before the failure
-    val reducedStream = mappedStream.reduceByWindow(_ + _, Seconds(30), Seconds(1))
-    val outputBuffer = new ArrayBuffer[Seq[Int]]
-    var outputStream = new TestOutputStream(reducedStream, outputBuffer)
-    outputStream.register()
-    ssc.start()
-
-    // Create files and advance manual clock to process them
-    // var clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
-    Thread.sleep(1000)
-    for (i <- Seq(1, 2, 3)) {
-      Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-      // wait to make sure that the file is written such that it gets shown in the file listings
-      Thread.sleep(1000)
+    val outputBuffer = new ArrayBuffer[Seq[Int]] with SynchronizedBuffer[Seq[Int]]
+
+    def writeFile(i: Int, clock: ManualClock): Unit = {
+      val file = new File(testDir, i.toString)
+      Files.write(i + "\n", file, Charsets.UTF_8)
+      assert(file.setLastModified(clock.currentTime()))
+      // Check that the file's modification date is actually the value we wrote, since rounding or
+      // truncation will break the test:
+      assert(file.lastModified() === clock.currentTime())
     }
-    logInfo("Output = " + outputStream.output.mkString(","))
-    assert(outputStream.output.size > 0, "No files processed before restart")
-    ssc.stop()
-    // Verify whether files created have been recorded correctly or not
-    var fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-    def recordedFiles = fileInputDStream.batchTimeToSelectedFiles.values.flatten
-    assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-    assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-    assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
-
-    // Create files while the master is down
-    for (i <- Seq(4, 5, 6)) {
-      Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-      Thread.sleep(1000)
+    def recordedFiles(ssc: StreamingContext): Seq[Int] = {
+      val fileInputDStream =
+        ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
+      val filenames = fileInputDStream.batchTimeToSelectedFiles.values.flatten
+      filenames.map(_.split(File.separator).last.toInt).toSeq.sorted
     }
-    // Recover context from checkpoint file and verify whether the files that were
-    // recorded before failure were saved and successfully recovered
-    logInfo("*** RESTARTING ")
-    ssc = new StreamingContext(checkpointDir)
-    fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-    assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-    assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-    assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
+    try {
+      // This is a var because it's re-assigned when we restart from a checkpoint
+      var clock: ManualClock = null
+      withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc =>
+        ssc.checkpoint(checkpointDir)
+        clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
+        val waiter = new StreamingTestWaiter(ssc)
+        val fileStream = ssc.textFileStream(testDir.toString)
+        // Make value 3 take a large time to process, to ensure that the driver
+        // shuts down in the middle of processing the 3rd batch
+        val mappedStream = fileStream.map(s => {
+          val i = s.toInt
+          if (i == 3) Thread.sleep(4000)
+          i
+
[GitHub] spark pull request: [SPARK-5052] Add common/base classes to fix gu...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3874#issuecomment-68786110 Latest version LGTM. Thanks!
[GitHub] spark pull request: [SPARK-5052] Add common/base classes to fix gu...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3874#issuecomment-68789982 Although further creep of the unshading-of-the-shading feels risky, it seems to resolve a problem, and is in principle OK on the same grounds that unshading `Optional` is. I'm still puzzled about why using Guava 14 isn't working but wouldn't argue with solving a problem this way.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68795630 [Test build #25066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25066/consoleFull) for PR 3803 at commit [`317b6d1`](https://github.com/apache/spark/commit/317b6d1dc45f0706987c3258beaa64be08df4b3c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68796368 I had some minor comments around naming, but overall this looks good.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-68797550 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25064/ Test PASSed.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-68797542 [Test build #25064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25064/consoleFull) for PR 3638 at commit [`b2a430d`](https://github.com/apache/spark/commit/b2a430d9f3fc8dac3c2a20aab6bd07bae8f17691). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4921. TaskSetManager.dequeueTask returns...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3816#issuecomment-68798497 My conclusion (sorry if it was unclear above) was that dequeueTask returning NO_PREF instead of PROCESS_LOCAL should have no effect at all. I still think it's worth changing for clarity, but it's obviously less important. In what cases would dequeueTask return NO_PREF tasks when maxLocality=RACK_LOCAL or ANY? My understanding is that, in a single resourceOffers call, dequeueTask gets called multiple times in order of TaskLocality, so any NO_PREF tasks would be returned when it's called with maxLocality=NO_PREF. And none would remain when ANY and RACK_LOCAL come around.
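The call-ordering argument above can be sketched with a toy model (plain Java; this is an illustrative simplification of the locality mechanism, not Spark's actual TaskSetManager): dequeueTask is invoked once per locality level in increasing order within a single offer round, so any NO_PREF tasks are drained when maxLocality=NO_PREF and none remain by the time RACK_LOCAL or ANY comes around.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LocalitySketch {
    // Declared in increasing order of "distance", mirroring Spark's TaskLocality.
    enum TaskLocality { PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY }

    // Toy model: each pending task carries its best achievable locality, and
    // dequeueTask(maxLocality) launches every task at or below that level.
    static List<TaskLocality> dequeueTask(Deque<TaskLocality> pending, TaskLocality maxLocality) {
        List<TaskLocality> launched = new ArrayList<>();
        pending.removeIf(t -> {
            if (t.ordinal() <= maxLocality.ordinal()) {
                launched.add(t);
                return true;
            }
            return false;
        });
        return launched;
    }

    public static void main(String[] args) {
        Deque<TaskLocality> pending = new ArrayDeque<>(List.of(
            TaskLocality.NO_PREF, TaskLocality.RACK_LOCAL, TaskLocality.NO_PREF));
        // One resourceOffers round calls dequeueTask for each level in order;
        // NO_PREF tasks come out at the NO_PREF step, leaving none for ANY.
        for (TaskLocality level : TaskLocality.values()) {
            System.out.println(level + " -> " + dequeueTask(pending, level));
        }
    }
}
```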
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68798867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25065/ Test PASSed.
[GitHub] spark pull request: [SPARK-5040][SQL] Support expressing unresolve...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3862#issuecomment-68801752 LGTM
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-68794407 Hey all, thanks for the nudge =) I've been going through it, will get you feedback ASAP.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3711#discussion_r22495242 --- Diff: core/src/main/java/org/apache/spark/SparkListenerAdapter.java ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark;
+
+import org.apache.spark.scheduler.SparkListener;
+import org.apache.spark.scheduler.SparkListenerApplicationEnd;
+import org.apache.spark.scheduler.SparkListenerApplicationStart;
+import org.apache.spark.scheduler.SparkListenerBlockManagerAdded;
+import org.apache.spark.scheduler.SparkListenerBlockManagerRemoved;
+import org.apache.spark.scheduler.SparkListenerEnvironmentUpdate;
+import org.apache.spark.scheduler.SparkListenerExecutorAdded;
+import org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate;
+import org.apache.spark.scheduler.SparkListenerExecutorRemoved;
+import org.apache.spark.scheduler.SparkListenerJobEnd;
+import org.apache.spark.scheduler.SparkListenerJobStart;
+import org.apache.spark.scheduler.SparkListenerStageCompleted;
+import org.apache.spark.scheduler.SparkListenerStageSubmitted;
+import org.apache.spark.scheduler.SparkListenerTaskEnd;
+import org.apache.spark.scheduler.SparkListenerTaskGettingResult;
+import org.apache.spark.scheduler.SparkListenerTaskStart;
+import org.apache.spark.scheduler.SparkListenerUnpersistRDD;
+
+/**
+ * Java clients should extend this class instead of implementing
+ * SparkListener directly. This is to prevent java clients
+ * from breaking when new events are added to the SparkListener
+ * trait.
+ *
+ * This is a concrete class instead of abstract to enforce
+ * new events get added to both the SparkListener and this adapter
+ * in lockstep.
+ */
+public class SparkListenerAdapter implements SparkListener {
--- End diff --
What about calling this `JavaSparkListener`? That's what we've tended to use in the past for things that were supposed to be drop-in substitutes for Scala classes.
[GitHub] spark pull request: [SPARK-5093] Set spark.network.timeout to 120s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3903#issuecomment-68794632 [Test build #25063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25063/consoleFull) for PR 3903 at commit [`7c2138e`](https://github.com/apache/spark/commit/7c2138e8c5353490d5037e92e45d2516f34b3170). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `SparkSubmit.printErrorAndExit(s"Cannot load main class from JAR $primaryResource")` * `class BinaryClassificationMetrics(`
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22495402 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala ---
@@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with BeforeAndAfter {
   }
 }

+  def testBinaryRecordsStream() {
+    var ssc: StreamingContext = null
+    val testDir: File = null
+    try {
+      val testDir = Utils.createTempDir()
+
+      Thread.sleep(1000)
+      // Set up the streaming context and input streams
+      val newConf = conf.clone.set(
+        "spark.streaming.clock", "org.apache.spark.streaming.util.SystemClock")
--- End diff --
Ok great, I'll wait for your PR to be merged and then refactor this test accordingly.
[GitHub] spark pull request: [SPARK-5093] Set spark.network.timeout to 120s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3903#issuecomment-68794641 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25063/ Test PASSed.
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3861#issuecomment-68795001 @aarondav
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22495425 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala --- @@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with BeforeAndAfter { } } + def testBinaryRecordsStream() { --- End diff -- Good catch, fixed.
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on a diff in the pull request: https://github.com/apache/spark/pull/3803#discussion_r22496437 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -373,6 +393,25 @@ class StreamingContext private[streaming] (
   }

+  /**
+   * Create an input stream that monitors a Hadoop-compatible filesystem
+   * for new files and reads them as flat binary files, assuming a fixed length per record,
+   * generating one byte array per record. Files must be written to the monitored directory
+   * by moving them from another location within the same file system. File names
+   * starting with . are ignored.
+   * @param directory HDFS directory to monitor for new file
+   * @param recordLength length of each record in bytes
+   */
+  def binaryRecordsStream(
+      directory: String,
+      recordLength: Int): DStream[Array[Byte]] = {
+    val conf = sc_.hadoopConfiguration
+    conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, recordLength)
+    val br = fileStream[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](directory, conf)
+    val data = br.map { case (k, v) => v.getBytes }
--- End diff --
Thanks for flagging this, I wasn't aware of that behavior of ``getBytes``. I think that, as you suggest, both here and in ``binaryRecords()`` it's not a problem in practice: the BytesWritable that comes from the FixedLengthBinaryInputFormat will always be backed by a byte array that's of the fixed length. For consistency and good practice I'm happy to make the change from #2712 both here and in the other method, or we could just add a comment. Let me know which you'd prefer. One concern might be performance effects, as mentioned in the other PR? @sryza might have thoughts.
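The `getBytes` pitfall under discussion is that Hadoop's `BytesWritable` backing array may be longer than the valid data, so the usual fix is to copy only the first `getLength()` bytes. A self-contained sketch of that pattern (the `PaddedBuffer` class below is a hypothetical stand-in for `BytesWritable`, used so the example needs no Hadoop dependency):

```java
import java.util.Arrays;

public class GetBytesSketch {
    // Mimics BytesWritable: a backing array whose valid region is [0, length),
    // possibly followed by stale padding bytes.
    static class PaddedBuffer {
        final byte[] backing;
        final int length;
        PaddedBuffer(byte[] backing, int length) { this.backing = backing; this.length = length; }
        byte[] getBytes() { return backing; }   // may include padding past length
        int getLength() { return length; }
    }

    // The safe conversion: copy only the valid prefix of the backing array.
    static byte[] toArray(PaddedBuffer b) {
        return Arrays.copyOfRange(b.getBytes(), 0, b.getLength());
    }

    public static void main(String[] args) {
        // 3 valid bytes followed by 2 bytes of padding.
        PaddedBuffer b = new PaddedBuffer(new byte[]{10, 20, 30, 0, 0}, 3);
        System.out.println(b.getBytes().length);  // raw array length, includes padding
        System.out.println(toArray(b).length);    // trimmed to the valid region
    }
}
```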
[GitHub] spark pull request: [SPARK-5093] Set spark.network.timeout to 120s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3903
[GitHub] spark pull request: [SPARK-5040][SQL] Support expressing unresolve...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3862
[GitHub] spark pull request: [SPARK-4804] StringContext method to allow Str...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3649#issuecomment-68802004 Now that #3862 has been merged, can we close this issue?
[GitHub] spark pull request: [SPARK-5052] Add common/base classes to fix gu...
Github user elmer-garduno commented on the pull request: https://github.com/apache/spark/pull/3874#issuecomment-68785782

Thanks, that worked; I updated the PR to reflect those changes. Here is a list of the actual classes that get included in the jar:

jar tf assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.2.1.jar | grep com.google.common.base
com/google/common/base/
com/google/common/base/Absent.class
com/google/common/base/Function.class
com/google/common/base/Optional$1$1.class
com/google/common/base/Optional$1.class
com/google/common/base/Optional.class
com/google/common/base/Present.class
com/google/common/base/Supplier.class
[GitHub] spark pull request: Support for Mesos DockerInfo
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-68791982 @ash211 Can you take a look at this patch again?
[GitHub] spark pull request: [SPARK-2572] Delete the local dir on executor ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/1480#issuecomment-68792088 @watermen Can you update the patch as @andrewor14 mentioned?
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3711#discussion_r22495870

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -106,6 +108,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val actorSystem
         logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
       }
     }
+    listenerBus.post(SparkListenerExecutorAdded(executorId, data))

--- End diff --

Don't we need to do the equivalent in `MesosSchedulerBackend`?
[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68797365 Thanks for the review! I'll wait for @JoshRosen 's PR to merge and then update the test here. And will wait for your thoughts on the `getBytes` issue. Otherwise, I think everything's addressed.
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22498258

--- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala ---
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rdd.kafka
+
+import scala.reflect.{classTag, ClassTag}
+
+import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.NextIterator
+
+import java.util.Properties
+import kafka.api.FetchRequestBuilder
+import kafka.common.{ErrorMapping, TopicAndPartition}
+import kafka.consumer.{ConsumerConfig, SimpleConsumer}
+import kafka.message.{MessageAndMetadata, MessageAndOffset}
+import kafka.serializer.Decoder
+import kafka.utils.VerifiableProperties
+
+case class KafkaRDDPartition(
+  override val index: Int,
+  topic: String,
+  partition: Int,
+  fromOffset: Long,
+  untilOffset: Long
+) extends Partition
+
+/** A batch-oriented interface for consuming from Kafka.
+  * Each given Kafka topic/partition corresponds to an RDD partition.
+  * Starting and ending offsets are specified in advance,
+  * so that you can control exactly-once semantics.
+  * For an easy interface to Kafka-managed offsets,
+  * see {@link org.apache.spark.rdd.kafka.KafkaCluster}
+  * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#configuration">
+  * configuration parameters</a>.
+  * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s),
+  * NOT zookeeper servers, specified in host1:port1,host2:port2 form.
+  * @param fromOffsets per-topic/partition Kafka offsets defining the (inclusive)
+  * starting point of the batch
+  * @param untilOffsets per-topic/partition Kafka offsets defining the (exclusive)
+  * ending point of the batch
+  * @param messageHandler function for translating each message into the desired type
+  */
+class KafkaRDD[
+  K: ClassTag,
+  V: ClassTag,
+  U <: Decoder[_]: ClassTag,
+  T <: Decoder[_]: ClassTag,
+  R: ClassTag](
+    sc: SparkContext,
+    val kafkaParams: Map[String, String],
+    val fromOffsets: Map[TopicAndPartition, Long],
+    val untilOffsets: Map[TopicAndPartition, Long],
+    messageHandler: MessageAndMetadata[K, V] => R
+  ) extends RDD[R](sc, Nil) with Logging {
+
+  assert(fromOffsets.keys == untilOffsets.keys,
+    "Must provide both from and until offsets for each topic/partition")
+
+  override def getPartitions: Array[Partition] = fromOffsets.zipWithIndex.map { kvi =>
+    val ((tp, from), index) = kvi
+    new KafkaRDDPartition(index, tp.topic, tp.partition, from, untilOffsets(tp))
+  }.toArray
+
+  override def compute(thePart: Partition, context: TaskContext) = {
+    val part = thePart.asInstanceOf[KafkaRDDPartition]
+    if (part.fromOffset >= part.untilOffset) {
+      log.warn("Beginning offset is same or after ending offset " +
+        s"skipping ${part.topic} ${part.partition}")
+      Iterator.empty
+    } else {
+      new NextIterator[R] {
+        context.addTaskCompletionListener{ context => closeIfNeeded() }
+
+        val kc = new KafkaCluster(kafkaParams)
+        log.info(s"Computing topic ${part.topic}, partition ${part.partition} " +
+          s"offsets ${part.fromOffset} - ${part.untilOffset}")
+        val keyDecoder = classTag[U].runtimeClass.getConstructor(classOf[VerifiableProperties])
+          .newInstance(kc.config.props)
+          .asInstanceOf[Decoder[K]]
+        val valueDecoder = classTag[T].runtimeClass.getConstructor(classOf[VerifiableProperties])
+          .newInstance(kc.config.props)
+          .asInstanceOf[Decoder[V]]
+        val consumer: SimpleConsumer = kc.connectLeader(part.topic, part.partition).fold(
+          errs => throw new Exception(
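The core idea in the KafkaRDD snippet above is that each RDD partition is pinned to a half-open offset range (inclusive `fromOffset`, exclusive `untilOffset`), so replaying a batch deterministically re-reads exactly the same messages. A standalone sketch of that bookkeeping in plain Scala, independent of Kafka (the type and field names here are illustrative, not the PR's final API):

```scala
// One batch partition: inclusive starting offset, exclusive ending offset.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  require(fromOffset <= untilOffset, "fromOffset must not exceed untilOffset")
  // Exactly this many messages are covered, however often the batch is replayed.
  def count: Long = untilOffset - fromOffset
}
```

Because the ranges are fixed before the batch runs, exactly-once semantics reduce to atomically storing the processed offsets together with the output.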
[GitHub] spark pull request: [SPARK-5050][Mllib] Add unit test for sqdist
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3869#discussion_r22498301

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -175,6 +177,33 @@ class VectorsSuite extends FunSuite {
     assert(v.size === x.rows)
   }

+  test("sqdist") {
+    val random = new Random(System.nanoTime())
+    for (m <- 1 until 1000 by 10) {
+      val nnz = random.nextInt(m) + 1
+
+      val indices1 = random.shuffle(0 to m - 1).toArray.slice(0, nnz).sorted

--- End diff --

OH interesting...
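For context on what the test above exercises: ``sqdist`` computes the squared Euclidean distance between two vectors. A plain-Scala reference for dense vectors (a sketch for illustration, not MLlib's optimized sparse implementation):

```scala
// Reference squared Euclidean distance for two equal-length dense vectors.
def sqdist(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same length")
  var sum = 0.0
  for (i <- a.indices) {
    val d = a(i) - b(i)
    sum += d * d
  }
  sum
  // e.g. sqdist(Array(1.0, 2.0), Array(4.0, 6.0)) == 25.0
}
```

The MLlib version avoids materializing dense arrays by walking the sorted index arrays of the two sparse vectors, which is what the randomly generated `indices1` in the test is probing.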
[GitHub] spark pull request: SPARK-4660: Use correct default classloader in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3840#issuecomment-68785091 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25059/consoleFull) for PR 3840 at commit [`86bc5eb`](https://github.com/apache/spark/commit/86bc5ebdfb2c5f0d58ffeaf184f94f60923fe676). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4660: Use correct default classloader in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3840#issuecomment-68785101 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25059/ Test FAILed.
[GitHub] spark pull request: [STREAMING] SPARK-3505: Augmenting SparkStream...
Github user xiliu82 commented on the pull request: https://github.com/apache/spark/pull/2267#issuecomment-68785881 I will try to do that this week. On Jan 5, 2015, at 11:50 AM, Tathagata Das notificati...@github.com wrote: Ping, for updating this PR. -- Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2267#issuecomment-68765456.
[GitHub] spark pull request: [SPARK-4857] [CORE] Adds Executor membership e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3711#issuecomment-68787217 [Test build #25060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25060/consoleFull) for PR 3711 at commit [`6e06a79`](https://github.com/apache/spark/commit/6e06a79c8511ea71182e131ac2a3924aa60f1153). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class SparkListenerAdapter implements SparkListener ` * `case class SparkListenerExecutorAdded(executorId: String, executorDetails: ExecutorDetails)` * `case class SparkListenerExecutorRemoved(executorId: String, executorDetails: ExecutorDetails)` * `class ExecutorDetails(`