[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40686907
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14201/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/422#issuecomment-40686906
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14202/


---


[GitHub] spark pull request: [SPARK-1332] Improve Spark Streaming's Network...

2014-04-17 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/300#discussion_r11721172
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/NetworkReceiverExecutor.scala
 ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.receiver
+
+import java.nio.ByteBuffer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{Logging, SparkConf}
+import org.apache.spark.storage.StreamBlockId
+import java.util.concurrent.CountDownLatch
+import scala.concurrent._
+import ExecutionContext.Implicits.global
+
+/**
+ * Abstract class that is responsible for executing a NetworkReceiver in 
the worker.
+ * It provides all the necessary interfaces for handling the data received 
by the receiver.
+ */
+private[streaming] abstract class NetworkReceiverExecutor(
--- End diff --

Usually the term `Executor` is used for something you dispatch tasks to 
(like the spark executor or java's thread pool executor). This seems more like 
a class that manages the lifecycle of a receiver... what about naming this 
`NetworkReceiverSupervisor`?

Also - is this made an abstract class for the purpose of testing - or do 
you expect to have multiple implementations of this?


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40686904
  
Merged build finished. 


---


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/422#issuecomment-40686905
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: [SPARK-1332] Improve Spark Streaming's Network...

2014-04-17 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/300#discussion_r11721384
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/NetworkReceiverExecutor.scala
 ---
@@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.receiver
+
+import java.nio.ByteBuffer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.{Logging, SparkConf}
+import org.apache.spark.storage.StreamBlockId
+import java.util.concurrent.CountDownLatch
+import scala.concurrent._
+import ExecutionContext.Implicits.global
+
+/**
+ * Abstract class that is responsible for executing a NetworkReceiver in 
the worker.
+ * It provides all the necessary interfaces for handling the data received 
by the receiver.
+ */
+private[streaming] abstract class NetworkReceiverExecutor(
+receiver: NetworkReceiver[_],
+conf: SparkConf
+  ) extends Logging {
+
+
+  /** Enumeration to identify current state of the StreamingContext */
+  object NetworkReceiverState extends Enumeration {
+type CheckpointState = Value
+val Initialized, Started, Stopped = Value
+  }
+  import NetworkReceiverState._
+
+  // Attach the executor to the receiver
+  receiver.attachExecutor(this)
+
+  /** Receiver id */
+  protected val receiverId = receiver.receiverId
+
+  /** Message associated with the stopping of the receiver */
+  protected var stopMessage = ""
+
+  /** Exception associated with the stopping of the receiver */
+  protected var stopException: Throwable = null
+
+  /** Has the receiver been marked for stop. */
+  private val stopLatch = new CountDownLatch(1)
+
+  /** Time between a receiver is stopped */
+  private val restartDelay = 
conf.getInt("spark.streaming.receiverRestartDelay", 2000)
+
+  /** State of the receiver */
+  private[streaming] var receiverState = Initialized
+
+  /** Push a single data item to backend data store. */
+  def pushSingle(data: Any)
+
+  /** Push a byte buffer to backend data store. */
+  def pushBytes(
+  bytes: ByteBuffer,
+  optionalMetadata: Option[Any],
+  optionalBlockId: Option[StreamBlockId]
+)
+
+  /** Push an iterator of objects as a block to backend data store. */
+  def pushIterator(
+  iterator: Iterator[_],
+  optionalMetadata: Option[Any],
+  optionalBlockId: Option[StreamBlockId]
+)
+
+  /** Push an ArrayBuffer of object as a block to back data store. */
--- End diff --

of objects


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40687475
  
Jenkins, retest this please.


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40687704
  
 Merged build triggered. 


---


[GitHub] spark pull request: Clean up and simplify Spark configuration

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11721526
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -333,15 +325,29 @@ trait ClientBase extends Logging {
 if (useConcurrentAndIncrementalGC) {
   // In our expts, using (default) throughput collector has severe 
perf ramifications in
   // multi-tenant machines
-  JAVA_OPTS += " -XX:+UseConcMarkSweepGC "
-  JAVA_OPTS += " -XX:+CMSIncrementalMode "
-  JAVA_OPTS += " -XX:+CMSIncrementalPacing "
-  JAVA_OPTS += " -XX:CMSIncrementalDutyCycleMin=0 "
-  JAVA_OPTS += " -XX:CMSIncrementalDutyCycle=10 "
+  JAVA_OPTS += "-XX:+UseConcMarkSweepGC"
+  JAVA_OPTS += "-XX:+CMSIncrementalMode"
+  JAVA_OPTS += "-XX:+CMSIncrementalPacing"
+  JAVA_OPTS += "-XX:CMSIncrementalDutyCycleMin=0"
+  JAVA_OPTS += "-XX:CMSIncrementalDutyCycle=10"
 }
 
-if (env.isDefinedAt("SPARK_JAVA_OPTS")) {
-  JAVA_OPTS += " " + env("SPARK_JAVA_OPTS")
+// TODO: it might be nicer to pass these as an internal environment 
variable rather than
+// as Java options, due to complications with string parsing of nested 
quotes.
+if (args.amClass == classOf[ExecutorLauncher].getName) {
+  // If we are being launched in client mode, forward the spark-conf 
options
+  // onto the executor launcher
+  for ((k, v) <- sparkConf.getAll) {
+JAVA_OPTS += "-D" + k + "=" + "\\\"" + v + "\\\""
--- End diff --

Oop you're right


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40687713
  
Merged build started. 


---


[GitHub] spark pull request: Clean up and simplify Spark configuration

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/299#discussion_r11721568
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -313,13 +305,13 @@ trait ClientBase extends Logging {
 
 val amMemory = calculateAMMemory(newApp)
 
-var JAVA_OPTS = ""
+var JAVA_OPTS = ListBuffer[String]()
--- End diff --

I think this can be a val
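
That works because `+=` mutates the `ListBuffer` in place and never reassigns the reference; a minimal sketch:

```scala
import scala.collection.mutable.ListBuffer

// A val suffices: += mutates the buffer itself, not the JAVA_OPTS binding.
val JAVA_OPTS = ListBuffer[String]()
JAVA_OPTS += "-XX:+UseConcMarkSweepGC"
```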


---


[GitHub] spark pull request: [SPARK-1332] Improve Spark Streaming's Network...

2014-04-17 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/300#discussion_r11721584
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/receiver/NetworkReceiver.scala
 ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.receiver
+
+import java.nio.ByteBuffer
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * Abstract class of a receiver that can be run on worker nodes to receive 
external data. A
+ * custom receiver can be defined by defining the functions onStart() and 
onStop(). onStart()
+ * should define the setup steps necessary to start receiving data,
+ * and onStop() should define the cleanup steps necessary to stop 
receiving data. A custom
+ * receiver would look something like this.
+ *
+ * class MyReceiver(storageLevel) extends 
NetworkReceiver[String](storageLevel) {
+ *   def onStart() {
+ * // Setup stuff (start threads, open sockets, etc.) to start 
receiving data.
+ * // Must start new thread to receive data, as onStart() must be 
non-blocking.
+ *
+ * // Call store(...) in those threads to store received data into 
Spark's memory.
+ *
+ * // Call stop(...), restart() or reportError(...) on any thread 
based on how
+ * // different errors should be handled.
+ *
+ * // See corresponding method documentation for more details.
+ *   }
+ *
+ *   def onStop() {
+ * // Cleanup stuff (stop threads, close sockets, etc.) to stop 
receiving data.
+ *   }
+ * }
+ */
+abstract class NetworkReceiver[T](val storageLevel: StorageLevel) extends 
Serializable {
+
+  /**
+   * This method is called by the system when the receiver is started. 
This function
+   * must initialize all resources (threads, buffers, etc.) necessary for 
receiving data.
+   * This function must be non-blocking, so receiving the data must occur 
on a different
+   * thread. Received data can be stored with Spark by calling 
`store(data)`.
+   *
+   * If there are errors in threads started here, then following options 
can be done
+   * (i) `reportError(...)` can be called to report the error to the 
driver.
+   * The receiving of data will continue uninterrupted.
+   * (ii) `stop(...)` can be called to stop receiving data. This will call 
`onStop()` to
+   * clear up all resources allocated (threads, buffers, etc.) during 
`onStart()`.
+   * (iii) `restart(...)` can be called to restart the receiver. This will 
call `onStop()`
+   * immediately, and then `onStart()` after a delay.
+   */
+  def onStart()
+
+  /**
+   * This method is called by the system when the receiver is stopped. All 
resources
+   * (threads, buffers, etc.) setup in `onStart()` must be cleaned up in 
this method.
+   */
+  def onStop()
+
+  /** Override this to specify a preferred location (hostname). */
+  def preferredLocation : Option[String] = None
+
+  /** Store a single item of received data to Spark's memory. */
+  def store(dataItem: T) {
+executor.pushSingle(dataItem)
+  }
+
+  /** Store a sequence of received data into Spark's memory. */
+  def store(dataBuffer: ArrayBuffer[T]) {
+executor.pushArrayBuffer(dataBuffer, None, None)
+  }
+
+  /**
+   * Store a sequence of received data into Spark's memory.
+   * The metadata will be associated with this block of data
+   * for being used in the corresponding InputDStream.
+   */
+  def store(dataBuffer: ArrayBuffer[T], metadata: Any) {
--- End diff --

There is a class of ingestion sources like flume that will allow you to 
receive data before fully "acknowledging" it in order to allow transactional 
semantics.

I'm not sure the current API here really supports using those, because it's 
not clear to the receiver implementer when the underlying blocks get 

[GitHub] spark pull request: SPARK-1382: Fix NPE in DStream.slice

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/365#issuecomment-40688205
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: SPARK-1382: Fix NPE in DStream.slice

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/365#issuecomment-40688206
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14203/


---


[GitHub] spark pull request: [SPARK-1512] improve spark sql to support tabl...

2014-04-17 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/428#issuecomment-40690569
  
hi, I updated this patch to an example that queries a table with more than 22 
fields by implementing the Product interface
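
For background, Scala 2.10 case classes top out at 22 fields, which is why a wide table needs a hand-rolled `Product`. A minimal sketch of the workaround (the class and its fields are hypothetical, not the patch's example):

```scala
// Hypothetical row type standing in for a table with more than 22 columns.
// Implementing Product directly sidesteps the 22-field case-class limit.
class WideRecord(val fields: Seq[Any]) extends Product with Serializable {
  def productArity: Int = fields.length
  def productElement(n: Int): Any = fields(n)
  def canEqual(that: Any): Boolean = that.isInstanceOf[WideRecord]
}
```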


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40693683
  
Merged build finished. 


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40693684
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14204/


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/432

Reuses Row object in ExistingRdd.productToRowRdd()
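
A hedged sketch of the reuse pattern named in the title (hypothetical shapes; the patch itself works on Spark SQL's internal Row type):

```scala
// Sketch: refill one shared buffer per record instead of allocating a fresh
// row for each. Callers must consume a row before advancing the iterator,
// since the same buffer is handed out every time.
def productToRows(data: Iterator[Product], arity: Int): Iterator[Array[Any]] = {
  val row = new Array[Any](arity)
  data.map { p =>
    var i = 0
    while (i < arity) {
      row(i) = p.productElement(i)
      i += 1
    }
    row
  }
}
```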



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark reuseRow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #432


commit 52acec9608af89e4f6ffdd8f98e1523c5459e76a
Author: Cheng Lian 
Date:   2014-04-17T07:43:54Z

Reuses Row object in ExistingRdd.productToRowRdd()




---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40694709
  
 Merged build triggered. 


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40694718
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
GitHub user berngp opened a pull request:

https://github.com/apache/spark/pull/433

[SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN application specific CP

The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects
the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH` when it should
really be looking into `YarnConfiguration`. If the application configuration has no
`yarn.application.classpath` defined, an NPE will be thrown.

Additional Changes include:
* ScalaBuild now points to Scalatest 2.1.3
* ScalaBuild project "root" renamed as "spark"
* Test Suite for ClientBase added
* Fixes for scalastyle in other yarn files.

[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522

Author  : bernardo.gomezpala...@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test
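
A minimal sketch of the fix described above, assuming it simply moves the reflective field lookup from `MRJobConfig` to `YarnConfiguration` and falls back to an empty array instead of null:

```scala
import scala.util.Try
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Read the default YARN application classpath reflectively; a missing field
// yields an empty array rather than the null that previously caused the NPE.
def defaultYarnAppClasspath: Array[String] = Try {
  classOf[YarnConfiguration]
    .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
    .get(null)
    .asInstanceOf[Array[String]]
}.getOrElse(Array.empty[String])
```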

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/berngp/spark feature/SPARK-1522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/433.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #433


commit b08beb96c4437f095c31f53a0877eb95aebab637
Author: Bernardo Gomez Palacio 
Date:   2014-04-17T09:35:52Z

[SPARK-1522] : YARN ClientBase throws a NPE if there is no YARN application 
specific CP

The current implementation of `ClientBase.getDefaultYarnApplicationClasspath` inspects
the `MRJobConfig` class for the field `DEFAULT_YARN_APPLICATION_CLASSPATH` when it should
really be looking into `YarnConfiguration`. If the application configuration has no
`yarn.application.classpath` defined, an NPE will be thrown.

Additional Changes include:
* ScalaBuild now points to Scalatest 2.1.3
* ScalaBuild project "root" renamed as "spark"
* Test Suite for ClientBase added
* Fixes for scalastyle in other yarn files.

[ticket: SPARK-1522] : https://issues.apache.org/jira/browse/SPARK-1522

Author  : bernardo.gomezpala...@gmail.com
Testing : SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt test




---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40699148
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11725943
  
--- Diff: project/SparkBuild.scala ---
@@ -52,7 +52,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"
 
-  lazy val root = Project("root", file("."), settings = rootSettings) 
aggregate(allProjects: _*)
--- End diff --

Why change this?


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726091
  
--- Diff: project/SparkBuild.scala ---
@@ -52,7 +52,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"
 
-  lazy val root = Project("root", file("."), settings = rootSettings) 
aggregate(allProjects: _*)
--- End diff --

When importing into IntelliJ IDEA as an SBT project, IDEA uses the project 
names, and "root" lacks a bit of context. I presume the use of the word 
"root" was based on the SBT multi-module example.


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726156
  
--- Diff: project/SparkBuild.scala ---
@@ -263,7 +263,7 @@ object SparkBuild extends Build {
 "org.eclipse.jetty" % "jetty-util" % jettyVersion,
 "org.eclipse.jetty" % "jetty-plus" % jettyVersion,
 "org.eclipse.jetty" % "jetty-security" % jettyVersion,
-"org.scalatest"%% "scalatest"   % "1.9.1"  % "test",
+"org.scalatest"%% "scalatest"   % "2.1.3"  % "test",
--- End diff --

Most likely the upgrade to scalatest 2.1.3 is causing failures in the 
test suites below, which I will fix.

streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
spark/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40701225
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14205/


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40701224
  
Merged build finished. 


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11726876
  
--- Diff: project/SparkBuild.scala ---
@@ -52,7 +52,7 @@ object SparkBuild extends Build {
   val SCALAC_JVM_VERSION = "jvm-1.6"
   val JAVAC_JVM_VERSION = "1.6"
 
-  lazy val root = Project("root", file("."), settings = rootSettings) 
aggregate(allProjects: _*)
--- End diff --

So better?
```scala
lazy val spark = Project("spark", file("."), settings = rootSettings) aggregate(allProjects: _*)
```

---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40702728
  
 Merged build triggered. 


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40702744
  
Merged build started. 


---


[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/85#issuecomment-40704558
  
 Build triggered. 


---


[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/85#issuecomment-40704573
  
Build started. 


---


[GitHub] spark pull request: SPARK-1523: improve the readability of code in...

2014-04-17 Thread CodingCat
GitHub user CodingCat opened a pull request:

https://github.com/apache/spark/pull/434

SPARK-1523: improve the readability of code in AkkaUtil

Actually it is separated from https://github.com/apache/spark/pull/85 as 
suggested by @rxin

compare 


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala#L122
 

and 


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/AkkaUtils.scala#L117

the first one uses get and then toLong, the second one uses getLong; better to 
make them consistent

very very small fix
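
The two styles in question, in a minimal sketch (the key name here is hypothetical, not the actual AkkaUtils setting):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
// Style at L122: fetch the value as a String, then convert it.
val a = conf.get("spark.hypothetical.timeout", "100").toLong
// Style at L117: fetch it directly as a Long.
val b = conf.getLong("spark.hypothetical.timeout", 100)
```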

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CodingCat/spark SPARK-1523

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #434


commit 0e86f3fc3ccb97ccecf3130b774c77c5491ab26d
Author: CodingCat 
Date:   2014-04-17T11:50:16Z

improve the readability of code in AkkaUtil




---


[GitHub] spark pull request: SPARK-1523: improve the readability of code in...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/434#issuecomment-40705778
  
 Merged build triggered. 


---


[GitHub] spark pull request: SPARK-1523: improve the readability of code in...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/434#issuecomment-40705786
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40707049
  
Most of the changes in the diff look unrelated to what is mentioned in the 
summary. In addition, they introduce new bugs.

Please clean up the diff and include only what is required to fix the issue, 
leaving out unrelated changes.


---


[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/85#issuecomment-40708532
  
Build finished. All automated tests passed.


---


[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/85#issuecomment-40708536
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14207/


---


[GitHub] spark pull request: SPARK-1523: improve the readability of code in...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/434#issuecomment-40708534
  
Merged build finished. All automated tests passed.


---


[GitHub] spark pull request: SPARK-1523: improve the readability of code in...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/434#issuecomment-40708537
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14208/


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40708535
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14206/


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40708533
  
Merged build finished. 


---


[GitHub] spark pull request: SPARK-1483: Rename minSplits to minPartitions ...

2014-04-17 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/430#issuecomment-40709918
  
it's weird... always timing out... because I modified some core and 
hive-related files in the same PR?


---


[GitHub] spark pull request: pom.xml modifications added to SparkBuild.scal...

2014-04-17 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/435

pom.xml modifications added to SparkBuild.scala



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SparkBuild

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #435


commit 6851becba7058571816848326713fa8d08998e5d
Author: witgo 
Date:   2014-04-17T13:45:48Z

Maintain consistent SparkBuild.scala, pom.xml




---


[GitHub] spark pull request: pom.xml modifications added to SparkBuild.scal...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/435#issuecomment-40715978
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request: [SPARK-1395] Allow "local:" URIs to work on Ya...

2014-04-17 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/303#issuecomment-40727621
  
I merged this to master and branch-1.0. Thanks @vanzin!


---


[GitHub] spark pull request: [SPARK-1395] Allow "local:" URIs to work on Ya...

2014-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/303


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40734989
  
@mridulm reverted the changes not related to the issue. 


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11741951
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
--- End diff --

please follow style guide for indentations: 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
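
For illustration, the wrapped parameter group above might be indented like this under the guide (a sketch assuming its fixed-indent convention for continuation lines, rather than aligning to the far right):

```scala
import org.apache.hadoop.conf.Configuration

def getAppClasspathForKey(key: String, conf: Configuration)
    (f: => Array[String]): Array[String] = {
  // Null-safe lookup with the caller-supplied default.
  Option(conf.getStrings(key)).getOrElse(f)
}
```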


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11741991
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
+Option(conf.getStrings(key)) match {
+  case Some(s) => s 
+  case None => f
 }
-  }
+  
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+  field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
* So we need to use reflection to retrieve it.
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-  if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH") 
+  if( field.getType  == classOf[String] )
--- End diff --

remove extra space after getType before ==



---


[GitHub] spark pull request: [SPARK-1527] change rootDir*.getName to rootDi...

2014-04-17 Thread advancedxy
GitHub user advancedxy opened a pull request:

https://github.com/apache/spark/pull/436

[SPARK-1527] change rootDir*.getName to rootDir*.getAbsolutePath 

JIRA issue: [SPARK-1527](https://issues.apache.org/jira/browse/SPARK-1527)

getName() only gets the last component of the file path. When deleting 
test-generated directories,
we should pass the generated directory's absolute path to DiskBlockManager.
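
The distinction at issue, in a minimal sketch (the directory name is hypothetical):

```scala
import java.io.File

val rootDir = new File("/tmp/spark-test-1234/blocks") // hypothetical test dir
println(rootDir.getName)         // "blocks" -- only the last path component
println(rootDir.getAbsolutePath) // "/tmp/spark-test-1234/blocks" -- what deletion needs
```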

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/advancedxy/spark SPARK-1527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/436.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #436


commit 4678bab3e4c972b3f92450886ec5ef7b91b8cd37
Author: Ye Xianjin 
Date:   2014-04-17T16:50:06Z

change rootDir*.getName to rootDir*.getAbsolutePath so the temporary
directories are deleted when the test is finished.




---


[GitHub] spark pull request: [SPARK-1527] change rootDir*.getName to rootDi...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/436#issuecomment-40737330
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request: [SPARK-1527] change rootDir*.getName to rootDi...

2014-04-17 Thread advancedxy
Github user advancedxy commented on the pull request:

https://github.com/apache/spark/pull/436#issuecomment-40737449
  
As discussed with @srowen on JIRA, I think we should review whether relative 
paths and absolute paths are used appropriately.


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40738339
  
 Merged build triggered. 


---


[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40738354
  
Merged build started. 


---


[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745730
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
+Option(conf.getStrings(key)) match {
+  case Some(s) => s 
+  case None => f
 }
-  }
+  
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+  field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
* So we need to use reflection to retrieve it.
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-  if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH") 
+  if( field.getType  == classOf[String] )
 StringUtils.getStrings(field.get(null).asInstanceOf[String])
-  } else {
+  else
 field.get(null).asInstanceOf[Array[String]]
-  }
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
-}
-  }
+  }.getOrElse(Array.empty[String])
--- End diff --

I prefer the old try-catch to `scala.util.Try`, which hides the specific 
exceptions you catch.
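
A sketch of the contrast, with a hypothetical `lookup` standing in for the
reflective field access:

```
import scala.util.Try

def lookup(): Array[String] = throw new NoSuchFieldException("demo")

// Try swallows every NonFatal exception, and NoSuchFieldError (an Error)
// would escape it entirely -- the tolerated failures are invisible here:
val viaTry = Try(lookup()).getOrElse(Array.empty[String])

// The explicit try-catch names exactly which failures are tolerated:
val viaTryCatch =
  try {
    lookup()
  } catch {
    case _: NoSuchFieldError | _: NoSuchFieldException => Array.empty[String]
  }
```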




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745870
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
+Option(conf.getStrings(key)) match {
+  case Some(s) => s 
+  case None => f
--- End diff --

`Option(...).getOrElse(f)`
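
That is, a self-contained sketch of the same collapse (`lookup` and the key
below are illustrative stand-ins):

```
// Option(...) lifts a possibly-null lookup result; getOrElse supplies the
// by-name default -- the same semantics as the Some/None match above.
def classpathFor(lookup: String => Array[String], key: String)
                (default: => Array[String]): Array[String] =
  Option(lookup(key)).getOrElse(default)

// classpathFor(_ => null, "yarn.application.classpath")(Array("fallback"))
```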




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745925
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
--- End diff --

No need to yield, as you're not returning anything. I would just do

```
elements.foreach { e =>
  // add to environment
}
```
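
Filled in with the names from the diff, that would look roughly like this
(a sketch; `YarnSparkHadoopUtil` is the yarn-side helper the diff already
uses):

```
import java.io.File
import scala.collection.mutable.HashMap
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment
import org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

// foreach makes the side-effecting intent explicit; nothing is returned.
def addToAppClasspath(env: HashMap[String, String], elements: Iterable[String]): Unit = {
  elements.foreach { c =>
    YarnSparkHadoopUtil.addToEnvironment(env, Environment.CLASSPATH.name, c.trim,
      File.pathSeparator)
  }
}
```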




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11745968
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
--- End diff --

Also, I would just make `f` a third parameter, and rename it `default`
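
Something like (a sketch, with `default` as a plain by-name parameter in the
same list rather than a second curried argument):

```
import org.apache.hadoop.conf.Configuration

def getAppClasspathForKey(
    key: String,
    conf: Configuration,
    default: => Array[String]): Array[String] =
  Option(conf.getStrings(key)).getOrElse(default)
```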




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746121
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
--- End diff --

Actually I would just get rid of this method. You don't need an additional 
level of indirection (but instead do this directly in `getYarnAppClasspath` and 
`getMRAppClasspath`)
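
In other words, roughly (a sketch assuming the getDefault* helpers from the
diff are in scope):

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.conf.YarnConfiguration

def getYarnAppClasspath(conf: Configuration): Array[String] =
  Option(conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH))
    .getOrElse(getDefaultYarnApplicationClasspath)

def getMRAppClasspath(conf: Configuration): Array[String] =
  Option(conf.getStrings("mapreduce.application.classpath"))
    .getOrElse(getDefaultMRApplicationClasspath)
```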




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746226
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
+for ( c <- elements ) 
+  yield (YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim, File.pathSeparator))
   }
 
-  def getDefaultYarnApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
-  field.get(null).asInstanceOf[Array[String]]
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
+  protected[yarn] def getAppClasspathForKey(key:String, conf:Configuration)
+   (f: => Array[String]) : 
Array[String] = 
+Option(conf.getStrings(key)) match {
+  case Some(s) => s 
+  case None => f
 }
-  }
+  
+  def getDefaultYarnApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[YarnConfiguration].getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
+  field.get(null).asInstanceOf[Array[String]]
+  }.getOrElse(Array.empty[String])
 
   /**
* In Hadoop 0.23, the MR application classpath comes with the YARN 
application
* classpath.  In Hadoop 2.0, it's an array of Strings, and in 2.2+ it's 
a String.
* So we need to use reflection to retrieve it.
*/
-  def getDefaultMRApplicationClasspath(): Array[String] = {
-try {
-  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH")
-  if (field.getType == classOf[String]) {
+  def getDefaultMRApplicationClasspath : Array[String] = Try {
+  val field = 
classOf[MRJobConfig].getField("DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH") 
+  if( field.getType  == classOf[String] )
 StringUtils.getStrings(field.get(null).asInstanceOf[String])
-  } else {
+  else
 field.get(null).asInstanceOf[Array[String]]
-  }
-} catch {
-  case err: NoSuchFieldError => null
-  case err: NoSuchFieldException => null
-}
-  }
+  }.getOrElse(Array.empty[String])
--- End diff --

Ah, I see what you're trying to do. I think a better way is to return 
`Option[Array[String]]`. If there's an exception, return None.
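
For instance (a sketch: `Try(...).toOption` keeps the reflective lookup but
surfaces failure as None instead of an empty array):

```
import scala.util.Try
import org.apache.hadoop.yarn.conf.YarnConfiguration

def getDefaultYarnApplicationClasspath: Option[Array[String]] =
  Try {
    classOf[YarnConfiguration]
      .getField("DEFAULT_YARN_APPLICATION_CLASSPATH")
      .get(null)
      .asInstanceOf[Array[String]]
  }.toOption
```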




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746387
  
--- Diff: 
yarn/common/src/test/scala/org/apache/spark/deploy/yarn/ClientBaseSpec.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import java.net.URI
+
+import org.scalatest.FreeSpec
--- End diff --

Could you use `FunSuite` to be more consistent with the other test suites? 
E.g. 
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ui/UISuite.scala
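
I.e., something shaped like this (a sketch; the test name and assertion are
hypothetical):

```
import org.scalatest.FunSuite

class ClientBaseSuite extends FunSuite {
  test("default YARN application classpath is resolved via reflection") {
    val cp = ClientBase.getDefaultYarnApplicationClasspath
    assert(cp != null)
  }
}
```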




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746514
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
--- End diff --

I don't think you need to return anything here, so you can use `{` instead 
of `= {`




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11746787
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
--- End diff --

I actually prefer that you move all of the logic of `getAppClasspathForKey` 
and `getDefaultMRApplicationClasspath` into this method. Right now you have to 
jump a few levels to understand what's going on, and I think it'll be clearer 
if it's in one isolated place. (Same in getYarnAppClasspath)




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/433#issuecomment-40747230
  
@berngp Thanks for doing this. I literally ran into this NPE yesterday in 
my own YARN cluster. It turns out I had forgotten to point YARN_CONF_DIR to the 
proper place, but running into an NPE left no clue as to what the problem was 
(until I dug into the code, which is bad user experience). This PR is a 
much-needed fix.

I left a couple of comments. As @tgraves mentioned, the style of this PR is 
inconsistent with the Spark style guide. Further, it would be good if we could 
remove several levels of indirection to make the code clearer.




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40747761
  
@andrewor14  any additional comments?




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11747100
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
+val classPathElementsToAdd = getYarnAppClasspath(conf) ++ 
getMRAppClasspath(conf)
+addToAppClasspath(env, classPathElementsToAdd)
+classPathElementsToAdd
+  }
 
-val mrClasspathEntries = Option(conf.getStrings(
-  "mapreduce.application.classpath")).getOrElse(
-getDefaultMRApplicationClasspath())
-if (mrClasspathEntries != null) {
-  for (c <- mrClasspathEntries) {
-YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-  File.pathSeparator)
-  }
-}
+  protected[yarn] def getYarnAppClasspath(conf: Configuration) = 
+getAppClasspathForKey(YarnConfiguration.YARN_APPLICATION_CLASSPATH, 
conf)(getDefaultYarnApplicationClasspath)
+
+  protected[yarn] def getMRAppClasspath(conf: Configuration) =
+getAppClasspathForKey("mapreduce.application.classpath", 
conf)(getDefaultMRApplicationClasspath)
+
+  protected[yarn] def addToAppClasspath(env: HashMap[String, String], 
elements : Iterable[String]) {
--- End diff --

Also, in Spark we try not to use `protected[*]`, since it's not exactly 
clear what that does. I think these can just be `private`?
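
The difference in a sketch (`Helpers` is a hypothetical object):

```
package org.apache.spark.deploy.yarn

object Helpers {
  // protected[yarn]: reachable from anywhere under the enclosing yarn package
  protected[yarn] def packageVisible(): Unit = ()

  // private: reachable only from inside Helpers -- usually all a helper needs
  private def objectLocal(): Unit = ()
}
```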




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/362#discussion_r11747228
  
--- Diff: docs/running-on-yarn.md ---
@@ -42,6 +42,7 @@ System Properties:
 * `spark.yarn.preserve.staging.files`, set to true to preserve the staged 
files(spark jar, app jar, distributed cache files) at the end of the job rather 
then delete them.
 * `spark.yarn.scheduler.heartbeat.interval-ms`, the interval in ms in 
which the Spark application master heartbeats into the YARN ResourceManager. 
Default is 5 seconds. 
 * `spark.yarn.max.executor.failures`, the maximum number of executor 
failures before failing the application. Default is the number of executors 
requested times 2 with minimum of 3.
+* `spark.yarn.historyServer.address`, the address of the Spark history 
server (i.e. host.com:18080). The address should not contain a scheme 
(http://). Defaults to not being set since the history server is an optional 
service. This address is given to the Yarn ResourceManager when the Spark 
application finishes to link the application from the ResourceManaer UI to the 
Spark history server UI. 
--- End diff --

ResourceManager* (last sentence)




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40748111
  
One small typo, but other than that this LGTM




[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40748387
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14209/




[GitHub] spark pull request: Reuses Row object in ExistingRdd.productToRowR...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/432#issuecomment-40748385
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40752477
  
Merged build started. 




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40752469
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11751772
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
--- End diff --

@andrewor14 procedure syntax (omitting the `=`) is actually going to be 
deprecated in a future version of Scala, so the approach here is correct.




[GitHub] spark pull request: [SPARK-1522] : YARN ClientBase throws a NPE if...

2014-04-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/433#discussion_r11753045
  
--- Diff: 
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -378,55 +378,48 @@ object ClientBase {
   val APP_JAR: String = "app.jar"
   val LOG4J_PROP: String = "log4j.properties"
 
-  // Based on code from org.apache.hadoop.mapreduce.v2.util.MRApps
-  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) {
-val classpathEntries = Option(conf.getStrings(
-  YarnConfiguration.YARN_APPLICATION_CLASSPATH)).getOrElse(
-getDefaultYarnApplicationClasspath())
-for (c <- classpathEntries) {
-  YarnSparkHadoopUtil.addToEnvironment(env, 
Environment.CLASSPATH.name, c.trim,
-File.pathSeparator)
-}
+  def populateHadoopClasspath(conf: Configuration, env: HashMap[String, 
String]) = {
--- End diff --

Oh I see. Should we specify `: Unit = {` in the new style then? Or is the 
`Unit` optional?
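
For reference, the three forms under discussion (a sketch; the first is the
so-called procedure syntax):

```
// Procedure syntax: no "=", result type forced to Unit; slated for deprecation.
def populate1(xs: Seq[String]) { xs.foreach(println) }

// "=" with the result type left to inference (Unit here, but only implicitly):
def populate2(xs: Seq[String]) = { xs.foreach(println) }

// Explicit ": Unit =" -- unambiguous about both intent and result type.
def populate3(xs: Seq[String]): Unit = { xs.foreach(println) }
```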




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40761472
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14210/




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40761471
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40762962
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40762975
  
Merged build started. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40763303
  
The updated patch places the Python files in the Spark jar itself, so no 
additional build or configuration steps are required. I've only had the chance 
to test it on a pseudo-distributed YARN cluster so far.




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40763499
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40763512
  
Merged build started. 




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40765918
  
thanks, I committed this to master and branch-1.0




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/362




[GitHub] spark pull request: [SPARK-1520] remove fastutil from dependencies

2014-04-17 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/437

[SPARK-1520] remove fastutil from dependencies

A quick fix for https://issues.apache.org/jira/browse/SPARK-1520

By excluding fastutil, we bring the number of files in the assembly jar 
back under 65536, so Java 7 won't create the assembly jar in zip64 format, 
which cannot be read by Java 6.
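
For illustration, the sbt shape of such a transitive exclusion (the
coordinates below are illustrative assumptions, not necessarily what this
patch touches):

```
// build.sbt sketch: drop fastutil wherever it arrives transitively.
libraryDependencies += ("com.clearspring.analytics" % "stream" % "2.5.1")
  .exclude("it.unimi.dsi", "fastutil")
```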

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark remove-fastutil

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #437


commit 00f9bebb973d87499a6d360545574b3cb0b00b0a
Author: Xiangrui Meng 
Date:   2014-04-17T21:54:01Z

remove fastutil from dependencies






[GitHub] spark pull request: [SPARK-1520] remove fastutil from dependencies

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/437#issuecomment-40767815
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1520] remove fastutil from dependencies

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/437#issuecomment-40767829
  
Merged build started. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40770822
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14211/




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40770821
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40771150
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40771152
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14212/




[GitHub] spark pull request: [SPARK-1520] remove fastutil from dependencies

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/437#issuecomment-40771151
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1520] remove fastutil from dependencies

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/437#issuecomment-40771153
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14213/




[GitHub] spark pull request: SPARK-1496: Have jarOfClass return Option[Stri...

2014-04-17 Thread pwendell
GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/438

SPARK-1496: Have jarOfClass return Option[String]

A simple change, mostly had to change a bunch of example code.
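
A sketch of the Option-returning shape, reconstructed from the PR title
(the actual diff may differ):

```
def jarOfClass(cls: Class[_]): Option[String] = {
  val uri = cls.getResource("/" + cls.getName.replace('.', '/') + ".class")
  if (uri != null && uri.toString.startsWith("jar:file:")) {
    val s = uri.toString
    Some(s.substring("jar:file:".length, s.indexOf('!')))  // path to the jar
  } else {
    None  // class was not loaded from a jar
  }
}
```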

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark jar-of-class

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/438.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #438


commit aa010ff087c3584094bafb9bb89a59b2af8c1591
Author: Patrick Wendell 
Date:   2014-04-17T22:46:37Z

SPARK-1496: Have jarOfClass return Option[String]






[GitHub] spark pull request: SPARK-1496: Have jarOfClass return Option[Stri...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/438#issuecomment-40771684
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1496: Have jarOfClass return Option[Stri...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/438#issuecomment-40771693
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/422#issuecomment-40773267
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/422#issuecomment-40773276
  
Merged build started. 




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40773808
  
Jenkins, retest this please.




[GitHub] spark pull request: SPARK-1004. PySpark on YARN

2014-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/30#issuecomment-40773861
  
 Merged build triggered. 



