master attempted to re-register the worker and then took all workers as unregistered

2014-01-14 Thread Nan Zhu
Hi, all  

I’m trying to deploy Spark in standalone mode, and everything goes as usual:

the web UI is accessible, and the master node wrote some logs saying all workers
are registered.

14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started  
14/01/15 01:37:31 INFO ActorSystemImpl: 
RemoteServerStarted@akka://sparkMaster@172.31.36.93:7077
14/01/15 01:37:31 INFO Master: Starting Spark master at 
spark://172.31.36.93:7077
14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at 
http://ip-172-31-36-93.us-west-2.compute.internal:8080
14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
14/01/15 01:37:34 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:37:34 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:37:34 INFO Master: Registering worker 
ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sparkwor...@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:37:34 INFO Master: Registering worker 
ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO Master: Registering worker 
ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sparkwor...@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:37:34 INFO Master: Registering worker 
ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sparkwor...@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:37:34 INFO Master: Registering worker 
ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO ActorSystemImpl: 
RemoteClientStarted@akka://sp...@ip-172-31-37-160.us-west-2.compute.internal:43086




However, when I launched an application, the master first “attempted to 
re-register the worker” and then said that all heartbeats are from 
“unregistered” workers. Can anyone tell me what happened here?

14/01/15 01:38:44 INFO Master: Registering app ALS  
14/01/15 01:38:44 INFO Master: Registered app ALS with ID 
app-20140115013844-
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/0 on 
worker worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/1 on 
worker worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/2 on 
worker worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/3 on 
worker worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-/4 on 
worker worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:44 INFO Master: Registering worker 
ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: 
akka://sparkwor...@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:38:44 INFO Master: Registering worker 
ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: 
akka://sparkwor...@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:38:44 INFO Master: Registering worker 
ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: 
akka://sparkwor...@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:38:44 INFO Master: Registering worker 
ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: 
akka://sparkwor...@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:38:44 INFO Master: Registering worker 
ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: 
akka://sparkwor...@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker 
worker-20140115013844-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker 
worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker 
worker-20140115013844-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 WARN Master: Got he

Re: Spark SequenceFile Java API Repeat Key Values

2014-01-14 Thread Michael Quinlan
Matei and Andrew,

Thank you both for your prompt responses. Matei is correct in that I am
attempting to cache a large RDD for repeated query.

I was able to implement your suggestion in a Scala version of the code,
which I've copied below. I should point out two minor details:
LongWritable.clone() is a private method and both the key and value need to
be "cloned" in order for the data to be cached correctly.

My attempt at a Java version wasn't as successful. If you don't mind, could
you please suggest a better way if it currently exists? This is mostly
educational since I already have a working version in Scala. I'm new to
both.

Regards,

Mike

Java:

public class App
{
    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("Usage: SynthesisService <master> <sequenceFile> <jar>");
            System.exit(1);
        }

        System.setProperty("spark.serializer",
                "org.apache.spark.serializer.KryoSerializer");
        System.setProperty("spark.kryo.registrator",
                "spark.synthesis.service.Init.MyRegistrator");

        JavaSparkContext ctx = new JavaSparkContext(args[0],
                "SynthesisService",
                "~/spark-0.8.0-incubating", args[2]);

        // Load DataCube via Spark sequenceFile
        JavaPairRDD<LongWritable, LongWritable> temp_DataCube =
                ctx.sequenceFile(args[1],
                        LongWritable.class, LongWritable.class);

        JavaRDD<Tuple2<LongWritable, LongWritable>> DataCube;
        DataCube = temp_DataCube.map(
                new Function2<LongWritable, LongWritable,
                        Tuple2<LongWritable, LongWritable>>()
                {
                    @Override
                    public Tuple2<LongWritable, LongWritable>
                            call(LongWritable key, LongWritable value) {
                        return (new Tuple2<LongWritable, LongWritable>(
                                new LongWritable(key.get()), value));
                    }
                });

-
COMPILATION ERROR : 
-
spark/synthesis/service/Init/App.java:[51,32] error: no suitable method
found for map(<anonymous Function2<LongWritable,LongWritable,Tuple2<LongWritable,LongWritable>>>)
1 error

Scala:

package testspark

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.serializer.KryoRegistrator

import org.apache.hadoop.io.LongWritable

import com.esotericsoftware.kryo.Kryo

class MyRegistrator extends KryoRegistrator {
  def registerClasses(kryo: Kryo) {
    kryo.register(classOf[LongWritable])
    kryo.register(classOf[Tuple2[LongWritable, LongWritable]])
  }
}

object ScalaSynthesisServer {

  def pseudoClone(x: LongWritable, y: LongWritable): (LongWritable, LongWritable) = {
    new Tuple2(new LongWritable(x.get()), new LongWritable(y.get()))
  }

  def main(args: Array[String]) {
    if (args.length < 3) {
      System.err.println("Usage: ScalaSynthesisServer <master> <sequenceFile> <jar>")
      System.exit(1)
    }

    System.setProperty("spark.serializer",
      "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "testspark.MyRegistrator")

    val sc = new SparkContext(args(0),
      "ScalaSynthesisServer", "~/spark-0.8.0-incubating", List(args(2)))

    val DataCube = sc.sequenceFile(args(1), classOf[LongWritable],
      classOf[LongWritable]).map(a => pseudoClone(a._1, a._2))

    DataCube.cache()

    val list = DataCube.collect()

    for (x <- list) {
      println("Key= " + x._1 + " Value= " + x._2)
    }
  }
}




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SequenceFile-Java-API-Repeat-Key-Values-tp353p552.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark writing to disk when there's enough memory?!

2014-01-14 Thread Matei Zaharia
Hey Majd,

I believe Shark sets up data to spill to disk, even though the default storage 
level in Spark is memory-only. In terms of those executors, it looks like data 
distribution was unbalanced across them, possibly due to data locality in HDFS 
(some of the executors may have had more data). One thing you can do to prevent 
that is set Spark's data locality delay for disk to 0 
(spark.locality.wait.node=0 and spark.locality.wait.rack=0). It will still 
respect memory locality but not try to optimize disk locality on HDFS.
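
For reference, a minimal sketch of setting those two properties from application 
code (Spark 0.8.x style; the master URL and app name below are placeholders):

import org.apache.spark.SparkContext

// Must be set before the SparkContext is created; 0 disables the extra
// locality wait for node-local and rack-local (i.e. disk-locality) scheduling.
System.setProperty("spark.locality.wait.node", "0")
System.setProperty("spark.locality.wait.rack", "0")

val sc = new SparkContext("spark://master:7077", "MyApp")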

Matei

On Jan 13, 2014, at 4:24 AM, mharwida  wrote:

> Hi All,
> 
> I'm creating a cached table in memory via Shark using the command:
> create table tablename_cached as select * from tablename;
> 
> Monitoring this via the Spark UI, I have noticed that data is being written
> to disk when there's clearly enough available memory on 2 of the worker
> nodes. Please refer to attached image. Cass4 and Cass3 have 3GB of available
> memory yet the data is being written to disk on the worker nodes which have
> used all their memory.
> 
>  
> 
>  
> 
> Could anyone shed a light on this please?
> 
> Thanks
> Majd
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-writing-to-disk-when-there-s-enough-memory-tp502.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Stalling during large iterative PySpark jobs

2014-01-14 Thread Matei Zaharia
Hi Jeremy,

If you look at the stdout and stderr files on that worker, do you see any 
earlier errors? I wonder if one of the Python workers crashed earlier.

It would also be good to run “top” and see if more memory is used during the 
computation. I guess the cached RDD itself fits in less than 50% of the RAM as 
you said?

Matei


On Jan 12, 2014, at 8:45 PM, Jeremy Freeman  wrote:

> I'm reliably getting a bug in PySpark where jobs with many iterative
> calculations on cached data stall out. 
> 
> Data is a folder of ~40 text files, each with 2 mil rows and 360 entries per
> row, total size is ~250GB. 
> 
> I'm testing with the KMeans analyses included as examples (though I see the
> same error on my own iterative algorithms). The scala version completes 50+
> iterations fine. In PySpark, it successfully completes 9 iterations, and
> then stalls. On the driver, I'll get this: 
> 
> java.net.NoRouteToHostException: Cannot assign requested address 
>at java.net.PlainSocketImpl.socketConnect(Native Method) 
>at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) 
>at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>  
>at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) 
>at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) 
>at java.net.Socket.connect(Socket.java:579) 
>at java.net.Socket.connect(Socket.java:528) 
>at java.net.Socket.<init>(Socket.java:425) 
>at java.net.Socket.<init>(Socket.java:208) 
>at
> org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:328)
>  
>at
> org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
>  
>at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70) 
>at
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253) 
>at
> org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251) 
>at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95) 
>at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95) 
>at scala.collection.Iterator$class.foreach(Iterator.scala:772) 
>at
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157) 
>at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190) 
>at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45) 
>at scala.collection.mutable.HashMap.foreach(HashMap.scala:95) 
>at org.apache.spark.Accumulators$.add(Accumulators.scala:251) 
>at
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:598)
>  
>at
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:376) 
>at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>  
>at
> org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149) 
> 
> But the job will continue after delivering this error, appearing to finish
> the remaining tasks, until it displays: 
> 
> INFO ClusterScheduler: Remove TaskSet 18.0 from pool 
> 
> And then just stalls. 
> 
> The web UI shows a subset of tasks completed; the number of the current task
> is the number displayed on the driver around the time the error message
> displayed. I don't see any errors in the stdout or stderr on the worker
> executing that task, just on the driver. Memory usage on all workers and
> driver are well below 50%. 
> 
> Other observations: 
> - It's data size dependent. If I load ~50 GB, it finishes 20 iterations
> before stalling. If I load ~10 GB, it finishes 35. 
> - It's not due to the multiple files; I see the same error on a single large
> file. 
> - I always get the error with 30 or 60 nodes, but I don't see it when using
> 20. 
> - For a given cluster/data size, it stalls at the same point on every run. 
> 
> I was going to test all this on EC2, in case it's something specific to our
> set up (private HPC running Spark in standalone mode, 16 cores and 100 GB
> used per node). But it'd be great if anyone had ideas in the meantime. 
> 
> Thanks!
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Stalling-during-large-iterative-PySpark-jobs-tp492.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-01-14 Thread Aureliano Buendia
On Tue, Jan 14, 2014 at 5:52 PM, Christopher Nguyen  wrote:

> Aureliano, this sort of jar-hell is something we have to deal with,
> whether Spark or elsewhere. How would you propose we fix this with Spark?
>
Do you mean that Spark's own scaffolding caused you to pull in both
> Protobuf 2.4 and 2.5?
>
I simply used the newer protobuf for higher efficiency. I had no idea this
could conflict with spark.

> Or do you mean the error message should have been more helpful?
>
That error is actually a warning, and the code emitting it doesn't even know
what went wrong; it asks the user to check the web UI for two unrelated
points: (1) that the workers are registered and (2) that there is enough
memory:

https://github.com/apache/incubator-spark/blob/fdaabdc67387524ffb84354f87985f48bd31cf60/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L150-L156

In my case, Spark has no idea that Hadoop is failing. I think the error
checking above is weak. If the workers are not registered, Spark must report
so. More importantly, if there is not enough memory, Spark must be able to
report exactly how much memory is needed, and since it knows all about the
allocated resources, it should even tell the user the size of the memory
shortfall.

Another major problem is the settings mess in Spark.

You can set the spark.executor.memory property, or you can set the SPARK_MEM
environment variable. Even after you set these, they are not tied to the Java
heap size, so you need to set that up too, as spark-class does.
Then there is another parameter: SPARK_WORKER_MEMORY.

So the user has to fiddle with many parameters to get rid of that warning,
and even after doing that it is not clear whether that set of parameters uses
the resources optimally. Spark could probably automate this as much as
possible.
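
As a concrete sketch of that fiddling (0.8.x style; the master URL, app name and
memory value below are placeholders, not recommendations):

import org.apache.spark.SparkContext

// Per-executor memory is a Java system property and must be set before
// the SparkContext is created.
System.setProperty("spark.executor.memory", "2g")

val sc = new SparkContext("spark://master:7077", "MyApp")

// SPARK_MEM and SPARK_WORKER_MEMORY, by contrast, are environment variables
// (typically exported in conf/spark-env.sh), so the same concern ends up
// being expressed in several different places.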

> Sent while mobile. Pls excuse typos etc.
> On Jan 14, 2014 9:27 AM, "Aureliano Buendia"  wrote:
>
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur > > wrote:
>>
>>> How much memory you are setting for exector JVM.
>>> This problem comes when either there is a communication problem between
>>> Master/Worker. or you do not have any memory left. Eg, you specified 75G
>>> for your executor and your machine has a memory of 70G.
>>>
>>
>> This was not a memory problem. This could be considered a spark bug.
>>
>> Here is what happened: My app was using protobuf 2.5, while spark has a
>> protobuf 2.4 dependency, and classpath was like this:
>>
>> my_app.jar:spark_assembly.jar:..
>>
>> This caused spark, (or a dependency, probably hadoop) to use protobuf
>> 2.5, giving that misleading 'ensure that workers are registered and have
>> sufficient memory' error.
>>
>> Regenerating this error is easy, just download protobuf 2.5 and put it at
>> the beginning of your classpath for any app, you should get that error.
>>
>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia >> > wrote:
>>>
 The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
 values.

 There are many issues regarding the Initial job has not accepted any
 resources... error though:

- When I put my assembly jar 
 *before*spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error 
 happens.
Moving my jar after the spark-assembly it works fine.
In my case, I need to put my jar before spark-assembly, as my jar
uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
- Sometimes when this error happens the whole cluster server must
be restarted, or even run-example script wouldn't work. It took me a 
 while
to find this out, making debugging very time consuming.
- The error message is absolutely irrelevant.

 I guess the problem should be somewhere with the spark context jar
 delivery part.


 On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia >>> > wrote:

>
>
>
> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia  > wrote:
>
>> Just follow the docs at
>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scalafor
>>  how to run an application. Spark is designed so that you can simply run
>> your application *without* any scripts whatsoever, and submit your JAR to
>> the SparkContext constructor, which will distribute it. You can launch 
>> your
>> application with “scala”, “java”, or whatever tool you’d prefer.
>>
>
> I'm afraid what you said about 'simply run your application *without*
> any scripts whatsoever' does not apply to spark at the moment, and it
> simply does not work.
>
> Try the simple Pi calculation this on a standard spark-ec2 instance:
>
> java -cp
> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembltarget/scala-2.9.3/

RE: squestion on using spark parallelism vs using num partitions in spark api

2014-01-14 Thread Hussam_Jarada
I am using local

Thanks,
Hussam

From: Huangguowei [mailto:huangguo...@huawei.com]
Sent: Tuesday, January 14, 2014 4:43 AM
To: user@spark.incubator.apache.org
Subject: RE: squestion on using spark parallelism vs using num partitions in
spark api

“Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node”

Local or standalone (single node)?

From: leosand...@gmail.com
[mailto:leosand...@gmail.com]
Sent: Tuesday, January 14, 2014 13:42
To: user
Subject: Re: squestion on using spark parallelism vs using num partitions in spark
api

I think the parallelism param just controls how many tasks can run together
in each worker; it can't control how many tasks the job is split into.


leosand...@gmail.com

From: hussam_jar...@dell.com
Date: 2014-01-14 09:17
To: user@spark.incubator.apache.org
Subject: squestion on using spark parallelism vs using num partitions in spark 
api
Hi,

Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node

It looks like setting spark.default.parallelism via
System.setProperty("spark.default.parallelism", 24) before creating my Spark
context, as described in
http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism
has no effect on the default number of partitions that Spark uses in its APIs
like saveAsTextFile().

For example, if I set spark.default.parallelism to 24, I was expecting 24 tasks
to be invoked upon calling saveAsTextFile(), but that is not the case: I am
seeing only 1 task get invoked.

If I instead create my RDD with parallelize() and 2 partitions, as in
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata,2);
and then invoke
dataSetRDD.saveAsTextFile(JavaRddFilePath);

I am seeing 2 tasks get invoked even though spark.default.parallelism was set to 24.

Can someone explain the above behavior?

Thanks,
Hussam
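
For what it's worth, a minimal sketch of the relationship being discussed
(Scala, 0.8.x API; the master URL, data and output path are placeholders):
saveAsTextFile() runs one task and writes one part-* file per partition, and an
explicit partition count passed to parallelize() always wins over any default.

import org.apache.spark.SparkContext

val sc = new SparkContext("local[8]", "PartitionCountSketch")  // placeholders
val mydata = 1 to 100                                          // placeholder data

val rdd = sc.parallelize(mydata, 24)           // explicitly 24 partitions
rdd.saveAsTextFile("/tmp/parallelism-sketch")  // -> 24 tasks, 24 part-* files

// With no explicit count, parallelize(mydata) falls back to a default supplied
// by the scheduler; spark.default.parallelism is a hint for that default (and
// for shuffle outputs), but it does not re-partition an RDD whose partition
// count is already fixed, which is why the explicit "2" in the quoted snippet
// produced exactly 2 tasks.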


Re: Controlling hadoop block size

2014-01-14 Thread Aureliano Buendia
On Tue, Jan 14, 2014 at 5:00 PM, Archit Thakur wrote:

> Hadoop block size decreased, do you mean HDFS block size? That is not
> possible.
>

Sorry for the terminology mix-up. In my question, 'hadoop block size' should
probably be replaced by 'number of RDD partitions'.

I'm getting a large number of small files (named part-*), and I'd like to
get a smaller number of larger files.

I used something like:

val rdd1 = sc.parallelize(Range(0, N, 1)) // N ~ 1e3
val rdd2 = rdd1.cartesian(rdd1)

Is the number of part-* files determined by rdd2.partitions.length?

Is there a way to keep the size of each part-* file constant (e.g. 64 MB),
regardless of other parameters, including the number of available cores and
scheduled tasks?
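
For reference, a minimal sketch of shrinking the number of output partitions
(and hence part-* files) before writing, assuming the 0.8.x RDD API; the master
URL, target count of 16 and output path are placeholders:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // saveAsSequenceFile on pair RDDs

val sc = new SparkContext("local[2]", "CoalesceSketch")   // placeholder context
val rdd1 = sc.parallelize(Range(0, 1000, 1))
val rdd2 = rdd1.cartesian(rdd1)

// One part-* file is written per partition, so rdd2.partitions.length does
// determine the file count; coalescing before the save shrinks that count
// and therefore grows each individual file.
rdd2.coalesce(16).saveAsSequenceFile("/tmp/datacube-coalesced")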


> Block size of HDFS is never affected by your spark jobs.
>
> "For a big number of tasks, I get a very high number of 1 MB files
> generated by saveAsSequenceFile()."
>
> What do you mean by "big number of tasks"
>
> No. of files generated by saveAsSequenceFile() increases if your
> partitions of RDD are increased.
>
> Are you using your custom RDD? If Yes, you would have overridden the
> method getPartitions - Check that.
> If not, you might have used an operation where you specify your
> partitioner or no. of output partitions, eg. groupByKey() - Check that.
>
> "How is it possible to control the block size by spark?" Do you mean "How
> is it possible to control the output partitions of an RDD?"
>
>
> On Tue, Jan 14, 2014 at 7:59 AM, Aureliano Buendia 
> wrote:
>
>> Hi,
>>
>> Does the output hadoop block size depend on spark tasks number?
>>
>> In my application, when the number of tasks increases, hadoop block size
>> decreases. For a big number of tasks, I get a very high number of 1 MB
>> files generated by saveAsSequenceFile().
>>
>> How is it possible to control the block size by spark?
>>
>
>


Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-01-14 Thread Christopher Nguyen
Aureliano, this sort of jar-hell is something we have to deal with, whether
Spark or elsewhere. How would you propose we fix this with Spark? Do you
mean that Spark's own scaffolding caused you to pull in both Protobuf 2.4
and 2.5? Or do you mean the error message should have been more helpful?

Sent while mobile. Pls excuse typos etc.
On Jan 14, 2014 9:27 AM, "Aureliano Buendia"  wrote:

>
>
>
> On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur 
> wrote:
>
>> How much memory you are setting for exector JVM.
>> This problem comes when either there is a communication problem between
>> Master/Worker. or you do not have any memory left. Eg, you specified 75G
>> for your executor and your machine has a memory of 70G.
>>
>
> This was not a memory problem. This could be considered a spark bug.
>
> Here is what happened: My app was using protobuf 2.5, while spark has a
> protobuf 2.4 dependency, and classpath was like this:
>
> my_app.jar:spark_assembly.jar:..
>
> This caused spark, (or a dependency, probably hadoop) to use protobuf 2.5,
> giving that misleading 'ensure that workers are registered and have
> sufficient memory' error.
>
> Regenerating this error is easy, just download protobuf 2.5 and put it at
> the beginning of your classpath for any app, you should get that error.
>
>
>>
>>
>> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia 
>> wrote:
>>
>>> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
>>> values.
>>>
>>> There are many issues regarding the Initial job has not accepted any
>>> resources... error though:
>>>
>>>- When I put my assembly jar 
>>> *before*spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error 
>>> happens.
>>>Moving my jar after the spark-assembly it works fine.
>>>In my case, I need to put my jar before spark-assembly, as my jar
>>>uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>>>- Sometimes when this error happens the whole cluster server must be
>>>restarted, or even run-example script wouldn't work. It took me a while 
>>> to
>>>find this out, making debugging very time consuming.
>>>- The error message is absolutely irrelevant.
>>>
>>> I guess the problem should be somewhere with the spark context jar
>>> delivery part.
>>>
>>>
>>> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia 
>>> wrote:
>>>



 On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia 
 wrote:

> Just follow the docs at
> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scalafor
>  how to run an application. Spark is designed so that you can simply run
> your application *without* any scripts whatsoever, and submit your JAR to
> the SparkContext constructor, which will distribute it. You can launch 
> your
> application with “scala”, “java”, or whatever tool you’d prefer.
>

 I'm afraid what you said about 'simply run your application *without*
 any scripts whatsoever' does not apply to spark at the moment, and it
 simply does not work.

 Try the simple Pi calculation this on a standard spark-ec2 instance:

 java -cp
 /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembltarget/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
 org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`

 And you'll get the error:

 WARN cluster.ClusterScheduler: Initial job has not accepted any
 resources; check your cluster UI to ensure that workers are registered and
 have sufficient memory

 While the script way works:

 spark/run-example org.apache.spark.examples.SparkPi `cat
 spark-ec2/cluster-url`

 What am I missing in the above java command?


>
> Matei
>
> On Jan 8, 2014, at 8:26 PM, Aureliano Buendia 
> wrote:
>
>
>
>
> On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia  > wrote:
>
>> Oh, you shouldn’t use spark-class for your own classes. Just build
>> your job separately and submit it by running it with “java” and creating 
>> a
>> SparkContext in it. spark-class is designed to run classes internal to 
>> the
>> Spark project.
>>
>
> Really? Apparently Eugen runs his jobs by:
>
>
> $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob
>
> , as he instructed me 
> hereto
>  do this.
>
> I have to say while spark documentation is not sparse, it does not
> address enough, and as you can see the community is confused.
>
> Are the spark users supposed to create something like run-example for
> their own jobs?
>
>
>>
>> Matei
>>
>> On Jan 8, 2014, at 8:06 PM, Aureliano Buendia 
>> wrote:
>>
>>
>>
>>
>> On Thu, Jan 9, 2014 at 3:59 AM, Mat

Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-01-14 Thread Aureliano Buendia
On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur wrote:

> How much memory you are setting for exector JVM.
> This problem comes when either there is a communication problem between
> Master/Worker. or you do not have any memory left. Eg, you specified 75G
> for your executor and your machine has a memory of 70G.
>

This was not a memory problem. This could be considered a Spark bug.

Here is what happened: my app was using protobuf 2.5, while Spark has a
protobuf 2.4 dependency, and the classpath was like this:

my_app.jar:spark_assembly.jar:..

This caused Spark (or a dependency, probably Hadoop) to use protobuf 2.5,
producing that misleading 'ensure that workers are registered and have
sufficient memory' error.

Reproducing this error is easy: just download protobuf 2.5, put it at the
beginning of your classpath for any app, and you should get that error.


>
>
> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia 
> wrote:
>
>> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
>> values.
>>
>> There are many issues regarding the Initial job has not accepted any
>> resources... error though:
>>
>>- When I put my assembly jar 
>> *before*spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error 
>> happens.
>>Moving my jar after the spark-assembly it works fine.
>>In my case, I need to put my jar before spark-assembly, as my jar
>>uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>>- Sometimes when this error happens the whole cluster server must be
>>restarted, or even run-example script wouldn't work. It took me a while to
>>find this out, making debugging very time consuming.
>>- The error message is absolutely irrelevant.
>>
>> I guess the problem should be somewhere with the spark context jar
>> delivery part.
>>
>>
>> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia 
>> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia 
>>> wrote:
>>>
 Just follow the docs at
 http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scalafor
  how to run an application. Spark is designed so that you can simply run
 your application *without* any scripts whatsoever, and submit your JAR to
 the SparkContext constructor, which will distribute it. You can launch your
 application with “scala”, “java”, or whatever tool you’d prefer.

>>>
>>> I'm afraid what you said about 'simply run your application *without*
>>> any scripts whatsoever' does not apply to spark at the moment, and it
>>> simply does not work.
>>>
>>> Try the simple Pi calculation this on a standard spark-ec2 instance:
>>>
>>> java -cp
>>> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembltarget/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
>>> org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>>
>>> And you'll get the error:
>>>
>>> WARN cluster.ClusterScheduler: Initial job has not accepted any
>>> resources; check your cluster UI to ensure that workers are registered and
>>> have sufficient memory
>>>
>>> While the script way works:
>>>
>>> spark/run-example org.apache.spark.examples.SparkPi `cat
>>> spark-ec2/cluster-url`
>>>
>>> What am I missing in the above java command?
>>>
>>>

 Matei

 On Jan 8, 2014, at 8:26 PM, Aureliano Buendia 
 wrote:




 On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia 
 wrote:

> Oh, you shouldn’t use spark-class for your own classes. Just build
> your job separately and submit it by running it with “java” and creating a
> SparkContext in it. spark-class is designed to run classes internal to the
> Spark project.
>

 Really? Apparently Eugen runs his jobs by:

 $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob

 , as he instructed me 
 hereto
  do this.

 I have to say while spark documentation is not sparse, it does not
 address enough, and as you can see the community is confused.

 Are the spark users supposed to create something like run-example for
 their own jobs?


>
> Matei
>
> On Jan 8, 2014, at 8:06 PM, Aureliano Buendia 
> wrote:
>
>
>
>
> On Thu, Jan 9, 2014 at 3:59 AM, Matei Zaharia  > wrote:
>
>> Have you looked at the cluster UI, and do you see any workers
>> registered there, and your application under running applications? Maybe
>> you typed in the wrong master URL or something like that.
>>
>
> No, it's automated: cat spark-ec2/cluster-url
>
> I think the problem might be caused by spark-class script. It seems to
> assign too much memory.
>
> I forgot the fact that run-example doesn't use spark-class.
>
>
>>
>> Matei
>>
>> On Jan 8, 2014, at 7:07 PM, Aureliano

Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-01-14 Thread Archit Thakur
How much memory are you setting for the executor JVM?
This problem comes when either there is a communication problem between
Master and Worker, or you do not have any memory left, e.g. you specified 75G
for your executor but your machine only has 70G of memory.


On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia wrote:

> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
> values.
>
> There are many issues regarding the Initial job has not accepted any
> resources... error though:
>
>- When I put my assembly jar 
> *before*spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error 
> happens.
>Moving my jar after the spark-assembly it works fine.
>In my case, I need to put my jar before spark-assembly, as my jar uses
>protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>- Sometimes when this error happens the whole cluster server must be
>restarted, or even run-example script wouldn't work. It took me a while to
>find this out, making debugging very time consuming.
>- The error message is absolutely irrelevant.
>
> I guess the problem should be somewhere with the spark context jar
> delivery part.
>
>
> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia wrote:
>
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia wrote:
>>
>>> Just follow the docs at
>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scalafor
>>>  how to run an application. Spark is designed so that you can simply run
>>> your application *without* any scripts whatsoever, and submit your JAR to
>>> the SparkContext constructor, which will distribute it. You can launch your
>>> application with “scala”, “java”, or whatever tool you’d prefer.
>>>
>>
>> I'm afraid what you said about 'simply run your application *without* any
>> scripts whatsoever' does not apply to spark at the moment, and it simply
>> does not work.
>>
>> Try the simple Pi calculation this on a standard spark-ec2 instance:
>>
>> java -cp
>> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembltarget/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
>> org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>
>> And you'll get the error:
>>
>> WARN cluster.ClusterScheduler: Initial job has not accepted any
>> resources; check your cluster UI to ensure that workers are registered and
>> have sufficient memory
>>
>> While the script way works:
>>
>> spark/run-example org.apache.spark.examples.SparkPi `cat
>> spark-ec2/cluster-url`
>>
>> What am I missing in the above java command?
>>
>>
>>>
>>> Matei
>>>
>>> On Jan 8, 2014, at 8:26 PM, Aureliano Buendia 
>>> wrote:
>>>
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia 
>>> wrote:
>>>
 Oh, you shouldn’t use spark-class for your own classes. Just build your
 job separately and submit it by running it with “java” and creating a
 SparkContext in it. spark-class is designed to run classes internal to the
 Spark project.

>>>
>>> Really? Apparently Eugen runs his jobs by:
>>>
>>> $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob
>>>
>>> , as he instructed me 
>>> hereto
>>>  do this.
>>>
>>> I have to say while spark documentation is not sparse, it does not
>>> address enough, and as you can see the community is confused.
>>>
>>> Are the spark users supposed to create something like run-example for
>>> their own jobs?
>>>
>>>

 Matei

 On Jan 8, 2014, at 8:06 PM, Aureliano Buendia 
 wrote:




 On Thu, Jan 9, 2014 at 3:59 AM, Matei Zaharia 
 wrote:

> Have you looked at the cluster UI, and do you see any workers
> registered there, and your application under running applications? Maybe
> you typed in the wrong master URL or something like that.
>

 No, it's automated: cat spark-ec2/cluster-url

 I think the problem might be caused by spark-class script. It seems to
 assign too much memory.

 I forgot the fact that run-example doesn't use spark-class.


>
> Matei
>
> On Jan 8, 2014, at 7:07 PM, Aureliano Buendia 
> wrote:
>
> The strange thing is that spark examples work fine, but when I include
> a spark example in my jar and deploy it, I get this error for the very 
> same
> example:
>
> WARN ClusterScheduler: Initial job has not accepted any resources;
> check your cluster UI to ensure that workers are registered and have
> sufficient memory
>
> My jar is deployed to master and then to workers by
> spark-ec2/copy-dir. Why would including the example in my jar cause this
> error?
>
>
>
> On Thu, Jan 9, 2014 at 12:41 AM, Aureliano Buendia <
> buendia...@gmail.com> wrote:
>
>> Could someone explain how SPARK_MEM, SPARK_WORKER_MEMORY and
>> spark.executor.me

Re: Getting java.netUnknownHostException

2014-01-14 Thread Archit Thakur
Try running ./bin/start-slave.sh 1 spark://A-IP:PORT.

Thx, Archit_Thakur.


On Sat, Jan 11, 2014 at 7:18 AM, Khanderao kand wrote:

> For "java.netUnknownHostException" Did you check something basic that you
> are able to connect to A from B? and checked /etc/hosts?
>
>
> On Fri, Jan 10, 2014 at 7:58 AM, Mark Hamstra wrote:
>
>> Which Spark version?  The 'run' script no longer exists in current Spark.
>>
>>
>> > On Jan 10, 2014, at 4:57 AM, Rishi  wrote:
>> >
>> > I am trying to run standalone mode of spark.
>> >
>> > I have 2 machines, A and B.
>> > I run master on machine A by running this command : ./run
>> > spark.deploy.master.Master
>> >
>> > I tried to run a worker on machine B by this command : ./run
>> > spark.deploy.worker.Worker spark://:7077
>> >
>> > But i am getting this error :
>> >
>> > java.netUnknownHostException System-erro
>> >
>> > How can I resolve this ?
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Getting-java-netUnknownHostException-tp439.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>


Re: Controlling hadoop block size

2014-01-14 Thread Archit Thakur
Hadoop block size decreased - do you mean HDFS block size? That is not
possible.

The HDFS block size is never affected by your Spark jobs.

"For a big number of tasks, I get a very high number of 1 MB files
generated by saveAsSequenceFile()."

What do you mean by "big number of tasks"?

The number of files generated by saveAsSequenceFile() increases if the number
of partitions of your RDD increases.

Are you using a custom RDD? If yes, you would have overridden the method
getPartitions - check that.
If not, you might have used an operation where you specify your partitioner
or the number of output partitions, e.g. groupByKey() - check that.

"How is it possible to control the block size by spark?" Do you mean "How
is it possible to control the output partitions of an RDD?"


On Tue, Jan 14, 2014 at 7:59 AM, Aureliano Buendia wrote:

> Hi,
>
> Does the output hadoop block size depend on spark tasks number?
>
> In my application, when the number of tasks increases, hadoop block size
> decreases. For a big number of tasks, I get a very high number of 1 MB
> files generated by saveAsSequenceFile().
>
> How is it possible to control the block size by spark?
>


Re: Akka error kills workers in standalone mode

2014-01-14 Thread Archit Thakur
You are getting a NullPointerException, which is why it fails. The fact that
it runs in local mode suggests you may be overlooking that many of the classes
won't be initialized on the worker executor node, even though you may have
initialized them in your master executor JVM.
To check: does your code work when you give the master as local[n] instead of
local?





On Tue, Jan 14, 2014 at 7:39 PM, vuakko  wrote:

> Spark fails to run practically any standalone mode jobs sent to it. The
> local
> mode works and spark-shell works even in standalone, but sending any other
> jobs manually fails with worker posting the following error:
>
> 2014-01-14 15:47:05,073 [sparkWorker-akka.actor.default-dispatcher-5] INFO
> org.apache.spark.deploy.worker.Worker - Connecting to master
> spark://niko-VirtualBox:7077...
> 2014-01-14 15:47:05,715 [sparkWorker-akka.actor.default-dispatcher-2] INFO
> org.apache.spark.deploy.worker.Worker - Successfully registered with master
> spark://niko-VirtualBox:7077
> 2014-01-14 15:47:23,408 [sparkWorker-akka.actor.default-dispatcher-14] INFO
> org.apache.spark.deploy.worker.Worker - Asked to launch executor
> app-20140114154723-/0 for Spark test
> 2014-01-14 15:47:23,431 [sparkWorker-akka.actor.default-dispatcher-14]
> ERROR
> akka.actor.OneForOneStrategy -
> java.lang.NullPointerException
> at java.io.File.<init>(File.java:251)
> at
>
> org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:213)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO
> org.apache.spark.deploy.worker.Worker - Starting Spark worker
> niko-VirtualBox.local:33576 with 1 cores, 6.8 GB RAM
> 2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO
> org.apache.spark.deploy.worker.Worker - Spark home:
> /home/niko/local/incubator-spark
> 2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO
> org.apache.spark.deploy.worker.ui.WorkerWebUI - Started Worker web UI at
> http://niko-VirtualBox.local:8081
> 2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO
> org.apache.spark.deploy.worker.Worker - Connecting to master
> spark://niko-VirtualBox:7077...
> 2014-01-14 15:47:23,528 [sparkWorker-akka.actor.default-dispatcher-3] INFO
> org.apache.spark.deploy.worker.Worker - Successfully registered with master
> spark://niko-VirtualBox:7077
>
>
> Master spits out the following logs at the same time:
>
>
> 2014-01-14 15:47:05,683 [sparkMaster-akka.actor.default-dispatcher-4] INFO
> org.apache.spark.deploy.master.Master - Registering worker
> niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
> 2014-01-14 15:47:23,090 [sparkMaster-akka.actor.default-dispatcher-15] INFO
> org.apache.spark.deploy.master.Master - Registering app Spark test
> 2014-01-14 15:47:23,102 [sparkMaster-akka.actor.default-dispatcher-15] INFO
> org.apache.spark.deploy.master.Master - Registered app Spark test with ID
> app-20140114154723-
> 2014-01-14 15:47:23,216 [sparkMaster-akka.actor.default-dispatcher-15] INFO
> org.apache.spark.deploy.master.Master - Launching executor
> app-20140114154723-/0 on worker
> worker-20140114154704-niko-VirtualBox.local-33576
> 2014-01-14 15:47:23,523 [sparkMaster-akka.actor.default-dispatcher-15] INFO
> org.apache.spark.deploy.master.Master - Registering worker
> niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
> 2014-01-14 15:47:23,525 [sparkMaster-akka.actor.default-dispatcher-15] INFO
> org.apache.spark.deploy.master.Master - Attempted to re-register worker at
> same address: akka.tcp://sparkWorker@niko-VirtualBox.local:33576
> 2014-01-14 15:47:23,535 [sparkMaster-akka.actor.default-dispatcher-14] WARN
> org.apache.spark.deploy.master.Master - Got heartbeat from unregistered
> worker worker-20140114154723-niko-VirtualBox.local-33576
> ...
>
> Soon after this the master decides that the worker is dead, disassociates
> it
> and marks it DEAD in the web UI. The worker process however is still alive
> and still thinks that it's connected to master (as shown by the log).
>
> I'm launching the job with the following command (last argument is the
> master, replacing local there makes things run ok):
> java -cp
>
> ./target/classes:/etc/hadoop/conf:$SPARK_HOME/conf:$SPARK_HOME/assembly/tar

Re: yarn SPARK_CLASSPATH

2014-01-14 Thread Tom Graves
The right way to set up yarn/hadoop is tricky, as it's really very dependent upon 
your usage of it. 
 
Since HBase is a hadoop service you might just add it to your hadoop config 
yarn.application.classpath and have it on the classpath for all 
users/applications of that grid.  In this way you are treating it like how it 
picks up the HDFS jars.  Not sure if you have control over that though?  The 
risk of doing this is if there are dependency conflicts or versioning issues.  
If you have other applications like mapReduce that use hbase then it might make 
sense to do this.

The other option is to modify the spark on yarn code to have it add it to the 
classpath for you.  Either just add whatever is in the SPARK_CLASSPATH to it or 
have a separate variable.  Even though we wouldn't use it I can see it as being 
useful for some installations. 

Tom



On Monday, January 13, 2014 5:58 PM, Eric K  wrote:
 
Thanks for that extra insight about YARN. I am new to the whole YARN
ecosystem, so I've been having trouble figuring out the right way to do some
things. It sounds like even though the jars are already installed on all the
nodes as part of our cluster, I should just go ahead and add them with the
--files method to simplify things and avoid having them added for all
applications.

Thanks




On Mon, Jan 13, 2014 at 3:01 PM, Tom Graves  wrote:

I'm assuming you actually installed the jar on all the yarn clusters then?
>
>
>In general this isn't a good idea on yarn as most users don't have permissions 
>to install things on the nodes themselves.  The idea is Yarn provides a 
>certain set of jars which really should be just the yarn/hadoop framework,  it 
>adds those to your classpath and the user provides everything else application 
>specific when they submit their application and those get distributed with the 
>app and added to the classpath.   If you are worried about it being downloaded 
>everytime, you can use the public distributed cache on yarn as a way to 
>distribute it and share it.  It will only be removed from that nodes 
>distributed cache if other applications need that space.
>
>
>That said what yarn adds to the classpath is configurable via the hadoop 
>configuration file yarn-site.xml, config name: yarn.application.classpath.  So 
>you can change the config to add it, but it will be added for all types of 
>applications. 
>
>
>You can use the --files and --archives options in yarn-standalone mode to use 
>the distributed cache.  To make it public, make sure permissions on the file 
>are set appropriately.
>
>
>Tom
>
>
>
>On Monday, January 13, 2014 3:49 PM, Eric Kimbrel  wrote:
> 
>Is there any extra trick required to use jars on the SPARK_CLASSPATH when 
>running spark on yarn?
>
>I have several jars added to the SPARK_CLASSPATH in spark_env.sh   When my job 
>runs i print the SPARK_CLASSPATH so i can
 see that the jars were added to the environment that the app master is running 
in, however even though the jars are on the class path I continue to get class 
not found errors.
>
>I have also tried setting SPARK_CLASSPATH via SPARK_YARN_USER_ENV
>
>

Akka error kills workers in standalone mode

2014-01-14 Thread vuakko
Spark fails to run practically any standalone-mode jobs sent to it. Local
mode works, and spark-shell works even in standalone mode, but submitting any
other job manually fails, with the worker posting the following error:

2014-01-14 15:47:05,073 [sparkWorker-akka.actor.default-dispatcher-5] INFO 
org.apache.spark.deploy.worker.Worker - Connecting to master
spark://niko-VirtualBox:7077...
2014-01-14 15:47:05,715 [sparkWorker-akka.actor.default-dispatcher-2] INFO 
org.apache.spark.deploy.worker.Worker - Successfully registered with master
spark://niko-VirtualBox:7077
2014-01-14 15:47:23,408 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Asked to launch executor
app-20140114154723-/0 for Spark test
2014-01-14 15:47:23,431 [sparkWorker-akka.actor.default-dispatcher-14] ERROR
akka.actor.OneForOneStrategy - 
java.lang.NullPointerException
at java.io.File.<init>(File.java:251)
at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:213)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Starting Spark worker
niko-VirtualBox.local:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,514 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Spark home:
/home/niko/local/incubator-spark
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.ui.WorkerWebUI - Started Worker web UI at
http://niko-VirtualBox.local:8081
2014-01-14 15:47:23,517 [sparkWorker-akka.actor.default-dispatcher-14] INFO 
org.apache.spark.deploy.worker.Worker - Connecting to master
spark://niko-VirtualBox:7077...
2014-01-14 15:47:23,528 [sparkWorker-akka.actor.default-dispatcher-3] INFO 
org.apache.spark.deploy.worker.Worker - Successfully registered with master
spark://niko-VirtualBox:7077


Master spits out the following logs at the same time:


2014-01-14 15:47:05,683 [sparkMaster-akka.actor.default-dispatcher-4] INFO 
org.apache.spark.deploy.master.Master - Registering worker
niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,090 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registering app Spark test
2014-01-14 15:47:23,102 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registered app Spark test with ID
app-20140114154723-
2014-01-14 15:47:23,216 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Launching executor
app-20140114154723-/0 on worker
worker-20140114154704-niko-VirtualBox.local-33576
2014-01-14 15:47:23,523 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Registering worker
niko-VirtualBox:33576 with 1 cores, 6.8 GB RAM
2014-01-14 15:47:23,525 [sparkMaster-akka.actor.default-dispatcher-15] INFO 
org.apache.spark.deploy.master.Master - Attempted to re-register worker at
same address: akka.tcp://sparkWorker@niko-VirtualBox.local:33576
2014-01-14 15:47:23,535 [sparkMaster-akka.actor.default-dispatcher-14] WARN 
org.apache.spark.deploy.master.Master - Got heartbeat from unregistered
worker worker-20140114154723-niko-VirtualBox.local-33576
...

Soon after this the master decides that the worker is dead, disassociates it
and marks it DEAD in the web UI. The worker process however is still alive
and still thinks that it's connected to master (as shown by the log).

I'm launching the job with the following command (last argument is the
master, replacing local there makes things run ok):
java -cp
./target/classes:/etc/hadoop/conf:$SPARK_HOME/conf:$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.0.0-mr1-cdh4.5.0.jar
SparkTest spark://niko-VirtualBox:7077

Relevant versions are:
Spark: current git HEAD fa75e5e1c50da7d1e6c6f41c2d6d591c1e8a025f
Hadoop: 2.0.0-mr1-cdh4.5.0
Scala: 2.10.3





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Akka-error-kills-workers-in-standalone-mode-tp537.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Shark runtime error

2014-01-14 Thread Kishore kumar
I installed Spark and Scala to run Shark with the help of this document:
https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster
When I run Shark, the error I am getting is:



-- [root@localhost bin]# shark

Starting the Shark Command Line Client
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
files.
Logging initialized using configuration in
jar:file:/root/shark-0.8.0-bin-cdh4/hive-0.9.0-shark-0.8.0-bin/lib/hive-common-0.9.0-shark-0.8.0.jar!/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201401140937_1343032668.txt
56.095: [GC 262656K->21306K(1005568K), 1.5447170 secs]
shark> Exception in thread "Thread-2" org.apache.spark.SparkException:
Error stopping standalone scheduler's driver actor
at
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.stop(StandaloneSchedulerBackend.scala:174)
at
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:61)
at
org.apache.spark.scheduler.cluster.ClusterScheduler.stop(ClusterScheduler.scala:309)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:856)
at org.apache.spark.SparkContext.stop(SparkContext.scala:676)
at shark.SharkEnv$.stop(SharkEnv.scala:134)
at shark.SharkCliDriver$$anon$1.run(SharkCliDriver.scala:109)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[1] milliseconds
at akka.dispatch.DefaultPromise.ready(Future.scala:870)
at akka.dispatch.DefaultPromise.result(Future.scala:874)
at akka.dispatch.Await$.result(Future.scala:74)
at
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend.stop(StandaloneSchedulerBackend.scala:170)
... 6 more
any help..

Please check the attachment for shark-withinfo output


-- 

*Kishore Kumar*
ITIM


console.output
Description: Binary data


RE: squestion on using spark parallelism vs using num partitions in spark api

2014-01-14 Thread Huangguowei
“Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node”

Local or standalone (single node)?

From: leosand...@gmail.com [mailto:leosand...@gmail.com]
Sent: Tuesday, January 14, 2014 13:42
To: user
Subject: Re: squestion on using spark parallelism vs using num partitions in spark
api

I think the parallelism param just controls how many tasks can run together
in each worker; it can't control how many tasks the job is split into.


leosand...@gmail.com

From: hussam_jar...@dell.com
Date: 2014-01-14 09:17
To: user@spark.incubator.apache.org
Subject: squestion on using spark parallelism vs using num partitions in spark 
api
Hi,

Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node

It looks like setting spark.default.parallelism via
System.setProperty("spark.default.parallelism", 24) before creating my Spark
context, as described in
http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism
has no effect on the default number of partitions that Spark uses in its APIs
like saveAsTextFile().

For example, if I set spark.default.parallelism to 24, I was expecting 24 tasks
to be invoked upon calling saveAsTextFile(), but that is not the case: I am
seeing only 1 task get invoked.

If I instead create my RDD with parallelize() and 2 partitions, as in
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata,2);
and then invoke
dataSetRDD.saveAsTextFile(JavaRddFilePath);

I am seeing 2 tasks get invoked even though spark.default.parallelism was set to 24.

Can someone explain the above behavior?

Thanks,
Hussam


Re: Running Spark on Mesos

2014-01-14 Thread deric
I've deleted the whole /tmp/mesos directory on each slave, but it didn't help
(this one was running on Mesos 0.15.0). I've tried different Mesos versions
(0.14, 0.15, 0.16-rc1, 0.16-rc2). Now Spark is compiled against mesos-0.15.0.jar,
but it doesn't seem to have any impact on this.

java.lang.NullPointerException
at 
com.typesafe.config.impl.Parseable$ParseableResources.rawParseValue(Parseable.java:509)
at 
com.typesafe.config.impl.Parseable$ParseableResources.rawParseValue(Parseable.java:492)
at com.typesafe.config.impl.Parseable.parseValue(Parseable.java:171)
at com.typesafe.config.impl.Parseable.parseValue(Parseable.java:165)
at com.typesafe.config.impl.Parseable.parse(Parseable.java:204)
at 
com.typesafe.config.ConfigFactory.parseResources(ConfigFactory.java:760)
at 
com.typesafe.config.ConfigFactory.parseResources(ConfigFactory.java:769)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:37)
at org.apache.spark.executor.Executor.<init>(Executor.scala:60)
at 
org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:58)
Exception in thread "Thread-0"

stdout:

Running spark-executor with framework dir = /usr/share/spark
14/01/14 09:54:46 ERROR MesosExecutorBackend: Received launchTask but
executor was null




On 14 January 2014 03:23, Benjamin Hindman [via Apache Spark User List] <
ml-node+s1001560n528...@n3.nabble.com> wrote:

> You should be able to use 0.16.0-rc2, but I recommend using 0.16.0-rc3
> since it fixes a bug with the webui (but not related to this).
>
> Did you try restarting your slaves after deleting the meta-directory? Kill
> the slave, delete the meta-directory (by default at /tmp/mesos/meta unless
> you passed --work_dir to the slave) and then restart the slave. If things
> don't work after that please let us know!
>
> Ben.
>
>
> On Mon, Jan 13, 2014 at 4:06 PM, deric <[hidden 
> email]
> > wrote:
>
>> Hi Ben,
>>
>> is it possible that I've checked out that buggy version from 0.16.0-rc2
>> branch? Before that I was running on 0.16.0~c0a3fcf (some version from
>> November). Which Mesos version would you recommend for running Spark?
>> Spark's pom.xml says 0.13.0, which is quite old.
>>
>> Thanks,
>> Tomas
>>
>>
>> On 13 January 2014 22:14, Benjamin Hindman [via Apache Spark User List] 
>> <[hidden
>> email] > wrote:
>>
>>> What version of Mesos are you using?
>>>
>>> We tagged a release-candidate of Mesos that had a bug when used with
>>> frameworks that were using older JARs (like Spark). The manifestation of
>>> the bug was some protocol buffers didn't parse, such as SlaveInfo,
>>> resulting in a NullPointerException.
>>>
>>> Until Spark gets a new JAR (and assuming you're not using the buggy
>>> release candidate of Mesos) then you can fix this problem by restarting
>>> your slave after removing it's meta-directory.
>>>
>>> Please share the version (in fact, versions of Mesos that you've
>>> upgraded over time would be great). And let us know how it goes!
>>>
>>> Ben.
>>>
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 10:19 AM, deric <[hidden 
>>> email]
>>> > wrote:
>>>
 I've updated to the newest trunk version, and still all tasks are
 getting
 lost:

 java.lang.NullPointerException
 at

 com.typesafe.config.impl.Parseable$ParseableResources.rawParseValue(Parseable.java:509)
 at

 com.typesafe.config.impl.Parseable$ParseableResources.rawParseValue(Parseable.java:492)
 at
 com.typesafe.config.impl.Parseable.parseValue(Parseable.java:171)
 at
 com.typesafe.config.impl.Parseable.parseValue(Parseable.java:165)
 at com.typesafe.config.impl.Parseable.parse(Parseable.java:204)
 at
 com.typesafe.config.ConfigFactory.parseResources(ConfigFactory.java:760)
 at
 com.typesafe.config.ConfigFactory.parseResources(ConfigFactory.java:769)
 at org.apache.spark.SparkConf.<init>(SparkConf.scala:37)
 at org.apache.spark.executor.Executor.<init>(Executor.scala:60)
 at

 org.apache.spark.executor.MesosExecutorBackend.registered(MesosExecutorBackend.scala:58)
 Exception in thread "Thread-0"



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Mesos-tp503p505.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

>>>
>>>
>>>
>>> --
>>>  If you reply to this email, your message will be added to the
>>> discussion below:
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-on-Mesos-tp503p516.html