Print in JavaNetworkWordCount

2014-01-20 Thread Eduardo Costa Alfaia
Hi guys, can somebody help me? Where do I change the print() function to print more than 10 lines on screen? Is there a way to print the total count of all words in a batch? Best Regards

TorrentBroadcast + persist = bug

2014-01-20 Thread Milos Nikolic
Hello, I think there is a bug with TorrentBroadcast in the latest release (0.8.1). The problem is that even a simple job (e.g., rdd.count) hangs waiting for some tasks to finish. Here is how to reproduce the problem: 1) Configure Spark such that node X is the master and also one of the workers
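A minimal sketch of the kind of job described, assuming Spark 0.8.x-style configuration where the broadcast implementation is selected via the spark.broadcast.factory system property; the master URL and data size are placeholders:

    import org.apache.spark.SparkContext

    object TorrentBroadcastRepro {
      def main(args: Array[String]) {
        // Select TorrentBroadcast instead of the default HTTP broadcast (0.8.x-era property).
        System.setProperty("spark.broadcast.factory",
          "org.apache.spark.broadcast.TorrentBroadcastFactory")

        // Placeholder master URL: node X acts as master and also runs a worker.
        val sc = new SparkContext("spark://nodeX:7077", "TorrentBroadcastRepro")

        // Even a trivial action reportedly hangs in this setup.
        val rdd = sc.parallelize(1 to 1000000, 8)
        println("count = " + rdd.count())
        sc.stop()
      }
    }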

ExternalAppendOnlyMap throw no such element

2014-01-20 Thread guojc
Hi, I'm trying out the latest master branch of Spark for the exciting external hashmap feature. I have code that runs correctly on Spark 0.8.1, and I only made a change so that it spills to disk more easily. However, I encounter a few task failures with java.util.NoSuchElementException
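For context, a sketch of the sort of job that exercises the external hash map by forcing aggregation to spill, assuming the spark.shuffle.spill and spark.shuffle.memoryFraction properties discussed around that feature on the master branch at the time (property names should be verified against the branch in use):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair RDD functions such as reduceByKey

    object SpillTest {
      def main(args: Array[String]) {
        // Shrink the in-memory shuffle budget so aggregation spills to disk early.
        System.setProperty("spark.shuffle.spill", "true")
        System.setProperty("spark.shuffle.memoryFraction", "0.05")

        val sc = new SparkContext("local[4]", "SpillTest")

        // A reduceByKey over many distinct keys goes through ExternalAppendOnlyMap.
        val counts = sc.parallelize(1 to 5000000, 16)
          .map(i => (i % 1000000, 1))
          .reduceByKey(_ + _)

        println("distinct keys = " + counts.count())
        sc.stop()
      }
    }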

Loss was due to java.lang.ClassNotFoundException java.lang.ClassNotFoundException: scala.None$ error when mysql-async is added in build.sbt

2014-01-20 Thread Richard Siebeling
My application is failing with a "Loss was due to java.lang.ClassNotFoundException / java.lang.ClassNotFoundException: scala.None$" error when the mysql-async library (https://github.com/mauricio/postgresql-async) is added to build.sbt. I've added the following line to build.sbt: com.github.mauricio

Re: Loss was due to java.lang.ClassNotFoundException java.lang.ClassNotFoundException: scala.None$ error when mysql-async is added in build.sbt

2014-01-20 Thread Richard Siebeling
Solved: mysql-async requires Scala 2.10.3 and I was compiling with version 2.10.2. On Mon, Jan 20, 2014 at 1:29 PM, Richard Siebeling rsiebel...@gmail.com wrote: My application is failing with a "Loss was due to java.lang.ClassNotFoundException / java.lang.ClassNotFoundException: scala.None$"
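For reference, a build.sbt sketch matching the fix described: pin scalaVersion to the release mysql-async was built against. The dependency version shown is illustrative rather than taken from the thread:

    // build.sbt -- the Scala version must match the one mysql-async was compiled against.
    scalaVersion := "2.10.3"

    // Version number is illustrative; use the mysql-async release you actually depend on.
    libraryDependencies += "com.github.mauricio" %% "mysql-async" % "0.2.12"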

Re: cannot run sbt/sbt assembly

2014-01-20 Thread Nicolas Seyvet
Use Scala 2.9.2. From what I read, 2.9.3 is not supported. You might want to try a later version of the JDK, 7.0_51. On Friday, January 17, 2014 1:07 PM, Kal El pinu.datri...@yahoo.com wrote: Hello, I have tried to assemble Spark (sbt/sbt assembly) with different versions of Java (OpenJDK,

Re: Print in JavaNetworkWordCount

2014-01-20 Thread Tathagata Das
Hi Eduardo, You can do arbitrary stuff with the data in a DStream using the operation foreachRDD. yourDStream.foreachRDD(rdd => { // Get and print the first n elements val firstN = rdd.take(n) println("First N elements = " + firstN) // Count the number of elements in each batch
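Reconstructed as a compilable sketch of the approach described, assuming wordCounts is the (word, count) stream produced by the word-count example and n is whatever limit you choose (the DStream import path shown is the 0.9-era one; in 0.8.x it is org.apache.spark.streaming.DStream):

    import org.apache.spark.streaming.dstream.DStream

    def debugPrint(wordCounts: DStream[(String, Int)], n: Int) {
      wordCounts.foreachRDD { rdd =>
        // Print the first n elements instead of the 10 that print() shows.
        val firstN = rdd.take(n)
        println("First " + n + " elements: " + firstN.mkString(", "))

        // Total count of all words in this batch (sum of the per-word counts).
        val total = rdd.map(_._2.toLong).fold(0L)(_ + _)
        println("Total words in batch: " + total)
      }
    }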

Re: Spark writing to disk when there's enough memory?!

2014-01-20 Thread mharwida
Hi, I've experimented with the parameters provided but we are still seeing the same problem: data is still spilling to disk when there's clearly enough memory on the worker nodes. Please note that data is distributed equally amongst the 6 Hadoop nodes (about 5GB per node). Any workarounds or
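One thing worth double-checking (a sketch of a diagnostic step, assuming 0.8.x-era system properties, not a confirmed fix for this report): cache with an explicit MEMORY_ONLY level, which never writes cached partitions to disk, so any remaining disk usage would have to come from shuffles rather than the cache.

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    object CacheCheck {
      def main(args: Array[String]) {
        // Fraction of the executor heap reserved for cached blocks (verify the property name for your release).
        System.setProperty("spark.storage.memoryFraction", "0.6")

        // Placeholder master URL and input path.
        val sc = new SparkContext("spark://master:7077", "CacheCheck")

        // MEMORY_ONLY never spills the cache: partitions that do not fit are recomputed instead.
        val data = sc.textFile("hdfs:///data/input").persist(StorageLevel.MEMORY_ONLY)
        println("records = " + data.count())
        sc.stop()
      }
    }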

Spark Master on Hadoop Job Tracker?

2014-01-20 Thread mharwida
Hi, Should the Spark Master run on the Hadoop Job Tracker node (and Spark workers on Task Trackers), or could the Spark Master reside on any Hadoop node? Thanks Majd

RE: SparkException: Expect only DirectTaskResults when using localScheduler()

2014-01-20 Thread Hussam_Jarada
Thank you, Patrick. -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Friday, January 17, 2014 11:54 PM To: user@spark.incubator.apache.org Subject: Re: SparkException: Expect only DirectTaskResults when using localScheduler() This is a bug that was fixed and will

Re: Quality of documentation (rant)

2014-01-20 Thread Matei Zaharia
Hi Ognen, It’s true that the documentation is partly targeting Hadoop users, and that’s something we need to fix. Perhaps the best solution would be some kind of tutorial on “here’s how to set up Spark by hand on EC2”. However it also sounds like you ran into some issues with S3 that it would

SPARK protocol buffer issue. Need Help

2014-01-20 Thread suman bharadwaj
Hi, I'm new to Spark, and I was trying to read a file residing in HDFS and perform some basic actions on this dataset. See below the code I used: object Hbase { def main(args: Array[String]) { val sc = new SparkContext("spark://servername:portno", "somename") val input =
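For readability, here is what a complete version of that fragment would look like (a sketch only; master URL, port and HDFS path are placeholders, and the protobuf conflict discussed below is independent of this code):

    import org.apache.spark.SparkContext

    object Hbase {
      def main(args: Array[String]) {
        // Placeholders: point these at your standalone master and HDFS namenode.
        val sc = new SparkContext("spark://servername:7077", "somename")
        val input = sc.textFile("hdfs://namenode:8020/path/to/file")

        // A couple of basic actions on the dataset.
        println("lines = " + input.count())
        println("first = " + input.first())
        sc.stop()
      }
    }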

Re: Quality of documentation (rant)

2014-01-20 Thread Ognen Duzlevski
Hi Matei, thanks for replying! On Mon, Jan 20, 2014 at 8:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: It’s true that the documentation is partly targeting Hadoop users, and that’s something we need to fix. Perhaps the best solution would be some kind of tutorial on “here’s how to set up

Re: SPARK protocol buffer issue. Need Help

2014-01-20 Thread Sean Owen
Every time I see the magic words... InvalidProtocolBufferException: Message missing required fields: callId, status; ... it indicates that a client of something is using protobuf 2.4 and the server is using protobuf 2.5. Here you are using protobuf 2.4, check. And I suppose you are using HDFS

Re: Lzo + Protobuf

2014-01-20 Thread Vipul Pandey
Any suggestions, anyone? Core team / contributors / spark-developers - any thoughts? On Jan 17, 2014, at 4:45 PM, Vipul Pandey vipan...@gmail.com wrote: Hi All, Can someone please share (sample) code to read LZO-compressed protobufs from HDFS (using elephant-bird)? I'm trying whatever I

Gathering exception stack trace

2014-01-20 Thread Mingyu Kim
Hi all, I'm having a hard time finding ways to report exceptions that happen during computation to the end user of a Spark system, without having them ssh into the worker nodes or access the Spark UI. For example, if some exception happens in the code that runs on worker nodes (e.g.
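One general pattern that avoids ssh and the UI entirely (a sketch of an application-level approach, not a built-in Spark facility): catch exceptions inside the task, ship the stack traces back as ordinary data, and report them on the driver.

    import java.io.{PrintWriter, StringWriter}
    import org.apache.spark.SparkContext

    object ErrorReport {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[2]", "ErrorReport")
        val lines = sc.textFile("hdfs:///data/input")       // placeholder path

        // Wrap the per-record work in try/catch and return failures as data.
        val results = lines.map { line =>
          try {
            Right(line.toInt)                               // stand-in for the real computation
          } catch {
            case e: Exception =>
              val sw = new StringWriter()
              e.printStackTrace(new PrintWriter(sw))
              Left(sw.toString)
          }
        }

        // On the driver: surface a handful of stack traces to the end user.
        val errors = results.filter(_.isLeft).map(_.left.get).take(5)
        errors.foreach(println)
        sc.stop()
      }
    }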

Re: SPARK protocol buffer issue. Need Help

2014-01-20 Thread Suman Subash
Hi Sean, Thanks. You are right. The SPARK_HOME lib_managed folder has a different protocol buffer jar version than /usr/lib/hadoop/lib. In the Hadoop lib I have version 2.4.0a and in lib_managed I have version 2.4.1, which, as you said, is conflicting. I'm really new to Spark and Scala as

Re: Quality of documentation (rant)

2014-01-20 Thread Jey Kottalam
This sounds like either a bug, or the S3 library somehow requiring lots of memory to read a block. There isn’t a separate way to run HDFS over S3: Hadoop just has different implementations of “file systems”, one of which is S3. There’s a pointer to these versions at the bottom of
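For completeness, a sketch of reading directly from S3 with that era's API (bucket, path and credentials are placeholders; s3n:// is the Hadoop-native S3 filesystem implementation being referred to):

    import org.apache.spark.SparkContext

    object S3Read {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[4]", "S3Read")

        // Hadoop's S3-native filesystem reads credentials from the Hadoop configuration.
        sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
        sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

        // No HDFS involved: the s3n:// scheme selects the S3 implementation of Hadoop's FileSystem API.
        val lines = sc.textFile("s3n://your-bucket/path/to/data")
        println("lines = " + lines.count())
        sc.stop()
      }
    }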

Re: Quality of documentation (rant)

2014-01-20 Thread Ognen Duzlevski
Jey, On Mon, Jan 20, 2014 at 10:59 PM, Jey Kottalam j...@cs.berkeley.edu wrote: This sounds like either a bug or somehow the S3 library requiring lots of memory to read a block. There isn’t a separate way to run HDFS over S3. Hadoop just has different implementations of “file systems”,

spark-shell on standalone cluster gives error no mesos in java.library.path

2014-01-20 Thread Manoj Samel
Hi, I deployed Spark 0.8.1 on a standalone cluster per https://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html. When I start a spark-shell, I get the following error. I thought Mesos should not be required for a standalone cluster. Do I have to change any parameters in make-distribution.sh

Re: spark-shell on standalone cluster gives error no mesos in java.library.path

2014-01-20 Thread Manoj Samel
Please ignore this error - I found the issue. Thanks! On Mon, Jan 20, 2014 at 3:14 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, I deployed Spark 0.8.1 on a standalone cluster per https://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html. When I start a spark-shell, I get

RE: Spark Master on Hadoop Job Tracker?

2014-01-20 Thread Liu, Raymond
Not sure what you aim to solve. When you mention Spark Master, I guess you probably mean Spark standalone mode? In that case the Spark cluster is not necessarily coupled with the Hadoop cluster. If you aim to achieve better data locality, then yes, running Spark workers on HDFS data nodes might

Re: Error: Could not find or load main class org.apache.spark.executor.CoarseGrainedExecutorBackend

2014-01-20 Thread Tathagata Das
Hi Hussam, Have you (1) generated the Spark jar using sbt/sbt assembly, and (2) distributed the Spark jar to the worker machines? It could be that the system expects that Spark jar to be present in /opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.

Re: FileNotFoundException on distinct()?

2014-01-20 Thread Andrew Ash
Also, you will need to bounce the Spark services from a new ssh session to make the ulimit changes take effect (if you changed the value in /etc/limits). On Jan 20, 2014 5:32 PM, Jey Kottalam j...@cs.berkeley.edu wrote: Can you try ulimit -n to make sure the increased

RDD action hangs on a standalone mode cluster

2014-01-20 Thread Manoj Samel
Hi, I configured a Spark 0.8.1 cluster on AWS with one master node and 3 worker nodes. The cluster was configured as a standalone cluster using http://spark.incubator.apache.org/docs/latest/spark-standalone.html. The distribution was generated, and the master node was started on the master host with

How to perform multi dimensional reduction in spark?

2014-01-20 Thread Aureliano Buendia
Hi, It seems Spark does not support nested RDDs, so I was wondering how Spark can handle multi-dimensional reductions. As an example, consider a dataset with these rows: ((i, j), value), where i, j and k are long indexes and value is a double. How is it possible to first reduce the above RDD
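One way to express the two-stage reduction without nested RDDs (a sketch; the combine functions are sums, standing in for whatever reductions are actually needed):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair RDD functions (reduceByKey)

    object MultiDimReduce {
      def main(args: Array[String]) {
        val sc = new SparkContext("local[4]", "MultiDimReduce")

        // Rows of the form ((i, j), value).
        val data = sc.parallelize(Seq(
          ((0L, 0L), 1.0), ((0L, 1L), 2.0),
          ((1L, 0L), 3.0), ((1L, 1L), 4.0)))

        // First reduction: combine values that share the same (i, j) key.
        val byIJ = data.reduceByKey(_ + _)

        // Second reduction: drop j from the key and combine across it, one value per i.
        val byI = byIJ.map { case ((i, j), v) => (i, v) }.reduceByKey(_ + _)

        byI.collect().foreach(println)
        sc.stop()
      }
    }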

Re: Spark Master on Hadoop Job Tracker?

2014-01-20 Thread Nick Pentreath
If you intend to run Hadoop MapReduce and Spark on the same cluster concurrently, and you have enough memory on the jobtracker master, then you can run the Spark master (for standalone mode, as Raymond mentions) on the same node. This is not necessary but more for convenience, so you only have to ssh

Re: get CPU Metrics from spark

2014-01-20 Thread Mayur Rustagi
Hi Tianshuo, Your email went to spam for me, probably for others too :) Are you referring to total CPU usage information per task? Regards, Mayur Rustagi Ph: +919632149971 http://www.sigmoidanalytics.com https://twitter.com/mayur_rustagi On Fri, Jan 17,