Hi guys,
Can somebody help me? Where do I change the print() function so that it prints
more than 10 lines to the screen? Is there a way to print the total count of all
words in a batch?
Best Regards
Hello,
I think there is a bug with TorrentBroadcast in the latest release (0.8.1). The
problem is that even a simple job (e.g., rdd.count) hangs waiting for some
tasks to finish. Here is how to reproduce the problem:
1) Configure Spark such that node X is the master and also one of the workers
Hi,
I'm trying out the latest master branch of Spark for the exciting external
hash map feature. I have code that runs correctly on Spark 0.8.1, and I only
made a change so that it spills to disk more easily. However, I encounter a few
task failures of
java.util.NoSuchElementException
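For reference, here is a minimal, self-contained sketch (not my actual job) of the spill-related settings the external hash map reads; the property names are taken from the 0.9-era configuration docs and the values are only illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// Shrink the shuffle memory fraction so the external (spillable) map spills early.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("spill-test")
  .set("spark.shuffle.spill", "true")
  .set("spark.shuffle.memoryFraction", "0.2")
val sc = new SparkContext(conf)

// A wide aggregation that exercises the spillable hash map.
val counts = sc.parallelize(1 to 1000000).map(i => (i % 1000, 1)).reduceByKey(_ + _)
println(counts.count())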
My application is failing with a "Loss was due to
java.lang.ClassNotFoundException: scala.None$" error when the mysql-async
library (https://github.com/mauricio/postgresql-async) is added to build.sbt.
I've added the following line to build.sbt: com.github.mauricio
Solved: mysql-async requires Scala 2.10.3 and I was compiling with
version 2.10.2.
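For anyone hitting the same error, a minimal build.sbt sketch of the fix (the mysql-async version number below is only an example):

scalaVersion := "2.10.3"  // match the Scala version mysql-async was built against

libraryDependencies += "com.github.mauricio" %% "mysql-async" % "0.2.12"  // example version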
On Mon, Jan 20, 2014 at 1:29 PM, Richard Siebeling rsiebel...@gmail.com wrote:
My application is failing with a "Loss was due to
java.lang.ClassNotFoundException: scala.None$"
Use Scala 2.9.2; from what I read, 2.9.3 is not supported.
You might want to try a later version of the JDK, such as 7.0_51.
On Friday, January 17, 2014 1:07 PM, Kal El pinu.datri...@yahoo.com wrote:
Hello,
I have tried to assemble Spark (sbt/sbt assembly) with different versions of
Java (OpenJDK,
Hi Eduardo,
You can do arbitrary stuff with the data in a DStream using the operation
foreachRDD.
yourDStream.foreachRDD(rdd => {
  // Get and print the first n elements
  val firstN = rdd.take(n)
  println("First N elements = " + firstN.mkString(", "))
  // Count the number of elements in each batch
  println("Count = " + rdd.count())
})
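As a fuller, self-contained sketch (host, port, and batch interval are placeholders), the same pattern also answers the common question of printing more than the default 10 lines and counting all the words in each batch:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("local[2]", "WordCountPerBatch", Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
val words = lines.flatMap(_.split(" "))

words.foreachRDD(rdd => {
  rdd.take(20).foreach(println)                        // print more than the default 10
  println("Total words in this batch = " + rdd.count())
})

ssc.start()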
Hi,
I've experimented with the parameters provided, but we are still seeing the
same problem: data is still spilling to disk when there's clearly enough
memory on the worker nodes.
Please note that data is distributed equally amongst the 6 Hadoop nodes
(About 5GB per node).
Any workarounds or
Hi,
Should the Spark Master run on the Hadoop Job Tracker node (and the Spark
workers on the Task Trackers), or can the Spark Master reside on any Hadoop
node?
Thanks
Majd
Thank you, Patrick.
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Friday, January 17, 2014 11:54 PM
To: user@spark.incubator.apache.org
Subject: Re: SparkException: Expect only DirectTaskResults when using
localScheduler()
This is a bug that was fixed and will
Hi Ognen,
It’s true that the documentation is partly targeting Hadoop users, and that’s
something we need to fix. Perhaps the best solution would be some kind of
tutorial on “here’s how to set up Spark by hand on EC2”. However it also sounds
like you ran into some issues with S3 that it would
Hi,
I'm new to Spark, and I was trying to read a file residing in HDFS and
perform some basic actions on this dataset. See below the code I used:
object Hbase {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://servername:portno", "somename")
    val input =
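A minimal, self-contained sketch of reading a file from HDFS and running a couple of basic actions on it (the HDFS path, master URL, and app name are placeholders):

import org.apache.spark.SparkContext

object HdfsExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://servername:portno", "somename")
    // Read a text file from HDFS and run a couple of basic actions on it.
    val input = sc.textFile("hdfs://namenode:8020/path/to/file.txt")  // placeholder path
    println("Number of lines: " + input.count())
    input.take(5).foreach(println)
    sc.stop()
  }
}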
Hi Matei, thanks for replying!
On Mon, Jan 20, 2014 at 8:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
It’s true that the documentation is partly targeting Hadoop users, and
that’s something we need to fix. Perhaps the best solution would be some
kind of tutorial on “here’s how to set up
Every time I see the magic words...
InvalidProtocolBufferException: Message missing required fields: callId, status;
... it indicates that a client of something is using protobuf 2.4 and
the server is using protobuf 2.5. Here you are using protobuf 2.4,
check. And I suppose you are using HDFS
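One quick way to confirm which protobuf the client side is actually picking up is to print the jar that the protobuf Message class was loaded from, e.g. from the driver or spark-shell; a small sketch:

// The version in the printed jar path should match the server side
// (Hadoop 2.2+ ships protobuf 2.5.x).
println(classOf[com.google.protobuf.Message].getProtectionDomain.getCodeSource.getLocation)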
Any suggestions, anyone?
Core team / contributors / spark-developers - any thoughts?
On Jan 17, 2014, at 4:45 PM, Vipul Pandey vipan...@gmail.com wrote:
Hi All,
Can someone please share (sample) code to read lzo compressed protobufs from
hdfs (using elephant bird)? I'm trying whatever I
Hi all,
I'm having a hard time trying to find ways to report exceptions that happen
during computation to the end-user of a Spark system without having them ssh
into the worker nodes or access the Spark UI. For example, if some
exception happens in the code that runs on worker nodes (e.g.
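One workaround (not a built-in Spark facility; the riskyTransform helper and the sample data are hypothetical) is to catch exceptions on the workers with scala.util.Try and ship them back to the driver as ordinary data:

import scala.util.Try

// Stand-in for whatever per-record logic can throw; fails when x % 7 == 0.
def riskyTransform(x: Int): Int = 100 / (x % 7)

// Assuming an existing SparkContext sc, e.g. in the shell.
val inputRdd = sc.parallelize(0 until 100)
val results = inputRdd.map(x => Try(riskyTransform(x)))
results.cache()

// Surface a sample of worker-side exceptions on the driver.
results.filter(_.isFailure).map(_.failed.get.toString).take(10)
  .foreach(e => println("Worker-side exception: " + e))

// Continue with the records that succeeded.
val good = results.filter(_.isSuccess).map(_.get)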
Hi Sean,
Thanks. You are right. The SPARK_HOME lib_managed folder has a different
protocol buffer version jar than /usr/lib/hadoop/lib. In the Hadoop lib I
have version 2.4.0a and in lib_managed I have version 2.4.1, which, as you
said, is conflicting.
I'm really new to Spark and Scala as
This sounds like either a bug or somehow the S3 library requiring lots of
memory to read a block. There isn’t a separate way to run HDFS over S3.
Hadoop just has different implementations of “file systems”, one of which is
S3. There’s a pointer to these versions at the bottom of
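As a concrete illustration of using one of those file system implementations, a minimal sketch of reading directly from S3 through Hadoop's s3n:// scheme (bucket, path, and credentials are placeholders, and an existing SparkContext sc is assumed):

sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")
val logs = sc.textFile("s3n://your-bucket/path/to/data")
println(logs.count())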
Jey,
On Mon, Jan 20, 2014 at 10:59 PM, Jey Kottalam j...@cs.berkeley.edu wrote:
This sounds like either a bug or somehow the S3 library requiring lots
of
memory to read a block. There isn’t a separate way to run HDFS over S3.
Hadoop just has different implementations of “file systems”,
Hi
I deployed spark 0.8.1 on standalone cluster per
https://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html
When I start spark-shell, I get the following error.
I thought Mesos should not be required for a standalone cluster. Do I have to
change any parameters in make-distribution.sh?
Please ignore this error - I found the issue.
Thanks !
On Mon, Jan 20, 2014 at 3:14 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi
I deployed spark 0.8.1 on standalone cluster per
https://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html
When i start a spark-shell , I get
I'm not sure what you aim to solve. When you mention the Spark Master, I guess you
probably mean Spark standalone mode? In that case the Spark cluster is not
necessarily coupled with the Hadoop cluster. But if you aim to achieve better data
locality, then yes, running the Spark workers on the HDFS data nodes might
Hi Hussam,
Have you (1) generated the Spark jar using sbt/sbt assembly, and (2) distributed the
Spark jar to the worker machines? It could be that the system expects the
Spark jar to be present in /opt/spark-0.8.0/conf:/opt/
spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.
Also you will need to bounce the spark services from a new ssh session to
make the ulimit changes take effect (if you changed the value in
/etc/limits)
Sent from my mobile phone
On Jan 20, 2014 5:32 PM, Jey Kottalam j...@cs.berkeley.edu wrote:
Can you try ulimit -n to make sure the increased
Hi,
I configured spark 0.8.1 cluster on AWS with one master node and 3 worker
nodes. The cluster was configured as a standalone cluster using
http://spark.incubator.apache.org/docs/latest/spark-standalone.html
The distribution was generated
the master node was started on master host with
Hi,
It seems Spark does not support nested RDDs, so I was wondering how Spark
can handle multi-dimensional reductions.
As an example, consider a dataset with these rows:
((i, j), value)
where i and j are long indexes, and value is a double.
How is it possible to first reduce the above RDD
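There is no nested RDD for this, but one common pattern is two passes of reduceByKey, re-keying between them; a minimal sketch (the sum is just an example aggregation, and an existing SparkContext sc is assumed):

import org.apache.spark.SparkContext._  // pair-RDD operations such as reduceByKey

val data = sc.parallelize(Seq(((1L, 1L), 2.0), ((1L, 2L), 3.0), ((2L, 1L), 4.0)))

// First reduction: collapse the j dimension for each i.
val byI = data.map { case ((i, j), v) => (i, v) }.reduceByKey(_ + _)

// Second reduction: collapse the i dimension down to a single value.
val total = byI.values.reduce(_ + _)
println(total)  // 9.0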
If you intend to run Hadoop MapReduce and Spark on the same cluster
concurrently, and you have enough memory on the JobTracker master, then you can
run the Spark master (for standalone mode, as Raymond mentions) on the same node.
This is not necessary, but more for convenience, so you only have to ssh
Hi Tianshuo,
Your email went to spam for me, probably for others too :)
Are you referring to total CPU usage information per task?
Regards
Mayur
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
On Fri, Jan 17,