Re: Task not serializable?
Hi, I am new to Spark and I encountered this error when I try to map an RDD[A] to RDD[Array[Double]] and then collect the results. A is a custom class that extends Serializable (actually it's just a wrapper class which wraps a few variables that are all serializable). I also tried the KryoSerializer according to this guide http://spark.apache.org/docs/0.8.1/tuning.html and it gave the same error message. Daniel Liu
Re: SequenceFileRDDFunctions cannot be used output of spark package
Hi Sonal, There are no custom objects in saveRDD, it is of type RDD[(String, String)]. Thanks, Pradeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SequenceFileRDDFunctions-cannot-be-used-output-of-spark-package-tp250p3508.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
java.lang.ClassNotFoundException - spark on mesos
I am facing different kinds of java.lang.ClassNotFoundException when trying to run Spark on Mesos. One error has to do with org.apache.spark.executor.MesosExecutorBackend. Another has to do with org.apache.spark.serializer.JavaSerializer. I see other people complaining about similar issues. I tried different versions of the Spark distribution - 0.9.0 and 1.0.0-SNAPSHOT - and faced the same problem. I think the reason is related to the error below.

$ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
java.io.IOException: META-INF/license : could not create directory
        at sun.tools.jar.Main.extractFile(Main.java:907)
        at sun.tools.jar.Main.extract(Main.java:850)
        at sun.tools.jar.Main.run(Main.java:240)
        at sun.tools.jar.Main.main(Main.java:1147)

This error happens with all the jars that I created, but the set of classes already extracted before the failure differs from case to case. If JavaSerializer is not extracted before the extractor hits META-INF/license, then that class is not found during execution. If MesosExecutorBackend is not extracted, then that class shows up in the Mesos slave error logs. Can someone confirm whether this is a valid cause for the problem I am seeing? Any way I can debug this further? — Bharath
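One way to rule the extraction failure in or out is to inspect the assembly without unpacking it at all (the `jar -xf` error is typical of case-insensitive filesystems, where META-INF/LICENSE the file collides with META-INF/license the directory). A jar is just a zip, so the stdlib zipfile module can list its entries; the sketch below builds a tiny stand-in jar in memory, since the real assembly path is specific to the poster's machine:

```python
import io
import zipfile

def classes_in_jar(jar_bytes, *class_names):
    """Report whether each class exists in a jar without extracting it,
    sidestepping filesystem case-collision errors like META-INF/license."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        entries = set(jar.namelist())
    return {c: c.replace(".", "/") + ".class" in entries for c in class_names}

# Tiny stand-in for the spark assembly jar, built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("org/apache/spark/serializer/JavaSerializer.class", b"")
    jar.writestr("META-INF/LICENSE", b"...")

found = classes_in_jar(buf.getvalue(),
                       "org.apache.spark.serializer.JavaSerializer",
                       "org.apache.spark.executor.MesosExecutorBackend")
print(found)
# {'org.apache.spark.serializer.JavaSerializer': True,
#  'org.apache.spark.executor.MesosExecutorBackend': False}
```

If both classes are listed in the real assembly, the jar itself is fine and the extraction step (not the build) is the problem.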
Re: java.lang.ClassNotFoundException - spark on mesos
What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers, Tim -- Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata
yarn.application.classpath in yarn-site.xml
Hi, I've just tested Spark in YARN mode, but something made me confused. When I *delete* the yarn.application.classpath configuration in yarn-site.xml, the following command works well.

bin/spark-class org.apache.spark.deploy.yarn.Client --jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar --class org.apache.spark.examples.SparkPi --args yarn-standalone --num-worker 3

However, when I configure it as follows, yarnAppState always stays in the *ACCEPTED* state and the application shows no sign of stopping.

<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_HOME/etc/hadoop/conf,
    $HADOOP_HOME/share/hadoop/common/*,$HADOOP_HOME/share/hadoop/common/lib/*,
    $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*,
    $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*
  </value>
</property>

The Hadoop version is 2.2.0 and the cluster has one master and three workers. Does anyone have ideas about this problem? Thanks, Dan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/yarn-application-classpath-in-yarn-site-xml-tp3512.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: java.lang.ClassNotFoundException - spark on mesos
I tried 0.9.0 and the latest git tree of Spark. For Mesos, I tried 0.17.0 and the latest git tree. Thanks On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote: What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions.
Best practices: Parallelized write to / read from S3
Howdy-doody, I have a single, very large file sitting in S3 that I want to read in with sc.textFile(). What are the best practices for reading in this file as quickly as possible? How do I parallelize the read as much as possible? Similarly, say I have a single, very large RDD sitting in memory that I want to write out to S3 with RDD.saveAsTextFile(). What are the best practices for writing this file out as quickly as possible? Nick -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-Parallelized-write-to-read-from-S3-tp3516.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?
* unionAll preserves duplicates vs. union, which does not: This is true; if you want to eliminate duplicate items you should follow the union with a distinct().
* SQL UNION and UNION ALL result in the same output format (i.e., another SQL result) vs. different RDD types here.
* Understanding the existing union contract issue: This may be a class-hierarchy discussion for SchemaRDD, UnionRDD, etc.?
This is unfortunately going to be a limitation of the query DSL since it extends standard RDDs. It is not possible for us to return specialized types from functions that are already defined in RDD (such as union), as the base RDD class has a very opaque notion of schema, and at this point the API for RDDs is very fixed. If you use SQL, however, you will always get back SchemaRDDs.
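The duplicate-handling point can be seen with plain collections: RDD.union, like SQL's UNION ALL, is a bag union, and following it with distinct() is what gives SQL-UNION semantics (plain Python lists stand in for RDDs here):

```python
a = [1, 2, 2, 3]
b = [2, 3, 4]

# RDD.union / SQL UNION ALL: concatenation, duplicates preserved.
union_all = a + b
print(union_all)            # [1, 2, 2, 3, 2, 3, 4]

# SQL UNION semantics: follow the union with a distinct().
union_distinct = sorted(set(union_all))
print(union_distinct)       # [1, 2, 3, 4]
```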
Re: groupBy RDD does not have grouping column ?
This is similar to how SQL works: items in the GROUP BY clause are not included in the output by default. You will need to include 'a in the second parameter list (which is similar to the SELECT clause) as well if you want it included in the output. On Sun, Mar 30, 2014 at 9:52 PM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, If I create groupBy('a)(Sum('b) as 'foo, Sum('c) as 'bar), then the resulting RDD should have 'a, 'foo and 'bar. The result RDD just shows 'foo and 'bar and is missing 'a. Thoughts? Thanks, Manoj
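The SQL analogy can be checked directly with the stdlib sqlite3 module (table and column names here are invented to mirror the question): the grouping column only appears in the result if it is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("x", 1, 10), ("x", 2, 20), ("y", 4, 40)])

# GROUP BY alone does not put 'a' in the output...
rows = conn.execute(
    "SELECT sum(b), sum(c) FROM t GROUP BY a ORDER BY a").fetchall()
print(rows)                 # [(3, 30), (4, 40)]

# ...it must be selected explicitly, just as 'a must be added to the
# second parameter list in the Spark DSL.
rows = conn.execute(
    "SELECT a, sum(b), sum(c) FROM t GROUP BY a ORDER BY a").fetchall()
print(rows)                 # [('x', 3, 30), ('y', 4, 40)]
```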
Re: Error in SparkSQL Example
val people: RDD[Person] // An RDD of case class objects, from the first example.

is just a placeholder to avoid cluttering up each example with the same code for creating an RDD. The : RDD[Person] is just there to let you know the expected type of the variable 'people'. Perhaps there is a clearer way to indicate this. As you have realized, using the full line from the first example will allow you to run the rest of them. On Sun, Mar 30, 2014 at 7:31 AM, Manoj Samel manojsamelt...@gmail.com wrote: Hi, On http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html, I am trying to run the code under "Writing Language-Integrated Relational Queries" (I have a 1.0.0 snapshot). I am running into an error on

val people: RDD[Person] // An RDD of case class objects, from the first example.

scala> val people: RDD[Person]
<console>:19: error: not found: type RDD
       val people: RDD[Person]
                   ^
scala> val people: org.apache.spark.rdd.RDD[Person]
<console>:18: error: class $iwC needs to be abstract, since value people is not defined
       class $iwC extends Serializable {
             ^

Any idea what the issue is? Also, it's not clear what the RDD[Person] brings. I can run the DSL without the case class objects RDD...

val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
val teenagers = people.where('age >= 13).where('age <= 19)

Thanks,
Re: Error in SparkSQL Example
Hi Michael, Thanks for the clarification. My question is about the error above (error: class $iwC needs to be abstract) and what the RDD declaration brings, since I can do the DSL without the people: org.apache.spark.rdd.RDD[Person] declaration. Thanks, On Mon, Mar 31, 2014 at 9:13 AM, Michael Armbrust mich...@databricks.com wrote: val people: RDD[Person] // An RDD of case class objects, from the first example. is just a placeholder to avoid cluttering up each example with the same code for creating an RDD.
Re: Best practices: Parallelized write to / read from S3
Note that you may have minSplits set to more than the number of cores in the cluster, and Spark will just run as many as possible at a time. This is better if certain nodes may be slow, for instance. In general, it is not necessarily the case that doubling the number of cores doing IO will double the throughput, because you could be saturating the throughput with fewer cores. However, S3 is odd in that each connection gets way less bandwidth than your network link can provide, and it does seem to scale linearly with the number of connections. So, yes, taking minSplits up to 4 (or higher) will likely result in a 2x performance improvement. saveAsTextFile() will use as many partitions (aka splits) as the RDD it's being called on. So for instance:

sc.textFile(myInputFile, 15).map(lambda x: x + "!!!").saveAsTextFile(myOutputFile)

will use 15 partitions to read the text file (i.e., up to 15 cores at a time) and then again to save back to S3. On Mon, Mar 31, 2014 at 9:46 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: So setting minSplits (http://spark.incubator.apache.org/docs/latest/api/pyspark/pyspark.context.SparkContext-class.html#textFile) will set the parallelism on the read in SparkContext.textFile(), assuming I have the cores in the cluster to deliver that level of parallelism. And if I don't explicitly provide it, Spark will set minSplits to 2. So for example, say I have a cluster with 4 cores total, and it takes 40 minutes to read a single file from S3 with minSplits at 2. It should take roughly 20 minutes to read the same file if I up minSplits to 4. Did I understand that correctly? RDD.saveAsTextFile() doesn't have an analog to minSplits, so I'm guessing that's not an operation the user can tune.
On Mon, Mar 31, 2014 at 12:29 PM, Aaron Davidson ilike...@gmail.com wrote: Spark will only use each core for one task at a time, so doing sc.textFile(s3 location, num reducers), where you set num reducers to at least as many as the total number of cores in your cluster, is about as fast as you can get out of the box. Same goes for saveAsTextFile.
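Nicholas's back-of-envelope arithmetic can be written down as a crude model: if S3 throughput scales roughly linearly with concurrent connections, read time is inversely proportional to min(splits, cores). This is idealized and ignores skew and per-split overhead:

```python
def ideal_read_minutes(base_minutes, base_splits, splits, cores=4):
    """Idealized model: S3 throughput scales linearly with concurrent
    connections, capped by the number of cores running tasks."""
    effective = min(splits, cores)
    base_effective = min(base_splits, cores)
    return base_minutes * base_effective / effective

# 40 minutes at minSplits=2 on a 4-core cluster...
print(ideal_read_minutes(40, 2, 2))   # 40.0
# ...roughly halves at minSplits=4:
print(ideal_read_minutes(40, 2, 4))   # 20.0
# ...but going past the core count stops helping:
print(ideal_read_minutes(40, 2, 8))   # 20.0
```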
Re: Best practices: Parallelized write to / read from S3
OK sweet. Thanks for walking me through that. I wish this were StackOverflow so I could bestow some nice rep on all you helpful people. On Mon, Mar 31, 2014 at 1:06 PM, Aaron Davidson ilike...@gmail.com wrote: Note that you may have minSplits set to more than the number of cores in the cluster, and Spark will just run as many as possible at a time.
Re: Calling Spark enthusiasts in NYC
How about London? -- Martin Goodson | VP Data Science (0)20 3397 1240 On Mon, Mar 31, 2014 at 6:28 PM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even more! For starters, the organizers of the Spark meetups here in the Bay Area want to help anybody that is interested in setting up a meetup in a new city. Some amazing Spark champions have stepped forward in Seattle, Vancouver, Boulder/Denver, and a few other areas already. Right now, we are looking to connect with you Spark enthusiasts in NYC about helping to run an inaugural Spark Meetup in your area. You can reply to me directly if you are interested and I can tell you about all of the resources we have to offer (speakers from the core community, a budget for food, help scheduling, etc.), and let's make this happen! Andy
Re: network wordcount example
Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream, just to test that your socket is receiving what you think you are sending. On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote: Hello, I just started working with Spark today... and I am trying to run the network wordcount example. I created a socket server and client, and I am sending data to the server in an infinite loop. When I run the Spark class, I see this output in the console...

-------------------------------------------
Time: 1396281891000 ms
-------------------------------------------
14/03/31 11:04:51 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.056794606 s
14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job 1396281891000 ms.0 from job set of time 1396281891000 ms
14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time 1396281891000 ms (execution: 0.058 s)
14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool

but I don't see any output from the wordcount operation when I make this call... wordCounts.print(); Any help is greatly appreciated. Thanks in advance
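A common gotcha here is that the streaming receiver expects newline-terminated text; if the custom server never sends a "\n", no lines ever arrive. To test the socket side independently of Spark, a minimal line-emitting server (a stand-in for the poster's custom server, analogous to `nc -lk <port>`) can be sketched with the stdlib:

```python
import socket
import threading

def serve_lines(srv, lines):
    """Accept one client and send newline-terminated lines, the shape
    of server a streaming socket receiver expects to connect to."""
    conn, _ = srv.accept()
    for line in lines:
        conn.sendall((line + "\n").encode())  # the trailing \n matters
    conn.close()
    srv.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))              # ephemeral port
srv.listen(1)
port = srv.getsockname()[1]

t = threading.Thread(target=serve_lines,
                     args=(srv, ["hello world", "hello spark"]))
t.start()

client = socket.create_connection(("127.0.0.1", port))
received = client.makefile().read()
client.close()
t.join()
print(received.splitlines())            # ['hello world', 'hello spark']
```

If a plain client like this sees the lines but the streaming job does not, the problem is on the Spark side; if it does not, the server is not sending what you think.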
Re: Calling Spark enthusiasts in NYC
Responses about London, Montreal/Toronto, DC, Chicago. Great coverage so far, and keep 'em coming! (still looking for an NYC connection) I'll reply to each of you off-list to coordinate next steps for setting up a Spark meetup in your home area. Thanks again, this is super exciting. Andy On Mon, Mar 31, 2014 at 10:42 AM, Anurag Dodeja anu...@anuragdodeja.com wrote: How about Chicago? On Mon, Mar 31, 2014 at 12:38 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Montreal or Toronto? On Mon, Mar 31, 2014 at 1:36 PM, Martin Goodson mar...@skimlinks.com wrote: How about London?
Re: Calling Spark enthusiasts in NYC
We'd love to see a Spark user group in Los Angeles and connect with others working with it here. Ping me if you're in the LA area and use Spark at your company ( ch...@retentionscience.com ). Chris Retention Science call: 734.272.3099 visit: Site | like: Facebook | follow: Twitter On Mar 31, 2014, at 10:42 AM, Anurag Dodeja anu...@anuragdodeja.com wrote: How about Chicago?
Re: java.lang.ClassNotFoundException - spark on mesos
It sounds like the protobuf issue. So FWIW, you might want to try updating the 0.9.0 w/ pom mods for mesos protobuf: mesos 0.17.0, protobuf 2.5. Cheers, Tim ----- Original Message ----- From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 9:46:32 AM Subject: Re: java.lang.ClassNotFoundException - spark on mesos I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and the latest git tree. Thanks
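The kind of pom change being suggested might look like the fragment below (the version numbers come from Tim's message; the exact coordinates and where they live in the Spark build are assumptions, so treat this as a sketch rather than a tested recipe):

```xml
<!-- spark pom.xml: align mesos and protobuf with the cluster -->
<dependency>
  <groupId>org.apache.mesos</groupId>
  <artifactId>mesos</artifactId>
  <version>0.17.0</version>
</dependency>
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>2.5.0</version>
</dependency>
```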
how spark dstream handles congestion?
Dear list, I was wondering how Spark handles congestion when the upstream is generating dstreams faster than downstream workers can handle? Thanks -Mo
Re: how spark dstream handles congestion?
Thanks -Mo 2014-03-31 13:16 GMT-05:00 Evgeny Shishkin itparan...@gmail.com: It will eventually OOM.
Re: Calling Spark enthusiasts in NYC
Nicholas, I'm in Boston and would be interested in a Spark group. Not sure if you know this -- there was a meetup that never got off the ground. Anyway, I'd be +1 for attending. Not sure what is involved in organizing. Seems a shame that a city like Boston doesn't have one. On Mon, Mar 31, 2014 at 2:02 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: As in, I am interested in helping organize a Spark meetup in the Boston area. On Mon, Mar 31, 2014 at 2:00 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Well, since this thread has played out as it has, lemme throw in a shout-out for Boston.
Calling Spahk enthusiasts in Boston
My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup. Respond to me (and I guess also Andy?) if you are interested. Yana, I'm not sure either what is involved in organizing, but we can figure it out. I didn't know about the meetup that never took off. Nick On Mon, Mar 31, 2014 at 2:31 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Nicholas, I'm in Boston and would be interested in a Spark group. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Spahk-enthusiasts-in-Boston-tp3544.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Calling Spahk enthusiasts in Boston
I would offer to host one in Cape Town, but we're almost certainly the only Spark users in the country, apart from perhaps one in Johannesburg :) — Sent from Mailbox for iPhone On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup.
Re: Calling Spark enthusiasts in NYC
Happy to help with an NYC meetup (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several computational people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be interest in those communities. -- Jeremy - jeremy freeman, phd neuroscientist @thefreemanlab On Mar 31, 2014, at 2:31 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: Nicholas, I'm in Boston and would be interested in a Spark group.
Re: Calling Spark enthusiasts in NYC
Also in NYC, definitely interested in a spark meetup! Sent from my iPhone On Mar 31, 2014, at 3:07 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: Happy to help with an NYC meet up (just emailed Andy). I recently moved to VA, but am back in NYC quite often, and have been turning several computational people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be interest in those communities. -- Jeremy [...]
Re: Calling Spark enthusiasts in NYC
If you have any questions on helping to get a Spark Meetup off the ground, please do not hesitate to ping me (denny.g@gmail.com). I helped jump start the one here in Seattle (and tangentially have been helping the Vancouver and Denver ones as well). HTH! On March 31, 2014 at 12:35:38 PM, Patrick Grinaway (pgrina...@gmail.com) wrote: Also in NYC, definitely interested in a spark meetup! [...]
Re: java.lang.ClassNotFoundException - spark on mesos
Your suggestion took me past the ClassNotFoundException. I then hit an akka.actor.ActorNotFound exception. I patched PR 568 into my 0.9.0 spark codebase and everything worked. So thanks a lot, Tim. Is there a JIRA/PR for the protobuf issue? Why is it not fixed in the latest git tree? Thanks. On 31-Mar-2014, at 11:30 pm, Tim St Clair tstcl...@redhat.com wrote: It sounds like the protobuf issue. So FWIW, you might want to try updating the 0.9.0 build with pom mods for the mesos and protobuf versions: mesos 0.17.0, protobuf 2.5. Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 9:46:32 AM Subject: Re: java.lang.ClassNotFoundException - spark on mesos I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and the latest git tree. Thanks On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote: What versions are you running? There is a known protobuf 2.5 mismatch, depending on your versions. Cheers, Tim - Original Message - From: Bharath Bhushan manku.ti...@outlook.com To: user@spark.apache.org Sent: Monday, March 31, 2014 8:16:19 AM Subject: java.lang.ClassNotFoundException - spark on mesos I am facing different kinds of java.lang.ClassNotFoundException when trying to run spark on mesos. One error has to do with org.apache.spark.executor.MesosExecutorBackend. Another has to do with org.apache.spark.serializer.JavaSerializer. I see other people complaining about similar issues. I tried with different versions of the Spark distribution - 0.9.0 and 1.0.0-SNAPSHOT - and faced the same problem. I think the reason for this is related to the error below. 
$ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
java.io.IOException: META-INF/license : could not create directory
    at sun.tools.jar.Main.extractFile(Main.java:907)
    at sun.tools.jar.Main.extract(Main.java:850)
    at sun.tools.jar.Main.run(Main.java:240)
    at sun.tools.jar.Main.main(Main.java:1147)
This error happens with all the jars that I created, but the set of classes already extracted before the failure differs from run to run. If JavaSerializer is not extracted before the extraction hits META-INF/license, then that class is not found during execution. If MesosExecutorBackend is not extracted, then that class shows up in the mesos slave error logs. Can someone confirm if this is a valid cause for the problem I am seeing? Any way I can debug this further? — Bharath -- Cheers, Tim Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata
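Tim's suggested pom mods (mesos 0.17.0, protobuf 2.5) might look like the sketch below. The property names are assumptions about the top-level pom.xml of the Spark 0.9 tree, so verify them against your checkout before rebuilding the assembly:

```xml
<!-- Hedged sketch: bump the Mesos and protobuf versions in the
     top-level pom.xml before rebuilding. The property keys here are
     assumptions; check your checkout's pom.xml for the exact names. -->
<properties>
  <mesos.version>0.17.0</mesos.version>
  <protobuf.version>2.5.0</protobuf.version>
</properties>
```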
Re: Using ProtoBuf 2.5 for messages with Spark Streaming
Spark now shades its own protobuf dependency, so protobuf 2.4.1 shouldn't be getting pulled in unless you are directly using akka yourself. Are you? Does your project have other dependencies that might be indirectly pulling in protobuf 2.4.1? It would be helpful if you could list all of your dependencies, including the exact Spark version and other libraries. - Patrick On Sun, Mar 30, 2014 at 10:03 PM, Vipul Pandey vipan...@gmail.com wrote: I'm using ScalaBuff (which depends on protobuf 2.5) and facing the same issue. Any word on this one? On Mar 27, 2014, at 6:41 PM, Kanwaldeep kanwal...@gmail.com wrote: We are using Protocol Buffer 2.5 to send messages to Spark Streaming 0.9 with a Kafka stream setup. I have Protocol Buffer 2.5 as part of the uber jar deployed on each of the spark worker nodes. The message is compiled using 2.5, but at runtime it is being de-serialized by 2.4.1, as I'm getting the following exception:
java.lang.VerifyError (java.lang.VerifyError: class com.snc.sinet.messages.XServerMessage$XServer overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;)
    java.lang.ClassLoader.defineClass1(Native Method)
    java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    java.lang.ClassLoader.defineClass(ClassLoader.java:615)
    java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
Any suggestions on how I could still use ProtoBuf 2.5? Based on SPARK-995 (https://spark-project.atlassian.net/browse/SPARK-995) we should be able to use a different version of protobuf in the application. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-ProtoBuf-2-5-for-messages-with-Spark-Streaming-tp3396.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
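Patrick's question about other dependencies indirectly pulling in protobuf 2.4.1 can be answered by inspecting the dependency tree. A sketch, assuming a Maven-built project (for sbt builds, a dependency-graph plugin gives a similar view):

```shell
# Print only the branches of the dependency tree that involve protobuf,
# to spot a transitive 2.4.1 sitting alongside the intended 2.5.0.
# Run from the root of the Maven project in question.
mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java
```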
Calling Spark enthusiasts in Austin, TX
In the spirit of everything being bigger and better in TX ;) - if anyone is in Austin and interested in meeting up over Spark, contact me! There seems to be a Spark meetup group in Austin that has never met, and my initial email to organize the first gathering was never acknowledged. Ognen On 3/31/14, 2:01 PM, Nick Pentreath wrote: I would offer to host one in Cape Town but we're almost certainly the only Spark users in the country apart from perhaps one in Johannesburg :) — Sent from Mailbox for iPhone On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: My fellow Bostonians and New Englanders, We cannot allow New York to beat us to having a banging Spark meetup. Respond to me (and I guess also Andy?) if you are interested. Yana, I'm not sure either what is involved in organizing, but we can figure it out. I didn't know about the meetup that never took off. Nick On Mon, Mar 31, 2014 at 2:31 PM, Yana Kadiyska [hidden email] wrote: Nicholas, I'm in Boston and would be interested in a Spark group. Not sure if you know this -- there was a meetup that never got off the ground. Anyway, I'd be +1 for attending. Not sure what is involved in organizing. Seems a shame that a city like Boston doesn't have one. On Mon, Mar 31, 2014 at 2:02 PM, Nicholas Chammas [hidden email] wrote: As in, I am interested in helping organize a Spark meetup in the Boston area. On Mon, Mar 31, 2014 at 2:00 PM, Nicholas Chammas [hidden email] wrote: Well, since this thread has played out as it has, lemme throw in a shout-out for Boston. 
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calling-Spahk-enthusiasts-in-Boston-tp3544.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: network wordcount example
@eric - i saw this exact issue recently while working on the KinesisWordCount. are you passing local[2] to your example as the MASTER arg versus just local or local[1]? you need at least 2. it's documented as n > 1 in the scala source docs - which is easy to mistake for n = 1. i just ran the NetworkWordCount sample and confirmed that local[1] does not work, but local[2] does work. give that a whirl. -chris On Mon, Mar 31, 2014 at 10:41 AM, Diana Carroll dcarr...@cloudera.com wrote: Not sure what data you are sending in. You could try calling lines.print() instead, which should just output everything that comes in on the stream. Just to test that your socket is receiving what you think you are sending. On Mon, Mar 31, 2014 at 12:18 PM, eric perler ericper...@hotmail.com wrote: Hello i just started working with spark today... and i am trying to run the wordcount network example i created a socket server and client.. and i am sending data to the server in an infinite loop when i run the spark class.. i see this output in the console...
--- Time: 1396281891000 ms ---
14/03/31 11:04:51 INFO SparkContext: Job finished: take at DStream.scala:586, took 0.056794606 s
14/03/31 11:04:51 INFO JobScheduler: Finished job streaming job 1396281891000 ms.0 from job set of time 1396281891000 ms
14/03/31 11:04:51 INFO JobScheduler: Total delay: 0.101 s for time 1396281891000 ms (execution: 0.058 s)
14/03/31 11:04:51 INFO TaskSchedulerImpl: Remove TaskSet 3.0 from pool
but i don't see any output from the wordcount operation when i make this call... wordCounts.print(); any help is greatly appreciated thanks in advance
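Chris's local[2] point can be checked from two terminals with the stock example. A sketch following the Spark 0.9 streaming docs; the run-example path is relative to the Spark distribution root, so adjust it to your layout:

```shell
# Terminal 1: feed text into port 9999 for the example to consume.
nc -lk 9999

# Terminal 2: run NetworkWordCount with at least two local threads --
# one thread runs the socket receiver, the other processes batches.
# With plain "local" or local[1] the receiver occupies the only thread,
# so wordCounts.print() never produces output.
./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
```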
Re: java.lang.ClassNotFoundException - spark on mesos
I was talking about the protobuf version issue as not fixed. I could not find any reference to the problem or the fix. Regarding SPARK-1052, I could pull the fix into my 0.9.0 tree (from the tarball on the website) and I see the fix in the latest git. Thanks On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote: Which repository do you use? The issue should be fixed in 0.9.1 and 1.0.0: https://spark-project.atlassian.net/browse/SPARK-1052 There's an old repository https://github.com/apache/incubator-spark and as Spark became a top-level project, it was moved to the new repo: https://github.com/apache/spark The 0.9.1 version hasn't been released yet, so you should get it from the new git repo. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Calling Spark enthusiasts in NYC
Hi Andy, I would be interested in setting up a meetup in Delhi/NCR, India. Can you please let me know how to go about organizing it? Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Tue, Apr 1, 2014 at 10:04 AM, giive chen thegi...@gmail.com wrote: Hi Andy We are from Taiwan. We are already planning to have a Spark meetup. We already have some resources like a venue and a food budget, but we do need some other resources. Please contact me offline. Thanks Wisely Chen On Tue, Apr 1, 2014 at 1:28 AM, Andy Konwinski andykonwin...@gmail.com wrote: Hi folks, We have seen a lot of community growth outside of the Bay Area and we are looking to help spur even more! [...]
Re: java.lang.ClassNotFoundException - spark on mesos
Another problem I noticed is that the current 1.0.0 git tree still gives me the ClassNotFoundException. I see that SPARK-1052 is already fixed there. I then modified the pom.xml for mesos and protobuf and that still gave the ClassNotFoundException. I also tried modifying the pom.xml only for mesos and that fails too. So I have no way of running the 1.0.0 git tree of spark on mesos yet. Thanks. On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote: Which repository do you use? The issue should be fixed in 0.9.1 and 1.0.0: https://spark-project.atlassian.net/browse/SPARK-1052 There's an old repository https://github.com/apache/incubator-spark and as Spark became a top-level project, it was moved to the new repo: https://github.com/apache/spark The 0.9.1 version hasn't been released yet, so you should get it from the new git repo. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html Sent from the Apache Spark User List mailing list archive at Nabble.com.