Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-05-30 Thread Margusja
Hi, the Spark version I am using is spark-0.9.1-bin-hadoop2. I built spark-assembly_2.10-0.9.1-hadoop2.2.0.jar and moved JavaKafkaWordCount.java from the examples to a new directory to play with it. My compile commands: javac -cp

Announcing Spark 1.0.0

2014-05-30 Thread Patrick Wendell
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank

Re: Announcing Spark 1.0.0

2014-05-30 Thread Christopher Nguyen
Awesome work, Pat et al.! -- Christopher T. Nguyen Co-founder CEO, Adatao http://adatao.com linkedin.com/in/ctnguyen On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell pwend...@gmail.com wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as

Re: Announcing Spark 1.0.0

2014-05-30 Thread prabeesh k
Please update the http://spark.apache.org/docs/latest/ link On Fri, May 30, 2014 at 4:03 PM, Margusja mar...@roo.ee wrote: Is it possible to download a pre-built package? http://mirror.symnds.com/software/Apache/incubator/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz - gives me 404 Best

Re: Announcing Spark 1.0.0

2014-05-30 Thread Patrick Wendell
It is updated - try holding Shift + refresh in your browser; it is probably caching the page. On Fri, May 30, 2014 at 3:46 AM, prabeesh k prabsma...@gmail.com wrote: Please update the http://spark.apache.org/docs/latest/ link On Fri, May 30, 2014 at 4:03 PM, Margusja mar...@roo.ee wrote:

Re: Announcing Spark 1.0.0

2014-05-30 Thread Margusja
Now I can download. Thanks. Best regards, Margus (Margusja) Roo +372 51 48 780 http://margus.roo.ee http://ee.linkedin.com/in/margusroo skype: margusja ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314) On 30/05/14 13:48, Patrick Wendell wrote: It is updated - try holding Shift +

RE: Announcing Spark 1.0.0

2014-05-30 Thread Kousuke Saruta
Hi all, in https://spark.apache.org/downloads.html, the URL for the release notes of 1.0.0 seems to be wrong. The URL should be https://spark.apache.org/releases/spark-release-1-0-0.html but links to https://spark.apache.org/releases/spark-release-1.0.0.html Best Regards, Kousuke From:

Re: Announcing Spark 1.0.0

2014-05-30 Thread John Omernik
All: In the pom.xml file I see the MapR repository, but it's not included in the ./project/SparkBuild.scala file. Is this expected? I know that to build I have to add it there, otherwise sbt hates me with evil red messages and such. John On Fri, May 30, 2014 at 6:24 AM, Kousuke Saruta

Re: Announcing Spark 1.0.0

2014-05-30 Thread jose farfan
Awesome work On Fri, May 30, 2014 at 12:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0

Re: Announcing Spark 1.0.0

2014-05-30 Thread John Omernik
By the way: this is great work. I am new to the Spark world, and have been like a kid in a candy store learning all it can do. Is there a good list of build variables? What I mean is something like the SPARK_HIVE variable described on the Spark SQL page. I'd like to include that, but once I found that I

Re: Selecting first ten values in a RDD/partition

2014-05-30 Thread nilmish
My primary goal: to get the top 10 hashtags for every 5-minute interval. I want to do this efficiently. I have already done this by using reduceByKeyAndWindow() and then sorting all hashtags in the 5-minute interval, taking only the top 10 elements. But this is very slow. So now I am thinking of retaining only
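
A minimal pyspark sketch of the approach being discussed (the socket source, the 300-second window, and the 10-second slide are assumptions for illustration; takeOrdered collects only the top 10 instead of sorting every hashtag in the window):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TopHashtags")
    ssc = StreamingContext(sc, 10)        # 10-second batches
    ssc.checkpoint("checkpoint")          # required by the inverse-reduce window below
    lines = ssc.socketTextStream("localhost", 9999)  # illustrative source
    tags = lines.flatMap(lambda l: [w for w in l.split() if w.startswith("#")])
    counts = tags.map(lambda t: (t, 1)).reduceByKeyAndWindow(
        lambda a, b: a + b,   # add counts entering the window
        lambda a, b: a - b,   # subtract counts leaving the window
        300, 10)              # 5-minute window, sliding every 10 seconds
    # takeOrdered(10) avoids a full sort of the window's contents.
    counts.foreachRDD(lambda rdd: print(rdd.takeOrdered(10, key=lambda x: -x[1])))
    ssc.start()
    ssc.awaitTermination()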

Re: pyspark MLlib examples don't work with Spark 1.0.0

2014-05-30 Thread jamborta
Thanks for the reply. I am definitely running 1.0.0; I set it up manually. To answer my own question: I found out from the examples that it needs a new data type called LabeledPoint instead of a numpy array.
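
A minimal sketch of the fix described here, with hypothetical data (a LabeledPoint pairs a label with a feature vector, which MLlib's Python trainers expect instead of bare numpy arrays):

    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    sc = SparkContext(appName="LabeledPointExample")
    # Each training example wraps (label, features) in a LabeledPoint.
    training = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
    ])
    model = LogisticRegressionWithSGD.train(training)
    print(model.predict([1.0, 0.0]))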

Re: Announcing Spark 1.0.0

2014-05-30 Thread Ognen Duzlevski
How exciting! Congratulations! :-) Ognen On 5/30/14, 5:12 AM, Patrick Wendell wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is

Re: Announcing Spark 1.0.0

2014-05-30 Thread Chanwit Kaewkasi
Congratulations !! -chanwit -- Chanwit Kaewkasi linkedin.com/in/chanwit On Fri, May 30, 2014 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing

Re: SparkContext startup time out

2014-05-30 Thread Pierre B
I was annoyed by this as well. It appears that just permuting the order of dependency inclusion solves this problem: first Spark, then your CDH Hadoop distro. HTH, Pierre

Re: KryoSerializer Exception

2014-05-30 Thread Andrea Esposito
Hi, I just migrated to 1.0 and am still having the same issue, either with or without the custom registrator. Just using the KryoSerializer triggers the exception immediately. I set the Kryo settings through the property: System.setProperty(spark.serializer, org.apache.spark.serializer.

Re: Announcing Spark 1.0.0

2014-05-30 Thread Dean Wampler
Congratulations!! On Fri, May 30, 2014 at 5:12 AM, Patrick Wendell pwend...@gmail.com wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark

Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Hi, I recently posted a question on Stack Overflow but didn't get any reply, so I have now joined the mailing list. Can any of you suggest a way to approach the problem described at http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark Thanks in advance

Monitoring / Instrumenting jobs in 1.0

2014-05-30 Thread Daniel Siegmann
The Spark 1.0.0 release notes state that "Internal instrumentation has been added to allow applications to monitor and instrument Spark jobs." Can anyone point me to the docs for this? -- Daniel Siegmann, Software Developer Velos Accelerating Machine Learning 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY

RE: Announcing Spark 1.0.0

2014-05-30 Thread Ian Ferreira
Congrats Sent from my Windows Phone From: Dean Wampler deanwamp...@gmail.com Sent: 5/30/2014 6:53 AM To: user@spark.apache.org Subject: Re: Announcing Spark 1.0.0 Congratulations!! On Fri, May 30, 2014 at 5:12 AM, Patrick

Using Spark on Data size larger than Memory size

2014-05-30 Thread Vibhor Banga
Hi all, I am planning to use Spark with HBase, generating an RDD by reading data from an HBase table. I want to know: when the size of the HBase table grows larger than the RAM available in the cluster, will the application fail, or will there just be an impact on performance?
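
A hedged sketch of one way to handle this (MEMORY_AND_DISK is an assumption about the desired trade-off; with the default MEMORY_ONLY level, partitions that do not fit are dropped and recomputed rather than causing a failure):

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="LargeHBaseTable")
    rdd = sc.textFile("hdfs:///path/to/table-export")  # illustrative input
    # Spill partitions that do not fit in RAM to local disk instead of
    # recomputing them, trading disk I/O for recomputation cost.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    print(rdd.count())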

Re: Problem using Spark with Hbase

2014-05-30 Thread Vibhor Banga
Thanks, Mayur, for the reply. Actually the issue was that I was running the Spark application on hadoop-2.2.0, where the HBase version was 0.95.2, but Spark by default gets built against an older HBase version. So I had to build Spark again with the HBase version set to 0.95.2 in the Spark build file, and it worked.

Re: Announcing Spark 1.0.0

2014-05-30 Thread Nicholas Chammas
You guys were up late, eh? :) I'm looking forward to using this latest version. Is there any place we can get a list of the new functions in the Python API? The release notes don't enumerate them. Nick On Fri, May 30, 2014 at 10:15 AM, Ian Ferreira ianferre...@hotmail.com wrote: Congrats

Subscribing to news releases

2014-05-30 Thread Nick Chammas
Is there a way to subscribe to news releases http://spark.apache.org/news/index.html? That would be swell. Nick

RE: Announcing Spark 1.0.0

2014-05-30 Thread giive chen
Great work! On May 30, 2014 10:15 PM, Ian Ferreira ianferre...@hotmail.com wrote: Congrats Sent from my Windows Phone -- From: Dean Wampler deanwamp...@gmail.com Sent: 5/30/2014 6:53 AM To: user@spark.apache.org Subject: Re: Announcing Spark 1.0.0

Spark 1.0.0 - Java 8

2014-05-30 Thread Upender Nimbekar
Great news! I've been awaiting this release to start doing some coding with Spark using Java 8. Can I run the Spark 1.0 examples on a virtual host with 16 GB RAM and a fairly decent amount of hard disk, or do I really need to use a cluster of machines? Second, are there any good examples of using MLlib

Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Surendranauth Hiraman
With respect to virtual hosts, my team uses Vagrant/Virtualbox. We have 3 CentOS VMs with 4 GB RAM each - 2 worker nodes and a master node. Everything works fine, though if you are using MapR, you have to make sure they are all on the same subnet. -Suren On Fri, May 30, 2014 at 12:20 PM,

Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Aaron Davidson
Also, the Spark examples can run out of the box on a single machine, as well as a cluster. See the Master URLs heading here: http://spark.apache.org/docs/latest/submitting-applications.html#master-urls On Fri, May 30, 2014 at 9:24 AM, Surendranauth Hiraman suren.hira...@velos.io wrote: With
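
For example, the same pyspark program can be pointed at a single machine or a cluster purely through the master URL (the URLs below are placeholders):

    from pyspark import SparkConf, SparkContext

    # "local[2]" runs the job in-process with 2 threads; swap in e.g.
    # "spark://host:7077" for a standalone cluster instead.
    conf = SparkConf().setAppName("MasterUrlDemo").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(100)).sum())
    sc.stop()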

Re: Local file being referenced in mapper function

2014-05-30 Thread Marcelo Vanzin
Hi Rahul, I'll just copy-paste your question here to aid with context, and reply afterwards. - Can I write the RDD data to an Excel file along with the mapping in apache-spark? Is that a correct way? Isn't it that the writing will be a local function and can't be shipped across the cluster?? Below is

Re: Local file being referenced in mapper function

2014-05-30 Thread Marcelo Vanzin
Hello there, On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin van...@cloudera.com wrote:

    workbook = xlsxwriter.Workbook('output_excel.xlsx')
    worksheet = workbook.add_worksheet()
    data = sc.textFile("xyz.txt")  # xyz.txt is a file whose each line contains strings delimited by SPACE
    row = 0
    def

Re: Local file being referenced in mapper function

2014-05-30 Thread Jey Kottalam
Hi Rahul, Marcelo's explanation is correct. Here's a possible approach to your program, in pseudo-Python:

    # connect to Spark cluster
    sc = SparkContext(...)
    # load input data
    input_data = load_xls(file("input.xls"))
    input_rows = input_data['Sheet1'].rows
    # create RDD on cluster
    input_rdd =
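
Filling in the rest of that pattern as a hedged sketch (the file names and split logic are assumptions for illustration): transformations run on the cluster, then collect() brings the results back to the driver, where plain xlsxwriter code writes a driver-local file:

    import xlsxwriter
    from pyspark import SparkContext

    sc = SparkContext(appName="ExcelOutput")
    data = sc.textFile("xyz.txt")                         # space-delimited lines
    rows = data.map(lambda line: line.split()).collect()  # computed on cluster, returned to driver

    workbook = xlsxwriter.Workbook("output_excel.xlsx")   # local to the driver only
    worksheet = workbook.add_worksheet()
    for r, fields in enumerate(rows):
        for c, value in enumerate(fields):
            worksheet.write(r, c, value)
    workbook.close()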

Trouble with EC2

2014-05-30 Thread PJ$
Hey Folks, I'm really having quite a bit of trouble getting Spark running on EC2. I'm not using the scripts at https://github.com/apache/spark/tree/master/ec2 because I'd like to know how everything works, but I'm going a little crazy. I think that something about the networking configuration must

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-05-30 Thread Andrei
Thanks, Stephen. I have eventually decided to go with an assembly, but left the Spark and Hadoop jars out of it and instead use `spark-submit` to provide these dependencies automatically. This way no resource conflicts arise and mergeStrategy needs no modification. To record this stable setup and also share

Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks, Marcelo. It actually made a few concepts clear for me. (y) On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello there, On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin van...@cloudera.com wrote: workbook = xlsxwriter.Workbook('output_excel.xlsx') worksheet

Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks, Jey. It was helpful. On Sat, May 31, 2014 at 12:45 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Thanks, Marcelo. It actually made a few concepts clear for me. (y) On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin van...@cloudera.com wrote: Hello there, On Fri, May 30, 2014

Failed to remove RDD error

2014-05-30 Thread Michael Chang
I'm running some Kafka streaming Spark contexts (on 0.9.1), and they seem to be dying after 10 or so minutes with a lot of these errors. I can't really tell what's going on here, except that maybe the driver is unresponsive somehow? Has anyone seen this before? 14/05/31 01:13:30 ERROR

possible typos in spark 1.0 documentation

2014-05-30 Thread Yadid Ayzenberg
Congrats on the new 1.0 release. Amazing work! It looks like there may be some typos in the latest http://spark.apache.org/docs/latest/sql-programming-guide.html in the Running SQL on RDDs section when choosing the Java example: 1. ctx is an instance of JavaSQLContext but the textFile method

Re: Yay for 1.0.0! EC2 Still has problems.

2014-05-30 Thread Patrick Wendell
Hi Jeremy, That's interesting, I don't think anyone has ever reported an issue running these scripts due to Python incompatibility, but they may require Python 2.7+. I regularly run them from the AWS Ubuntu 12.04 AMI... that might be a good place to start. But if there is a straightforward way to