RE: Session for connections?

2014-12-13 Thread Ashic Mahtab
Thanks for the response. The fact that they'll get killed when the sc is closed is quite useful in this case. I'm looking at a cluster of four workers trying to send messages to RabbitMQ, which can have many sessions open without much penalty. For other stores (like, say, SQL) and larger

Re: Read data from SparkStreaming from Java socket.

2014-12-13 Thread Guillermo Ortiz
I got it, thanks. A silly question: why is it that if I do out.write("hello" + System.currentTimeMillis() + "\n"); it doesn't detect anything, but if I do out.println("hello" + System.currentTimeMillis()); it works? In Spark I'm doing val errorLines = lines.filter(_.contains("hello")) 2014-12-13 8:12
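
One possible explanation for the question above, offered as a sketch rather than a diagnosis: if `out` is a `java.io.PrintWriter` created with autoflush enabled (an assumption; the post does not show how `out` is constructed), then `println` flushes the buffer while `write` does not, so text passed to `write` can sit in the writer's buffer and never reach the socket. The example below uses a `ByteArrayOutputStream` as a stand-in for the socket's output stream; the class name `FlushDemo` and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Hypothetical demo class; the in-memory sink stands in for socket.getOutputStream().
class FlushDemo {

    // write(...) alone: the text stays in PrintWriter's internal buffer.
    static int bytesAfterWrite() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // autoFlush = true only triggers on println, printf and format, not on write.
        PrintWriter out = new PrintWriter(sink, true);
        out.write("hello" + System.currentTimeMillis() + "\n");
        return sink.size();
    }

    // write(...) followed by an explicit flush(): the text reaches the sink.
    static int bytesAfterWriteAndFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(sink, true);
        out.write("hello" + System.currentTimeMillis() + "\n");
        out.flush();
        return sink.size();
    }

    // println(...) with autoflush enabled: the text reaches the sink immediately.
    static int bytesAfterPrintln() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(sink, true);
        out.println("hello" + System.currentTimeMillis());
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("write only:        " + bytesAfterWrite() + " bytes delivered");
        System.out.println("write + flush:     " + bytesAfterWriteAndFlush() + " bytes delivered");
        System.out.println("println autoflush: " + bytesAfterPrintln() + " bytes delivered");
    }
}
```

If this is the cause, calling `out.flush()` after each `out.write(...)` should make the lines visible to the receiving side.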

Spark SQL Roadmap?

2014-12-13 Thread Xiaoyong Zhu
Dear Spark experts, I am very interested in Spark SQL availability in the future. Could someone share information on the following questions? 1. Are there any ETAs for the Spark SQL release? 2. I heard there is a Hive on Spark program also - what's the difference

unread block data when reading from NFS

2014-12-13 Thread gtinside
Hi, I am trying to read a CSV file in the following way: val csvData = sc.textFile("file:///tmp/sample.csv"); csvData.collect().length This works fine in spark-shell, but when I spark-submit the jar, I get the following exception: java.lang.IllegalStateException: unread block data

Re: Spark SQL Roadmap?

2014-12-13 Thread Denny Lee
Hi Xiaoyong, SparkSQL has already been released and has been part of the Spark code-base since Spark 1.0. The latest stable release is Spark 1.1 (here's the Spark SQL Programming Guide http://spark.apache.org/docs/1.1.0/sql-programming-guide.html) and we're currently voting on Spark 1.2. Hive

Re: Including data nucleus tools

2014-12-13 Thread spark.dubovsky.jakub
So, to answer my own question: it is a bug, and there is already an unmerged PR for it. https://issues.apache.org/jira/browse/SPARK-2624 https://github.com/apache/spark/pull/3238 Jakub -- Original message -- From: spark.dubovsky.ja...@seznam.cz To: spark.dubovsky.ja...@seznam.cz

JSON Input files

2014-12-13 Thread Madabhattula Rajesh Kumar
Hi Team, I have a large JSON file in Hadoop. Could you please let me know: 1. How to read the JSON file. 2. How to parse the JSON file. Please share any example program based on Scala. Regards, Rajesh

Re: Error: Spark-streaming to Cassandra

2014-12-13 Thread Helena Edelson
I am curious why you use the 1.0.4 java artifact with the latest 1.1.0? This might be your compilation problem - The older java version. <dependency> <groupId>com.datastax.spark</groupId> <artifactId>spark-cassandra-connector_2.10</artifactId> <version>1.1.0</version> </dependency> <dependency>

Re: JSON Input files

2014-12-13 Thread Helena Edelson
One solution can be found here: https://spark.apache.org/docs/1.1.0/sql-programming-guide.html#json-datasets - Helena @helenaedelson On Dec 13, 2014, at 11:18 AM, Madabhattula Rajesh Kumar mrajaf...@gmail.com wrote: Hi Team, I have a large JSON file in Hadoop. Could you please let me know

Building Desktop application for ALS-MlLib/ Training ALS

2014-12-13 Thread Saurabh Agrawal
Hi, I am a newbie in the Spark and Scala world. I have been trying to implement collaborative filtering using MLlib, supplied out of the box with Spark and Scala. I have 2 problems: 1. The best model was trained with rank = 20 and lambda = 5.0, and numIter = 10, and its RMSE on the

Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-13 Thread Josh Rosen
I've noticed that several users are attempting to post messages to Spark's user / dev mailing lists using the Nabble web UI ( http://apache-spark-user-list.1001560.n3.nabble.com/). However, there are many posts in Nabble that are not posted to the Apache lists and are flagged with This post has

RE: Spark SQL Roadmap?

2014-12-13 Thread Xiaoyong Zhu
Thanks, Denny, for your information! For #1, what I meant is the Spark SQL beta/official release date (as today it is still in the alpha phase)… though today I see it has most basic functionality, I don’t know when the next milestone will happen, i.e. beta? For #2, thanks for the information! I

Re: Spark SQL Roadmap?

2014-12-13 Thread Matei Zaharia
Spark SQL is already available; the reason for the alpha component label is that we are still tweaking some of the APIs, so we have not yet guaranteed API stability for it. However, that is likely to happen soon (possibly 1.3). One of the major things added in Spark 1.2 was an external data

Re: unread block data when reading from NFS

2014-12-13 Thread Yana
Someone just posted a very similar question: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-IllegalStateException-unread-block-data-tt20668.html I ran into this a few weeks back -- I can't remember if my jar was built against a different version of spark or if I had accidentally

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-13 Thread Yana Kadiyska
Since you mentioned this, I had a related quandary recently -- it also says that the forum archives u...@spark.incubator.apache.org and d...@spark.incubator.apache.org respectively, yet the Community page clearly says to email the

Calling ALS-MlLib from desktop application/ Training ALS

2014-12-13 Thread Saurabh Agrawal
Requesting guidance on my queries in the trailing email. -Original Message- From: Saurabh Agrawal Sent: Saturday, December 13, 2014 07:06 PM GMT Standard Time To: user@spark.apache.org Subject: Building Desktop application for ALS-MlLib/ Training ALS Hi, I am a newbie in Spark and

Re: Calling ALS-MlLib from desktop application/ Training ALS

2014-12-13 Thread Krishna Sankar
a) There is no absolute RMSE; it depends on the domain. Also, RMSE is the error based on what you have seen so far, a snapshot of a slice of the domain. b) My suggestion is to put the system in place, see what happens when users interact with the system, and then you can think of reducing the RMSE as
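
To illustrate point (a), here is a minimal sketch of how RMSE is computed and why its magnitude depends on the rating scale of the domain. The class name `RmseDemo` and the sample ratings are made up for illustration; they are not taken from the thread or from MLlib.

```java
// Hypothetical demo class showing that RMSE scales with the rating range.
class RmseDemo {

    // Root-mean-square error between predicted and actual ratings.
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }

    public static void main(String[] args) {
        // Made-up ratings on a 1-5 scale; every prediction is off by 0.5.
        double[] actualSmall = {4.0, 2.0, 5.0, 3.0};
        double[] predictedSmall = {3.5, 2.5, 4.5, 3.5};

        // The same ratings multiplied by 20 (a 1-100 style scale); every prediction is off by 10.
        double[] actualLarge = {80.0, 40.0, 100.0, 60.0};
        double[] predictedLarge = {70.0, 50.0, 90.0, 70.0};

        System.out.println(rmse(predictedSmall, actualSmall)); // prints 0.5
        System.out.println(rmse(predictedLarge, actualLarge)); // prints 10.0
    }
}
```

The two models are equally good relative to their scales, yet the raw RMSE differs by a factor of 20, which is why a value is only meaningful when compared against a baseline in the same domain.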

Re: Having problem with Spark streaming with Kinesis

2014-12-13 Thread A.K.M. Ashrafuzzaman
Thanks, Aniket. The trick is to have #workers = #shards + 1, but I don’t know why that is. http://spark.apache.org/docs/latest/streaming-kinesis-integration.html Here in the figure [Spark Streaming Kinesis architecture], it seems like one node should be able to take on more than one shard.