Re: Using Cassandra as an input stream from Java

2013-12-05 Thread Pulasthi Supun Wickramasinghe
Hi Lucas, That did the trick. I just had to change JavaPairRDD<ByteBuffer, SortedMap<ByteBuffer, IColumn>> to JavaPairRDD<ByteBuffer, ? extends SortedMap<ByteBuffer, IColumn>>. Thanks for the help. Regards, Pulasthi On Thu, Dec 5, 2013 at 10:40 AM, Lucas Fernandes Brunialti lbrunia...@igcorp.com.br
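Why the wildcard fixes this: Java generics are invariant, so a container of SortedMap subtypes is not a container of SortedMap. The following is a minimal plain-Java sketch of that rule, with no Spark or Cassandra involved. Map stands in for JavaPairRDD, TreeMap stands in for whatever concrete SortedMap subtype the input source actually produces, and String stands in for ByteBuffer/IColumn; loadRows and countColumns are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WildcardDemo {

    // Hypothetical stand-in for an input source whose value type is a
    // *subtype* of SortedMap (here TreeMap); Strings stand in for ByteBuffer/IColumn.
    public static Map<String, TreeMap<String, Integer>> loadRows() {
        Map<String, TreeMap<String, Integer>> rows = new HashMap<>();
        TreeMap<String, Integer> columns = new TreeMap<>();
        columns.put("colA", 1);
        columns.put("colB", 2);
        rows.put("row1", columns);
        return rows;
    }

    // Generics are invariant, so this assignment would NOT compile:
    //   Map<String, SortedMap<String, Integer>> rows = loadRows();
    // The upper-bounded wildcard "? extends SortedMap<...>" accepts any subtype.
    public static int countColumns(Map<String, ? extends SortedMap<String, Integer>> rows) {
        int total = 0;
        for (SortedMap<String, Integer> columns : rows.values()) {
            total += columns.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, ? extends SortedMap<String, Integer>> rows = loadRows(); // compiles
        System.out.println(countColumns(rows)); // prints 2
    }
}
```

Elements read through the wildcard type can still be used as SortedMap, which is all most downstream code needs.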

Re: Persisting MatrixFactorizationModel

2013-12-05 Thread Aslan Bekirov
Thanks a lot Evan... On Wed, Dec 4, 2013 at 8:31 PM, Evan R. Sparks evan.spa...@gmail.com wrote: Ah, actually - I just remembered that the user and product features of the model are RDDs, so - you might be better off saving those components to HDFS and then at load time reading them back in
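The user and product features are pairs of (id, feature vector). One way to follow Evan's suggestion is to write each pair as a line of text (for example with saveAsTextFile) and parse the lines back after reading them with textFile. The sketch below shows only a hypothetical line format, "id,f1,f2,...,fk", and its exact round trip in plain Java; the class and method names are illustrative, and rebuilding the model from the reloaded features is not shown.

```java
import java.util.Arrays;

public class FeatureLineCodec {

    // Hypothetical text format for one factor row: "id,f1,f2,...,fk".
    public static String encode(int id, double[] features) {
        StringBuilder sb = new StringBuilder();
        sb.append(id);
        for (double f : features) {
            // Appends the Double.toString form, which Double.parseDouble reads back exactly
            sb.append(',').append(f);
        }
        return sb.toString();
    }

    public static int decodeId(String line) {
        return Integer.parseInt(line.split(",")[0]);
    }

    public static double[] decodeFeatures(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return features;
    }

    public static void main(String[] args) {
        String line = encode(7, new double[] {0.5, -1.25});
        System.out.println(line); // prints 7,0.5,-1.25
        System.out.println(decodeId(line) + " " + Arrays.toString(decodeFeatures(line)));
    }
}
```

A text format keeps the saved factors human-readable; a binary format would be more compact but harder to inspect.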

Re: How to balance task load

2013-12-05 Thread Andrew Ash
Hi Hao, Where tasks go is influenced by where the data they operate on resides. If the data is on one executor, it may make more sense to do all the computation on that node rather than ship data across the network. How was the data distributed across your cluster? Andrew On Mon, Dec 2, 2013

Re: How to balance task load

2013-12-05 Thread Hao REN
Hi Andrew, My data was loaded in HDFS. Actually, I got the answer from the spark-user google group. Patrick said: All cores in the cluster are considered fungible since the tasks are completely parallel. So until you run out of cores on any given node, it might get all the tasks. In some cases

Re: Bagel caching issues

2013-12-05 Thread huangjay
Hi, Maybe you need to check those nodes. They are very slow:
  3487  SUCCESS  PROCESS_LOCAL  ip-10-60-150-111.ec2.internal  2013/12/01 02:11:38  17.7 m  16.3 m  23.3 MB
  3447  SUCCESS  PROCESS_LOCAL  ip-10-12-54-63.ec2.internal   2013/12/01 02:11:26  20.1 m  13.9 m  50.9 MB
On

RE: Pre-build Spark for Windows 8.1

2013-12-05 Thread Adrian Bonar
Excellent! Thank you, Matei. From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: Wednesday, December 4, 2013 4:26 PM To: user@spark.incubator.apache.org Subject: Re: Pre-build Spark for Windows 8.1 Hey Adrian, Ideally you shouldn't use Cygwin to run on Windows - use the .cmd scripts we

RE: Pre-build Spark for Windows 8.1

2013-12-05 Thread Adrian Bonar
The master starts up now as expected but the workers are unable to connect to the master. It looks like the master is refusing the connection messages but I'm not sure why. The first two error lines below are from trying to connect a worker from a separate machine and the last two error lines

Writing to HBase

2013-12-05 Thread Benjamin Kim
Does anyone have an example or some sort of starting-point code for writing from Spark Streaming into HBase? We currently stream ad server event log data using Flume-NG to tail log entries, collect them, and put them directly into an HBase table. We would like to do the same with Spark

Re: Writing to HBase

2013-12-05 Thread Philip Ogren
Here's a good place to start: http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3ccacyzca3askwd-tujhqi1805bn7sctguaoruhd5xtxcsul1a...@mail.gmail.com%3E On 12/5/2013 10:18 AM, Benjamin Kim wrote: Does anyone have an example or some sort of starting point code when

Re: Bagel caching issues

2013-12-05 Thread Josh Rosen
The variability in task completion times could be caused by variability in the amount of work that those tasks perform rather than slow or faulty nodes. For PageRank, consider a link graph that contains a few disproportionately popular webpages that have many inlinks (such as Yahoo.com). These
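To make the skew argument concrete, here is a toy model in plain Java. It assumes pages are spread over partitions by a non-negative hash of the page name, in the spirit of Spark's hash partitioning, and approximates each partition's work as the number of inlink messages its pages receive. The graph, partition count, and names are hypothetical; Bagel's actual partitioning is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSkew {

    // Approximates each partition's work as the number of inlink messages its
    // pages receive; pages are assigned by a non-negative hash modulo.
    public static long[] workPerPartition(Map<String, Integer> inlinks, int numPartitions) {
        long[] work = new long[numPartitions];
        for (Map.Entry<String, Integer> e : inlinks.entrySet()) {
            int partition = Math.floorMod(e.getKey().hashCode(), numPartitions);
            work[partition] += e.getValue();
        }
        return work;
    }

    // Hypothetical graph: one very popular page plus a long tail of small pages.
    public static Map<String, Integer> exampleGraph() {
        Map<String, Integer> inlinks = new LinkedHashMap<>();
        inlinks.put("yahoo.com", 1_000_000);
        for (int i = 0; i < 100; i++) {
            inlinks.put("page" + i + ".example", 10);
        }
        return inlinks;
    }

    public static void main(String[] args) {
        long[] work = workPerPartition(exampleGraph(), 4);
        for (int p = 0; p < work.length; p++) {
            System.out.println("partition " + p + ": " + work[p] + " messages");
        }
    }
}
```

Whichever partition receives the popular page ends up with far more work than the others, so its task runs longer even on a perfectly healthy node.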

Re: takeSample() computation

2013-12-05 Thread Matei Zaharia
Hi Matt, Try using take() instead, which will only begin computing from the start of the RDD (first partition) if the number of elements you ask for is small. Note that if you’re doing any shuffle operations, like groupBy or sort, then the stages before that do have to be computed fully.
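The idea behind take() can be sketched as a loop that computes partitions in order and stops as soon as enough elements have been collected. This is a simplified plain-Java model (one partition at a time, no shuffle stages), not Spark's implementation; the partition contents and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class IncrementalTake {

    public static final class Result<T> {
        public final List<T> elements;
        public final int partitionsComputed;

        Result(List<T> elements, int partitionsComputed) {
            this.elements = elements;
            this.partitionsComputed = partitionsComputed;
        }
    }

    // Computes partitions one at a time, in order, stopping once n elements are collected.
    public static <T> Result<T> take(int numPartitions, IntFunction<List<T>> computePartition, int n) {
        List<T> out = new ArrayList<>();
        int computed = 0;
        while (out.size() < n && computed < numPartitions) {
            List<T> partition = computePartition.apply(computed); // partition is computed only here
            computed++;
            for (T x : partition) {
                if (out.size() == n) {
                    break;
                }
                out.add(x);
            }
        }
        return new Result<>(out, computed);
    }

    // Hypothetical partition contents: partition p holds p*size .. p*size + size - 1.
    public static List<Integer> rangePartition(int p, int size) {
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(p * size + i);
        }
        return values;
    }

    public static void main(String[] args) {
        Result<Integer> small = take(10, p -> rangePartition(p, 100), 5);
        System.out.println(small.elements + " from " + small.partitionsComputed + " partition(s)");
        Result<Integer> larger = take(10, p -> rangePartition(p, 100), 150);
        System.out.println(larger.elements.size() + " from " + larger.partitionsComputed + " partition(s)");
    }
}
```

Asking for 5 elements touches only the first of the ten partitions; asking for 150 touches two. Matei's caveat still applies: any stage before a shuffle must be computed in full regardless.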

Re: takeSample() computation

2013-12-05 Thread Matt Cheah
Actually, we want the opposite – we want as much data to be computed as possible. It's only for benchmarking purposes, of course. -Matt Cheah From: Matei Zaharia matei.zaha...@gmail.com Reply-To:

Re: Pre-build Spark for Windows 8.1

2013-12-05 Thread Matei Zaharia
Hi, When you launch the worker, try using spark://ADRIBONA-DEV-1:7077 as the URL (uppercase instead of lowercase). Unfortunately Akka is very specific about seeing hostnames written in the same way on each node, or else it thinks the message is for another machine! Matei On Dec 5, 2013, at
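A toy illustration of the mismatch Matei describes: if addresses are compared as exact strings, two spellings of the same host are different, even though DNS host names are case-insensitive. This is not Akka's actual code; both methods are hypothetical and only contrast a strict check with a case-insensitive host comparison using java.net.URI.

```java
import java.net.URI;

public class AkkaAddressMatch {

    // Toy model of strict matching: the URL strings must be identical,
    // so "adribona-dev-1" and "ADRIBONA-DEV-1" count as different hosts.
    public static boolean exactMatch(String urlA, String urlB) {
        return urlA.equals(urlB);
    }

    // DNS host names are case-insensitive; a lenient check would compare
    // host (ignoring case) and port.
    public static boolean hostMatchIgnoringCase(String urlA, String urlB) {
        URI a = URI.create(urlA);
        URI b = URI.create(urlB);
        return a.getHost() != null && b.getHost() != null
                && a.getHost().equalsIgnoreCase(b.getHost())
                && a.getPort() == b.getPort();
    }

    public static void main(String[] args) {
        String master = "spark://ADRIBONA-DEV-1:7077";
        String typed = "spark://adribona-dev-1:7077";
        System.out.println(exactMatch(master, typed));            // prints false
        System.out.println(hostMatchIgnoringCase(master, typed)); // prints true
    }
}
```

Under strict matching, the practical fix is the one given above: pass the worker exactly the spelling the master uses.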

Re: Pre-build Spark for Windows 8.1

2013-12-05 Thread Andrew Ash
Speaking of akka and host sensitivity... How much have you hacked on akka to get it to support all of: myhost.mydomain.int, myhost, and 10.1.1.1? It's kind of a pain to get the Spark URL to exactly match. I'm wondering if there are usability gains that could be made here or if we're pretty

Re: takeSample() computation

2013-12-05 Thread Matei Zaharia
Ah, got it. Then takeSample is going to do what you want, because it needs a uniform sample. If you don’t want any result at all, you can also use RDD.foreach() with an empty function. Matei On Dec 5, 2013, at 12:54 PM, Matt Cheah mch...@palantir.com wrote: Actually, we want the opposite –
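Why a uniform sample forces a full computation: every element must have an equal chance of being chosen, so no element can be skipped. Reservoir sampling (Algorithm R) shows this directly, since it has to read the whole input once. Note that this is a swapped-in illustration of the principle; Spark's own takeSample uses a different approach. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class ReservoirSample {

    // Algorithm R: keeps a uniform random sample of k elements from a stream of
    // unknown length. Every element is read. Uses an int counter, fine for a sketch.
    public static <T> List<T> sample(Iterator<T> input, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (input.hasNext()) {
            T x = input.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(x); // fill the reservoir with the first k elements
            } else {
                int j = rng.nextInt(seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, x); // x is kept with probability k / seen
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(sample(data.iterator(), 5, new Random(42)));
    }
}
```

If no result is needed at all, Matei's other suggestion (foreach with an empty function) also forces every element to be computed, without the sampling step.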

RE: Pre-build Spark for Windows 8.1

2013-12-05 Thread Adrian Bonar
Strange, but that definitely did the trick. Thanks again! From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: Thursday, December 5, 2013 2:44 PM To: user@spark.incubator.apache.org Subject: Re: Pre-build Spark for Windows 8.1 Hi, When you launch the worker, try using

Re: Spark heap issues

2013-12-05 Thread purav aggarwal
Try allocating some more resources to your application. You seem to be using 512 MB for your worker node (you can verify that from the master UI). Try putting the following settings into your code and see if it helps: System.setProperty("spark.executor.memory", "15g") // Will allocate more memory
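A minimal sketch of where that call goes, assuming the Spark 0.8-era convention of configuring through Java system properties: the property has to be set before the SparkContext is created. The JavaSparkContext line is left commented out so the snippet runs without Spark, and masterUrl and "MyApp" are hypothetical placeholders. The value is also assumed to fit within the memory each worker actually has available.

```java
public class ExecutorMemoryConfig {

    // Sets executor memory through a Java system property. This only has an effect
    // if it runs before the SparkContext is created.
    public static void configure(String executorMemory) {
        System.setProperty("spark.executor.memory", executorMemory);
    }

    public static void main(String[] args) {
        configure("15g");
        // Create the context only after configuring, e.g. (not run here;
        // masterUrl and "MyApp" are hypothetical placeholders):
        // JavaSparkContext sc = new JavaSparkContext(masterUrl, "MyApp");
        System.out.println(System.getProperty("spark.executor.memory")); // prints 15g
    }
}
```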