...@sigmoidanalytics.com
wrote:
Does this speed it up?
val rdd = sc.parallelize(1 to 100, 30)
rdd.count
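For reference, a minimal spark-shell sketch (assuming a running SparkContext `sc`; the `time` helper is illustrative, not part of Spark) that compares the count at the default parallelism against the 30-partition variant suggested above:

```scala
// Illustrative timing helper for the spark-shell.
def time[A](label: String)(body: => A): A = {
  val t0 = System.nanoTime()
  val result = body
  println(s"$label took ${(System.nanoTime() - t0) / 1e6} ms")
  result
}

val few  = sc.parallelize(1 to 100)      // default parallelism
val many = sc.parallelize(1 to 100, 30)  // 30 partitions, as suggested

time("default parallelism")(few.count)
time("30 partitions")(many.count)
```

For a 100-element range the scheduling overhead of 30 tasks can easily dominate, so more partitions are not guaranteed to be faster here.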
Thanks
Best Regards
On Wed, Apr 29, 2015 at 1:47 AM, Anshul Singhle ans...@betaglide.com
wrote:
Hi,
I'm running the following code in my cluster (standalone mode) via spark
shell -
val rdd
Do you have multiple disks? Maybe your work directory is not in the right
disk?
On Wed, Apr 29, 2015 at 4:43 PM, Selim Namsi selim.na...@gmail.com wrote:
Hi,
I'm using Spark (1.3.1) MLlib to run the random forest algorithm on TF-IDF
output; the training data is a file containing 156060 (size
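For context, a hedged sketch of the Spark 1.3 MLlib random forest API the message refers to. The input path, split ratio, and hyperparameters below are illustrative assumptions, not the poster's actual values; it assumes the TF-IDF output was saved in LIBSVM format:

```scala
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.util.MLUtils

// Hypothetical path: TF-IDF features saved as labeled LIBSVM records.
val data = MLUtils.loadLibSVMFile(sc, "data/tfidf_libsvm.txt")
val Array(training, test) = data.randomSplit(Array(0.7, 0.3))

val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int]()  // all features continuous
val numTrees = 100
val featureSubsetStrategy = "auto"             // let MLlib choose
val impurity = "gini"
val maxDepth = 10
val maxBins = 32

val model = RandomForest.trainClassifier(training, numClasses,
  categoricalFeaturesInfo, numTrees, featureSubsetStrategy,
  impurity, maxDepth, maxBins)

// Fraction of misclassified test points.
val testErr = test.map { p =>
  if (model.predict(p.features) == p.label) 0.0 else 1.0
}.mean()
println(s"Test error = $testErr")
```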
yes
On 29 Apr 2015 03:31, ayan guha guha.a...@gmail.com wrote:
Is your driver running on the same machine as the master?
On 29 Apr 2015 03:59, Anshul Singhle ans...@betaglide.com wrote:
Hi,
I'm running short Spark jobs on RDDs cached in memory. I'm also using a
long-running job context. I want
Hi,
I'm running the following code in my cluster (standalone mode) via spark
shell -
val rdd = sc.parallelize(1 to 100)
rdd.count
This takes around 1.2s to run.
Is this expected or am I configuring something wrong?
I'm using about 30 cores with 512MB executor memory
As expected, GC time is
Hi,
I'm running short Spark jobs on RDDs cached in memory. I'm also using a
long-running job context. I want to be able to complete my jobs (on the
cached RDDs) in under 1 sec.
I'm getting the following job times with about 15 GB of data distributed
across 6 nodes. Each executor has about 20GB of
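The caching pattern described above can be sketched as follows (the dataset path and transformation are hypothetical; only the persist-then-reuse shape is the point). The first action pays the load cost; later actions on the same RDD read from executor memory:

```scala
import org.apache.spark.storage.StorageLevel

// Illustrative dataset; any RDD works the same way.
val cached = sc.textFile("hdfs:///data/events")
  .map(_.length.toLong)
  .persist(StorageLevel.MEMORY_ONLY)

cached.count  // first action: reads the source and populates the cache
cached.count  // subsequent actions hit the in-memory partitions
```

Sub-second latency on a cached RDD then depends mostly on task scheduling overhead and GC, not on I/O.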
Hi firemonk9,
What you're doing looks interesting. Can you share some more details?
Are you reusing the same SparkContext for each job, or are you creating a
separate SparkContext per job?
Does your system need to share RDDs across multiple jobs? If so, how do
you implement that?
Also
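One common answer to the sharing question is a single long-lived SparkContext with a registry of named cached RDDs. The sketch below is an assumption about such a design, not the poster's actual setup; the master URL, object, and method names are all hypothetical:

```scala
import scala.collection.concurrent.TrieMap
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// One long-lived context shared by all submitted jobs (illustrative).
object SharedContext {
  lazy val sc = new SparkContext("spark://master:7077", "long-running-app")

  // Named cache so later jobs can reuse RDDs built by earlier jobs.
  private val cachedRdds = TrieMap.empty[String, RDD[_]]

  def getOrCache[T](name: String)(build: => RDD[T]): RDD[T] =
    cachedRdds.getOrElseUpdate(name, build.cache()).asInstanceOf[RDD[T]]
}
```

Tools like a REST job server follow this shape: jobs arrive over HTTP and run against the one shared context, so cached RDDs survive between jobs.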
Hi,
I'm reading data stored in S3, aggregating it, and storing it in Cassandra
using a Spark job.
When I run the job with approx. 3 million records (about 3-4 GB of data)
stored in text files, I get the following error:
(11529/14925)15/04/10 19:32:43 INFO TaskSetManager: Starting task 11609.0
in
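For readers following along, a hedged sketch of the S3-to-Cassandra job shape described above, assuming the DataStax spark-cassandra-connector is on the classpath. The bucket path, record format, keyspace, table, and column names are all illustrative:

```scala
import com.datastax.spark.connector._  // DataStax spark-cassandra-connector

// Hypothetical S3 input: CSV lines of (key, value).
val lines = sc.textFile("s3n://my-bucket/records/*.txt")

val aggregated = lines
  .map(_.split(","))
  .map(fields => (fields(0), fields(1).toLong))
  .reduceByKey(_ + _)                  // aggregate per key

// Columns must match the Cassandra table schema.
aggregated.saveToCassandra("my_keyspace", "aggregates",
  SomeColumns("key", "total"))
```

With thousands of small tasks (the log shows task 11609 of 14925), failures at this stage often point at executor memory pressure or connector write batching rather than the aggregation itself.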