I got another report about this recently, and figured out that it's
caused by having different versions of Python on the driver and the YARN nodes:
http://stackoverflow.com/questions/28879803/spark-runs-in-local-but-not-in-yarn/28931934#28931934
Created JIRA:
Cycling related bits:
http://search-hadoop.com/m/LgpTk2DLMvc
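One common remedy is to point the driver and the YARN containers at the same
interpreter, e.g. in spark-defaults.conf (the interpreter path below is only an
example):

    spark.yarn.appMasterEnv.PYSPARK_PYTHON  /usr/bin/python2.7
    spark.executorEnv.PYSPARK_PYTHON        /usr/bin/python2.7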
On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan nasirkhan.onl...@gmail.com
wrote:
Hi, I am going to submit a proposal to my university to set up a standalone
Spark cluster. What hardware should I include in my proposal?
I will be working on classification (Spark MLlib) of data streams (Spark
Streaming).
If somebody can fill in these answers, that would be great! Thanks
*Cores *=
Yes, so that brings me to another question: how do I do a batch insert from a
worker?
In prod we are planning to use a 3-shard Kinesis stream, so the number of
partitions should be 3, right?
On Mar 8, 2015 8:57 PM, Ted Yu yuzhih...@gmail.com wrote:
What's the expected number of partitions in your use
Usually, when you use different versions of jars, it will throw
incompatible-version errors.
Thanks
Best Regards
On Fri, Mar 6, 2015 at 7:38 PM, Zsolt Tóth toth.zsolt@gmail.com wrote:
Hi,
I submit Spark jobs in yarn-cluster mode remotely from Java code by
calling
You could do it like this:
val transformedFileAndTime = fileAndTime.transformWith(anomaly, (rdd1:
RDD[(String, String)], rdd2: RDD[Int]) => {
  var first = ""; var second = ""; var third = 0
Did you follow these steps? https://wiki.apache.org/hadoop/AmazonS3 Also
make sure your jobtracker/mapreduce processes are running fine.
Thanks
Best Regards
On Sun, Mar 8, 2015 at 7:32 AM, roni roni.epi...@gmail.com wrote:
Did you get this to work?
I got past the issues with the cluster not
Hi,
I would like to share an RDD among several Spark applications,
i.e., create one in application A, publish its ID somewhere, and get the RDD back
directly using that ID in application B.
I know I can use Tachyon just as a filesystem and write
s.saveAsTextFile("tachyon://localhost:19998/Y") like this.
But
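A minimal sketch of that filesystem-style sharing (host, port, and path are just
examples; application B gets a new RDD over the same data, not the original
object):

    // Application A: persist the RDD's contents to Tachyon as text
    rdd.saveAsTextFile("tachyon://localhost:19998/shared/Y")

    // Application B: rebuild an RDD from the same files
    val shared = sc.textFile("tachyon://localhost:19998/shared/Y")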
errr...do you have any suggestions for me before the 1.3 release?
I can't believe there's no ML model serialization method in Spark.
Training the models is quite expensive, isn't it?
Thanks,
David
On Sun, Mar 8, 2015 at 5:14 AM Burak Yavuz brk...@gmail.com wrote:
Hi,
There is model
No wonder I had out-of-memory issues before…
I doubt we really need such a configuration at the production level…
Best regards,
Cui Lin
From: Krishna Sankar ksanka...@gmail.com
Date: Sunday, March 8, 2015 at 3:27 PM
To: Nasir Khan
Without knowing the data size or computation/storage requirements ... :
- Dual 6- or 8-core machines, 256 GB memory each, 12-15 TB of disk per machine.
Probably 5-10 machines.
- Don't go for the most exotic machines; on the other hand, don't go for the
cheapest ones either.
- Find a sweet spot with your
Hi,
It still doesn't work.
Are there any working instructions on how to pass a date to an HQL script?
Alcaid
2015-03-07 2:43 GMT+08:00 Zhan Zhang zzh...@hortonworks.com:
Do you mean "--hiveConf" (two dashes) instead of -hiveconf (one dash)?
Thanks.
Zhan Zhang
On Mar 6, 2015, at 4:20 AM,
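For what it's worth, if the query is driven from Scala through a HiveContext
instead of the CLI, the date can be substituted directly; a sketch with assumed
table and column names:

    // hiveContext is an existing org.apache.spark.sql.hive.HiveContext
    val theDate = "2015-03-06"
    val result = hiveContext.sql(s"SELECT * FROM logs WHERE dt = '$theDate'")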
You don't need SparkContext to simply serialize and deserialize objects. It
is a Java mechanism.
On Mar 8, 2015 10:29 AM, Xi Shen davidshe...@gmail.com wrote:
errr...do you have any suggestions for me before the 1.3 release?
I can't believe there's no ML model serialization method in Spark. I think
You may also take a look at PredictionIO, which can persist and then deploy
MLlib models as web services.
Simon
On Sunday, March 8, 2015, Sean Owen so...@cloudera.com wrote:
You don't need SparkContext to simply serialize and deserialize objects. It
is a Java mechanism.
On Mar 8, 2015 10:29 AM,
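A minimal sketch of the plain-Java mechanism Sean describes, assuming the model
object (e.g. a linear model holding only local weights) is java.io.Serializable;
models that wrap RDDs, such as MatrixFactorizationModel, will not round-trip
this way:

    import java.io._

    // Write any Serializable object to a local file.
    def saveModel[T <: Serializable](model: T, path: String): Unit = {
      val out = new ObjectOutputStream(new FileOutputStream(path))
      try out.writeObject(model) finally out.close()
    }

    // Read it back, casting to the expected type.
    def loadModel[T](path: String): T = {
      val in = new ObjectInputStream(new FileInputStream(path))
      try in.readObject().asInstanceOf[T] finally in.close()
    }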
Hi,
We are designing a solution which pulls file paths from Kafka and for the
current stage just counts the lines in each of these files.
When running the code it fails on:
Exception in thread "main" org.apache.spark.SparkException: Task not
serializable
at
What's the expected number of partitions in your use case?
Have you thought of doing batching in the workers?
Cheers
On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman
ashrafuzzaman...@gmail.com wrote:
While processing DStream in the Spark Programming Guide, the suggested
usage of
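A minimal sketch of that worker-side batching, using the Programming Guide's
foreachRDD/foreachPartition pattern; sendBatch and the batch size of 500 are
assumptions standing in for whatever store you insert into:

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // one bulk insert per group of records, instead of one call per record
        records.grouped(500).foreach(batch => sendBatch(batch))
      }
    }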
Yes, you can never use the SparkContext inside a remote function. It
is on the driver only.
On Sun, Mar 8, 2015 at 4:22 PM, Daniel Haviv
daniel.ha...@veracity-group.com wrote:
Hi,
We are designing a solution which pulls file paths from Kafka and for the
current stage just counts the lines in
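A minimal sketch of keeping all SparkContext calls on the driver, assuming the
paths arrive as a DStream[String] named filePaths and each batch carries only a
small set of paths:

    filePaths.foreachRDD { rdd =>
      // foreachRDD's body runs on the driver, so the SparkContext is usable here
      rdd.collect().foreach { path =>
        val lines = ssc.sparkContext.textFile(path).count()
        println(s"$path: $lines lines")
      }
    }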
Hello.
I created a collaborative filtering program using Spark,
but I am having trouble with its computation speed.
I want to implement a recommendation program using ALS (MLlib)
that runs as a separate process from Spark.
But access to the MatrixFactorizationModel object on HDFS is slow,
so I want to cache it,
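If the lookups happen inside a Spark application, one hedged idea is to cache
the model's factor matrices, which are themselves RDDs, so later predictions do
not re-read HDFS:

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    def warm(model: MatrixFactorizationModel): MatrixFactorizationModel = {
      model.userFeatures.cache()
      model.productFeatures.cache()
      model.userFeatures.count()      // force materialization now
      model.productFeatures.count()
      model
    }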