SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN)
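A minimal sketch of the setting this snippet refers to, assuming the spark-defaults.conf route; the path below is a placeholder, not one mentioned in the thread:

```
# spark-defaults.conf -- point Spark's shuffle/spill space at a drive
# with enough free room (placeholder path)
spark.local.dir    D:/spark-tmp
```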
On Sat, Mar 14, 2015 at 5:29 PM, Peng Xia sparkpeng...@gmail.com
wrote:
Hi Sean,
Thanks very much for your reply.
I tried to configure it with the code below:
sf = SparkConf().setAppName("test").set("spark.executor.memory",
"45g").set("spark.cores.max", "62")
On Mar 31, 2015, at 4:43 AM, Peng Xia sparkpeng...@gmail.com wrote:
Hi,
I have an RDD (rdd1) where each line is split into an array ["a", "b",
"c"], etc.
And I also have a local dictionary (dict1) that stores the key-value pairs
{"a": 1, "b": 2, "c": 3}.
I want to replace the keys in the RDD
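A hedged sketch of the replacement step being asked about. In PySpark the usual pattern is to ship dict1 to the executors with sc.broadcast(dict1) and apply the substitution inside rdd1.map(...); the per-record logic is shown in plain Python here so it can be read on its own (names are illustrative):

```python
dict1 = {"a": 1, "b": 2, "c": 3}

def replace_keys(record, mapping):
    """Replace each key in the record with its mapped value,
    leaving keys not found in the mapping unchanged."""
    return [mapping.get(k, k) for k in record]

# In PySpark (sketch, assuming a SparkContext named sc):
#   bcast = sc.broadcast(dict1)
#   rdd2 = rdd1.map(lambda rec: replace_keys(rec, bcast.value))
print(replace_keys(["a", "b", "c"], dict1))  # → [1, 2, 3]
```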
14, 2015 at 2:10 AM, Peng Xia sparkpeng...@gmail.com wrote:
Hi
I was running a logistic regression algorithm on an 8-node Spark cluster;
each node has 8 cores and 56 GB of RAM (each node runs
Windows).
And the Spark installation drive has 1.9 TB of capacity. The dataset I
And I have 2 TB of free space on the C: drive.
Hi
I was running a logistic regression algorithm on an 8-node Spark cluster;
each node has 8 cores and 56 GB of RAM (each node runs
Windows). And the Spark installation drive has 1.9 TB of capacity. The dataset
I was training on has around 40 million records with around 6600
algorithm in Python.
3. Train a logistic regression model with the converted labeled points.
Can anyone give some advice on how to avoid the 2 GB limit, if this is the cause?
Thanks very much for the help.
Best,
Peng
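A sketch of one common workaround for the 2 GB-per-partition limit asked about above: repartition so that no single partition's serialized data approaches 2 GB. The sizes below are illustrative assumptions, not figures from the thread:

```python
import math

# Illustrative sizes (assumptions): total serialized training data and
# a per-partition target safely under the 2 GB block-size limit.
total_bytes = 600 * 1024**3              # e.g. ~600 GB of training data
target_partition_bytes = 512 * 1024**2   # aim for ~512 MB per partition

num_partitions = math.ceil(total_bytes / target_partition_bytes)
print(num_partitions)  # → 1200

# In PySpark the repartitioning itself would be (sketch):
#   training_rdd = training_rdd.repartition(num_partitions)
```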
On Mon, Mar 9, 2015 at 3:54 PM, Peng Xia sparkpeng...@gmail.com wrote:
Hi,
I
Hi,
I was launching a Spark cluster with 4 worker nodes; each worker node has
8 cores and 56 GB of RAM, and I was testing my logistic regression problem.
The training set is around 1.2 million records. When I was using 2**10
(1024) features, the whole program works fine, but when I use 2**14
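For context on why 2**14 features can fail where 2**10 works, a rough back-of-envelope estimate; the record count is from the message above, and treating every feature vector as dense is my worst-case assumption:

```python
# Rough memory estimate for dense feature vectors of 8-byte doubles.
records = 1_200_000
bytes_per_double = 8

def dense_gb(num_features):
    """Total size in GiB if every record's vector is stored densely."""
    return records * num_features * bytes_per_double / 1024**3

print(round(dense_gb(2**10), 1))  # → 9.2  (GiB; fits on 56 GB nodes)
print(round(dense_gb(2**14), 1))  # → 146.5 (GiB; far beyond one node's RAM)
```

Sparse vectors (e.g. MLlib's SparseVector) avoid this blow-up when most features are zero.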
Hi,
Previously, we applied the SVM algorithm in MLlib to 5 million records (600
MB); it takes more than 25 minutes to finish.
The Spark version we are using is 1.0, and we were running this program on a
4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
The 5 million records only have two
and taking
a while...
My guess is it's a distinct function on the data.
J
Sent from my iPhone
On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote:
Hi,
Previously, we applied the SVM algorithm in MLlib to 5 million records (600
MB); it takes more than 25 minutes to finish
. -Xiangrui
On Thu, Oct 30, 2014 at 11:44 AM, peng xia toxiap...@gmail.com wrote:
Thanks for all your help.
I think I didn't cache the data. My previous cluster has expired, and I
don't
have a chance to check the load balancing or the app manager.
Below is my code.
There are 18 features for each
Thanks Jimmy.
I will give it a try.
Thanks very much for all your help.
Best,
Peng
On Thu, Oct 30, 2014 at 8:19 PM, Jimmy ji...@sellpoints.com wrote:
sampleRDD.cache()
Sent from my iPhone
On Oct 30, 2014, at 5:01 PM, peng xia toxiap...@gmail.com wrote:
Hi Xiangrui,
Can you give me some