Model Save function (ML-Lib)

2015-07-17 Thread Guillaume Guy
SVMModel NO What is the recommended route to save a logistic regression or SVM model? I tried to pickle the SVM, but it failed when loading it back. Any advice appreciated. Thanks! Best, Guillaume Guy * +1 919 - 972 - 8750*

Re: Speed Benchmark

2015-03-04 Thread Guillaume Guy
Sorry for the confusion. All are running Hadoop services. Node 1 is the namenode whereas Nodes 2 and 3 are datanodes. Best, Guillaume Guy * +1 919 - 972 - 8750* On Sat, Feb 28, 2015 at 1:09 AM, Sean Owen so...@cloudera.com wrote: Is machine 1 the only one running an HDFS data node? You

Speed Benchmark

2015-02-27 Thread Guillaume Guy
Dear Spark users: I want to see if anyone has an idea of the expected performance for a small cluster. Reading from HDFS, what should be the performance of a count() operation on a 10GB RDD with 100M rows using pyspark? I looked into the CPU usage; all 6 are at 100%. Details: - master yarn-client

Re: Speed Benchmark

2015-02-27 Thread Guillaume Guy
, Guillaume Guy * +1 919 - 972 - 8750* On Fri, Feb 27, 2015 at 9:06 AM, Jason Bell jaseb...@gmail.com wrote: How many machines are on the cluster? And what is the configuration of those machines (Cores/RAM)? Small cluster is a very subjective statement. Guillaume Guy wrote: Dear Spark users

Re: Speed Benchmark

2015-02-27 Thread Guillaume Guy
Hi Sean: Thanks for your feedback. Scala is much faster. The count is performed in ~1 minute (vs. 17 min). I would expect Scala to be 2-5X faster, but this gap seems to be more than that. Is that also your conclusion? Thanks. Best, Guillaume Guy * +1 919 - 972 - 8750* On Fri, Feb 27, 2015

Re: Speed Benchmark

2015-02-27 Thread Guillaume Guy
it very slow, see https://issues.apache.org/jira/browse/SPARK-6055, will be fixed very soon. Davies On Fri, Feb 27, 2015 at 1:59 PM, Guillaume Guy guillaume.c@gmail.com wrote: Hi Sean: Thanks for your feedback. Scala is much faster. The count is performed in ~1 minute

Re: Spark can't pickle class: error cannot lookup attribute

2015-02-19 Thread Guillaume Guy
Thanks Davies and Eric. I followed Davies' instructions and it works wonderfully. I would add that you can also load these scripts in the pyspark shell: pyspark --py-files support.py where support.py is your script containing your class as Davies described. Best, Guillaume Guy * +1 919
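
A likely reason the separate-module approach helps is that pickle stores an instance's class by reference (module name plus class name) rather than embedding the class definition, so the class has to be importable wherever the data is loaded. The sketch below is only an illustration using the standard library; Fraction stands in for the class kept in support.py, and the helper name pickled_class_reference is hypothetical, not something from the thread.

```python
import pickle
from fractions import Fraction


def pickled_class_reference(obj):
    """Pickle obj and report whether the payload names its class and module.

    pickle records *where* to find the class (module + qualified name),
    not the class body, so the loading side must be able to import it.
    """
    payload = pickle.dumps(obj)
    cls = type(obj)
    has_module = cls.__module__.encode() in payload
    has_name = cls.__qualname__.encode() in payload
    return payload, has_module, has_name


# Fraction is a stand-in (assumption) for a user class defined in support.py.
payload, has_module, has_name = pickled_class_reference(Fraction(1, 3))

# Loading succeeds here because the 'fractions' module is importable.
restored = pickle.loads(payload)
```

By the same reasoning, a class defined only in the interactive shell would not be importable on the workers, whereas one shipped in a module via --py-files would be.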

Spark can't pickle class: error cannot lookup attribute

2015-02-18 Thread Guillaume Guy
Hi, This is a duplicate of the Stack Overflow question here: http://stackoverflow.com/questions/28569374/spark-returning-pickle-error-cannot-lookup-attribute. I hope to generate more interest on this mailing list. *The problem:* I am running into some attribute lookup problems when trying to