Bulk insert strategy

2015-03-07 Thread A.K.M. Ashrafuzzaman
While processing DStreams, the Spark Programming Guide suggests the following way to use a connection: dstream.foreachRDD(rdd => { rdd.foreachPartition(partitionOfRecords => { // ConnectionPool is a static, lazily initialized pool of connections val connection =
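The guide's example is Scala; below is a plain-Python sketch of the same pattern, with `ConnectionPool`, `Connection`, and `bulk_insert` as illustrative stand-ins rather than real Spark or JDBC APIs. The point is the shape: one connection checked out per partition, not one per record.

```python
from queue import Queue

class Connection:
    def __init__(self):
        self.rows = []

    def bulk_insert(self, records):
        # A real driver would batch these into a single INSERT statement.
        self.rows.extend(records)
        return len(records)

class ConnectionPool:
    # Static, lazily initialized pool: one per executor process, so tasks
    # running in the same executor reuse connections instead of opening
    # a fresh one for every record or every batch.
    _pool = None

    @classmethod
    def get_connection(cls):
        if cls._pool is None:
            cls._pool = Queue()
            for _ in range(2):
                cls._pool.put(Connection())
        return cls._pool.get()

    @classmethod
    def return_connection(cls, conn):
        cls._pool.put(conn)

def process_partition(partition_of_records):
    # Body of rdd.foreachPartition: check out once, insert in bulk, return.
    conn = ConnectionPool.get_connection()
    inserted = conn.bulk_insert(list(partition_of_records))
    ConnectionPool.return_connection(conn)
    return inserted

print(process_partition(iter([("k1", 1), ("k2", 2), ("k3", 3)])))
```

Returning the connection to the pool (rather than closing it) is what makes the bulk-insert cheap across successive batches.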

Re: How to reuse a ML trained model?

2015-03-07 Thread Xi Shen
Ah~it is serializable. Thanks! On Sat, Mar 7, 2015 at 10:59 PM Ekrem Aksoy ekremak...@gmail.com wrote: You can serialize your trained model to persist somewhere. Ekrem Aksoy On Sat, Mar 7, 2015 at 12:10 PM, Xi Shen davidshe...@gmail.com wrote: Hi, I checked a few ML algorithms in
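The thread's conclusion is that a trained model object is serializable, so it can be persisted once and reloaded instead of retrained. A minimal Python sketch with `pickle`; `LogisticModel` is a toy stand-in for an MLlib model, not a real Spark class:

```python
import math
import pickle

class LogisticModel:
    """Toy logistic-regression model: weights and intercept learned elsewhere."""
    def __init__(self, weights, intercept):
        self.weights = weights
        self.intercept = intercept

    def predict(self, features):
        margin = sum(w * x for w, x in zip(self.weights, features)) + self.intercept
        return 1 if 1.0 / (1.0 + math.exp(-margin)) > 0.5 else 0

model = LogisticModel([0.8, -0.4], intercept=0.1)
blob = pickle.dumps(model)      # persist: write this blob to disk, HDFS, S3, ...
restored = pickle.loads(blob)   # later: reload instead of retraining
print(restored.predict([1.0, 0.5]))
```

The same idea applies on the JVM via Java serialization of a `Serializable` model object.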

Re: Spark streaming and executor object reusage

2015-03-07 Thread Jean-Pascal Billaud
Thanks Sean, this is really helpful. Please see comments inline. Sent from my iPad On Mar 7, 2015, at 4:45 AM, Sean Owen so...@cloudera.com wrote: In the example with createNewConnection(), a connection is created for every partition of every batch of input. You could take the idea further

Re: Spark streaming and executor object reusage

2015-03-07 Thread Jean-Pascal Billaud
Thanks a lot. Sent from my iPad On Mar 7, 2015, at 8:26 AM, Sean Owen so...@cloudera.com wrote: On Sat, Mar 7, 2015 at 4:17 PM, Jean-Pascal Billaud j...@tellapart.com wrote: So given this let's go a bit further. Imagine my static factory provides a stats collector that my various map()

Re: How to reuse a ML trained model?

2015-03-07 Thread Burak Yavuz
Hi, There is model import/export for some of the ML algorithms on the current master (and they'll be shipped with the 1.3 release). Burak On Mar 7, 2015 4:17 AM, Xi Shen davidshe...@gmail.com wrote: Wait...it seems SparkContext does not provide a way to save/load object files. It can only

Re: Spark streaming and executor object reusage

2015-03-07 Thread Sean Owen
On Sat, Mar 7, 2015 at 4:17 PM, Jean-Pascal Billaud j...@tellapart.com wrote: So given this let's go a bit further. Imagine my static factory provides a stats collector that my various map() code would use to export some metrics while mapping tuples. This stats collector comes with a timer
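The scenario in this exchange, sketched in plain Python: a "static factory" (here a module-level singleton) hands out one stats collector per executor process, map() code updates it, and a daemon timer flushes it periodically. The class and method names are illustrative, not a Spark API.

```python
import threading
from collections import Counter

class StatsCollector:
    _instance = None
    _lock = threading.Lock()

    def __init__(self, flush_interval=60.0):
        self.counters = Counter()
        self.flushed = []
        # Daemon timer so the periodic flush never blocks process shutdown.
        self._timer = threading.Timer(flush_interval, self.flush)
        self._timer.daemon = True
        self._timer.start()

    @classmethod
    def get(cls):
        # Lazily create exactly one collector per executor process.
        with cls._lock:
            if cls._instance is None:
                cls._instance = StatsCollector()
            return cls._instance

    def incr(self, name, n=1):
        with self._lock:
            self.counters[name] += n

    def flush(self):
        # Export and reset; a real collector would ship these to a sink.
        with self._lock:
            self.flushed.append(dict(self.counters))
            self.counters.clear()

def tag_tuple(t):
    # Inside map(): every task running in this process shares the collector.
    StatsCollector.get().incr("tuples_mapped")
    return t
```

Because the collector lives in the executor process rather than in any one task, its lifetime spans batches, which is exactly why it needs its own flush timer.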

Re: How to reuse a ML trained model?

2015-03-07 Thread Ekrem Aksoy
You can serialize your trained model to persist somewhere. Ekrem Aksoy On Sat, Mar 7, 2015 at 12:10 PM, Xi Shen davidshe...@gmail.com wrote: Hi, I checked a few ML algorithms in MLLib.

How to reuse a ML trained model?

2015-03-07 Thread Xi Shen
Hi, I checked a few ML algorithms in MLLib. https://spark.apache.org/docs/0.8.1/api/mllib/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel I could not find a way to save the trained model. Does this mean I have to train my model every time? Is there a more economical way

Re: How to reuse a ML trained model?

2015-03-07 Thread Xi Shen
Wait...it seems SparkContext does not provide a way to save/load object files. It can only save/load RDDs. What did I miss here? Thanks, David On Sat, Mar 7, 2015 at 11:05 PM Xi Shen davidshe...@gmail.com wrote: Ah~it is serializable. Thanks! On Sat, Mar 7, 2015 at 10:59 PM Ekrem Aksoy

Re: Spark streaming and executor object reusage

2015-03-07 Thread Sean Owen
In the example with createNewConnection(), a connection is created for every partition of every batch of input. You could take the idea further and share connections across partitions or batches. This requires them to have a lifecycle beyond foreachRDD. That's accomplishable with some kind of
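"A lifecycle beyond foreachRDD" can be sketched as a lazily created, process-wide connection that successive batches reuse, with a shutdown hook to close it when the executor process exits. Plain-Python sketch; `SharedConnection` and `send` are illustrative names, not Spark APIs.

```python
import atexit

class SharedConnection:
    _conn = None

    @classmethod
    def get(cls):
        if cls._conn is None:
            cls._conn = cls()                  # created once, on first use
            atexit.register(cls._conn.close)   # closed when the process exits
        return cls._conn

    def __init__(self):
        self.open = True
        self.batches_served = 0

    def send(self, records):
        self.batches_served += 1
        return len(records)

    def close(self):
        self.open = False

# Each foreachRDD invocation (one per batch) reuses the same connection:
for batch in ([1, 2], [3], [4, 5, 6]):
    SharedConnection.get().send(batch)
```

Compared to createNewConnection() per partition, this amortizes setup cost across every batch the executor ever processes, at the price of managing shutdown yourself.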

MLlib/kmeans newbie question(s)

2015-03-07 Thread Pierce Lamb
Hi all, I'm very new to machine learning algorithms and Spark. I'm following the Twitter Streaming Language Classifier found here: http://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/README.html Specifically this code:
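The reference application clusters tweet feature vectors with streaming KMeans. For orientation, here is a minimal batch k-means on 2-D points showing the core loop (assign each point to its nearest centroid, then recompute centroids as cluster means); it is a teaching sketch in plain Python, not MLlib's implementation.

```python
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: bucket each point under its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if the cluster came up empty).
        centroids = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids

pts = [(0.0, 0.0), (0.2, 0.1), (9.0, 9.0), (9.1, 8.8)]
print(kmeans(pts, centroids=[(0.0, 0.0), (1.0, 1.0)]))
```

The streaming variant in the reference app does the same two steps per micro-batch, decaying old centroids toward the new batch means.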

Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni
Did you get this to work? I got past the issue where the cluster would not start, but now I am having a problem where distcp with an s3:// URI says "incorrect folder path" and s3n:// hangs. Stuck for 2 days :( Thanks -R

Re: Connection PHP application to Spark Sql thrift server

2015-03-07 Thread أنس الليثي
Sorry for the late reply. I have tried to connect to the Hive server instead of Spark SQL, but the same exception is thrown in the Hive server logs. The only difference is that the Hive log has a little more information than the Spark SQL logs. The Hive server logs have this message: TTransportException