Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-12 Thread Olivier Grisel
Done.

-- 
Olivier


Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-05 Thread Olivier Grisel
2013/12/4 Josh Rosen rosenvi...@gmail.com:
 Thanks for organizing this!  I'll definitely be attending.

Great. Looking forward to meeting you too.

Uri, you might want to register as well on the wiki :)

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-04 Thread Olivier Grisel
2013/12/3 Horia ho...@alum.berkeley.edu:
 I am very interested in this and will most definitely participate!

 Please share the event sign-up list and location details when all the
 organizational hurdles have been resolved :-)

Great! I just created a new entry for this sprint on the scikit-learn wiki:

  
https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#scikit-learn--pyspark-integration-sprint---friday-14-february-2014

Please feel free to register there.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel


Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-04 Thread Olivier Grisel
That should be fixed, but only if nobody clicks on the previous URL...
Use the following instead:

https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events

That's a weird GitHub bug...

-- 
Olivier


Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators

2013-11-27 Thread Olivier Grisel
2013/11/27 Nick Pentreath nick.pentre...@gmail.com:
 CC'ing Spark Dev list

 I have been thinking about this for quite a while and would really love to
 see this happen.

 Most of my pipeline ends up in Scala/Spark these days - which I love, but it
 is partly because I am reliant on custom Hadoop input formats that are just
 way easier to use from Scala/Java - but I still use Python a lot for data
 analysis and interactive work. There is some good stuff happening with
 Breeze in Scala and MLlib in Spark (and IScala) but the breadth just doesn't
 compare as yet - not to mention IPython and plotting!

 There is a PR that was just merged into PySpark to allow arbitrary
 serialization protocols between the Java and Python layers. I hope to try to
 use this to allow PySpark users to pull data from arbitrary Hadoop
 InputFormats with minimum fuss. This I believe will open the way for many
 (including me!) to use PySpark directly for virtually all distributed data
 processing without needing to use Java
 (https://github.com/apache/incubator-spark/pull/146)
 (http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201311.mbox/browser).

This is very interesting, thanks for the heads up.


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel