Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014
Done. -- Olivier
Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014
2013/12/4 Josh Rosen rosenvi...@gmail.com:
Thanks for organizing this! I'll definitely be attending.

Great. Looking forward to meeting you too. Uri, you might want to register as well on the wiki :)

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014
2013/12/3 Horia ho...@alum.berkeley.edu:
I am very interested in this and will most definitely participate! Please share the event sign-up list and location details when all the organizational hurdles have been resolved :-)

Great! I just created a new entry for this sprint on the scikit-learn wiki: https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events#scikit-learn--pyspark-integration-sprint---friday-14-february-2014

Please feel free to register there.

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel
Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014
That should be fixed but only if nobody clicks on the previous URL... Use the following instead: https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events That's a weird github bug... -- Olivier
Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators
2013/11/27 Nick Pentreath nick.pentre...@gmail.com:
CC'ing Spark Dev list

I have been thinking about this for quite a while and would really love to see this happen. Most of my pipeline ends up in Scala/Spark these days - which I love, but it is partly because I am reliant on custom Hadoop input formats that are just way easier to use from Scala/Java - but I still use Python a lot for data analysis and interactive work. There is some good stuff happening with Breeze in Scala and MLlib in Spark (and IScala) but the breadth just doesn't compare as yet - not to mention IPython and plotting!

There is a PR that was just merged into PySpark to allow arbitrary serialization protocols between the Java and Python layers. I hope to try to use this to allow PySpark users to pull data from arbitrary Hadoop InputFormats with minimum fuss. This I believe will open the way for many (including me!) to use PySpark directly for virtually all distributed data processing without needing to use Java (https://github.com/apache/incubator-spark/pull/146) (http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201311.mbox/browser).

This is very interesting, thanks for the heads up.

-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel