Introducing spark-sklearn, a scikit-learn integration package for Spark

2016-02-10 Thread Tim Hunter
Hello community, Joseph and I would like to introduce a new Spark package that should be useful for Python users who depend on scikit-learn. Among other things, it lets you: - train and evaluate multiple scikit-learn models in parallel - convert Spark's DataFrames seamlessly into numpy arrays -
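
A minimal sketch of the parallel model-training use case, assuming spark-sklearn exposes a GridSearchCV that mirrors scikit-learn's but takes a SparkContext (here an existing `sc`, as in the pyspark shell) so the parameter combinations are trained across the cluster:

    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from spark_sklearn import GridSearchCV  # intended as a drop-in for sklearn's GridSearchCV

    digits = datasets.load_digits()
    X, y = digits.data, digits.target

    # Each parameter combination is fit as a separate Spark task.
    param_grid = {"max_depth": [3, None], "n_estimators": [10, 40]}
    gs = GridSearchCV(sc, RandomForestClassifier(), param_grid=param_grid)
    gs.fit(X, y)
    print(gs.best_params_)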

Re: [build system] brief downtime, 8am PST thursday feb 10th

2016-02-10 Thread shane knapp
reminder: this is happening tomorrow morning. On Mon, Feb 8, 2016 at 9:27 AM, shane knapp wrote: > happy monday! > > i will be bringing down jenkins and the workers thursday morning to > upgrade docker on all of the workers from 1.5.0-1 to 1.7.1-2. > > as of december last

Re: map-side-combine in Spark SQL

2016-02-10 Thread Reynold Xin
I'm not 100% sure I understand your question, but yes, Spark (both the RDD API and SQL/DataFrame) does partial aggregation. On Tue, Feb 9, 2016 at 8:37 PM, Rishitesh Mishra wrote: > Can anybody confirm whether ANY operator in Spark SQL uses > map-side-combine? If
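
As a hedged illustration of both points (standard PySpark API, with `sc` an existing SparkContext): reduceByKey combines values within each partition before the shuffle, and for DataFrames the physical plan printed by explain() shows a partial aggregate running before the exchange, which is the same map-side-combine idea.

    from pyspark.sql import SQLContext, functions as F

    # RDD API: reduceByKey combines per-partition before shuffling.
    rdd = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
    print(rdd.reduceByKey(lambda x, y: x + y).collect())

    # DataFrame API: the physical plan contains a partial aggregation
    # step that runs before the shuffle (exchange).
    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame(rdd, ["k", "v"])
    df.groupBy("k").agg(F.sum("v")).explain()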

Re: Spark Job on YARN accessing Hbase Table

2016-02-10 Thread Prabhu Joseph
Yes Ted, spark.executor.extraClassPath will work if the HBase client jars are present on all Spark Worker / NodeManager machines. spark.yarn.dist.files is the easier way, as the HBase client jars can be copied from the driver machine or HDFS into the container / Spark executor classpath automatically. No need to
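
A minimal sketch of the two approaches being compared; the jar names and paths below are placeholders, and the actual submission mode would normally be chosen via spark-submit:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("hbase-access")

    # Option 1: the HBase client jars are already installed on every
    # Spark Worker / NodeManager machine at a known, fixed path.
    # conf.set("spark.executor.extraClassPath", "/opt/hbase/lib/*")

    # Option 2: ship the jars from the driver machine (or HDFS); YARN
    # localizes them into each container's working directory, so the
    # classpath entries below are relative to that directory.
    conf.set("spark.yarn.dist.files",
             "/home/user/hbase-client-1.0.0.jar,/home/user/hbase-common-1.0.0.jar")
    conf.set("spark.executor.extraClassPath",
             "hbase-client-1.0.0.jar:hbase-common-1.0.0.jar")

    sc = SparkContext(conf=conf)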

Re: Spark Job on YARN accessing Hbase Table

2016-02-10 Thread Prabhu Joseph
+ Spark-Dev For a Spark job on YARN accessing an HBase table, I added all the HBase client jars to spark.yarn.dist.files. When the NodeManager launches a container, i.e. an executor, it does localization and brings all the hbase-client jars into the executor's CWD, but the executor tasks still fail with ClassNotFoundException
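
A small sketch of how one might confirm, from inside a task, that the localized jars actually reached the container's working directory (jar names are placeholders, and this assumes the Python worker's working directory is the YARN container directory); whether those jars also end up on the executor's JVM classpath is a separate question, controlled by settings such as spark.executor.extraClassPath:

    import glob
    import os

    def list_container_dir(_):
        # Runs on an executor; lists any jars localized into the
        # working directory of the YARN container.
        return [(os.getcwd(), sorted(glob.glob("*.jar")))]

    # A single partition is enough to sample one executor.
    print(sc.parallelize([0], 1).mapPartitions(list_container_dir).collect())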