Re: Tachyon in Spark

2014-12-12 Thread Jun Feng Liu
I think lineage is the key feature of Tachyon for reproducing an RDD when errors happen. Otherwise, there would have to be data replicas among Tachyon nodes to ensure redundancy for fault tolerance - I think Tachyon is trying to avoid going down that path. Does it mean the off-heap solution
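The off-heap mode under discussion can be sketched in Spark 1.x as follows. This is a minimal illustration, not a recommended setup: the Tachyon master URL and the data are placeholders, and the key point is in the comment - an OFF_HEAP block is stored once in Tachyon and recomputed from lineage on loss, rather than replicated.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OffHeapSketch {
  def main(args: Array[String]): Unit = {
    // Point Spark 1.x at a Tachyon master (the URL here is a placeholder).
    val conf = new SparkConf()
      .setAppName("tachyon-off-heap-sketch")
      .set("spark.tachyonStore.url", "tachyon://localhost:19998")
    val sc = new SparkContext(conf)

    // OFF_HEAP keeps a single copy of each block in Tachyon; if a block is
    // lost, Spark recomputes it from the RDD's lineage instead of reading
    // a replica - which is the trade-off discussed in the thread.
    val data = sc.parallelize(1 to 1000000).map(_ * 2)
    data.persist(StorageLevel.OFF_HEAP)
    println(data.count())
    sc.stop()
  }
}
```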

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
reminder: jenkins is going down NOW. On Thu, Dec 11, 2014 at 3:08 PM, shane knapp skn...@berkeley.edu wrote: here's the plan... reboots, of course, come last. :) pause build queue at 7am, kill off (and eventually retrigger) any stragglers at 8am. then begin maintenance: all systems: *

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
downtime is extended to 10am PST so that i can finish testing the numpy upgrade... besides that, everything looks good and the system updates and reboots went off w/o a hitch. shane On Fri, Dec 12, 2014 at 7:26 AM, shane knapp skn...@berkeley.edu wrote: reminder: jenkins is going down NOW.

Re: jenkins downtime: 730-930am, 12/12/14

2014-12-12 Thread shane knapp
ok, we're back up w/all new jenkins workers. i'll be keeping an eye on these pretty closely today for any build failures caused by the new systems, and if things look bleak, i'll switch back to the original five. thanks for your patience! On Fri, Dec 12, 2014 at 8:47 AM, shane knapp

Re: zinc invocation examples

2014-12-12 Thread Patrick Wendell
Hey York - I'm sending some feedback off-list, feel free to open a PR as well. On Tue, Dec 9, 2014 at 12:05 PM, York, Brennon brennon.y...@capitalone.com wrote: Patrick, I've nearly completed a basic build out for the SPARK-4501 issue (at https://github.com/brennonyork/spark/tree/SPARK-4501)

CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Hi Xiangrui, It seems that it's stateless, so it will be hard to implement a regularization path. Any suggestion on how to extend it? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Newest ML-Lib on Spark 1.1

2014-12-12 Thread Ganelin, Ilya
Hi all – we’re running CDH 5.2 and would be interested in having the latest and greatest ML Lib version on our cluster (with YARN). Could anyone help me out in terms of figuring out what build profiles to use to get this to play well? Will I be able to update ML-Lib independently of updating

Re: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Debasish Das
For CDH this works well for me...tested till 5.1... ./make-distribution -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn -Phive -DskipTests To build with hive thriftserver support for spark-sql On Fri, Dec 12, 2014 at 1:41 PM, Ganelin, Ilya ilya.gane...@capitalone.com wrote: Hi all – we’re
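The build invocation from this message, laid out flag by flag. This is a sketch of the command as posted (plus -Phive-thriftserver for the Thrift server support mentioned); the Hadoop version string matches the poster's CDH 5.1 cluster and would need adjusting for other releases, and it assumes it is run from the root of a Spark 1.1-era source tree where the script is named make-distribution.sh.

```shell
# Build a Spark distribution against a CDH Hadoop.
# -Dhadoop.version must match the cluster; the profiles select the
# matching Hadoop build, YARN support, and Hive/Thrift-server support.
./make-distribution.sh \
  -Dhadoop.version=2.3.0-cdh5.1.0 \
  -Phadoop-2.3 \
  -Pyarn \
  -Phive -Phive-thriftserver \
  -DskipTests
```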

Re: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Sean Owen
Could you specify what problems you're seeing? There is nothing special about the CDH distribution at all. The latest and greatest is 1.1, and that is what is in CDH 5.2. You can certainly compile even master for CDH and get it to work, though. The safest build flags should be -Phadoop-2.4

IBM open-sources Spark Kernel

2014-12-12 Thread Robert C Senkbeil
We are happy to announce a developer preview of the Spark Kernel which enables remote applications to dynamically interact with Spark. You can think of the Spark Kernel as a remote Spark Shell that uses the IPython notebook interface to provide a common entrypoint for any application. The Spark

RE: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Ganelin, Ilya
Hi Sean - I should clarify: I was able to build the master, but when running I hit random-looking protobuf errors (just starting up a spark shell). I can try doing a build later today and give the exact stack trace. I know that 5.2 is running 1.1, but I believe the latest and greatest Ml

Re: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Sean Owen
What errors do you see? protobuf errors usually mean you didn't build for the right version of Hadoop, but if you are using -Phadoop-2.3 or better -Phadoop-2.4 that should be fine. Yes, a stack trace would be good. I'm still not sure what error you are seeing. On Fri, Dec 12, 2014 at 10:32 PM,

Re: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Debasish Das
protobuf errors come from a missing -Phadoop-2.3 On Fri, Dec 12, 2014 at 2:34 PM, Sean Owen so...@cloudera.com wrote: What errors do you see? protobuf errors usually mean you didn't build for the right version of Hadoop, but if you are using -Phadoop-2.3 or better -Phadoop-2.4 that should be fine.

Re: CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overridden to implement a regularization path. Correct me if I'm wrong. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn:
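The warm-start idea behind overriding the multi-ParamMap fit can be sketched in plain Scala. This is not the spark.ml API itself - RegPathSketch and its toy train step are hypothetical stand-ins - it only illustrates why a single override over the whole paramMaps array helps: models are trained from strongest to weakest regularization, each reusing the previous solution as its starting point.

```scala
object RegPathSketch {
  // Hypothetical stand-in for one model fit: shrink the starting weights
  // by the regularization strength (NOT a real optimizer, just enough to
  // show the previous solution being reused).
  def train(init: Array[Double], lambda: Double): Array[Double] =
    init.map(w => w * (1.0 - lambda))

  def main(args: Array[String]): Unit = {
    // Strongest regularization first, as a regularization path would do.
    val lambdas = Seq(1.0, 0.1, 0.01).sorted(Ordering[Double].reverse)
    var weights = Array.fill(3)(1.0)
    val models = lambdas.map { lambda =>
      weights = train(weights, lambda) // warm start from the previous model
      (lambda, weights.clone())
    }
    models.foreach { case (l, w) =>
      println(s"lambda=$l weights=${w.mkString(",")}")
    }
  }
}
```

A stateless per-ParamMap fit would instead restart from scratch for every lambda, which is the limitation raised earlier in the thread.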

Re: IBM open-sources Spark Kernel

2014-12-12 Thread Robert C Senkbeil
Hi Sam, We developed the Spark Kernel with a focus on the newest version of the IPython message protocol (5.0) for the upcoming IPython 3.0 release. We are building around Apache Spark's REPL, which is used in the current Spark Shell implementation. The Spark Kernel was designed to be

one hot encoding

2014-12-12 Thread Lochana Menikarachchi
Do we have one-hot encoding in Spark MLlib 1.1.1 or 1.2.0? It wasn't available in 1.1.0. Thanks.
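Absent a built-in transformer, the encoding itself is small enough to hand-roll. A minimal plain-Scala sketch (no MLlib dependency; OneHotSketch and oneHot are illustrative names, not library API): assign each distinct category an index, then emit an indicator vector per category.

```scala
object OneHotSketch {
  // Map each categorical value to a dense 0/1 indicator vector whose
  // length is the number of distinct categories.
  def oneHot(categories: Seq[String]): Map[String, Array[Double]] = {
    val index = categories.distinct.zipWithIndex.toMap
    index.map { case (cat, i) =>
      val v = Array.fill(index.size)(0.0)
      v(i) = 1.0
      cat -> v
    }
  }

  def main(args: Array[String]): Unit = {
    val enc = oneHot(Seq("red", "green", "blue"))
    println(enc("green").mkString(","))
  }
}
```

For large cardinalities one would emit sparse vectors instead of dense arrays, but the index-then-indicator structure is the same.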

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-12 Thread Josh Rosen
+1.  Tested using spark-perf and the Spark EC2 scripts.  I didn’t notice any performance regressions that could not be attributed to changes of default configurations.  To be more specific, when running Spark 1.2.0 with the Spark 1.1.0 settings of spark.shuffle.manager=hash and
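For context on the configuration comparison above: Spark 1.2.0 changed the default shuffle implementation from hash to sort, so reproducing 1.1.0 behavior for the test meant pinning the old value. A sketch of the relevant spark-defaults.conf line (only the setting explicitly named in the message; the other 1.1.0 settings referenced are truncated in this digest):

```
# Run Spark 1.2.0 with the 1.1.0 shuffle default (1.2 switched to sort).
spark.shuffle.manager   hash
```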

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-12 Thread Mark Hamstra
+1 On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote: +1. Tested using spark-perf and the Spark EC2 scripts. I didn’t notice any performance regressions that could not be attributed to changes of default configurations. To be more specific, when running Spark 1.2.0

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-12 Thread Denny Lee
+1 Tested on OSX Tested Scala 2.10.3, SparkSQL with Hive 0.12 / Hadoop 2.5, Thrift Server, MLLib SVD On Fri Dec 12 2014 at 8:57:16 PM Mark Hamstra m...@clearstorydata.com wrote: +1 On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote: +1. Tested using spark-perf and