About the Spark interactive shell

2014-05-12 Thread fengshen
I emailed the user list, but nobody replied, so I am emailing this list in the hope of a reply. I am now using Spark in production, and I notice the Spark driver includes the RDD and DAG... and the executors will try to register with the driver. But in my company the executors do not register with the client because

Any ideas on SPARK-1021?

2014-05-12 Thread Mark Hamstra
I'm trying to decide whether attacking the underlying issue of RangePartitioner running eager jobs in rangeBounds (i.e. SPARK-1021) is a better option than a messy workaround for some async job-handling stuff that I am working on. It looks like there have been a couple of aborted attempts to
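
For anyone unfamiliar with SPARK-1021, a minimal sketch of the eager-job behavior (assuming a spark-shell session where sc is in scope; names are mine):

    import scala.util.Random

    // sortByKey builds a RangePartitioner, whose rangeBounds samples the
    // RDD to choose partition boundaries -- so a job launches here, at
    // transformation time, before any action is called on `sorted`.
    val pairs = sc.parallelize(1 to 1000000).map(i => (Random.nextInt(), i))
    val sorted = pairs.sortByKey()  // the eager sampling job runs already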

Re: LabeledPoint dump LibSVM if SparseVector

2014-05-12 Thread Xiangrui Meng
Hi Deb, There is a saveAsLibSVMFile in MLUtils now. Also, I submitted a PR for standardizing the text format of vectors and labeled points: https://github.com/apache/spark/pull/685 Best, Xiangrui On Sun, May 11, 2014 at 9:40 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, I need to change
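
For reference, a minimal sketch of the round trip through MLUtils (paths are hypothetical, API as of the 1.0 branch):

    import org.apache.spark.mllib.util.MLUtils

    // load LabeledPoints (dense or sparse) and write them back out
    // in LibSVM text format
    val points = MLUtils.loadLibSVMFile(sc, "data/sample.libsvm")
    MLUtils.saveAsLibSVMFile(points, "out/points-libsvm")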

Re: mllib vector templates

2014-05-12 Thread Debasish Das
Hi, I see ALS is still using Array[Int], but for the other mllib algorithms we moved to Vector[Double] so that they can support either dense or sparse formats... I know ALS can stay with Array[Int] due to the Netflix format for input datasets, which is well defined, but it would help if we move ALS to
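
For context, ALS consumes integer-indexed triples rather than mllib Vectors -- a rough sketch of its current input format (file name and parameters are placeholders):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // ALS takes (user, product, rating) triples, not Vector[Double]
    // rows like the other mllib algorithms under discussion
    val ratings = sc.textFile("ratings.csv").map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }
    val model = ALS.train(ratings, 10, 10, 0.01)  // rank, iterations, lambda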

Re: Bug in KryoSerializer under Mesos [work-around included]

2014-05-12 Thread Matei Zaharia
Hey Soren, are you sure that the JAR you used on the executors is for the right version of Spark? Maybe they’re running an older version. The Kryo serializer should be initialized the same way on both. Matei On May 12, 2014, at 10:39 AM, Soren Macbeth so...@yieldbot.com wrote: I finally
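
A minimal sketch of pinning the executor-side build on Mesos, which would rule out the suspected mismatch (master URL and tarball path are placeholders):

    import org.apache.spark.SparkConf

    // On Mesos, executors fetch the Spark distribution named by
    // spark.executor.uri; pointing it at the exact build the driver
    // uses keeps both sides on the same version of Spark and Kryo.
    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181/mesos")
      .set("spark.executor.uri", "hdfs:///deps/spark-0.9.1-bin.tgz")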

Re: Spark on Scala 2.11

2014-05-12 Thread Anand Avati
Matei, Thanks for confirming. I was looking specifically at the REPL part and how it can be significantly simplified with Scala 2.11, without having to inherit a full copy of a refactored repl inside Spark. I am happy to investigate/contribute a simpler 2.11-based REPL if this were seen as a

Kryo not default?

2014-05-12 Thread Anand Avati
Hi, Can someone share the reason why Kryo serializer is not the default? Is there anything to be careful about (because of which it is not enabled by default)? Thanks!

Re: [EC2] r3 instance type

2014-05-12 Thread Shivaram Venkataraman
I ran into this a couple of days back as well. Yes, we need to check if /dev/xvdb is formatted and if not create xfs or some such filesystem on it. We will need to change the deployment script and you can do that (similar to EBS volumes) at https://github.com/mesos/spark-ec2/blob/v2/setup-slave.sh

Bug in KryoSerializer under Mesos [work-around included]

2014-05-12 Thread Soren Macbeth
I finally managed to track down the source of the kryo issues that I was having under mesos. What happens is that, for a reason I haven't tracked down yet, a handful of the scala collection classes from chill-scala don't get registered by the mesos executors, but they do all get registered in
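
The actual work-around is cut off in this preview; as a sketch only, forcing consistent registration on driver and executors generally takes the shape of an explicit registrator like the one below (class name and registrations are illustrative, not Soren's fix):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    // Register the affected Scala collection classes on every JVM, so
    // the driver and the Mesos executors agree on Kryo's class table.
    class ExplicitCollectionsRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[scala.collection.immutable.::[_]])
        kryo.register(Nil.getClass)
        kryo.register(classOf[scala.collection.mutable.ArrayBuffer[_]])
      }
    }
    // enabled via spark.kryo.registrator=<fully qualified class name>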

Re: Spark on Scala 2.11

2014-05-12 Thread Jacek Laskowski
On Sun, May 11, 2014 at 11:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: We do want to support it eventually, possibly as early as Spark 1.1 (which we’d cross-build on Scala 2.10 and 2.11). If someone wants to look at it before, feel free to do so! Scala 2.11 is very close to 2.10 so I

Re: Kryo not default?

2014-05-12 Thread Matei Zaharia
It was just because it might not work with some user data types that are Serializable. But we should investigate it, as it’s the easiest thing one can enable to improve performance. Matei On May 12, 2014, at 2:47 PM, Anand Avati av...@gluster.org wrote: Hi, Can someone share the reason why
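
For anyone following along, enabling it is a two-line configuration -- a sketch (the registrator class is hypothetical):

    import org.apache.spark.SparkConf

    // Kryo is opt-in: switch the serializer and, optionally, register
    // the application's classes for more compact output.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.example.MyRegistrator")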

Re: Kryo not default?

2014-05-12 Thread Andrew Ash
As an example of where it sometimes doesn't work, in older versions of Kryo / Chill the Joda LocalDate class didn't serialize properly -- https://groups.google.com/forum/#!topic/cascalog-user/35cdnNIamKU On Mon, May 12, 2014 at 4:39 PM, Reynold Xin r...@databricks.com wrote: The main reason is
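
When a class misbehaves like that, the usual escape hatch is a hand-written Kryo serializer registered for it -- a sketch assuming the Kryo 2.x and Joda-Time 2.x APIs:

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import org.joda.time.LocalDate

    // Serialize LocalDate as its ISO-8601 string rather than relying on
    // Kryo's default field-by-field strategy, which has broken on it.
    class LocalDateSerializer extends Serializer[LocalDate] {
      override def write(kryo: Kryo, out: Output, date: LocalDate): Unit =
        out.writeString(date.toString)
      override def read(kryo: Kryo, in: Input, cls: Class[LocalDate]): LocalDate =
        LocalDate.parse(in.readString())
    }
    // registered via: kryo.register(classOf[LocalDate], new LocalDateSerializer)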

Re: Spark on Scala 2.11

2014-05-12 Thread Jacek Laskowski
Thanks a lot! Jacek On Tue, May 13, 2014 at 1:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Anyone can actually open a JIRA on https://issues.apache.org/jira/browse/SPARK. I’ve created one for this now: https://issues.apache.org/jira/browse/SPARK-1812. Matei On May 12, 2014, at

Re: Spark on Scala 2.11

2014-05-12 Thread Anand Avati
On Mon, May 12, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.comwrote: We can build the REPL separately for each version of Scala, or even give that package a different name in Scala 2.11. OK. Scala 2.11’s REPL actually added two flags, -Yrepl-class-based and -Yrepl-outdir, that
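
For the curious, those two flags can be exercised directly through the 2.11 interpreter API -- a rough sketch (the output directory is a placeholder):

    import scala.tools.nsc.Settings
    import scala.tools.nsc.interpreter.IMain

    // -Yrepl-class-based wraps each REPL line in a class rather than an
    // object (which matters when shipping closures to executors), and
    // -Yrepl-outdir writes the generated classfiles to a real directory.
    val settings = new Settings
    settings.usejavacp.value = true
    settings.processArguments(
      List("-Yrepl-class-based", "-Yrepl-outdir", "/tmp/repl-classes"),
      processAll = true)
    val repl = new IMain(settings)
    repl.interpret("val x = 1 + 1")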

Preliminary Parquet numbers and including .count() in Catalyst

2014-05-12 Thread Andrew Ash
Hi Spark devs, First of all, huge congrats on the parquet integration with SparkSQL! This is an incredible direction forward and something I can see being very broadly useful. I was doing some preliminary tests to see how it works with one of my workflows, and wanted to share some numbers that
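
A minimal sketch of the kind of test being described, using the 1.0-era SQLContext API (schema and paths are hypothetical):

    import org.apache.spark.sql.SQLContext

    case class Event(id: Int, payload: String)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD

    // write a SchemaRDD out as Parquet, read it back, and count() --
    // the count is the operation whose Catalyst handling is at issue
    val events = sc.parallelize(1 to 1000).map(i => Event(i, "e" + i))
    events.saveAsParquetFile("events.parquet")
    sqlContext.parquetFile("events.parquet").count()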