Re: Hanging tasks in spark 1.2.1 while working with 1.1.1

2015-03-17 Thread Dmitriy Lyubimov
FWIW, observed similar behavior in a similar situation. Was able to work around it by forcefully committing one of the RDDs into cache right before the union, and forcing that by executing take(1). Nothing else ever helped. Seems like a yet-undiscovered 1.2.x thing. On Tue, Mar 17, 2015 at 4:21 PM,
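A minimal sketch of that workaround, assuming a live SparkContext sc; the paths and RDD names are placeholders, not from the thread:

    // cache one side of the union, then force evaluation with take(1)
    // so its cached partitions are materialized before the union runs
    val a = sc.textFile("hdfs:///path/a").cache()
    a.take(1) // triggers computation; per the thread, nothing else helped
    val unioned = a.union(sc.textFile("hdfs:///path/b"))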

Task result deserialization error (1.1.0)

2015-01-20 Thread Dmitriy Lyubimov
Hi, I am getting a task result deserialization error (Kryo is enabled). Is it some sort of `chill` registration issue at the front end? This is an application that lists Spark as a Maven dependency (so it gets the correct Hadoop and chill dependencies on the classpath, I checked). Thanks in advance. 15/01/20
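For context, a typical Kryo setup in a Spark 1.1.x application looks roughly like this; the registrator class is a hypothetical stand-in, since the snippet doesn't show which class failed to deserialize:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // classes carried in task results must deserialize with Kryo at the
      // driver, so custom types usually need a registrator like this one
      .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")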

Re: Upgrade to Spark 1.1.0?

2014-10-20 Thread Dmitriy Lyubimov
The Mahout context does not include _all_ possible transitive dependencies. It would not be lightning fast if it took all the legacy etc. dependencies. There's an ignored unit test that asserts context path correctness; you can unignore it and run it to verify it still works as expected. The reason it is set to
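In ScalaTest terms, un-ignoring such a test is just flipping the marker (a sketch; the suite and test names here are invented, not Mahout's actual ones):

    import org.scalatest.FunSuite

    class ContextPathSuite extends FunSuite {
      // was: ignore("context path is correct") { ... } -- rename `ignore` to `test`
      test("context path is correct") {
        // placeholder body; the real test asserts the context classpath contents
        assert(true)
      }
    }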

Re: Spark QL and protobuf schema

2014-08-21 Thread Dmitriy Lyubimov
the applySchema method on SQLContext. Would be great if you could contribute this back. On Wed, Aug 20, 2014 at 5:57 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hello, is there any known work to adapt protobuf schema to Spark QL data sourcing? If not, would it present interest
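A rough sketch of that approach against the Spark 1.1 API, assuming a live SparkContext sc; the Person type is an invented stand-in for a parsed protobuf message:

    import org.apache.spark.sql._

    val sqlContext = new SQLContext(sc)
    // hand-derive a Spark SQL schema from the protobuf descriptor
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = true)))
    // stand-in for parsed protobuf messages; a real app would use Person.parseFrom(...)
    case class Person(name: String, age: Int)
    val protos = sc.parallelize(Seq(Person("alice", 30)))
    // map each message to a Row matching the schema, then apply it
    val rows = protos.map(p => Row(p.name, p.age))
    val schemaRDD = sqlContext.applySchema(rows, schema)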

Re: MLLib : Math on Vector and Matrix

2014-07-03 Thread Dmitriy Lyubimov
On Wed, Jul 2, 2014 at 11:40 PM, Xiangrui Meng men...@gmail.com wrote: Hi Dmitriy, It is sweet to have the bindings, but it is very easy to downgrade the performance with them. The BLAS/LAPACK APIs have been there for more than 20 years and they are still the top choice for high-performance

Re: MLLib : Math on Vector and Matrix

2014-07-02 Thread Dmitriy Lyubimov
In my humble opinion, Spark should've supported linalg a la [1] before it even started dumping methodologies into MLlib. [1] http://mahout.apache.org/users/sparkbindings/home.html On Wed, Jul 2, 2014 at 2:16 PM, Thunder Stumpges thunder.stump...@gmail.com wrote: Thanks. I always hate having

Re: Why Scala?

2014-05-29 Thread Dmitriy Lyubimov
There were a few known concerns about Scala, and some still remain, but having been doing Scala professionally for over two years now, I have learned to master and appreciate the advantages. The major concern IMO is Scala in a less-than-scrupulous corporate environment. First, Scala requires significantly more

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
PS A Spark shell with all the proper imports is also supported natively in Mahout (the mahout spark-shell command). See M-1489 for specifics. There's also a tutorial somewhere, but I suspect it has not been finished/published via a public link yet. Again, you need trunk to use the Spark shell there. On Wed,
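Usage is roughly along these lines (a sketch; install paths are placeholders, and the exact environment setup is in the Mahout spark-bindings docs):

    $ export MAHOUT_HOME=/path/to/mahout   # built from trunk
    $ export SPARK_HOME=/path/to/spark
    $ $MAHOUT_HOME/bin/mahout spark-shell  # Scala REPL with the Mahout DRM imports preloaded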

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
PPS The shell/Spark tutorial I've mentioned is actually being developed in MAHOUT-1542. As it stands, I believe it is now complete at its core. On Wed, May 14, 2014 at 5:48 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: PS spark shell with all proper imports are also supported natively

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
Mahout now supports doing its distributed linalg natively on Spark, so the problem of loading sequence file input into Spark is already solved there (trunk, http://mahout.apache.org/users/sparkbindings/home.html, the drmFromHDFS() call) -- and then you can access the underlying RDD directly via the matrix's rdd property
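A sketch of that load path (package and call names as in the spark-bindings docs of that era; the master URL and HDFS path are placeholders):

    import org.apache.mahout.sparkbindings._

    // implicit Mahout distributed context wrapping a Spark context
    implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "drm-load")
    // read a DRM (distributed row matrix) from a sequence file of vectors
    val drm = drmFromHDFS("hdfs:///path/to/vectors.seq")
    // escape hatch: the Spark RDD sitting behind the matrix
    val rdd = drm.rdd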

Re: Spark - ready for prime time?

2014-04-10 Thread Dmitriy Lyubimov
On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash and...@andrewash.com wrote: The biggest issue I've come across is that the cluster is somewhat unstable when under memory pressure. Meaning that if you attempt to persist an RDD that's too big for memory, even with MEMORY_AND_DISK, you'll often
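For reference, the persistence mode under discussion, assuming a live SparkContext sc and a placeholder input path (in principle this level should spill to disk rather than destabilize anything):

    import org.apache.spark.storage.StorageLevel

    // partitions that don't fit in memory are supposed to spill to disk;
    // the report above is that clusters still wobble under memory pressure
    val big = sc.textFile("hdfs:///path/to/big-input")
    big.persist(StorageLevel.MEMORY_AND_DISK)
    big.count() // materializes the persisted RDD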

Re: Multi master Spark

2014-04-09 Thread Dmitriy Lyubimov
The only way I know to do this is to use Mesos with ZooKeeper. You specify a ZooKeeper URL as the Spark master URL, one that contains multiple ZooKeeper hosts. A Mesos master is then elected through ZooKeeper leader election and serves until the current leader dies, at which point Mesos will elect another master (if still
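Concretely, the master URL points at the ZooKeeper ensemble rather than a single Mesos host (host names are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // whichever Mesos master currently holds ZooKeeper leadership serves the job
    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")
      .setAppName("ha-on-mesos")
    val sc = new SparkContext(conf)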