Re: Intellij IDEA 14 env setup; NoClassDefFoundError when run examples

2015-02-01 Thread Yafeng Guo
It finally works. @Sean, I'm trying to set up the environment in the IDE so I can step into Spark -- that will help me understand Spark's internal mechanisms. @Ted, thanks. I'm using Maven, not SBT, but thanks for the suggestion anyway. For others who might be interested: I chose the bigtop-dist profile so under

Re: renaming SchemaRDD - DataFrame

2015-02-01 Thread Evan Chan
It is true that you can persist SchemaRDDs / DataFrames to disk via Parquet, but a lot of time is lost to inefficiencies. The in-memory columnar cached representation is completely different from the Parquet file format, and I believe there has to be a translation into a Row (because ultimately

Re: Intellij IDEA 14 env setup; NoClassDefFoundError when run examples

2015-02-01 Thread Sean Owen
How do you mean you run LogQuery? You would run these using the run-example script rather than in IntelliJ. On Sun, Feb 1, 2015 at 4:01 AM, Yafeng Guo daniel.yafeng@gmail.com wrote: Hi, I'm setting up a dev environment with Intellij IDEA 14. I selected profile scala-2.10, maven-3, hadoop

Caching tables at column level

2015-02-01 Thread Mick Davies
I have been working a lot recently with denormalised tables with lots of columns, nearly 600. We are using this form to avoid joins. I have tried to use cache table with this data, but it proves too expensive as it seems to try to cache all the data in the table. For data sets such as the one I

Re: Any interest in 'weighting' VectorTransformer which does component-wise scaling?

2015-02-01 Thread Octavian Geagla
I've added support for sparse vectors and created HadamardTF for the pipeline. Please take a look at my branch: https://github.com/ogeagla/spark/compare/spark-mllib-weighting . Thanks!
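For readers unfamiliar with the operation discussed in this thread, component-wise scaling (the Hadamard product) multiplies each element of a vector by the weight at the same index. The following is a minimal plain-Python sketch of that idea for dense and sparse inputs. It is not the code on the branch above; the function names and the dict-based sparse representation are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def scale_dense(vector: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Multiply each component of a dense vector by the weight at the same index."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return [v * w for v, w in zip(vector, weights)]


def scale_sparse(entries: Dict[int, float], weights: Sequence[float]) -> Dict[int, float]:
    """Scale a sparse vector given as {index: value} (a hypothetical representation).

    Only the stored entries are touched, so the cost depends on the number
    of non-zero entries rather than on the full vector length.
    """
    return {i: v * weights[i] for i, v in entries.items()}
```

For example, `scale_sparse({0: 1.0, 2: 3.0}, [2.0, 0.5, 4.0])` visits only indices 0 and 2, which is the main reason sparse support matters for wide feature vectors.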

Re: Caching tables at column level

2015-02-01 Thread Michael Armbrust
It's not completely transparent, but you can do something like the following today: CACHE TABLE hotData AS SELECT columns, I, care, about FROM fullTable On Sun, Feb 1, 2015 at 3:03 AM, Mick Davies michael.belldav...@gmail.com wrote: I have been working a lot recently with denormalised tables
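The statement suggested in this reply can be assembled programmatically when the set of frequently used columns varies. The sketch below only builds the SQL string; the helper name `cache_columns_statement` and its identifier check are assumptions for illustration, not part of Spark.

```python
import re
from typing import Sequence

# Accept only plain identifiers so arbitrary text cannot be spliced into the SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cache_columns_statement(cache_name: str, columns: Sequence[str], source_table: str) -> str:
    """Build a CACHE TABLE ... AS SELECT statement for a subset of columns."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in [cache_name, source_table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return f"CACHE TABLE {cache_name} AS SELECT {', '.join(columns)} FROM {source_table}"
```

Assuming an existing `sqlContext` in a PySpark session, the result could be passed to `sqlContext.sql(...)`; queries would then need to read from the cached table (here `hotData`) rather than from `fullTable`, which is the non-transparent part mentioned in the reply.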

Re: Custom Cluster Managers / Standalone Recovery Mode in Spark

2015-02-01 Thread Aaron Davidson
For the specific question of supplementing Standalone Mode with a custom leader election protocol, this was actually already committed in master and will be available in Spark 1.3: https://github.com/apache/spark/pull/771/files You can specify spark.deploy.recoveryMode = CUSTOM and

Word2Vec IndexedRDD

2015-02-01 Thread Michael Malak
1. Is IndexedRDD planned for 1.3? https://issues.apache.org/jira/browse/SPARK-2365 2. Once IndexedRDD is in, is it planned to convert Word2VecModel to it from its current Map[String,Array[Float]]?

Re: Custom Cluster Managers / Standalone Recovery Mode in Spark

2015-02-01 Thread Anjana Fernando
Hi guys, it's great to hear that this will be available in Spark 1.3! I will play around with this feature and let you know the results of integrating Hazelcast. Also, may I know the tentative release date for Spark 1.3? Cheers, Anjana. On Mon, Feb 2, 2015 at 3:07 AM, Aaron Davidson