Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Akhil Das
Is that all you have in the executor logs? I suspect some of those jobs are having a hard time managing the memory. Thanks Best Regards On Sun, Nov 1, 2015 at 9:38 PM, Romi Kuntsman wrote: > [adding dev list since it's probably a bug, but i'm not sure how to > reproduce so I

Re: Guidance to get started

2015-11-09 Thread Akhil Das
You can read the installation details here: http://spark.apache.org/docs/latest/ You can read about contributing to Spark here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Thanks Best Regards On Thu, Oct 29, 2015 at 3:53 PM, Aaska Shah

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-09 Thread Ricky
I have now tried spark-1.5.2-rc2.zip from GitHub, and the result still has errors. [root@ouyangshourui spark-1.5.2-rc2]# pwd /SparkCode/spark-1.5.2-rc2 [root@ouyangshourui spark-1.5.2-rc2]# nohup mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package &

Re: sample or takeSample or ??

2015-11-09 Thread Akhil Das
You can't create a new RDD by selecting a few elements. rdd.take(n), takeSample, etc. are actions, and they will trigger your entire pipeline to be executed. You can, however, do something like this, I guess: val sample_data = rdd.take(10) val sample_rdd = sc.parallelize(sample_data) Thanks Best
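The workaround above can be sketched as a small runnable program. This is only an illustration under assumptions: the object name SampleSketch, the helper names, and the local[2] master are made up for the example. It also shows rdd.sample, which (unlike take and takeSample) is a transformation and returns an RDD lazily, though with only an approximate result size.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SampleSketch {
  // take(n) is an action: it brings a local Array back to the driver,
  // and parallelize then redistributes that Array as a new RDD.
  def firstN(sc: SparkContext, rdd: RDD[Int], n: Int): RDD[Int] =
    sc.parallelize(rdd.take(n))

  // sample is a transformation: nothing runs until an action is called.
  // The number of sampled elements is approximate, not exact.
  def randomFraction(rdd: RDD[Int], fraction: Double, seed: Long): RDD[Int] =
    rdd.sample(withReplacement = false, fraction = fraction, seed = seed)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for trying the sketch out.
    val conf = new SparkConf().setAppName("sample-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000)
      println(firstN(sc, rdd, 10).count())
      println(randomFraction(rdd, 0.01, 42L).count())
    } finally {
      sc.stop()
    }
  }
}
```

If an exact sample size matters, takeSample plus parallelize is the closer fit; if staying lazy and distributed matters, sample avoids the round trip through the driver.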

Support for views/ virtual tables in SparkSQL

2015-11-09 Thread Sudhir Menon
Team: Do we plan to add support for views/virtual tables in SparkSQL anytime soon? Trying to run the TPC-H workload and failing on queries that assume support for views in the underlying database. Thanks in advance Suds

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-09 Thread Ricky
Thanks for your help. I did as you said; the problem was a firewall issue. After changing from http://maven.oschina.net to the Maven default repo (http://repo1.maven.org), the spark-streaming-mqtt_2.10 module compiled successfully. -- Best Regards --

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread tsh
Hi, I'm in the same position right now: we are going to implement something like OLAP BI + Machine Learning explorations on the same cluster. Well, the question is quite ambivalent: on the one hand, we have terabytes of versatile data and the necessity to make something like cubes (Hive and

Re: Sort Merge Join from the filesystem

2015-11-09 Thread Alex Nastetsky
Thanks for creating that ticket. Another thing I was thinking of, is doing this type of join between dataset A which is already partitioned/sorted on disk and dataset B, which gets generated during the run of the application. Dataset B would need something like repartitionAndSortWithinPartitions
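A minimal sketch of the preparation step mentioned for dataset B, using repartitionAndSortWithinPartitions with a HashPartitioner. The object name, the toy data, and the partition count are made up for illustration, and the sketch assumes dataset A was written with the same partitioner and partition count. How the sorted partitions would then be merged with the on-disk data is exactly the open question in this thread and is not shown.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortWithinPartitionsSketch {
  // Shuffles B by key and sorts each resulting partition by key in one pass.
  // Assumption: numPartitions matches the layout of the on-disk dataset A.
  def prepare(b: RDD[(Int, String)], numPartitions: Int): RDD[(Int, String)] =
    b.repartitionAndSortWithinPartitions(new HashPartitioner(numPartitions))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy stand-in for a dataset generated during the application run.
      val b = sc.parallelize(Seq(5 -> "e", 1 -> "a", 4 -> "d", 2 -> "b", 3 -> "c"))
      // glom exposes each partition as an Array so the per-partition order is visible.
      prepare(b, 2).glom().collect().foreach(p => println(p.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```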

Re: Block Transfer Service encryption support

2015-11-09 Thread turp1twin
I created a pull request for issue SPARK-6373. Any feedback would be appreciated... https://github.com/apache/spark/pull/9416 Jeff

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
If they have a problem managing memory, shouldn't there be an OOM? Why does AppClient throw an NPE? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Mon, Nov 9, 2015 at 4:59 PM, Akhil Das wrote: > Is that all you have in the executor logs? I

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Akhil Das
Did you find anything regarding the OOM in the executor logs? Thanks Best Regards On Mon, Nov 9, 2015 at 8:44 PM, Romi Kuntsman wrote: > If they have a problem managing memory, shouldn't there be an OOM? > Why does AppClient throw an NPE? > > *Romi Kuntsman*, *Big Data

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
I didn't see anything about an OOM. This sometimes happens before anything in the application has happened, and it happens to a few applications at the same time - so I guess it's a communication failure, but the problem is that the error shown doesn't represent the actual problem (which may be a network

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Tim Preece
Searching shows several people hit this same NPE in AppClient.scala line 160 (perhaps because appID was null - could the application have been stopped before it was registered?)

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Luke Han
Some friends referred me to this thread about OLAP/Kylin and Spark... Here's my 2 cents.. If you are trying to set up OLAP, Apache Kylin should be a good option for you to evaluate. The project has been in development for more than 2 years and is going to graduate to an Apache Top Level Project [1]. There are many

Re: Support for views/ virtual tables in SparkSQL

2015-11-09 Thread Zhan Zhang
I think you can rewrite those TPC-H queries without using views, for example with registerTempTable. Thanks. Zhan Zhang On Nov 9, 2015, at 9:34 PM, Sudhir Menon wrote: > Team: > > Do we plan to add support for views/ virtual tables in SparkSQL anytime soon? > Trying to run the TPC-H
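A rough sketch of the registerTempTable suggestion, written against the Spark 1.5-era SQLContext API (later releases deprecated registerTempTable in favour of createOrReplaceTempView). The table name lineitem, the trimmed-down columns, and the revenue query are illustrative assumptions, not the actual TPC-H schema or query text.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object TempTableSketch {
  // Hypothetical, trimmed-down stand-in for a TPC-H lineitem row.
  case class LineItem(suppkey: Int, extendedprice: Double, discount: Double)

  // Registers the query result under a name, in place of a CREATE VIEW statement.
  // Later queries can then refer to "revenue" as if it were a view.
  def registerRevenue(sqlContext: SQLContext): Unit = {
    val revenue = sqlContext.sql(
      """SELECT suppkey AS supplier_no,
        |       SUM(extendedprice * (1 - discount)) AS total_revenue
        |FROM lineitem
        |GROUP BY suppkey""".stripMargin)
    revenue.registerTempTable("revenue")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("temp-table-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._
      // Toy rows standing in for the real lineitem data.
      val rows = Seq(LineItem(1, 100.0, 0.1), LineItem(1, 50.0, 0.0), LineItem(2, 200.0, 0.5))
      sc.parallelize(rows).toDF().registerTempTable("lineitem")
      registerRevenue(sqlContext)
      sqlContext.sql("SELECT supplier_no, total_revenue FROM revenue").show()
    } finally {
      sc.stop()
    }
  }
}
```

A temporary table lives only as long as its SQLContext, so each application has to register it again on startup; that is the main difference from a persistent database view.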

Re: ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Sean Owen
Since it's a fairly expensive operation to build the Map, I tend to agree it should not happen in the loop. On Tue, Nov 10, 2015 at 5:08 AM, Yuming Wang wrote: > Hi > > > > I found org.apache.spark.ml.feature.Word2Vec.transform() very slow. > > I think we should not read
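The point about not building the Map inside the loop can be illustrated in plain Scala. This is a simplified sketch, not Spark's Word2Vec code: the object name, the helper names, and the flat-array model layout are assumptions made for the example, and the averaging step is deliberately minimal.

```scala
object HoistMapSketch {
  type Vec = Array[Double]

  // Hypothetical stand-in for the expensive step: turning flat model data
  // into a word -> vector lookup Map.
  def buildVectors(words: Array[String], flat: Array[Double], dim: Int): Map[String, Vec] =
    words.zipWithIndex.map { case (w, i) => w -> flat.slice(i * dim, (i + 1) * dim) }.toMap

  // Averages the vectors of the words in a sentence that the model knows.
  def average(sentence: Seq[String], vectors: Map[String, Vec], dim: Int): Vec = {
    val sum = new Array[Double](dim)
    val known = sentence.flatMap(w => vectors.get(w))
    known.foreach(v => for (j <- 0 until dim) sum(j) += v(j))
    if (known.isEmpty) sum else sum.map(_ / known.size)
  }

  // Slow shape: the Map is rebuilt for every sentence.
  def transformSlow(sentences: Seq[Seq[String]], words: Array[String],
                    flat: Array[Double], dim: Int): Seq[Vec] =
    sentences.map(s => average(s, buildVectors(words, flat, dim), dim))

  // Faster shape: the Map is built once and reused for every sentence.
  def transformFast(sentences: Seq[Seq[String]], words: Array[String],
                    flat: Array[Double], dim: Int): Seq[Vec] = {
    val vectors = buildVectors(words, flat, dim)
    sentences.map(s => average(s, vectors, dim))
  }

  def main(args: Array[String]): Unit = {
    val words = Array("spark", "scala")
    val flat = Array(1.0, 2.0, 3.0, 4.0)
    val sentences = Seq(Seq("spark", "scala"), Seq("scala", "unknown"))
    transformFast(sentences, words, flat, 2).foreach(v => println(v.mkString("[", ", ", "]")))
  }
}
```

Both functions return the same vectors; the only difference is how often the lookup Map is constructed, which is what dominates the cost when there are many sentences.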

Re: ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Nick Pentreath
Seems a straightforward change that purely enhances efficiency, so yes please submit a JIRA and PR for this On Tue, Nov 10, 2015 at 8:56 AM, Sean Owen wrote: > Since it's a fairly expensive operation to build the Map, I tend to agree > it should not happen in the loop. > >

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
Hi, In my experience, I would recommend option 3), using Apache Kylin, for your requirements. This is a suggestion based on the open-source world. As for the Cassandra point, I accept your advice about the special support. But the community is very open and convenient for

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
Hi, Have you ever considered Cassandra as a replacement? Our usage is now almost the same as your engine's, e.g. using MySQL to store initial aggregated data. Can you share more about your kind of Cube queries? We are very interested in that architecture too : ) Best, Sun. fightf...@163.com

ml.feature.Word2Vec.transform() very slow issue

2015-11-09 Thread Yuming Wang
Hi, I found org.apache.spark.ml.feature.Word2Vec.transform() very slow. I think we should not read the broadcast for every sentence, so I fixed it on my fork. https://github.com/979969786/spark/commit/a9f894df3671bb8df2f342de1820dab3185598f3 I have used 2 number rows to test it. Original version

[build system] shane OOO until monday, nov 16

2015-11-09 Thread shane knapp
i'll be at the USENIX LISA conference in DC, so josh and jon will be keeping an eye on jenkins and making sure it doesn't misbehave. since attending every session of every day will drive one insane, i will be sporadically checking in and making sure things are humming along... but for

RE: Sort Merge Join from the filesystem

2015-11-09 Thread Cheng, Hao
Yes, we definitely need to think about how to handle this case; it is probably even more common than the case where both tables are sorted/partitioned. Can you jump to the JIRA and leave a comment there? From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com] Sent: Tuesday, November 10, 2015 3:03 AM To: Cheng, Hao

Re: Anyone has perfect solution for spark source code compilation issue on intellij

2015-11-09 Thread Tim Preece
I've had success building with Maven (3.3.3) with: IntelliJ 14.1.5, Scala 2.10.4, OpenJDK 7 (1.7.0_79). What OS/platform are you on?

Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread Ted Yu
Please consider using a NoSQL engine such as HBase. Cheers > On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi wrote: > > Hi, > I'm also considering something similar, Spark plain is too slow for my case, > a possible solution is use Spark as Multiple Source connector and basic >