Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Hi, all: I want to run spark sql on yarn(yarn-client), but ... I already set spark.yarn.jar and spark.jars in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors 100 game.txt Exception in thread main java.lang.NoClassDefFoundError:
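The properties mentioned above live in `conf/spark-defaults.conf`. A minimal illustrative fragment is below; the jar paths are hypothetical placeholders, not values from the original message:

```
# conf/spark-defaults.conf -- paths are illustrative assumptions
spark.yarn.jar    hdfs:///user/spark/share/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
spark.jars        /opt/app/lib/extra-udfs.jar
```

If `spark.yarn.jar` points at an assembly that does not actually contain the YARN classes, the ApplicationMaster side can fail with exactly the `NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher` reported here.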

Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Yin Huai
Is it the full stack trace? On Thu, Jun 18, 2015 at 6:39 AM, Sea 261810...@qq.com wrote: Hi, all: I want to run spark sql on yarn(yarn-client), but ... I already set spark.yarn.jar and spark.jars in conf/spark-defaults.conf. ./bin/spark-sql -f game.sql --executor-memory 2g --num-executors

Re: Sidebar: issues targeted for 1.4.0

2015-06-18 Thread Nicholas Chammas
Given fixed time, adding more TODOs generally means other stuff has to be taken out for the release. If not, then it happens de facto anyway, which is worse than managing it on purpose. +1 to this. I wouldn't mind helping go through open issues on JIRA targeted for the next release around RC

Re: Spark-sql(yarn-client) java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/ExecutorLauncher

2015-06-18 Thread Sea
Thanks, Yin Huai, I worked it out. I used JDK 1.7 to build Spark 1.4.0, but my YARN cluster runs on JDK 1.6. What confused me is that java.version in pom.xml is 1.6, yet the exception still occurred. -- Original message -- From: Yin Huai; yh...@databricks.com; Sent: 2015-06-18 (Thu)
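A build-JDK vs. runtime-JDK mismatch like the one above can be checked mechanically: every `.class` file records the major version of the class-file format it was compiled for (50 for Java 6, 51 for Java 7). A small sketch, not from the original thread, that reads that header:

```python
import struct

# Partial map from class-file major version to JDK release.
MAJOR_TO_JDK = {50: "Java 6", 51: "Java 7", 52: "Java 8"}

def class_major_version(class_bytes):
    """Return the major version from a .class file header.

    The header is: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major.
    """
    magic, minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major
```

Running this over a class extracted from the Spark assembly jar shows immediately whether it was compiled for a newer JDK than the cluster runs.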

Re: [mllib] Refactoring some spark.mllib model classes in Python not inheriting JavaModelWrapper

2015-06-18 Thread Xiangrui Meng
Hi Yu, Reducing the code complexity on the Python side is certainly what we want to see:) We didn't call Java directly in Python models because Java methods don't work inside RDD closures, e.g., rdd.map(lambda x: model.predict(x[1])) But I agree that for model save/load the implementation
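The reason Java-backed model objects cannot be used inside RDD closures is that the closure must be pickled and shipped to executors, and a Py4J gateway handle is not picklable. A toy illustration (class names are mine; `threading.Lock` merely stands in for the unpicklable JVM handle a `JavaModelWrapper` would hold):

```python
import pickle
import threading

class JavaBackedModel:
    """Stand-in for a model holding a JVM handle (like JavaModelWrapper).
    threading.Lock is used only as an example of an unpicklable object."""
    def __init__(self):
        self._java_handle = threading.Lock()

class PureWeightsModel:
    """Model holding plain Python data, safe to ship into RDD closures."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, x):
        # Dot product of the stored weights with the feature vector.
        return sum(w * v for w, v in zip(self.weights, x))
```

Pickling `PureWeightsModel` succeeds, so `rdd.map(lambda x: model.predict(x[1]))` works; pickling `JavaBackedModel` raises `TypeError`, which is why prediction stays in pure Python while only save/load goes through the JVM.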

Increase partition count (repartition) without shuffle

2015-06-18 Thread Ulanov, Alexander
Hi, Is there a way to increase the number of partitions of an RDD without causing a shuffle? I've found JIRA issue https://issues.apache.org/jira/browse/SPARK-5997, but there is no implementation yet. Just in case: I am reading data from ~300 big binary files, which results in 300 partitions,

Re: Random Forest driver memory

2015-06-18 Thread Joseph Bradley
Hi Isca, Could you please give more details? Data size, model parameters, stack traces / logs, etc., to help get a better picture? Thanks, Joseph On Wed, Jun 17, 2015 at 9:56 AM, Isca Harmatz pop1...@gmail.com wrote: hello, does anyone have any help on this issue? Isca On Tue, Jun 16,

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Sandy Ryza
Hi Alexander, There is currently no way to create an RDD with more partitions than its parent RDD without causing a shuffle. However, if the files are splittable, you can set the Hadoop configurations that control split size to something smaller so that the HadoopRDD ends up with more
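The split-size configurations Sandy refers to can be passed through Spark's `spark.hadoop.*` prefix. A hedged sketch (the 32 MB value is an example, and which key applies depends on whether the input format uses the old or new Hadoop API):

```
# conf/spark-defaults.conf -- values illustrative (32 MB max split)
# new Hadoop API (mapreduce.*):
spark.hadoop.mapreduce.input.fileinputformat.split.maxsize   33554432
# old Hadoop API (mapred.*), used by sc.textFile in many builds:
spark.hadoop.mapred.max.split.size                           33554432
```

With a smaller max split size, each splittable input file is read as several splits, so the HadoopRDD starts with more partitions and no repartition is needed afterwards.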

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Mridul Muralidharan
If you can scan the input twice, you can of course do a per-partition count and build a custom RDD that can repartition without a shuffle. But there is nothing off the shelf, as Sandy mentioned. Regards, Mridul On Thursday, June 18, 2015, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Alexander, There is currently
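The two-scan idea above has a planning step that can be sketched in pure Python (the function name and shape are mine, not from the thread): the first scan yields per-partition record counts, and from those we cut each parent partition into child slices so that no record ever crosses a parent partition boundary, which is what makes a shuffle unnecessary.

```python
def plan_splits(counts, target_rows):
    """Given per-partition record counts from a first scan, cut each parent
    partition into child slices of at most target_rows records.

    Returns a list of (parent_index, start_offset, end_offset) triples.
    Slices never span two parents, so no shuffle is needed to realize them.
    """
    plan = []
    for parent, n in enumerate(counts):
        start = 0
        while start < n:
            end = min(start + target_rows, n)
            plan.append((parent, start, end))
            start = end
    return plan
```

A custom RDD's compute() for child slice `(p, s, e)` would iterate parent partition `p` and yield only rows `s..e`; the price is the second scan over the parent data, which is the trade-off Mridul describes.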

Re: Sidebar: issues targeted for 1.4.0

2015-06-18 Thread Sean Owen
I also like using Target Version meaningfully. It might be a little much to require no Target Version = X before starting an RC. I do think it's reasonable to not start the RC with Blockers open. And here we started the RC with almost 100 TODOs for 1.4.0, most of which did not get done. Not the

Re: Approximate rank-based statistics (median, 95-th percentile, etc.) for Spark

2015-06-18 Thread Nick Pentreath
If it's going into the DataFrame API (which it probably should rather than in RDD itself) - then it could become a UDT (similar to HyperLogLogUDT) which would mean it doesn't have to implement Serializable, as it appears that serialization is taken care of in the UDT def (e.g.
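What makes a quantile structure usable inside a distributed aggregation (and hence worth wrapping as a UDT) is a mergeable interface: partial results built per partition must combine associatively. A toy sketch of that interface, mine rather than from the thread; it keeps every value, so it is exact but unbounded, where a real sketch (GK, t-digest) would compress:

```python
class QuantileSketch:
    """Toy mergeable quantile structure illustrating the interface only.
    A production sketch would bound memory by compressing self.values."""
    def __init__(self, values=()):
        self.values = sorted(values)

    def merge(self, other):
        # Merging partial sketches from different partitions; associativity
        # of this operation is what lets the aggregation run distributed.
        return QuantileSketch(self.values + other.values)

    def quantile(self, q):
        """Return the element at rank floor(q * n), clamped to the last index."""
        if not self.values:
            raise ValueError("empty sketch")
        idx = min(int(q * len(self.values)), len(self.values) - 1)
        return self.values[idx]
```

Serialization of such a structure would then live in the UDT definition, as the message notes for HyperLogLogUDT, so the sketch class itself need not implement Serializable.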