Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread 另一片天
I downloaded the official release version, I did not build it myself;
but when I don't set that parameter, there is no error.


-- Original Message --
From: "Jeff Zhang";
Sent: Wednesday, June 22, 2016, 2:09 PM
To: "Yash Sharma";
Cc: "另一片天" <958943...@qq.com>; "user";
Subject: Re: Could not find or load main class
org.apache.spark.deploy.yarn.ExecutorLauncher



Make sure you built spark with -Pyarn, and check whether you have class 
ExecutorLauncher in your spark assembly jar. 



On Wed, Jun 22, 2016 at 2:04 PM, Yash Sharma  wrote:
How about supplying the jar directly in spark submit - 

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 512m \
--num-executors 2 \
--executor-memory 512m \
--executor-cores 2 \
/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


On Wed, Jun 22, 2016 at 3:59 PM, 另一片天 <958943...@qq.com> wrote:
I set this parameter in spark-defaults.conf:
spark.yarn.jar 
hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-client --driver-memory 512m --num-executors 2 --executor-memory 512m
--executor-cores 2 10:




Error: Could not find or load main class
org.apache.spark.deploy.yarn.ExecutorLauncher

But when I don't set that parameter, there is no error. Why? Is that parameter only
meant to avoid uploading the resource file (jar package)?



 






-- 
Best Regards

Jeff Zhang

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Saisai Shao
spark.yarn.jar (none) The location of the Spark jar file, in case
overriding the default location is desired. By default, Spark on YARN will
use a Spark jar installed locally, but the Spark jar can also be in a
world-readable location on HDFS. This allows YARN to cache it on nodes so
that it doesn't need to be distributed each time an application runs. To
point to a jar on HDFS, for example, set this configuration to
hdfs:///some/path.

spark.yarn.jar is used for the Spark run-time system jar, which is the Spark
assembly jar, not the application jar (the examples assembly jar). So in your
case you uploaded the examples assembly jar into HDFS; the Spark system
classes are not packed into it, so ExecutorLauncher cannot be found.
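
To make that concrete, here is a minimal sketch of the intended setup (the paths
and the assembly jar name are assumptions based on the spark-1.6.1-bin-hadoop2.6
layout; adjust them to your install):

# 1) upload the Spark *assembly* jar (not the examples jar) to HDFS
hdfs dfs -mkdir -p /user/shihj/spark_lib
hdfs dfs -put lib/spark-assembly-1.6.1-hadoop2.6.0.jar /user/shihj/spark_lib/

# 2) point spark.yarn.jar at that assembly jar in conf/spark-defaults.conf
#    spark.yarn.jar hdfs://master:9000/user/shihj/spark_lib/spark-assembly-1.6.1-hadoop2.6.0.jar

# 3) still pass the application (examples) jar to spark-submit as before
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 512m --num-executors 2 --executor-memory 512m --executor-cores 2 \
  lib/spark-examples-1.6.1-hadoop2.6.0.jar 10

Also note that the application jar argument is resolved on the local filesystem by
default, which is likely why the earlier run reported "Local jar ... does not exist"
when it was given /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar.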

Thanks
Saisai

On Wed, Jun 22, 2016 at 2:10 PM, 另一片天 <958943...@qq.com> wrote:

> shihj@master:/usr/local/spark/spark-1.6.1-bin-hadoop2.6$
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
> yarn-client --driver-memory 512m --num-executors 2 --executor-memory 512m
> --executor-cores 2
> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar 10
> Warning: Local jar
> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar does not exist,
> skipping.
> java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> I get the error immediately.
> -- Original Message --
> *From:* "Yash Sharma";
> *Sent:* Wednesday, June 22, 2016, 2:04 PM
> *To:* "另一片天" <958943...@qq.com>;
> *Cc:* "user";
> *Subject:* Re: Could not find or load main class
> org.apache.spark.deploy.yarn.ExecutorLauncher
>
> How about supplying the jar directly in spark submit -
>
> ./bin/spark-submit \
>> --class org.apache.spark.examples.SparkPi \
>> --master yarn-client \
>> --driver-memory 512m \
>> --num-executors 2 \
>> --executor-memory 512m \
>> --executor-cores 2 \
>> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
>
>
> On Wed, Jun 22, 2016 at 3:59 PM, 另一片天 <958943...@qq.com> wrote:
>
>> I set this parameter in spark-defaults.conf:
>> spark.yarn.jar
>> hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
>>
>> then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>> --master yarn-client --driver-memory 512m --num-executors 2
>> --executor-memory 512m --executor-cores 2 10:
>>
>>
>>
>>- Error: Could not find or load main class
>>org.apache.spark.deploy.yarn.ExecutorLauncher
>>
>> But when I don't set that parameter, there is no error. Why? Is that parameter
>> only meant to avoid uploading the resource file (jar package)?
>>
>
>


Re: FullOuterJoin on Spark

2016-06-22 Thread Nirav Patel
Can your domain list fit in the memory of one executor? If so, you can use a
broadcast join.

You can always narrow it down to an inner join and derive the rest from the original
set if memory is an issue there. If you are just concerned about shuffle memory,
then to reduce the amount of shuffle you can do the following (see the sketch below):
1) partition both RDDs (or DataFrames) with the same partitioner and the same
partition count, so that corresponding data is at least on the same node
2) increase spark.shuffle.memoryFraction

You can use DataFrames with Spark 1.6 or greater to further reduce the memory
footprint. I haven't tested that though.
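
For point (2), the configuration side can be tried at submit time without code
changes. A rough sketch with placeholder values (the class and jar names are
hypothetical; on Spark 1.6 the legacy spark.shuffle.memoryFraction setting only
takes effect with spark.memory.useLegacyMode=true, otherwise tune
spark.memory.fraction instead):

# placeholder numbers -- tune to your data volumes and cluster
./bin/spark-submit \
  --master yarn-client \
  --class com.example.JoinJob \
  --conf spark.memory.useLegacyMode=true \
  --conf spark.shuffle.memoryFraction=0.4 \
  --conf spark.default.parallelism=400 \
  --conf spark.sql.autoBroadcastJoinThreshold=104857600 \
  join-job.jar

spark.sql.autoBroadcastJoinThreshold (about 100 MB here) only matters if you take the
DataFrame route and narrow the join down to an inner join, in which case the planner
can broadcast the smaller side automatically; the co-partitioning from point (1) and
an RDD broadcast join still have to be done in the application code itself.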


On Tue, Jun 21, 2016 at 6:16 AM, Rychnovsky, Dusan <
dusan.rychnov...@firma.seznam.cz> wrote:

> Hi,
>
>
> can somebody please explain the way FullOuterJoin works on Spark? Does
> each intersection get fully loaded to memory?
>
> My problem is as follows:
>
>
> I have two large data-sets:
>
>
> * a list of web pages,
>
> * a list of domain-names with specific rules for processing pages from
> that domain.
>
>
> I am joining these web-pages with processing rules.
>
>
> For certain domains there are millions of web-pages.
>
>
> Based on the memory demands the join is having it looks like the whole
> intersection (i.e. a domain + all corresponding pages) are kept in memory
> while processing.
>
>
> What I really need in this case, though, is to hold just the domain and
> iterate over all corresponding pages, one at a time.
>
>
> What would be the best way to do this on Spark?
>
> Thank you,
>
> Dusan Rychnovsky
>
>
>




Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread 另一片天
shihj@master:/usr/local/spark/spark-1.6.1-bin-hadoop2.6$ ./bin/spark-submit 
--class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 
512m --num-executors 2 --executor-memory 512m --executor-cores 2   
/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar 10
Warning: Local jar /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar 
does not exist, skipping.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I get the error immediately.
-- Original Message --
From: "Yash Sharma";
Sent: Wednesday, June 22, 2016, 2:04 PM
To: "另一片天" <958943...@qq.com>;
Cc: "user";
Subject: Re: Could not find or load main class
org.apache.spark.deploy.yarn.ExecutorLauncher



How about supplying the jar directly in spark submit - 

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 512m \
--num-executors 2 \
--executor-memory 512m \
--executor-cores 2 \
/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


On Wed, Jun 22, 2016 at 3:59 PM, 另一片天 <958943...@qq.com> wrote:
I set this parameter in spark-defaults.conf:
spark.yarn.jar 
hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-client --driver-memory 512m --num-executors 2 --executor-memory 512m
--executor-cores 2 10:




Error: Could not find or load main class 
org.apache.spark.deploy.yarn.ExecutorLauncher  

But when I don't set that parameter, there is no error. Why? Is that parameter only
meant to avoid uploading the resource file (jar package)?

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Jeff Zhang
Make sure you built spark with -Pyarn, and check whether you have
class ExecutorLauncher in your spark assembly jar.
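
A quick way to check both, assuming a Spark 1.6.x source tree and the standard
binary layout (the paths and versions below are assumptions; adjust them to your
install):

# build from source with the YARN profile (not needed for a prebuilt -bin- release)
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

# verify that ExecutorLauncher is in the assembly jar that is actually being used
jar tf lib/spark-assembly-1.6.1-hadoop2.6.0.jar | grep ExecutorLauncher
# expect something like: org/apache/spark/deploy/yarn/ExecutorLauncher.class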


On Wed, Jun 22, 2016 at 2:04 PM, Yash Sharma  wrote:

> How about supplying the jar directly in spark submit -
>
> ./bin/spark-submit \
>> --class org.apache.spark.examples.SparkPi \
>> --master yarn-client \
>> --driver-memory 512m \
>> --num-executors 2 \
>> --executor-memory 512m \
>> --executor-cores 2 \
>> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
>
>
> On Wed, Jun 22, 2016 at 3:59 PM, 另一片天 <958943...@qq.com> wrote:
>
>> I set this parameter in spark-defaults.conf:
>> spark.yarn.jar
>> hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
>>
>> then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>> --master yarn-client --driver-memory 512m --num-executors 2
>> --executor-memory 512m --executor-cores 2 10:
>>
>>
>>
>>- Error: Could not find or load main class
>>org.apache.spark.deploy.yarn.ExecutorLauncher
>>
>> But when I don't set that parameter, there is no error. Why? Is that parameter
>> only meant to avoid uploading the resource file (jar package)?
>>
>
>


-- 
Best Regards

Jeff Zhang


Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Yash Sharma
How about supplying the jar directly in spark submit -

./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn-client \
> --driver-memory 512m \
> --num-executors 2 \
> --executor-memory 512m \
> --executor-cores 2 \
> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


On Wed, Jun 22, 2016 at 3:59 PM, 另一片天 <958943...@qq.com> wrote:

> I set this parameter in spark-defaults.conf:
> spark.yarn.jar
> hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
>
> then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
> yarn-client --driver-memory 512m --num-executors 2 --executor-memory 512m
> --executor-cores 2 10:
>
>
>
>- Error: Could not find or load main class
>org.apache.spark.deploy.yarn.ExecutorLauncher
>
> But when I don't set that parameter, there is no error. Why? Is that parameter
> only meant to avoid uploading the resource file (jar package)?
>


Re: spark job automatically killed without rhyme or reason

2016-06-22 Thread Nirav Patel
Spark is a memory hogger and suicidal if you have a job processing a bigger
dataset. However, Databricks claims that Spark > 1.6 has optimizations related
to memory footprint as well as processing. Those are only available if you use
DataFrames or Datasets; if you are using RDDs you have to do a lot of testing
and tuning.
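
If it is YARN that kills the containers, Sean's point below about
spark.yarn.executor.memoryOverhead is usually the first knob to try. A minimal
sketch with hypothetical class/jar names and placeholder sizes (in Spark 1.x the
overhead defaults to max(384 MB, 10% of executor memory), and executor memory plus
overhead has to fit under yarn.scheduler.maximum-allocation-mb):

# sizes are placeholders -- tune to the job and the YARN container limits
./bin/spark-submit \
  --master yarn-client \
  --class com.example.YourJob \
  --executor-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=512 \
  your-job.jar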

On Mon, Jun 20, 2016 at 1:34 AM, Sean Owen  wrote:

> I'm not sure that's the conclusion. It's not trivial to tune and
> configure YARN and Spark to match your app's memory needs and profile,
> but, it's also just a matter of setting them properly. I'm not clear
> you've set the executor memory for example, in particular
> spark.yarn.executor.memoryOverhead
>
> Everything else you mention is a symptom of YARN shutting down your
> jobs because your memory settings don't match what your app does.
> They're not problems per se, based on what you have provided.
>
>
> On Mon, Jun 20, 2016 at 9:17 AM, Zhiliang Zhu
>  wrote:
> > Hi Alexander ,
> >
> > Thanks a lot for your comments.
> >
> > Spark seems not that stable when it comes to run big job, too much data
> or
> > too much time, yes, the problem is gone when reducing the scale.
> > Sometimes reset some job running parameter (such as --drive-memory may
> help
> > in GC issue) , sometimes may rewrite the codes by applying other
> algorithm.
> >
> > As you commented the shuffle operation, it sounds some as the reason ...
> >
> > Best Wishes !
> >
> >
> >
> > On Friday, June 17, 2016 8:45 PM, Alexander Kapustin 
> > wrote:
> >
> >
> > Hi Zhiliang,
> >
> > Yes, find the exact reason of failure is very difficult. We have issue
> with
> > similar behavior, due to limited time for investigation, we reduce the
> > number of processed data, and problem has gone.
> >
> > Some points which may help you in investigations:
> > · If you start spark-history-server (or monitoring running
> > application on 4040 port), look into failed stages (if any). By default
> > Spark try to retry stage execution 2 times, after that job fails
> > · Some useful information may contains in yarn logs on Hadoop
> nodes
> > (yarn--nodemanager-.log), but this is only information about
> > killed container, not about the reasons why this stage took so much
> memory
> >
> > As I can see in your logs, failed step relates to shuffle operation,
> could
> > you change your job to avoid massive shuffle operation?
> >
> > --
> > WBR, Alexander
> >
> > From: Zhiliang Zhu
> > Sent: June 17, 2016, 14:10
> > To: User; kp...@hotmail.com
> > Subject: Re: spark job automatically killed without rhyme or reason
> >
> >
> > Show original message
> >
> >
> > Hi Alexander,
> >
> > is your yarn userlog   just for the executor log ?
> >
> > as for those logs seem a little difficult to exactly decide the wrong
> point,
> > due to sometimes successful job may also have some kinds of the error
> ...
> > but will repair itself.
> > spark seems not that stable currently ...
> >
> > Thank you in advance~
> >
> >
> >
> > On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu 
> wrote:
> >
> >
> > Hi Alexander,
> >
> > Thanks a lot for your reply.
> >
> > Yes, submitted by yarn.
> > Do you just mean in the executor log file by way of yarn logs
> -applicationId
> > id,
> >
> > in this file, both in some containers' stdout  and stderr :
> >
> > 16/06/17 14:05:40 INFO client.TransportClientFactory: Found inactive
> > connection to ip-172-31-20-104/172.31.20.104:49991, creating a new one.
> > 16/06/17 14:05:40 ERROR shuffle.RetryingBlockFetcher: Exception while
> > beginning fetch of 1 outstanding blocks
> > java.io.IOException: Failed to connect to
> > ip-172-31-20-104/172.31.20.104:49991  <-- may it be due
> to
> > that spark is not stable, and spark may repair itself for these kinds of
> > error ? (saw some in successful run )
> >
> > at
> >
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
> > at
> >
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
> > 
> > Caused by: java.net.ConnectException: Connection refused:
> > ip-172-31-20-104/172.31.20.104:49991
> > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> > at
> > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> > at
> >
> io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
> > at
> >
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
> > at
> >
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
> > at
> >
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > at
> >
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > at 

Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread 另一片天
I set this parameter in spark-defaults.conf:
spark.yarn.jar 
hdfs://master:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar


then I run ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
yarn-client --driver-memory 512m --num-executors 2 --executor-memory 512m
--executor-cores 2 10:




Error: Could not find or load main class 
org.apache.spark.deploy.yarn.ExecutorLauncher  

But when I don't set that parameter, there is no error. Why? Is that parameter only
meant to avoid uploading the resource file (jar package)?
