Re: Spark-Submit issues

2014-11-12 Thread Ted Malaska
Otherwise, include them at the time of execution. Here is an example.

spark-submit \
  --jars /opt/cloudera/parcels/CDH/lib/zookeeper/zookeeper-3.4.5-cdh5.1.0.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/guava-12.0.1.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/protobuf-java-2.5.0.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop-compat.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar \
  --class org.apache.spark.hbase.example.HBaseBulkDeleteExample \
  --master yarn --deploy-mode client \
  --executor-memory 512M --num-executors 4 \
  --driver-java-options "-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/*" \
  SparkHBase.jar t1 c
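
If you would rather bundle the dependencies into your application jar instead
of listing them all with --jars, the shading Hari mentions below is the way
to go. A minimal sketch of the maven-shade-plugin setup for your pom.xml (the
plugin version is just illustrative; you would also mark the Spark and Hadoop
artifacts as provided so they are not bundled):

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <!-- build the shaded (fat) jar during mvn package -->
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
      </execution>
    </executions>
  </plugin>

Either way works: --jars keeps the application jar small, while shading gives
you a single self-contained artifact to submit.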

On Wed, Nov 12, 2014 at 4:25 PM, Hari Shreedharan  wrote:

> Yep, you’d need to shade jars to ensure all your dependencies are in the
> classpath.
>
> Thanks,
> Hari
>
>
> On Wed, Nov 12, 2014 at 3:23 AM, Ted Malaska 
> wrote:
>
>> Hey, this is Ted.
>>
>> Are you using Shade when you build your jar, and are you using the bigger
>> jar? It looks like classes are not included in your jar.
>>
>> On Wed, Nov 12, 2014 at 2:09 AM, Jeniba Johnson <
>> jeniba.john...@lntinfotech.com> wrote:
>>
>>> Hi Hari,
>>>
>>> Now I am trying out the same FlumeEventCount example, running it with
>>> spark-submit instead of run-example. The steps I followed: I exported
>>> JavaFlumeEventCount.java into a jar.
>>>
>>> The command used is
>>> ./bin/spark-submit --jars lib/spark-examples-1.1.0-hadoop1.0.4.jar \
>>>   --master local --class org.JavaFlumeEventCount \
>>>   bin/flumeeventcnt2.jar localhost 2323
>>>
>>> The output is
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1
>>> blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time
>>> 1415795102000
>>>
>>> If I use this command,
>>> ./bin/spark-submit --master local --class org.JavaFlumeEventCount \
>>>   bin/flumeeventcnt2.jar localhost 2323
>>>
>>> Then I get an error
>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>> classpath
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/spark/examples/streaming/StreamingExamples
>>> at org.JavaFlumeEventCount.main(JavaFlumeEventCount.java:22)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:601)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.spark.examples.streaming.StreamingExamples
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>> ... 8 more
>>>
>>>
>>> I just wanted to ask: why is it able to find spark-assembly.jar but not
>>> spark-examples.jar?
>>> My next doubt: when I run the FlumeEventCount example through
>>> run-example, I get this output:
>>> Received 4 flume events.
>>>
>>> 14/11/12 18:30:14 INFO scheduler.JobScheduler: Finished job streaming
>>> job 1415797214000 ms.0 from job set of time 1415797214000 ms
>>> 14/11/12 18:30:14 INFO rdd.MappedRDD: Removing RDD 70 from persistence
>>> list
>>>
>>> But if I run the same program through spark-submit, I get this output:
>>> 14/11/12 17:55:02 INFO scheduler.ReceiverTracker: Stream 0 received 1
>>> blocks
>>> 14/11/12 17:55:02 INFO scheduler.JobScheduler: Added jobs for time
>>> 1415795102000
>>>
>>> So I need a clarification: in the program, the print statement is written
>>> as "Received n flume events.", so how come I see "Stream 0 received n
>>> blocks" instead? And what is the difference between running the program
>>> through spark-submit and through run-example?
>>>
>>> Awaiting your kind reply.
>>>
>>> Regards,
>>> Jeniba Johnson
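
PS: a couple of notes on the questions above. spark-submit puts the Spark
assembly on the classpath for you automatically, but anything else your
program needs (such as the examples jar) is only added when you pass it with
--jars; that is why the run without --jars fails. The stack trace shows
JavaFlumeEventCount.java line 22 touching
org.apache.spark.examples.streaming.StreamingExamples, a helper that lives
only in the examples jar and, if I remember right, just quiets the INFO
logging (which is also why run-example output looks cleaner). If you drop
that call, the program needs nothing beyond spark-streaming-flume. A rough
sketch against the Spark 1.1 API, with illustrative class and argument names:

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.function.Function;
  import org.apache.spark.streaming.Duration;
  import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import org.apache.spark.streaming.flume.FlumeUtils;
  import org.apache.spark.streaming.flume.SparkFlumeEvent;

  public class JavaFlumeEventCount {
    public static void main(String[] args) throws Exception {
      String host = args[0];
      int port = Integer.parseInt(args[1]);

      SparkConf sparkConf = new SparkConf().setAppName("JavaFlumeEventCount");
      // 2-second batches, same as the bundled example
      JavaStreamingContext ssc =
          new JavaStreamingContext(sparkConf, new Duration(2000));

      // push-based Flume receiver listening on host:port
      JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
          FlumeUtils.createStream(ssc, host, port);

      // count the events in each batch and print one summary line per batch
      flumeStream.count().map(new Function<Long, String>() {
        @Override
        public String call(Long in) {
          return "Received " + in + " flume events.";
        }
      }).print();

      ssc.start();
      ssc.awaitTermination();
    }
  }

One more thing: when testing streaming locally, submit with --master
"local[2]" (or more threads). A receiver permanently occupies one core, so
with plain --master local the batches may never actually get processed, which
would also explain seeing the receiver's "Stream 0 received 1 blocks" INFO
lines but never your program's "Received n flume events." output.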