Re: java.lang.ClassNotFoundException for s3a committer

2020-07-21 Thread Gourav Sengupta
Hi,

I am not sure about this, but is there any requirement to use S3A at all?


Regards,
Gourav

On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran 
wrote:

>
>
> On Tue, 7 Jul 2020 at 03:42, Stephen Coy 
> wrote:
>
>> Hi Steve,
>>
>> While I understand your point regarding the mixing of Hadoop jars, this
>> does not address the java.lang.ClassNotFoundException.
>>
>> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
>> Hadoop 3.2. Not Hadoop 3.1.
>>
>
> sorry, I should have been clearer. Hadoop 3.2.x has everything you need.
>
>
>
>>
>> The only place that I have found that missing class is in the Spark
>> “hadoop-cloud” source module, and currently the only way to get the jar
>> containing it is to build it yourself. If any of the devs are listening it
>>  would be nice if this was included in the standard distribution. It has a
>> sizeable chunk of a repackaged Jetty embedded in it which I find a bit odd.
>>
>> But I am relatively new to this stuff so I could be wrong.
>>
>> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
>> like:
>>
>> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name",
>> "directory");
>> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
>> "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
>> hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
>> "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
>> hadoopConfiguration.set("fs.s3a.connection.maximum",
>> Integer.toString(coreCount * 2));
>>
>> Querying and updating s3a data sources seems to be working ok.
>>
>> Thanks,
>>
>> Steve C
>>
>> On 29 Jun 2020, at 10:34 pm, Steve Loughran 
>> wrote:
>>
>> you are going to need hadoop-3.1 on your classpath, with hadoop-aws and
>> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
>> doomed. Using a different AWS SDK jar is a bit risky, though more recent
>> upgrades have all been fairly low stress
>>
>> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <
>> murat.migdiso...@gmail.com> wrote:
>>
>>> Hi all
>>> I've upgraded my test cluster to Spark 3 and changed my committer to
>>> directory, and I still get this error. The documentation is somewhat
>>> obscure on that.
>>> Do I need to add a third-party jar to support the new committers?
>>>
>>> java.lang.ClassNotFoundException:
>>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>>
>>>
>>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
>>> murat.migdiso...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> we have a Hadoop cluster (using YARN) using S3 as the filesystem with
>>>> S3Guard enabled.
>>>> We are using hadoop 3.2.1 with spark 2.4.5.
>>>>
>>>> When I try to save a dataframe in parquet format, I get the following
>>>> exception:
>>>> java.lang.ClassNotFoundException:
>>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>>
>>>> My relevant spark configurations are as following:
>>>>
>>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>>> "fs.s3a.committer.name
>>>> <https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D=0>":
>>>> "magic",
>>>> "fs.s3a.committer.magic.enabled": true,
>>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>
>>>> While spark streaming fails with the exception above, apache beam
>>>> succeeds writing parquet files.
>>>> What might be the problem?
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> --
>>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>>> our hands, not our tongues."
>>>> W. Shakespeare
>>>>
>>>
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
>>>
>>
>>
>>
>


Re: java.lang.ClassNotFoundException for s3a committer

2020-07-21 Thread Steve Loughran
On Tue, 7 Jul 2020 at 03:42, Stephen Coy 
wrote:

> Hi Steve,
>
> While I understand your point regarding the mixing of Hadoop jars, this
> does not address the java.lang.ClassNotFoundException.
>
> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
> Hadoop 3.2. Not Hadoop 3.1.
>

sorry, I should have been clearer. Hadoop 3.2.x has everything you need.



>
> The only place that I have found that missing class is in the Spark
> “hadoop-cloud” source module, and currently the only way to get the jar
> containing it is to build it yourself. If any of the devs are listening it
>  would be nice if this was included in the standard distribution. It has a
> sizeable chunk of a repackaged Jetty embedded in it which I find a bit odd.
>
> But I am relatively new to this stuff so I could be wrong.
>
> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
> like:
>
> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name",
> "directory");
> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
> "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
> hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
> "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
> hadoopConfiguration.set("fs.s3a.connection.maximum",
> Integer.toString(coreCount * 2));
>
> Querying and updating s3a data sources seems to be working ok.
>
> Thanks,
>
> Steve C
>
> On 29 Jun 2020, at 10:34 pm, Steve Loughran 
> wrote:
>
> you are going to need hadoop-3.1 on your classpath, with hadoop-aws and
> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
> doomed. Using a different AWS SDK jar is a bit risky, though more recent
> upgrades have all been fairly low stress
>
> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu 
> wrote:
>
>> Hi all
>> I've upgraded my test cluster to Spark 3 and changed my committer to
>> directory, and I still get this error. The documentation is somewhat
>> obscure on that.
>> Do I need to add a third-party jar to support the new committers?
>>
>> java.lang.ClassNotFoundException:
>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>
>>
>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
>> murat.migdiso...@gmail.com> wrote:
>>
>>> Hello all,
>>> we have a Hadoop cluster (using YARN) using S3 as the filesystem with
>>> S3Guard enabled.
>>> We are using hadoop 3.2.1 with spark 2.4.5.
>>>
>>> When I try to save a dataframe in parquet format, I get the following
>>> exception:
>>> java.lang.ClassNotFoundException:
>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>
>>> My relevant spark configurations are as following:
>>>
>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>> "fs.s3a.committer.name
>>> <https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D=0>":
>>> "magic",
>>> "fs.s3a.committer.magic.enabled": true,
>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>
>>> While spark streaming fails with the exception above, apache beam
>>> succeeds writing parquet files.
>>> What might be the problem?
>>>
>>> Thanks in advance
>>>
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
>>>
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare
>>
>
>
>


Re: java.lang.ClassNotFoundException for s3a committer

2020-07-06 Thread Stephen Coy
Hi Steve,

While I understand your point regarding the mixing of Hadoop jars, this does 
not address the java.lang.ClassNotFoundException.

Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or Hadoop 
3.2. Not Hadoop 3.1.

The only place that I have found that missing class is in the Spark 
“hadoop-cloud” source module, and currently the only way to get the jar 
containing it is to build it yourself. If any of the devs are listening it  
would be nice if this was included in the standard distribution. It has a 
sizeable chunk of a repackaged Jetty embedded in it which I find a bit odd.

But I am relatively new to this stuff so I could be wrong.

I am currently running Spark 3.0 clusters with no HDFS. Spark is set up like:

hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name", "directory");
hadoopConfiguration.set("spark.sql.sources.commitProtocolClass", 
"org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
hadoopConfiguration.set("spark.sql.parquet.output.committer.class", 
"org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
hadoopConfiguration.set("fs.s3a.connection.maximum", Integer.toString(coreCount 
* 2));
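
For comparison, here is a minimal sketch of the same committer settings applied through the SparkSession builder instead of directly on the Hadoop configuration. The app name, connection count, and bucket path are illustrative placeholders, not values from this thread:

import org.apache.spark.sql.SparkSession

// Sketch only: keys prefixed with "spark.hadoop." are copied by Spark into the
// Hadoop Configuration, so fs.s3a.* options can also be set this way.
val spark = SparkSession.builder()
  .appName("s3a-committer-example") // hypothetical name
  .config("spark.hadoop.fs.s3a.committer.name", "directory")
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .config("spark.hadoop.fs.s3a.connection.maximum", "64") // illustrative value
  .getOrCreate()

// Quick smoke test of a parquet write that goes through the configured committer.
spark.range(10).write.mode("overwrite").parquet("s3a://some-bucket/tmp/committer-smoke-test")

Either way, the PathOutputCommitProtocol class still has to be on the classpath (the spark-hadoop-cloud jar from the hadoop-cloud module discussed above), or the same ClassNotFoundException appears.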

Querying and updating s3a data sources seems to be working ok.

Thanks,

Steve C

On 29 Jun 2020, at 10:34 pm, Steve Loughran <ste...@cloudera.com.INVALID> wrote:

you are going to need hadoop-3.1 on your classpath, with hadoop-aws and the
same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is doomed.
Using a different AWS SDK jar is a bit risky, though more recent upgrades have
all been fairly low stress

On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to directory,
and I still get this error. The documentation is somewhat obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException: 
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdiso...@gmail.com> wrote:
Hello all,
we have a Hadoop cluster (using YARN) using S3 as the filesystem with S3Guard
enabled.
We are using hadoop 3.2.1 with spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following exception:
java.lang.ClassNotFoundException: 
com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant spark configurations are as following:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name<https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D=0>":
 "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While spark streaming fails with the exception above, apache beam succeeds 
writing parquet files.
What might be the problem?

Thanks in advance


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our 
hands, not our tongues."
W. Shakespeare


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our 
hands, not our tongues."
W. Shakespeare




Re: java.lang.ClassNotFoundException for s3a committer

2020-06-29 Thread Steve Loughran
you are going to need hadoop-3.1 on your classpath, with hadoop-aws and the
same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
doomed. Using a different AWS SDK jar is a bit risky, though more recent
upgrades have all been fairly low stress
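
For anyone bundling these pieces themselves, a build.sbt sketch of what "matching" looks like in practice; the version is illustrative and has to agree with the hadoop-common jars actually on the cluster:

// build.sbt sketch -- hadoopVersion is an assumption; match it to the cluster's Hadoop.
// hadoop-aws then pulls in the aws-java-sdk-bundle version it was built and tested with.
val hadoopVersion = "3.2.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % hadoopVersion

Overriding that SDK with a different release is the "bit risky" case described above.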

On Fri, 19 Jun 2020 at 05:39, murat migdisoglu 
wrote:

> Hi all
> I've upgraded my test cluster to Spark 3 and changed my committer to
> directory, and I still get this error. The documentation is somewhat
> obscure on that.
> Do I need to add a third-party jar to support the new committers?
>
> java.lang.ClassNotFoundException:
> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>
>
> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
> murat.migdiso...@gmail.com> wrote:
>
>> Hello all,
>> we have a Hadoop cluster (using YARN) using S3 as the filesystem with
>> S3Guard enabled.
>> We are using hadoop 3.2.1 with spark 2.4.5.
>>
>> When I try to save a dataframe in parquet format, I get the following
>> exception:
>> java.lang.ClassNotFoundException:
>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>
>> My relevant spark configurations are as following:
>>
>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>> "fs.s3a.committer.name": "magic",
>> "fs.s3a.committer.magic.enabled": true,
>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>
>> While spark streaming fails with the exception above, apache beam
>> succeeds writing parquet files.
>> What might be the problem?
>>
>> Thanks in advance
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare
>>
>
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
>


Re: java.lang.ClassNotFoundException for s3a committer

2020-06-18 Thread Stephen Coy
Hi Murat Migdisoglu,

Unfortunately you need the secret sauce to resolve this.

It is necessary to check out the Apache Spark source code and build it with the 
right command line options. This is what I have been using:

dev/make-distribution.sh --name my-spark --tgz -Pyarn -Phadoop-3.2 -Phadoop-cloud -Dhadoop.version=3.2.1

This will add additional jars into the build.

Copy hadoop-aws-3.2.1.jar, hadoop-openstack-3.2.1.jar and 
spark-hadoop-cloud_2.12-3.0.0.jar into the “jars” directory of your Spark 
distribution. If you are paranoid you could copy/replace all the 
hadoop-*-3.2.1.jar files but I have not found that necessary.

You will also need to upgrade the version of guava that appears in the spark 
distro because Hadoop 3.2.1 bumped this from guava-14.0.1.jar to 
guava-27.0-jre.jar. Otherwise you will get runtime ClassNotFound exceptions.
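
A small sketch for sanity-checking such a patched distribution before running a real job; the object name is made up, and the two class names simply correspond to the jars mentioned above:

// Minimal classpath probe, assuming it is run via spark-submit against the patched distro.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol", // spark-hadoop-cloud
      "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"          // hadoop-aws
    ).foreach { name =>
      try {
        Class.forName(name)
        println(s"OK      $name")
      } catch {
        case _: ClassNotFoundException => println(s"MISSING $name")
      }
    }
  }
}

A MISSING line means the corresponding jar did not make it into the jars directory. This only proves the classes are present, so the guava swap described above is still needed.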

I have been using this combo for many months now with the Spark 3.0 
pre-releases and it has been working great.

Cheers,

Steve C


On 19 Jun 2020, at 10:24 am, murat migdisoglu <murat.migdiso...@gmail.com> wrote:

Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to directory,
and I still get this error. The documentation is somewhat obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException: 
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdiso...@gmail.com> wrote:
Hello all,
we have a Hadoop cluster (using YARN) using S3 as the filesystem with S3Guard
enabled.
We are using hadoop 3.2.1 with spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following exception:
java.lang.ClassNotFoundException: 
com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant spark configurations are as following:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name<https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F=02%7C01%7Cscoy%40infomedia.com.au%7C0725287744754aed9c5108d813e71e6e%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637281230668124994=n6l70htGxJ1q%2BcWH21RWIML7eGdE26UCdY8cDsufY6o%3D=0>":
 "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While spark streaming fails with the exception above, apache beam succeeds 
writing parquet files.
What might be the problem?

Thanks in advance


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our 
hands, not our tongues."
W. Shakespeare


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our 
hands, not our tongues."
W. Shakespeare



Re: java.lang.ClassNotFoundException for s3a committer

2020-06-18 Thread murat migdisoglu
Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to
directory, and I still get this error. The documentation is somewhat
obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException:
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu 
wrote:

> Hello all,
> we have a Hadoop cluster (using YARN) using S3 as the filesystem with S3Guard
> enabled.
> We are using hadoop 3.2.1 with spark 2.4.5.
>
> When I try to save a dataframe in parquet format, I get the following
> exception:
> java.lang.ClassNotFoundException:
> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>
> My relevant spark configurations are as following:
>
> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
> "fs.s3a.committer.name": "magic",
> "fs.s3a.committer.magic.enabled": true,
> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>
> While spark streaming fails with the exception above, apache beam succeeds
> writing parquet files.
> What might be the problem?
>
> Thanks in advance
>
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
>


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare


java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

2020-06-17 Thread murat migdisoglu
Hello all,
we have a Hadoop cluster (using YARN) using S3 as the filesystem with S3Guard
enabled.
We are using hadoop 3.2.1 with spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant spark configurations are as following:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While spark streaming fails with the exception above, apache beam succeeds
writing parquet files.
What might be the problem?

Thanks in advance


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare


User class threw exception: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html

2018-04-27 Thread amit kumar singh
Hi Team,

I am working on Structured Streaming.

I have added all the libraries in build.sbt, but it still does not pick up the
right library and fails with this error:

User class threw exception: java.lang.ClassNotFoundException: Failed to
find data source: kafka. Please find packages at
http://spark.apache.org/third-party-projects.html

I am using Jenkins to deploy this task.

thanks
amit
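
One likely cause, offered as a sketch rather than a confirmed diagnosis: the Kafka source for Structured Streaming lives in a separate connector artifact (spark-sql-kafka-0-10), which has to be bundled into the application jar or passed with --packages. The version below is an assumption and should match the Spark version in use:

// build.sbt sketch -- sparkVersion is illustrative; align it with the cluster.
val sparkVersion = "2.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % sparkVersion % "provided",
  // The "kafka" data source format is provided by this connector, not by spark-sql itself.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)

The submit-time equivalent would be --packages org.apache.spark:spark-sql-kafka-0-10_2.11:<version>.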


Re: Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function

2017-03-01 Thread Dominik Safaric
The jars I am submitting are the following:

bin/spark-submit --class topology.SimpleProcessingTopology --master 
spark://10.0.0.8:7077 --jars /tmp/spark_streaming-1.0-SNAPSHOT.jar 
/tmp//tmp/spark_streaming-1.0-SNAPSHOT.jar /tmp/streaming.properties

I’ve even tried using the spark.executor.extraClassPath option, but
unfortunately without success.

What do you mean by conflicting copies of Spark classes? Could you elaborate?

> On 1 Mar 2017, at 14:51, Sean Owen <so...@cloudera.com> wrote:
> 
> What is the --jars you are submitting? You may have conflicting copies of 
> Spark classes that interfere.
> 
> 
> On Wed, Mar 1, 2017, 14:20 Dominik Safaric <dominiksafa...@gmail.com> wrote:
> I've been trying to submit a Spark Streaming application using spark-submit
> to a cluster of mine consisting of a master and two worker nodes. The
> application has been written in Scala and built using Maven. Importantly,
> the Maven build is configured to produce a fat JAR containing all
> dependencies. Furthermore, the JAR has been distributed to all of the nodes.
> The streaming job has been submitted using the following command:
> 
> bin/spark-submit --class topology.SimpleProcessingTopology --jars
> /tmp/spark_streaming-1.0-SNAPSHOT.jar --master spark://10.0.0.8:7077
> --verbose /tmp/spark_streaming-1.0-SNAPSHOT.jar
> /tmp/streaming-benchmark.properties
> where 10.0.0.8 is the IP address of the master node within the VNET. 
> 
> However, I keep getting the following exception while starting the streaming 
> application:
> 
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> 
> Caused by: java.lang.ClassNotFoundException: 
> topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
> I've checked the content of the JAR using jar tvf and as you can see in the 
> output below, it does contain the class in question.
> 
> 1735 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1.class
>702 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology.class
>   2415 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.class
>   2500 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1.class
>   7045 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$.class
> This exception has been caused due to the anonymous function of the 
> foreachPartition call:
> 
> rdd.foreachPartition(partition => {
>   val outTopic = props.getString("application.simple.kafka.out.topic")
>   val producer = new KafkaProducer[Array[Byte],Array[Byte]](kafkaParams)
>   partition.foreach(record => {
> val producerRecord = new ProducerRecord[Array[Byte], 
> Array[Byte]](outTopic, record.key(), record.value())
> producer.send(producerRecord)
>   })
>   producer.close()
> })
> Unfortunately, I have not been able to find the root cause of this so far.
> Hence, I would appreciate it if anyone could help me fix this issue.
> 



Re: Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function

2017-03-01 Thread Sean Owen
What is the --jars you are submitting? You may have conflicting copies of
Spark classes that interfere.
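
One common way such conflicts arise is a fat JAR that also bundles Spark's own classes. A hedged sketch of the usual remedy, shown in sbt terms even though the original project uses Maven, where <scope>provided</scope> plays the same role; the version is illustrative:

// build.sbt sketch -- keep Spark itself out of the assembly so the only Spark classes
// at runtime are the ones the cluster already provides.
val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
  // Connector jars (e.g. the Kafka integration) stay at compile scope so they are packaged.
)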

On Wed, Mar 1, 2017, 14:20 Dominik Safaric <dominiksafa...@gmail.com> wrote:

> I've been trying to submit a Spark Streaming application using
> spark-submit to a cluster of mine consisting of a master and two worker
> nodes. The application has been written in Scala and built using Maven.
> Importantly, the Maven build is configured to produce a fat JAR containing
> all dependencies. Furthermore, the JAR has been distributed to all of the
> nodes. The streaming job has been submitted using the following command:
>
> bin/spark-submit --class topology.SimpleProcessingTopology --jars 
> /tmp/spark_streaming-1.0-SNAPSHOT.jar --master spark://10.0.0.8:7077 
> --verbose /tmp/spark_streaming-1.0-SNAPSHOT.jar 
> /tmp/streaming-benchmark.properties
>
> where 10.0.0.8 is the IP address of the master node within the VNET.
>
> However, I keep getting the following exception while starting the
> streaming application:
>
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> Caused by: java.lang.ClassNotFoundException: 
> topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>
> I've checked the content of the JAR using jar tvf and as you can see in
> the output below, it does contain the class in question.
>
> 1735 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1.class
>702 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology.class
>   2415 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.class
>   2500 Wed Mar 01 12:29:20 UTC 2017 
> topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1.class
>   7045 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$.class
>
> This exception has been caused due to the anonymous function of the
> foreachPartition call:
>
> rdd.foreachPartition(partition => {
>   val outTopic = props.getString("application.simple.kafka.out.topic")
>   val producer = new KafkaProducer[Array[Byte],Array[Byte]](kafkaParams)
>   partition.foreach(record => {
> val producerRecord = new ProducerRecord[Array[Byte], 
> Array[Byte]](outTopic, record.key(), record.value())
> producer.send(producerRecord)
>   })
>   producer.close()
> })
>
> Unfortunately, I have not been able to find the root cause of this so far.
> Hence, I would appreciate it if anyone could help me fix this issue.
>
>


Spark Streaming - java.lang.ClassNotFoundException Scala anonymous function

2017-03-01 Thread Dominik Safaric
I've been trying to submit a Spark Streaming application using spark-submit to
a cluster of mine consisting of a master and two worker nodes. The application
has been written in Scala and built using Maven. Importantly, the Maven build
is configured to produce a fat JAR containing all dependencies. Furthermore,
the JAR has been distributed to all of the nodes. The streaming job has been
submitted using the following command:

bin/spark-submit --class topology.SimpleProcessingTopology --jars 
/tmp/spark_streaming-1.0-SNAPSHOT.jar --master spark://10.0.0.8:7077 --verbose 
/tmp/spark_streaming-1.0-SNAPSHOT.jar /tmp/streaming-benchmark.properties 
where 10.0.0.8 is the IP address of the master node within the VNET. 

However, I keep getting the following exception while starting the streaming 
application:

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)

Caused by: java.lang.ClassNotFoundException: 
topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
I've checked the content of the JAR using jar tvf and as you can see in the 
output below, it does contain the class in question.

  1735 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1.class
   702 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology.class
  2415 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.class
  2500 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1.class
  7045 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$.class
This exception is caused by the anonymous function in the foreachPartition call:

rdd.foreachPartition(partition => {
  val outTopic = props.getString("application.simple.kafka.out.topic")
  val producer = new KafkaProducer[Array[Byte], Array[Byte]](kafkaParams)
  partition.foreach(record => {
    val producerRecord = new ProducerRecord[Array[Byte], Array[Byte]](outTopic, record.key(), record.value())
    producer.send(producerRecord)
  })
  producer.close()
})
Unfortunately, I have not been able to find the root cause of this so far.
Hence, I would appreciate it if anyone could help me fix this issue.



Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-26 Thread Marco Mistroni
Hi Raymond,
Run this command and it should work, provided you also have Kafka set up on
localhost at port 2181:

spark-submit --packages
org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1  kafka_wordcount.py
localhost:2181 test

But if you are a beginner, I suggest using the Spark examples' wordcount
instead, as I believe it reads from a local directory rather than requiring
you to set up Kafka, which is additional overhead you don't really need.
If you want to go ahead with Kafka, the two links below can give you a start:

https://dzone.com/articles/running-apache-kafka-on-windows-os (I believe a
similar setup can be used on Linux)
https://spark.apache.org/docs/latest/streaming-kafka-integration.html

kr




On Sat, Feb 25, 2017 at 11:12 PM, Marco Mistroni <mmistr...@gmail.com>
wrote:

> Hi, I'll have a look at the GitHub project tomorrow and let you know. You have a
> Python script to run and dependencies to specify; please check the Spark docs in
> the meantime. I do all my coding in Scala and specify dependencies using
> --packages.
> Kr
>
> On 25 Feb 2017 11:06 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>
>> Thank you very much Marco,
>>
>> I am a beginner in this area, is it possible for you to show me what you
>> think the right script should be to get it executed in terminal?
>>
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>> On Sat, Feb 25, 2017 at 6:00 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> Try to use --packages to include the jars. From the error it seems it's
>>> looking for a main class in the jars, but you are running a Python script...
>>>
>>> On 25 Feb 2017 10:36 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>>>
>>> That's right Anahita, however, the class name is not indicated in the
>>> original github project so I don't know what class should be used here. The
>>> github only says:
>>> and then run the example
>>> `$ bin/spark-submit --jars \
>>> external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar
>>> \
>>> examples/src/main/python/streaming/kafka_wordcount.py \
>>> localhost:2181 test`
>>> """ Can anyone give any thought on how to find out? Thank you very much
>>> in advance.
>>>
>>>
>>> **
>>> *Sincerely yours,*
>>>
>>>
>>> *Raymond*
>>>
>>> On Sat, Feb 25, 2017 at 5:27 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
>>>> You're welcome.
>>>> You need to specify the class. I meant like that:
>>>>
>>>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/l
>>>> ib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>> --class "give the name of the class"
>>>>
>>>>
>>>>
>>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, it is still not working:
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> By the way, here is the original source:
>>>>>
>>>>> https://github.com/apache/spark/blob/master/examples/src/mai
>>>>> n/python/streaming/kafka_wordcount.py
>>>>>
>>>>>
>>>>> **
>>>>> *Sincerely yours,*
>>>>>
>>>>>
>>>>> *Raymond*
>>>>>
>>>>> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <
>>>>> anahita.t.am...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think if you remove --jars, it will work. Like:
>>>>>>
>>>>>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/l
>>>>>> ib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>
>>>>>>  I had the same problem before and solved it by removing --jars.
>>>>>>
>>>>>> Cheers,
>>>>>> Anahita
>>>>>>
>>>>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am doing a spark streaming on a hortonworks sandbox and am stuck
>>>>>>> here now, can anyone tell me what's wrong with the following code

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Marco Mistroni
Try to use --packages to include the jars. From the error it seems it's looking
for a main class in the jars, but you are running a Python script...

On 25 Feb 2017 10:36 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:

That's right Anahita, however, the class name is not indicated in the
original github project so I don't know what class should be used here. The
github only says:
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar
\
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test`
""" Can anyone give any thought on how to find out? Thank you very much in
advance.


**
*Sincerely yours,*


*Raymond*

On Sat, Feb 25, 2017 at 5:27 PM, Anahita Talebi <anahita.t.am...@gmail.com>
wrote:

> You're welcome.
> You need to specify the class. I meant like that:
>
> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
>
>
>
> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>
>> Thank you, it is still not working:
>>
>> [image: Inline image 1]
>>
>> By the way, here is the original source:
>>
>> https://github.com/apache/spark/blob/master/examples/src/mai
>> n/python/streaming/kafka_wordcount.py
>>
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <
>> anahita.t.am...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think if you remove --jars, it will work. Like:
>>>
>>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/l
>>> ib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
>>>  I had the same problem before and solved it by removing --jars.
>>>
>>> Cheers,
>>> Anahita
>>>
>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>> wrote:
>>>
>>>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>>>> now, can anyone tell me what's wrong with the following code and the
>>>> exception it causes and how do I fix it? Thank you very much in advance.
>>>>
>>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> No main class set in JAR; please specify one with --class
>>>>
>>>>
>>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>
>>>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>>>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0
>>>> -1245-hadoop2.7.3.2.5.0.0-1245.jar  /root/hdp/kafka_wordcount.py
>>>> 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>>>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>
>>>> **
>>>> *Sincerely yours,*
>>>>
>>>>
>>>> *Raymond*
>>>>
>>>
>>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
Thank you very much Marco,

I am a beginner in this area, is it possible for you to show me what you
think the right script should be to get it executed in terminal?


**
*Sincerely yours,*


*Raymond*

On Sat, Feb 25, 2017 at 6:00 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Try to use --packages to include the jars. From the error it seems it's
> looking for a main class in the jars, but you are running a Python script...
>
> On 25 Feb 2017 10:36 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>
> That's right Anahita, however, the class name is not indicated in the
> original github project so I don't know what class should be used here. The
> github only says:
> and then run the example
> `$ bin/spark-submit --jars \
> external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar
> \
> examples/src/main/python/streaming/kafka_wordcount.py \
> localhost:2181 test`
> """ Can anyone give any thought on how to find out? Thank you very much
> in advance.
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Sat, Feb 25, 2017 at 5:27 PM, Anahita Talebi <anahita.t.am...@gmail.com
> > wrote:
>
>> You're welcome.
>> You need to specify the class. I meant like that:
>>
>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
>> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
>>
>>
>>
>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>>
>>> Thank you, it is still not working:
>>>
>>> [image: Inline image 1]
>>>
>>> By the way, here is the original source:
>>>
>>> https://github.com/apache/spark/blob/master/examples/src/mai
>>> n/python/streaming/kafka_wordcount.py
>>>
>>>
>>> **
>>> *Sincerely yours,*
>>>
>>>
>>> *Raymond*
>>>
>>> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think if you remove --jars, it will work. Like:
>>>>
>>>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/l
>>>> ib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>
>>>>  I had the same problem before and solved it by removing --jars.
>>>>
>>>> Cheers,
>>>> Anahita
>>>>
>>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am doing a spark streaming on a hortonworks sandbox and am stuck
>>>>> here now, can anyone tell me what's wrong with the following code and the
>>>>> exception it causes and how do I fix it? Thank you very much in advance.
>>>>>
>>>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>>
>>>>> Error:
>>>>> No main class set in JAR; please specify one with --class
>>>>>
>>>>>
>>>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>>
>>>>> Error:
>>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>
>>>>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>>>>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>> /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0
>>>>> -1245-hadoop2.7.3.2.5.0.0-1245.jar  /root/hdp/kafka_wordcount.py
>>>>> 192.168.128.119:2181 test
>>>>>
>>>>> Error:
>>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>>>>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>>
>>>>> **
>>>>> *Sincerely yours,*
>>>>>
>>>>>
>>>>> *Raymond*
>>>>>
>>>>
>>>
>
>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
That's right Anahita, however, the class name is not indicated in the
original github project so I don't know what class should be used here. The
github only says:
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar
\
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test`
""" Can anyone give any thought on how to find out? Thank you very much in
advance.


**
*Sincerely yours,*


*Raymond*

On Sat, Feb 25, 2017 at 5:27 PM, Anahita Talebi <anahita.t.am...@gmail.com>
wrote:

> You're welcome.
> You need to specify the class. I meant like that:
>
> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"
>
>
>
> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>
>> Thank you, it is still not working:
>>
>> [image: Inline image 1]
>>
>> By the way, here is the original source:
>>
>> https://github.com/apache/spark/blob/master/examples/src/mai
>> n/python/streaming/kafka_wordcount.py
>>
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <
>> anahita.t.am...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I think if you remove --jars, it will work. Like:
>>>
>>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/l
>>> ib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
>>>  I had the same problem before and solved it by removing --jars.
>>>
>>> Cheers,
>>> Anahita
>>>
>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>> wrote:
>>>
>>>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>>>> now, can anyone tell me what's wrong with the following code and the
>>>> exception it causes and how do I fix it? Thank you very much in advance.
>>>>
>>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> No main class set in JAR; please specify one with --class
>>>>
>>>>
>>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>
>>>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>>>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>> /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0
>>>> -1245-hadoop2.7.3.2.5.0.0-1245.jar  /root/hdp/kafka_wordcount.py
>>>> 192.168.128.119:2181 test
>>>>
>>>> Error:
>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>>>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>
>>>> **
>>>> *Sincerely yours,*
>>>>
>>>>
>>>> *Raymond*
>>>>
>>>
>>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Anahita Talebi
You're welcome.
You need to specify the class. I meant something like this:

spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
0-1245-hadoop2.7.3.2.5.0.0-1245.jar --class "give the name of the class"



On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:

> Thank you, it is still not working:
>
> [image: Inline image 1]
>
> By the way, here is the original source:
>
> https://github.com/apache/spark/blob/master/examples/
> src/main/python/streaming/kafka_wordcount.py
>
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>
> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <anahita.t.am...@gmail.com> wrote:
>
>> Hi,
>>
>> I think if you remove --jars, it will work. Like:
>>
>> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
>> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>
>>  I had the same problem before and solved it by removing --jars.
>>
>> Cheers,
>> Anahita
>>
>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>>
>>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>>> now, can anyone tell me what's wrong with the following code and the
>>> exception it causes and how do I fix it? Thank you very much in advance.
>>>
>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> No main class set in JAR; please specify one with --class
>>>
>>>
>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>
>>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/li
>>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>
>>> Error:
>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>
>>> **
>>> *Sincerely yours,*
>>>
>>>
>>> *Raymond*
>>>
>>
>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
Thank you, it is still not working:

[image: Inline image 1]

By the way, here is the original source:

https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py


**
*Sincerely yours,*


*Raymond*

On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <anahita.t.am...@gmail.com>
wrote:

> Hi,
>
> I think if you remove --jars, it will work. Like:
>
> spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.
> 0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>
>  I had the same problem before and solved it by removing --jars.
>
> Cheers,
> Anahita
>
> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:
>
>> I am doing a spark streaming on a hortonworks sandbox and am stuck here
>> now, can anyone tell me what's wrong with the following code and the
>> exception it causes and how do I fix it? Thank you very much in advance.
>>
>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/li
>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>
>> Error:
>> No main class set in JAR; please specify one with --class
>>
>>
>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/li
>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>
>> Error:
>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/li
>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>
>> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/l
>> ibs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/li
>> b/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>
>> Error:
>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/li
>> bs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>
>> **
>> *Sincerely yours,*
>>
>>
>> *Raymond*
>>
>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Anahita Talebi
Hi,

I think if you remove --jars, it will work. Like:

spark-submit  /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.
0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar

 I had the same problem before and solved it by removing --jars.

Cheers,
Anahita

On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com> wrote:

> I am doing a spark streaming on a hortonworks sandbox and am stuck here
> now, can anyone tell me what's wrong with the following code and the
> exception it causes and how do I fix it? Thank you very much in advance.
>
> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> No main class set in JAR; please specify one with --class
>
>
> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>
> spark-submit --class  /usr/hdp/2.5.0.0-1245/kafka/
> libs/kafka-streams-0.10.0.2.5.0.0-1245.jar /usr/hdp/2.5.0.0-1245/spark/
> lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>
> Error:
> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/
> libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>
> **
> *Sincerely yours,*
>
>
> *Raymond*
>


Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread yohann jardin
You should read (again?) the Spark documentation about submitting an 
application: http://spark.apache.org/docs/latest/submitting-applications.html

Try with the Pi computation example available with Spark.
For example:

./bin/spark-submit --class org.apache.spark.examples.SparkPi 
examples/jars/spark-examples*.jar

After --class you specify the fully qualified name, inside your jar, of the main
class you want to run. You finish by specifying the jar that contains your main class.

Yohann Jardin

On 2/25/2017 at 9:50 PM, Raymond Xie wrote:
I am doing a spark streaming on a hortonworks sandbox and am stuck here now, 
can anyone tell me what's wrong with the following code and the exception it 
causes and how do I fix it? Thank you very much in advance.

spark-submit --jars 
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar 
/root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
No main class set in JAR; please specify one with --class


spark-submit --class 
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
  /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar 
/root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
java.lang.ClassNotFoundException: 
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar

spark-submit --class  
/usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar 
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
  /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
java.lang.ClassNotFoundException: 
/usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar


Sincerely yours,


Raymond



No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
I am doing Spark Streaming on a Hortonworks sandbox and am stuck here
now. Can anyone tell me what is wrong with the following commands, what causes the
exceptions, and how I can fix them? Thank you very much in advance.

spark-submit --jars
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
 /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
/root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
No main class set in JAR; please specify one with --class


spark-submit --class
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
 /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
/root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
java.lang.ClassNotFoundException:
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar

spark-submit --class
 /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
/usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
 /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test

Error:
java.lang.ClassNotFoundException:
/usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar

Sincerely yours,


Raymond


Re: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$ . Please Help!!!!!!!

2016-11-04 Thread shyla deshpande
I feel so good that Holden replied.

Yes, that was the problem. I was running from IntelliJ; I removed the
provided scope and it works great.

Thanks a lot.
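
For reference, "removing the provided scope" for IDE runs just means dropping the scope element on the Spark artifacts in the pom quoted below. A sketch for one of them (the same applies to spark-core_2.11); the scope is normally restored to provided before packaging for spark-submit so Spark's own jars are not bundled into the application:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.0.1</version>
  <!-- no provided scope here, so the jar ends up on the IDE run classpath -->
</dependency>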

On Fri, Nov 4, 2016 at 2:05 PM, Holden Karau  wrote:

> It seems like you've marked the spark jars as provided, in this case they
> would only be provided you run your application with spark-submit or
> otherwise have Spark's JARs on your class path. How are you launching your
> application?
>
> On Fri, Nov 4, 2016 at 2:00 PM, shyla deshpande 
> wrote:
>
>> object App {
>>
>>
>>  import org.apache.spark.sql.functions._
>> import org.apache.spark.sql.SparkSession
>>
>>   def main(args : Array[String]) {
>> println( "Hello World!" )
>>   val sparkSession = SparkSession.builder.
>>   master("local")
>>   .appName("spark session example")
>>   .getOrCreate()
>>   }
>>
>> }
>>
>>
>> 
>>   1.8
>>   1.8
>>   UTF-8
>>   2.11.8
>>   2.11
>> 
>>
>> 
>>   
>> org.scala-lang
>> scala-library
>> ${scala.version}
>>   
>>
>>   
>>   org.apache.spark
>>   spark-core_2.11
>>   2.0.1
>>   provided
>>   
>>   
>>   org.apache.spark
>>   spark-sql_2.11
>>   2.0.1
>>   provided
>>   
>>
>>   
>> org.specs2
>> specs2-core_${scala.compat.version}
>> 2.4.16
>> test
>>   
>> 
>>
>> 
>>   src/main/scala
>> 
>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$ . Please Help!!!!!!!

2016-11-04 Thread Holden Karau
It seems like you've marked the Spark jars as provided; in that case they
would only be provided if you run your application with spark-submit or
otherwise have Spark's JARs on your classpath. How are you launching your
application?

On Fri, Nov 4, 2016 at 2:00 PM, shyla deshpande 
wrote:

> object App {
>
>
>  import org.apache.spark.sql.functions._
> import org.apache.spark.sql.SparkSession
>
>   def main(args : Array[String]) {
> println( "Hello World!" )
>   val sparkSession = SparkSession.builder.
>   master("local")
>   .appName("spark session example")
>   .getOrCreate()
>   }
>
> }
>
>
> 
>   1.8
>   1.8
>   UTF-8
>   2.11.8
>   2.11
> 
>
> 
>   
> org.scala-lang
> scala-library
> ${scala.version}
>   
>
>   
>   org.apache.spark
>   spark-core_2.11
>   2.0.1
>   provided
>   
>   
>   org.apache.spark
>   spark-sql_2.11
>   2.0.1
>   provided
>   
>
>   
> org.specs2
> specs2-core_${scala.compat.version}
> 2.4.16
> test
>   
> 
>
> 
>   src/main/scala
> 
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$ . Please Help!!!!!!!

2016-11-04 Thread shyla deshpande
object App {


 import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

  def main(args : Array[String]) {
println( "Hello World!" )
  val sparkSession = SparkSession.builder.
  master("local")
  .appName("spark session example")
  .getOrCreate()
  }

}



<properties>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <scala.version>2.11.8</scala.version>
  <scala.compat.version>2.11</scala.compat.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.specs2</groupId>
    <artifactId>specs2-core_${scala.compat.version}</artifactId>
    <version>2.4.16</version>
    <scope>test</scope>
  </dependency>
</dependencies>

<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
</build>
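
If the provided scope is kept, the application is normally launched through spark-submit rather than run directly from the IDE, so that Spark's own distribution supplies the SparkSession classes at runtime. A minimal sketch, assuming the class above is packaged into a jar named my-app.jar (name hypothetical):

./bin/spark-submit --class App --master local my-app.jar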


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Use Spark XML version 0.3.3:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.3.3</version>
</dependency>

On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:

> Hi Siva
>
> This is what i have for jars. Did you manage to run with these or
> different versions ?
>
>
> 
> org.apache.spark
> spark-core_2.10
> 1.6.1
> 
> 
> org.apache.spark
> spark-sql_2.10
> 1.6.1
> 
> 
> com.databricks
> spark-xml_2.10
> 0.2.0
> 
> 
> org.scala-lang
> scala-library
> 2.10.6
> 
>
> Thanks
> VG
>
>
> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> Hi Marco,
>>
>> I did run in IDE(Intellij) as well. It works fine.
>> VG, make sure the right jar is in classpath.
>>
>> --Siva
>>
>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> and  your eclipse path is correct?
>>> i suggest, as Siva did before, to build your jar and run it via
>>> spark-submit  by specifying the --packages option
>>> it's as simple as run this command
>>>
>>> spark-submit   --packages
>>> com.databricks:spark-xml_:   --class >> your class containing main> 
>>>
>>> Indeed, if you have only these lines to run, why dont you try them in
>>> spark-shell ?
>>>
>>> hth
>>>
>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> nopes. eclipse.
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com>
>>>> wrote:
>>>>
>>>>> If you are running from IDE, Are you using Intellij?
>>>>>
>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Can you try to package as a jar and run using spark-submit
>>>>>>
>>>>>> Siva
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>>
>>>>>>> I am trying to run from IDE and everything else is working fine.
>>>>>>> I added spark-xml jar and now I ended up into this dependency
>>>>>>>
>>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>> Caused by:* java.lang.ClassNotFoundException:
>>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>> ... 5 more
>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown
>>>>>>> hook
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> So you are using spark-submit  or spark-shell?
>>>>>>>>
>>>>>>>> you will need to launch either by passing --packages option (like
>>>>>>>> in the example below for spark-csv). you will need to iknow
>>>>>>>>
>>>>>>>> --packages com.databricks:spark-xml_:>>>>>>> version>
>>>>>>>>
>>>>>>>> hth
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
It proceeded with the jars I mentioned.
However, no data is getting loaded into the data frame...

sob sob :(
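
If the load now goes through but the DataFrame stays empty, a quick sanity check (sketch) is to print the inferred schema and row count, and to confirm that the rowTag option ("row" in the earlier snippet) matches the repeating element name actually used inside A.xml:

df.printSchema();
System.out.println(df.count());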

On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:

> Hi Siva
>
> This is what i have for jars. Did you manage to run with these or
> different versions ?
>
>
> 
> org.apache.spark
> spark-core_2.10
> 1.6.1
> 
> 
> org.apache.spark
> spark-sql_2.10
> 1.6.1
> 
> 
> com.databricks
> spark-xml_2.10
> 0.2.0
> 
> 
> org.scala-lang
> scala-library
> 2.10.6
> 
>
> Thanks
> VG
>
>
> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> Hi Marco,
>>
>> I did run in IDE(Intellij) as well. It works fine.
>> VG, make sure the right jar is in classpath.
>>
>> --Siva
>>
>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> and  your eclipse path is correct?
>>> i suggest, as Siva did before, to build your jar and run it via
>>> spark-submit  by specifying the --packages option
>>> it's as simple as run this command
>>>
>>> spark-submit   --packages
>>> com.databricks:spark-xml_:   --class >> your class containing main> 
>>>
>>> Indeed, if you have only these lines to run, why dont you try them in
>>> spark-shell ?
>>>
>>> hth
>>>
>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> nopes. eclipse.
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com>
>>>> wrote:
>>>>
>>>>> If you are running from IDE, Are you using Intellij?
>>>>>
>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Can you try to package as a jar and run using spark-submit
>>>>>>
>>>>>> Siva
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>>
>>>>>>> I am trying to run from IDE and everything else is working fine.
>>>>>>> I added spark-xml jar and now I ended up into this dependency
>>>>>>>
>>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>> Caused by:* java.lang.ClassNotFoundException:
>>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>> ... 5 more
>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown
>>>>>>> hook
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> So you are using spark-submit  or spark-shell?
>>>>>>>>
>>>>>>>> you will need to launch either by passing --packages option (like
>>>>>>>> in the example below for spark-csv). you will need to iknow
>>>>>>>>
>>>>>>>> --packages com.databricks:spark-xml_:>>>>>>> version>
>>>>>>>>
>>>>>>>> hth
>>>>>>>>
>>>>>>>>

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Hi Siva

This is what I have for jars. Did you manage to run with these or different
versions?



<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-xml_2.10</artifactId>
  <version>0.2.0</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.6</version>
</dependency>

Thanks
VG


On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com> wrote:

> Hi Marco,
>
> I did run in IDE(Intellij) as well. It works fine.
> VG, make sure the right jar is in classpath.
>
> --Siva
>
> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> and  your eclipse path is correct?
>> i suggest, as Siva did before, to build your jar and run it via
>> spark-submit  by specifying the --packages option
>> it's as simple as run this command
>>
>> spark-submit   --packages
>> com.databricks:spark-xml_:   --class > your class containing main> 
>>
>> Indeed, if you have only these lines to run, why dont you try them in
>> spark-shell ?
>>
>> hth
>>
>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>
>>> nopes. eclipse.
>>>
>>>
>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com>
>>> wrote:
>>>
>>>> If you are running from IDE, Are you using Intellij?
>>>>
>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com>
>>>> wrote:
>>>>
>>>>> Can you try to package as a jar and run using spark-submit
>>>>>
>>>>> Siva
>>>>>
>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>
>>>>>> I am trying to run from IDE and everything else is working fine.
>>>>>> I added spark-xml jar and now I ended up into this dependency
>>>>>>
>>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>> Caused by:* java.lang.ClassNotFoundException:
>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>> ... 5 more
>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown
>>>>>> hook
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> So you are using spark-submit  or spark-shell?
>>>>>>>
>>>>>>> you will need to launch either by passing --packages option (like in
>>>>>>> the example below for spark-csv). you will need to iknow
>>>>>>>
>>>>>>> --packages com.databricks:spark-xml_:
>>>>>>>
>>>>>>> hth
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Apologies for that.
>>>>>>>> I am trying to use spark-xml to load data of a xml file.
>>>>>>>>
>>>>>>>> here is the exception
>>>>>>>>
>>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed
>>>>>>>> to find data source: org.apache.spark.xml. Ple

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Hi Marco,

I did run it in the IDE (IntelliJ) as well. It works fine.
VG, make sure the right jar is on the classpath.

--Siva

On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> and  your eclipse path is correct?
> i suggest, as Siva did before, to build your jar and run it via
> spark-submit  by specifying the --packages option
> it's as simple as run this command
>
> spark-submit   --packages
> com.databricks:spark-xml_:   --class  your class containing main> 
>
> Indeed, if you have only these lines to run, why dont you try them in
> spark-shell ?
>
> hth
>
> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>
>> nopes. eclipse.
>>
>>
>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
>>
>>> If you are running from IDE, Are you using Intellij?
>>>
>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com>
>>> wrote:
>>>
>>>> Can you try to package as a jar and run using spark-submit
>>>>
>>>> Siva
>>>>
>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>
>>>>> I am trying to run from IDE and everything else is working fine.
>>>>> I added spark-xml jar and now I ended up into this dependency
>>>>>
>>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>> scala/collection/GenTraversableOnce$class*
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>> Caused by:* java.lang.ClassNotFoundException:
>>>>> scala.collection.GenTraversableOnce$class*
>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>> ... 5 more
>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> So you are using spark-submit  or spark-shell?
>>>>>>
>>>>>> you will need to launch either by passing --packages option (like in
>>>>>> the example below for spark-csv). you will need to iknow
>>>>>>
>>>>>> --packages com.databricks:spark-xml_:
>>>>>>
>>>>>> hth
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>>
>>>>>>> Apologies for that.
>>>>>>> I am trying to use spark-xml to load data of a xml file.
>>>>>>>
>>>>>>> here is the exception
>>>>>>>
>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed
>>>>>>> to find data source: org.apache.spark.xml. Please find packages at
>>>>>>> http://spark-packages.org
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>>> at
>>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>> at
>>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>> Caused by: java.lang.ClassNot

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
And is your Eclipse path correct?
I suggest, as Siva did before, building your jar and running it via
spark-submit, specifying the --packages option.
It's as simple as running this command:

spark-submit --packages com.databricks:spark-xml_<scala version>:<package version> --class <your class containing main> <your jar>

Indeed, if you have only these lines to run, why don't you try them in
spark-shell?

hth
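
Filled in with the coordinates that appear elsewhere in this thread (Scala 2.10, spark-xml 0.3.3 as Siva suggests) and the class name from the stack trace, the command would look roughly like this; the application jar name is a placeholder:

spark-submit --packages com.databricks:spark-xml_2.10:0.3.3 --class org.ariba.spark.PostsProcessing my-app.jar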

On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:

> nopes. eclipse.
>
>
> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> If you are running from IDE, Are you using Intellij?
>>
>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
>>
>>> Can you try to package as a jar and run using spark-submit
>>>
>>> Siva
>>>
>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>
>>>> I am trying to run from IDE and everything else is working fine.
>>>> I added spark-xml jar and now I ended up into this dependency
>>>>
>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>> scala/collection/GenTraversableOnce$class*
>>>> at
>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>> Caused by:* java.lang.ClassNotFoundException:
>>>> scala.collection.GenTraversableOnce$class*
>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> ... 5 more
>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> So you are using spark-submit  or spark-shell?
>>>>>
>>>>> you will need to launch either by passing --packages option (like in
>>>>> the example below for spark-csv). you will need to iknow
>>>>>
>>>>> --packages com.databricks:spark-xml_:
>>>>>
>>>>> hth
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>
>>>>>> Apologies for that.
>>>>>> I am trying to use spark-xml to load data of a xml file.
>>>>>>
>>>>>> here is the exception
>>>>>>
>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed
>>>>>> to find data source: org.apache.spark.xml. Please find packages at
>>>>>> http://spark-packages.org
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>> org.apache.spark.xml.DefaultSource
>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Try to import the class and see if you get a compilation error:

import com.databricks.spark.xml

Siva

On Fri, Jun 17, 2016 at 4:02 PM, VG <vlin...@gmail.com> wrote:

> nopes. eclipse.
>
>
> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> If you are running from IDE, Are you using Intellij?
>>
>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
>>
>>> Can you try to package as a jar and run using spark-submit
>>>
>>> Siva
>>>
>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>
>>>> I am trying to run from IDE and everything else is working fine.
>>>> I added spark-xml jar and now I ended up into this dependency
>>>>
>>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>> scala/collection/GenTraversableOnce$class*
>>>> at
>>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>> Caused by:* java.lang.ClassNotFoundException:
>>>> scala.collection.GenTraversableOnce$class*
>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> ... 5 more
>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> So you are using spark-submit  or spark-shell?
>>>>>
>>>>> you will need to launch either by passing --packages option (like in
>>>>> the example below for spark-csv). you will need to iknow
>>>>>
>>>>> --packages com.databricks:spark-xml_:
>>>>>
>>>>> hth
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>
>>>>>> Apologies for that.
>>>>>> I am trying to use spark-xml to load data of a xml file.
>>>>>>
>>>>>> here is the exception
>>>>>>
>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed
>>>>>> to find data source: org.apache.spark.xml. Please find packages at
>>>>>> http://spark-packages.org
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>> at
>>>>>> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>> org.apache.spark.xml.DefaultSource
>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>> at scala.util.Try$.apply

Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
nopes. eclipse.


On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:

> If you are running from IDE, Are you using Intellij?
>
> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> Can you try to package as a jar and run using spark-submit
>>
>> Siva
>>
>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>
>>> I am trying to run from IDE and everything else is working fine.
>>> I added spark-xml jar and now I ended up into this dependency
>>>
>>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>> scala/collection/GenTraversableOnce$class*
>>> at
>>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>> Caused by:* java.lang.ClassNotFoundException:
>>> scala.collection.GenTraversableOnce$class*
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> ... 5 more
>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>
>>>
>>>
>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>>> wrote:
>>>
>>>> So you are using spark-submit  or spark-shell?
>>>>
>>>> you will need to launch either by passing --packages option (like in
>>>> the example below for spark-csv). you will need to iknow
>>>>
>>>> --packages com.databricks:spark-xml_:
>>>>
>>>> hth
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>
>>>>> Apologies for that.
>>>>> I am trying to use spark-xml to load data of a xml file.
>>>>>
>>>>> here is the exception
>>>>>
>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>>>>> find data source: org.apache.spark.xml. Please find packages at
>>>>> http://spark-packages.org
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.apache.spark.xml.DefaultSource
>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>> at scala.util.Try$.apply(Try.scala:192)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>> at scala.util.Try.orElse(Try.scala:84)
>>>>> at
>>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>>> ... 4 more
>>>>>
>>>>> Code
>>>>> SQLContext sqlContext = new SQLContext(sc);
>>>>> DataFrame df = sqlContext.read()
>>>>> .format("org.apache.spark.xml")
>>>>> .option("rowTag", "row")
>>>>> .load("A.xml");
>>>>>
>>>>> Any suggestions please ..
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> too little info
>>>>>> it'll help if you can post the exception and show your sbt file (if
>>>>>> you are using sbt), and provide minimal details on what you are doing
>>>>>> kr
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>>>>
>>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>>
>>>>>>> Any suggestions to resolve this
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If you are running from an IDE, are you using IntelliJ?

On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:

> Can you try to package as a jar and run using spark-submit
>
> Siva
>
> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>
>> I am trying to run from IDE and everything else is working fine.
>> I added spark-xml jar and now I ended up into this dependency
>>
>> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>> scala/collection/GenTraversableOnce$class*
>> at
>> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>> Caused by:* java.lang.ClassNotFoundException:
>> scala.collection.GenTraversableOnce$class*
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 5 more
>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>
>>
>>
>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> So you are using spark-submit  or spark-shell?
>>>
>>> you will need to launch either by passing --packages option (like in the
>>> example below for spark-csv). you will need to iknow
>>>
>>> --packages com.databricks:spark-xml_:
>>>
>>> hth
>>>
>>>
>>>
>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> Apologies for that.
>>>> I am trying to use spark-xml to load data of a xml file.
>>>>
>>>> here is the exception
>>>>
>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>>>> find data source: org.apache.spark.xml. Please find packages at
>>>> http://spark-packages.org
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.apache.spark.xml.DefaultSource
>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>> at scala.util.Try$.apply(Try.scala:192)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>> at scala.util.Try.orElse(Try.scala:84)
>>>> at
>>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>> ... 4 more
>>>>
>>>> Code
>>>> SQLContext sqlContext = new SQLContext(sc);
>>>> DataFrame df = sqlContext.read()
>>>> .format("org.apache.spark.xml")
>>>> .option("rowTag", "row")
>>>> .load("A.xml");
>>>>
>>>> Any suggestions please ..
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>>>> wrote:
>>>>
>>>>> too little info
>>>>> it'll help if you can post the exception and show your sbt file (if
>>>>> you are using sbt), and provide minimal details on what you are doing
>>>>> kr
>>>>>
>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>>>
>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>
>>>>>> Any suggestions to resolve this
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Can you try to package it as a jar and run it using spark-submit?

Siva

On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:

> I am trying to run from IDE and everything else is working fine.
> I added spark-xml jar and now I ended up into this dependency
>
> 6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
> Exception in thread "main" *java.lang.NoClassDefFoundError:
> scala/collection/GenTraversableOnce$class*
> at
> org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
> Caused by:* java.lang.ClassNotFoundException:
> scala.collection.GenTraversableOnce$class*
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 5 more
> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>
>
>
> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> So you are using spark-submit  or spark-shell?
>>
>> you will need to launch either by passing --packages option (like in the
>> example below for spark-csv). you will need to iknow
>>
>> --packages com.databricks:spark-xml_:
>>
>> hth
>>
>>
>>
>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>
>>> Apologies for that.
>>> I am trying to use spark-xml to load data of a xml file.
>>>
>>> here is the exception
>>>
>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>>> find data source: org.apache.spark.xml. Please find packages at
>>> http://spark-packages.org
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.spark.xml.DefaultSource
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>> at scala.util.Try$.apply(Try.scala:192)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>> at scala.util.Try.orElse(Try.scala:84)
>>> at
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>> ... 4 more
>>>
>>> Code
>>> SQLContext sqlContext = new SQLContext(sc);
>>> DataFrame df = sqlContext.read()
>>> .format("org.apache.spark.xml")
>>> .option("rowTag", "row")
>>> .load("A.xml");
>>>
>>> Any suggestions please ..
>>>
>>>
>>>
>>>
>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>>> wrote:
>>>
>>>> too little info
>>>> it'll help if you can post the exception and show your sbt file (if you
>>>> are using sbt), and provide minimal details on what you are doing
>>>> kr
>>>>
>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>>
>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>
>>>>> Any suggestions to resolve this
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
I am trying to run from the IDE and everything else is working fine.
I added the spark-xml jar and now I have ended up with this dependency problem:

6/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" *java.lang.NoClassDefFoundError:
scala/collection/GenTraversableOnce$class*
at
org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.(ddl.scala:150)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
Caused by:* java.lang.ClassNotFoundException:
scala.collection.GenTraversableOnce$class*
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook



On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> So you are using spark-submit  or spark-shell?
>
> you will need to launch either by passing --packages option (like in the
> example below for spark-csv). you will need to iknow
>
> --packages com.databricks:spark-xml_:
>
> hth
>
>
>
> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>
>> Apologies for that.
>> I am trying to use spark-xml to load data of a xml file.
>>
>> here is the exception
>>
>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>> find data source: org.apache.spark.xml. Please find packages at
>> http://spark-packages.org
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.xml.DefaultSource
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try$.apply(Try.scala:192)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try.orElse(Try.scala:84)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>> ... 4 more
>>
>> Code
>> SQLContext sqlContext = new SQLContext(sc);
>> DataFrame df = sqlContext.read()
>> .format("org.apache.spark.xml")
>> .option("rowTag", "row")
>> .load("A.xml");
>>
>> Any suggestions please ..
>>
>>
>>
>>
>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> too little info
>>> it'll help if you can post the exception and show your sbt file (if you
>>> are using sbt), and provide minimal details on what you are doing
>>> kr
>>>
>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> Failed to find data source: com.databricks.spark.xml
>>>>
>>>> Any suggestions to resolve this
>>>>
>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Hi Siva,

I still get a similar exception (see the highlighted section - it is now
looking for xml.DefaultSource).
16/06/17 15:11:37 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find
data source: xml. Please find packages at http://spark-packages.org
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
*Caused by: java.lang.ClassNotFoundException: xml.DefaultSource*
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at scala.util.Try$.apply(Try.scala:192)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at scala.util.Try.orElse(Try.scala:84)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
... 4 more
16/06/17 15:11:38 INFO SparkContext: Invoking stop() from shutdown hook



On Fri, Jun 17, 2016 at 2:56 PM, Siva A <siva9940261...@gmail.com> wrote:

> Just try to use "xml" as format like below,
>
> SQLContext sqlContext = new SQLContext(sc);
> DataFrame df = sqlContext.read()
> .format("xml")
> .option("rowTag", "row")
> .load("A.xml");
>
> FYR: https://github.com/databricks/spark-xml
>
> --Siva
>
> On Fri, Jun 17, 2016 at 2:50 PM, VG <vlin...@gmail.com> wrote:
>
>> Apologies for that.
>> I am trying to use spark-xml to load data of a xml file.
>>
>> here is the exception
>>
>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>> find data source: org.apache.spark.xml. Please find packages at
>> http://spark-packages.org
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.xml.DefaultSource
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try$.apply(Try.scala:192)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try.orElse(Try.scala:84)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>> ... 4 more
>>
>> Code
>> SQLContext sqlContext = new SQLContext(sc);
>> DataFrame df = sqlContext.read()
>> .format("org.apache.spark.xml")
>> .option("rowTag", "row")
>> .load("A.xml");
>>
>> Any suggestions please ..
>>
>>
>>
>>
>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> too little info
>>> it'll help if you can post the exception and show your sbt file (if you
>>> are using sbt), and provide minimal details on what you are doing
>>> kr
>>>
>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> Failed to find data source: com.databricks.spark.xml
>>>>
>>>> Any suggestions to resolve this
>>>>
>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
So you are using spark-submit or spark-shell?

You will need to launch either by passing the --packages option (like in the
example below for spark-csv). You will need to know the exact coordinates:

--packages com.databricks:spark-xml_<scala version>:<package version>

hth



On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:

> Apologies for that.
> I am trying to use spark-xml to load data of a xml file.
>
> here is the exception
>
> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: org.apache.spark.xml. Please find packages at
> http://spark-packages.org
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.spark.xml.DefaultSource
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
> at scala.util.Try$.apply(Try.scala:192)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
> at scala.util.Try.orElse(Try.scala:84)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
> ... 4 more
>
> Code
> SQLContext sqlContext = new SQLContext(sc);
> DataFrame df = sqlContext.read()
> .format("org.apache.spark.xml")
> .option("rowTag", "row")
> .load("A.xml");
>
> Any suggestions please ..
>
>
>
>
> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> too little info
>> it'll help if you can post the exception and show your sbt file (if you
>> are using sbt), and provide minimal details on what you are doing
>> kr
>>
>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>
>>> Failed to find data source: com.databricks.spark.xml
>>>
>>> Any suggestions to resolve this
>>>
>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
If it's not working, add the packages while executing spark-submit/spark-shell, like below:

$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3

$SPARK_HOME/bin/spark-submit --packages com.databricks:spark-xml_2.10:0.3.3



On Fri, Jun 17, 2016 at 2:56 PM, Siva A <siva9940261...@gmail.com> wrote:

> Just try to use "xml" as format like below,
>
> SQLContext sqlContext = new SQLContext(sc);
> DataFrame df = sqlContext.read()
> .format("xml")
> .option("rowTag", "row")
> .load("A.xml");
>
> FYR: https://github.com/databricks/spark-xml
>
> --Siva
>
> On Fri, Jun 17, 2016 at 2:50 PM, VG <vlin...@gmail.com> wrote:
>
>> Apologies for that.
>> I am trying to use spark-xml to load data of a xml file.
>>
>> here is the exception
>>
>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>> find data source: org.apache.spark.xml. Please find packages at
>> http://spark-packages.org
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.xml.DefaultSource
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try$.apply(Try.scala:192)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>> at scala.util.Try.orElse(Try.scala:84)
>> at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>> ... 4 more
>>
>> Code
>> SQLContext sqlContext = new SQLContext(sc);
>> DataFrame df = sqlContext.read()
>> .format("org.apache.spark.xml")
>> .option("rowTag", "row")
>> .load("A.xml");
>>
>> Any suggestions please ..
>>
>>
>>
>>
>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> too little info
>>> it'll help if you can post the exception and show your sbt file (if you
>>> are using sbt), and provide minimal details on what you are doing
>>> kr
>>>
>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>
>>>> Failed to find data source: com.databricks.spark.xml
>>>>
>>>> Any suggestions to resolve this
>>>>
>>>>
>>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Siva A
Just try using "xml" as the format, like below:

SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
.format("xml")
.option("rowTag", "row")
.load("A.xml");

FYR: https://github.com/databricks/spark-xml

--Siva

On Fri, Jun 17, 2016 at 2:50 PM, VG <vlin...@gmail.com> wrote:

> Apologies for that.
> I am trying to use spark-xml to load data of a xml file.
>
> here is the exception
>
> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: org.apache.spark.xml. Please find packages at
> http://spark-packages.org
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.spark.xml.DefaultSource
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
> at scala.util.Try$.apply(Try.scala:192)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
> at scala.util.Try.orElse(Try.scala:84)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
> ... 4 more
>
> Code
> SQLContext sqlContext = new SQLContext(sc);
> DataFrame df = sqlContext.read()
> .format("org.apache.spark.xml")
> .option("rowTag", "row")
> .load("A.xml");
>
> Any suggestions please ..
>
>
>
>
> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> too little info
>> it'll help if you can post the exception and show your sbt file (if you
>> are using sbt), and provide minimal details on what you are doing
>> kr
>>
>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>
>>> Failed to find data source: com.databricks.spark.xml
>>>
>>> Any suggestions to resolve this
>>>
>>>
>>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Apologies for that.
I am trying to use spark-xml to load data from an XML file.

Here is the exception:

16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find
data source: org.apache.spark.xml. Please find packages at
http://spark-packages.org
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
Caused by: java.lang.ClassNotFoundException:
org.apache.spark.xml.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
at scala.util.Try$.apply(Try.scala:192)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
at scala.util.Try.orElse(Try.scala:84)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
... 4 more

Code
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
.format("org.apache.spark.xml")
.option("rowTag", "row")
.load("A.xml");

Any suggestions, please?




On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> too little info
> it'll help if you can post the exception and show your sbt file (if you
> are using sbt), and provide minimal details on what you are doing
> kr
>
> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>
>> Failed to find data source: com.databricks.spark.xml
>>
>> Any suggestions to resolve this
>>
>>
>>
>


Re: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread Marco Mistroni
Too little info. It will help if you can post the exception, show your sbt
file (if you are using sbt), and provide minimal details on what you are doing.
kr

On Fri, Jun 17, 2016 at 10:08 AM, VG  wrote:

> Failed to find data source: com.databricks.spark.xml
>
> Any suggestions to resolve this
>
>
>


java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at http://spark-packages.org

2016-06-17 Thread VG
Failed to find data source: com.databricks.spark.xml

Any suggestions to resolve this?


Re: ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-16 Thread Jacek Laskowski
Hi,

Why is spark-core marked as provided while the others are not? How do you
assemble the app? How do you submit it for execution? What's the
deployment environment?

More info...more info...

Jacek
On 15 Jun 2016 10:26 p.m., "S Sarkar"  wrote:

Hello,

I built package for a spark application with the following sbt file:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"  % "1.4.0" % "provided",
  "org.apache.spark"  %% "spark-mllib" % "1.4.0",
  "org.apache.spark"  %% "spark-sql"   % "1.4.0",
  "org.apache.spark"  %% "spark-sql"   % "1.4.0"
  )
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am getting a TaskResultGetter error with a ClassNotFoundException for
scala.Some.

Can I please get some help on how to fix it?

Thanks,
S. Sarkar



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskResultGetter-Exception-while-getting-task-result-java-io-IOException-java-lang-ClassNotFoue-tp27178.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


Re: [scala-user] ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-16 Thread Oliver Ruebenacker
 Hello,

  It would be useful to see the code that throws the exception. It probably
means that the Scala standard library is not being uploaded to the
executors. Try adding the Scala standard library to the SBT file
("org.scala-lang" % "scala-library" % "2.10.3"), or check your
configuration. Also, did you launch using spark-submit?
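A minimal sketch of what I mean, keeping the versions from your build file (and dropping the duplicate spark-sql entry) - whether this is the actual root cause is only a guess on my side:

libraryDependencies ++= Seq(
  "org.scala-lang"    %  "scala-library" % "2.10.3",
  "org.apache.spark"  %% "spark-core"    % "1.4.0" % "provided",
  "org.apache.spark"  %% "spark-mllib"   % "1.4.0",
  "org.apache.spark"  %% "spark-sql"     % "1.4.0"
)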

 Best, Oliver

On Wed, Jun 15, 2016 at 4:16 PM,  wrote:

> Hello,
>
> I am building package for spark application with the following sbt file:
>
> name := "Simple Project"
>
> version := "1.0"
>
> scalaVersion := "2.10.3"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark"  %% "spark-core"  % "1.4.0" % "provided",
>   "org.apache.spark"  %% "spark-mllib" % "1.4.0",
>   "org.apache.spark"  %% "spark-sql"   % "1.4.0",
>   "org.apache.spark"  %% "spark-sql"   % "1.4.0"
>   )
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/;
>
> I am getting TaskResultGetter error with ClassNotFoundException for
> scala.Some .
>
> Can I please get some help how to fix it?
>
> Thanks,
> S. Sarkar
>
> --
> You received this message because you are subscribed to the Google Groups
> "scala-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scala-user+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute



ERROR TaskResultGetter: Exception while getting task result java.io.IOException: java.lang.ClassNotFoundException: scala.Some

2016-06-15 Thread S Sarkar
Hello,

I built package for a spark application with the following sbt file:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"  % "1.4.0" % "provided",
  "org.apache.spark"  %% "spark-mllib" % "1.4.0",
  "org.apache.spark"  %% "spark-sql"   % "1.4.0",
  "org.apache.spark"  %% "spark-sql"   % "1.4.0"
  )
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am getting a TaskResultGetter error with a ClassNotFoundException for
scala.Some.

Can I please get some help on how to fix it?

Thanks,
S. Sarkar



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-TaskResultGetter-Exception-while-getting-task-result-java-io-IOException-java-lang-ClassNotFoue-tp27178.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Analyzing json Data streams using sparkSQL in spark streaming returns java.lang.ClassNotFoundException

2016-03-08 Thread Tristan Nixon
This is a bit strange, because you're trying to create an RDD (the jsonElements) inside of a
foreach function. That code executes on the workers, and so will actually produce a
different instance in each JVM on each worker, not one single RDD referenced by the
driver, which is what I think you're trying to get.

Why don’t you try something like:

JavaDStream jsonElements = lines.flatMap( … )

and just skip the lines.foreach?
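Roughly something like this (only a sketch, assuming Spark 1.x with Java 8 lambdas; JavaSQLContextSingleton and executeSQLOperations are the helpers from your own code):

import java.util.Arrays;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.streaming.api.java.JavaDStream;

// split and filter on the DStream itself...
JavaDStream<String> jsonElements = lines
    .flatMap(line -> Arrays.asList(line.split("\n")))
    .filter(line -> line.length() > 0);

// ...then build the DataFrame once per micro-batch
jsonElements.foreachRDD(rdd -> {
    SQLContext sqlContext = JavaSQLContextSingleton.getInstance(rdd.context());
    DataFrame dfJsonElement = sqlContext.read().json(rdd);
    executeSQLOperations(sqlContext, dfJsonElement);
});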

> On Mar 8, 2016, at 11:59 AM, Nesrine BEN MUSTAPHA 
> <nesrine.benmusta...@gmail.com> wrote:
> 
> Hello,
> 
> I tried to use sparkSQL to analyse json data streams within a standalone 
> application. 
> 
> here the code snippet that receive the streaming data: 
> final JavaReceiverInputDStream lines = 
> streamCtx.socketTextStream("localhost", Integer.parseInt(args[0]), 
> StorageLevel.MEMORY_AND_DISK_SER_2());
> 
> lines.foreachRDD((rdd) -> {
> 
> final JavaRDD jsonElements = rdd.flatMap(new FlatMapFunction<String, 
> String>() {
> 
> @Override
> 
> public Iterable call(final String line)
> 
> throws Exception {
> 
> return Arrays.asList(line.split("\n"));
> 
> }
> 
> }).filter(new Function<String, Boolean>() {
> 
> @Override
> 
> public Boolean call(final String v1)
> 
> throws Exception {
> 
> return v1.length() > 0;
> 
> }
> 
> });
> 
> //System.out.println("Data Received = " + jsonElements.collect().size());
> 
> final SQLContext sqlContext = 
> JavaSQLContextSingleton.getInstance(rdd.context());
> 
> final DataFrame dfJsonElement = sqlContext.read().json(jsonElements); 
> 
> executeSQLOperations(sqlContext, dfJsonElement);
> 
> });
> 
> streamCtx.start();
> 
> streamCtx.awaitTermination();
> 
> }
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I got the following error when the red line is executed:
> 
> java.lang.ClassNotFoundException: 
> com.intrinsec.common.spark.SQLStreamingJsonAnalyzer$2
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 
> 
> 
> 
> 
> 



Analyzing json Data streams using sparkSQL in spark streaming returns java.lang.ClassNotFoundException

2016-03-08 Thread Nesrine BEN MUSTAPHA
Hello,

I tried to use Spark SQL to analyse JSON data streams within a standalone
application.

Here is the code snippet that receives the streaming data:

final JavaReceiverInputDStream<String> lines =
    streamCtx.socketTextStream("localhost", Integer.parseInt(args[0]),
        StorageLevel.MEMORY_AND_DISK_SER_2());

lines.foreachRDD((rdd) -> {

    final JavaRDD<String> jsonElements = rdd.flatMap(new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(final String line) throws Exception {
            return Arrays.asList(line.split("\n"));
        }
    }).filter(new Function<String, Boolean>() {
        @Override
        public Boolean call(final String v1) throws Exception {
            return v1.length() > 0;
        }
    });

    //System.out.println("Data Received = " + jsonElements.collect().size());

    final SQLContext sqlContext = JavaSQLContextSingleton.getInstance(rdd.context());

    final DataFrame dfJsonElement = sqlContext.read().json(jsonElements);

    executeSQLOperations(sqlContext, dfJsonElement);

});

streamCtx.start();

streamCtx.awaitTermination();

}


I got the following error when the red line is executed:

java.lang.ClassNotFoundException:
com.intrinsec.common.spark.SQLStreamingJsonAnalyzer$2
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)


starting start-master.sh throws "java.lang.ClassNotFoundException: org.slf4j.Logger" error

2015-11-26 Thread Mich Talebzadeh
Hi,

 

I just built Spark without the Hive jars and am trying to run

 

start-master.sh

 

I get this error in the log. It looks like it cannot find org.slf4j.Logger
(java.lang.ClassNotFoundException):

 

Spark Command: /usr/java/latest/bin/java -cp
/usr/lib/spark/sbin/../conf/:/usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2
.6.0.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m
org.apache.spark.deploy.master.Master --ip rhes564 --port 7077 --webui-port
8080



Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger

at java.lang.Class.getDeclaredMethods0(Native Method)

at java.lang.Class.privateGetDeclaredMethods(Class.java:2521)

at java.lang.Class.getMethod0(Class.java:2764)

at java.lang.Class.getMethod(Class.java:1653)

at
sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)

at
sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)

Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

... 6 more

 

Although I have added it to the CLASSPATH.
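For reference, a quick way to check whether the class is actually inside the assembly jar the launcher is using (illustrative only; the jar path is the one from the command above):

jar tf /usr/lib/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar | grep org/slf4j/Logger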

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

 
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15",
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN:
978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume
one out shortly

 

 <http://talebzadehmich.wordpress.com/> http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 



Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread Tathagata Das
gt;>> > http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>>>>> > 4.0.0
>>>>> > SparkFirstTry
>>>>> > SparkFirstTry
>>>>> > 0.0.1-SNAPSHOT
>>>>> >
>>>>> > 
>>>>> > 
>>>>> > org.apache.spark
>>>>> > spark-core_2.10
>>>>> > 1.5.1
>>>>> > provided
>>>>> > 
>>>>> >
>>>>> > 
>>>>> > org.apache.spark
>>>>> > spark-streaming_2.10
>>>>> > 1.5.1
>>>>> > provided
>>>>> > 
>>>>> >
>>>>> > 
>>>>> > org.twitter4j
>>>>> > twitter4j-stream
>>>>> > 3.0.3
>>>>> > 
>>>>> > 
>>>>> > org.apache.spark
>>>>> > spark-streaming-twitter_2.10
>>>>> > 1.0.0
>>>>> > 
>>>>> >
>>>>> > 
>>>>> >
>>>>> > 
>>>>> > src
>>>>> > 
>>>>> > 
>>>>> > maven-compiler-plugin
>>>>> > 3.3
>>>>> > 
>>>>> > 1.8
>>>>> > 1.8
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> > maven-assembly-plugin
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> >
>>>>> > com.test.sparkTest.SimpleApp
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> >
>>>>>  jar-with-dependencies
>>>>> > 
>>>>> >     
>>>>> > 
>>>>> >
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> >
>>>>> >
>>>>> > The application starts successfully but no tweets comes and this
>>>>> exception
>>>>> > is thrown
>>>>> >
>>>>> > 15/11/08 15:55:46 WARN TaskSetManager: Lost task 0.0 in stage 4.0
>>>>> (TID 78,
>>>>> > 192.168.122.39): java.io.IOException:
>>>>> java.lang.ClassNotFoundException:
>>>>> > org.apache.spark.streaming.twitter.TwitterReceiver
>>>>> > at
>>>>> org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>>>>> > at
>>>>> >
>>>>> org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
>>>>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> > at
>>>>> >
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>> > at
>>>>> >
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>> > at java.lang.reflect.Method.invoke(Method.java:497)
>>>>> > at
>>>>> >
>>>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
>>>>> > at
>>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
>>>>> > at
>>>>> >
>>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>> > at
>>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>>>>> > at
>>>>> >
>>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>>>>> > at
>>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>> > at
>>>>> >
>>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>> > at
>>>>> java.io.Objec

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread DW @ Gmail
>>>>>> >
>>>>>> > }
>>>>>> >
>>>>>> >
>>>>>> > here is the pom file
>>>>>> >
>>>>>> > http://maven.apache.org/POM/4.0.0;
>>>>>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>>>>>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>>>>> > http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>>>>>> > 4.0.0
>>>>>> > SparkFirstTry
>>>>>> > SparkFirstTry
>>>>>> > 0.0.1-SNAPSHOT
>>>>>> >
>>>>>> > 
>>>>>> > 
>>>>>> > org.apache.spark
>>>>>> > spark-core_2.10
>>>>>> > 1.5.1
>>>>>> > provided
>>>>>> > 
>>>>>> >
>>>>>> > 
>>>>>> > org.apache.spark
>>>>>> > spark-streaming_2.10
>>>>>> > 1.5.1
>>>>>> > provided
>>>>>> > 
>>>>>> >
>>>>>> > 
>>>>>> > org.twitter4j
>>>>>> > twitter4j-stream
>>>>>> > 3.0.3
>>>>>> > 
>>>>>> > 
>>>>>> > org.apache.spark
>>>>>> > spark-streaming-twitter_2.10
>>>>>> > 1.0.0
>>>>>> > 
>>>>>> >
>>>>>> > 
>>>>>> >
>>>>>> > 
>>>>>> > src
>>>>>> > 
>>>>>> > 
>>>>>> > maven-compiler-plugin
>>>>>> > 3.3
>>>>>> > 
>>>>>> > 1.8
>>>>>> > 1.8
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > maven-assembly-plugin
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >
>>>>>> > com.test.sparkTest.SimpleApp
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > jar-with-dependencies
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >
>>>>>> >
>>>>>> > The application starts successfully but no tweets comes and this 
>>>>>> > exception
>>>>>> > is thrown
>>>>>> >
>>>>>> > 15/11/08 15:55:46 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 
>>>>>> > 78,
>>>>>> > 192.168.122.39): java.io.IOException: java.lang.ClassNotFoundException:
>>>>>> > org.apache.spark.streaming.twitter.TwitterReceiver
>>>>>> > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
>>>>>> > at
>>>>>> > org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
>>>>>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> > at
>>>>>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> > at
>>>>>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> > at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>> > at
>>>>>> > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
>>>>>> > at 
>>>>>> > java.io.ObjectInputStream.readSerialData

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread أنس الليثي
If I package the application and submit it, it works fine, but I need to
run it from Eclipse.

Is there any problem with running the application from Eclipse?



On 9 November 2015 at 12:27, Tathagata Das <t...@databricks.com> wrote:

> How are you submitting the spark application?
> You are supposed to submit the fat-jar of the application that include the
> spark-streaming-twitter dependency (and its subdeps) but not
> spark-streaming and spark-core.
>
> On Mon, Nov 9, 2015 at 1:02 AM, أنس الليثي <dev.fano...@gmail.com> wrote:
>
>> I tried to remove maven and adding the dependencies manually using build
>> path > configure build path > add external jars, then adding the jars
>> manually but it did not work.
>>
>> I tried to create another project and copied the code from the first app
>> but the problem still the same.
>>
>> I event tried to change eclipse with another version, but the same
>> problem exist.
>>
>> :( :( :( :(
>>
>> On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:
>>
>>> I tried both, the same exception still thrown
>>>
>>> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> You included a very old version of the Twitter jar - 1.0.0. Did you
>>>> mean 1.5.1?
>>>>
>>>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>>>> > This is my first Spark Stream application. The setup is as following
>>>> >
>>>> > 3 nodes running a spark cluster. One master node and two slaves.
>>>> >
>>>> > The application is a simple java application streaming from twitter
>>>> and
>>>> > dependencies managed by maven.
>>>> >
>>>> > Here is the code of the application
>>>> >
>>>> > public class SimpleApp {
>>>> >
>>>> > public static void main(String[] args) {
>>>> >
>>>> > SparkConf conf = new SparkConf().setAppName("Simple
>>>> > Application").setMaster("spark://rethink-node01:7077");
>>>> >
>>>> > JavaStreamingContext sc = new JavaStreamingContext(conf, new
>>>> > Duration(1000));
>>>> >
>>>> > ConfigurationBuilder cb = new ConfigurationBuilder();
>>>> >
>>>> > cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>>>> > .setOAuthConsumerSecret("ConsumerSecret")
>>>> > .setOAuthAccessToken("AccessToken")
>>>> > .setOAuthAccessTokenSecret("TokenSecret");
>>>> >
>>>> > OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>>>> >
>>>> > JavaDStream tweets = TwitterUtils.createStream(sc,
>>>> auth);
>>>> >
>>>> >  JavaDStream statuses = tweets.map(new
>>>> Function<Status,
>>>> > String>() {
>>>> >  public String call(Status status) throws Exception {
>>>> > return status.getText();
>>>> > }
>>>> > });
>>>> >
>>>> >  statuses.print();;
>>>> >
>>>> >  sc.start();
>>>> >
>>>> >  sc.awaitTermination();
>>>> >
>>>> > }
>>>> >
>>>> > }
>>>> >
>>>> >
>>>> > here is the pom file
>>>> >
>>>> > http://maven.apache.org/POM/4.0.0;
>>>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>>>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>>> > http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>>>> > 4.0.0
>>>> > SparkFirstTry
>>>> > SparkFirstTry
>>>> > 0.0.1-SNAPSHOT
>>>> >
>>>> > 
>>>> > 
>>>> > org.apache.spark
>>>> > spark-core_2.10
>>>> > 1.5.1
>>>> > provided
>>>> > 
>>>> >
>>>> > 
>>>> > org.apache.spark
>>>> >     spark-streaming_2.10
>>>> >  

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread Tathagata Das
How are you submitting the Spark application?
You are supposed to submit the fat jar of the application that includes the
spark-streaming-twitter dependency (and its transitive dependencies) but not
spark-streaming and spark-core.
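For example, something along these lines (illustrative only: the class and master URL are the ones from your code, and the jar name just assumes the default maven-assembly naming):

$SPARK_HOME/bin/spark-submit \
  --class com.test.sparkTest.SimpleApp \
  --master spark://rethink-node01:7077 \
  target/SparkFirstTry-0.0.1-SNAPSHOT-jar-with-dependencies.jar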

On Mon, Nov 9, 2015 at 1:02 AM, أنس الليثي <dev.fano...@gmail.com> wrote:

> I tried to remove maven and adding the dependencies manually using build
> path > configure build path > add external jars, then adding the jars
> manually but it did not work.
>
> I tried to create another project and copied the code from the first app
> but the problem still the same.
>
> I event tried to change eclipse with another version, but the same problem
> exist.
>
> :( :( :( :(
>
> On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:
>
>> I tried both, the same exception still thrown
>>
>> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>>
>>> You included a very old version of the Twitter jar - 1.0.0. Did you mean
>>> 1.5.1?
>>>
>>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>>> > This is my first Spark Stream application. The setup is as following
>>> >
>>> > 3 nodes running a spark cluster. One master node and two slaves.
>>> >
>>> > The application is a simple java application streaming from twitter and
>>> > dependencies managed by maven.
>>> >
>>> > Here is the code of the application
>>> >
>>> > public class SimpleApp {
>>> >
>>> > public static void main(String[] args) {
>>> >
>>> > SparkConf conf = new SparkConf().setAppName("Simple
>>> > Application").setMaster("spark://rethink-node01:7077");
>>> >
>>> > JavaStreamingContext sc = new JavaStreamingContext(conf, new
>>> > Duration(1000));
>>> >
>>> > ConfigurationBuilder cb = new ConfigurationBuilder();
>>> >
>>> > cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>>> > .setOAuthConsumerSecret("ConsumerSecret")
>>> > .setOAuthAccessToken("AccessToken")
>>> > .setOAuthAccessTokenSecret("TokenSecret");
>>> >
>>> > OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>>> >
>>> > JavaDStream tweets = TwitterUtils.createStream(sc,
>>> auth);
>>> >
>>> >  JavaDStream statuses = tweets.map(new Function<Status,
>>> > String>() {
>>> >  public String call(Status status) throws Exception {
>>> > return status.getText();
>>> > }
>>> > });
>>> >
>>> >  statuses.print();;
>>> >
>>> >  sc.start();
>>> >
>>> >  sc.awaitTermination();
>>> >
>>> > }
>>> >
>>> > }
>>> >
>>> >
>>> > here is the pom file
>>> >
>>> > http://maven.apache.org/POM/4.0.0;
>>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>>> > http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>>> > 4.0.0
>>> > SparkFirstTry
>>> > SparkFirstTry
>>> > 0.0.1-SNAPSHOT
>>> >
>>> > 
>>> > 
>>> > org.apache.spark
>>> > spark-core_2.10
>>> > 1.5.1
>>> > provided
>>> > 
>>> >
>>> > 
>>> > org.apache.spark
>>> > spark-streaming_2.10
>>> > 1.5.1
>>> > provided
>>> > 
>>> >
>>> > 
>>> > org.twitter4j
>>> > twitter4j-stream
>>> > 3.0.3
>>> > 
>>> > 
>>> > org.apache.spark
>>> > spark-streaming-twitter_2.10
>>> > 1.0.0
>>> > 
>>> >
>>> > 
>>> >
>>> > 
>>> > src
>>> > 
>>> >   

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-09 Thread أنس الليثي
I tried removing Maven and adding the dependencies manually using Build
Path > Configure Build Path > Add External JARs, but it did not work.

I tried creating another project and copying the code from the first app,
but the problem is still the same.

I even tried switching Eclipse to another version, but the same problem
persists.

:( :( :( :(

On 9 November 2015 at 10:47, أنس الليثي <dev.fano...@gmail.com> wrote:

> I tried both, the same exception still thrown
>
> On 9 November 2015 at 10:45, Sean Owen <so...@cloudera.com> wrote:
>
>> You included a very old version of the Twitter jar - 1.0.0. Did you mean
>> 1.5.1?
>>
>> On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
>> > This is my first Spark Stream application. The setup is as following
>> >
>> > 3 nodes running a spark cluster. One master node and two slaves.
>> >
>> > The application is a simple java application streaming from twitter and
>> > dependencies managed by maven.
>> >
>> > Here is the code of the application
>> >
>> > public class SimpleApp {
>> >
>> > public static void main(String[] args) {
>> >
>> > SparkConf conf = new SparkConf().setAppName("Simple
>> > Application").setMaster("spark://rethink-node01:7077");
>> >
>> > JavaStreamingContext sc = new JavaStreamingContext(conf, new
>> > Duration(1000));
>> >
>> > ConfigurationBuilder cb = new ConfigurationBuilder();
>> >
>> > cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
>> > .setOAuthConsumerSecret("ConsumerSecret")
>> > .setOAuthAccessToken("AccessToken")
>> > .setOAuthAccessTokenSecret("TokenSecret");
>> >
>> > OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>> >
>> > JavaDStream tweets = TwitterUtils.createStream(sc,
>> auth);
>> >
>> >  JavaDStream statuses = tweets.map(new Function<Status,
>> > String>() {
>> >  public String call(Status status) throws Exception {
>> > return status.getText();
>> > }
>> > });
>> >
>> >  statuses.print();;
>> >
>> >  sc.start();
>> >
>> >  sc.awaitTermination();
>> >
>> > }
>> >
>> > }
>> >
>> >
>> > here is the pom file
>> >
>> > http://maven.apache.org/POM/4.0.0;
>> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
>> > xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>> > http://maven.apache.org/xsd/maven-4.0.0.xsd;>
>> > 4.0.0
>> > SparkFirstTry
>> > SparkFirstTry
>> > 0.0.1-SNAPSHOT
>> >
>> > 
>> > 
>> > org.apache.spark
>> > spark-core_2.10
>> > 1.5.1
>> > provided
>> > 
>> >
>> > 
>> > org.apache.spark
>> > spark-streaming_2.10
>> > 1.5.1
>> > provided
>> > 
>> >
>> > 
>> > org.twitter4j
>> > twitter4j-stream
>> > 3.0.3
>> > 
>> > 
>> > org.apache.spark
>> > spark-streaming-twitter_2.10
>> > 1.0.0
>> >     
>> >
>> >     
>> >
>> > 
>> > src
>> > 
>> > 
>> > maven-compiler-plugin
>> > 3.3
>> > 
>> > 1.8
>> > 1.8
>> > 
>> > 
>> > 
>> > maven-assembly-plugin
>> > 
>> > 
>> > 
>> >
>> > com.test.sparkTest.SimpleApp
>> > 
>> > 
>> > 
>> >
>>  jar-with-dependencies
>> > 
>> > 
>> > 
>> >
>> > 
>> >

java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-08 Thread fanooos
This is my first Spark Streaming application. The setup is as follows:

Three nodes running a Spark cluster: one master node and two slaves.

The application is a simple Java application that streams from Twitter, with
dependencies managed by Maven.

Here is the code of the application

public class SimpleApp {

public static void main(String[] args) {

SparkConf conf = new SparkConf().setAppName("Simple
Application").setMaster("spark://rethink-node01:7077");

JavaStreamingContext sc = new JavaStreamingContext(conf, new
Duration(1000));

ConfigurationBuilder cb = new ConfigurationBuilder();

cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
.setOAuthConsumerSecret("ConsumerSecret")
.setOAuthAccessToken("AccessToken")
.setOAuthAccessTokenSecret("TokenSecret");

OAuthAuthorization auth = new OAuthAuthorization(cb.build());

JavaDStream<Status> tweets = TwitterUtils.createStream(sc, auth);

 JavaDStream<String> statuses = tweets.map(new Function<Status, String>() {
     public String call(Status status) throws Exception {
         return status.getText();
     }
 });

 statuses.print();

 sc.start();

 sc.awaitTermination();

}

}


here is the pom file

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>SparkFirstTry</groupId>
    <artifactId>SparkFirstTry</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.5.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.5.1</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.twitter4j</groupId>
            <artifactId>twitter4j-stream</artifactId>
            <version>3.0.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-twitter_2.10</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.test.sparkTest.SimpleApp</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>


The application starts successfully, but no tweets come and this exception
is thrown:

15/11/08 15:55:46 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 78,
192.168.122.39): java.io.IOException: java.lang.ClassNotFoundException:
org.apache.spark.streaming.twitter.TwitterReceiver
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
at
org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException:
org.apache.spark.streaming.twitter.TwitterReceiver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at
org.apache.spark.se

Re: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver

2015-11-08 Thread Sean Owen
You included a very old version of the Twitter jar - 1.0.0. Did you mean 1.5.1?
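i.e. something like this in the pom (just a sketch of what I mean - keep the streaming-twitter artifact at the same version as the rest of Spark):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-twitter_2.10</artifactId>
    <version>1.5.1</version>
</dependency>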

On Mon, Nov 9, 2015 at 7:36 AM, fanooos <dev.fano...@gmail.com> wrote:
> This is my first Spark Stream application. The setup is as following
>
> 3 nodes running a spark cluster. One master node and two slaves.
>
> The application is a simple java application streaming from twitter and
> dependencies managed by maven.
>
> Here is the code of the application
>
> public class SimpleApp {
>
> public static void main(String[] args) {
>
> SparkConf conf = new SparkConf().setAppName("Simple
> Application").setMaster("spark://rethink-node01:7077");
>
> JavaStreamingContext sc = new JavaStreamingContext(conf, new
> Duration(1000));
>
> ConfigurationBuilder cb = new ConfigurationBuilder();
>
> cb.setDebugEnabled(true).setOAuthConsumerKey("ConsumerKey")
> .setOAuthConsumerSecret("ConsumerSecret")
> .setOAuthAccessToken("AccessToken")
> .setOAuthAccessTokenSecret("TokenSecret");
>
> OAuthAuthorization auth = new OAuthAuthorization(cb.build());
>
> JavaDStream tweets = TwitterUtils.createStream(sc, auth);
>
>  JavaDStream statuses = tweets.map(new Function<Status,
> String>() {
>  public String call(Status status) throws Exception {
> return status.getText();
> }
> });
>
>  statuses.print();;
>
>  sc.start();
>
>  sc.awaitTermination();
>
> }
>
> }
>
>
> here is the pom file
>
> http://maven.apache.org/POM/4.0.0;
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
> http://maven.apache.org/xsd/maven-4.0.0.xsd;>
> 4.0.0
> SparkFirstTry
> SparkFirstTry
> 0.0.1-SNAPSHOT
>
> 
> 
> org.apache.spark
> spark-core_2.10
> 1.5.1
> provided
> 
>
> 
> org.apache.spark
> spark-streaming_2.10
> 1.5.1
> provided
> 
>
> 
> org.twitter4j
> twitter4j-stream
> 3.0.3
> 
> 
> org.apache.spark
> spark-streaming-twitter_2.10
> 1.0.0
> 
>
> 
>
> 
> src
> 
> 
> maven-compiler-plugin
> 3.3
> 
> 1.8
> 1.8
> 
> 
> 
> maven-assembly-plugin
> 
> 
> 
>
> com.test.sparkTest.SimpleApp
> 
>     
> 
> jar-with-dependencies
> 
> 
> 
>
> 
> 
> 
>
>
> The application starts successfully but no tweets comes and this exception
> is thrown
>
> 15/11/08 15:55:46 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 78,
> 192.168.122.39): java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.spark.streaming.twitter.TwitterReceiver
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1163)
> at
> org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> at
> org.apache.sp

java.lang.ClassNotFoundException

2015-08-08 Thread Yasemin Kaya
Hi,

I have a little Spark program and I am getting an error that I don't
understand.
My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3.
I am using Spark 1.3.
Submitting: bin/spark-submit --class MonthlyAverage --master local[4]
weather.jar


error:

~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage
--master local[4] weather.jar
java.lang.ClassNotFoundException: MonthlyAverage
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties


Please help me ASAP.

yasemin
-- 
hiç ender hiç


Re: java.lang.ClassNotFoundException

2015-08-08 Thread Ted Yu
Have you tried including the package name in the class name?
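For example (illustrative only - the package here is a guess, use whatever package MonthlyAverage is actually declared in):

bin/spark-submit --class com.example.weather.MonthlyAverage --master local[4] weather.jar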

Thanks



 On Aug 8, 2015, at 12:00 AM, Yasemin Kaya godo...@gmail.com wrote:
 
 Hi,
 
 I have a little spark program and i am getting an error why i dont 
 understand. 
 My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3.
 I am using spark 1.3 
 Submitting : bin/spark-submit --class MonthlyAverage --master local[4] 
 weather.jar
 
 
 error: 
 
 ~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage --master 
 local[4] weather.jar
 java.lang.ClassNotFoundException: MonthlyAverage
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:274)
   at 
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
   at 
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Using Spark's default log4j profile: 
 org/apache/spark/log4j-defaults.properties
 
 
 Please help me Asap..
 
 yasemin
 -- 
 hiç ender hiç


Re: java.lang.ClassNotFoundException

2015-08-08 Thread Yasemin Kaya
Thanks Ted, I solved it :)

2015-08-08 14:07 GMT+03:00 Ted Yu yuzhih...@gmail.com:

 Have you tried including package name in the class name ?

 Thanks



 On Aug 8, 2015, at 12:00 AM, Yasemin Kaya godo...@gmail.com wrote:

 Hi,

 I have a little spark program and i am getting an error why i dont
 understand.
 My code is https://gist.github.com/yaseminn/522a75b863ad78934bc3.
 I am using spark 1.3
 Submitting : bin/spark-submit --class MonthlyAverage --master local[4]
 weather.jar


 error:

 ~/spark-1.3.1-bin-hadoop2.4$ bin/spark-submit --class MonthlyAverage
 --master local[4] weather.jar
 java.lang.ClassNotFoundException: MonthlyAverage
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:274)
 at
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
 at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Using Spark's default log4j profile:
 org/apache/spark/log4j-defaults.properties


 Please help me Asap..

 yasemin
 --
 hiç ender hiç




-- 
hiç ender hiç


Re: Strange JavaDeserialization error - java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel

2015-03-27 Thread Ondrej Smola
It happens only when a StorageLevel with an extra replica is used
(StorageLevel.MEMORY_ONLY_2, StorageLevel.MEMORY_AND_DISK_2);
StorageLevel.MEMORY_ONLY and StorageLevel.MEMORY_AND_DISK work, so the
problem must clearly be somewhere between Mesos and Spark. From the console
I see that Spark is trying to replicate to nodes - the nodes show up in
Mesos active tasks ... but they always fail with the ClassNotFoundException.

2015-03-27 0:52 GMT+01:00 Tathagata Das t...@databricks.com:

 Could you try running a simpler spark streaming program with receiver (may
 be socketStream) and see if that works.

 TD

 On Thu, Mar 26, 2015 at 2:08 PM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi thanks for reply,

 yes I have custom receiver - but it has simple logic .. pop ids from
 redis queue - load docs based on ids from elastic and store them in spark.
 No classloader modifications. I am running multiple Spark batch jobs (with
 user supplied partitioning) and they have no problems, debug in local mode
 show no errors.

 2015-03-26 21:47 GMT+01:00 Tathagata Das t...@databricks.com:

 Here are few steps to debug.

 1. Try using replication from a Spark job: sc.parallelize(1 to 100,
 100).persist(StorageLevel.MEMORY_ONLY_2).count()
 2. If one works, then we know that there is probably nothing wrong with
 the Spark installation, and probably in the threads related to the
 receivers receiving the data. Are you writing a custom receiver? Are you
 somehow playing around with the class loader in the custom receiver?

 TD


 On Thu, Mar 26, 2015 at 10:59 AM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi,

 I am running spark streaming v 1.3.0 (running inside Docker) on Mesos
 0.21.1. Spark streaming is started using Marathon - docker container gets
 deployed and starts streaming (from custom Actor). Spark binary is located
 on shared GlusterFS volume. Data is streamed from Elasticsearch/Redis. When
 new batch arrives Spark tries to replicate it but fails with following
 error :

 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0 of size 2840
 dropped from memory (free 278017782)
 15/03/26 14:50:00 INFO BlockManager: Removing block broadcast_0_piece0
 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0_piece0 of size
 1658 dropped from memory (free 278019440)
 15/03/26 14:50:00 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while invoking
 RpcHandler#receive() on RPC id 7178767328921933569
 java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:344)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
 at
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
 at
 org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 at
 org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
 at
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130

Re: Strange JavaDeserialization error - java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel

2015-03-27 Thread Ondrej Smola
More info:

when using *spark.mesos.coarse* everything works as expected. I think this
must be a bug in the Spark-Mesos integration.
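For reference, the workaround is just the standard property at submit time (illustrative invocation; the rest of the command is unchanged):

bin/spark-submit --conf spark.mesos.coarse=true <usual arguments>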


2015-03-27 9:23 GMT+01:00 Ondrej Smola ondrej.sm...@gmail.com:

 It happens only when StorageLevel is used with 1 replica ( StorageLevel.
 MEMORY_ONLY_2,StorageLevel.MEMORY_AND_DISK_2) , StorageLevel.MEMORY_ONLY ,
 StorageLevel.MEMORY_AND_DISK works - the problems must be clearly
 somewhere between mesos-spark . From console I see that spark is trying to
 replicate to nodes - nodes show up in Mesos active tasks ... but they
 always fail with ClassNotFoundE.

 2015-03-27 0:52 GMT+01:00 Tathagata Das t...@databricks.com:

 Could you try running a simpler spark streaming program with receiver
 (may be socketStream) and see if that works.

 TD

 On Thu, Mar 26, 2015 at 2:08 PM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi thanks for reply,

 yes I have custom receiver - but it has simple logic .. pop ids from
 redis queue - load docs based on ids from elastic and store them in spark.
 No classloader modifications. I am running multiple Spark batch jobs (with
 user supplied partitioning) and they have no problems, debug in local mode
 show no errors.

 2015-03-26 21:47 GMT+01:00 Tathagata Das t...@databricks.com:

 Here are few steps to debug.

 1. Try using replication from a Spark job: sc.parallelize(1 to 100,
 100).persist(StorageLevel.MEMORY_ONLY_2).count()
 2. If one works, then we know that there is probably nothing wrong with
 the Spark installation, and probably in the threads related to the
 receivers receiving the data. Are you writing a custom receiver? Are you
 somehow playing around with the class loader in the custom receiver?

 TD


 On Thu, Mar 26, 2015 at 10:59 AM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi,

 I am running spark streaming v 1.3.0 (running inside Docker) on Mesos
 0.21.1. Spark streaming is started using Marathon - docker container gets
 deployed and starts streaming (from custom Actor). Spark binary is located
 on shared GlusterFS volume. Data is streamed from Elasticsearch/Redis. 
 When
 new batch arrives Spark tries to replicate it but fails with following
 error :

 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0 of size 2840
 dropped from memory (free 278017782)
 15/03/26 14:50:00 INFO BlockManager: Removing block broadcast_0_piece0
 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0_piece0 of size
 1658 dropped from memory (free 278019440)
 15/03/26 14:50:00 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while invoking
 RpcHandler#receive() on RPC id 7178767328921933569
 java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:344)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
 at
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
 at
 org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 at
 org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319

Re: Strange JavaDeserialization error - java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel

2015-03-27 Thread Ondrej Smola
Yes, only when using fine-grained mode and replication
(StorageLevel.MEMORY_ONLY_2, etc.).

2015-03-27 19:06 GMT+01:00 Tathagata Das t...@databricks.com:

 Does it fail with just Spark jobs (using storage levels) on non-coarse
 mode?

 TD

 On Fri, Mar 27, 2015 at 4:39 AM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 More info

 when using *spark.mesos.coarse* everything works as expected. I think
 this must be a bug in spark-mesos integration.


 2015-03-27 9:23 GMT+01:00 Ondrej Smola ondrej.sm...@gmail.com:

 It happens only when a StorageLevel with replication is used
 (StorageLevel.MEMORY_ONLY_2, StorageLevel.MEMORY_AND_DISK_2);
 StorageLevel.MEMORY_ONLY and StorageLevel.MEMORY_AND_DISK work, so the problem
 must be somewhere between Mesos and Spark. From the console I can see that
 Spark is trying to replicate to other nodes - the nodes show up as active
 tasks in Mesos, but they always fail with the ClassNotFoundException.

 2015-03-27 0:52 GMT+01:00 Tathagata Das t...@databricks.com:

 Could you try running a simpler spark streaming program with receiver
 (may be socketStream) and see if that works.

 TD

 On Thu, Mar 26, 2015 at 2:08 PM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi, thanks for the reply,

 yes, I have a custom receiver, but it has simple logic: pop ids from a Redis
 queue, load the docs for those ids from Elasticsearch, and store them in
 Spark. No classloader modifications. I am running multiple Spark batch jobs
 (with user-supplied partitioning) and they have no problems; debugging in
 local mode shows no errors.

 2015-03-26 21:47 GMT+01:00 Tathagata Das t...@databricks.com:

 Here are a few steps to debug.

 1. Try using replication from a plain Spark job: sc.parallelize(1 to 100,
 100).persist(StorageLevel.MEMORY_ONLY_2).count()
 2. If that works, then we know that there is probably nothing wrong with the
 Spark installation, and the problem is probably in the threads related to the
 receivers receiving the data. Are you writing a custom receiver? Are you
 somehow playing around with the class loader in the custom receiver?

 TD
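 Step 1 above can be run as a quick standalone check, for example (a sketch
 for spark-shell or a small driver; only an existing SparkContext sc and the
 StorageLevel import are assumed):

 import org.apache.spark.storage.StorageLevel

 // Persist with a replicated storage level; if this fails with the same
 // ClassNotFoundException, the problem is not specific to the receiver path.
 val rdd = sc.parallelize(1 to 100, 100).persist(StorageLevel.MEMORY_ONLY_2)
 println(rdd.count())   // forces the blocks to be computed and replicated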


 On Thu, Mar 26, 2015 at 10:59 AM, Ondrej Smola 
 ondrej.sm...@gmail.com wrote:

 Hi,

 I am running spark streaming v 1.3.0 (running inside Docker) on
 Mesos 0.21.1. Spark streaming is started using Marathon - docker 
 container
 gets deployed and starts streaming (from custom Actor). Spark binary is
 located on shared GlusterFS volume. Data is streamed from
 Elasticsearch/Redis. When new batch arrives Spark tries to replicate it 
 but
 fails with following error :

 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0 of size 2840
 dropped from memory (free 278017782)
 15/03/26 14:50:00 INFO BlockManager: Removing block
 broadcast_0_piece0
 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0_piece0 of size
 1658 dropped from memory (free 278019440)
 15/03/26 14:50:00 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while
 invoking RpcHandler#receive() on RPC id 7178767328921933569
 java.lang.ClassNotFoundException:
 org/apache/spark/storage/StorageLevel
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:344)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
 at
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
 at
 org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 at
 org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319

Re: Strange JavaDeserialization error - java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel

2015-03-27 Thread Tathagata Das
Does it fail with just Spark jobs (using storage levels) on non-coarse mode?

TD

On Fri, Mar 27, 2015 at 4:39 AM, Ondrej Smola ondrej.sm...@gmail.com
wrote:

 More info

 when using *spark.mesos.coarse* everything works as expected. I think
 this must be a bug in spark-mesos integration.


 2015-03-27 9:23 GMT+01:00 Ondrej Smola ondrej.sm...@gmail.com:

 It happens only when a StorageLevel with replication is used
 (StorageLevel.MEMORY_ONLY_2, StorageLevel.MEMORY_AND_DISK_2);
 StorageLevel.MEMORY_ONLY and StorageLevel.MEMORY_AND_DISK work, so the problem
 must be somewhere between Mesos and Spark. From the console I can see that
 Spark is trying to replicate to other nodes - the nodes show up as active
 tasks in Mesos, but they always fail with the ClassNotFoundException.

 2015-03-27 0:52 GMT+01:00 Tathagata Das t...@databricks.com:

 Could you try running a simpler spark streaming program with receiver
 (may be socketStream) and see if that works.

 TD

 On Thu, Mar 26, 2015 at 2:08 PM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi, thanks for the reply,

 yes, I have a custom receiver, but it has simple logic: pop ids from a Redis
 queue, load the docs for those ids from Elasticsearch, and store them in
 Spark. No classloader modifications. I am running multiple Spark batch jobs
 (with user-supplied partitioning) and they have no problems; debugging in
 local mode shows no errors.

 2015-03-26 21:47 GMT+01:00 Tathagata Das t...@databricks.com:

 Here are a few steps to debug.

 1. Try using replication from a plain Spark job: sc.parallelize(1 to 100,
 100).persist(StorageLevel.MEMORY_ONLY_2).count()
 2. If that works, then we know that there is probably nothing wrong with the
 Spark installation, and the problem is probably in the threads related to the
 receivers receiving the data. Are you writing a custom receiver? Are you
 somehow playing around with the class loader in the custom receiver?

 TD


 On Thu, Mar 26, 2015 at 10:59 AM, Ondrej Smola ondrej.sm...@gmail.com
  wrote:

 Hi,

 I am running spark streaming v 1.3.0 (running inside Docker) on Mesos
 0.21.1. Spark streaming is started using Marathon - docker container 
 gets
 deployed and starts streaming (from custom Actor). Spark binary is 
 located
 on shared GlusterFS volume. Data is streamed from Elasticsearch/Redis. 
 When
 new batch arrives Spark tries to replicate it but fails with following
 error :

 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0 of size 2840
 dropped from memory (free 278017782)
 15/03/26 14:50:00 INFO BlockManager: Removing block broadcast_0_piece0
 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0_piece0 of size
 1658 dropped from memory (free 278019440)
 15/03/26 14:50:00 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while invoking
 RpcHandler#receive() on RPC id 7178767328921933569
 java.lang.ClassNotFoundException:
 org/apache/spark/storage/StorageLevel
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:344)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
 at
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
 at
 org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 at
 org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead

Re: Strange JavaDeserialization error - java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel

2015-03-27 Thread Tathagata Das
Seems like a bug, could you file a JIRA?

@Tim: Patrick said you take a look at Mesos related issues. Could you take
a look at this. Thanks!

TD

On Fri, Mar 27, 2015 at 1:25 PM, Ondrej Smola ondrej.sm...@gmail.com
wrote:

 Yes, it fails only when using fine-grained mode together with replication
 (StorageLevel.MEMORY_ONLY_2 etc.).

 2015-03-27 19:06 GMT+01:00 Tathagata Das t...@databricks.com:

 Does it fail with just Spark jobs (using storage levels) on non-coarse
 mode?

 TD

 On Fri, Mar 27, 2015 at 4:39 AM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 More info

 when using *spark.mesos.coarse* everything works as expected. I think
 this must be a bug in spark-mesos integration.


 2015-03-27 9:23 GMT+01:00 Ondrej Smola ondrej.sm...@gmail.com:

 It happens only when a StorageLevel with replication is used
 (StorageLevel.MEMORY_ONLY_2, StorageLevel.MEMORY_AND_DISK_2);
 StorageLevel.MEMORY_ONLY and StorageLevel.MEMORY_AND_DISK work, so the problem
 must be somewhere between Mesos and Spark. From the console I can see that
 Spark is trying to replicate to other nodes - the nodes show up as active
 tasks in Mesos, but they always fail with the ClassNotFoundException.

 2015-03-27 0:52 GMT+01:00 Tathagata Das t...@databricks.com:

 Could you try running a simpler spark streaming program with receiver
 (may be socketStream) and see if that works.

 TD

 On Thu, Mar 26, 2015 at 2:08 PM, Ondrej Smola ondrej.sm...@gmail.com
 wrote:

 Hi, thanks for the reply,

 yes, I have a custom receiver, but it has simple logic: pop ids from a Redis
 queue, load the docs for those ids from Elasticsearch, and store them in
 Spark. No classloader modifications. I am running multiple Spark batch jobs
 (with user-supplied partitioning) and they have no problems; debugging in
 local mode shows no errors.

 2015-03-26 21:47 GMT+01:00 Tathagata Das t...@databricks.com:

 Here are a few steps to debug.

 1. Try using replication from a plain Spark job: sc.parallelize(1 to 100,
 100).persist(StorageLevel.MEMORY_ONLY_2).count()
 2. If that works, then we know that there is probably nothing wrong with the
 Spark installation, and the problem is probably in the threads related to the
 receivers receiving the data. Are you writing a custom receiver? Are you
 somehow playing around with the class loader in the custom receiver?

 TD


 On Thu, Mar 26, 2015 at 10:59 AM, Ondrej Smola 
 ondrej.sm...@gmail.com wrote:

 Hi,

 I am running spark streaming v 1.3.0 (running inside Docker) on
 Mesos 0.21.1. Spark streaming is started using Marathon - docker 
 container
 gets deployed and starts streaming (from custom Actor). Spark binary is
 located on shared GlusterFS volume. Data is streamed from
 Elasticsearch/Redis. When new batch arrives Spark tries to replicate 
 it but
 fails with following error :

 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0 of size 2840
 dropped from memory (free 278017782)
 15/03/26 14:50:00 INFO BlockManager: Removing block
 broadcast_0_piece0
 15/03/26 14:50:00 INFO MemoryStore: Block broadcast_0_piece0 of
 size 1658 dropped from memory (free 278019440)
 15/03/26 14:50:00 INFO BlockManagerMaster: Updated info of block
 broadcast_0_piece0
 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while
 invoking RpcHandler#receive() on RPC id 7178767328921933569
 java.lang.ClassNotFoundException:
 org/apache/spark/storage/StorageLevel
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:344)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
 at
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
 at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
 at
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
 at
 org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
 at
 org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
 at
 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at
 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103

RE: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-03 Thread Cheng, Hao
Which version / distribution are you using? Please reference this blog that
Felix C posted if you’re running on CDH.
http://eradiating.wordpress.com/2015/02/22/getting-hivecontext-to-work-in-cdh/

Or you may also need to download the datanucleus*.jar files and try adding the
“--jars” option while starting the spark shell.
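Once the datanucleus jars are visible (via $SPARK_HOME/lib or --jars), a quick
sanity check from spark-shell is to build a HiveContext directly. This is only
a sketch and assumes a working hive-site.xml (or the default local Derby
metastore) is on the classpath:

import org.apache.spark.sql.hive.HiveContext

// Building a HiveContext and running a simple statement exercises the
// metastore, and therefore the DataNucleus JDO classes; the
// ClassNotFoundException would reappear here if the jars are still missing.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").collect().foreach(println)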

From: Anusha Shamanur [mailto:anushas...@gmail.com]
Sent: Wednesday, March 4, 2015 5:07 AM
To: Cheng, Hao
Subject: Re: Spark SQL Thrift Server start exception : 
java.lang.ClassNotFoundException: 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory

Hi,

I am getting the same error. There is no lib folder in my $SPARK_HOME. But I 
included these jars while calling spark-shell.

Now, I get this:

Caused by: org.datanucleus.exceptions.ClassNotResolvedException: Class 
org.datanucleus.store.rdbms.RDBMSStoreManager was not found in the CLASSPATH. 
Please check your specification and your CLASSPATH.

   at 
org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:218)



How do I solve this?

On Mon, Mar 2, 2015 at 11:04 PM, Cheng, Hao 
hao.ch...@intel.com wrote:
Copy those jars into the $SPARK_HOME/lib/

datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar

see https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L120


-Original Message-
From: fanooos [mailto:dev.fano...@gmail.com]
Sent: Tuesday, March 3, 2015 2:50 PM
To: user@spark.apache.org
Subject: Spark SQL Thrift Server start exception : 
java.lang.ClassNotFoundException: 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory

I have installed a hadoop cluster (version : 2.6.0), apache spark (version :
1.2.1 preBuilt for hadoop 2.4 and later), and hive (version 1.0.0).

When I try to start the spark sql thrift server I am getting the following 
exception.

Exception in thread main java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
at scala.Option.orElse(Option.scala:257)
at
org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
at
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:292)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:248)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:91)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:90)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext.init(SQLContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.init(HiveContext.scala:72)
at
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:51)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 26 more

Re: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-03 Thread Anusha Shamanur
I downloaded different versions of the jars and it worked.

Thanks!

On Tue, Mar 3, 2015 at 4:45 PM, Cheng, Hao hao.ch...@intel.com wrote:

  Which version / distribution are you using? Please reference this blog
 that Felix C posted if you’re running on CDH.


 http://eradiating.wordpress.com/2015/02/22/getting-hivecontext-to-work-in-cdh/



 Or you may also need to download the datanucleus*.jar files and try adding
 the “--jars” option while starting the spark shell.



 *From:* Anusha Shamanur [mailto:anushas...@gmail.com]
 *Sent:* Wednesday, March 4, 2015 5:07 AM
 *To:* Cheng, Hao
 *Subject:* Re: Spark SQL Thrift Server start exception :
 java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory



 Hi,



 I am getting the same error. There is no lib folder in my $SPARK_HOME. But
 I included these jars while calling spark-shell.



 Now, I get this:

 Caused by: org.datanucleus.exceptions.ClassNotResolvedException: Class
 org.datanucleus.store.rdbms.RDBMSStoreManager was not found in the
 CLASSPATH. Please check your specification and your CLASSPATH.

at
 org.datanucleus.ClassLoaderResolverImpl.classForName(ClassLoaderResolverImpl.java:218)



 How do I solve this?



 On Mon, Mar 2, 2015 at 11:04 PM, Cheng, Hao hao.ch...@intel.com wrote:

 Copy those jars into the $SPARK_HOME/lib/

 datanucleus-api-jdo-3.2.6.jar
 datanucleus-core-3.2.10.jar
 datanucleus-rdbms-3.2.9.jar

 see
 https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L120



 -Original Message-
 From: fanooos [mailto:dev.fano...@gmail.com]
 Sent: Tuesday, March 3, 2015 2:50 PM
 To: user@spark.apache.org
 Subject: Spark SQL Thrift Server start exception :
 java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory

 I have installed a hadoop cluster (version : 2.6.0), apache spark (version
 :
 1.2.1 preBuilt for hadoop 2.4 and later), and hive (version 1.0.0).

 When I try to start the spark sql thrift server I am getting the following
 exception.

 Exception in thread main java.lang.RuntimeException:
 java.lang.RuntimeException: Unable to instantiate
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 at
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
 at

 org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
 at

 org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
 at scala.Option.orElse(Option.scala:257)
 at
 org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
 at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
 at

 org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
 at
 org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
 at
 org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:292)
 at
 org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
 at
 org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:248)
 at
 org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:91)
 at
 org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:90)
 at

 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 at org.apache.spark.sql.SQLContext.init(SQLContext.scala:90)
 at
 org.apache.spark.sql.hive.HiveContext.init(HiveContext.scala:72)
 at

 org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:51)
 at

 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
 at

 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: java.lang.RuntimeException: Unable to instantiate
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 at

 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
 at

 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
 at

 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
 at

 org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453

RE: Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-02 Thread Cheng, Hao
Copy those jars into the $SPARK_HOME/lib/

datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar

see https://github.com/apache/spark/blob/master/bin/compute-classpath.sh#L120


-Original Message-
From: fanooos [mailto:dev.fano...@gmail.com] 
Sent: Tuesday, March 3, 2015 2:50 PM
To: user@spark.apache.org
Subject: Spark SQL Thrift Server start exception : 
java.lang.ClassNotFoundException: 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory

I have installed a hadoop cluster (version : 2.6.0), apache spark (version :
1.2.1 preBuilt for hadoop 2.4 and later), and hive (version 1.0.0). 

When I try to start the spark sql thrift server I am getting the following 
exception. 

Exception in thread main java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
at scala.Option.orElse(Option.scala:257)
at
org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
at
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:292)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:248)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:91)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:90)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext.init(SQLContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.init(HiveContext.scala:72)
at
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:51)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 31 more
Caused by: javax.jdo.JDOFatalUserException: Class 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339

Spark SQL Thrift Server start exception : java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2015-03-02 Thread fanooos
I have installed a hadoop cluster (version : 2.6.0), apache spark (version :
1.2.1 preBuilt for hadoop 2.4 and later), and hive (version 1.0.0). 

When I try to start the spark sql thrift server I am getting the following
exception. 

Exception in thread main java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
at
org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
at scala.Option.orElse(Option.scala:257)
at
org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
at
org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:292)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:248)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:91)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:90)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.sql.SQLContext.init(SQLContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.init(HiveContext.scala:72)
at
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:51)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:62)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 31 more
Caused by: javax.jdo.JDOFatalUserException: Class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339)
at
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248)
at
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at
org.apache.hadoop.hive.metastore.RawStoreProxy.init(RawStoreProxy.java:58

[graphx] failed to submit an application with java.lang.ClassNotFoundException

2014-11-27 Thread Yifan LI
Hi,

I just tried to submit an application from the graphx examples directory, but
it failed:

yifan2:bin yifanli$ MASTER=local[*] ./run-example graphx.PPR_hubs
java.lang.ClassNotFoundException: org.apache.spark.examples.graphx.PPR_hubs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:318)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

and also,
yifan2:bin yifanli$ ./spark-submit --class 
org.apache.spark.examples.graphx.PPR_hubs 
../examples/target/scala-2.10/spark-examples-1.2.0-SNAPSHOT-hadoop1.0.4.jar
java.lang.ClassNotFoundException: org.apache.spark.examples.graphx.PPR_hubs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:318)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Does anyone have any pointers on this?



Best,
Yifan LI







Re: Fixed:spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-06 Thread serkan.dogan
Thanks MLnick,

I fixed the error.

First I compiled Spark from the original sources, then I downloaded this pom
file into the examples folder:

https://github.com/tedyu/spark/commit/70fb7b4ea8fd7647e4a4ddca4df71521b749521c


Then I recompiled with Maven:


mvn -Dhbase.profile=hadoop-provided -Phadoop-2.4 -Dhadoop.version=2.4.1
-DskipTests clean package 

Now everything is OK.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-hbase-0-98-6-hadoop2-version-py4j-protocol-Py4JJavaError-java-lang-ClassNotFoundException-tp15668p15778.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-04 Thread Nick Pentreath
 recent call last):
   File

 /home/downloads/spark/spark-1.1.0/./examples/src/main/python/hbase_inputformat.py,
 line 70, in module
 conf=conf)
   File /home/downloads/spark/spark-1.1.0/python/pyspark/context.py, line
 471, in newAPIHadoopRDD
 jconf, batchSize)
   File

 /usr/lib/python2.6/site-packages/py4j-0.8.2.1-py2.6.egg/py4j/java_gateway.py,
 line 538, in __call__
 self.target_id, self.name)
   File

 /usr/lib/python2.6/site-packages/py4j-0.8.2.1-py2.6.egg/py4j/protocol.py,
 line 300, in get_return_value
 format(target_id, '.', name), value)
 py4j.protocol.Py4JJavaError: An error occurred while calling
 z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
 : java.lang.ClassNotFoundException:
 org.apache.hadoop.hbase.io.ImmutableBytesWritable
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at org.apache.spark.util.Utils$.classForName(Utils.scala:150)
 at

 org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDDFromClassNames(PythonRDD.scala:451)
 at

 org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:436)
 at
 org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:622)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
 at
 py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
 at py4j.Gateway.invoke(Gateway.java:259)
 at
 py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:207)
 at java.lang.Thread.run(Thread.java:701)



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-hbase-0-98-6-hadoop2-version-py4j-protocol-Py4JJavaError-java-lang-ClassNotFoundException-tp15668.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.






spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-03 Thread serkan.dogan
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.io.ImmutableBytesWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:150)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDDFromClassNames(PythonRDD.scala:451)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:436)
at
org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:701)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-hbase-0-98-6-hadoop2-version-py4j-protocol-Py4JJavaError-java-lang-ClassNotFoundException-tp15666.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-03 Thread serkan.dogan
.
: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.io.ImmutableBytesWritable
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.util.Utils$.classForName(Utils.scala:150)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDDFromClassNames(PythonRDD.scala:451)
at
org.apache.spark.api.python.PythonRDD$.newAPIHadoopRDD(PythonRDD.scala:436)
at
org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:701)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-hbase-0-98-6-hadoop2-version-py4j-protocol-Py4JJavaError-java-lang-ClassNotFoundException-tp15668.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: java.lang.ClassNotFoundException on driver class in executor

2014-09-23 Thread Barrington Henry
Hi Andrew,

Thanks for the prompt response. I tried the command line and it works fine. But
I want to run from the IDE for easier debugging and more transparency into code
execution. I will try to see if there is any way to get the jar over to the
executor from within the IDE.

- Barrington
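One way to do that from the IDE is to point setJars at a packaged jar
explicitly instead of relying on SparkContext.jarOfClass, which returns nothing
when the class is loaded from a directory of .class files rather than a jar.
A sketch only; the jar path is hypothetical and must point at a jar that
actually contains LascoScript and its closures (e.g. built with sbt package or
an assembly plugin):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.host", "barrymac")
  .setMaster("yarn-client")
  .setAppName("Lasco Script")
  // jarOfClass finds nothing for classes loaded from target/classes,
  // so hand Spark the packaged jar directly (hypothetical path):
  .setJars(Seq("/path/to/lascoscript-assembly-0.1.jar"))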

 On Sep 21, 2014, at 10:52 PM, Andrew Or and...@databricks.com wrote:
 
 Hi Barrington,
 
 Have you tried running it from the command line? (i.e. bin/spark-submit 
 --master yarn-client --class YOUR_CLASS YOUR_JAR) Does it still fail? I am 
 not super familiar with running Spark through IntelliJ, but AFAIK the 
 classpaths are set up a little differently there. Also, spark-submit does this 
 for you nicely, so if you go through this path you don't even have to call 
 `setJars` as you did in your application.
 
 -Andrew
 
 2014-09-21 12:52 GMT-07:00 Barrington Henry barrington.he...@me.com 
 mailto:barrington.he...@me.com:
 Hi,
 
 I am running spark from my IDE (IntelliJ) using YARN as my cluster manager. 
 However, the executor node is not able to find my main driver class 
 “LascoScript”. I keep getting java.lang.ClassNotFoundException.
 I tried adding the jar of the main class by running the snippet below:
 
 
 val conf = new SparkConf().set("spark.driver.host", "barrymac")
   .setMaster("yarn-client")
   .setAppName("Lasco Script")
   .setJars(SparkContext.jarOfClass(this.getClass).toSeq)
 
 But the jarOfClass function returns nothing. See below for logs.
 
 
 
 14/09/21 10:53:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
 barrymac): java.lang.ClassNotFoundException: LascoScript$$anonfun$1
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:264)
 
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 java.lang.Thread.run(Thread.java:722)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on 
 executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
 [duplicate 1]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 4, 
 barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on 
 executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
 [duplicate 2]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 2.1 in stage 0.0 (TID 5, 
 barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on 
 executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
 [duplicate 3]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 3.1 in stage 0.0 (TID 6, 
 barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 4) on 
 executor barrymac

java.lang.ClassNotFoundException on driver class in executor

2014-09-21 Thread Barrington Henry
Hi,

I am running spark from my IDE (IntelliJ) using YARN as my cluster manager. 
However, the executor node is not able to find my main driver class 
“LascoScript”. I keep getting java.lang.ClassNotFoundException.
I tried adding the jar of the main class by running the snippet below:


   val conf = new SparkConf().set("spark.driver.host", "barrymac")
     .setMaster("yarn-client")
     .setAppName("Lasco Script")
     .setJars(SparkContext.jarOfClass(this.getClass).toSeq)

But the jarOfClass function returns nothing. See below for logs.



14/09/21 10:53:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
barrymac): java.lang.ClassNotFoundException: LascoScript$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:423)
java.lang.ClassLoader.loadClass(ClassLoader.java:356)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:264)

org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)

java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)

org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 1]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 4, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 2]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 2.1 in stage 0.0 (TID 5, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 3]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 3.1 in stage 0.0 (TID 6, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 4) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 4]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 7, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 2.1 in stage 0.0 (TID 5) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 5]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 2.2 in stage 0.0 (TID 8, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 3.1 in stage 0.0 (TID 6) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 6]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 3.2 in stage 0.0 (TID 9, 
barrymac, NODE_LOCAL, 1312 bytes)
14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 7) on 
executor barrymac: java.lang.ClassNotFoundException (LascoScript$$anonfun$1) 
[duplicate 7]
14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 10, 
barrymac, NODE_LOCAL, 1312

Re: java.lang.ClassNotFoundException on driver class in executor

2014-09-21 Thread Andrew Or
Hi Barrington,

Have you tried running it from the command line? (i.e. bin/spark-submit
--master yarn-client --class YOUR_CLASS YOUR_JAR) Does it still fail? I am
not super familiar with running Spark through IntelliJ, but AFAIK the
classpaths are set up a little differently there. Also, spark-submit does
this for you nicely, so if you go through this path you don't even have to
call `setJars` as you did in your application.

-Andrew
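For comparison, when the application goes through spark-submit the driver code
only needs the application-specific settings; the master, jars and classpath
come from the launcher. A sketch only, not Barrington's actual code:

import org.apache.spark.{SparkConf, SparkContext}

// Master and application jar are supplied by spark-submit
// (e.g. bin/spark-submit --master yarn-client --class LascoScript your-app.jar),
// so the conf can stay minimal:
val conf = new SparkConf().setAppName("Lasco Script")
val sc = new SparkContext(conf)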

2014-09-21 12:52 GMT-07:00 Barrington Henry barrington.he...@me.com:

 Hi,

 I am running spark from my IDE (IntelliJ) using YARN as my cluster manager.
 However, the executor node is not able to find my main driver class
 “LascoScript”. I keep getting java.lang.ClassNotFoundException.
 I tried adding the jar of the main class by running the snippet below:


 val conf = new SparkConf().set("spark.driver.host", "barrymac")
   .setMaster("yarn-client")
   .setAppName("Lasco Script")
   .setJars(SparkContext.jarOfClass(this.getClass).toSeq)

 But the jarOfClass function returns nothing. See below for logs.

 

 14/09/21 10:53:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
 barrymac): java.lang.ClassNotFoundException: LascoScript$$anonfun$1
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:264)

 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)

 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)

 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)

 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)

 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)

 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)

 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)

 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)

 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)

 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)

 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
 org.apache.spark.scheduler.Task.run(Task.scala:54)

 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 java.lang.Thread.run(Thread.java:722)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1)
 on executor barrymac: java.lang.ClassNotFoundException
 (LascoScript$$anonfun$1) [duplicate 1]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID
 4, barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2)
 on executor barrymac: java.lang.ClassNotFoundException
 (LascoScript$$anonfun$1) [duplicate 2]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 2.1 in stage 0.0 (TID
 5, barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3)
 on executor barrymac: java.lang.ClassNotFoundException
 (LascoScript$$anonfun$1) [duplicate 3]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 3.1 in stage 0.0 (TID
 6, barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 4)
 on executor barrymac: java.lang.ClassNotFoundException
 (LascoScript$$anonfun$1) [duplicate 4]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID
 7, barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 2.1 in stage 0.0 (TID 5)
 on executor barrymac: java.lang.ClassNotFoundException
 (LascoScript$$anonfun$1) [duplicate 5]
 14/09/21 10:53:15 INFO TaskSetManager: Starting task 2.2 in stage 0.0 (TID
 8, barrymac, NODE_LOCAL, 1312 bytes)
 14/09/21 10:53:15 INFO TaskSetManager: Lost task 3.1 in stage 0.0 (TID 6)
 on executor barrymac: java.lang.ClassNotFoundException

Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-05-30 Thread Margusja
 TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
14/05/30 11:53:56 INFO TaskSetManager: Starting task 6.0:0 as TID 72 on 
executor 0: dlvm1 (PROCESS_LOCAL)
14/05/30 11:53:56 INFO TaskSetManager: Serialized task 6.0:0 as 1958 
bytes in 0 ms
14/05/30 11:53:56 INFO DAGScheduler: Got job 4 (runJob at 
NetworkInputTracker.scala:182) with 1 output partitions (allowLocal=false)
14/05/30 11:53:56 INFO DAGScheduler: Final stage: Stage 8 (runJob at 
NetworkInputTracker.scala:182)

14/05/30 11:53:56 INFO DAGScheduler: Parents of final stage: List()
14/05/30 11:53:56 INFO DAGScheduler: Missing parents: List()
14/05/30 11:53:56 INFO DAGScheduler: Submitting Stage 8 
(ParallelCollectionRDD[0] at makeRDD at NetworkInputTracker.scala:165), 
which has no missing parents
14/05/30 11:53:56 INFO MapOutputTrackerMasterActor: Asked to send map 
output locations for shuffle 2 to spark@dlvm1:48363
14/05/30 11:53:56 INFO MapOutputTrackerMaster: Size of output statuses 
for shuffle 2 is 82 bytes
14/05/30 11:53:56 INFO TaskSetManager: Finished TID 72 in 37 ms on dlvm1 
(progress: 1/1)
14/05/30 11:53:56 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose 
tasks have all completed, from pool
14/05/30 11:53:56 INFO DAGScheduler: Submitting 1 missing tasks from 
Stage 8 (ParallelCollectionRDD[0] at makeRDD at 
NetworkInputTracker.scala:165)

14/05/30 11:53:56 INFO TaskSchedulerImpl: Adding task set 8.0 with 1 tasks
14/05/30 11:53:56 INFO TaskSetManager: Starting task 8.0:0 as TID 73 on 
executor 0: dlvm1 (PROCESS_LOCAL)
14/05/30 11:53:56 INFO TaskSetManager: Serialized task 8.0:0 as 2975 
bytes in 1 ms

14/05/30 11:53:56 INFO DAGScheduler: Completed ResultTask(6, 0)
14/05/30 11:53:56 INFO DAGScheduler: Stage 6 (take at DStream.scala:586) 
finished in 0.051 s
14/05/30 11:53:56 INFO SparkContext: Job finished: take at 
DStream.scala:586, took 0.087153883 s

14/05/30 11:53:56 INFO SparkContext: Starting job: take at DStream.scala:586
14/05/30 11:53:56 INFO DAGScheduler: Got job 5 (take at 
DStream.scala:586) with 1 output partitions (allowLocal=true)
14/05/30 11:53:56 INFO DAGScheduler: Final stage: Stage 9 (take at 
DStream.scala:586)

14/05/30 11:53:56 INFO DAGScheduler: Parents of final stage: List(Stage 10)
14/05/30 11:53:56 INFO DAGScheduler: Missing parents: List()
14/05/30 11:53:56 INFO DAGScheduler: Submitting Stage 9 
(MapPartitionsRDD[19] at combineByKey at ShuffledDStream.scala:42), 
which has no missing parents
14/05/30 11:53:56 INFO DAGScheduler: Submitting 1 missing tasks from 
Stage 9 (MapPartitionsRDD[19] at combineByKey at ShuffledDStream.scala:42)

14/05/30 11:53:56 INFO TaskSchedulerImpl: Adding task set 9.0 with 1 tasks
14/05/30 11:53:56 INFO TaskSetManager: Starting task 9.0:0 as TID 74 on 
executor 0: dlvm1 (PROCESS_LOCAL)
14/05/30 11:53:56 INFO TaskSetManager: Serialized task 9.0:0 as 1958 
bytes in 0 ms

14/05/30 11:53:56 WARN TaskSetManager: Lost TID 73 (task 8.0:0)
14/05/30 11:53:56 WARN TaskSetManager: Loss was due to 
java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: 
org.apache.spark.streaming.kafka.KafkaReceiver

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at 
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)

at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at 
java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
at 
org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)

at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke

Re: java.lang.ClassNotFoundException

2014-05-12 Thread Archit Thakur
Hi Joe,

Your messages are going into the spam folder for me.

Thx, Archit_Thakur.


On Fri, May 2, 2014 at 9:22 AM, Joe L selme...@yahoo.com wrote:

 Hi, you should include the jar file of your project. For example:
 conf.set("yourjarfilepath.jar")

 Joe
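
 As written, conf.set("yourjarfilepath.jar") is not quite the SparkConf API;
 the calls that ship a jar are setJars on the conf or addJar on the context.
 A minimal sketch under that reading, with a hypothetical jar path and the
 master URL from Ibrahim's example below:

 import org.apache.spark.{SparkConf, SparkContext}

 // Ship the project's fat jar so executor JVMs can resolve application classes.
 // The jar path is a hypothetical build output; adjust it to your project.
 val conf = new SparkConf()
   .setMaster("spark://10.35.23.13:7077")
   .setAppName("My app")
   .setJars(Seq("target/simple-app-1.0-jar-with-dependencies.jar"))

 val sc = new SparkContext(conf)

 // Equivalently, a jar can be added after the context exists:
 sc.addJar("target/simple-app-1.0-jar-with-dependencies.jar")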
   On Friday, May 2, 2014 7:39 AM, proofmoore [via Apache Spark User List]
 [hidden email] wrote:
   Hello. I followed the "A Standalone App in Java" part of the tutorial:
  https://spark.apache.org/docs/0.8.1/quick-start.html

  The Spark standalone cluster looks like it's running without a problem:
  http://i.stack.imgur.com/7bFv8.png

 I have built a fat jar for running this JavaApp on the cluster. Before
 maven package:

 find .

 ./pom.xml
 ./src
 ./src/main
 ./src/main/java
 ./src/main/java/SimpleApp.java


 content of SimpleApp.java is :

   import org.apache.spark.api.java.*;
   import org.apache.spark.api.java.function.Function;
   import org.apache.spark.SparkConf;
   import org.apache.spark.SparkContext;

   public class SimpleApp {
     public static void main(String[] args) {

       SparkConf conf = new SparkConf()
           .setMaster("spark://10.35.23.13:7077")
           .setAppName("My app")
           .set("spark.executor.memory", "1g");

       JavaSparkContext sc = new JavaSparkContext(conf);
       String logFile = "/home/ubuntu/spark-0.9.1/test_data";
       JavaRDD<String> logData = sc.textFile(logFile).cache();

       long numAs = logData.filter(new Function<String, Boolean>() {
         public Boolean call(String s) { return s.contains("a"); }
       }).count();

       System.out.println("Lines with a: " + numAs);
     }
   }

  This program only works when the master is set as setMaster("local").
  Otherwise I get this error: http://i.stack.imgur.com/doRSn.png

 Thanks,
 Ibrahim


  --
  http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-tp5191.html



 --
 View this message in context: Re: java.lang.ClassNotFoundException
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: java.lang.ClassNotFoundException - spark on mesos

2014-05-02 Thread bo...@shopify.com
I have opened a PR for discussion on the apache/spark repository
https://github.com/apache/spark/pull/620

There is certainly a classLoader problem in the way Mesos and Spark operate,
I'm not sure what caused it to suddenly stop working so I'd like to open the
discussion there



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p5245.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


java.lang.ClassNotFoundException

2014-05-01 Thread İbrahim Rıza HALLAÇ



Hello. I followed the "A Standalone App in Java" part of the tutorial:
https://spark.apache.org/docs/0.8.1/quick-start.html

The Spark standalone cluster looks like it's running without a problem:
http://i.stack.imgur.com/7bFv8.png

I have built a fat jar for running this Java app on the cluster. Before maven package:

find .
./pom.xml
./src
./src/main
./src/main/java
./src/main/java/SimpleApp.java

The content of SimpleApp.java is:

 import org.apache.spark.api.java.*;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.SparkConf;
 import org.apache.spark.SparkContext;

 public class SimpleApp {
   public static void main(String[] args) {
     SparkConf conf = new SparkConf()
         .setMaster("spark://10.35.23.13:7077")
         .setAppName("My app")
         .set("spark.executor.memory", "1g");

     JavaSparkContext sc = new JavaSparkContext(conf);
     String logFile = "/home/ubuntu/spark-0.9.1/test_data";
     JavaRDD<String> logData = sc.textFile(logFile).cache();

     long numAs = logData.filter(new Function<String, Boolean>() {
       public Boolean call(String s) { return s.contains("a"); }
     }).count();

     System.out.println("Lines with a: " + numAs);
   }
 }

This program only works when the master is set as setMaster("local").
Otherwise I get this error: http://i.stack.imgur.com/doRSn.png

Thanks,
Ibrahim


java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I am facing different kinds of java.lang.ClassNotFoundException when trying to 
run spark on mesos. One error has to do with 
org.apache.spark.executor.MesosExecutorBackend. Another has to do with 
org.apache.spark.serializer.JavaSerializer. I see other people complaining 
about similar issues.

I tried with different versions of the spark distribution - 0.9.0 and 1.0.0-SNAPSHOT - 
and faced the same problem. I think the reason for this is related to the 
error below.

$ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
java.io.IOException: META-INF/license : could not create directory
at sun.tools.jar.Main.extractFile(Main.java:907)
at sun.tools.jar.Main.extract(Main.java:850)
at sun.tools.jar.Main.run(Main.java:240)
at sun.tools.jar.Main.main(Main.java:1147)

This error happens with all the jars that I created, but the set of classes that gets 
extracted before it fails differs between cases. If JavaSerializer has not 
already been extracted before encountering META-INF/license, then that class is not 
found during execution. If MesosExecutorBackend is not found, then that class 
shows up in the mesos slave error logs. Can someone confirm whether this is a valid 
cause for the problem I am seeing? Any way I can debug this further?

— Bharath

Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Tim St Clair
What versions are you running?  

There is a known protobuf 2.5 mismatch, depending on your versions. 

Cheers,
Tim

- Original Message -
 From: Bharath Bhushan manku.ti...@outlook.com
 To: user@spark.apache.org
 Sent: Monday, March 31, 2014 8:16:19 AM
 Subject: java.lang.ClassNotFoundException - spark on mesos
 
 I am facing different kinds of java.lang.ClassNotFoundException when trying
 to run spark on mesos. One error has to do with
 org.apache.spark.executor.MesosExecutorBackend. Another has to do with
 org.apache.spark.serializer.JavaSerializer. I see other people complaining
 about similar issues.
 
 I tried with different version of spark distribution - 0.9.0 and
 1.0.0-SNAPSHOT and faced the same problem. I think the reason for this is is
 related to the error below.
 
 $ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
 java.io.IOException: META-INF/license : could not create directory
 at sun.tools.jar.Main.extractFile(Main.java:907)
 at sun.tools.jar.Main.extract(Main.java:850)
 at sun.tools.jar.Main.run(Main.java:240)
 at sun.tools.jar.Main.main(Main.java:1147)
 
 This error happens with all the jars that I created. But the classes that are
 already generated is different in the different cases. If JavaSerializer is
 not already extracted before encountering META-INF/license, then that class
 is not found during execution. If MesosExecutorBackend is not found, then
 that class shows up in the mesos slave error logs. Can someone confirm if
 this is a valid cause for the problem I am seeing? Any way I can debug this
 further?
 
 — Bharath

-- 
Cheers,
Tim
Freedom, Features, Friends, First - Fedora
https://fedoraproject.org/wiki/SIGs/bigdata


Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and 
the latest git tree.

Thanks


On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote:

 What versions are you running?  
 
 There is a known protobuf 2.5 mismatch, depending on your versions. 
 
 Cheers,
 Tim
 
 - Original Message -
 From: Bharath Bhushan manku.ti...@outlook.com
 To: user@spark.apache.org
 Sent: Monday, March 31, 2014 8:16:19 AM
 Subject: java.lang.ClassNotFoundException - spark on mesos
 
 I am facing different kinds of java.lang.ClassNotFoundException when trying
 to run spark on mesos. One error has to do with
 org.apache.spark.executor.MesosExecutorBackend. Another has to do with
 org.apache.spark.serializer.JavaSerializer. I see other people complaining
 about similar issues.
 
 I tried with different version of spark distribution - 0.9.0 and
 1.0.0-SNAPSHOT and faced the same problem. I think the reason for this is is
 related to the error below.
 
 $ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
 java.io.IOException: META-INF/license : could not create directory
at sun.tools.jar.Main.extractFile(Main.java:907)
at sun.tools.jar.Main.extract(Main.java:850)
at sun.tools.jar.Main.run(Main.java:240)
at sun.tools.jar.Main.main(Main.java:1147)
 
 This error happens with all the jars that I created. But the classes that are
 already generated is different in the different cases. If JavaSerializer is
 not already extracted before encountering META-INF/license, then that class
 is not found during execution. If MesosExecutorBackend is not found, then
 that class shows up in the mesos slave error logs. Can someone confirm if
 this is a valid cause for the problem I am seeing? Any way I can debug this
 further?
 
 — Bharath
 
 -- 
 Cheers,
 Tim
 Freedom, Features, Friends, First - Fedora
 https://fedoraproject.org/wiki/SIGs/bigdata



Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Tim St Clair
It sounds like the protobuf issue. 

So FWIW, you might want to try updating the 0.9.0 build with pom mods for mesos & 
protobuf. 

mesos 0.17.0 & protobuf 2.5

Cheers,
Tim
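
The thread above refers to pom.xml edits; as a rough sbt-flavoured equivalent
(an assumption - the original discussion is about Maven, and the exact
coordinates should be checked against the Spark 0.9.x build files), the
dependency overrides would look roughly like:

  // build.sbt sketch: pin the Mesos and protobuf versions mentioned above.
  libraryDependencies ++= Seq(
    "org.apache.mesos"    % "mesos"         % "0.17.0",
    "com.google.protobuf" % "protobuf-java" % "2.5.0"
  )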

- Original Message -
 From: Bharath Bhushan manku.ti...@outlook.com
 To: user@spark.apache.org
 Sent: Monday, March 31, 2014 9:46:32 AM
 Subject: Re: java.lang.ClassNotFoundException - spark on mesos
 
 I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and
 the latest git tree.
 
 Thanks
 
 
 On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote:
 
  What versions are you running?
  
  There is a known protobuf 2.5 mismatch, depending on your versions.
  
  Cheers,
  Tim
  
  - Original Message -
  From: Bharath Bhushan manku.ti...@outlook.com
  To: user@spark.apache.org
  Sent: Monday, March 31, 2014 8:16:19 AM
  Subject: java.lang.ClassNotFoundException - spark on mesos
  
  I am facing different kinds of java.lang.ClassNotFoundException when
  trying
  to run spark on mesos. One error has to do with
  org.apache.spark.executor.MesosExecutorBackend. Another has to do with
  org.apache.spark.serializer.JavaSerializer. I see other people complaining
  about similar issues.
  
  I tried with different version of spark distribution - 0.9.0 and
  1.0.0-SNAPSHOT and faced the same problem. I think the reason for this is
  is
  related to the error below.
  
  $ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
  java.io.IOException: META-INF/license : could not create directory
 at sun.tools.jar.Main.extractFile(Main.java:907)
 at sun.tools.jar.Main.extract(Main.java:850)
 at sun.tools.jar.Main.run(Main.java:240)
 at sun.tools.jar.Main.main(Main.java:1147)
  
  This error happens with all the jars that I created. But the classes that
  are
  already generated is different in the different cases. If JavaSerializer
  is
  not already extracted before encountering META-INF/license, then that
  class
  is not found during execution. If MesosExecutorBackend is not found, then
  that class shows up in the mesos slave error logs. Can someone confirm if
  this is a valid cause for the problem I am seeing? Any way I can debug
  this
  further?
  
  — Bharath
  
  --
  Cheers,
  Tim
  Freedom, Features, Friends, First - Fedora
  https://fedoraproject.org/wiki/SIGs/bigdata
 
 

-- 
Cheers,
Tim
Freedom, Features, Friends, First - Fedora
https://fedoraproject.org/wiki/SIGs/bigdata


Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
Your suggestion took me past the ClassNotFoundException. I then hit an 
akka.actor.ActorNotFound exception. I patched PR 568 into my 0.9.0 spark 
codebase and everything worked.

So thanks a lot, Tim. Is there a JIRA/PR for the protobuf issue? Why is it not 
fixed in the latest git tree?

Thanks.

On 31-Mar-2014, at 11:30 pm, Tim St Clair tstcl...@redhat.com wrote:

 It sounds like the protobuf issue. 
 
 So FWIW, You might want to try updating the 0.9.0 w/pom mods for mesos  
 protobuf. 
 
 mesos 0.17.0  protobuf 2.5   
 
 Cheers,
 Tim
 
 - Original Message -
 From: Bharath Bhushan manku.ti...@outlook.com
 To: user@spark.apache.org
 Sent: Monday, March 31, 2014 9:46:32 AM
 Subject: Re: java.lang.ClassNotFoundException - spark on mesos
 
 I tried 0.9.0 and the latest git tree of spark. For mesos, I tried 0.17.0 and
 the latest git tree.
 
 Thanks
 
 
 On 31-Mar-2014, at 7:24 pm, Tim St Clair tstcl...@redhat.com wrote:
 
 What versions are you running?
 
 There is a known protobuf 2.5 mismatch, depending on your versions.
 
 Cheers,
 Tim
 
 - Original Message -
 From: Bharath Bhushan manku.ti...@outlook.com
 To: user@spark.apache.org
 Sent: Monday, March 31, 2014 8:16:19 AM
 Subject: java.lang.ClassNotFoundException - spark on mesos
 
 I am facing different kinds of java.lang.ClassNotFoundException when
 trying
 to run spark on mesos. One error has to do with
 org.apache.spark.executor.MesosExecutorBackend. Another has to do with
 org.apache.spark.serializer.JavaSerializer. I see other people complaining
 about similar issues.
 
 I tried with different version of spark distribution - 0.9.0 and
 1.0.0-SNAPSHOT and faced the same problem. I think the reason for this is
 is
 related to the error below.
 
 $ jar -xf spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar
 java.io.IOException: META-INF/license : could not create directory
   at sun.tools.jar.Main.extractFile(Main.java:907)
   at sun.tools.jar.Main.extract(Main.java:850)
   at sun.tools.jar.Main.run(Main.java:240)
   at sun.tools.jar.Main.main(Main.java:1147)
 
 This error happens with all the jars that I created. But the classes that
 are
 already generated is different in the different cases. If JavaSerializer
 is
 not already extracted before encountering META-INF/license, then that
 class
 is not found during execution. If MesosExecutorBackend is not found, then
 that class shows up in the mesos slave error logs. Can someone confirm if
 this is a valid cause for the problem I am seeing? Any way I can debug
 this
 further?
 
 — Bharath
 
 --
 Cheers,
 Tim
 Freedom, Features, Friends, First - Fedora
 https://fedoraproject.org/wiki/SIGs/bigdata
 
 
 
 -- 
 Cheers,
 Tim
 Freedom, Features, Friends, First - Fedora
 https://fedoraproject.org/wiki/SIGs/bigdata



Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
I was referring to the protobuf version issue as the one that is not fixed. I could not 
find any reference to the problem or the fix.

Regarding SPARK-1052, I could pull the fix into my 0.9.0 tree (from the tarball 
on the website), and I see the fix in the latest git.

Thanks

On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote:

 Which repository do you use?
 
 The issue should be fixed in 0.9.1 and 1.0.0
 
 https://spark-project.atlassian.net/browse/SPARK-1052
 https://spark-project.atlassian.net/browse/SPARK-1052  
 
 There's an old repository 
 
 https://github.com/apache/incubator-spark
 
 and as Spark become one of top level projects, it was moved to new repo:
 
 https://github.com/apache/spark
 
 The 0.9.1 version hasn't been released yet, so you should get it from the
 new git repo.
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: java.lang.ClassNotFoundException - spark on mesos

2014-03-31 Thread Bharath Bhushan
Another problem I noticed is that the current 1.0.0 git tree still gives me the 
ClassNotFoundException. I see that SPARK-1052 is already fixed there. I 
then modified the pom.xml for mesos and protobuf and that still gave the 
ClassNotFoundException. I also tried modifying pom.xml only for mesos and that 
fails too. So I have no way of running the 1.0.0 git tree spark on mesos yet.

Thanks.

On 01-Apr-2014, at 3:28 am, deric barton.to...@gmail.com wrote:

 Which repository do you use?
 
 The issue should be fixed in 0.9.1 and 1.0.0
 
 https://spark-project.atlassian.net/browse/SPARK-1052
 https://spark-project.atlassian.net/browse/SPARK-1052  
 
 There's an old repository 
 
 https://github.com/apache/incubator-spark
 
 and as Spark become one of top level projects, it was moved to new repo:
 
 https://github.com/apache/spark
 
 The 0.9.1 version hasn't been released yet, so you should get it from the
 new git repo.
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-tp3510p3551.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked at the individual nodes logs? Can you post a bit more of 
the exception's output?


On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:

Hi all,

I got java.lang.ClassNotFoundException even with addJar called. The 
jar file is present in each node.


I use the version of spark from github master.

Any ideas ?


Jaonary


Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked through the logs fully? I have seen this (in my limited 
experience) pop up as a result of previous exceptions/errors, also as a 
result of being unable to serialize objects etc.

Ognen

On 3/26/14, 10:39 AM, Jaonary Rabarisoa wrote:
I notice that I get this error when I'm trying to load an objectFile 
with val viperReloaded = context.objectFile[ReIdDataSetEntry](data)



On Wed, Mar 26, 2014 at 3:58 PM, Jaonary Rabarisoa jaon...@gmail.com 
mailto:jaon...@gmail.com wrote:


Here the output that I get :

[error] (run-main-0) org.apache.spark.SparkException: Job aborted:
Task 1.0:1 failed 4 times (most recent failure: Exception failure
in TID 6 on host 172.166.86.36 http://172.166.86.36:
java.lang.ClassNotFoundException: value.models.ReIdDataSetEntry)
org.apache.spark.SparkException: Job aborted: Task 1.0:1 failed 4
times (most recent failure: Exception failure in TID 6 on host
172.166.86.36 http://172.166.86.36:
java.lang.ClassNotFoundException: value.models.ReIdDataSetEntry)
at

org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1011)
at

org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1009)
at

scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org

http://org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1009)
at

org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
at

org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:596)
at

org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:146)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at

akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at

scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at

scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Spark says that the jar is added :

14/03/26 15:49:18 INFO SparkContext: Added JAR
target/scala-2.10/value-spark_2.10-1.0.jar





On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski
og...@plainvanillagames.com mailto:og...@plainvanillagames.com
wrote:

Have you looked at the individual nodes logs? Can you post a
bit more of the exception's output?


On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:

Hi all,

I got java.lang.ClassNotFoundException even with addJar
called. The jar file is present in each node.

I use the version of spark from github master.

Any ideas ?


Jaonary





Re: java.lang.ClassNotFoundException

2014-03-26 Thread Jaonary Rabarisoa
it seems to be an old problem :

http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E

https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I

Has anyone got the solution?


On Wed, Mar 26, 2014 at 5:50 PM, Yana Kadiyska yana.kadiy...@gmail.comwrote:

 I might be way off here but are you looking at the logs on the worker
 machines? I am running an older version (0.8) and when I look at the
 error log for the executor process I see the exact location where the
 executor process tries to load the jar from...with a line like this:

 14/03/26 13:57:11 INFO executor.Executor: Adding
 file:/dirs/dirs/spark/work/app-20140326135710-0029/0/./spark-test.jar
 to class loader

 You said The jar file is present in each node, do you see any
 information on the executor indicating that it's trying to load the
 jar or where it's loading it from? I can't tell for sure by looking at
 your logs but they seem to be logs from the master and driver, not
 from the executor itself?

 On Wed, Mar 26, 2014 at 11:46 AM, Ognen Duzlevski
 og...@plainvanillagames.com wrote:
  Have you looked through the logs fully? I have seen this (in my limited
  experience) pop up as a result of previous exceptions/errors, also as a
  result of being unable to serialize objects etc.
  Ognen
 
 
  On 3/26/14, 10:39 AM, Jaonary Rabarisoa wrote:
 
  I notice that I get this error when I'm trying to load an objectFile with
  val viperReloaded = context.objectFile[ReIdDataSetEntry](data)
 
 
  On Wed, Mar 26, 2014 at 3:58 PM, Jaonary Rabarisoa jaon...@gmail.com
  wrote:
 
  Here the output that I get :
 
  [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
  1.0:1 failed 4 times (most recent failure: Exception failure in TID 6 on
  host 172.166.86.36: java.lang.ClassNotFoundException:
  value.models.ReIdDataSetEntry)
  org.apache.spark.SparkException: Job aborted: Task 1.0:1 failed 4 times
  (most recent failure: Exception failure in TID 6 on host 172.166.86.36:
  java.lang.ClassNotFoundException: value.models.ReIdDataSetEntry)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1011)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1009)
  at
 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at
  org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1009)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
  at scala.Option.foreach(Option.scala:236)
  at
 
 org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:596)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:146)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at
 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at
 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at
  scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at
 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 
  Spark says that the jar is added :
 
  14/03/26 15:49:18 INFO SparkContext: Added JAR
  target/scala-2.10/value-spark_2.10-1.0.jar
 
 
 
 
 
  On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski
  og...@plainvanillagames.com wrote:
 
  Have you looked at the individual nodes logs? Can you post a bit more
 of
  the exception's output?
 
 
  On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:
 
  Hi all,
 
  I got java.lang.ClassNotFoundException even with addJar called. The
  jar file is present in each node.
 
  I use the version of spark from github master.
 
  Any ideas ?
 
 
  Jaonary
 
 



Re: java.lang.ClassNotFoundException

2014-03-26 Thread Aniket Mokashi
context.objectFile[ReIdDataSetEntry](data) - not sure how this is compiled
in Scala. But if it uses some sort of ObjectInputStream, you need to be
careful - ObjectInputStream uses the root classloader to load classes and does
not work with jars that are added to the thread context classloader (TCCL).
Apache Commons has ClassLoaderObjectInputStream to work around this.
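
A minimal sketch of that workaround - using ClassLoaderObjectInputStream from
Apache Commons IO so that deserialization resolves classes through the thread
context classloader (the method and file names here are illustrative, not from
this thread):

  import java.io.FileInputStream
  import org.apache.commons.io.input.ClassLoaderObjectInputStream

  // Deserialize with an explicit classloader (the one that saw the added jars)
  // instead of the loader ObjectInputStream would pick on its own.
  def readWithContextClassLoader(path: String): AnyRef = {
    val in = new ClassLoaderObjectInputStream(
      Thread.currentThread().getContextClassLoader,
      new FileInputStream(path))
    try in.readObject() finally in.close()
  }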


On Wed, Mar 26, 2014 at 1:38 PM, Jaonary Rabarisoa jaon...@gmail.comwrote:

 it seems to be an old problem :


 http://mail-archives.apache.org/mod_mbox/spark-user/201311.mbox/%3c7f6aa9e820f55d4a96946a87e086ef4a4bcdf...@eagh-erfpmbx41.erf.thomson.com%3E

 https://groups.google.com/forum/#!topic/spark-users/Q66UOeA2u-I

 Does anyone got the solution ?


 On Wed, Mar 26, 2014 at 5:50 PM, Yana Kadiyska yana.kadiy...@gmail.comwrote:

 I might be way off here but are you looking at the logs on the worker
 machines? I am running an older version (0.8) and when I look at the
 error log for the executor process I see the exact location where the
 executor process tries to load the jar from...with a line like this:

 14/03/26 13:57:11 INFO executor.Executor: Adding
 file:/dirs/dirs/spark/work/app-20140326135710-0029/0/./spark-test.jar
 to class loader

 You said The jar file is present in each node, do you see any
 information on the executor indicating that it's trying to load the
 jar or where it's loading it from? I can't tell for sure by looking at
 your logs but they seem to be logs from the master and driver, not
 from the executor itself?

 On Wed, Mar 26, 2014 at 11:46 AM, Ognen Duzlevski
 og...@plainvanillagames.com wrote:
  Have you looked through the logs fully? I have seen this (in my limited
  experience) pop up as a result of previous exceptions/errors, also as a
  result of being unable to serialize objects etc.
  Ognen
 
 
  On 3/26/14, 10:39 AM, Jaonary Rabarisoa wrote:
 
  I notice that I get this error when I'm trying to load an objectFile
 with
  val viperReloaded = context.objectFile[ReIdDataSetEntry](data)
 
 
  On Wed, Mar 26, 2014 at 3:58 PM, Jaonary Rabarisoa jaon...@gmail.com
  wrote:
 
  Here the output that I get :
 
  [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
  1.0:1 failed 4 times (most recent failure: Exception failure in TID 6
 on
  host 172.166.86.36: java.lang.ClassNotFoundException:
  value.models.ReIdDataSetEntry)
  org.apache.spark.SparkException: Job aborted: Task 1.0:1 failed 4 times
  (most recent failure: Exception failure in TID 6 on host 172.166.86.36
 :
  java.lang.ClassNotFoundException: value.models.ReIdDataSetEntry)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1011)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1009)
  at
 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at
  org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1009)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:596)
  at scala.Option.foreach(Option.scala:236)
  at
 
 org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:596)
  at
 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:146)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at
 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at
 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at
 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at
 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 
  Spark says that the jar is added :
 
  14/03/26 15:49:18 INFO SparkContext: Added JAR
  target/scala-2.10/value-spark_2.10-1.0.jar
 
 
 
 
 
  On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski
  og...@plainvanillagames.com wrote:
 
  Have you looked at the individual nodes logs? Can you post a bit more
 of
  the exception's output?
 
 
  On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:
 
  Hi all,
 
  I got java.lang.ClassNotFoundException even with addJar called. The
  jar file is present in each node.
 
  I use the version of spark from github master.
 
  Any ideas ?
 
 
  Jaonary
 
 





-- 
...:::Aniket:::... Quetzalco@tl


java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0

2014-03-07 Thread pradeeps8
Hi,

We are currently trying to migrate to hadoop 2.2.0 and hence we have
installed spark 0.9.0 and the pre-release version of shark 0.9.0.
When we execute the script ( script.txt
http://apache-spark-user-list.1001560.n3.nabble.com/file/n2401/script.txt 
) we get the following error.
org.apache.spark.SparkException: Job aborted: Task 1.0:3 failed 4 times
(most recent failure: Exception failure: java.lang.ClassNotFoundException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) 
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
 
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
 
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
 
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
 
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
 
at scala.Option.foreach(Option.scala:236) 
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619) 
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
 
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) 
at akka.actor.ActorCell.invoke(ActorCell.scala:456) 
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) 
at akka.dispatch.Mailbox.run(Mailbox.scala:219) 
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 

Has anyone seen this error?
If so, could you please help me get it corrected?

Thanks,
Pradeep




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-in-spark-0-9-0-shark-0-9-0-pre-release-and-hadoop-2-2-0-tp2401.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.