Richard,

Your response was very helpful and actually resolved my issue. In case
others run into a similar issue, I followed this procedure:

   - Upgraded to Spark 1.3.0
   - Marked all Spark-related libraries as "provided"
   - Included Spark's transitive library dependencies in the assembly

where my build.sbt file contains:

libraryDependencies ++= {
  Seq(
    "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
    "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0" % "provided",
    "joda-time" % "joda-time" % "2.2",
    "org.joda" % "joda-convert" % "1.2",
    "com.amazonaws" % "aws-java-sdk" % "1.8.3",
    "com.amazonaws" % "amazon-kinesis-client" % "1.2.0")
}

and submitting a Spark job can be done via

sh ./spark-1.3.0-bin-cdh4/bin/spark-submit \
  --jars spark-streaming-kinesis-asl_2.10-1.3.0.jar \
  --verbose \
  --class com.xxx.MyClass \
  target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
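
If in doubt, the assembly can be sanity-checked for the Joda classes with
something like

jar tf target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar | grep org/joda/time

which should list org/joda/time/format/DateTimeFormat.class once the fat jar
is built correctly.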

Thanks again, Richard!

Cheers, Mike.


On Tue, Apr 14, 2015 at 11:01 AM, Richard Marscher <rmarsc...@localytics.com
> wrote:

> Hi,
>
> I've gotten an application working with sbt-assembly and Spark, so I thought
> I'd present an option. In my experience, trying to bundle any of the Spark
> libraries in your uber jar is going to be a major pain. There will be a lot
> of deduplication to work through and even if you resolve them it can be
> easy to do it incorrectly. I considered it an intractable problem. So the
> alternative is to not include those jars in your uber jar. For this to work
> you will need the same libraries on the classpath of your Spark cluster and
> your driver program (if you are running that as an application and not just
> using spark-submit).
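>
> If you run the driver through sbt rather than spark-submit, one way to keep
> the "provided" dependencies on the run classpath (a sketch, assuming sbt
> 0.13.x) is:
>
> run in Compile <<= Defaults.runTask(
>   fullClasspath in Compile,   // the compile classpath still includes "provided" deps
>   mainClass in (Compile, run),
>   runner in (Compile, run))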
>
> As for your NoClassDefFoundError, you either are missing Joda Time in your
> runtime classpath or have conflicting versions. It looks like something
> related to AWS wants to use it. Check your uber jar to see if it's including
> the org/joda/time classes, as well as the classpath of your Spark cluster. For
> example: I use Spark 1.3.0 on Hadoop 1.x, which in the 'lib' directory
> has an uber jar spark-assembly-1.3.0-hadoop1.0.4.jar. At one point in Spark
> 1.2 I found a conflict between httpclient versions that my uber jar pulled
> in for AWS libraries and the one bundled in the spark uber jar. I hand
> patched the spark uber jar to remove the offending httpclient bytecode to
> resolve the issue. You may be facing a similar situation.
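>
> For example, something like (jar path assumed, relative to the Spark
> install directory):
>
> jar tf lib/spark-assembly-1.3.0-hadoop1.0.4.jar | grep org/apache/http
>
> lists the httpclient classes the Spark assembly bundles, which you can
> compare against what your uber jar pulls in.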
>
> I hope that gives some ideas for resolving your issue.
>
> Regards,
> Rich
>
> On Tue, Apr 14, 2015 at 1:14 PM, Mike Trienis <mike.trie...@orcsol.com>
> wrote:
>
>> Hi Vadim,
>>
>> After removing "provided" from "org.apache.spark" %%
>> "spark-streaming-kinesis-asl" I ended up with huge number of deduplicate
>> errors:
>>
>> https://gist.github.com/trienism/3d6f8d6b7ff5b7cead6a
>>
>> It would be nice if you could share some pieces of your mergeStrategy
>> code for reference.
>>
>> Also, after adding "provided" back to "spark-streaming-kinesis-asl", I
>> submitted the Spark job with the spark-streaming-kinesis-asl jar file:
>>
>> sh /usr/lib/spark/bin/spark-submit --verbose \
>>   --jars lib/spark-streaming-kinesis-asl_2.10-1.2.0.jar \
>>   --class com.xxx.DataConsumer \
>>   target/scala-2.10/xxx-assembly-0.1-SNAPSHOT.jar
>>
>> I still end up with the following error...
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/joda/time/format/DateTimeFormat
>> at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at java.lang.Class.newInstance(Class.java:379)
>>
>> Has anyone else run into this issue?
>>
>>
>>
>> On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy <
>> vadim.bichuts...@gmail.com> wrote:
>>
>>> I don't believe the Kinesis ASL library should be "provided". I used mergeStrategy
>>> successfully to produce an "uber jar."
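>>>
>>> A sketch of the general shape such a mergeStrategy takes, assuming
>>> sbt-assembly 0.13.x syntax (the exact cases will differ per project):
>>>
>>> assemblyMergeStrategy in assembly := {
>>>   // keep service registrations (e.g. java.sql.Driver) merged line by line
>>>   case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
>>>   // drop duplicate manifests, licenses and maven metadata
>>>   case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>   case x => MergeStrategy.defaultMergeStrategy(x)
>>> }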
>>>
>>> Fyi, I've been trying to consume data out of Kinesis with Spark, with no
>>> success :(
>>> Would be curious to know if you got it working.
>>>
>>> Vadim
>>>
>>> On Apr 13, 2015, at 9:36 PM, Mike Trienis <mike.trie...@orcsol.com>
>>> wrote:
>>>
>>> Hi All,
>>>
>>> I am having trouble building a fat jar file through sbt-assembly.
>>>
>>> [warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
>>> [warn] Merging 'META-INF/NOTICE' with strategy 'rename'
>>> [warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
>>> [warn] Merging 'META-INF/LICENSE' with strategy 'rename'
>>> [warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
>>> [warn] Merging
>>> 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.properties' with
>>> strategy 'discard'
>>> [warn] Merging
>>> 'META-INF/maven/com.thoughtworks.paranamer/paranamer/pom.xml' with strategy
>>> 'discard'
>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.properties'
>>> with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-dbcp/commons-dbcp/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.properties'
>>> with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/commons-pool/commons-pool/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.properties' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/joda-time/joda-time/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.properties' with strategy
>>> 'discard'
>>> [warn] Merging 'META-INF/maven/log4j/log4j/pom.xml' with strategy
>>> 'discard'
>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.properties'
>>> with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.joda/joda-convert/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.properties' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-api/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.properties'
>>> with strategy 'discard'
>>> [warn] Merging 'META-INF/maven/org.slf4j/slf4j-log4j12/pom.xml' with
>>> strategy 'discard'
>>> [warn] Merging 'META-INF/services/java.sql.Driver' with strategy
>>> 'filterDistinctLines'
>>> [warn] Merging 'rootdoc.txt' with strategy 'concat'
>>> [warn] Strategy 'concat' was applied to a file
>>> [warn] Strategy 'discard' was applied to 17 files
>>> [warn] Strategy 'filterDistinctLines' was applied to a file
>>> [warn] Strategy 'rename' was applied to 4 files
>>>
>>> When submitting the Spark application through the command
>>>
>>> sh /usr/lib/spark/bin/spark-submit --class com.xxx.ExampleClassName
>>> target/scala-2.10/xxxx-snapshot.jar
>>>
>>> I end up with the following error:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/joda/time/format/DateTimeFormat
>>> at com.amazonaws.auth.AWS4Signer.<clinit>(AWS4Signer.java:44)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at java.lang.Class.newInstance(Class.java:379)
>>> at com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
>>> at
>>> com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
>>> at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
>>> at
>>> com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
>>> at
>>> com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
>>> at
>>> com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
>>> at
>>> com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
>>> at
>>> com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
>>> at
>>> com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:202)
>>> at
>>> com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:175)
>>> at
>>> com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:155)
>>> at com.quickstatsengine.aws.AwsProvider$.<init>(AwsProvider.scala:20)
>>> at com.quickstatsengine.aws.AwsProvider$.<clinit>(AwsProvider.scala)
>>>
>>> The snippet from my build.sbt file is:
>>>
>>>         "org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
>>>         "org.apache.spark" %% "spark-streaming" % "1.2.0" % "provided",
>>>         "com.datastax.spark" %% "spark-cassandra-connector" %
>>> "1.2.0-alpha1" % "provided",
>>>         "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.2.0" %
>>> "provided",
>>>
>>> And the error is originating from:
>>>
>>> import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>>> import com.amazonaws.services.kinesis.AmazonKinesisClient
>>>
>>> val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
>>>
>>> Am I correct to set spark-streaming-kinesis-asl as a *provided* dependency?
>>> Also, is there a merge strategy I need to apply?
>>>
>>> Any help would be appreciated, Mike.
>>>
>>>
>>>
>>
>
