Re: Spark 2.0: how to use SparkSession and StreamingContext at the same time

2016-07-25 Thread kevin
Thanks a lot, Terry.

2016-07-26 12:03 GMT+08:00 Terry Hoo :

> Kevin,
>
> Try to create the StreamingContext as following:
>
> val ssc = new StreamingContext(spark.sparkContext, Seconds(2))
>
>
>
> On Tue, Jul 26, 2016 at 11:25 AM, kevin  wrote:
>
> >> Hi all,
> >> I want to read data from Kafka, register it as a table, and then join it
> >> with a JDBC table.
> >> My sample is like this:
>>
>> val spark = SparkSession
>>   .builder
>>   .config(sparkConf)
>>   .getOrCreate()
>>
>> val jdbcDF = spark.read.format("jdbc").options(Map("url" ->
>> "jdbc:mysql://master1:3306/demo", "driver" -> "com.mysql.jdbc.Driver",
>> "dbtable" -> "i_user", "user" -> "root", "password" -> "passok")).load()
>> jdbcDF.cache().createOrReplaceTempView("black_book")
>>   val df = spark.sql("select * from black_book")
>>   df.show()
>>
>> val ssc = new StreamingContext(sparkConf, Seconds(2))
>> ssc.checkpoint("checkpoint")
>>
>> val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
>> val lines = KafkaUtils.createStream(ssc, zkQuorum, group,
>> topicMap).map(_._2)
>> val words = lines.flatMap(_.split(" "))
>>
> >> *I got this error:*
>>
>> 16/07/26 11:18:07 WARN AbstractHandler: No Server set for
>> org.spark_project.jetty.server.handler.ErrorHandler@6f0ca692
>> ++++
>> |  id|username|password|
>> ++++
>> |e6faca36-8766-4dc...|   a|   a|
>> |699285a3-a108-457...|   admin| 123|
>> |e734752d-ac98-483...|test|test|
>> |c0245226-128d-487...|   test2|   test2|
>> |4f1bbdb2-89d1-4cc...| 119| 911|
>> |16a9a360-13ee-4b5...|1215|1215|
>> |bf7d6a0d-2949-4c3...|   demo3|   demo3|
>> |de30747c-c466-404...| why| why|
>> |644741c9-8fd7-4a5...|   scala|   p|
>> |cda1e44d-af4b-461...| 123| 231|
>> |6e409ed9-c09b-4e7...| 798|  23|
>> ++++
>>
>> Exception in thread "main" org.apache.spark.SparkException: Only one
>> SparkContext may be running in this JVM (see SPARK-2243). To ignore this
>> error, set spark.driver.allowMultipleContexts = true. The currently running
>> SparkContext was created at:
>>
>> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:749)
>> main.POC$.main(POC.scala:43)
>> main.POC.main(POC.scala)
>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> java.lang.reflect.Method.invoke(Method.java:498)
>>
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> at
>> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2211)
>> at
>> org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2207)
>> at scala.Option.foreach(Option.scala:257)
>> at
>> org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2207)
>> at
>> org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2277)
> >> at org.apache.spark.SparkContext.<init>(SparkContext.scala:91)
>> at
>> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:837)
>> at
> >> org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:84)
>> at main.POC$.main(POC.scala:50)
>> at main.POC.main(POC.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>>
>
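
For reference, a minimal sketch of the pattern Terry suggests: build the SparkSession first and pass its existing SparkContext to the StreamingContext, so that only one SparkContext ever exists in the JVM. The sparkConf, checkpoint directory, and batch interval come from Kevin's snippet; everything else is illustrative.

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession
  .builder
  .config(sparkConf)
  .getOrCreate()

// Reuse the SparkContext the SparkSession already created instead of
// letting StreamingContext construct a second one.
val ssc = new StreamingContext(spark.sparkContext, Seconds(2))
ssc.checkpoint("checkpoint")

// ... define the Kafka DStream and queries as in the original code, then:
ssc.start()
ssc.awaitTermination()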


Spark 2.0: how to use SparkSession and StreamingContext at the same time

2016-07-25 Thread kevin
Hi all,
I want to read data from Kafka, register it as a table, and then join it with a
JDBC table.
My sample is like this:

val spark = SparkSession
  .builder
  .config(sparkConf)
  .getOrCreate()

val jdbcDF = spark.read.format("jdbc").options(Map("url" ->
"jdbc:mysql://master1:3306/demo", "driver" -> "com.mysql.jdbc.Driver",
"dbtable" -> "i_user", "user" -> "root", "password" -> "passok")).load()
jdbcDF.cache().createOrReplaceTempView("black_book")
  val df = spark.sql("select * from black_book")
  df.show()

val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("checkpoint")

val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group,
topicMap).map(_._2)
val words = lines.flatMap(_.split(" "))

*I got this error:*

16/07/26 11:18:07 WARN AbstractHandler: No Server set for
org.spark_project.jetty.server.handler.ErrorHandler@6f0ca692
++++
|  id|username|password|
++++
|e6faca36-8766-4dc...|   a|   a|
|699285a3-a108-457...|   admin| 123|
|e734752d-ac98-483...|test|test|
|c0245226-128d-487...|   test2|   test2|
|4f1bbdb2-89d1-4cc...| 119| 911|
|16a9a360-13ee-4b5...|1215|1215|
|bf7d6a0d-2949-4c3...|   demo3|   demo3|
|de30747c-c466-404...| why| why|
|644741c9-8fd7-4a5...|   scala|   p|
|cda1e44d-af4b-461...| 123| 231|
|6e409ed9-c09b-4e7...| 798|  23|
++++

Exception in thread "main" org.apache.spark.SparkException: Only one
SparkContext may be running in this JVM (see SPARK-2243). To ignore this
error, set spark.driver.allowMultipleContexts = true. The currently running
SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:749)
main.POC$.main(POC.scala:43)
main.POC.main(POC.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at
org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2211)
at
org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$2.apply(SparkContext.scala:2207)
at scala.Option.foreach(Option.scala:257)
at
org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2207)
at
org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:91)
at
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:837)
at
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:84)
at main.POC$.main(POC.scala:50)
at main.POC.main(POC.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Re: Spark 2.0 can't run SqlNetworkWordCount

2016-07-25 Thread kevin
Thanks a lot. After changing to Scala 2.11, it works.

2016-07-25 17:40 GMT+08:00 Tomasz Gawęda :

> Hi,
>
> Please change the Scala version to 2.11. As far as I know, Spark packages are
> now built with Scala 2.11, and you've got the other - 2.10 - version.
>
>
>
> --
> *From:* kevin 
> *Sent:* 25 July 2016 11:33
> *To:* user.spark; dev.spark
> *Subject:* spark2.0 can't run SqlNetworkWordCount
>
> Hi all,
> I downloaded the Spark 2.0 pre-built package. I can run the SqlNetworkWordCount
> test using:
> bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount
> master1 
>
> But when I take the Spark 2.0 example source code SqlNetworkWordCount.scala,
> build it into a jar with dependencies (JDK 1.8 and Scala 2.10),
> and run it with spark-submit, I get this error:
>
> 16/07/25 17:28:30 INFO scheduler.JobScheduler: Starting job streaming job
> 146943891 ms.0 from job set of time 146943891 ms
> Exception in thread "streaming-job-executor-2"
> java.lang.NoSuchMethodError:
> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
> at
> main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:67)
> at
> main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:61)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
> at
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
> at
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
> at scala.util.Try$.apply(Try.scala:192)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
> at
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:247)
> at
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
> at
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:246)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
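
For reference, a minimal sbt sketch of the fix Tomasz describes: build the application with Scala 2.11 to match the pre-built Spark 2.0 packages. The exact artifact versions below are assumptions; adjust them to the release actually in use.

// build.sbt: match the Scala version of the pre-built Spark 2.0 binaries
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"       % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
)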


Re: Outer Explode needed

2016-07-25 Thread Michael Armbrust
I don't think this would be hard to implement.  The physical explode
operator supports it (for our HiveQL compatibility).

Perhaps comment on this JIRA?
https://issues.apache.org/jira/browse/SPARK-13721

It could probably just be another argument to explode()

Michael

On Mon, Jul 25, 2016 at 6:12 PM, Don Drake  wrote:

> I got no response on the Users list, so I thought I would repost here.
>
> See below.
>
> -Don
>
> -- Forwarded message --
> From: Don Drake 
> Date: Sun, Jul 24, 2016 at 2:18 PM
> Subject: Outer Explode needed
> To: user 
>
>
> I have a nested data structure (an array of structures) that I'm flattening
> with the DSL df.explode() API. However, when the array is
> empty, I'm not getting the rest of the row in my output, as it is skipped.
>
> This is the intended behavior, and Hive supports a SQL "OUTER explode()"
> to generate the row when the explode would not yield any output.
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
>
> Can we get this same outer explode in the DSL?  I have to jump through
> some outer join hoops to get the rows where the array is empty.
>
> Thanks.
>
> -Don
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake 
> 800-733-2143
>
>
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake 
> 800-733-2143
>
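
For readers who need Don's workaround today, a rough sketch of the outer-join approach in the DataFrame DSL. The column names "id" and "items" are illustrative, and the sketch assumes a key column that uniquely identifies each input row.

import org.apache.spark.sql.functions._

// explode() drops the rows whose array is empty ...
val exploded = df.select(col("id"), explode(col("items")).as("item"))

// ... so join back on the original keys to re-attach those rows with a null item.
val withEmptyRows = df.select("id").join(exploded, Seq("id"), "left_outer")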


Re: where can I find spark-streaming-kafka for Spark 2.0

2016-07-25 Thread kevin
Thank you. I can't find a spark-streaming-kafka_2.10 jar for Spark 2 on Maven
Central, so I tried version 1.6.2, but it doesn't work: it needs the class
org.apache.spark.Logging, which can't be found in Spark 2. So I built the
spark-streaming-kafka_2.10
jar for Spark 2 from the source code. It works now.

2016-07-26 2:12 GMT+08:00 Cody Koeninger :

> For 2.0, the kafka dstream support is in two separate subprojects
> depending on which version of Kafka you are using
>
> spark-streaming-kafka-0-10
> or
> spark-streaming-kafka-0-8
>
> corresponding to brokers that are version 0.10+ or 0.8+
>
> On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin  wrote:
> > The presentation at Spark Summit SF was probably referring to Structured
> > Streaming. The existing Spark Streaming (dstream) in Spark 2.0 has the
> same
> > production stability level as Spark 1.6. There is also Kafka 0.10
> support in
> > dstream.
> >
> > On July 25, 2016 at 10:26:49 AM, Andy Davidson
> > (a...@santacruzintegration.com) wrote:
> >
> > Hi Kevin
> >
> > Just a heads up at the recent spark summit in S.F. There was a
> presentation
> > about streaming in 2.0. They said that streaming was not going to
> production
> > ready in 2.0.
> >
> > I am not sure if the older 1.6.x version will be supported. My project
> will
> > not be able to upgrade with streaming support. We also use kafka
> >
> > Andy
> >
> > From: Marco Mistroni 
> > Date: Monday, July 25, 2016 at 2:33 AM
> > To: kevin 
> > Cc: "user @spark" , "dev.spark"
> > 
> > Subject: Re: where I can find spark-streaming-kafka for spark2.0
> >
> > Hi Kevin
> >   you should not need to rebuild everything.
> > Instead, i believe you should launch spark-submit by specifying the kafka
> > jar file in your --packages... i had to follow same when integrating
> spark
> > streaming with flume
> >
> >   have you checked this link ?
> > https://spark.apache.org/docs/latest/streaming-kafka-integration.html
> >
> >
> > hth
> >
> >
> >
> > On Mon, Jul 25, 2016 at 10:20 AM, kevin  wrote:
> >>
> >> I have compile it from source code
> >>
> >> 2016-07-25 12:05 GMT+08:00 kevin :
> >>>
> >>> hi,all :
> >>> I try to run example
> org.apache.spark.examples.streaming.KafkaWordCount ,
> >>> I got error :
> >>> Exception in thread "main" java.lang.NoClassDefFoundError:
> >>> org/apache/spark/streaming/kafka/KafkaUtils$
> >>> at
> >>>
> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
> >>> at
> >>>
> org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> at
> >>>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>> at
> >>>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>> at java.lang.reflect.Method.invoke(Method.java:498)
> >>> at
> >>>
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
> >>> at
> >>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> >>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> >>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> >>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >>> Caused by: java.lang.ClassNotFoundException:
> >>> org.apache.spark.streaming.kafka.KafkaUtils$
> >>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>> ... 11 more
> >>>
> >>> so where I can find spark-streaming-kafka for spark2.0
> >>
> >>
> >
>
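
As a concrete illustration of Cody's answer, the 0.8 connector can be declared like any other dependency, or fetched by spark-submit with --packages. The 2.0.0 version number and _2.11 suffix below are assumptions; match them to the Spark build being used.

// build.sbt
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"

// or, without rebuilding anything, let spark-submit resolve it at launch time:
//   spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 ...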


Fwd: Outer Explode needed

2016-07-25 Thread Don Drake
I got no response on the Users list, so I thought I would repost here.

See below.

-Don
-- Forwarded message --
From: Don Drake 
Date: Sun, Jul 24, 2016 at 2:18 PM
Subject: Outer Explode needed
To: user 


I have a nested data structure (an array of structures) that I'm flattening with
the DSL df.explode() API. However, when the array is empty,
I'm not getting the rest of the row in my output, as it is skipped.

This is the intended behavior, and Hive supports a SQL "OUTER explode()" to
generate the row when the explode would not yield any output.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

Can we get this same outer explode in the DSL?  I have to jump through some
outer join hoops to get the rows where the array is empty.

Thanks.

-Don

-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake 
800-733-2143



-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake 
800-733-2143


Re: Potential Change in Kafka's Partition Assignment Semantics when Subscription Changes

2016-07-25 Thread Cody Koeninger
This seems really low risk to me.  In order to be impacted, it'd have
to be someone who was using the kafka integration in spark 2.0, which
isn't even officially released yet.

On Mon, Jul 25, 2016 at 7:23 PM, Vahid S Hashemian
 wrote:
> Sorry, I meant to ask if any Apache Spark user would be affected.
>
> --Vahid
>
>
>
> From: Vahid S Hashemian/Silicon Valley/IBM@IBMUS
> To: u...@spark.apache.org, dev@spark.apache.org
> Date: 07/25/2016 05:21 PM
> Subject: Potential Change in Kafka's Partition Assignment Semantics
> when Subscription Changes
> 
>
>
>
> Hello,
>
> We have started a KIP under the Kafka project that proposes a fix for an
> inconsistency in how partition assignments are currently handled in Kafka
> when the consumer changes subscription. Note that this applies to new
> consumer only.
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change
>
> The compatibility section
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-70%3A+Revise+Partition+Assignment+Semantics+on+New+Consumer%27s+Subscription+Change#KIP-70:RevisePartitionAssignmentSemanticsonNewConsumer'sSubscriptionChange-Compatibility,Deprecation,andMigrationPlan)
> describes impacted users.
>
> We would like to know if any Apache Storm user would be affected by this
> change. Thanks.
>
> Regards,
> --Vahid
>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Cartesian join between DataFrames

2016-07-25 Thread Nicholas Chammas
Oh, sorry, you’re right. I looked at the doc for join()
and didn’t realize you could do a cartesian join. But it turns out that
df1.join(df2) does the job and matches the SQL equivalent too.

On Mon, Jul 25, 2016 at 6:45 PM Reynold Xin  wrote:

> DataFrame can do cartesian joins.
>
>
> On July 25, 2016 at 3:43:19 PM, Nicholas Chammas (
> nicholas.cham...@gmail.com) wrote:
>
> It appears that RDDs can do a cartesian join, but not DataFrames. Is there
> a fundamental reason why not, or is this just waiting for someone to
> implement?
>
> I know you can get the RDDs underlying the DataFrames and do the cartesian
> join that way, but you lose the schema of course.
>
> Nick
>
>
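
A minimal sketch of what Nick describes, assuming a SparkSession named spark is already in scope; the tiny example DataFrames are illustrative.

import spark.implicits._

val df1 = Seq(1, 2, 3).toDF("a")
val df2 = Seq("x", "y").toDF("b")

// join() with no condition yields the cartesian product (3 x 2 = 6 rows).
// Depending on the Spark version, spark.sql.crossJoin.enabled may need to be true.
val crossed = df1.join(df2)

// The SQL equivalent:
df1.createOrReplaceTempView("t1")
df2.createOrReplaceTempView("t2")
val crossedSql = spark.sql("SELECT * FROM t1 CROSS JOIN t2")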


Re: Cartesian join between DataFrames

2016-07-25 Thread Reynold Xin
DataFrame can do cartesian joins.


On July 25, 2016 at 3:43:19 PM, Nicholas Chammas (nicholas.cham...@gmail.com)
wrote:

It appears that RDDs can do a cartesian join, but not DataFrames. Is there
a fundamental reason why not, or is this just waiting for someone to
implement?

I know you can get the RDDs underlying the DataFrames and do the cartesian
join that way, but you lose the schema of course.

Nick


Cartesian join between DataFrames

2016-07-25 Thread Nicholas Chammas
It appears that RDDs can do a cartesian join, but not DataFrames. Is there
a fundamental reason why not, or is this just waiting for someone to
implement?

I know you can get the RDDs underlying the DataFrames and do the cartesian
join that way, but you lose the schema of course.

Nick


[build system] jenkins downtime friday afternoon, july 29th 2016

2016-07-25 Thread shane knapp
around 1pm  friday, july 29th, we will be taking jenkins down for a
rack move and celebrating national systems administrator day.

the outage should only last a couple of hours at most, and will be
concluded with champagne toasts.

yes, the outage and holiday are real, but the champagne in the colo is
not...  ;)

shane

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-25 Thread Luciano Resende
When are we planning to push the release Maven artifacts? We are waiting
for this in order to push an official Apache Bahir release supporting Spark
2.0.

On Sat, Jul 23, 2016 at 7:05 AM, Reynold Xin  wrote:

> The vote has passed with the following +1 votes and no -1 votes. I will
> work on packaging the new release next week.
>
>
> +1
>
> Reynold Xin*
> Sean Owen*
> Shivaram Venkataraman*
> Jonathan Kelly
> Joseph E. Gonzalez*
> Krishna Sankar
> Dongjoon Hyun
> Ricardo Almeida
> Joseph Bradley*
> Matei Zaharia*
> Luciano Resende
> Holden Karau
> Michael Armbrust*
> Felix Cheung
> Suresh Thalamati
> Kousuke Saruta
> Xiao Li
>
>
> * binding votes
>
>
> On July 19, 2016 at 7:35:19 PM, Reynold Xin (r...@databricks.com) wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: where can I find spark-streaming-kafka for Spark 2.0

2016-07-25 Thread Cody Koeninger
For 2.0, the kafka dstream support is in two separate subprojects
depending on which version of Kafka you are using

spark-streaming-kafka-0-10
or
spark-streaming-kafka-0-8

corresponding to brokers that are version 0.10+ or 0.8+

On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin  wrote:
> The presentation at Spark Summit SF was probably referring to Structured
> Streaming. The existing Spark Streaming (dstream) in Spark 2.0 has the same
> production stability level as Spark 1.6. There is also Kafka 0.10 support in
> dstream.
>
> On July 25, 2016 at 10:26:49 AM, Andy Davidson
> (a...@santacruzintegration.com) wrote:
>
> Hi Kevin
>
> Just a heads up at the recent spark summit in S.F. There was a presentation
> about streaming in 2.0. They said that streaming was not going to be production
> ready in 2.0.
>
> I am not sure if the older 1.6.x version will be supported. My project will
> not be able to upgrade with streaming support. We also use kafka
>
> Andy
>
> From: Marco Mistroni 
> Date: Monday, July 25, 2016 at 2:33 AM
> To: kevin 
> Cc: "user @spark" , "dev.spark"
> 
> Subject: Re: where I can find spark-streaming-kafka for spark2.0
>
> Hi Kevin
>   you should not need to rebuild everything.
> Instead, i believe you should launch spark-submit by specifying the kafka
> jar file in your --packages... i had to follow same when integrating spark
> streaming with flume
>
>   have you checked this link ?
> https://spark.apache.org/docs/latest/streaming-kafka-integration.html
>
>
> hth
>
>
>
> On Mon, Jul 25, 2016 at 10:20 AM, kevin  wrote:
>>
>> I have compile it from source code
>>
>> 2016-07-25 12:05 GMT+08:00 kevin :
>>>
>>> hi,all :
>>> I try to run example org.apache.spark.examples.streaming.KafkaWordCount ,
>>> I got error :
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/spark/streaming/kafka/KafkaUtils$
>>> at
>>> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
>>> at
>>> org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.spark.streaming.kafka.KafkaUtils$
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> ... 11 more
>>>
>>> so where I can find spark-streaming-kafka for spark2.0
>>
>>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Nested/Chained case statements generate codegen over 64k exception

2016-07-25 Thread Jonathan Gray
I came back to this to try and investigate further using the latest version
of the project. However, I don't have enough experience with the code base
to fully understand what is now happening; could someone take a look at the
test case attached to this JIRA and run it on the latest version of the code
base?

It currently appears that one branch of the code receives the code
compilation exception and so applies the fallback. However, subsequently a
similar exception is thrown for different branches of the code (does the
non-compilable code get put into a cache somewhere?). So, where it should
now be falling back to non-codegen, it doesn't appear to do so completely.

On 19 May 2016 at 09:25, Jonathan Gray  wrote:

> That makes sense, I will take a look there first. That will at least give
> a clearer understanding of the problem space to determine when to fall back.
> On 15 May 2016 3:02 am, "Reynold Xin"  wrote:
>
>> It might be best to fix this with fallback first, and then figure out how
>> we can do it more intelligently.
>>
>>
>>
>> On Sat, May 14, 2016 at 2:29 AM, Jonathan Gray 
>> wrote:
>>
>>> Hi,
>>>
>>> I've raised JIRA SPARK-15258 (with code attached to re-produce problem)
>>> and would like to have a go at fixing it but don't really know where to
>>> start.  Could anyone provide some pointers?
>>>
>>> I've looked at the code associated with SPARK-13242 but was hoping to
>>> find a way to avoid the codegen fallback.  Is this something that is
>>> possible?
>>>
>>> Thanks,
>>> Jon
>>>
>>
>>
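
For readers unfamiliar with the issue, a rough illustration (not the actual SPARK-15258 test case) of the kind of deeply chained CASE WHEN expression that can push a single generated method past the 64KB limit; it assumes a DataFrame df with an integer column x.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Build a long chain of nested when/otherwise expressions. With enough branches,
// the generated Java for this one expression can exceed 64KB and fail to compile,
// at which point the engine is expected to fall back to interpreted evaluation.
val bigCase: Column = (1 to 500).foldLeft(lit(null).cast("string")) { (acc, i) =>
  when(col("x") === i, lit(s"bucket_$i")).otherwise(acc)
}

val bucketed = df.select(bigCase.as("bucket"))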


Re: where can I find spark-streaming-kafka for Spark 2.0

2016-07-25 Thread Reynold Xin
The presentation at Spark Summit SF was probably referring to Structured
Streaming. The existing Spark Streaming (dstream) in Spark 2.0 has the same
production stability level as Spark 1.6. There is also Kafka 0.10 support
in dstream.

On July 25, 2016 at 10:26:49 AM, Andy Davidson (
a...@santacruzintegration.com) wrote:

Hi Kevin

Just a heads up at the recent spark summit in S.F. There was a presentation
about streaming in 2.0. They said that streaming was not going to be
production ready in 2.0.

I am not sure if the older 1.6.x version will be supported. My project will
not be able to upgrade with streaming support. We also use kafka

Andy

From: Marco Mistroni 
Date: Monday, July 25, 2016 at 2:33 AM
To: kevin 
Cc: "user @spark" , "dev.spark" 
Subject: Re: where I can find spark-streaming-kafka for spark2.0

Hi Kevin
  you should not need to rebuild everything.
Instead, i believe you should launch spark-submit by specifying the kafka
jar file in your --packages... i had to follow same when integrating spark
streaming with flume

  have you checked this link ?
https://spark.apache.org/docs/latest/streaming-kafka-integration.html


hth



On Mon, Jul 25, 2016 at 10:20 AM, kevin  wrote:

> I have compile it from source code
>
> 2016-07-25 12:05 GMT+08:00 kevin :
>
>> hi,all :
>> I try to run example org.apache.spark.examples.streaming.KafkaWordCount ,
>> I got error :
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/spark/streaming/kafka/KafkaUtils$
>> at
>> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
>> at
>> org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.streaming.kafka.KafkaUtils$
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 11 more
>>
>> so where I can find spark-streaming-kafka for spark2.0
>>
>
>


get hdfs file path in spark

2016-07-25 Thread Yang Cao
Hi,
I'm new here and hope to get some assistance from you. I wonder whether there is
an elegant way to get a directory under a given path. For example, I have a path
on HDFS like /a/b/c/d/e/f, and I am given /a/b/c; is there any straightforward
way to get the path /a/b/c/d/e? I think I can do it with the help of a regex, but
I still hope to find an easier way that makes my code cleaner. My environment:
Spark 1.6, language: Scala.


Thx
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
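
One possible answer, sketched under the assumption that stripping the last path component is what's wanted: Hadoop's own Path class already does this, so no regex is needed. The listing variant in the comments assumes a SparkContext named sc.

import org.apache.hadoop.fs.Path

val full = new Path("/a/b/c/d/e/f")
val parent = full.getParent   // Path("/a/b/c/d/e")

// To list the directories that actually exist under /a/b/c on HDFS instead:
// val fs = org.apache.hadoop.fs.FileSystem.get(sc.hadoopConfiguration)
// val dirs = fs.listStatus(new Path("/a/b/c")).filter(_.isDirectory).map(_.getPath)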



Spark RC5 - OutOfMemoryError: Requested array size exceeds VM limit

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi,

I am running some TPC-DS queries (the data is Parquet stored in HDFS) with Spark 2.0
RC5, and for some queries I get this OOM:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:214)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Can you please provide some hints on how to avoid/fix it?

Thank you!

Best,
Ovidiu

Re: orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Thank you! Any chance of this work being reviewed and integrated into the next
Spark release?

Best,
Ovidiu
> On 25 Jul 2016, at 12:20, Hyukjin Kwon  wrote:
> 
> For the question 1., It is possible but not supported yet. Please refer 
> https://github.com/apache/spark/pull/13775 
> 
> 
> Thanks!
> 
> 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU 
> <ovidiu-cristian.ma...@inria.fr>:
> Hi,
> 
> Assuming I have some data in both ORC/Parquet formats, and some complex 
> workflow that eventually combine results of some queries on these datasets, I 
> would like to get the best execution and looking at the default configs I 
> noticed:
> 
> 1) Vectorized query execution possible with Parquet only, can you confirm 
> this is possible with the ORC format?
> 
> parameter spark.sql.parquet.enableVectorizedReader
> [1] 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
>  
> 
> Hive is assuming ORC, parameter hive.vectorized.execution.enabled
> [2] 
> https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution 
> 
> 
> 2) Enabling filter pushdown is by default true for Parquet only, why not also 
> for ORC?
> spark.sql.parquet.filterPushdown=true
> spark.sql.orc.filterPushdown=false
> 
> 3) Should I even try to process ORC format with Spark at it seems there is 
> Parquet native support?
> 
> 
> Thank you!
> 
> Best,
> Ovidiu
> 



Re: orc/parquet sql conf

2016-07-25 Thread Hyukjin Kwon
For question 1, it is possible but not supported yet. Please refer to
https://github.com/apache/spark/pull/13775

Thanks!

2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr>:

> Hi,
>
> Assuming I have some data in both ORC/Parquet formats, and some complex
> workflow that eventually combine results of some queries on these datasets,
> I would like to get the best execution and looking at the default configs I
> noticed:
>
> 1) Vectorized query execution possible with Parquet only, can you confirm
> this is possible with the ORC format?
>
> parameter spark.sql.parquet.enableVectorizedReader
> [1]
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
> Hive is assuming ORC, parameter hive.vectorized.execution.enabled
> [2]
> https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
>
> 2) Enabling filter pushdown is by default true for Parquet only, why not
> also for ORC?
> spark.sql.parquet.filterPushdown=true
> spark.sql.orc.filterPushdown=false
>
> 3) Should I even try to process ORC format with Spark at it seems there is
> Parquet native support?
>
>
> Thank you!
>
> Best,
> Ovidiu
>


orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi,

Assuming I have some data in both ORC and Parquet formats, and some complex
workflow that eventually combines the results of some queries on these datasets, I
would like to get the best execution. Looking at the default configs, I
noticed:

1) Vectorized query execution seems possible with Parquet only; can you confirm
whether this is possible with the ORC format?

parameter spark.sql.parquet.enableVectorizedReader
[1] 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 

Hive is assuming ORC, parameter hive.vectorized.execution.enabled
[2] https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution 


2) Filter pushdown is enabled by default for Parquet only; why not also
for ORC?
spark.sql.parquet.filterPushdown=true
spark.sql.orc.filterPushdown=false

3) Should I even try to process the ORC format with Spark, as it seems there is
native Parquet support?


Thank you!

Best,
Ovidiu

Re: Spark 2.0 can't run SqlNetworkWordCount

2016-07-25 Thread Tomasz Gawęda
Hi,

Please change the Scala version to 2.11. As far as I know, Spark packages are now
built with Scala 2.11, and you've got the other - 2.10 - version.



From: kevin 
Sent: 25 July 2016 11:33
To: user.spark; dev.spark
Subject: spark2.0 can't run SqlNetworkWordCount

Hi all,
I downloaded the Spark 2.0 pre-built package. I can run the SqlNetworkWordCount test using:
bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount master1 

But when I take the Spark 2.0 example source code SqlNetworkWordCount.scala and build
it into a jar with dependencies (JDK 1.8 and Scala 2.10),
then run it with spark-submit, I get this error:

16/07/25 17:28:30 INFO scheduler.JobScheduler: Starting job streaming job 
146943891 ms.0 from job set of time 146943891 ms
Exception in thread "streaming-job-executor-2" java.lang.NoSuchMethodError: 
scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:67)
at main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:61)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at 
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:247)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:246)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)






Spark 2.0 can't run SqlNetworkWordCount

2016-07-25 Thread kevin
Hi all,
I downloaded the Spark 2.0 pre-built package. I can run the SqlNetworkWordCount
test using:
bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount
master1 

But when I take the Spark 2.0 example source code SqlNetworkWordCount.scala and
build it into a jar with dependencies (JDK 1.8 and Scala 2.10),
then run it with spark-submit, I get this error:

16/07/25 17:28:30 INFO scheduler.JobScheduler: Starting job streaming job
146943891 ms.0 from job set of time 146943891 ms
Exception in thread "streaming-job-executor-2" java.lang.NoSuchMethodError:
scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at
main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:67)
at
main.SqlNetworkWordCount$$anonfun$main$1.apply(SqlNetworkWordCount.scala:61)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:247)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:246)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Re: where can I find spark-streaming-kafka for Spark 2.0

2016-07-25 Thread kevin
I have compiled it from the source code.

2016-07-25 12:05 GMT+08:00 kevin :

> Hi all,
> I tried to run the example org.apache.spark.examples.streaming.KafkaWordCount and
> I got this error:
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/streaming/kafka/KafkaUtils$
> at
> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
> at
> org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.spark.streaming.kafka.KafkaUtils$
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 11 more
>
> so where can I find spark-streaming-kafka for Spark 2.0?
>