Re: spark 2.0 readStream from a REST API

2016-08-11 Thread Sela, Amit
The currently available output modes are Complete and Append. Complete mode is 
for stateful processing (aggregations), and Append mode is for stateless 
processing (i.e., map/filter). See: 
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes
Dataset#writeStream produces a DataStreamWriter, which lets you start a 
query. This seems consistent with Spark’s previous behaviour of only executing 
upon an “action”, and the queries, I guess, are what “jobs” used to be.
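In code, the two modes pair with the two kinds of pipelines roughly like this (a sketch, assuming an active SparkSession `spark` and a streaming Dataset `events` already obtained via readStream; all names here are illustrative, not from the thread):

```scala
import spark.implicits._

// Stateless pipeline (map/filter): Append mode emits only newly arrived rows.
val appendQuery = events
  .filter($"value".isNotNull)
  .writeStream
  .outputMode("append")
  .format("console")
  .start()

// Stateful pipeline (aggregation): Complete mode re-emits the full result table.
val completeQuery = events
  .groupBy($"value")
  .count()
  .writeStream
  .outputMode("complete")
  .format("console")
  .start()
```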


Thanks,
Amit

From: Ayoub Benali 
<benali.ayoub.i...@gmail.com<mailto:benali.ayoub.i...@gmail.com>>
Date: Tuesday, August 2, 2016 at 11:59 AM
To: user <user@spark.apache.org<mailto:user@spark.apache.org>>
Cc: Jacek Laskowski <ja...@japila.pl<mailto:ja...@japila.pl>>, Amit Sela 
<amitsel...@gmail.com<mailto:amitsel...@gmail.com>>, Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>>
Subject: Re: spark 2.0 readStream from a REST API

Why is writeStream needed to consume the data?

When I tried it I got this exception:

INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
org.apache.spark.sql.AnalysisException: Complete output mode not supported when 
there are no streaming aggregations on streaming DataFrames/Datasets;
at 
org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
at 
org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForStreaming(UnsupportedOperationChecker.scala:65)
at 
org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:236)
at 
org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:287)
at .(:59)



2016-08-01 18:44 GMT+02:00 Amit Sela 
<amitsel...@gmail.com<mailto:amitsel...@gmail.com>>:
I think you're missing:

val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

Did it help?

On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski 
<ja...@japila.pl<mailto:ja...@japila.pl>> wrote:
On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
<benali.ayoub.i...@gmail.com<mailto:benali.ayoub.i...@gmail.com>> wrote:

> the problem now is that when I consume the dataframe for example with count
> I get the stack trace below.

Mind sharing the entire pipeline?

> I followed the implementation of TextSocketSourceProvider to implement my
> data source and Text Socket source is used in the official documentation
> here.

Right. Completely forgot about the provider. Thanks for reminding me about it!

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: 
user-unsubscr...@spark.apache.org<mailto:user-unsubscr...@spark.apache.org>




Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Jacek Laskowski
On Tue, Aug 2, 2016 at 10:59 AM, Ayoub Benali
 wrote:
> Why is writeStream needed to consume the data?

You need to start your structured streaming query, and there's no way
to do that without a DataStreamWriter, so writeStream is a must.

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.streaming.DataStreamWriter
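A minimal end-to-end sketch of that cycle (assuming `spark` is a SparkSession and "mysource" stands in for the custom format under discussion):

```scala
// readStream only builds a lazy plan; nothing executes yet.
val streamingDF = spark.readStream
  .format("mysource")
  .load()

// writeStream yields the DataStreamWriter; start() actually launches the query.
val query = streamingDF.writeStream
  .outputMode("append")
  .format("console")
  .start()

// Keep the driver alive while the query runs.
query.awaitTermination()
```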

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski




Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Ayoub Benali
Why is writeStream needed to consume the data?

When I tried it I got this exception:

INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
> org.apache.spark.sql.AnalysisException: Complete output mode not supported
> when there are no streaming aggregations on streaming DataFrames/Datasets;
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForStreaming(UnsupportedOperationChecker.scala:65)
> at
> org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:236)
> at
> org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:287)
> at .(:59)




2016-08-01 18:44 GMT+02:00 Amit Sela :

> I think you're missing:
>
> val query = wordCounts.writeStream
>
>   .outputMode("complete")
>   .format("console")
>   .start()
>
> Did it help?
>
> On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski  wrote:
>
>> On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
>>  wrote:
>>
>> > the problem now is that when I consume the dataframe for example with
>> count
>> > I get the stack trace below.
>>
>> Mind sharing the entire pipeline?
>>
>> > I followed the implementation of TextSocketSourceProvider to implement
>> my
>> > data source and Text Socket source is used in the official documentation
>> > here.
>>
>> Right. Completely forgot about the provider. Thanks for reminding me
>> about it!
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>>


Re: spark 2.0 readStream from a REST API

2016-08-02 Thread Ayoub Benali
Hello,

here is the code I am trying to run:


https://gist.github.com/ayoub-benali/a96163c711b4fce1bdddf16b911475f2

Thanks,
Ayoub.

2016-08-01 13:44 GMT+02:00 Jacek Laskowski :

> On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
>  wrote:
>
> > the problem now is that when I consume the dataframe for example with
> count
> > I get the stack trace below.
>
> Mind sharing the entire pipeline?
>
> > I followed the implementation of TextSocketSourceProvider to implement my
> > data source and Text Socket source is used in the official documentation
> > here.
>
> Right. Completely forgot about the provider. Thanks for reminding me about
> it!
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>


Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Amit Sela
I think you're missing:

val query = wordCounts.writeStream

  .outputMode("complete")
  .format("console")
  .start()

Did it help?

On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski  wrote:

> On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
>  wrote:
>
> > the problem now is that when I consume the dataframe for example with
> count
> > I get the stack trace below.
>
> Mind sharing the entire pipeline?
>
> > I followed the implementation of TextSocketSourceProvider to implement my
> > data source and Text Socket source is used in the official documentation
> > here.
>
> Right. Completely forgot about the provider. Thanks for reminding me about
> it!
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
>


Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Jacek Laskowski
On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
 wrote:

> the problem now is that when I consume the dataframe for example with count
> I get the stack trace below.

Mind sharing the entire pipeline?

> I followed the implementation of TextSocketSourceProvider to implement my
> data source and Text Socket source is used in the official documentation
> here.

Right. Completely forgot about the provider. Thanks for reminding me about it!

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski




Re: spark 2.0 readStream from a REST API

2016-08-01 Thread Ayoub Benali
Hello,

using the full class name worked, thanks.

the problem now is that when I consume the dataframe for example with count
I get the stack trace below.

I followed the implementation of TextSocketSourceProvider to implement my
data source, and the Text Socket source is used in the official documentation
here.

Why does count work in the documentation example? Is there some other
trait that needs to be implemented?

Thanks,
Ayoub.


org.apache.spark.sql.AnalysisException: Queries with streaming sources must
> be executed with writeStream.start();
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:33)
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:31)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> at
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:31)
> at
> org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:59)
> at
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:70)
> at
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
> at
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
> at
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
> at
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
> at
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
> at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2541)
> at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)







2016-07-31 21:56 GMT+02:00 Michael Armbrust :

> You have to add a file in resources too (example). Either that or give a
> full class name.
>
> On Sun, Jul 31, 2016 at 9:45 AM, Ayoub Benali  > wrote:
>
>> Looks like the way to go in Spark 2.0 is to implement StreamSourceProvider
>> with DataSourceRegister.
>> But now spark fails at loading the class when doing:
>>
>> spark.readStream.format("mysource").load()
>>
>> I get :
>>
>> java.lang.ClassNotFoundException: Failed to find data source: mysource.
>> Please find packages at http://spark-packages.org
>>
>> Is there something I need to do in order to "load" the Stream source
>> provider ?
>>
>> Thanks,
>> Ayoub
>>
>> 2016-07-31 17:19 GMT+02:00 Jacek Laskowski :
>>
>>> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>>>  wrote:
>>>
>>> > I started playing with the Structured Streaming API in spark 2.0 and I
>>> am
>>> > looking for a way to create streaming Dataset/Dataframe from a rest
>>> HTTP
>>> > endpoint but I am bit stuck.
>>>
>>> What a great idea! Why did I myself not think about this?!?!
>>>
>>> > What would be the easiest way to hack around it ? Do I need to
>>> implement the
>>> > Datasource API ?
>>>
>>> Yes and perhaps Hadoop API too, but not sure which one exactly since I
>>> haven't even thought about it (not even once).
>>>
>>> > Are there examples on how to create a DataSource from a REST endpoint ?
>>>
>>> Never heard of one.
>>>
>>> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
>>> it as a topic. Thanks a lot!
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> 

Re: spark 2.0 readStream from a REST API

2016-07-31 Thread Jacek Laskowski
Hi,

See 
https://github.com/jaceklaskowski/spark-workshop/tree/master/solutions/spark-mf-format.
There's a custom format that you can use to get started.

Basically, you need to develop the code behind the "mysource" format and
register it using --packages, --jars, or similar when you spark-submit
your Spark application.
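A skeleton of what "the code behind" such a format might look like (hedged: method bodies are elided and the signatures follow the Spark 2.0-era StreamSourceProvider API as I recall it, so check them against the scaladoc before relying on this):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Source
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.StructType

class MySourceProvider extends StreamSourceProvider with DataSourceRegister {

  // Short name that lets spark.readStream.format("mysource") find this class.
  override def shortName(): String = "mysource"

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (shortName(), schema.getOrElse(new StructType()))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    ??? // return a Source that polls the REST endpoint and tracks offsets
}
```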

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Sun, Jul 31, 2016 at 6:45 PM, Ayoub Benali
 wrote:
> Looks like the way to go in spark 2.0 is to implement StreamSourceProvider
> with DataSourceRegister. But now spark fails at loading the class when
> doing:
>
> spark.readStream.format("mysource").load()
>
> I get :
>
> java.lang.ClassNotFoundException: Failed to find data source: mysource.
> Please find packages at http://spark-packages.org
>
> Is there something I need to do in order to "load" the Stream source
> provider ?
>
> Thanks,
> Ayoub
>
> 2016-07-31 17:19 GMT+02:00 Jacek Laskowski :
>>
>> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>>  wrote:
>>
>> > I started playing with the Structured Streaming API in spark 2.0 and I
>> > am
>> > looking for a way to create streaming Dataset/Dataframe from a rest HTTP
>> > endpoint but I am bit stuck.
>>
>> What a great idea! Why did I myself not think about this?!?!
>>
>> > What would be the easiest way to hack around it ? Do I need to implement
>> > the
>> > Datasource API ?
>>
>> Yes and perhaps Hadoop API too, but not sure which one exactly since I
>> haven't even thought about it (not even once).
>>
>> > Are there examples on how to create a DataSource from a REST endpoint ?
>>
>> Never heard of one.
>>
>> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
>> it as a topic. Thanks a lot!
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>
>




Re: spark 2.0 readStream from a REST API

2016-07-31 Thread Michael Armbrust
You have to add a file in resources too (example). Either that or give a
full class name.
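The "file in resources" is a Java ServiceLoader registration file. A sketch, with `com.example.MySourceProvider` standing in for whatever the actual provider class is called (hypothetical name):

```
# src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
com.example.MySourceProvider
```

With that file on the classpath, the provider's shortName() becomes usable in .format("..."); without it, only the fully qualified class name resolves.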

On Sun, Jul 31, 2016 at 9:45 AM, Ayoub Benali 
wrote:

> Looks like the way to go in Spark 2.0 is to implement StreamSourceProvider
> with DataSourceRegister.
> But now spark fails at loading the class when doing:
>
> spark.readStream.format("mysource").load()
>
> I get :
>
> java.lang.ClassNotFoundException: Failed to find data source: mysource.
> Please find packages at http://spark-packages.org
>
> Is there something I need to do in order to "load" the Stream source
> provider ?
>
> Thanks,
> Ayoub
>
> 2016-07-31 17:19 GMT+02:00 Jacek Laskowski :
>
>> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>>  wrote:
>>
>> > I started playing with the Structured Streaming API in spark 2.0 and I
>> am
>> > looking for a way to create streaming Dataset/Dataframe from a rest HTTP
>> > endpoint but I am bit stuck.
>>
>> What a great idea! Why did I myself not think about this?!?!
>>
>> > What would be the easiest way to hack around it ? Do I need to
>> implement the
>> > Datasource API ?
>>
>> Yes and perhaps Hadoop API too, but not sure which one exactly since I
>> haven't even thought about it (not even once).
>>
>> > Are there examples on how to create a DataSource from a REST endpoint ?
>>
>> Never heard of one.
>>
>> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
>> it as a topic. Thanks a lot!
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>
>


Re: spark 2.0 readStream from a REST API

2016-07-31 Thread Ayoub Benali
Looks like the way to go in Spark 2.0 is to implement StreamSourceProvider
with DataSourceRegister.
But now spark fails at loading the class when doing:

spark.readStream.format("mysource").load()

I get :

java.lang.ClassNotFoundException: Failed to find data source: mysource.
Please find packages at http://spark-packages.org

Is there something I need to do in order to "load" the Stream source
provider ?

Thanks,
Ayoub

2016-07-31 17:19 GMT+02:00 Jacek Laskowski :

> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>  wrote:
>
> > I started playing with the Structured Streaming API in spark 2.0 and I am
> > looking for a way to create streaming Dataset/Dataframe from a rest HTTP
> > endpoint but I am bit stuck.
>
> What a great idea! Why did I myself not think about this?!?!
>
> > What would be the easiest way to hack around it ? Do I need to implement
> the
> > Datasource API ?
>
> Yes and perhaps Hadoop API too, but not sure which one exactly since I
> haven't even thought about it (not even once).
>
> > Are there examples on how to create a DataSource from a REST endpoint ?
>
> Never heard of one.
>
> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
> it as a topic. Thanks a lot!
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>