Hello,

Using the full class name worked, thanks.
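
For the archives, this is roughly what worked (the class name below is a
placeholder for my actual provider class):

    spark.readStream.format("com.example.MySourceProvider").load()

If I understand the DataSourceRegister mechanism correctly, the alternative
Michael mentioned is a plain-text resources file

    src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

containing that fully qualified class name, which is what lets the short name
returned by shortName() be used in format() via ServiceLoader.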

The problem now is that when I consume the DataFrame, for example with
count, I get the stack trace below.

I followed the implementation of TextSocketSourceProvider
<https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala#L128>
to implement my data source, and the Text Socket source is the one used in
the official documentation here
<https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example>.
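
Stripped down, my provider looks roughly like this (package and class names
are placeholders, and the actual HTTP polling logic is elided):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.execution.streaming.Source
    import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    class MySourceProvider extends StreamSourceProvider with DataSourceRegister {

      private val mySchema = StructType(StructField("value", StringType) :: Nil)

      // short name usable in spark.readStream.format(...)
      override def shortName(): String = "mysource"

      override def sourceSchema(
          sqlContext: SQLContext,
          schema: Option[StructType],
          providerName: String,
          parameters: Map[String, String]): (String, StructType) =
        (shortName(), mySchema)

      override def createSource(
          sqlContext: SQLContext,
          metadataPath: String,
          schema: Option[StructType],
          providerName: String,
          parameters: Map[String, String]): Source = {
        // a Source whose getBatch(...) polls the REST endpoint and returns a DataFrame
        ???
      }
    }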

Why does count work in the documentation example? Is there some other
trait that needs to be implemented?
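
For reference, here is (roughly) the quick example from that page. Its count
is the groupBy("value").count() transformation, which yields another
streaming Dataset and is only executed through writeStream.start(), not the
Dataset.count() action, which may be the relevant difference:

    import spark.implicits._

    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // count() here is an aggregation, so wordCounts is itself a streaming Dataset
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    // the query only actually runs once it is started through the streaming sink
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()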

Thanks,
Ayoub.


org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:33)
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:31)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:31)
  at org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:59)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:70)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
  at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2541)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)

2016-07-31 21:56 GMT+02:00 Michael Armbrust <mich...@databricks.com>:

> You have to add a file in resources too (example
> <https://github.com/apache/spark/blob/master/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister>).
> Either that, or give the full class name.
>
> On Sun, Jul 31, 2016 at 9:45 AM, Ayoub Benali <benali.ayoub.i...@gmail.com
> > wrote:
>
>> Looks like the way to go in Spark 2.0 is to implement
>> StreamSourceProvider
>> <https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L117>
>>  with DataSourceRegister
>> <https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L40>.
>> But now Spark fails to load the class when doing:
>>
>> spark.readStream.format("mysource").load()
>>
>> I get:
>>
>> java.lang.ClassNotFoundException: Failed to find data source: mysource.
>> Please find packages at http://spark-packages.org
>>
>> Is there something I need to do in order to "load" the stream source
>> provider?
>>
>> Thanks,
>> Ayoub
>>
>> 2016-07-31 17:19 GMT+02:00 Jacek Laskowski <ja...@japila.pl>:
>>
>>> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>>> <benali.ayoub.i...@gmail.com> wrote:
>>>
>>> > I started playing with the Structured Streaming API in Spark 2.0 and I am
>>> > looking for a way to create a streaming Dataset/DataFrame from a REST HTTP
>>> > endpoint, but I am a bit stuck.
>>>
>>> What a great idea! Why did I myself not think about this?!?!
>>>
>>> > What would be the easiest way to hack around it? Do I need to implement
>>> > the DataSource API?
>>>
>>> Yes, and perhaps the Hadoop API too, but I'm not sure which one exactly
>>> since I haven't even thought about it (not even once).
>>>
>>> > Are there examples of how to create a DataSource from a REST endpoint?
>>>
>>> Never heard of one.
>>>
>>> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
>>> it as a topic. Thanks a lot!
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>
>>
>
