Hello, using the full class name worked, thanks.
The problem now is that when I consume the DataFrame, for example with count, I get the stack trace below. I followed the implementation of TextSocketSourceProvider <https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala#L128> to implement my data source, and the text socket source is used in the official documentation here <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example>.

Why does count work in the documentation example? Is there some other trait that needs to be implemented?

Thanks,
Ayoub

org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
    at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
    at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:33)
    at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:31)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
    at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:31)
    at org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:70)
    at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
    at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2541)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)

2016-07-31 21:56 GMT+02:00 Michael Armbrust <mich...@databricks.com>:

> You have to add a file in resources too (example
> <https://github.com/apache/spark/blob/master/sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister>).
> Either that or give the full class name.
>
> On Sun, Jul 31, 2016 at 9:45 AM, Ayoub Benali <benali.ayoub.i...@gmail.com> wrote:
>
>> It looks like the way to go in Spark 2.0 is to implement StreamSourceProvider
>> <https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L117>
>> together with DataSourceRegister
>> <https://github.com/apache/spark/blob/9e2c763dbb5ac6fc5d2eb0759402504d4b9073a4/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L40>.
>> But now Spark fails to load the class when doing:
>>
>>   spark.readStream.format("mysource").load()
>>
>> I get:
>>
>>   java.lang.ClassNotFoundException: Failed to find data source: mysource.
>>   Please find packages at http://spark-packages.org
>>
>> Is there something I need to do in order to "load" the stream source provider?
>>
>> Thanks,
>> Ayoub
>>
>> 2016-07-31 17:19 GMT+02:00 Jacek Laskowski <ja...@japila.pl>:
>>
>>> On Sun, Jul 31, 2016 at 12:53 PM, Ayoub Benali
>>> <benali.ayoub.i...@gmail.com> wrote:
>>>
>>> > I started playing with the Structured Streaming API in Spark 2.0 and I am
>>> > looking for a way to create a streaming Dataset/DataFrame from a REST HTTP
>>> > endpoint, but I am a bit stuck.
>>>
>>> What a great idea! Why did I myself not think about this?!?!
>>>
>>> > What would be the easiest way to hack around it? Do I need to implement
>>> > the Data Source API?
>>>
>>> Yes, and perhaps the Hadoop API too, but I'm not sure which one exactly
>>> since I haven't even thought about it (not even once).
>>>
>>> > Are there examples on how to create a DataSource from a REST endpoint?
>>>
>>> Never heard of one.
>>>
>>> I'm hosting a Spark/Scala meetup this week so I'll definitely propose
>>> it as a topic. Thanks a lot!
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
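[Editor's note] For readers following this thread, here is a minimal sketch of what the discussion converges on, against the Spark 2.0 interfaces linked above. The package and class names (com.example.MySourceProvider), the short name "mysource", and the single-column schema are hypothetical; the actual Source implementation (getOffset/getBatch) is elided, since the thread does not show it. Note also why count fails above: in the quick-example docs, count() is a streaming aggregation (groupBy(...).count()) whose query is launched with writeStream.start(), whereas Dataset.count() is a batch action, which the UnsupportedOperationChecker rejects on a streaming plan.

```scala
package com.example

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Source
import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider sketch: StreamSourceProvider supplies the source,
// DataSourceRegister supplies the short name used in .format(...).
class MySourceProvider extends StreamSourceProvider with DataSourceRegister {

  // Enables spark.readStream.format("mysource") once the provider is
  // registered via META-INF/services (see below) or referenced by full class name.
  override def shortName(): String = "mysource"

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (shortName(), StructType(StructField("value", StringType) :: Nil))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    ??? // the real Source (getOffset/getBatch/stop) goes here

}

// Registration file (in your jar's resources), as Michael's link shows:
//   src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
// containing the one line:
//   com.example.MySourceProvider
//
// Usage sketch from spark-shell, assuming the jar is on the classpath.
// Mirrors the docs quick example: count here is a streaming aggregation,
// and the query is launched with writeStream.start(), not Dataset.count():
//
//   val query = spark.readStream.format("mysource").load()
//     .groupBy("value").count()
//     .writeStream.outputMode("complete").format("console")
//     .start()
```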