[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016213#comment-17016213 ] Sowmya commented on SPARK-18165: Hi [~itsvikramagr] ,Can this library be used in production with spark 2.4 code rather than command line? I tried to use it with spark code(sbt in intellij) I tried to use it with this code and see an error "Cannot find data source kinesis". val spark = SparkSession.builder .master("local") .appName("Spark Kinesis Connector") .getOrCreate() println(spark) val kinesis = spark .readStream .format("kinesis") .option("streamName", "stream name") .option("endpointUrl", "kinesis endpoint") .option("startingposition", "TRIM_HORIZON") .load > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740555#comment-16740555 ] Aman Mundra commented on SPARK-18165: - Hi [~itsvikramagr], any idea when this jar would be coming out as a production feature in spark 2.2 or later? > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711319#comment-16711319 ] Vikram Agrawal commented on SPARK-18165: Hi [~danielil] - right now it is available at https://github.com/qubole/kinesis-sql > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711128#comment-16711128 ] Daniel Haviv commented on SPARK-18165: -- Hi [~itsvikramagr], I couldn't find it in spark-packages.com. In what way is it available? Thanks! > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696303#comment-16696303 ] Vikram Agrawal commented on SPARK-18165: [~piyush9194] - The library is already available for both 2.3 and 2.4. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696217#comment-16696217 ] piyush gupta commented on SPARK-18165: -- [~itsvikramagr]: Can you please tell me when the library for Spark 2.3 will be available? > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494732#comment-16494732 ] Vikram Agrawal commented on SPARK-18165: [~mail2sivan...@gmail.com] - This library has been tested and developed against SPARK-2.2.X. I understand that you are trying it against SPARK-2.3.0. Can you please raise an issue in the kinesis-sql repo (https://github.com/qubole/kinesis-sql) and we can have a further discussion there. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494403#comment-16494403 ] sivanesh selvanataraj commented on SPARK-18165: --- [~itsvikramagr] I got this error when i execute {{ kinesis .selectExpr("CAST(data AS STRING)").as[(String)] .groupBy("data").count() .writeStream .format("console") .outputMode("complete") .start() .awaitTermination()}} ERROR MicroBatchExecution:91 - Query [id = 52e761d3-02f7-4352-9c8a-d1f59d7938bb, runId = f82e0f00-9c88-4d52-ae26-9ff54d8267c3] terminated with error java.lang.AbstractMethodError at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99) at org.apache.spark.sql.kinesis.KinesisSource.initializeLogIfNecessary(KinesisSource.scala:51) at org.apache.spark.internal.Logging$class.log(Logging.scala:46) at org.apache.spark.sql.kinesis.KinesisSource.log(KinesisSource.scala:51) at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) at org.apache.spark.sql.kinesis.KinesisSource.logInfo(KinesisSource.scala:51) at org.apache.spark.sql.kinesis.KinesisSource.prevBatchShardInfo(KinesisSource.scala:211) at org.apache.spark.sql.kinesis.KinesisSource.getOffset(KinesisSource.scala:105) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$5$$anonfun$apply$5.apply(MicroBatchExecution.scala:268) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$5$$anonfun$apply$5.apply(MicroBatchExecution.scala:268) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$5.apply(MicroBatchExecution.scala:267) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$5.apply(MicroBatchExecution.scala:264) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:264) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:237) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:124) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121) at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271) at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121) at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189) > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466897#comment-16466897 ] Vikram Agrawal commented on SPARK-18165: Thanks [~marmbrus] - Planning to start the work on porting the connector in next few weeks. Will share my feedbacks/ask for help once I am ready. - Thanks for your suggestion. Will check out apache Bahir/Spark Packages and start a PR once I have ported my changes to DataSourceV2 APIs. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466410#comment-16466410 ] Michael Armbrust commented on SPARK-18165: -- This is great! I'm glad there are more connectors for Structured Streaming! A few high-level thoughts: - The current Source/Sink APIs are internal/unstable. We are working on building public/stable APIs as part of DataSourceV2. Would be great to get feedback on those APIs if this is ported to them - In general as the Spark project scales, we are trying to move more of the connectors out of the core project. I'd suggest looking at contributing this to Apache Bahir and/or Spark Packages. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391557#comment-16391557 ] Vikram Agrawal commented on SPARK-18165: [~gaurav24] - yeah I saw that. Nonetheless, I have spent enough time going through available Kinesis APIs and Structured Streaming Source Provider requirement to come up with this library. You can give it a try and share your feedbacks/suggestions. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391501#comment-16391501 ] Gaurav Shah commented on SPARK-18165: - Databricks have it implemented not sure why is it exclusive for databricks customers only. https://docs.databricks.com/spark/latest/structured-streaming/kinesis.html > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391305#comment-16391305 ] Vikram Agrawal commented on SPARK-18165: I have worked on an implementation of Kinesis Integration as a source for Structured Streaming. It's available here: https://github.com/qubole/kinesis-sql. Please try it out. Would be happy to discuss the design details and work on any concerns. If the implementation is acceptable and there is enough interest, I will start a PR for it. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339812#comment-16339812 ] Swaranga Sarma commented on SPARK-18165: Any updates? > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos >Priority: Major > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931574#comment-15931574 ] Gaurav Shah commented on SPARK-18165: - anything that I can do to help for this feature ? > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714327#comment-15714327 ] Takeshi Yamamuro commented on SPARK-18165: -- Thanks for the reference! I'd like to discuss the kinesis integration for structured streaming in future after the component becomes stable in 2.1 (or 2.2?). So, this is my prototype to check feasibility for implementing the kinesis integration on the current structured streaming APIs. > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18165) Kinesis support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671010#comment-15671010 ] Herman van Hovell commented on SPARK-18165: --- [~maropu] wrote something for this: https://twitter.com/maropu/status/798757811710664704 > Kinesis support in Structured Streaming > --- > > Key: SPARK-18165 > URL: https://issues.apache.org/jira/browse/SPARK-18165 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Lauren Moos > > Implement Kinesis based sources and sinks for Structured Streaming -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org