First of all, thank you to the Spark dev team for coming up with standardized and intuitive API interfaces. I am sure they will encourage many more new datasource integrations.
I have been playing with the API and have some questions on the continuous streaming API (see https://github.com/JThakrar/sparkconn#continuous-streaming-datasource).

It seems that "commit" is never called. query.status always shows the message below, even after the query has been initialized and data has been streaming:

{ "message" : "Initializing sources", "isDataAvailable" : false, "isTriggerActive" : true }

query.recentProgress always shows an empty array:

Array[org.apache.spark.sql.streaming.StreamingQueryProgress] = Array()

And stopping a query always makes it look as if the tasks were lost involuntarily or uncleanly (even though close on the datasource was called):

2018-04-06 08:07:10 WARN TaskSetManager:66 - Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN TaskSetManager:66 - Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN TaskSetManager:66 - Lost task 3.0 in stage 1.0 (TID 8, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN TaskSetManager:66 - Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN TaskSetManager:66 - Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): TaskKilled (Stage cancelled)

Any pointers/info will be greatly appreciated.
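For context, here is roughly how I am exercising the source. This is only a sketch under assumptions: the format string and trigger interval are placeholders (the actual fully-qualified class name is in the linked repo), and Trigger.Continuous is the continuous-processing trigger introduced in Spark 2.3 - if the query is instead started with a microbatch trigger, commit semantics would differ:

    // Sketch only - format name and interval are placeholders, not the
    // repo's actual values. Assumes Spark 2.3+ with DataSourceV2 continuous
    // processing support on the classpath.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder()
      .appName("continuous-source-test")
      .master("local[*]")
      .getOrCreate()

    // Read from the custom continuous DataSourceV2 implementation.
    val df = spark.readStream
      .format("<fully.qualified.DataSourceClassName>") // placeholder
      .load()

    // Write to the console sink using a continuous trigger; in continuous
    // mode, commit should be called on the reader once per epoch.
    val query = df.writeStream
      .format("console")
      .trigger(Trigger.Continuous("1 second")) // placeholder interval
      .start()

    // It is after start() that query.status and query.recentProgress
    // show the behavior described above.

In this setup I would expect query.status to move past "Initializing sources" once the first epoch begins, which is why the observed behavior looks wrong to me.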