Ayush Verma created BAHIR-223:
---------------------------------
Summary: Concern around reliability of sql-streaming-sqs
Key: BAHIR-223
URL: https://issues.apache.org/jira/browse/BAHIR-223
Project: Bahir
Issue Type: Bug
Components: Spark Structured Streaming Connectors
Reporter: Ayush Verma
Looking at the source for the *sql-streaming-sqs* connector, it seems that we
delete the messages in SQS on every fetchMaxOffset() call.
[https://github.com/apache/bahir/blob/3912360ca5bcca269a30ff42120cac46934693c4/sql-streaming-sqs/src/main/scala/org/apache/spark/sql/streaming/sqs/SqsSource.scala#L106]
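My understanding of a Spark streaming source is that a call to the commit()
method signals that Spark has completed processing up to the given offset.
If a batch fails after its messages have been deleted but before it has been
processed, those messages would be lost.
Should we not delete the SQS messages on a call to commit() instead?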
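
To make the concern concrete, here is a minimal, self-contained sketch in Python. It is not the connector's code and does not use the AWS SDK: FakeQueue, DeleteOnFetchSource, DeleteOnCommitSource and the two helper functions are all hypothetical names invented for illustration. The toy also ignores SQS visibility timeouts and Spark offsets, so it only illustrates the ordering of delete relative to commit.

```python
class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, not the AWS API)."""

    def __init__(self, bodies):
        # Map a made-up receipt handle to each message body.
        self.messages = {f"h{i}": body for i, body in enumerate(bodies)}

    def receive(self):
        # Return (handle, body) pairs without removing anything.
        return list(self.messages.items())

    def delete(self, handles):
        for handle in handles:
            self.messages.pop(handle, None)


class DeleteOnFetchSource:
    """Deletes messages as soon as they are fetched (the behavior questioned above)."""

    def __init__(self, queue):
        self.queue = queue

    def fetch_max_offset(self):
        batch = self.queue.receive()
        self.queue.delete([handle for handle, _ in batch])
        return batch

    def commit(self):
        # Nothing left to do: the messages are already gone.
        pass


class DeleteOnCommitSource:
    """Remembers fetched handles and deletes them only on commit()."""

    def __init__(self, queue):
        self.queue = queue
        self.pending = set()

    def fetch_max_offset(self):
        batch = self.queue.receive()
        self.pending.update(handle for handle, _ in batch)
        return batch

    def commit(self):
        self.queue.delete(self.pending)
        self.pending = set()


def remaining_after_crash(source_cls):
    """Fetch a batch, then 'crash' before commit(); count messages still queued."""
    queue = FakeQueue(["a", "b", "c"])
    source_cls(queue).fetch_max_offset()
    return len(queue.messages)


def remaining_after_success(source_cls):
    """Fetch a batch and commit it; count messages still queued."""
    queue = FakeQueue(["a", "b", "c"])
    source = source_cls(queue)
    source.fetch_max_offset()
    source.commit()
    return len(queue.messages)
```

In this toy model, a failure between the fetch and the commit leaves the queue empty for the delete-on-fetch variant, so nothing can be re-read on restart, while the delete-on-commit variant still holds all three messages. Both variants empty the queue once a batch is committed successfully.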