Hi,

We have a Spark Structured Streaming job monitoring a folder and converting
JSONL files into Parquet. However, if some JSONL files already exist before
the first run of the streaming job (i.e., before any checkpoint has been
created), those files are not processed when the job runs. We need to do
something like
https://stackoverflow.com/questions/44618783/spark-streaming-only-streams-files-created-after-the-stream-initialization-time.
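For context, the job looks roughly like the sketch below (the paths and the
schema are made-up placeholders, not our exact code):

```python
def build_query(spark, input_dir, output_dir, checkpoint_dir):
    """Build a streaming query that converts JSONL files to Parquet.

    All directory arguments are illustrative; the schema below is a
    hypothetical stand-in for our real one.
    """
    from pyspark.sql.types import StructType, StructField, StringType

    # The file source requires an explicit schema for JSON input.
    schema = StructType([
        StructField("id", StringType()),
        StructField("payload", StringType()),
    ])

    df = (
        spark.readStream
        .format("json")      # JSONL: one JSON object per line
        .schema(schema)
        .load(input_dir)
    )

    return (
        df.writeStream
        .format("parquet")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

We call it after creating a SparkSession, e.g.
build_query(spark, "/data/in", "/data/out", "/data/ckpt"), and then
awaitTermination() on the returned query; again, the paths are illustrative.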

Is there a way for the Spark streaming job to pick up the pre-existing
files? For example, is there a setting that controls this? Any clue would
be appreciated.
