Hi,
I have started to play around with Structured Streaming, and the
documentation (Structured Streaming Programming Guide) does not seem to
match the actual behavior I am seeing.
The documentation lists maxFilesPerTrigger (as well as latestFirst) as
options for the File sink. In practice, however, setting maxFilesPerTrigger
on the sink does not seem to have any effect. Setting it on the streaming
source (readStream), where this option is not documented, does limit the
number of files.
This behavior actually makes more sense than the documentation, since I would
expect the file reader, rather than the sink, to define how files are read
(e.g. if I used a Kafka sink or a foreach sink, they should still get the
same behavior from the reading).
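For reference, here is a minimal PySpark sketch of the setup described above,
with the options set on the source (readStream) rather than on the sink. The
option names are the ones from the guide; the paths, the app name, the "text"
and "parquet" formats, and the helper names (SOURCE_OPTIONS, start_query,
main) are only placeholders I picked for illustration.

```python
# Options applied to the streaming file *source*. The option names come from
# the programming guide; the values here are illustrative.
SOURCE_OPTIONS = {
    "maxFilesPerTrigger": "1",  # pick up at most one new file per trigger
    "latestFirst": "false",     # process older files before newer ones
}


def start_query(spark, input_dir, output_dir, checkpoint_dir):
    """Start a file-to-file streaming query with the limit set on the reader.

    All three directory arguments are hypothetical placeholders.
    """
    lines = (
        spark.readStream
        .format("text")             # text source has a fixed schema
        .options(**SOURCE_OPTIONS)  # options go on the source, not the sink
        .load(input_dir)
    )
    return (
        lines.writeStream
        .format("parquet")          # file sink; no maxFilesPerTrigger here
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("max-files-demo").getOrCreate()
    query = start_query(spark, "/tmp/in", "/tmp/out", "/tmp/checkpoint")
    query.awaitTermination()
```

Calling main() against a directory that receives several new files should
show whether each trigger picks up only one of them; swapping the parquet
sink for another sink should not change that if the limit really belongs to
the reader.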
Thanks,
Assaf.