Re: [Structured Streaming] Multiple sources best practice/recommendation

2017-09-14 Thread Michael Armbrust
I would probably suggest that you partition by format (though you can get the file name from the built-in function input_file_name()). You can load multiple streams from different directories and union them together, as long as the schema is the same after parsing. Otherwise you can just run

[Structured Streaming] Multiple sources best practice/recommendation

2017-09-13 Thread JG Perrin
Hi, I have different files being dumped on S3, and I want to ingest them and join them. Which sounds better to you: one directory for all of them, or one per file format? If I have one directory for all, can I get some metadata about the file, like its name? If multiple directories, how can I