Hi,

I have PDF files to load into Spark, in at least a <filename, byte_array> format. I have considered some options:
- Spark Streaming does not provide a native file stream for binary files of variable size (binaryRecordsStream requires a constant record length), so I would have to write my own receiver.
- Structured Streaming can process Avro/Parquet/ORC files containing PDFs, but this makes things more complicated than monitoring a simple folder of PDFs.
- Kafka is not designed to handle messages > 100KB, so it is not a good option for the stream pipeline.

Does anybody have a suggestion?

Thanks,
-- nicolas
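
P.S. To make the target format concrete, here is a minimal plain-Python sketch (no Spark involved) of the <filename, byte_array> records I would like to get from a folder of PDFs. The function name read_pdfs and the "*.pdf" pattern are only illustrative assumptions, not an existing API.

```python
from pathlib import Path
from typing import Iterator, Tuple


def read_pdfs(folder: str) -> Iterator[Tuple[str, bytes]]:
    """Yield one (filename, byte_array) pair per PDF found in `folder`.

    Hypothetical helper: it only illustrates the record shape, it is not
    a streaming source and does not watch the folder for new files.
    """
    # Sort so the order of records is deterministic across runs.
    for path in sorted(Path(folder).glob("*.pdf")):
        # Each PDF is read whole, whatever its size (variable-length record).
        yield path.name, path.read_bytes()
```

Ideally the streaming source would emit the same kind of pair for every new file that lands in the monitored folder.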