Re: streaming pdf
Why does it have to be a stream?

> On 18.11.2018 at 23:29, Nicolas Paris wrote:
>
> Hi
>
> I have PDFs to load into Spark with at least format. I have
> considered some options:
>
> - Spark Streaming does not provide a native file stream for binary
>   files of variable size (binaryRecordStream specifies a constant
>   size), so I would have to write my own receiver.
>
> - Structured Streaming allows processing avro/parquet/orc files
>   containing PDFs, but this makes things more complicated than
>   monitoring a simple folder containing PDFs.
>
> - Kafka is not designed to handle messages > 100KB, and for this
>   reason it is not a good option to use in the stream pipeline.
>
> Does anybody have a suggestion?
>
> Thanks,
>
> --
> nicolas

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
streaming pdf
Hi

I have PDFs to load into Spark with at least format. I have considered some options:

- Spark Streaming does not provide a native file stream for binary files of variable size (binaryRecordStream specifies a constant size), so I would have to write my own receiver.

- Structured Streaming allows processing avro/parquet/orc files containing PDFs, but this makes things more complicated than monitoring a simple folder containing PDFs.

- Kafka is not designed to handle messages > 100KB, and for this reason it is not a good option to use in the stream pipeline.

Does anybody have a suggestion?

Thanks,

--
nicolas
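
The "monitoring a simple folder containing PDFs" option mentioned above can be illustrated with a minimal sketch in plain Python. This is not a Spark API and not something proposed in the thread: the function name `scan_new_pdfs` and the `seen` set are hypothetical, and the sketch assumes that files are fully written before they appear in the folder.

```python
import os
import tempfile
from typing import Iterator, Set, Tuple


def scan_new_pdfs(folder: str, seen: Set[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for every PDF in `folder` not yet in `seen`.

    Hypothetical helper: one call is one polling pass over the folder.
    Assumes each file is complete by the time it is listed.
    """
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not name.lower().endswith(".pdf") or path in seen:
            continue
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            content = handle.read()  # whole file, so sizes may differ per file
        seen.add(path)  # remember the path so later passes skip it
        yield path, content


if __name__ == "__main__":
    # Tiny demonstration on a temporary folder with one fake PDF.
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "example.pdf"), "wb") as out:
            out.write(b"%PDF-1.4 demo")
        already_seen: Set[str] = set()
        for pdf_path, pdf_bytes in scan_new_pdfs(folder, already_seen):
            print(os.path.basename(pdf_path), len(pdf_bytes))
```

Calling `scan_new_pdfs` repeatedly with the same `seen` set returns only the files added since the previous pass. If a stream turns out not to be required, as the reply asks, Spark's `SparkContext.binaryFiles` reads a folder of binary files in batch as (path, content) pairs.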
CVE-2018-17190: Unsecured Apache Spark standalone executes user code
Severity: Low

Vendor: The Apache Software Foundation

Versions Affected:
All versions of Apache Spark

Description:
Spark's standalone resource manager accepts code to execute on a 'master' host, that then runs that code on 'worker' hosts. The master itself does not, by design, execute user code. A specially-crafted request to the master can, however, cause the master to execute code too. Note that this does not affect standalone clusters with authentication enabled. While the master host typically has less outbound access to other resources than a worker, the execution of code on the master is nevertheless unexpected.

Mitigation:
Enable authentication on any Spark standalone cluster that is not otherwise secured from unwanted access, for example by network-level restrictions. Use spark.authenticate and related security properties described at https://spark.apache.org/docs/latest/security.html

Credit:
Andre Protas, Apple Information Security

References:
https://spark.apache.org/security.html
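
As an illustration of the mitigation, a minimal configuration sketch for conf/spark-defaults.conf might look like the following. It is not part of the advisory. It assumes a standalone cluster where the same shared secret is set on the master and on every worker, and the secret value is a placeholder to replace with your own.

```properties
# Enable Spark's internal authentication (shared-secret based).
spark.authenticate          true
# Placeholder value: use a long random secret, identical on all nodes.
spark.authenticate.secret   <shared-secret>
```

The security page linked in the advisory lists the related properties and how they apply to each deployment mode.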