By writing PDF files, do you mean something equivalent to a hadoop fs -put
/path?
I'm not sure how PDFBox works, though. Have you tried writing the files
individually, without Spark?
If you can establish that as a starting point, we can then look at how
Spark can be interfaced to write to HDFS.
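
For what it's worth, here is a rough sketch of how the two could fit together, assuming PDFBox and the Hadoop client are on the executor classpath. PDFBox's PDDocument.save accepts any OutputStream, and Hadoop's FileSystem.create returns one, so a document can be streamed straight into HDFS from inside a Spark task. The object name, the writePdfs helper, the output directory and the file naming are all hypothetical, and the PDF here is just a single blank page.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.pdfbox.pdmodel.{PDDocument, PDPage}
import org.apache.spark.rdd.RDD

object PdfToHdfs {
  // Hypothetical helper: writes one blank, single-page PDF per record name.
  def writePdfs(names: RDD[String], outputDir: String): Unit = {
    names.foreachPartition { iter =>
      // Assumption: a default Configuration on the executor can resolve
      // outputDir (e.g. core-site.xml is on the classpath, or outputDir
      // is a full hdfs:// URI).
      val dir = new Path(outputDir)
      val fs = dir.getFileSystem(new Configuration())
      iter.foreach { name =>
        val doc = new PDDocument()
        try {
          doc.addPage(new PDPage())
          // create() returns an OutputStream backed by HDFS.
          val out = fs.create(new Path(dir, s"$name.pdf"))
          try doc.save(out) finally out.close()
        } finally doc.close()
      }
    }
  }
}
```

The FileSystem handle is looked up once per partition and deliberately not closed, since Hadoop caches and shares these instances within a JVM.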
I would like to write PDF files using PDFBox to HDFS from my Spark
application. Can this be done?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-a-Spark-App-run-with-spark-submit-write-pdf-files-to-HDFS-tp23233.html
I don't know anything about your use case, so take this with a grain of
salt, but typically if you are operating at a scale that benefits from
Spark, then you likely will not want to write your output records as
individual files into HDFS. Spark has built-in support for the Hadoop
SequenceFile
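
To make the SequenceFile idea concrete, here is a minimal sketch, assuming each record can be rendered to a PDF in memory. It keeps the documents as (file name, PDF bytes) pairs and stores them in one SequenceFile dataset rather than as many small HDFS files. The object and method names are hypothetical, and the rendering step is only a blank page standing in for real content.

```scala
import java.io.ByteArrayOutputStream

import org.apache.pdfbox.pdmodel.{PDDocument, PDPage}
import org.apache.spark.rdd.RDD

object PdfsAsSequenceFile {
  // Hypothetical renderer: returns the bytes of a blank, single-page PDF.
  def renderBlankPdf(): Array[Byte] = {
    val doc = new PDDocument()
    try {
      doc.addPage(new PDPage())
      val buf = new ByteArrayOutputStream()
      doc.save(buf)
      buf.toByteArray
    } finally doc.close()
  }

  // Stores (file name, PDF bytes) pairs as a Hadoop SequenceFile.
  // On Spark versions before 1.3 this additionally needs
  // `import org.apache.spark.SparkContext._` for the implicit conversion.
  def save(names: RDD[String], outputPath: String): Unit = {
    val pdfs: RDD[(String, Array[Byte])] =
      names.map(name => (s"$name.pdf", renderBlankPdf()))
    pdfs.saveAsSequenceFile(outputPath)
  }
}
```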