Re: Can a Spark App run with spark-submit write pdf files to HDFS

2015-06-09 Thread nsalian
By writing PDF files, do you mean something equivalent to a hadoop fs -put /path? I'm not sure how Pdfbox works though, have you tried writing individually without spark? We can potentially look if you have established that as a starting point to see how Spark can be interfaced to write to HDFS.

Can a Spark App run with spark-submit write pdf files to HDFS

2015-06-09 Thread Richard Catlin
I would like to write pdf files using pdfbox to HDFS from my Spark application. Can this be done? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-a-Spark-App-run-with-spark-submit-write-pdf-files-to-HDFS-tp23233.html Sent from the Apache Spark User

Re: Can a Spark App run with spark-submit write pdf files to HDFS

2015-06-09 Thread William Briggs
I don't know anything about your use case, so take this with a grain of salt, but typically if you are operating at a scale that benefits from Spark, then you likely will not want to write your output records as individual files into HDFS. Spark has built-in support for the Hadoop SequenceFile