Hi Nicolas, thanks very much for the reply. Do you have any sample code
somewhere? Do you just keep the PDFs as Avro binary all the time? How often
do you parse them into text using PDFBox? Is it on demand, or do you always
parse to text and keep the PDF binary in Avro only as an interim state?


