On Nov 2, 2016, at 2:22 PM, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote:
Yes, it needs to be on a single line. Spark (or Hadoop really) treats newlines as a record separator by default. While it is possible to use a different string as a record separator, what would you use in the case of JSON?

If you do some Googling I suspect you'll find some possible solutions. Personally, I would just use a separate JSON library (e.g. json4s) to parse this metadata into an object, rather than trying to read it in through Spark.

Yeah, that’s the basic idea. This JSON is metadata to help drive the process not row records… although the column descriptors are row records so in the short term I could cheat and just store those in a file. :-(

--
Daniel Siegmann
Senior Software Engineer
SecurityScorecard Inc.
214 W 29th Street, 5th Floor
New York, NY 10001
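
For anyone finding this thread later, here is a rough sketch of the json4s approach suggested above: read the whole metadata file on the driver and extract it into a case class, so it does not matter whether the JSON spans multiple lines. The `Metadata` and `ColumnDescriptor` classes, their field names, and the `MetadataLoader` object are made up for illustration, since the actual layout of the metadata file is not shown in this thread. It assumes json4s-jackson is on the classpath.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.io.Source

// Hypothetical shape of the metadata; adjust the fields to the real file.
case class ColumnDescriptor(name: String, dataType: String)
case class Metadata(source: String, columns: List[ColumnDescriptor])

object MetadataLoader {
  // json4s needs implicit Formats in scope for extract[T].
  implicit val formats: Formats = DefaultFormats

  // Parse a JSON string (single-line or multi-line) into Metadata.
  def fromString(json: String): Metadata =
    parse(json).extract[Metadata]

  // Read the whole file as one string, then parse it.
  def fromFile(path: String): Metadata = {
    val src = Source.fromFile(path)
    try fromString(src.mkString) finally src.close()
  }
}
```

Usage would look something like `val meta = MetadataLoader.fromFile("metadata.json")`, where the path is a placeholder. The resulting `meta.columns` can then drive the Spark job without the metadata ever going through Spark's line-oriented readers.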