Re: Quirk in how Spark DF handles JSON input records?

Michael Segel Wed, 02 Nov 2016 12:39:24 -0700

On Nov 2, 2016, at 2:22 PM, Daniel Siegmann 
<dsiegm...@securityscorecard.io> wrote:


Yes, it needs to be on a single line. Spark (or Hadoop really) treats newlines 
as a record separator by default. While it is possible to use a different 
string as a record separator, what would you use in the case of JSON?
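To illustrate the point about newlines, here is a small sketch in plain Python (no Spark involved). It only mimics a reader that treats every line as its own record; the sample document and the helper name `parse_per_line` are made up for illustration.

```python
import json

# Hypothetical metadata document, pretty-printed across several lines.
pretty = """{
  "table": "events",
  "owner": "etl"
}"""


def parse_per_line(text):
    """Mimic a newline-delimited reader: parse each line as a separate record."""
    good, bad = [], []
    for line in text.splitlines():
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            # A fragment of a multi-line object is not valid JSON on its own.
            bad.append(line)
    return good, bad


# Pretty-printed input: no single line is a complete JSON value.
good, bad = parse_per_line(pretty)

# The same document re-serialized onto one line parses as one record.
compact = json.dumps(json.loads(pretty))
good_compact, bad_compact = parse_per_line(compact)
```

With the pretty-printed text every line ends up in `bad`, while the single-line form yields exactly one record.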

If you do some Googling I suspect you'll find some possible solutions. 
Personally, I would just use a separate JSON library (e.g. json4s) to parse 
this metadata into an object, rather than trying to read it in through Spark.


Yeah, that’s the basic idea.

This JSON is metadata that helps drive the process, not row records… although 
the column descriptors are row records, so in the short term I could cheat and 
just store those in a file.
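A rough sketch of that short-term cheat, in Python with the standard `json` module standing in for a library such as json4s. The layout of the metadata (a top-level `columns` array) and the helper name `split_metadata` are assumptions for illustration only, not the actual file format.

```python
import json

# Hypothetical multi-line metadata; the field names are assumptions.
metadata_text = """{
  "table": "events",
  "columns": [
    {"name": "id", "type": "long"},
    {"name": "ts", "type": "timestamp"}
  ]
}"""


def split_metadata(text):
    """Parse the whole document, then separate the column descriptors."""
    doc = json.loads(text)
    columns = doc.pop("columns", [])
    # One compact JSON object per line, so each descriptor is a single record.
    lines = [json.dumps(col, sort_keys=True) for col in columns]
    return doc, lines


settings, column_lines = split_metadata(metadata_text)
```

Writing `column_lines` out with one entry per line gives a file in which each column descriptor sits on its own line, which is the shape a line-oriented reader expects; the remaining `settings` stay in an ordinary in-memory object.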

:-(

--
Daniel Siegmann
Senior Software Engineer
SecurityScorecard Inc.
214 W 29th Street, 5th Floor
New York, NY 10001
