Hi, there

Spark has provided json document processing feature for a long time. In
most examples I see, each line is a json object in the sample file. That is
the easiest case. But how can we process a json document, which does not
conform to this standard format (one line per json object)? Here is the
document I am working on.

First of all, it is multiple lines for one single big json object. The real
file can be as long as 20+ G. Within that one single json object, it
contains many name/value pairs. The name is some kind of id values. The
value is the actual json object that I would like to be part of dataframe.
Is there any way to do that? Appreciate any input.


{
"id1": {
"Title":"title1",
"Author":"Tom",
"Source":{
"Date":"20160506",
"Type":"URL"
},
"Data":" blah blah"},

"id2": {
"Title":"title2",
"Author":"John",
"Source":{
"Date":"20150923",
"Type":"URL"
},
"Data":" blah blah "},

"id3: {
"Title":"title3",
"Author":"John",
"Source":{
"Date":"20150902",
"Type":"URL"
},
"Data":" blah blah "}
}

Reply via email to