Richard,
In the provided documentation, under the "Schema Merging" section, you can 
actually achieve what you want this way:
1. Create a schema that reads the raw JSON, line by line.
2. Create another schema that reads the JSON file and structures it into 
columns ("id", "ln", "fn", ...).
3. Merge the two and you'll get what you want; a rough sketch follows below.
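In Java that idea might look roughly like this (untested sketch; it assumes 
Spark 2.1 as in the linked docs, borrows the file path from Richard's snippet 
further down the thread, and assumes "id" is usable as a join key):

    import static org.apache.spark.sql.functions.*;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
            .master("local[2]").appName("json_test").getOrCreate();

    // 1. Read the file line by line; keep each raw JSON line as a column,
    //    and pull the id out of it so the two reads can be matched up.
    Dataset<Row> raw = spark.read().text("files/json/example2.json")
            .withColumnRenamed("value", "raw_json")
            .withColumn("id", get_json_object(col("raw_json"), "$.id").cast("long"));

    // 2. Read the same file with the structured columns (id, ln, fn, age) inferred.
    Dataset<Row> parsed = spark.read().json("files/json/example2.json");

    // 3. Merge the two on "id".
    Dataset<Row> merged = parsed.join(raw, "id");
    merged.show();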
Thanks 

    On Thursday, December 29, 2016 7:18 PM, Richard Xin 
<richardxin...@yahoo.com> wrote:
 

 Thanks, I have seen this, but it doesn't cover my question.
What I need is to read the JSON and include the raw JSON string as part of my 
DataFrame. 

    On Friday, December 30, 2016 10:23 AM, Annabel Melongo 
<melongo_anna...@yahoo.com.INVALID> wrote:
 

 Richard,
The documentation below shows how to create a SparkSession and how to load 
data programmatically:
Spark SQL and DataFrames - Spark 2.1.0 Documentation
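Roughly, the pattern that page describes in Java (a sketch, not verbatim from 
the docs; the file path and app name are borrowed from Richard's snippet 
further down the thread):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Create a SparkSession, then load a JSON file programmatically.
    SparkSession spark = SparkSession.builder()
            .appName("json_test")
            .master("local[2]")
            .getOrCreate();
    Dataset<Row> df = spark.read().json("files/json/example2.json");
    df.printSchema();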

    On Thursday, December 29, 2016 5:16 PM, Richard Xin 
<richardxin...@yahoo.com.INVALID> wrote:
 

 Say I have the following data in a file:

{"id":1234,"ln":"Doe","fn":"John","age":25}
{"id":1235,"ln":"Doe","fn":"Jane","age":22}

Java code snippet:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    final SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext hc = new HiveContext(ctx.sc());
    DataFrame df = hc.read().json("files/json/example2.json"); // one JSON object per line

What I need is a DataFrame with columns id, ln, fn, and age, as well as the 
raw JSON string as a raw_json column.
Any advice on the best practice in Java?
Thanks,
Richard
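For illustration (not an answer from the thread): one possible approach is to 
read the file as text, keep each line as a raw_json column, and extract the 
fields from it. Untested sketch, reusing hc from the snippet above and 
assuming Spark 1.6+, where DataFrameReader.text() and 
functions.get_json_object() exist:

    import static org.apache.spark.sql.functions.*;

    // Read each line as plain text so the raw JSON survives as a column,
    // then extract the individual fields from that string.
    DataFrame withRaw = hc.read().text("files/json/example2.json")
            .withColumnRenamed("value", "raw_json")
            .withColumn("id",  get_json_object(col("raw_json"), "$.id").cast("long"))
            .withColumn("ln",  get_json_object(col("raw_json"), "$.ln"))
            .withColumn("fn",  get_json_object(col("raw_json"), "$.fn"))
            .withColumn("age", get_json_object(col("raw_json"), "$.age").cast("int"));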