I've been looking at a way to convert existing benchmarks into complex 
JSON documents.

Take the TPC-H benchmark, for example, which has PKey-FKey relations. 

So, consider the JSON output for a query like this:
0: jdbc:drill:schme=dfs.tpchDri1000> select  r.r_NAME, n.n_NAME 
. . . . . . . . . . . . . . . . . .>   , r.r_REGIONKEY 
. . . . . . . . . . . . . . . . . .>   , n.n_NATIONKEY 
. . . . . . . . . . . . . . . . . .>   , n.n_REGIONKEY 
. . . . . . . . . . . . . . . . . .> from nation n,region r 
. . . . . . . . . . . . . . . . . .> where n.n_regionkey = r.r_regionkey
. . . . . . . . . . . . . . . . . .> order by r.r_NAME, n.n_NAME;
+--------------+-----------------+--------------+--------------+--------------+
|    r_NAME    |     n_NAME      | r_REGIONKEY  | n_NATIONKEY  | n_REGIONKEY  |
+--------------+-----------------+--------------+--------------+--------------+
| AFRICA       | ALGERIA         | 0            | 0            | 0            |
| AFRICA       | ETHIOPIA        | 0            | 5            | 0            |
| AFRICA       | KENYA           | 0            | 14           | 0            |
| AFRICA       | MOROCCO         | 0            | 15           | 0            |
| AFRICA       | MOZAMBIQUE      | 0            | 16           | 0            |
| AMERICA      | ARGENTINA       | 1            | 1            | 1            |
| AMERICA      | BRAZIL          | 1            | 2            | 1            |
| AMERICA      | CANADA          | 1            | 3            | 1            |
| AMERICA      | PERU            | 1            | 17           | 1            |
| AMERICA      | UNITED STATES   | 1            | 24           | 1            |
| ASIA         | CHINA           | 2            | 18           | 2            |
| ASIA         | INDIA           | 2            | 8            | 2            |
| ASIA         | INDONESIA       | 2            | 9            | 2            |
| ASIA         | JAPAN           | 2            | 12           | 2            |
| ASIA         | VIETNAM         | 2            | 21           | 2            |
| EUROPE       | FRANCE          | 3            | 6            | 3            |
| EUROPE       | GERMANY         | 3            | 7            | 3            |
| EUROPE       | ROMANIA         | 3            | 19           | 3            |
| EUROPE       | RUSSIA          | 3            | 22           | 3            |
| EUROPE       | UNITED KINGDOM  | 3            | 23           | 3            |
| MIDDLE EAST  | EGYPT           | 4            | 4            | 4            |
| MIDDLE EAST  | IRAN            | 4            | 10           | 4            |
| MIDDLE EAST  | IRAQ            | 4            | 11           | 4            |
| MIDDLE EAST  | JORDAN          | 4            | 13           | 4            |
| MIDDLE EAST  | SAUDI ARABIA    | 4            | 20           | 4            |
+--------------+-----------------+--------------+--------------+--------------+
25 rows selected (0.519 seconds)

I'm wondering if I could get, say, 5 documents representing the 5 regions, 
with a nested structure within each representing the nations. 

Not the best use case, I agree... but to distil it down to a simple 
question: is there value in having some series of simple steps that 
reverses how a JSON doc can be "flattened" into a CSV format? 

It can't be as simple as just applying an un-flatten operator, but it's 
close. For example, the nesting could be defined based on the ORDER BY 
clause, so that the final writer can stream through the sorted output and 
create the nested documents accordingly. 

Just wondering about the value of something like this.


-----Original Message-----
From: rahul challapalli [mailto:[email protected]] 
Sent: Monday, September 18, 2017 4:02 PM
To: dev <[email protected]>
Subject: Re: Convert CSV to nested JSON

Can you give an example? Converting CSV into nested JSON does not make sense to 
me.

On Mon, Sep 18, 2017 at 3:54 PM, Ted Dunning <[email protected]> wrote:

> What is the ultimate purpose here?
>
>
>
> On Mon, Sep 18, 2017 at 3:21 PM, Kunal Khatua <[email protected]> wrote:
>
> > I'm curious about whether there are any implementations of 
> > converting CSV to a nested JSON format  "automagically".
> >
> > Within Drill, I know that the CTAS route will basically convert each 
> > row into a JSON document with depth=1, which is pretty much an obese 
> > CSV data format.
> >
> > Is it worth having something like this, or is it too hard a problem 
> > that it's best that users explicitly define and write the documents?
> >
> > ~ Kunal
> >
> >
>
