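I think there's a problem with your schema. As far as I can tell, in

 {DataASet: (A1: int,A2: int,DataBSets: {DataBSet: (B1: chararray,B2: chararray)})}

DataBSet is only the alias of the tuples inside the bag, so the loader looks
for B1 and B2 directly in each array element. In your data each array element
is an object with a single DataBSet field, so the schema should probably look
like

 {DataASet: (A1: int,A2: int,DataBSets: {(DataBSet: (B1: chararray,B2: chararray))})}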
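
Spelled out as a load statement, that would be roughly the following. This is a sketch I have not run: it reuses the file name and the built-in JsonLoader from your message, and the only change is the extra pair of parentheses that makes DataBSet a field inside each bag tuple.

```pig
-- Untested sketch: same load as in the question, but each element of the
-- DataBSets bag is an anonymous tuple with one field, DataBSet, which is
-- itself a tuple of (B1, B2).
b = load 'b.json' using JsonLoader('
     DataASet: (
       A1:int,
       A2:int,
       DataBSets: {
         (
           DataBSet: (
             B1:chararray,
             B2:chararray
           )
         )
       }
     )
');

-- Check the resulting schema and the loaded data.
describe b;
dump b;
```

If the schema now matches the input, dump should show the doubly nested tuples you were expecting rather than an empty tuple.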


On Thu, Aug 7, 2014 at 11:22 AM, Klüber, Ralf <ralf.klue...@p3-group.com>
wrote:

> Hello,
>
>
>
> I am new to this list. I have been trying to solve this problem for the
> last 48 hours, but I am stuck. I hope someone here can point me in the
> right direction.
>
>
>
> I have problems using the Pig JsonLoader and am wondering whether I am
> doing something wrong or have run into a different problem.
>
>
>
> The first half of this post is to show that I know at least something
> about what I am talking about and that I did my homework. During my
> research I found a lot about elephant-bird, but there seems to be a
> conflict with Cloudera, so I am stuck there as well. If you trust me
> already, you can jump directly to the second half of my post ,-).
>
>
>
> The desired solution should work on both Cloudera and Amazon EMR.
>
>
>
> Proof that something works
>
> --------------------------
>
>
>
> I have this data file:
>
> ```
>
> $ cat a.json
>
>
> {"DataASet":{"A1":1,"A2":4,"DataBSets":[{"B1":"1","B2":"1"},{"B1":"2","B2":"2"}]}}
>
> $ ./jq '.' a.json
>
> {
>
>   "DataASet": {
>
>     "A1": 1,
>
>     "A2": 4,
>
>     "DataBSets": [
>
>       {
>
>         "B1": "1",
>
>         "B2": "1"
>
>       },
>
>       {
>
>         "B1": "2",
>
>         "B2": "2"
>
>       }
>
>     ]
>
>   }
>
> }
>
> $
>
> ```
>
>
>
> I am using this Pig script to load it.
>
>
>
> ``` Pig
>
> a = load 'a.json' using JsonLoader('
>
>      DataASet: (
>
>        A1:int,
>
>        A2:int,
>
>        DataBSets: {
>
>         (
>
>            B1:chararray,
>
>            B2:chararray
>
>          )
>
>        }
>
>      )
>
> ');
>
> ```
>
>
>
> In grunt everything seems ok.
>
>
>
> ```
>
> grunt> describe a;
>
> a: {DataASet: (A1: int,A2: int,DataBSets: {(B1: chararray,B2: chararray)})}
>
> grunt> dump a;
>
> ((1,4,{(1,1),(2,2)}))
>
> grunt>
>
> ```
>
>
>
> So far so good.
>
>
>
> Real Problem
>
> ------------
>
>
>
> My real data (gigabytes of it) looks a little different. Each element of
> the array is in fact an object that wraps a single DataBSet object.
>
>
>
> ```
>
> $ ./jq '.' b.json
>
> {
>
>   "DataASet": {
>
>     "A1": 1,
>
>     "A2": 4,
>
>     "DataBSets": [
>
>       {
>
>         "DataBSet": {
>
>           "B1": "1",
>
>           "B2": "1"
>
>         }
>
>       },
>
>       {
>
>         "DataBSet": {
>
>           "B1": "2",
>
>           "B2": "2"
>
>         }
>
>       }
>
>     ]
>
>   }
>
> }
>
> $ cat b.json
>
>
> {"DataASet":{"A1":1,"A2":4,"DataBSets":[{"DataBSet":{"B1":"1","B2":"1"}},{"DataBSet":{"B1":"2","B2":"2"}}]}}
>
> $
>
> ```
>
>
>
> I am trying to load this JSON with the following schema:
>
>
>
> ``` Pig
>
> b = load 'b.json' using JsonLoader('
>
>      DataASet: (
>
>        A1:int,
>
>        A2:int,
>
>        DataBSets: {
>
>         DataBSet: (
>
>            B1:chararray,
>
>            B2:chararray
>
>          )
>
>        }
>
>      )
>
> ');
>
> ```
>
>
>
> Again it looks good so far in grunt.
>
>
>
> ```
>
> grunt> describe b;
>
> b: {DataASet: (A1: int,A2: int,DataBSets: {DataBSet: (B1: chararray,B2: chararray)})}
>
> ```
>
>
>
> I expect something like this when dumping b:
>
>
>
> ```
>
> ((1,4,{((1,1)),((2,2))}))
>
> ```
>
>
>
> But I get this:
>
>
>
> ```
>
> grunt> dump b;
>
> ()
>
> grunt>
>
> ```
>
>
>
> Obviously I am doing something wrong. The empty tuple suggests that the
> schema does not match the input line.
>
>
>
> Any hints? Thanks in advance.
>
>
>
> Kind regards.
>
> Ralf
>
