[
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065961#comment-13065961
]
Dmitriy V. Ryaboy commented on PIG-1914:
----------------------------------------
Very cool.
Some quick code review notes:
Tiny typo here:
"e = foreach d generate flatten(men#'value') as val;" -- that should read
menu#'value'
Instead of:
{code}
boolean notDone = in.nextKeyValue();
if (!notDone) {
    return null;
}
{code}
Better:
{code}
if (!in.nextKeyValue()) {
    return null;
}
{code}
Parse exceptions: it's better to increment a counter and move on than to break
on a bad input string. Throwing an exception kills the whole job. So maybe
something like
{code}
t = null;
while (t == null && in.nextKeyValue()) {
    ...
}
return t;
{code}
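To make the skip-and-count idea concrete, here is a rough, self-contained sketch. It is not the patch's code: a plain Iterator stands in for the record reader, the parse method is a hypothetical placeholder that only checks for braces, and badRecords is a plain field standing in for a real job counter.

```java
import java.util.Arrays;
import java.util.Iterator;

class SkipBadRecords {
    // Hypothetical stand-in for a job counter; a real loader would report this to the framework.
    static long badRecords = 0;

    // Hypothetical parser: accepts only lines that look like a JSON object, throws otherwise.
    static String parse(String line) {
        String s = line.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + line);
        }
        return s;
    }

    // Returns the next parseable record, or null once the input is exhausted.
    static String getNext(Iterator<String> in) {
        String t = null;
        // Short-circuit on t == null so no extra record is consumed after a success.
        while (t == null && in.hasNext()) {
            String line = in.next();
            try {
                t = parse(line);
            } catch (IllegalArgumentException e) {
                badRecords++; // count the bad line and keep going instead of failing the job
            }
        }
        return t;
    }

    public static void main(String[] args) {
        Iterator<String> in = Arrays.asList("{\"a\":1}", "oops", "{\"b\":2}").iterator();
        for (String t = getNext(in); t != null; t = getNext(in)) {
            System.out.println(t);
        }
        System.out.println("bad records: " + badRecords);
    }
}
```

The loop has the same shape as the snippet above: a bad line only bumps the counter, and null is returned only when there is no more input.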
In flatten_array, if the value is an array, you allocate a new bag, populate it
recursively, and add the contents of the new bag to the old bag. Why not skip
the object allocation and copy, and simply pass the original bag into the
recursive call?
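Roughly what I have in mind, as a standalone sketch rather than a rewrite of flatten_array: nested java.util.List values stand in for JSON arrays, a plain List stands in for the bag, and flattenInto is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlattenSketch {
    // Appends every non-list leaf of 'value' to 'out', recursing into nested lists.
    // The caller's accumulator is passed down, so no temporary list is allocated or copied.
    static void flattenInto(Object value, List<Object> out) {
        if (value instanceof List) {
            for (Object child : (List<?>) value) {
                flattenInto(child, out);
            }
        } else {
            out.add(value);
        }
    }

    public static void main(String[] args) {
        List<Object> nested = Arrays.<Object>asList(
                1, Arrays.<Object>asList(2, Arrays.<Object>asList(3, 4)), 5);
        List<Object> bag = new ArrayList<>();
        flattenInto(nested, bag);
        System.out.println(bag);
    }
}
```

Each level of nesting then costs one method call instead of an extra allocation plus a copy into the parent.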
Also: are null values for keys just plain unsupported? You skip them.
setLocation: not that it really matters, but for consistency, you should use
PigTextInputFormat instead of PigFileInputFormat here.
schema: probably makes sense to implement getSchema?
> Support load/store JSON data in Pig
> -----------------------------------
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Chao Tian
> Attachments: PIG-1914.patch
>
>
> JSON is a commonly used data storage format. It is popular for storing
> structured data, especially for JavaScript data exchange.
> Pig should have the ability to load/store JSON format data. I plan to write
> one for the piggy bank.