[ 
https://issues.apache.org/jira/browse/PIG-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555512#comment-13555512
 ] 

Josh Levy commented on PIG-3121:
--------------------------------

I think (but would love to be convinced otherwise) that casting would be 
unpleasant and fragile. The original data has a very complex schema: the Pig 
DESCRIBE command produces almost 10 KB of output. I'm not excited about 
generating cast statements from that description, and I'd hate to have to redo 
the casts whenever the schema gets tweaked.

For a bit more background on that, the data comes from Protobuf files. I use 
ElephantBird to load the data into Pig, and ElephantBird automatically creates 
the Pig schema from the Protobuf definitions.

I do have other options besides changing JsonStorage:

* Changing the values in Protobufs is politically difficult for me, but it is 
probably the most elegant / least hacky solution
* I could modify ElephantBird to optionally do the cast at load time
* I could write a UDF to walk through an arbitrary schema and do all of the 
casts
* I could modify JsonStorage as proposed
* I could continue what I'm currently doing and postprocess the output of 
JsonStorage
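
As a rough illustration of the last option, postprocessing the JsonStorage 
output could look something like the sketch below. This is a hypothetical 
Python example, not the actual script in use: the names stringify_big_ints 
and convert_line are made up, one JSON record per line is assumed, and since 
JSON does not distinguish longs from ints, the sketch only converts integers 
whose magnitude exceeds 2**53 rather than every long.

```python
import json

# Beyond this magnitude a 64-bit float can no longer represent every integer.
MAX_SAFE = 2 ** 53


def stringify_big_ints(value):
    """Recursively replace integers larger than MAX_SAFE with strings."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; leave true/false untouched.
        return value
    if isinstance(value, int):
        return str(value) if abs(value) > MAX_SAFE else value
    if isinstance(value, list):
        return [stringify_big_ints(v) for v in value]
    if isinstance(value, dict):
        return {k: stringify_big_ints(v) for k, v in value.items()}
    return value


def convert_line(line):
    """Convert one JSON record (assumed: one record per line)."""
    return json.dumps(stringify_big_ints(json.loads(line)))
```

For example, convert_line('{"id": 9007199254740993, "n": 5}') leaves "n" as a 
number and turns "id" into the string "9007199254740993".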

The real problem is in the other tool, not in JsonStorage. Patching 
JsonStorage is attractive because the option would be easy to use when 
writing new Pig scripts, and hopefully it can give others a quick path out of 
this problem.
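
To illustrate the failure in the other tool, here is a minimal Python sketch 
of what happens when a 64-bit identifier passes through a 64-bit float. The 
identifier value is an arbitrary example chosen for illustration, not one 
from the real data set.

```python
# A 64-bit float has a 53-bit significand, so not every integer above 2**53
# survives a round trip through it.
ident = 2 ** 53 + 1            # 9007199254740993, an example identifier
as_double = float(ident)       # what a "numbers are doubles" tool would store
round_tripped = int(as_double)

print(ident, round_tripped, ident == round_tripped)
```

The round-tripped value comes back as 9007199254740992, so the identifier no 
longer matches the original.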

                
> Optionally convert long to chararray in JsonStorage
> ---------------------------------------------------
>
>                 Key: PIG-3121
>                 URL: https://issues.apache.org/jira/browse/PIG-3121
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.10.0
>            Reporter: Josh Levy
>
> I work with a data set that uses random longs (64-bit integers) as 
> identifiers.  Recently I've been accessing the data from Pig and using 
> JsonStorage to save records, which I then run through another script to get 
> JSON that I can feed into other tools.  One of the tools I use is broken in 
> the sense that it treats all numbers as 64-bit floating point, and it can't 
> faithfully reproduce most of the identifiers I pass it.  My workaround is to 
> convert the identifiers to strings before they get to that tool.  
> If I provide a patch, is there interest in adding an option to JsonStorage 
> that tells it to serialize all longs as if they were strings? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
