[ https://issues.apache.org/jira/browse/PIG-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555512#comment-13555512 ]
Josh Levy commented on PIG-3121:
--------------------------------

I think (but would love to be convinced otherwise) that casting would be unpleasant and fragile. The original data has a very complex schema. The Pig DESCRIBE command creates almost 10 KB of output. I'm not excited about generating cast statements from the description, and I'd hate to have to redo the casts if the schema gets tweaked.

For a bit more background on that, the data comes from Protobufs files. I use ElephantBird to load the data into Pig, and ElephantBird automatically creates the Pig schema from Protobufs.

I do have other options besides changing JsonStorage.
* Changing the values in Protobufs is politically difficult for me, but it is probably the most elegant / least hacky solution
* I could modify ElephantBird to optionally do the cast at load time
* I could write a UDF to walk through an arbitrary schema and do all of the casts
* I could modify JsonStorage as proposed
* I could continue what I'm currently doing and postprocess the output of JsonStorage

The real problem is in the other tool and not in JsonStorage. Patching JsonStorage is attractive because it is so easy to take advantage of when writing new Pig scripts, and hopefully it can give others a quick path out of this problem.

> Optionally convert long to chararray in JsonStorage
> ---------------------------------------------------
>
>                 Key: PIG-3121
>                 URL: https://issues.apache.org/jira/browse/PIG-3121
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.10.0
>            Reporter: Josh Levy
>
> I work with a data set that uses random longs (64-bit integers) as
> identifiers. Recently I've been accessing the data from Pig and using
> JsonStorage to save records, which I then run through another script to get
> JSON that I can feed into other tools. One of the tools I use is broken in
> the sense that it treats all numbers as 64-bit floating point, and it can't
> faithfully reproduce most of the identifiers I pass it. My workaround is to
> convert the identifiers to strings before they get to that tool.
> If I provide a patch, is there interest in adding an option to JsonStorage
> that tells it to serialize all longs as if they are strings?
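
For readers who want to see the precision problem and the post-processing workaround concretely, here is a minimal Python sketch. It is not the reporter's actual script, which is not shown in this thread. The function names `stringify_ints` and `convert_line` and the sample field names are hypothetical, and the sketch assumes the JsonStorage output is one JSON object per line. Once the data is JSON, a Pig int and a Pig long both look like plain numbers, so this version turns every integer into a string rather than only longs.

```python
import json

# A 64-bit float has a 53-bit significand, so integers above 2**53
# cannot all be represented exactly.
big_id = 2 ** 53 + 1
assert int(float(big_id)) != big_id  # the identifier is silently altered


def stringify_ints(value):
    """Recursively replace every integer in a decoded JSON value with a string.

    Hypothetical helper: walks nested lists and dicts so that no knowledge
    of the schema is needed. Floats, strings, booleans and nulls are kept.
    """
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return value
    if isinstance(value, int):
        return str(value)
    if isinstance(value, list):
        return [stringify_ints(item) for item in value]
    if isinstance(value, dict):
        return {key: stringify_ints(item) for key, item in value.items()}
    return value


def convert_line(line):
    """Convert one line of JSON (assumed: one record per line)."""
    return json.dumps(stringify_ints(json.loads(line)))


if __name__ == "__main__":
    # Sample record with made-up field names.
    print(convert_line('{"id": 9007199254740993, "score": 1.5}'))
```

Walking the decoded value recursively avoids writing schema-specific casts, which is the same motivation given above for preferring a generic approach over per-field cast statements.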