[ https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-1031:
----------------------------

    Description:
I have data stored in a text file as:
{(4153E765)}
{(AF533765)}

I try reading it using PigStorage as:
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:bytearray)});
dump A;
{code}

I get the following results:
({(Infinity)})
({(AF533765)})

The problem seems to be in the method parseFromBytes(byte[] b) in class Utf8StorageConverter. This method uses the TextDataParser (class generated via jjt) to infer the type of the data from its content, even though the schema declares it as a bytearray.

Sample code from TextDataParser.jjt:
{code}
TOKEN :
{
 ...
 < DOUBLENUMBER: (["-","+"])? <FLOATINGPOINT> ( ["e","E"] ([ "-","+"])? <FLOATINGPOINT> )?>
 < FLOATNUMBER: <DOUBLENUMBER> (["f","F"])? >
 ...
}
{code}

I tried the following options, but they will not work, as we need to call bytesToBag(byte[] b) in the Utf8StorageConverter class.
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term)});
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:chararray)});
{code}

Viraj

> PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-1031
>                 URL: https://issues.apache.org/jira/browse/PIG-1031
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>   Affects Versions: 0.5.0
>            Reporter: Viraj Bhat
>             Fix For: 0.5.0, 0.6.0

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
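
The Infinity in the reported output is consistent with 4153E765 being read as a double: it has the shape digits, "E", digits, and 4153 x 10^765 is far beyond the range of a double. The sketch below only illustrates that effect in plain Java. It is not Pig code: the class NumericLookalike and its methods are hypothetical, and Double.parseDouble stands in for the content-based guess that the description attributes to TextDataParser.

```java
public class NumericLookalike {

    // Hypothetical stand-in for content-based type guessing: if the text
    // parses as a double, treat it as one; otherwise keep the text as is.
    static String guessFromContent(String field) {
        try {
            return String.valueOf(Double.parseDouble(field));
        } catch (NumberFormatException e) {
            return field;
        }
    }

    // Hypothetical schema-aware variant: when the declared type is
    // bytearray (or chararray), the raw text is returned untouched.
    static String readField(String field, boolean declaredAsRawText) {
        return declaredAsRawText ? field : guessFromContent(field);
    }

    public static void main(String[] args) {
        System.out.println(guessFromContent("4153E765"));  // prints Infinity
        System.out.println(guessFromContent("AF533765"));  // prints AF533765
        System.out.println(readField("4153E765", true));   // prints 4153E765
    }
}
```

The second value survives only because it starts with a letter and so does not parse as a number, which matches the two rows in the reported output. Honouring the declared type, as readField does here, is one way the first value could be kept intact.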