[ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860452#action_12860452 ]
Viraj Bhat commented on PIG-798:
--------------------------------

Hi Ashutosh,

The problem here is not about using data interchangeably between BinStorage() and PigStorage(); it is about inconsistent schema handling. Sorry if the description was unclear.

I can see that it is possible to write a statement such as this using BinStorage():
{code}
A = load 'somedata' using BinStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}
but not to write the same statement using PigStorage(). Should we not support the following statement? As a user, I am interested in projecting the first column and casting it to a chararray. I am not interested in knowing the schemas of the other columns!

It fails when I do the following:
{code}
A = load 'somedata' using PigStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}
Can you tell me why the schema specification in FOREACH GENERATE works with BinStorage and not with PigStorage?

Viraj

> Schema errors when using PigStorage and none when using BinStorage in FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab-separated text file, which I load using PigStorage() and store using BinStorage().
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load the file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code works properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage() be different from PigStorage(), where it is ok not to specify a schema? Should it not be consistent across both?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.