[ 
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860452#action_12860452
 ] 

Viraj Bhat commented on PIG-798:
--------------------------------

Hi Ashutosh,
 The problem here is not about using data interchangeably between 
BinStorage() and PigStorage(); it is about inconsistent schema handling 
between the two. Sorry if the description was unclear.

I can see that it is possible to write statements such as this using 
BinStorage() 

{code}
A = load 'somedata' using BinStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}

but not with PigStorage().

Should we not support the following statement? As a user, I am interested only 
in projecting the first column and casting it to a chararray; I should not need 
to know the schemas of the other columns.

It fails when I do the following:
{code}
A = load 'somedata' using PigStorage();
B = foreach A generate $0 as name:chararray;
dump B;
{code}
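
For comparison, here is a sketch of the same projection written with an 
explicit cast instead of a type in the AS clause. I am assuming (not verified 
against 0.2.0) that the type checker accepts this form with PigStorage(), 
because it converts the bytearray field rather than declaring a schema for it. 
The path 'somedata' and the aliases A and B are the same placeholders as above.

{code}
-- Load without a schema, so every field defaults to bytearray.
A = load 'somedata' using PigStorage();
-- Cast the first field explicitly; the AS clause only supplies a name.
B = foreach A generate (chararray)$0 as name;
dump B;
{code}

If that assumption holds, the cast performs the type conversion and the AS 
clause carries no type, so there is no bytearray-versus-chararray schema to 
merge.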

Can you tell me why the schema specification in FOREACH GENERATE works with 
BinStorage() but not with PigStorage()? 

Viraj

> Schema errors when using PigStorage and none when using BinStorage in 
> FOREACH??
> -------------------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>         Attachments: binstoragecreateop, schemaerr.pig, visits.txt
>
>
> In the following script I have a tab-separated text file, which I load using 
> PigStorage() and store using BinStorage():
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, 
> url:chararray, time:chararray);
> B = group A by name;
> store B into '/user/viraj/binstoragecreateop' using BinStorage();
> dump B;
> {code}
> I later load file 'binstoragecreateop' in the following way.
> {code}
> A = load '/user/viraj/binstoragecreateop' using BinStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> Result
> =======================================================================
> (Amy)
> (Fred)
> =======================================================================
> The above code works properly and returns the right results. If I use 
> PigStorage() to achieve the same, I get the following error.
> {code}
> A = load '/user/viraj/visits.txt' using PigStorage();
> B = foreach A generate $0 as name:chararray;
> dump B;
> {code}
> =======================================================================
> {code}
> 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other 
> Field Schema: name: chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
> {code}
> =======================================================================
> So why should the semantics of BinStorage(), where it is OK not to specify a 
> schema, differ from those of PigStorage()? Should they not be consistent 
> across both?
