[ 
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510645#comment-13510645
 ] 

Joseph Adler commented on PIG-2684:
-----------------------------------

I'm addressing this right now in PIG-3015. This isn't a bug; it's just a 
mismatch between the set of names that Avro allows and the names that Pig 
allows. (As a side note, there are good reasons why only some variable names 
are allowed in Avro: limiting the characters in names allows Avro to generate 
code to process Avro objects in a number of different languages. Colons in 
variable names would make it difficult to do this.)
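
To make the mismatch concrete, here is a minimal sketch of the name rule from the Avro specification: a name must start with a letter or underscore and may contain only letters, digits, and underscores after that. The class `AvroNameCheck` and the method `isValidAvroName` are hypothetical names chosen for illustration; they are not part of Pig or Avro.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Pig or Avro.
class AvroNameCheck {
    // Avro name rule: first character [A-Za-z_], remaining characters [A-Za-z0-9_].
    private static final Pattern AVRO_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true when the given field name satisfies the Avro name rule.
    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "group::one" is the kind of alias Pig produces after FLATTEN(group).
        for (String name : new String[] {"one", "group::one", "group__one"}) {
            System.out.println(name + " -> " + isValidAvroName(name));
        }
    }
}
```

Under this rule `group::one` is rejected because of the colons, while `one` and `group__one` are accepted.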

First, there are two workarounds for this problem right now:

- The user can rename variables before storing the bag
- The user can manually specify the output schema 

Second, I don't like the idea of using namespaces for this. Namespaces are 
important for specific record types in Avro; they are translated by the 
protocol and schema compilers into package names for Java classes.

To make AvroStorage easier to use, I think it would make sense to add an 
option to AvroStorage to translate names with colons in some reasonable way: 
maybe translating the double colons to double underscores.
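
A rough sketch of what such a translation could look like, assuming the simplest possible rule (replace every "::" with "__"). The class `ColonTranslationSketch` and the method `translate` are hypothetical and only illustrate the idea; this is not the actual AvroStorage option or its implementation.

```java
// Hypothetical sketch for illustration only; not the AvroStorage implementation.
class ColonTranslationSketch {
    // Replaces every "::" in a Pig alias with "__" so the result contains
    // no colons. Aliases without "::" are returned unchanged.
    static String translate(String pigAlias) {
        return pigAlias.replace("::", "__");
    }

    public static void main(String[] args) {
        System.out.println(translate("group::one")); // prints group__one
        System.out.println(translate("one"));        // prints one
    }
}
```

One caveat with a rule this simple: two distinct Pig aliases, such as `group::one` and `group__one`, would map to the same Avro name, so a real option would probably need to detect or reject such collisions.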
                
> :: in field name causes AvroStorage to fail
> -------------------------------------------
>
>                 Key: PIG-2684
>                 URL: https://issues.apache.org/jira/browse/PIG-2684
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>            Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there 
> are field names that contain ::
> For example, the following will fail:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
> store result into 'test.avro' using 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
> While the following will succeed:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
> store result into 'test.avro' using 
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> Here is a minimal test case:
> data = load 'test.txt' as (one::two, three);
> store data into 'test.avro' using 
> org.apache.pig.piggybank.storage.avro.AvroStorage();

