[ 
https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800315#action_12800315
 ] 

Viraj Bhat commented on PIG-1187:
---------------------------------

Hi Jeff,
 This appears to be specific to the data we are using: the parser seems to fail 
when it tries to interpret certain characters. We have tested this with Chinese 
characters and it works.
Viraj
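
One way to narrow down which character the parser trips on is to look up the
column reported in the error ("line 1, column 21") in the bag text. The sketch
below is only a diagnostic aid and rests on assumptions not confirmed in this
thread: that the column is 1-based, that it counts characters rather than
bytes, that it is relative to the bag field handed to the parser, and that the
failing record looks like the sample output quoted below. The helper name
char_at_column is made up for illustration.

```python
import unicodedata

# Bag field copied from the sample output in the report (assumed to resemble
# the record that triggers the failure).
SAMPLE_BAG = "{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}"


def char_at_column(text, column):
    """Return (character, code point, Unicode name) at a 1-based column.

    Hypothetical helper: assumes the column counts characters, not bytes.
    """
    # Normalize so that accented letters are single code points.
    normalized = unicodedata.normalize("NFC", text)
    ch = normalized[column - 1]
    return ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    # Column 21 is the position named in the TokenMgrError message.
    print(char_at_column(SAMPLE_BAG, 21))
```

Under those assumptions, column 21 of the sample bag is the hiragana あ
(U+3042), the first character after the second "label". That would fit the
idea that only some character ranges upset the parser, but it is not proof.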

> UTF-8 (international code) breaks with loader when load with schema is 
> specified
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1187
>                 URL: https://issues.apache.org/jira/browse/PIG-1187
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.6.0
>
>
> I have a set of Pig statements which dump an international dataset.
> {code}
> INPUT_OBJECT = load 'internationalcode';
> describe INPUT_OBJECT;
> dump INPUT_OBJECT;
> {code}
> Sample output:
> (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)})
> This works and dumps the results, but it fails when I specify a schema in the load statement.
> {code}
> INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag {T: tuple(label:chararray)});
> describe INPUT_OBJECT;
> {code}
> The error message is as follows:
> 2010-01-14 02:23:27,320 FATAL org.apache.hadoop.mapred.Child: Error running child : org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop caused by repeated empty string matches at line 1, column 21.
>       at org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
>       at org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
>       at org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
>       at org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
>       at org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
>       at org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
>       at org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
>       at org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
>       at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>       at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
