[ https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1187: -------------------------------- Fix Version/s: (was: 0.7.0) Since a simple change did not help - unlinking from 0.7.0 release. This will be addressed as part of parser change > UTF-8 (international code) breaks with loader when load with schema is > specified > -------------------------------------------------------------------------------- > > Key: PIG-1187 > URL: https://issues.apache.org/jira/browse/PIG-1187 > Project: Pig > Issue Type: Bug > Affects Versions: 0.6.0 > Reporter: Viraj Bhat > Assignee: Ashutosh Chauhan > > I have a set of Pig statements which dump an international dataset. > {code} > INPUT_OBJECT = load 'internationalcode'; > describe INPUT_OBJECT; > dump INPUT_OBJECT; > {code} > Sample output > (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)}) > It works and dumps results but when I use a schema for loading it fails. > {code} > INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag > {T: tuple(label:chararray)}); > describe INPUT_OBJECT; > {code} > The error message is as follows:2010-01-14 02:23:27,320 FATAL > org.apache.hadoop.mapred.Child: Error running child : > org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop > caused by repeated empty string matches at line 1, column 21. > at > org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620) > at > org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569) > at > org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651) > at > org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152) > at > org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100) > at > org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382) > at > org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42) > at > org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68) > at > org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.