Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases ------------------------------------------------------------------------------------------------
Key: PIG-1368 URL: https://issues.apache.org/jira/browse/PIG-1368 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Consider the following data: 1\t ( hello , bye ) \n 1\t( hello , bye )a\n 2 \t (good , bye)\n The following script gives the results below: a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a; (1,( hello , bye )) (1,( hello , bye )) (2,(good , bye)) The current bytesToTuple implementation discards leading and trailing characters before the tuple delimiters and parses the tuple out - I think instead it should treat any leading and trailing characters (including space) near the delimiters as an indication of a malformed tuple and return null. Also in the code, consumeBag() should handle the special case of {} and not delegate the handling to consumeTuple(). In consumeBag() null tuples should not be skipped. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.