Illustrate and Dump do not seem to work correctly for files containing utf8
---------------------------------------------------------------------------
Key: PIG-504
URL: https://issues.apache.org/jira/browse/PIG-504
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Environment: Hadoop 18
Reporter: Viraj Bhat
Running this snippet on the latest types branch
{code}
A = load 'utf8.txt' using PigStorage() as (text: chararray);
illustrate A;
{code}
results in this output:
---------------------------------
| A | text: bytearray cn: 1 |
---------------------------------
| | ???????????????? |
---------------------------------
Three observations:
1) text should be chararray, not bytearray.
2) cn: 1 should be removed from the display.
3) The value for text is displayed as "???????????????" instead of the actual UTF-8 text.
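One possible explanation for observation 3, offered only as an assumption and not confirmed anywhere in this report, is that the text is encoded for display with a charset that cannot represent the characters, so each one is replaced by "?". The sketch below shows that general effect in Python; it does not use Pig's code, and the sample string is hypothetical because the contents of utf8.txt are not included here.

```python
# Hypothetical sample text; the report does not include the contents of utf8.txt.
text = "日本語のテキスト"

# Encoding to a charset that cannot represent these characters, with
# replacement enabled, substitutes one "?" for each unmappable character.
rendered = text.encode("ascii", errors="replace").decode("ascii")

print(rendered)
```

If this is what happens in illustrate and dump, the number of question marks would match the number of characters that the output charset cannot represent.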
Now replacing illustrate with dump
{code}
A = load 'utf8.txt' using PigStorage() as (text: chararray);
dump A;
{code}
produces (??????)
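The input file is not attached, so a stand-in is needed to reproduce the problem. A minimal sketch for generating one, assuming utf8.txt holds one line of non-ASCII text per record; the sample line and the helper names encode_lines and write_utf8_file are hypothetical:

```python
# Hypothetical sample record; replace with the text that triggers the problem.
SAMPLE_LINES = ["日本語のテキスト"]


def encode_lines(lines):
    """Return the lines, each terminated by a newline, encoded as UTF-8 bytes."""
    return "".join(line + "\n" for line in lines).encode("utf-8")


def write_utf8_file(path, lines):
    """Write the UTF-8 encoded lines to path in binary mode."""
    with open(path, "wb") as f:
        f.write(encode_lines(lines))


if __name__ == "__main__":
    # File name taken from the load statement in the snippets above.
    write_utf8_file("utf8.txt", SAMPLE_LINES)
```

Writing in binary mode keeps the bytes on disk independent of the platform's default encoding, so any "?" in the output can be attributed to the load or display path and not to the test file.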