[
https://issues.apache.org/jira/browse/PIG-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Bhat updated PIG-504:
---------------------------
Description:
Running the following snippet on the latest types branch
{code}
A = load 'utf8.txt' using PigStorage() as (t1: chararray);
illustrate A;
{code}
produces this output:
-------------------------------
| A     | t1: bytearray cn: 1 |
-------------------------------
|       | gabriella??         |
-------------------------------
Three observations:
1) t1 should be chararray, not bytearray.
2) cn: 1 should be removed from the display.
3) The value of t1 is shown as "username??"; the UTF-8 characters after the username are not displayed properly.
Replacing illustrate with dump
{code}
A = load 'utf8.txt' using PigStorage() as (t1: chararray);
dump A;
{code}
produces this output:
(david?)
(rachel?)
(jessica?)
(sarah?)
(katie?)
(wendy?)
(david?)
(priscilla?)
(oscar?)
(xavier?)
... and more rows like these. The UTF-8 characters after each username are not displayed.
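For reference, the "?" symptom can be reproduced outside Pig. The sketch below is only an illustration and not a confirmed diagnosis of this issue: it assumes the text is correct in memory and is then rendered through a charset that cannot represent the trailing characters. The helper name render_with_charset and the sample string are hypothetical and do not come from utf8.txt.

```python
# Hypothetical illustration only: this is not Pig code and not a confirmed
# diagnosis of PIG-504. It shows one common way "?" placeholders appear.

def render_with_charset(text: str, charset: str) -> str:
    """Encode text in the given charset, substituting '?' for characters the
    charset cannot represent, then decode it back. This mimics what a console
    or writer limited to that charset would display."""
    return text.encode(charset, errors="replace").decode(charset)


# Assumed sample value: a username followed by two non-ASCII characters.
sample = "gabriella\u00e9\u00e8"

ascii_view = render_with_charset(sample, "ascii")
utf8_view = render_with_charset(sample, "utf-8")

print(ascii_view)           # gabriella??
print(utf8_view == sample)  # True
```

If something similar is happening here, the data may be intact and only the display path lossy; checking the default charset of the JVM and the terminal would be one way to narrow that down.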
was:
For the snippet of code which runs on the latest type branch
{code}
A = load 'utf8.txt' using PigStorage() as (text: chararray);
illustrate A;
{code}
results in this output being produced
---------------------------------
| A | text: bytearray cn: 1 |
---------------------------------
| | ???????????????? |
---------------------------------
Three observations:
1) text should be chararray, not bytearray.
2) cn: 1 should be removed from the display
3) Value for text is "???????????????" is not displayed properly
Now replacing illustrate with dump
{code}
A = load 'utf8.txt' using PigStorage() as (text: chararray);
dump A;
{code}
produces (??????)
> Illustrate and Dump do not seem to work correctly for files containing utf8
> ---------------------------------------------------------------------------
>
> Key: PIG-504
> URL: https://issues.apache.org/jira/browse/PIG-504
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: types_branch
> Environment: Hadoop 18
> Reporter: Viraj Bhat