[jira] Commented: (HIVE-1505) Support non-UTF8 data

2010-08-24 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901753#action_12901753
 ] 

Ted Xu commented on HIVE-1505:
--

Thanks Edward.

I dug into the problem and found that the patch does not work when the query 
contains subqueries; it is very hard to retain encoding information in those 
queries.

Table properties can be lost in such queries. The problem is the same as a 
missing field delimiter setting: whenever Hive can't get the table properties 
in a subquery (e.g., a join operation), the default value is used. For the 
field delimiter the default is ^A, which is why the deserializer fails most of 
the time when the data contains a ^A character, even if ^A is not set as the 
field delimiter.
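
To make the delimiter analogy concrete, here is a minimal Python sketch of the 
fallback behaviour described above. It is not Hive code: split_row and the 
dictionary of table properties are hypothetical stand-ins, and the only detail 
taken from this comment is that ^A (\x01) is the default field delimiter.

```python
# Hypothetical illustration of the fallback described above; not Hive code.
DEFAULT_DELIM = "\x01"  # ^A, the default field delimiter


def split_row(line, table_props=None):
    """Split a row on the table's delimiter, or on ^A if properties are missing."""
    props = table_props or {}
    # "field.delim" is used here only as an illustrative property key.
    delim = props.get("field.delim", DEFAULT_DELIM)
    return line.split(delim)


# A comma-delimited row whose second field happens to contain a ^A character.
row = "alice,note\x01with ctrl-A,42"

# Table properties available: the row splits into the intended three fields.
with_props = split_row(row, {"field.delim": ","})

# Table properties lost (as in a subquery): the row is split on ^A instead,
# producing two fields that do not match the table's columns.
without_props = split_row(row)

print(with_props)
print(without_props)
```

Encoding information would presumably be lost in the same way: once the 
table properties are unavailable, the reader falls back to its default 
(UTF-8) regardless of what the table declared.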

 

> Support non-UTF8 data
> -
>
> Key: HIVE-1505
> URL: https://issues.apache.org/jira/browse/HIVE-1505
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: bc Wong
>Assignee: Ted Xu
> Attachments: trunk-encoding.patch
>
>
> I'd like to work with non-UTF8 data easily.
> Suppose I have data in latin1. Currently, doing a "select *" will return the 
> upper ascii characters in '\xef\xbf\xbd', which is the replacement character 
> '\ufffd' encoded in UTF-8. Would be nice for Hive to understand different 
> encodings, or to have a concept of byte string.
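
The symptom in the description can be reproduced outside Hive. The following 
Python sketch assumes only that latin1 bytes are being decoded as UTF-8 with 
invalid sequences replaced; the sample string is made up for illustration.

```python
# Sample latin1 data: "caf" followed by e-acute (byte 0xE9 in latin1).
latin1_bytes = "caf\u00e9".encode("latin-1")

# Decoding as UTF-8 fails on 0xE9, so it is replaced with U+FFFD.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Re-encoding the result as UTF-8 yields the bytes EF BF BD for that character.
round_tripped = as_utf8.encode("utf-8")

# Decoding with the right charset recovers the original text.
correct = latin1_bytes.decode("latin-1")

print(repr(as_utf8), round_tripped, repr(correct))
```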

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1505) Support non-UTF8 data

2010-08-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697
 ] 

Edward Capriolo commented on HIVE-1505:
---

 Maybe you should fork hive and call it chive. 

On a serious note: great job. Would you consider editing the cli.xml in the 
xdocs to explain this feature? I think it would be very helpful. Look in 
docs/xdocs/.
