[
https://issues.apache.org/jira/browse/SQOOP-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yulei Yang updated SQOOP-3263:
------------------------------
Attachment: screenshot-1.png
> Duplicate rows found when split-by column is of textual type due to different
> charset difference of sqoop and hadoop
> --------------------------------------------------------------------------------------------------------------------
>
> Key: SQOOP-3263
> URL: https://issues.apache.org/jira/browse/SQOOP-3263
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Yulei Yang
> Attachments: screenshot-1.png
>
>
> This issue can occur with any kind of RDBMS, because the root cause is
> not in the RDBMS. Steps to reproduce this issue:
> 1. Create a MySQL table: create table ora_test (id varchar(32) primary key
> not null);
> 2. Insert *4* rows:
> insert into ora_test values ('08125FC4C8FDA064E053C0A8028DA064');
> insert into ora_test values ('4FFE68419D3502E2E0537F000001F3E8');
> insert into ora_test values ('4FFF9CF5861E003EE0537F0000017FF7');
> insert into ora_test values ('56DAC2D0F14901B0E0537F000001D3FA');
> 3. Import it into Hive with sqoop import -m 32 (-m 189 also reproduces the
> problem). You will then get *6* rows in Hive.
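
One way to confirm the symptom from step 3 is to compare the keys exported from Hive against the four source keys. The following is only a minimal sketch: the names `find_duplicates` and `check_import` are hypothetical helpers, not part of Sqoop or Hive, and the code does not reproduce Sqoop's splitting logic. It only counts rows after the import.

```python
from collections import Counter

# The four primary keys inserted in step 2 of the report.
SOURCE_IDS = [
    "08125FC4C8FDA064E053C0A8028DA064",
    "4FFE68419D3502E2E0537F000001F3E8",
    "4FFF9CF5861E003EE0537F0000017FF7",
    "56DAC2D0F14901B0E0537F000001D3FA",
]


def find_duplicates(imported_ids):
    """Return {id: count} for every id that appears more than once."""
    counts = Counter(imported_ids)
    return {key: n for key, n in counts.items() if n > 1}


def check_import(imported_ids, source_ids=SOURCE_IDS):
    """Summarise how the imported rows differ from the source rows."""
    return {
        "source_rows": len(source_ids),
        "imported_rows": len(imported_ids),
        "duplicates": find_duplicates(imported_ids),
        "missing": sorted(set(source_ids) - set(imported_ids)),
    }


if __name__ == "__main__":
    # Hypothetical import result with 6 rows. The report does not say which
    # rows were duplicated, so the first two keys are repeated purely for
    # illustration.
    imported = SOURCE_IDS + SOURCE_IDS[:2]
    print(check_import(imported))
```

In practice `imported` would be filled from the Hive table (for example, from a query result saved to a file) rather than built by hand. A non-empty `duplicates` entry together with an empty `missing` list would match the behaviour described above, where 4 source rows become 6.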
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)