[
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335982#comment-16335982
]
Daniel Voros commented on SQOOP-3267:
-------------------------------------
[~vasas], thanks for your reply!
The current implementation does in fact issue a separate Put command for every
column, but I've already addressed this in [^SQOOP-3267.1.patch] (see the
first comment on this issue). There is still a slight overhead in adding all
columns to the Put command, but it's way better than 1 Put/column.
Despite this, I still agree with you on providing the option to configure the
behavior (ignore/delete/null-string). However, knowing it's only a small
performance overhead, we might as well make null-string the default.
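To make the three behaviors concrete, here is a minimal, self-contained sketch
of how a single per-row mutation could treat NULL source cells in each mode.
It is not the code from the patch: NullHandlingSketch, NullMode, RowMutation
and buildRowMutation are hypothetical names, a plain Map stands in for the
HBase Put, and the placeholder passed as nullString is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Sqoop or HBase.
final class NullHandlingSketch {

    /** Modes named after the options discussed in this ticket. */
    enum NullMode { IGNORE, DELETE, NULL_STRING }

    /** Stand-in for everything sent to HBase for ONE source row. */
    static final class RowMutation {
        // column -> value; models a single Put carrying all columns of the row
        final Map<String, String> puts = new LinkedHashMap<>();
        // columns whose cells should be deleted
        final List<String> deletes = new ArrayList<>();
    }

    /** Builds one mutation per row instead of one Put per column. */
    static RowMutation buildRowMutation(Map<String, String> sourceRow,
                                        NullMode mode,
                                        String nullString) {
        RowMutation mutation = new RowMutation();
        for (Map.Entry<String, String> cell : sourceRow.entrySet()) {
            String value = cell.getValue();
            if (value != null) {
                // Non-NULL cells always go into the row's single Put.
                mutation.puts.put(cell.getKey(), value);
                continue;
            }
            switch (mode) {
                case IGNORE:
                    // Leave whatever HBase already holds for this column.
                    break;
                case DELETE:
                    // Ask for the column to be removed.
                    mutation.deletes.add(cell.getKey());
                    break;
                case NULL_STRING:
                    // Overwrite with a placeholder, so no Delete is needed.
                    mutation.puts.put(cell.getKey(), nullString);
                    break;
            }
        }
        return mutation;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("email", null); // cell set to NULL in the source table

        for (NullMode mode : NullMode.values()) {
            RowMutation m = buildRowMutation(row, mode, "NULL");
            System.out.println(mode + " puts=" + m.puts + " deletes=" + m.deletes);
        }
    }
}
```

In this sketch the null-string mode never produces a Delete, so the question
of how many versions of a cell get removed does not come up for it; that only
matters for the delete mode.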
All right, let's introduce --hbase-null-string and not overload
--null-string.
If you agree with making it the default, I think I'd implement the null-string
mode in this ticket and open a new Jira for the configurability of
ignore/delete/null-string and --hbase-null-string.
> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
> Key: SQOOP-3267
> URL: https://issues.apache.org/jira/browse/SQOOP-3267
> Project: Sqoop
> Issue Type: Bug
> Components: hbase-integration
> Affects Versions: 1.4.7
> Reporter: Daniel Voros
> Assignee: Daniel Voros
> Priority: Major
> Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last
> version of a column when the corresponding cell was set to NULL in the source
> table.
> This can lead to unexpected and misleading results if the row has been
> transferred multiple times, which can easily happen if it's being modified on
> the source side.
> Also, SQOOP-3149 uses a new Put command for every column instead of a
> single Put per row as before. This could lead to a performance drop
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be
> the expected behavior here?