[ 
https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292590#comment-16292590
 ] 

Attila Szabo commented on SQOOP-3267:
-------------------------------------

Hey [~dvoros], [~vasas],

IMHO I would keep the history by default (b/c of the existing cmd line 
arguments, and b/c as an end user I would really not want to get my data 
deleted without explicitly requesting that).

Aggregating your findings and my thoughts, my recommendations are the following:
By default (no other options present) I would insert a null value, and keep the 
history.
If the mode targets the last modified entry only, I would delete the history, 
and only keep the last meaningful value (and of course in case of a null value 
delete the column, as you've suggested). I would definitely go in this 
direction, b/c we're speaking about incremental mode, and according to the 
existing documentation 'mode' is related to incremental mode (and we did not 
make any differentiation between incremental mode with append-only tables and 
incremental mode for HBase, where we can do "real" modifications).
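
To make the two cases above concrete, here is a minimal sketch of the decision in plain Java. It is only an illustration under stated assumptions: the class, enum and method names are hypothetical and do not exist in Sqoop, and treating append mode the same as the default is my own assumption, since the recommendation only spells out the default and the last-modified cases.

```java
// Hypothetical sketch: none of these names exist in Sqoop.
class NullHandlingSketch {

    /** Mirrors the values of --incremental, plus NONE when the option is absent. */
    enum Mode { NONE, APPEND, LAST_MODIFIED }

    /** What the HBase writer would do for a single cell. */
    enum Action {
        PUT_VALUE_KEEP_HISTORY,   // default, non-null source value
        PUT_NULL_KEEP_HISTORY,    // default, null source value
        PUT_VALUE_DROP_HISTORY,   // last-modified mode, non-null source value
        DELETE_COLUMN             // last-modified mode, null source value
    }

    static Action decide(Mode mode, Object sourceValue) {
        boolean isNull = (sourceValue == null);
        if (mode == Mode.LAST_MODIFIED) {
            // Only the last meaningful value survives; a null removes the column.
            return isNull ? Action.DELETE_COLUMN : Action.PUT_VALUE_DROP_HISTORY;
        }
        // Assumption: NONE and APPEND both keep history and never delete.
        return isNull ? Action.PUT_NULL_KEEP_HISTORY : Action.PUT_VALUE_KEEP_HISTORY;
    }

    public static void main(String[] args) {
        System.out.println(decide(Mode.NONE, null));
        System.out.println(decide(Mode.LAST_MODIFIED, null));
        System.out.println(decide(Mode.LAST_MODIFIED, "abc"));
    }
}
```

The point of keeping the decision in one place is that nothing gets deleted unless the user explicitly asked for last-modified behaviour.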

Though if you dislike leveraging the mode cmd line argument, I'm not against 
introducing new cmd line arguments on this front, to make it straightforward 
when we do deletes, when we insert null values, when we keep history and when 
we do not. Although in this case I would also highly recommend introducing 
some fail-fast scenario (from version 1.5) which would give a meaningful error 
message in case of mode + HBase table + incremental import.
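
As a rough illustration of the fail-fast idea, the check could look something like the sketch below. This is one possible reading only: the class, method and parameter names are hypothetical, and I am assuming the error should fire when an incremental mode is combined with an HBase target and no explicit choice about null handling was given. Only --hbase-table and --incremental are existing Sqoop arguments; the "explicit policy" stands in for whatever new argument would be introduced.

```java
// Hypothetical sketch: this validator does not exist in Sqoop.
class FailFastSketch {

    /**
     * @param hbaseTable      value of --hbase-table, or null if not given
     * @param incrementalMode value of --incremental, or null if not given
     * @param explicitPolicy  value of the (hypothetical) new null-handling
     *                        argument, or null if not given
     */
    static void validate(String hbaseTable, String incrementalMode, String explicitPolicy) {
        boolean toHBase = (hbaseTable != null);
        boolean incremental = (incrementalMode != null);
        if (toHBase && incremental && explicitPolicy == null) {
            // Stop before any data is touched and tell the user what to do.
            throw new IllegalArgumentException(
                "Incremental import (mode '" + incrementalMode + "') into HBase table '"
                + hbaseTable + "' requires an explicit choice on how NULL values and "
                + "column history are handled.");
        }
    }

    public static void main(String[] args) {
        validate(null, "lastmodified", null);      // not an HBase import: passes
        validate("t1", null, null);                // not incremental: passes
        validate("t1", "lastmodified", "delete");  // explicit choice given: passes
        try {
            validate("t1", "lastmodified", null);  // ambiguous combination: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```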

My 2cents,
Attila

ps.: [~vasas] your test cases are very well defined, and very detailed! Nice 
job!!!

> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>         Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last 
> version of a column when the corresponding cell was set to NULL in the source 
> table.
> This can lead to unexpected and misleading results if the row has been 
> transferred multiple times, which can easily happen if it's being modified on 
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a 
> single Put per row as before. This could probably lead to a performance drop 
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be 
> the expected behavior here?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
