[ https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292590#comment-16292590 ]
Attila Szabo commented on SQOOP-3267:
-------------------------------------

Hey [~dvoros], [~vasas],

IMHO I would keep the history by default (b/c of the existing cmd line arguments, and b/c otherwise, as an end user, I would get my data deleted without explicitly requesting that).

Aggregating your findings and my thoughts, my recommendations are the following:

By default (no other options present) I would insert a null value and keep the history.

If the mode aims for the last modified entry only, I would delete the history and keep only the last meaningful value (and of course, in case of a null value, delete the column as you've suggested).

I would definitely go in this direction, b/c we're speaking about incremental mode, and according to the existing documentation 'mode' is related to incremental mode (and we did not make any differentiation between incremental mode with append-only tables and incremental mode for HBase, where we can do "real" modifications).

Though if you dislike leveraging the mode cmd line argument, I'm not against introducing new cmd line arguments on this front, to make it straightforward when we do deletes, when we insert null values, when we keep history and when we do not. In this case, though, I would also highly recommend introducing some fail-fast scenario (from version 1.5) which would give a meaningful error message in case of mode + HBase table + incremental import.

My 2 cents,
Attila

ps.: [~vasas] your test cases are very well defined and very detailed! Nice job!!!
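To make the options in the comment above concrete, here is a minimal sketch using a toy in-memory model of a single versioned cell. It does not use the HBase or Sqoop API; every function name is hypothetical, and storing `None` as the "null value" is an assumption made only for illustration.

```python
# Toy model of one HBase cell: a list of (timestamp, value) versions.
# Hypothetical helpers for illustration only; not the HBase or Sqoop API.

def read_latest(versions):
    """Return the value of the newest version, or None if no version exists."""
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]


def delete_latest_only(versions):
    """Drop only the newest version (the behaviour reported in this issue)."""
    if not versions:
        return []
    newest = max(ts for ts, _ in versions)
    return [(ts, val) for ts, val in versions if ts != newest]


def insert_null_keep_history(versions, ts):
    """Proposed default: keep all old versions and add a null marker on top.

    Assumption: the null marker is modelled as None.
    """
    return versions + [(ts, None)]


def delete_all_versions(versions):
    """Proposed 'last modified entry only' behaviour: drop the whole history."""
    return []


# A row that was imported twice before the source cell was set to NULL.
cell = [(1, "a"), (2, "b")]

stale = read_latest(delete_latest_only(cell))          # older value "a" resurfaces
kept = insert_null_keep_history(cell, ts=3)            # three versions, newest is None
gone = delete_all_versions(cell)                       # nothing left to read
```

In this model, deleting only the newest version makes a reader see the older value again, which is the misleading result the issue describes. The other two variants both read back as empty and differ only in whether the history is still available.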
> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>   Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>         Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last
> version of a column when the corresponding cell was set to NULL in the source
> table.
> This can lead to unexpected and misleading results if the row has been
> transferred multiple times, which can easily happen if it's being modified on
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a
> single Put per row as before. This could probably lead to a performance drop
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be
> the expected behavior here?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)