[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guido Serra aka Zeph updated HBASE-7645:
----------------------------------------
    Description:
If I run SQOOP a couple of times on the same dataset, outputting to HBase, I end up with duplicated data...
{code}
hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
COLUMN                CELL
 mysql:created_at     timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at     timestamp=1358790515451, value=2011-12-21 18:07:38.0
2 row(s) in 0.0040 seconds

today's sqoop run
hbase(main):031:0> Date.new(1358853505756).toString()
=> "Tue Jan 22 11:18:25 UTC 2013"

yesterday's sqoop run
hbase(main):032:0> Date.new(1358790515451).toString()
=> "Mon Jan 21 17:48:35 UTC 2013"
{code}
The Put.add() method writes the KeyValue without checking whether the value, apart from the timestamp, has changed. Is that by design, or a bug?
I mean, what's the idea behind it? Is SQOOP (the client application) supposed to read the value before issuing an add() call?

from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
{code}
public Put add(byte [] family, byte [] qualifier, byte [] value) {
  return add(family, qualifier, this.ts, value);
}

public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
  List<KeyValue> list = getKeyValueList(family);
  KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
  list.add(kv);
  familyMap.put(kv.getFamily(), list);
  return this;
}
{code}

  was:
If I run SQOOP a couple of times on the same dataset, outputting to HBase, I end up with duplicated data...
{code}
hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
COLUMN                CELL
 mysql:created_at     timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at     timestamp=1358790515451, value=2011-12-21 18:07:38.0
2 row(s) in 0.0040 seconds

today's sqoop run
hbase(main):031:0> Date.new(1358853505756).toString()
=> "Tue Jan 22 11:18:25 UTC 2013"

yesterday's sqoop run
hbase(main):032:0> Date.new(1358790515451).toString()
=> "Mon Jan 21 17:48:35 UTC 2013"
{code}
The Put.add() method writes the KeyValue without checking whether the value, apart from the timestamp, has changed. Is that by design, or a bug?

from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
{code}
public Put add(byte [] family, byte [] qualifier, byte [] value) {
  return add(family, qualifier, this.ts, value);
}

public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
  List<KeyValue> list = getKeyValueList(family);
  KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
  list.add(kv);
  familyMap.put(kv.getFamily(), list);
  return this;
}
{code}


> put without timestamp duplicates the record/row
> -----------------------------------------------
>
>                 Key: HBASE-7645
>                 URL: https://issues.apache.org/jira/browse/HBASE-7645
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Client
>            Reporter: Guido Serra aka Zeph
>
> If I run SQOOP a couple of times on the same dataset, outputting to HBase,
> I end up with duplicated data...
> {code}
> hbase(main):030:0> get "dump_HKFAS.sales_order", "1", {COLUMN => "mysql:created_at", VERSIONS => 4}
> COLUMN                CELL
>  mysql:created_at     timestamp=1358853505756, value=2011-12-21 18:07:38.0
>  mysql:created_at     timestamp=1358790515451, value=2011-12-21 18:07:38.0
> 2 row(s) in 0.0040 seconds
>
> today's sqoop run
> hbase(main):031:0> Date.new(1358853505756).toString()
> => "Tue Jan 22 11:18:25 UTC 2013"
>
> yesterday's sqoop run
> hbase(main):032:0> Date.new(1358790515451).toString()
> => "Mon Jan 21 17:48:35 UTC 2013"
> {code}
> The Put.add() method writes the KeyValue without checking whether the value,
> apart from the timestamp, has changed. Is that by design, or a bug?
> I mean, what's the idea behind it? Is SQOOP (the client application)
> supposed to read the value before issuing an add() call?
>
> from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
> {code}
> public Put add(byte [] family, byte [] qualifier, byte [] value) {
>   return add(family, qualifier, this.ts, value);
> }
>
> public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
>   List<KeyValue> list = getKeyValueList(family);
>   KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
>   list.add(kv);
>   familyMap.put(kv.getFamily(), list);
>   return this;
> }
> {code}
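
To make the question above concrete, here is a minimal sketch of the behaviour being described. It is a toy in-memory model, not HBase code: the class `VersionedCellSketch`, its methods, and the `"row/family:qualifier"` string key are all hypothetical names made up for illustration. The only assumption carried over from the shell output is that a cell version is identified by its column plus its timestamp, so the same value written at two different times shows up as two versions. The sketch contrasts that with two client-side options: reusing a fixed, client-chosen timestamp, or reading before writing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of versioned cells. NOT HBase code; every name here is hypothetical. */
public class VersionedCellSketch {
    // column key ("row/family:qualifier") -> versions, newest timestamp first
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    /** Stores value under (column, ts): the same ts overwrites, a new ts adds a version. */
    public void put(String column, long ts, String value) {
        cells.computeIfAbsent(column,
                k -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
             .put(ts, value);
    }

    /** Client-side read-before-write: skips the write if the newest value is unchanged. */
    public boolean putIfChanged(String column, long ts, String value) {
        TreeMap<Long, String> versions = cells.get(column);
        if (versions != null && !versions.isEmpty()
                && versions.firstEntry().getValue().equals(value)) {
            return false; // same value already stored, nothing written
        }
        put(column, ts, value);
        return true;
    }

    /** Number of stored versions for a column (0 if the column was never written). */
    public int versionCount(String column) {
        TreeMap<Long, String> versions = cells.get(column);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        String col = "1/mysql:created_at";
        String value = "2011-12-21 18:07:38.0";
        long yesterday = 1358790515451L; // timestamps taken from the shell output
        long today = 1358853505756L;

        // Each run stamps the cell with its own write time: two versions.
        VersionedCellSketch byWriteTime = new VersionedCellSketch();
        byWriteTime.put(col, yesterday, value);
        byWriteTime.put(col, today, value);
        System.out.println("write-time timestamps: " + byWriteTime.versionCount(col));

        // Both runs reuse one client-chosen timestamp (hypothetical value): one version.
        long fixedTs = 1324490858000L;
        VersionedCellSketch byFixedTs = new VersionedCellSketch();
        byFixedTs.put(col, fixedTs, value);
        byFixedTs.put(col, fixedTs, value);
        System.out.println("fixed timestamp: " + byFixedTs.versionCount(col));

        // The second run reads first and skips the unchanged value: one version.
        VersionedCellSketch readFirst = new VersionedCellSketch();
        readFirst.putIfChanged(col, yesterday, value);
        readFirst.putIfChanged(col, today, value);
        System.out.println("read before write: " + readFirst.versionCount(col));
    }
}
```

In terms of the Put.java excerpt, the fixed-timestamp variant would correspond to calling the four-argument add(family, qualifier, ts, value) with a timestamp the client controls, while the read-before-write variant costs an extra read per cell. Which of the two, if either, SQOOP should do is exactly the open question of this issue.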