[jira] [Created] (HBASE-7924) thrift interface is inconsistently implemented on timestamp/range filtering
Guido Serra aka Zeph created HBASE-7924:
---
Summary: thrift interface is inconsistently implemented on timestamp/range filtering
Key: HBASE-7924
URL: https://issues.apache.org/jira/browse/HBASE-7924
Project: HBase
Issue Type: Bug
Components: Thrift
Affects Versions: 0.94.5, 0.92.0
Reporter: Guido Serra aka Zeph

According to the documentation and the .thrift description file, both getRowsWithColumnsTs() and the Scan object are exposed only as *exact* timestamp matchers; no time range functionality is (supposedly) exposed.

Instead, the Scan object behaves as documented, but getRowsWithColumnsTs() underneath has time range behaviour.

see: HBASE-5694, HBASE-7907

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7924) thrift interface is inconsistently implemented on timestamp/range scan
[ https://issues.apache.org/jira/browse/HBASE-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7924:
---
Summary: thrift interface is inconsistently implemented on timestamp/range scan (was: thrift interface is inconsistently implemented on timestamp/range filtering)
[jira] [Updated] (HBASE-7924) thrift interface is inconsistently implemented on timestamp/range scan
[ https://issues.apache.org/jira/browse/HBASE-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7924:
---
Description:
According to the documentation and the .thrift description file, both getRowsWithColumnsTs() and the Scan object are exposed only as *exact* timestamp matchers; no time range functionality is (supposedly) exposed - see: HBASE-7907

Instead, the Scan object behaves as documented, but getRowsWithColumnsTs() underneath has time range behaviour

{code}
if (tScan.isSetTimestamp()) {
  scan.setTimeRange(Long.MIN_VALUE, tScan.getTimestamp());
}
{code}

see: HBASE-5694

(was: the same text and code, with both references given together at the end as "see: HBASE-5694, HBASE-7907")
[jira] [Updated] (HBASE-7924) thrift interface is inconsistently implemented on timestamp/range scan
[ https://issues.apache.org/jira/browse/HBASE-7924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7924:
---
Description:
According to the documentation and the .thrift description file, both getRowsWithColumnsTs() and the Scan object are exposed only as *exact* timestamp matchers; no time range functionality is (supposedly) exposed.

Instead, the Scan object behaves as documented, but getRowsWithColumnsTs() underneath has time range behaviour

{code}
if (tScan.isSetTimestamp()) {
  scan.setTimeRange(Long.MIN_VALUE, tScan.getTimestamp());
}
{code}

see: HBASE-5694, HBASE-7907

(was: the same text without the code snippet)
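The two behaviours discussed in this issue can be sketched in plain Java. This is only an illustration and does not use the HBase client: `TimeRangeSketch` and its methods are hypothetical names. It assumes that a time range includes its lower bound and excludes its upper bound, and that an exact-timestamp match is equivalent to the range [ts, ts + 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a half-open time range [min, max); not the HBase class. */
final class TimeRangeSketch {
    final long min; // inclusive lower bound
    final long max; // exclusive upper bound

    TimeRangeSketch(long min, long max) {
        this.min = min;
        this.max = max;
    }

    /** Mirrors setTimeRange(Long.MIN_VALUE, ts): every cell strictly older than ts. */
    static TimeRangeSketch upTo(long ts) {
        return new TimeRangeSketch(Long.MIN_VALUE, ts);
    }

    /** Exact match: only cells stamped exactly ts (assumes ts < Long.MAX_VALUE). */
    static TimeRangeSketch exact(long ts) {
        return new TimeRangeSketch(ts, ts + 1);
    }

    boolean contains(long cellTs) {
        return cellTs >= min && cellTs < max;
    }

    /** Returns the cell timestamps that fall inside this range, in input order. */
    List<Long> filter(List<Long> cellTimestamps) {
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (contains(ts)) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> cells = Arrays.asList(100L, 200L);
        System.out.println(upTo(200L).filter(cells));  // [100]
        System.out.println(exact(200L).filter(cells)); // [200]
    }
}
```

With the same requested timestamp, the range form returns only older cells and the exact form returns only the matching cell, which is the inconsistency a Thrift caller would observe.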
[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585768#comment-13585768 ]

Guido Serra aka Zeph commented on HBASE-5694:
---
OK, [~ted_yu], I opened a bug report, HBASE-7924; a fix will follow.

getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
---
Key: HBASE-5694
URL: https://issues.apache.org/jira/browse/HBASE-5694
Project: HBase
Issue Type: Bug
Components: Thrift
Affects Versions: 0.92.1
Reporter: Wouter Bolsterlee
Fix For: 0.94.0
Attachments: HBASE-5694.patch, HBASE-5694-trunk-20120402.patch, setTimestamp.patch

The getRowsWithColumnsTs() method in the Thrift interface only applies the timestamp if columns are explicitly specified. However, this method also allows for columns to be unspecified (this is even used internally to implement e.g. getRows()). The cause of the bug is a minor scoping issue: the time range is set inside a wrong if statement.
[jira] [Commented] (HBASE-5154) Can't put small timestamp after delete the column
[ https://issues.apache.org/jira/browse/HBASE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584254#comment-13584254 ]

Guido Serra aka Zeph commented on HBASE-5154:
---
It seems that someone is trying to fix it in HBASE-5241.

Can't put small timestamp after delete the column
---
Key: HBASE-5154
URL: https://issues.apache.org/jira/browse/HBASE-5154
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.3
Environment:
OS: Linux 2.6.32-33-server #70-Ubuntu SMP
JRE: Java(TM) SE Runtime Environment (build 1.6.0_26-b03), Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Hadoop: Version: 0.20-append-r1056497, r1056491
HBase runs on a 4 HRegion + 1 HMaster cluster.
Reporter: robi
Priority: Critical

1. Call put to insert some value in column 'fm:a', like: Put.add('fm', 'a', 1000, 'abc'); here timestamp = 1000.
2. Delete the column 'fm:a'.
3. Try to do #1 again. (it doesn't work, but can insert put which use timestamp 1000)
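The three steps in the report can be reproduced with a small in-memory model. This is a sketch, not HBase code: `TombstoneSketch` and its methods are hypothetical, and it assumes the usual HBase behaviour that a column delete marker hides every version whose timestamp is at or below the marker's timestamp until a major compaction removes the marker.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of one column's versions plus a delete marker; not HBase code. */
final class TombstoneSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private long deleteMarkerTs = Long.MIN_VALUE; // MIN_VALUE means "no marker"

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Delete the column: hides every version stamped at or below markerTs. */
    void deleteColumn(long markerTs) {
        deleteMarkerTs = Math.max(deleteMarkerTs, markerTs);
    }

    /** Newest visible value, or null when every stored version is hidden. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= deleteMarkerTs) {
            return null;
        }
        return newest.getValue();
    }

    /** Drops the hidden versions and the marker itself, as a major compaction would. */
    void majorCompact() {
        versions.headMap(deleteMarkerTs, true).clear();
        deleteMarkerTs = Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        TombstoneSketch column = new TombstoneSketch();
        column.put(1000L, "abc");     // step 1
        column.deleteColumn(5000L);   // step 2, marker stamped with a later time
        column.put(1000L, "abc");     // step 3, still hidden by the marker
        System.out.println(column.get()); // null
    }
}
```

In this model the repeated put is stored but stays invisible, because its timestamp is older than the marker; a put with a timestamp above the marker, or the same put after `majorCompact()`, becomes visible.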
[jira] [Commented] (HBASE-4155) the problem in hbase thrift client when scan/get rows by timestamp
[ https://issues.apache.org/jira/browse/HBASE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584331#comment-13584331 ]

Guido Serra aka Zeph commented on HBASE-4155:
---
similar issue...
{code}
hbase(main):007:0> scan 'AAA_customer', {TIMERANGE => [0, 1360032970]}
ROW COLUMN+CELL
0 row(s) in 1.5590 seconds

hbase(main):008:0> scan 'AAA_customer'
ROW COLUMN+CELL
 1 column=mysql:birthday, timestamp=1360292144, value=1999-01-01
{code}

the problem in hbase thrift client when scan/get rows by timestamp
---
Key: HBASE-4155
URL: https://issues.apache.org/jira/browse/HBASE-4155
Project: HBase
Issue Type: Bug
Components: Thrift
Affects Versions: 0.90.0
Reporter: zezhou
Attachments: 4155.txt, patch.txt, patch.txt.svn
Original Estimate: 1m
Remaining Estimate: 1m

I want to scan rows by a specified timestamp. I use the following hbase shell command:

scan 'testcrawl',{TIMESTAMP=>1312268202071}
ROW COLUMN+CELL
put1.com column=crawl:data, timestamp=1312268202071, value=<html>put1</html>
put1.com column=crawl:type, timestamp=1312268202071, value=html
put1.com column=links:outlinks, timestamp=1312268202071, value=www.163.com;www.sina.com

As I expected, I get the rows whose timestamp is 1312268202071. But when I use the thrift client to do the same thing, the returned data is the rows with timestamps before the specified timestamp, which is not the same as the hbase shell. The following are the timestamps of the returned data:

131217917 1312268202059

I looked up the source in hbase/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java; it uses the following code to set the time parameter:

scan.setTimeRange(Long.MIN_VALUE, timestamp);

This causes the thrift client to return rows from before the specified timestamp, not the rows with the specified timestamp. But the hbase client and the avro client use the following code to set the time parameter:

scan.setTimeStamp(timestamp);

This returns the rows with the specified timestamp.

Is this a feature or a bug in the thrift client? If this is a feature, which method in the thrift client can get the rows by a specified timestamp?
[jira] [Commented] (HBASE-4155) the problem in hbase thrift client when scan/get rows by timestamp
[ https://issues.apache.org/jira/browse/HBASE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584334#comment-13584334 ]

Guido Serra aka Zeph commented on HBASE-4155:
---
while this other one works
{code}
hbase(main):009:0> scan 'AAA_customer', {TIMESTAMP => 1360032970}
ROW COLUMN+CELL
0 row(s) in 1.3960 seconds

hbase(main):010:0> scan 'AAA_customer', {TIMESTAMP => 1360292144}
ROW COLUMN+CELL
 1 column=mysql:birthday, timestamp=1360292144, value=1999-01-01
{code}
[jira] [Created] (HBASE-7907) time range Scan to be made available via Thrift
Guido Serra aka Zeph created HBASE-7907:
---
Summary: time range Scan to be made available via Thrift
Key: HBASE-7907
URL: https://issues.apache.org/jira/browse/HBASE-7907
Project: HBase
Issue Type: New Feature
Reporter: Guido Serra aka Zeph

This is the mapping of the Scan object in Thrift as of today, at http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift?view=markup
{code}
/**
 * A Scan object is used to specify scanner parameters when opening a scanner.
 */
struct TScan {
  1:optional Text startRow,
  2:optional Text stopRow,
  3:optional i64 timestamp,
  4:optional list<Text> columns,
  5:optional i32 caching,
  6:optional Text filterString
}
{code}
This is the Scan object, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html, which has:

bq. To only retrieve columns within a specific range of version timestamps, execute setTimeRange.

and

bq. To only retrieve columns with a specific timestamp, execute setTimestamp.

The second functionality/method is reachable via Thrift; the first one, setTimeRange(), is not (or at least not for me).
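One way to picture the requested feature is a resolver that turns either an exact timestamp or an explicit range into a single half-open interval. This is purely a sketch: `minStamp` and `maxStamp` are hypothetical fields that do not exist in the TScan struct quoted above, `ScanTimeSketch` is an invented name, and the defaults (0 and Long.MAX_VALUE) are assumptions.

```java
import java.util.Arrays;

/** Hypothetical resolver for scan time settings; minStamp/maxStamp are NOT TScan fields. */
final class ScanTimeSketch {

    /**
     * Returns {min, max} as a half-open interval [min, max).
     * timestamp models the existing optional TScan field;
     * minStamp and maxStamp model a hypothetical range extension.
     */
    static long[] resolve(Long timestamp, Long minStamp, Long maxStamp) {
        boolean hasRange = minStamp != null || maxStamp != null;
        if (timestamp != null && hasRange) {
            throw new IllegalArgumentException("set either timestamp or a range, not both");
        }
        if (timestamp != null) {
            // Exact match expressed as a one-wide interval (assumes timestamp < Long.MAX_VALUE).
            return new long[] {timestamp, timestamp + 1};
        }
        long min = (minStamp != null) ? minStamp : 0L;            // assumed default
        long max = (maxStamp != null) ? maxStamp : Long.MAX_VALUE; // assumed default
        if (min > max) {
            throw new IllegalArgumentException("minStamp must not exceed maxStamp");
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resolve(5L, null, null)));  // [5, 6]
        System.out.println(Arrays.toString(resolve(null, 10L, 20L))); // [10, 20]
    }
}
```

Keeping the exact timestamp and the range as separate inputs would let the server honour the documented exact-match meaning of `timestamp` while still offering range scans.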
[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584370#comment-13584370 ]

Guido Serra aka Zeph commented on HBASE-5694:
---
Confirmed on version 0.92.1-cdh4.1.2. Even worse: without specifying the columns, a given timestamp behaves like a range filter from 0 (epoch) to timestamp - 1 (basically an "until", with the timestamp itself excluded).
[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584412#comment-13584412 ]

Guido Serra aka Zeph commented on HBASE-5694:
---
[~uws] if this is actually the patch (taken from above)
{code}
--- ThriftServer.java.orig 2012-04-01 23:41:16.881172406 +0200
+++ ThriftServer.java 2012-04-01 23:41:30.177238337 +0200
@@ -477,8 +477,8 @@
             get.addColumn(famAndQf[0], famAndQf[1]);
           }
         }
-        get.setTimeRange(Long.MIN_VALUE, timestamp);
       }
+      get.setTimeRange(Long.MIN_VALUE, timestamp);
       gets.add(get);
     }
     Result[] result = table.get(gets);
{code}
then it produces the wrong behavior that I'm getting, as it is inconsistent with scannerOpenWithScan. We should not use setTimeRange but setTimestamp, as the signature in Thrift states:
{code}
 * Get the specified columns for the specified table and rows at the specified
 * timestamp. Returns an empty list if no rows exist.
{code}
and not a range scan from Long.MIN_VALUE to timestamp, as implemented above.
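The scoping issue that the quoted patch addresses can be shown with a stripped-down model. This is a sketch only: `GetSketch`, `buildBuggy` and `buildFixed` are hypothetical names, and the field `upperBound` merely records whether a time bound was applied. It illustrates the scoping fix, not the separate question of range versus exact matching raised in the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified request object; not the HBase Get class. */
final class GetSketch {
    final List<String> columns = new ArrayList<>();
    Long upperBound; // null means no time bound was applied

    /** Before the patch: the bound is set inside the columns branch only. */
    static GetSketch buildBuggy(List<String> columns, long timestamp) {
        GetSketch get = new GetSketch();
        if (columns != null) {
            get.columns.addAll(columns);
            get.upperBound = timestamp; // skipped entirely when columns == null
        }
        return get;
    }

    /** After the patch: the bound is set regardless of whether columns were given. */
    static GetSketch buildFixed(List<String> columns, long timestamp) {
        GetSketch get = new GetSketch();
        if (columns != null) {
            get.columns.addAll(columns);
        }
        get.upperBound = timestamp;
        return get;
    }

    public static void main(String[] args) {
        System.out.println(buildBuggy(null, 42L).upperBound); // null
        System.out.println(buildFixed(null, 42L).upperBound); // 42
    }
}
```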
[jira] [Updated] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-5694:
---
Attachment: setTimestamp.patch

In my opinion the correct patch should be setTimestamp.patch, which I computed against origin/0.92.0rc4 from the github repository
{code}
index 231a564..4c46a4f 100644
--- a/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
+++ b/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
@@ -413,7 +413,7 @@ public class ThriftServer {
         HTable table = getTable(tableName);
         if (columns == null) {
           Get get = new Get(getBytes(row));
-          get.setTimeRange(Long.MIN_VALUE, timestamp);
+          get.setTimestamp(timestamp);
           Result result = table.get(get);
           return ThriftUtilities.rowResultFromHBase(result);
         }
@@ -426,7 +426,7 @@ public class ThriftServer {
             get.addColumn(famAndQf[0], famAndQf[1]);
           }
         }
-        get.setTimeRange(Long.MIN_VALUE, timestamp);
+        get.setTimestamp(timestamp);
         Result result = table.get(get);
         return ThriftUtilities.rowResultFromHBase(result);
       } catch (IOException e) {
{code}
[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584468#comment-13584468 ]

Guido Serra aka Zeph commented on HBASE-5694:
---
argh... leave it... all of this is just WRONG...
{code}
if (tScan.isSetTimestamp()) {
  scan.setTimeRange(Long.MIN_VALUE, tScan.getTimestamp());
}
{code}
Instead of exposing setTimeRange on the Thrift interface, someone decided to hide it this way.
[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs() in Thrift service handles timestamps incorrectly
[ https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584472#comment-13584472 ]

Guido Serra aka Zeph commented on HBASE-5694:
---
[~yuzhih...@gmail.com] I will (sorry for this thread, I'd better go home and enjoy the weekend)
[jira] [Commented] (HBASE-4769) Abort RegionServer Immediately on OOME
[ https://issues.apache.org/jira/browse/HBASE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582112#comment-13582112 ]

Guido Serra aka Zeph commented on HBASE-4769:
---
guys... this is so stupid... I lost the whole morning because HBase's RegionServer was dying with no logs, no nothing... how am I supposed to debug the issue if you do not even generate a core dump? or a log message? ... argh

Abort RegionServer Immediately on OOME
---
Key: HBASE-4769
URL: https://issues.apache.org/jira/browse/HBASE-4769
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Fix For: 0.92.0, 0.94.0
Attachments: HBASE-4769.patch, HBASE-4769.patch

Currently, when the HRegionServer runs out of memory, it will call the master, which will cause more heap allocations and throw a second exception that it has run out of memory again. The easiest, safest way to avoid this OOME storm is to abort the RegionServer immediately when it hits the memory boundary. Part of the 89-fb to trunk port.
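The idea of aborting immediately, while still leaving a trace, can be sketched with standard JDK calls only. This is not the HBASE-4769 patch: `OomeAbortSketch` and its methods are hypothetical, and the single line written to stderr is an assumption about what a minimal diagnostic could look like.

```java
/** Hypothetical sketch of "halt on OutOfMemoryError"; not the RegionServer code. */
final class OomeAbortSketch {

    /** True when t or one of its causes (up to 64 hops) is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        Throwable current = t;
        for (int hops = 0; current != null && hops < 64; hops++) {
            if (current instanceof OutOfMemoryError) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    /** Installs a process-wide handler that halts the JVM on an uncaught OOME. */
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (isOutOfMemory(error)) {
                // A constant message keeps allocation small; it is the only trace left behind.
                System.err.println("OutOfMemoryError: halting JVM");
                // halt() skips shutdown hooks, so no cleanup code gets a chance to allocate.
                Runtime.getRuntime().halt(1);
            }
        });
    }

    public static void main(String[] args) {
        installHandler();
        System.out.println(isOutOfMemory(new RuntimeException(new OutOfMemoryError()))); // true
    }
}
```

For a post-mortem, the standard HotSpot option `-XX:+HeapDumpOnOutOfMemoryError` can additionally be passed to the JVM so that a heap dump is written when the error is first thrown.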
[jira] [Created] (HBASE-7645) put without timestamp duplicates the record/row
Guido Serra aka Zeph created HBASE-7645:
---
Summary: put without timestamp duplicates the record/row
Key: HBASE-7645
URL: https://issues.apache.org/jira/browse/HBASE-7645
Project: HBase
Issue Type: Brainstorming
Components: Client
Reporter: Guido Serra aka Zeph

If I call SQOOP a couple of times on the same dataset, outputting to HBase, I end up with duplicated data...
{code}
hbase(main):030:0> get 'dump_HKFAS.sales_order', '1', {COLUMN => 'mysql:created_at', VERSIONS => 4}
COLUMN CELL
 mysql:created_at timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at timestamp=1358790515451, value=2011-12-21 18:07:38.0
2 row(s) in 0.0040 seconds

today's sqoop run
hbase(main):031:0> Date.new(1358853505756).toString()
=> Tue Jan 22 11:18:25 UTC 2013

yesterday's sqoop run
hbase(main):032:0> Date.new(1358790515451).toString()
=> Mon Jan 21 17:48:35 UTC 2013
{code}
The Put.add() method writes the kv without checking whether, apart from the timestamp, the value has changed. Is this by design, or a bug?

from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
{code}
  /**
   * Add the specified column and value to this Put operation.
   * @param family family name
   * @param qualifier column qualifier
   * @param value column value
   * @return this
   */
  public Put add(byte [] family, byte [] qualifier, byte [] value) {
    return add(family, qualifier, this.ts, value);
  }

  /**
   * Add the specified column and value, with the specified timestamp as
   * its version to this Put operation.
   * @param family family name
   * @param qualifier column qualifier
   * @param ts version timestamp
   * @param value column value
   * @return this
   */
  public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
    List<KeyValue> list = getKeyValueList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
    list.add(kv);
    familyMap.put(kv.getFamily(), list);
    return this;
  }
{code}
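Why two import runs leave two versions of the same value can be shown with a small model of a versioned cell. This is a sketch under stated assumptions, not HBase code: `VersionedCellSketch` and `putIfChanged` are hypothetical, and the model assumes that a put without an explicit timestamp is stamped with the write time, so each run produces a new version key.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of one cell's versions, keyed by timestamp; not HBase code. */
final class VersionedCellSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    /** Stores a version; a put with an already used timestamp overwrites it. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Client-side read-before-write: skips the put when the newest value is unchanged. */
    boolean putIfChanged(long ts, String value) {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest != null && newest.getValue().equals(value)) {
            return false;
        }
        versions.put(ts, value);
        return true;
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(1358790515451L, "2011-12-21 18:07:38.0"); // yesterday's run
        cell.put(1358853505756L, "2011-12-21 18:07:38.0"); // today's run
        System.out.println(cell.versionCount()); // 2
    }
}
```

In this model the duplicates disappear either when the writer supplies a stable timestamp (for example one derived from the source row), which makes the put idempotent, or when the client compares against the stored value first, as `putIfChanged` does at the cost of an extra read.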
[jira] [Updated] (HBASE-7645) put without timestamp duplicates the record/row
[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7645:
---
Description: the text and the shell output are unchanged; the excerpt from trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java now omits the Javadoc comments:
{code}
  public Put add(byte [] family, byte [] qualifier, byte [] value) {
    return add(family, qualifier, this.ts, value);
  }

  public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
    List<KeyValue> list = getKeyValueList(family);
    KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
    list.add(kv);
    familyMap.put(kv.getFamily(), list);
    return this;
  }
{code}
(was: the same description with the Javadoc comments included in the excerpt)
[jira] [Updated] (HBASE-7645) put without timestamp duplicates the record/row
[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7645:

Description:
If I run SQOOP a couple of times on the same dataset, outputting to HBase, I end up with duplicated data...

{code}
hbase(main):030:0> get 'dump_HKFAS.sales_order', '1', {COLUMN => 'mysql:created_at', VERSIONS => 4}
COLUMN               CELL
 mysql:created_at    timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at    timestamp=1358790515451, value=2011-12-21 18:07:38.0
2 row(s) in 0.0040 seconds

today's sqoop run
hbase(main):031:0> Date.new(1358853505756).toString()
=> Tue Jan 22 11:18:25 UTC 2013

yesterday's sqoop run
hbase(main):032:0> Date.new(1358790515451).toString()
=> Mon Jan 21 17:48:35 UTC 2013
{code}

The Put.add() method writes the kv without checking whether, apart from the timestamp, the value has changed. Is that by design, or a bug? I mean, what's the idea behind it? Is SQOOP (the client application) supposed to handle the read of the value before issuing an add() call?

from: trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java
{code}
public Put add(byte [] family, byte [] qualifier, byte [] value) {
  return add(family, qualifier, this.ts, value);
}

public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
  List<KeyValue> list = getKeyValueList(family);
  KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
  list.add(kv);
  familyMap.put(kv.getFamily(), list);
  return this;
}
{code}

(was: the same description without the last two questions, about the idea behind the behaviour and whether SQOOP should read the value first)

put without timestamp duplicates the record/row
---
Key: HBASE-7645
URL: https://issues.apache.org/jira/browse/HBASE-7645
Project: HBase
Issue Type: Brainstorming
Components: Client
Reporter: Guido Serra aka Zeph
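The duplication reported in this issue can be illustrated with a toy model. The sketch below is not the HBase client API: `VersionedCellSketch`, `versionsAfterTwoRuns` and the fixed timestamp `1L` are hypothetical names and values chosen for illustration. It only assumes that a cell keeps one version per distinct timestamp and that a write never compares values, which matches the Put.add() code quoted in the description. The two write-time timestamps are the ones shown in the shell output.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model of one HBase cell (row + column). NOT the HBase client API:
// the class and method names here are hypothetical.
final class VersionedCellSketch {
    // Versions keyed by timestamp, newest first, as the shell lists them.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    // A put never compares values; only an identical timestamp replaces a version.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    int versionCount() {
        return versions.size();
    }

    // Simulates two import runs that write the same value to the same cell.
    static int versionsAfterTwoRuns(boolean explicitTimestamp) {
        final String value = "2011-12-21 18:07:38.0";
        final long firstRun = 1358790515451L;   // write time of yesterday's run
        final long secondRun = 1358853505756L;  // write time of today's run
        final long fixedTs = 1L;                // hypothetical caller-chosen timestamp

        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(explicitTimestamp ? fixedTs : firstRun, value);
        cell.put(explicitTimestamp ? fixedTs : secondRun, value);
        return cell.versionCount();
    }

    public static void main(String[] args) {
        System.out.println("write-time timestamps: " + versionsAfterTwoRuns(false) + " version(s)");
        System.out.println("fixed timestamp:       " + versionsAfterTwoRuns(true) + " version(s)");
    }
}
```

In this model, two runs stamped with their own write time leave two versions of an identical value, while two runs that pass the same explicit timestamp leave one. Whether a stable timestamp is available depends on the importing application.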
[jira] [Updated] (HBASE-7645) put without timestamp duplicates the record/row
[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph updated HBASE-7645:

Priority: Trivial (was: Major)
[jira] [Commented] (HBASE-7645) put without timestamp duplicates the record/row
[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559783#comment-13559783 ]

Guido Serra aka Zeph commented on HBASE-7645:

[~anoopsamjohn] uh, ok... so that is expected then. Thanks for clarifying :) I'll handle it on the client side then.
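A client-side check of the kind mentioned in the comment could look roughly like the sketch below. It is an assumption-laden illustration, not the reporter's fix and not HBase or Sqoop code: `PutIfChangedSketch`, `latest` and `putIfChanged` are hypothetical names, and the in-memory map stands in for a single cell. The idea is to read the newest stored value and skip the write when the bytes are identical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy stand-in for a single versioned cell; NOT the HBase client API.
final class PutIfChangedSketch {
    // Versions keyed by timestamp in ascending order; the last entry is the newest.
    private final TreeMap<Long, byte[]> versions = new TreeMap<>();

    // Newest stored value, or null when the cell is still empty.
    byte[] latest() {
        Map.Entry<Long, byte[]> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    // Writes only when the value differs from the newest version.
    // Returns true if a new version was stored.
    boolean putIfChanged(long timestamp, byte[] value) {
        Objects.requireNonNull(value, "value");
        if (Arrays.equals(latest(), value)) {
            return false; // unchanged: skip the write, no new version
        }
        versions.put(timestamp, value.clone());
        return true;
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        byte[] createdAt = "2011-12-21 18:07:38.0".getBytes(StandardCharsets.UTF_8);
        PutIfChangedSketch cell = new PutIfChangedSketch();
        System.out.println("first run wrote:  " + cell.putIfChanged(1358790515451L, createdAt));
        System.out.println("second run wrote: " + cell.putIfChanged(1358853505756L, createdAt));
        System.out.println("versions stored:  " + cell.versionCount());
    }
}
```

Against a real cluster this pattern costs one extra read per cell, and the read followed by the write is not atomic, so concurrent writers could still race.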
[jira] [Resolved] (HBASE-7645) put without timestamp duplicates the record/row
[ https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guido Serra aka Zeph resolved HBASE-7645.

Resolution: Not A Problem