[jira] [Updated] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-8751: Attachment: HBASE-8751-0.94-v1.patch (rebased on the latest 0.94 branch)
Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster Key: HBASE-8751 URL: https://issues.apache.org/jira/browse/HBASE-8751 Project: HBase Issue Type: Improvement Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-8751-0.94-V0.patch, HBASE-8751-0.94-v1.patch
Consider these scenarios (all CFs have replication-scope=1):
1) Cluster S has 3 tables: table A has cfA and cfB, table B has cfX and cfY, table C has cf1 and cf2.
2) Cluster X wants to replicate table A:cfA, table B:cfX and table C from cluster S.
3) Cluster Y wants to replicate table B:cfY and table C:cf2 from cluster S.
The current replication implementation can't achieve this, since it pushes the data of all replicable column families from cluster S to all of its peers (X and Y in this scenario). This improvement provides a fine-grained replication scheme that enables a peer cluster to choose the column families/tables it really wants from the source cluster:
A) Set the table:cf list for a peer when adding it: hbase-shell add_peer '3', zk:1100:/hbase, table1; table2:cf1,cf2; table3:cf2
B) View the table:cf list configured for a peer using show_peer_tableCFs: hbase-shell show_peer_tableCFs 1
C) Change/set the table:cf list for a peer using set_peer_tableCFs: hbase-shell set_peer_tableCFs '2', table1:cfX; table2:cf1; table3:cf1,cf2
In this scheme, replication-scope=1 only means a column family CAN be replicated to other clusters; the per-peer table:cf list determines WHICH tables/CFs will actually be replicated to a specific peer. For backward compatibility, an empty table:cf list replicates all replicable tables/CFs. (This means we don't allow a peer that replicates nothing from a source cluster; we think that's reasonable: if it replicates nothing, why add the peer at all?)
This improvement addresses the exact problem raised by the first FAQ at http://hbase.apache.org/replication.html: "GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later? Yes, this is for much later." I also noticed somebody suggested making replication-scope an integer rather than a boolean for this fine-grained purpose, but extending replication-scope can't achieve the same granularity and flexibility as the per-peer configuration above. This improvement has been running smoothly in our production clusters (Xiaomi) for several months.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
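The filtering semantics described above (empty list = replicate everything; a listed table with no CFs = the whole table; otherwise only the listed CFs) can be sketched as follows. This is an illustrative model only, not code from the HBASE-8751 patch; the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the per-peer table:cf filtering semantics described above.
// Illustrative only; names are hypothetical, not from the actual patch.
public class PeerTableCFs {

    // Parse a spec like "table1; table2:cf1,cf2; table3:cf2" into a map of
    // table name -> set of column families. An empty set means "all CFs of
    // that table"; an empty map means "replicate everything" (back-compat).
    public static Map<String, Set<String>> parse(String spec) {
        Map<String, Set<String>> tableCFs = new HashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return tableCFs;
        }
        for (String entry : spec.split(";")) {
            String[] parts = entry.trim().split(":");
            Set<String> cfs = new HashSet<>();
            if (parts.length > 1) {
                for (String cf : parts[1].split(",")) {
                    cfs.add(cf.trim());
                }
            }
            tableCFs.put(parts[0].trim(), cfs);
        }
        return tableCFs;
    }

    // Decide whether an edit for (table, cf) should go to this peer.
    // replication-scope=1 is assumed to have been checked already.
    public static boolean shouldReplicate(Map<String, Set<String>> tableCFs,
                                          String table, String cf) {
        if (tableCFs.isEmpty()) return true;      // empty list = all tables/CFs
        Set<String> cfs = tableCFs.get(table);
        if (cfs == null) return false;            // table not listed for this peer
        return cfs.isEmpty() || cfs.contains(cf); // empty CF set = whole table
    }
}
```

With this model, cluster X's configuration from the scenario would be the spec "tableA:cfA; tableB:cfX; tableC", and cluster Y's would be "tableB:cfY; tableC:cf2".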
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860106#comment-13860106 ] Feng Honghua commented on HBASE-8751: - The new patch also contains the changes per [~jdcryans]'s feedback above, such as the method renaming and adding 'volatile' to ReplicationPeer.tableCFs.
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860112#comment-13860112 ] Feng Honghua commented on HBASE-8751: - Ping [~stack], [~apurtell], [~jdcryans], [~yuzhih...@gmail.com] and [~lhofhansl]: would you evaluate this feature, and could one of you help review the patch? Thanks a lot :-) This improvement has been mentioned in the HBase FAQ (as a nice-to-have feature to be implemented much later). We encountered the requirement for it and implemented it last year; since then we have deployed it in all our production clusters, found it convenient for our cluster replication management, and it has run smoothly for quite a long time (over one year). We believe it would be a good feature to merge into the 0.94 branch :-)
[jira] [Updated] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-8751: Issue Type: New Feature (was: Improvement)
[jira] [Commented] (HBASE-9203) Secondary index support through coprocessors
[ https://issues.apache.org/jira/browse/HBASE-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860165#comment-13860165 ] rajeshbabu commented on HBASE-9203: --- [~yuzhih...@gmail.com] Thanks for the review.
bq. In figure CreateTableWithSplits, there is an arrow from CreateTableHandler to IndexMasterObserver labeled CreateIndexTableCameout. I don't find such callback in the patch. What does the arrow represent ?
It just represents control coming back to IndexMasterObserver from CreateTableHandler. There are no such callbacks in the code; the arrow should start from CreateTableHandler (it was drawn from HMaster by mistake).
bq. Can you explain the second sentence in more detail ?
The rowkey for an index table put looks like this:
{code}
startkey of index region + index name + indexed column(s) value(s) + user table rowkey
{code}
The last two bytes represent the starting position of the actual (user table) rowkey.
bq. IndexLoadIncrementalHFile is the utility that does the loading into index table.
Yes, it should be the IndexLoadIncrementalHFile utility. Nice catch, Ted.
bq. Is IndexTsvImporterMapper this new class ?
Yes, it's IndexTsvImporterMapper. It will be used to prepare puts for both the user table and the index table from raw data. IndexCreationMapper will be used to prepare index table puts from user table data. I will add javadoc.
bq. there is a dummy CF under .indexTable directory. What's its purpose ?
It's actually the column family name in the index table. We hadn't decided on the name at that point, so a placeholder name was used.
Secondary index support through coprocessors Key: HBASE-9203 URL: https://issues.apache.org/jira/browse/HBASE-9203 Project: HBase Issue Type: New Feature Reporter: rajeshbabu Assignee: rajeshbabu Attachments: SecondaryIndex Design.pdf, SecondaryIndex Design_Updated.pdf
We have been working on implementing secondary indexes in HBase and have open-sourced an implementation against HBase 0.94.8. The project is available on GitHub: https://github.com/Huawei-Hadoop/hindex
This JIRA is to support secondary indexes on trunk (0.98). The following features will be supported:
- multiple indexes on a table,
- multi-column indexes,
- indexes based on part of a column value,
- equals and range condition scans using the index, and
- bulk loading data to an indexed table (indexing done with the bulk load).
Most of the kernel changes needed for secondary indexes are already available in trunk; very minimal changes are needed.
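The index-row-key layout rajeshbabu describes above (region start key, index name, indexed values, user row key, with the user row key's offset in the trailing two bytes) can be sketched as follows. This is an illustration of the layout only; `IndexRowKey` and its methods are hypothetical, not code from the HBASE-9203 patch:

```java
import java.io.ByteArrayOutputStream;

// Sketch of the index-row-key layout described in the comment above:
//   region start key + index name + indexed value(s) + user row key,
// with the offset of the user row key stored in the trailing two bytes.
// Hypothetical helper for illustration, NOT code from the actual patch.
public class IndexRowKey {

    public static byte[] build(byte[] regionStartKey, byte[] indexName,
                               byte[] indexedValue, byte[] userRowKey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(regionStartKey, 0, regionStartKey.length);
        out.write(indexName, 0, indexName.length);
        out.write(indexedValue, 0, indexedValue.length);
        int offset = out.size();              // where the user row key starts
        out.write(userRowKey, 0, userRowKey.length);
        out.write((offset >>> 8) & 0xFF);     // trailing two bytes: the offset
        out.write(offset & 0xFF);
        return out.toByteArray();
    }

    // Recover the original user row key using the trailing two-byte offset.
    public static byte[] userRowKey(byte[] indexRowKey) {
        int n = indexRowKey.length;
        int offset = ((indexRowKey[n - 2] & 0xFF) << 8) | (indexRowKey[n - 1] & 0xFF);
        byte[] user = new byte[n - 2 - offset];
        System.arraycopy(indexRowKey, offset, user, 0, user.length);
        return user;
    }
}
```

Prefixing the region start key keeps each index row collocated with the region whose data it indexes; the trailing offset lets a scanner slice the user row key back out without parsing the variable-length middle sections.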
[jira] [Updated] (HBASE-10147) Canary additions
[ https://issues.apache.org/jira/browse/HBASE-10147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gustavo Anatoly updated HBASE-10147: Attachment: HBASE-10147-v3.patch The patch has been updated, [~takeshi.miao]. Thanks for the review.
Canary additions Key: HBASE-10147 URL: https://issues.apache.org/jira/browse/HBASE-10147 Project: HBase Issue Type: Improvement Reporter: stack Assignee: Gustavo Anatoly Attachments: HBASE-10147-v2.patch, HBASE-10147-v3.patch, HBASE-10147.patch, HBASE-10147.patch, HBASE-10147.patch, HBASE-10147.patch
I've been using the canary to quickly identify the dodgy machine in my cluster. It is useful for this. What would make it better would be:
+ Rather than saying how long it took to get a region after you have gotten the region, it'd be sweet to log the region name and the server it is on BEFORE you go to get the region. I ask for this because, as is, I have to wait for the canary to time out, which can be a while.
+ Second ask is that when I pass the -t and it fails, it says what it failed against -- what region and hopefully what server location (might be hard).
[jira] [Created] (HBASE-10265) Upgrade to commons-logging 1.1.3
Liang Xie created HBASE-10265: - Summary: Upgrade to commons-logging 1.1.3 Key: HBASE-10265 URL: https://issues.apache.org/jira/browse/HBASE-10265 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.99.0 Reporter: Liang Xie Assignee: Liang Xie Per HADOOP-10147 and HDFS-5678, though we didn't observe any deadlock due to commons-logging in HBase, it's still worth bumping the version.
[jira] [Updated] (HBASE-10265) Upgrade to commons-logging 1.1.3
[ https://issues.apache.org/jira/browse/HBASE-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-10265: -- Attachment: HBASE-10265.txt
[jira] [Updated] (HBASE-10265) Upgrade to commons-logging 1.1.3
[ https://issues.apache.org/jira/browse/HBASE-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HBASE-10265: -- Status: Patch Available (was: Open)
[jira] [Created] (HBASE-10266) Disable additional m2eclipse plugin execution
Eric Charles created HBASE-10266: Summary: Disable additional m2eclipse plugin execution Key: HBASE-10266 URL: https://issues.apache.org/jira/browse/HBASE-10266 Project: HBase Issue Type: Improvement Components: build Reporter: Eric Charles Priority: Minor Disable pluginExecutions to ease the import in m2eclipse.
[jira] [Updated] (HBASE-10266) Disable additional m2eclipse plugin execution
[ https://issues.apache.org/jira/browse/HBASE-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Charles updated HBASE-10266: - Attachment: HBASE-10266.patch Simple changes on the root pom.xml, no impact on the build.
[jira] [Commented] (HBASE-10265) Upgrade to commons-logging 1.1.3
[ https://issues.apache.org/jira/browse/HBASE-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860304#comment-13860304 ] stack commented on HBASE-10265: --- +1
[jira] [Commented] (HBASE-10265) Upgrade to commons-logging 1.1.3
[ https://issues.apache.org/jira/browse/HBASE-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860310#comment-13860310 ] stack commented on HBASE-10265: --- Here is the list of changes since 1.1.1: http://commons.apache.org/proper/commons-logging/changes-report.html Looks like good stuff only.
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860330#comment-13860330 ] Jimmy Xiang commented on HBASE-8912: As I said in my comments on HBASE-7521, there are many synchronization issues in the 0.94 AM, so it (AM#0.94) has problems handling races properly. We cannot avoid those races; we have to deal with them and make sure the internal state doesn't get messed up. In trunk we have better synchronization, so it (AM#trunk) should fare better. [~jmspaggi], could you run your test on 0.96 and let us know if you can reproduce this on trunk? We run IT with CM all the time and could not see such issues on 0.96/trunk.
[0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
AM throws this exception, which subsequently causes the master to abort:
{code}
java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
	at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
{code}
This exception trace is from the failing test TestMetaReaderEditor, which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/
[jira] [Assigned] (HBASE-10078) Dynamic Filter - Not using DynamicClassLoader when using FilterList
[ https://issues.apache.org/jira/browse/HBASE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-10078: --- Assignee: Jimmy Xiang
Dynamic Filter - Not using DynamicClassLoader when using FilterList --- Key: HBASE-10078 URL: https://issues.apache.org/jira/browse/HBASE-10078 Project: HBase Issue Type: Bug Components: Filters Affects Versions: 0.94.13 Reporter: Federico Gaule Assignee: Jimmy Xiang Priority: Minor
I've tried to use dynamic jar loading (https://issues.apache.org/jira/browse/HBASE-1936), but it seems to have an issue with FilterList. Here is some log from my app, where I send a Get with a FilterList containing an AFilter and another with a BFilter.
{noformat}
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Class d.p.AFilter not found - using dynamical class loader
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class: d.p.AFilter
2013-12-02 13:55:42,564 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Loading new jar files, if any
2013-12-02 13:55:42,677 DEBUG org.apache.hadoop.hbase.util.DynamicClassLoader: Finding class again: d.p.AFilter
2013-12-02 13:55:43,004 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Can't find class d.p.BFilter
java.lang.ClassNotFoundException: d.p.BFilter
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.getClassByName(HbaseObjectWritable.java:792)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:679)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
	at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:324)
	at org.apache.hadoop.hbase.client.Get.readFields(Get.java:405)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
	at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:594)
	at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:690)
	at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:126)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1311)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1226)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}
AFilter is not found, so it is loaded through the DynamicClassLoader; but when it comes to BFilter, the plain URLClassLoader is used and fails without checking for dynamic jars.
I think the issue is related to FilterList#readFields:
{code:title=FilterList.java|borderStyle=solid}
public void readFields(final DataInput in) throws IOException {
  byte opByte = in.readByte();
  operator = Operator.values()[opByte];
  int size = in.readInt();
  if (size > 0) {
    filters = new ArrayList<Filter>(size);
    for (int i = 0; i < size; i++) {
      Filter filter = (Filter) HbaseObjectWritable.readObject(in, conf);
      filters.add(filter);
    }
  }
}
{code}
HbaseObjectWritable#readObject uses a conf (created by calling HBaseConfiguration.create()) which I suppose doesn't include a DynamicClassLoader instance.
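The failure mode described above comes down to a missing fallback in the class-loading chain: the freshly created conf resolves classes through a loader that never consults the DynamicClassLoader. A generic sketch of the try-primary-then-fallback pattern involved (hypothetical helper in plain Java, not the actual HBase classes or fix):

```java
// Generic sketch of the loading pattern the bug needs: try the primary
// class loader first, and only on failure consult the fallback (in HBase's
// case, the DynamicClassLoader that also scans for newly dropped jars).
// Hypothetical illustration, not code from HBase.
public class FallbackLoading {

    public static Class<?> loadWithFallback(String name,
                                            ClassLoader primary,
                                            ClassLoader fallback)
            throws ClassNotFoundException {
        try {
            return Class.forName(name, false, primary);
        } catch (ClassNotFoundException e) {
            // Primary loader doesn't know the class; give the fallback a chance.
            return Class.forName(name, false, fallback);
        }
    }
}
```

The reported bug is that the second step never happens for filters deserialized inside a FilterList, because the nested readObject call gets a configuration that carries only the plain URLClassLoader.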
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860348#comment-13860348 ] Jimmy Xiang commented on HBASE-9721: Using ServerName is really a good choice from a nice/expandable interface point of view. However, from a performance point of view, I think it is better to use just the startcode. Although it is not much more data, it is pure overhead, and it could have some impact if we are going to support lots of regions (and redundant regions for read availability). As I mentioned in HBASE-10210, I think we should make the startcode the real/only differentiator for region server instances running on the same host,port pair. Therefore, if ServerName gets new fields later on, those new fields should not be used to differentiate two region server instances running on the same host,port pair.
RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch
On a test cluster, the following events happened with ITBLL and CM, leading to meta being unavailable until the master was restarted. An RS carrying meta died, and the master assigned the region to one of the RSs.
{code}
2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820}
2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
{code}
At the same time, the RS that meta recently got assigned to also died (due to CM) and restarted:
{code}
2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list
2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true
2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true
2013-10-03 23:30:08,771 INFO [RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362
2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820
{code}
AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent:
{code}
2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true
2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign.
2013-10-03 23:30:08,791 DEBUG
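The stale-instance check visible in these logs (same host and port, newer startcode) can be sketched as follows. `ServerInstance` and `makesStale` are hypothetical names for illustration, not HBase's actual ServerName API:

```java
// Sketch of the stale-server detection visible in the log above: two
// "host,port,startcode" strings denote different instances of the same
// server process slot when host and port match but the startcode differs.
// Hypothetical helper, NOT HBase's ServerName class.
public class ServerInstance {
    public final String host;
    public final int port;
    public final long startcode;

    public ServerInstance(String serverName) {
        String[] parts = serverName.split(",");
        this.host = parts[0];
        this.port = Integer.parseInt(parts[1]);
        this.startcode = Long.parseLong(parts[2]);
    }

    // Same host:port but an older startcode means the other instance is stale:
    // the RS restarted, so state bound to the old instance must be recovered.
    public boolean makesStale(ServerInstance existing) {
        return host.equals(existing.host) && port == existing.port
            && startcode > existing.startcode;
    }
}
```

This is why the comment above argues the startcode should be the real/only differentiator: given an unchanged host,port pair, comparing startcodes alone is enough to decide which instance is current.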
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860376#comment-13860376 ] stack commented on HBASE-10255: --- This issue remains unresolved though it has had a patch applied to a few branches. -1 on the current patch as applied, since there is no supporting benchmark showing that the new dependency does not slow throughput in a critical section. I've also taken the time to attach an alternative suggestion that would put us back on the original class without needing an import from a 'foreign' domain. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API and beta apis aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12 The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9721) RegionServer should not accept regionOpen RPC intended for another(previous) server
[ https://issues.apache.org/jira/browse/HBASE-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860373#comment-13860373 ] Jimmy Xiang commented on HBASE-9721: Another thing: for closeRegion, maybe we don't need to change anything, right? If it is a different instance running on the same host, port pair, the region must not be served there. AM can handle such a NotServingRegionException properly. bq. It seems that the RS behaved correct by not being able to open the region by transitioning the zk assignment node. However, the master fails to timeout the assignment even though the meta region is reported in RIT. In trunk, the timeout logic is off by default. This situation should be fixed by the meta SSH. Do you run the latest code in trunk/0.96? With this patch, any affected region should be assigned faster than before. RegionServer should not accept regionOpen RPC intended for another(previous) server --- Key: HBASE-9721 URL: https://issues.apache.org/jira/browse/HBASE-9721 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-9721_v0.patch, hbase-9721_v1.patch, hbase-9721_v2.patch On a test cluster, the following events happened with ITBLL and CM, leading to meta being unavailable until the master was restarted. An RS carrying meta died, and the master assigned the region to one of the RSs. 
{code} 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 2013-10-03 23:30:06,611 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.RegionStates: Transitioned {1588230740 state=OFFLINE, ts=1380843006601, server=null} to {1588230740 state=PENDING_OPEN, ts=1380843006611, server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820} 2013-10-03 23:30:06,611 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-1] master.ServerManager: New admin connection to gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} At the same time, the RS that meta recently got assigned also died (due to CM), and restarted: {code} 2013-10-03 23:30:07,636 DEBUG [RpcServer.handler=17,port=6] master.ServerManager: REPORT: Server gs-hdp2-secure-1380781860-hbase-8.cs1cloud.internal,60020,1380843002494 came back up, removed it from the dead servers list 2013-10-03 23:30:08,769 INFO [RpcServer.handler=18,port=6] master.ServerManager: Triggering server recovery; existingServer gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 looks stale, new server:gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,771 DEBUG [RpcServer.handler=18,port=6] master.ServerManager: Added=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 to dead servers, submitted shutdown handler to be executed meta=true 2013-10-03 23:30:08,771 INFO 
[RpcServer.handler=18,port=6] master.ServerManager: Registering server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380843006362 2013-10-03 23:30:08,772 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 {code} AM/SSH sees that the RS that died was carrying meta, but the assignment RPC request was still not sent: {code} 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] master.AssignmentManager: Checking region=hbase:meta,,1.1588230740, zk server=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 current=gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820, matches=true 2013-10-03 23:30:08,791 INFO [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2] handler.MetaServerShutdownHandler: Server gs-hdp2-secure-1380781860-hbase-5.cs1cloud.internal,60020,1380842900820 was carrying META. Trying to assign. 2013-10-03 23:30:08,791 DEBUG [MASTER_META_SERVER_OPERATIONS-gs-hdp2-secure-1380781860-hbase-12:6-2]
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860397#comment-13860397 ] Sergey Shelukhin commented on HBASE-10210: -- 2 RS could have the same timestamp, why not? during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet, I am at how did this ever work stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from 2 sources - ZK and server itself, can conflict. Master doesn't handle such cases (timestamp match), and anyway technically timestamps can collide for two separate servers. So, master YouAreDead-s the already-recorded reporting RS, and adds it too. Then it discovers that the new server has died with fatal error! Note the threads. Addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. 
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-9374: --- Fix Version/s: 0.99.0 0.96.2 0.98.0 Status: Patch Available (was: Open) Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DynamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-9374: --- Attachment: hbase-9374.patch Instead of throwing a RuntimeException, the patch logs a warning, so that things can still work if dynamic class loading is not needed at all. Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DynamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
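The spirit of the change above — degrade to a warning when the local dir can't be set up, leaving dynamic class loading disabled rather than failing the client — can be sketched as follows. This is a hypothetical helper in Python for illustration, not the actual patch:

```python
import logging
import os

log = logging.getLogger("DynamicClassLoader")

def try_init_local_dir(path):
    """Return path if the local dir is usable; otherwise log a warning and
    return None so the caller skips dynamic class loading instead of failing."""
    try:
        os.makedirs(path, exist_ok=True)
        if not os.access(path, os.W_OK):
            raise OSError("directory is not writable")
        return path
    except OSError as e:
        log.warning("Failed to set up %s; dynamic class loading disabled: %s",
                    path, e)
        return None
```

A caller would then test the returned value and only construct its class loader when the directory is actually available, which is what lets a pure client limp along without write access.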
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860404#comment-13860404 ] Jimmy Xiang commented on HBASE-10210: - 2 RS can have the same timestamp, but they should not have the same timestamp, host, and port. I think the purpose of the startcode is to differentiate two RS running on the same host, port pair, right? We need to check the HRegionServer code to make sure this is true all the time. during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet, I am at how did this ever work stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from 2 sources - ZK and server itself, can conflict. Master doesn't handle such cases (timestamp match), and anyway technically timestamps can collide for two separate servers. So, master YouAreDead-s the already-recorded reporting RS, and adds it too. Then it discovers that the new server has died with fatal error! Note the threads. Addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. 
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
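The point under discussion — that the startcode (a timestamp) only has to disambiguate restarts on the same host,port pair, so two servers on different hosts may legitimately share a startcode — can be illustrated with a toy server identity. This is a hypothetical sketch, not HBase's actual ServerName class:

```python
from collections import namedtuple

# A server is identified by the (host, port, startcode) triple; equality on
# the whole triple means equal startcodes alone never make two servers collide.
ServerName = namedtuple("ServerName", "host port startcode")

a = ServerName("rs-5.internal", 60020, 1380842900820)
b = ServerName("rs-8.internal", 60020, 1380842900820)  # same startcode, different host
c = ServerName("rs-5.internal", 60020, 1380843006362)  # same host,port: restarted instance

print(a == b)  # False: distinct servers despite equal startcodes
print(a == c)  # False: the restart got a new startcode, so a is stale
```

The failure mode in this issue is the remaining edge: if a restarted instance could ever report the same triple as its predecessor (e.g. a clock stepped backwards), the master could not tell the two apart.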
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860407#comment-13860407 ] Ted Yu commented on HBASE-10255: BoundedInputStream is used in the following classes in hadoop:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
I searched for LimitInputStream in hadoop but didn't find any occurrence. I did some performance testing. Below was the comparison:

0.96.1.1 (org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation):
append: count = 762367720, mean rate = 2335.76 bytes/ms, 1-minute rate = 2362.35 bytes/ms, 5-minute rate = 1967.47 bytes/ms, 15-minute rate = 1562.28 bytes/ms
syncMeter: count = 1204781, mean rate = 3.69 syncs/ms, 1-minute rate = 3.70 syncs/ms, 5-minute rate = 3.08 syncs/ms, 15-minute rate = 2.42 syncs/ms

with patch:
append: count = 983061720, mean rate = 2337.51 bytes/ms, 1-minute rate = 2360.75 bytes/ms, 5-minute rate = 2117.03 bytes/ms, 15-minute rate = 1757.53 bytes/ms
syncMeter: count = 1554486, mean rate = 3.70 syncs/ms, 1-minute rate = 3.73 syncs/ms, 5-minute rate = 3.34 syncs/ms, 15-minute rate = 2.76 syncs/ms

Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API and beta apis aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12 The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
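For reference, the technique all three candidates implement — Guava's LimitInputStream and ByteStreams#limit and commons-io's BoundedInputStream — is simply a wrapper that caps how many bytes can be read from an underlying stream. A minimal sketch of that idea, in Python purely for illustration:

```python
import io

class BoundedStream:
    """Wrap a stream so that at most `limit` bytes can be read from it --
    the bounding technique behind LimitInputStream / ByteStreams#limit."""
    def __init__(self, raw, limit):
        self.raw = raw
        self.remaining = limit

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""                      # limit reached: behave like EOF
        if size < 0 or size > self.remaining:
            size = self.remaining
        data = self.raw.read(size)
        self.remaining -= len(data)
        return data

s = BoundedStream(io.BytesIO(b"abcdefgh"), 5)
print(s.read())  # b'abcde'
print(s.read())  # b''
```

The per-read bookkeeping is a handful of integer operations, which is why the thread asks for a benchmark before accepting any swap in a hot WAL path: the cost should be negligible, but that is exactly the kind of claim a measurement settles.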
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860416#comment-13860416 ] Andrew Purtell commented on HBASE-8751: --- We can't introduce new functionality on the 0.94 branch without introducing it to trunk and later releases first. Any chance of a patch for trunk? While it's less likely to be encountered in newer releases of HBase, from my experience operating HBase clusters (and ending up with bad state on many a testing cluster), it can be useful to clear all HBase state from ZooKeeper before a cold restart to clear up issues. It is easier and safer if an operator can clear out all HBase state in ZooKeeper as opposed to specific znodes, because some cannot be lost. I believe we still do not keep the primary/only copy of any HBase state in ZooKeeper. This patch would change that, so it deserves discussion. We should avoid that if possible in my opinion. Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster Key: HBASE-8751 URL: https://issues.apache.org/jira/browse/HBASE-8751 Project: HBase Issue Type: New Feature Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-8751-0.94-V0.patch, HBASE-8751-0.94-v1.patch Consider scenarios (all cf are with replication-scope=1): 1) cluster S has 3 tables, table A has cfA,cfB, table B has cfX,cfY, table C has cf1,cf2. 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from cluster S. 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S. The current replication implementation can't achieve this, since it pushes the data of all the replicatable column-families from cluster S to all its peers, X/Y in this scenario. This improvement provides a fine-grained replication scheme which enables a peer cluster to choose the column-families/tables it really wants from the source cluster: A). 
Set the table:cf-list for a peer when calling addPeer: hbase-shell add_peer '3', zk:1100:/hbase, table1; table2:cf1,cf2; table3:cf2 B). View the table:cf-list config for a peer using show_peer_tableCFs: hbase-shell show_peer_tableCFs 1 C). Change/set the table:cf-list for a peer using set_peer_tableCFs: hbase-shell set_peer_tableCFs '2', table1:cfX; table2:cf1; table3:cf1,cf2 In this scheme, replication-scope=1 only means a column-family CAN be replicated to other clusters; the 'table:cf-list' alone determines WHICH cf/table will actually be replicated to a specific peer. For backward compatibility, an empty 'table:cf-list' replicates all replicatable cf/tables. (This means we don't allow a peer which replicates nothing from a source cluster; we think that's reasonable: if a peer replicates nothing, why bother adding it?) This improvement addresses the exact problem raised by the first FAQ in http://hbase.apache.org/replication.html: GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later? Yes, this is for much later. I also noticed somebody mentioned making replication-scope an integer rather than a boolean for such fine-grained replication purposes, but I think extending replication-scope can't achieve the same replication granularity and flexibility as the per-peer replication configuration above. This improvement has been running smoothly in our production clusters (Xiaomi) for several months. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
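The per-peer spec above is a simple semicolon/colon/comma format: entries are separated by ';', a table is optionally followed by ':' and a comma-separated CF list, and a table with no CF list means all its replicatable CFs. A rough illustration of how such a spec could be parsed — hypothetical names, not HBASE-8751's actual parser:

```python
def parse_table_cfs(spec):
    """Parse a per-peer table:cf-list such as
    "table1; table2:cf1,cf2; table3:cf2" into a dict mapping
    table name -> list of column families (empty list = all CFs)."""
    result = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        table, _, cfs = entry.partition(":")
        result[table.strip()] = [cf.strip() for cf in cfs.split(",") if cf.strip()]
    return result

print(parse_table_cfs("table1; table2:cf1,cf2; table3:cf2"))
# {'table1': [], 'table2': ['cf1', 'cf2'], 'table3': ['cf2']}
```

On the source side, a replication sink filter would then ship an edit only if its table appears in this map and its CF is listed (or the list is empty).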
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860426#comment-13860426 ] Andrew Purtell commented on HBASE-10255: Since there is a -1 on the current patch, I will back it out of 0.98 branch if the disagreement is not shortly resolved in favor of the current patch. Thanks guys. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API and beta apis aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12 The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860435#comment-13860435 ] Andrew Purtell commented on HBASE-10249: bq. Do you want this in 0.98 ? +1 Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860456#comment-13860456 ] Andrew Purtell edited comment on HBASE-10249 at 1/2/14 6:12 PM: Committed to 0.98 as r1554867. was (Author: apurtell): Commited to 0.98 as r1554867. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860456#comment-13860456 ] Andrew Purtell commented on HBASE-10249: Commited to 0.98 as r1554867. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860476#comment-13860476 ] Nick Dimiduk commented on HBASE-9374: - This patch works around the problem; it doesn't really solve it. What happens when this code runs on the RegionServer, the directory isn't created/doesn't exist, and only the warning is printed? Does the process limp along in a broken state? Nit: the test doesn't explicitly create this condition; it just assumes the user running the test won't have access to that path. It also assumes a UNIX directory structure, so it will not work on Windows. Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DynamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860483#comment-13860483 ] Sergey Shelukhin commented on HBASE-10210: -- Why not? start code is just system time... it can collide if clock is moved backwards by ntpd or via some other means during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet, I am at how did this ever work stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from 2 sources - ZK and server itself, can conflict. Master doesn't handle such cases (timestamp match), and anyway technically timestamps can collide for two separate servers. So, master YouAreDead-s the already-recorded reporting RS, and adds it too. Then it discovers that the new server has died with fatal error! Note the threads. Addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. 
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860505#comment-13860505 ] Andrew Purtell commented on HBASE-6104: --- Thanks for the feedback [~giacomotaylor]. Yes, EXEC privilege affects only the invocation of endpoints. The essential change is if a user is not granted EXEC permission then the invocation will be rejected. Regarding your point #2, what prevents your observer from taking the data in an attribute of the first mutation presented and applying it on a regionserver level? Whether your observer does something locally on the region or globally on the regionserver in response to an attribute is up to you. LarsH put in state sharing for region observers in HBASE-6505. Require EXEC permission to call coprocessor endpoints - Key: HBASE-6104 URL: https://issues.apache.org/jira/browse/HBASE-6104 Project: HBase Issue Type: New Feature Components: Coprocessors, security Reporter: Gary Helmling Assignee: Andrew Purtell Fix For: 0.99.0 Attachments: 6104-addendum-1.patch, 6104-revert.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch The EXEC action currently exists as only a placeholder in access control. It should really be used to enforce access to coprocessor endpoint RPC calls, which are currently unrestricted. How the ACLs to support this would be modeled deserves some discussion: * Should access be scoped to a specific table and CoprocessorProtocol extension? * Should it be possible to grant access to a CoprocessorProtocol implementation globally (regardless of table)? * Are per-method restrictions necessary? * Should we expose hooks available to endpoint implementors so that they could additionally apply their own permission checks? Some CP endpoints may want to require READ permissions, others may want to enforce WRITE, or READ + WRITE. 
To apply these kinds of checks we would also have to extend the RegionObserver interface to provide hooks wrapping HRegion.exec(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
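The gist of the proposed enforcement — reject an endpoint invocation unless the caller holds EXEC at some applicable scope (globally or on the table) — might look like the following sketch. All names here are hypothetical; this is not the actual AccessController code:

```python
class AccessDeniedError(Exception):
    pass

# acl maps (user, scope) -> set of granted actions, where scope is the
# string "global" or a table name. Hypothetical structure for illustration.
def check_exec(acl, user, table):
    for scope in ("global", table):
        if "EXEC" in acl.get((user, scope), set()):
            return  # EXEC granted at global or table scope: allow the call
    raise AccessDeniedError("%s lacks EXEC on %s" % (user, table))

acl = {("alice", "t1"): {"READ", "EXEC"}, ("bob", "global"): {"READ"}}
check_exec(acl, "alice", "t1")  # passes silently
```

The open design questions in the description map directly onto this sketch: per-method restrictions would add a third key component, and the proposed implementor hooks would let an endpoint demand READ or WRITE in addition to the EXEC gate shown here.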
[jira] [Commented] (HBASE-9291) Enable client to setAttribute that is sent once to each region server
[ https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860510#comment-13860510 ] Andrew Purtell commented on HBASE-9291: --- Copied from https://issues.apache.org/jira/browse/HBASE-6104?focusedCommentId=13860505page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13860505: What prevents your observer from taking the data in an attribute of the first mutation presented and applying it on a regionserver level? Whether your observer does something locally on the region or globally on the regionserver in response to an attribute is up to you. LarsH put in state sharing for region observers in HBASE-6505. Enable client to setAttribute that is sent once to each region server - Key: HBASE-9291 URL: https://issues.apache.org/jira/browse/HBASE-9291 Project: HBase Issue Type: New Feature Components: IPC/RPC Reporter: James Taylor Currently a Scan and Mutation allow the client to set its own attributes that get passed through the RPC layer and are accessible from a coprocessor. This is very handy, but breaks down if the amount of information is large, since this information ends up being sent again and again to every region. Clients can work around this with an endpoint pre and post coprocessor invocation that: 1) sends the information and caches it on the region server in the pre invocation 2) invokes the Scan or sends the batch of Mutations, and then 3) removes it in the post invocation. In this case, the client is forced to identify all region servers (ideally, all region servers that will be involved in the Scan/Mutation), make extra RPC calls, manage the caching of the information on the region server, age-out the information (in case the client dies before step (3) that clears the cached information), and must deal with the possibility of a split occurring while this operation is in-progress. 
Instead, it'd be much better if an attribute could be identified as a region server attribute in OperationWithAttributes and the HBase RPC layer would take care of doing the above. The use cases in Phoenix where the above is necessary include: 1) Hash joins, where the results of the smaller side of a join scan are packaged up and sent to each region server, and 2) Secondary indexing, where the metadata of knowing a) which column family/column qualifier pairs and b) which part of the row key contributes to which indexes is sent to each region server that will process a batched put. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
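For readers following the workaround described in steps (1)-(3): the server-side caching and age-out bookkeeping it forces onto clients can be sketched with a plain map plus a TTL check. This is an illustrative, stdlib-only sketch under assumed names — ServerAttributeCache and its methods are hypothetical, not HBase or Phoenix APIs; a real observer would keep something like this in the shared regionserver state LarsH added in HBASE-6505:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the per-regionserver cache the workaround needs:
// data shipped once per server is parked under a key and dropped after a
// TTL in case the client dies before its cleanup call in step (3).
public class ServerAttributeCache {
    private static final class Entry {
        final byte[] payload;
        final long insertedAt;
        Entry(byte[] payload, long insertedAt) {
            this.payload = payload;
            this.insertedAt = insertedAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ServerAttributeCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Step (1): the pre-invocation endpoint call stores the payload once per server.
    public void put(String key, byte[] payload, long now) {
        cache.put(key, new Entry(payload, now));
    }

    // Step (2): the observer looks the payload up while the Scan / batched
    // Mutations run; an aged-out entry is treated as absent and evicted.
    public byte[] get(String key, long now) {
        Entry e = cache.get(key);
        if (e == null || now - e.insertedAt > ttlMillis) {
            cache.remove(key); // client may have died before step (3)
            return null;
        }
        return e.payload;
    }

    // Step (3): the post-invocation endpoint call removes the payload explicitly.
    public void remove(String key) {
        cache.remove(key);
    }
}
```

The explicit `now` parameter stands in for a wall clock so the age-out path is testable; the point of the JIRA is that the RPC layer could do all of this bookkeeping itself.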
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860506#comment-13860506 ] Ted Yu commented on HBASE-10255: Reverted from 0.98 and trunk for further discussion. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API, and Beta APIs aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12. The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
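For context on the proposed replacement: Guava's ByteStreams#limit(InputStream, long) wraps a stream so it reports EOF after the given number of bytes. A minimal stdlib-only sketch of that behavior (illustrative code, not Guava's or HBase's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stdlib-only sketch of the bounded-stream semantics that
// ByteStreams.limit(InputStream, long) provides.
public class BoundedStreamSketch {
    static final class BoundedInputStream extends InputStream {
        private final InputStream in;
        private long remaining;

        BoundedInputStream(InputStream in, long limit) {
            this.in = in;
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1; // report EOF once the limit is reached
            }
            int b = in.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }
    }

    // Drains a stream and returns how many bytes it yielded.
    static int drain(InputStream in) {
        int n = 0;
        try {
            while (in.read() >= 0) {
                n++;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return n;
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[10]);
        // Only 4 of the 10 underlying bytes are readable through the bound.
        System.out.println(drain(new BoundedInputStream(raw, 4))); // prints 4
    }
}
```

In HBase's actual use the bounded stream sits in front of protobuf parsing, which is why the perf debate below centers on the WAL write path.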
[jira] [Resolved] (HBASE-10243) store mvcc in WAL
[ https://issues.apache.org/jira/browse/HBASE-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-10243. -- Resolution: Duplicate store mvcc in WAL - Key: HBASE-10243 URL: https://issues.apache.org/jira/browse/HBASE-10243 Project: HBase Issue Type: Sub-task Components: HFile, regionserver, Scanners Reporter: Sergey Shelukhin Priority: Minor mvcc needs to be stored in WAL. Right now seqId is already stored, so if they are combined, it would be removed or deprecated. Might also happen before this jira. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860513#comment-13860513 ] Andrew Purtell commented on HBASE-6104: --- Will wait on commit to trunk for a bit pending additional discussion. Require EXEC permission to call coprocessor endpoints - Key: HBASE-6104 URL: https://issues.apache.org/jira/browse/HBASE-6104 Project: HBase Issue Type: New Feature Components: Coprocessors, security Reporter: Gary Helmling Assignee: Andrew Purtell Fix For: 0.99.0 Attachments: 6104-addendum-1.patch, 6104-revert.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch The EXEC action currently exists as only a placeholder in access control. It should really be used to enforce access to coprocessor endpoint RPC calls, which are currently unrestricted. How the ACLs to support this would be modeled deserves some discussion: * Should access be scoped to a specific table and CoprocessorProtocol extension? * Should it be possible to grant access to a CoprocessorProtocol implementation globally (regardless of table)? * Are per-method restrictions necessary? * Should we expose hooks available to endpoint implementors so that they could additionally apply their own permission checks? Some CP endpoints may want to require READ permissions, others may want to enforce WRITE, or READ + WRITE. To apply these kinds of checks we would also have to extend the RegionObserver interface to provide hooks wrapping HRegion.exec(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay
[ https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860515#comment-13860515 ] Sergey Shelukhin commented on HBASE-10227: -- Resolved that one as a dup of this, since this is the earlier issue. Would be nice to see a patch :) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay Key: HBASE-10227 URL: https://issues.apache.org/jira/browse/HBASE-10227 Project: HBase Issue Type: Bug Components: regionserver Reporter: Feng Honghua Assignee: Gustavo Anatoly When opening a region, all stores are examined to get the max MemstoreTS, which is used as the initial mvcc for the region, and then the split hlogs are replayed. The edits in the split hlogs actually contain kvs with greater mvcc than any MemstoreTS in any store file, but replaying them doesn't advance the mvcc accordingly at all. From an overall perspective this mvcc recovery is 'logically' incorrect/incomplete. The reason it currently causes no problem is that no active scanners exist and no new scanners can be created before the region opening completes, so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be set to zero. They are simply treated as kvs put 'earlier' than the ones in hfiles with mvcc greater than zero ('earlier' in the sense that they have smaller mvcc, though in fact they were put later), with no incorrect impact only because no active scanners exist or can be created during region opening. This bug is only a 'logical' one for the time being, but if later on we need mvcc to survive across the region's whole lifecycle (across regionservers) and never set it to zero, this bug needs to be fixed first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
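The 'logically' complete recovery the report argues for amounts to advancing the region's initial mvcc past both sources of sequence numbers: the max MemstoreTS found in store files and the mvcc of every edit replayed from split hlogs. A toy sketch with hypothetical names (plain longs, not HBase internals):

```java
import java.util.Arrays;

// Illustrative sketch only: recoveredMvcc and its parameters are invented
// names, not HBase code. Today's behavior is equivalent to ignoring the
// second argument, which is the 'logical' bug being reported.
public class MvccRecoverySketch {
    static long recoveredMvcc(long[] storeFileMaxMemstoreTs, long[] replayedEditMvccs) {
        // Current behavior: seed from the max MemstoreTS across store files.
        long mvcc = Arrays.stream(storeFileMaxMemstoreTs).max().orElse(0L);
        // Proposed fix: also advance past every edit replayed from split hlogs,
        // since those edits carry mvccs greater than any store file's.
        for (long editMvcc : replayedEditMvccs) {
            mvcc = Math.max(mvcc, editMvcc);
        }
        return mvcc;
    }

    public static void main(String[] args) {
        // Split-hlog edits (12, 11) exceed the store files' max MemstoreTS (9).
        System.out.println(recoveredMvcc(new long[]{5, 9}, new long[]{12, 11})); // prints 12
    }
}
```

With no hlogs to replay the second term is empty and the result reduces to today's behavior, which is why the bug is invisible until mvcc must survive the region's whole lifecycle.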
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860532#comment-13860532 ] stack commented on HBASE-10255: --- -1 The justification is irrational so not to be trusted. The context in hbase is pb (pb uses pb LIS). In hadoop the quoted use is web and log aggregation. Irrelevant. Test numbers look good but I don't trust them given all that has preceded here, and they are provided out of thin air: nothing on whether they were taken with or without hdfs, on a cluster or not. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API, and Beta APIs aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12. The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860527#comment-13860527 ] Jimmy Xiang commented on HBASE-10210: - Probably we should not support manipulating the clock? Otherwise, the startcode is useless. during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet; I am at the 'how did this ever work' stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from two sources - ZK and the server itself - can conflict. The master doesn't handle such cases (timestamp match), and anyway technically timestamps can collide for two separate servers. So, the master YouAreDead-s the already-recorded reporting RS, and adds it, too. Then it discovers that the new server has died with a fatal error! Note the threads: the addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running.
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860534#comment-13860534 ] Lars Hofhansl commented on HBASE-8912: -- Thanks [~jxiang]. Are you good with the -fix-races patch I posted here? It fixes all issues that I can detect. [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860537#comment-13860537 ] Lars Hofhansl commented on HBASE-10259: --- Any objections? [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860534#comment-13860534 ] Lars Hofhansl edited comment on HBASE-8912 at 1/2/14 6:28 PM: -- Thanks [~jxiang]. Are you good the -fix-races patch I posted here? It fixes all issues that I can detect. was (Author: lhofhansl): Thanks [~jxiang]. Are you good the -fix-races patch I posted here. It fixes all issue that can detect for me. [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860549#comment-13860549 ] Jimmy Xiang commented on HBASE-9374: That's right. It works around the problem. The issue is that we don't know if it is on the server side or the client side. Let me check if I can add some checking on the server side. Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Assignee: Jimmy Xiang Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DynamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860560#comment-13860560 ] Ted Yu commented on HBASE-10255: Should have provided more background on the perf test: this was done on a 5-RS cluster where hadoop 2.2 was deployed. Here is the command line used for both tests: bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -path hdfs://hor12n22.gq1.ygridcore.net:8020/tmp -threads 10 -roll 1000 -verify I obtained the first set of stats using 0.96. Then I switched to jars recompiled with patch v2 and obtained the second set of numbers. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 0.98.0, 0.99.0 Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API, and Beta APIs aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12. The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10222) Implement secondary indexes
[ https://issues.apache.org/jira/browse/HBASE-10222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860564#comment-13860564 ] Jonathan Hsieh commented on HBASE-10222: I reviewed the doc over in HBASE-9203 and did a skim of the code review to answer some of the questions I didn't glean from the design doc. Overall, I think it is well thought out, but it is massive and there are many questions I still have about it. I really think this is something that should be added as a branch in trunk. We can get pieces reviewed and committed piecemeal more quickly (which makes reviews of the next pieces easier because we don't have to revisit, and because we don't have things breaking beneath this due to other trunk changes). wdyt? Instead of asking the questions about the design here, maybe we should discuss on the dev list? At this moment I've got about 3 pages of a design outline + questions which may be hard to deal with in Jira. Implement secondary indexes --- Key: HBASE-10222 URL: https://issues.apache.org/jira/browse/HBASE-10222 Project: HBase Issue Type: Sub-task Affects Versions: 0.99.0 Reporter: rajeshbabu Assignee: rajeshbabu Attachments: HBASE-10222-WIP.patch, HBASE-10222.patch, HBASE-10222_WIP.patch, HBASE-10222_WIP_2.patch The parent issue (HBASE-9203) is more than just implementation. This sub-task is for the major implementation of the proposal at HBASE-9203. For more information refer to https://issues.apache.org/jira/secure/attachment/12598763/SecondaryIndex%20Design.pdf -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10267) TestNamespaceCommands occasionally fails
Andrew Purtell created HBASE-10267: -- Summary: TestNamespaceCommands occasionally fails Key: HBASE-10267 URL: https://issues.apache.org/jira/browse/HBASE-10267 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860571#comment-13860571 ] Jonathan Hsieh commented on HBASE-10259: My question about whether the custom parts in the old JUnit are present (or are the problem) in the newer 4.11 version is still unanswered. [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version, it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860587#comment-13860587 ] Lars Hofhansl commented on HBASE-10259: --- Nicolas answered that above (his answer was yes :) ) Why does it matter if the test suite passes, though? [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860590#comment-13860590 ] Jonathan Hsieh commented on HBASE-10259: my bad, missed that. no objections from me. [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10268) [JDK6] TestSplitLogWorker occasionally fails
Andrew Purtell created HBASE-10268: -- Summary: [JDK6] TestSplitLogWorker occasionally fails Key: HBASE-10268 URL: https://issues.apache.org/jira/browse/HBASE-10268 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Environment: 64-bit JDK 6 (Java(TM) SE Runtime Environment (build 1.6.0_43-b01) HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)) on Ubuntu 12 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 TestSplitLogWorker failed in 10% of 50 runs of the 0.98 branch test suite, but only when using JDK 6 on Ubuntu 12. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10156) Fix up the HBASE-8755 slowdown when low contention
[ https://issues.apache.org/jira/browse/HBASE-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10156: -- Attachment: 10156v2.txt Raw patch. Much in need of cleanup. I owe a writeup. This patch would seem to have better perf -- the test completes about 30% quicker -- but I need to do more test runs to ensure this finding holds up. I just tested 100 threads at the moment. With this patch we seem to have more throughput and slightly better latency. Will put up more comprehensive numbers and a cleaned-up patch in a while. {code} WITH PATCH .. 2014-01-02 09:30:10,213 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=100, syncInterval=0 took 2137.059s 46793.277ops/s ... Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 100': 9469922.337766 task-clock#4.425 CPUs utilized 150,849,553 context-switches #0.016 M/sec 50,123,607 CPU-migrations#0.005 M/sec 73,772 page-faults #0.008 K/sec 14,710,061,505,302 cycles#1.553 GHz [83.32%] 10,451,901,705,698 stalled-cycles-frontend # 71.05% frontend cycles idle [83.32%] 5,540,518,559,443 stalled-cycles-backend# 37.66% backend cycles idle [66.71%] 10,468,567,537,009 instructions #0.71 insns per cycle #1.00 stalled cycles per insn [83.38%] 1,729,960,111,202 branches # 182.679 M/sec [83.32%] 21,231,188,285 branch-misses #1.23% of all branches [83.33%] 2140.159656084 seconds time elapsed WITHOUT PATCH ... 2014-01-02 10:38:51,479 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=100, iterations=100, syncInterval=0 took 3087.523s 32388.424ops/s ...
Performance counter stats for './bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -threads 100': 7708297.226562 task-clock#2.494 CPUs utilized 300,427,649 context-switches #0.039 M/sec 18,194,920 CPU-migrations#0.002 M/sec 63,271 page-faults #0.008 K/sec 11,267,121,080,680 cycles#1.462 GHz [83.30%] 8,257,311,140,280 stalled-cycles-frontend # 73.29% frontend cycles idle [83.33%] 4,895,094,825,672 stalled-cycles-backend# 43.45% backend cycles idle [66.71%] 6,186,011,736,133 instructions #0.55 insns per cycle #1.33 stalled cycles per insn [83.36%] 1,034,182,786,238 branches # 134.165 M/sec [83.34%] 32,306,693,838 branch-misses #3.12% of all branches [83.33%] 3090.537038965 seconds time elapsed {code} Fix up the HBASE-8755 slowdown when low contention -- Key: HBASE-10156 URL: https://issues.apache.org/jira/browse/HBASE-10156 Project: HBase Issue Type: Sub-task Components: wal Reporter: stack Assignee: stack Attachments: 10156.txt, 10156v2.txt, Disrupting.java HBASE-8755 slows our writes when only a few clients. Fix. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
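As a sanity check on the quoted numbers: ops/s multiplied by elapsed seconds gives roughly the same total op count (about 1.0e8) for both runs, and the elapsed-time ratio bears out the roughly-30%-quicker claim. A small arithmetic cross-check (the class and method names are just for this sketch):

```java
// Cross-checks the HLogPerformanceEvaluation summaries quoted above:
// with patch:    2137.059s at 46793.277 ops/s
// without patch: 3087.523s at 32388.424 ops/s
public class PerfCheck {
    static double totalOps(double opsPerSec, double seconds) {
        return opsPerSec * seconds;
    }

    public static void main(String[] args) {
        // Both runs should have executed the same workload, so the totals agree.
        System.out.printf("with patch:    %.0f ops%n", totalOps(46793.277, 2137.059));
        System.out.printf("without patch: %.0f ops%n", totalOps(32388.424, 3087.523));
        // Elapsed-time ratio 2137.059/3087.523 is about 0.69, i.e. ~31% quicker.
        System.out.printf("time ratio: %.2f%n", 2137.059 / 3087.523);
    }
}
```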
[jira] [Updated] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10255: --- Fix Version/s: (was: 0.99.0) (was: 0.98.0) Unscheduled. Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream has always been a @Beta API, and Beta APIs aren't guaranteed to remain stable over such a long period (v12 to v15). LimitInputStream was copied from Guava v12. The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860601#comment-13860601 ] Lars Hofhansl commented on HBASE-10259: --- Thanks [~jmhsieh]. I'll do the following: # commit this patch # wait for a successful run of 0.94 and 0.94-security # then switch the jenkins build to JDK 7 [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10259. --- Resolution: Fixed Assignee: Lars Hofhansl Hadoop Flags: Reviewed Committed to 0.94 [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10258) [0.94] Fix JDK 7 related test failures.
[ https://issues.apache.org/jira/browse/HBASE-10258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10258. --- Resolution: Duplicate Child HBASE-10259 fixes all JDK7 test issues. [0.94] Fix JDK 7 related test failures. --- Key: HBASE-10258 URL: https://issues.apache.org/jira/browse/HBASE-10258 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Starting with these: org.apache.hadoop.hbase.mapreduce.TestCopyTable.testCopyTable org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.testShouldFailSplitIfZNodeDoesNotExistDueToPrevRollBack org.apache.hadoop.hbase.regionserver.TestStore.testDeleteExpiredStoreFiles org.apache.hadoop.hbase.snapshot.TestRestoreFlushSnapshotFromClient.testRestoreSnapshotOfCloned org.apache.hadoop.hbase.util.TestEnvironmentEdgeManager.testManageSingleton -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10259: -- Issue Type: Test (was: Sub-task) Parent: (was: HBASE-10258) [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Test Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860625#comment-13860625 ] Jimmy Xiang commented on HBASE-8912: For the fix-races patch, I understand the change to HRegionServer#removeFromRegionsInTransition. For the OpenRegionHandler change, we do call this.rsServices.removeFromRegionsInTransition(this.regionInfo) in the finally block, so I was wondering how the change will help. It should help if the master tries to assign the region to the same host again, which is very common in unit tests. However, if we remove the region from the regions-in-transition list before we change the znode, and another openRegion call gets to this server at that point, it could see the wrong znode state. This is unlikely to happen (the master assigns the same region using the same znode version). However, in the finally block we could remove the region from transition by mistake (the new transition is removed instead). Will this cause any issue? [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
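The "remove the wrong transition in the finally block" hazard Jimmy raises can be shown with a deliberately simplified, hypothetical model (a regions-in-transition map keyed only by region name), not the real HBase code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a regions-in-transition (RIT) map keyed
// only by region name. It illustrates the hazard discussed above: a finally
// block that removes by key alone can delete a *newer* transition entry for
// the same region; a value-checked remove does not.
public class RitRaceDemo {
    // value = start timestamp identifying which transition inserted the entry
    static final ConcurrentHashMap<String, Long> rit = new ConcurrentHashMap<>();

    // Returns true if the old handler's late cleanup wiped out the new transition.
    public static boolean removeByKeyOnly() {
        rit.clear();
        rit.put("regionA", 1L);   // first open attempt registers itself
        rit.remove("regionA");    // attempt fails, entry cleaned up early
        rit.put("regionA", 2L);   // master retries: new transition begins
        rit.remove("regionA");    // old handler's finally block runs late
        return !rit.containsKey("regionA"); // new transition was removed by mistake
    }

    // Removing only when the stored value still matches avoids the mistake.
    public static boolean removeIfSameTransition() {
        rit.clear();
        rit.put("regionA", 1L);
        rit.remove("regionA", 1L); // early cleanup, value matches: removed
        rit.put("regionA", 2L);
        rit.remove("regionA", 1L); // late finally block: value differs, no-op
        return rit.containsKey("regionA"); // new transition survives
    }
}
```

The value-checked variant uses ConcurrentHashMap.remove(key, value), which only removes when the current mapping equals the expected one.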
[jira] [Commented] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860623#comment-13860623 ] James Taylor commented on HBASE-6104: - It's about conserving network bandwidth - we don't want to take the hit of transferring the same data between client and server multiple times. For example, with secondary indexing, we'd be tacking on data for every Put - if you have a batch of 10,000, that's a lot of extra data. We could try to figure out which Put is the first one for each region, but what if a split occurs after we figure this out -- this seems too brittle. In the case of a Hash Join, we'd be sending over the compressed results of a scan that ran over the smaller table (which gets joined against in a coprocessor when the scan over the other table is run). This can become very large - imagine you're joining against a table with 10M rows. We would not want to send this data for every region of the region server (or even multiple times per region, depending on how the scan gets parallelized on the client). Require EXEC permission to call coprocessor endpoints - Key: HBASE-6104 URL: https://issues.apache.org/jira/browse/HBASE-6104 Project: HBase Issue Type: New Feature Components: Coprocessors, security Reporter: Gary Helmling Assignee: Andrew Purtell Fix For: 0.99.0 Attachments: 6104-addendum-1.patch, 6104-revert.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch, 6104.patch The EXEC action currently exists as only a placeholder in access control. It should really be used to enforce access to coprocessor endpoint RPC calls, which are currently unrestricted. How the ACLs to support this would be modeled deserves some discussion: * Should access be scoped to a specific table and CoprocessorProtocol extension? * Should it be possible to grant access to a CoprocessorProtocol implementation globally (regardless of table)? * Are per-method restrictions necessary? 
* Should we expose hooks available to endpoint implementors so that they could additionally apply their own permission checks? Some CP endpoints may want to require READ permissions, others may want to enforce WRITE, or READ + WRITE. To apply these kinds of checks we would also have to extend the RegionObserver interface to provide hooks wrapping HRegion.exec().
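To make the design questions concrete, here is a hypothetical, much-simplified sketch of an ACL table where invoking an endpoint requires EXEC on the (user, table) scope in addition to whatever READ/WRITE the endpoint itself declares. All names and shapes are illustrative, not the AccessController API:

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: grants are keyed by (user, table); an endpoint call
// needs EXEC plus the actions the endpoint declares it will perform.
public class ExecAclSketch {
    public enum Action { READ, WRITE, EXEC }

    static final Map<String, EnumSet<Action>> acl = new HashMap<>();

    private static String key(String user, String table) { return user + "@" + table; }

    public static void grant(String user, String table, Action... actions) {
        acl.computeIfAbsent(key(user, table), k -> EnumSet.noneOf(Action.class))
           .addAll(Arrays.asList(actions));
    }

    // Endpoint invocation check: EXEC is required on top of the endpoint's
    // own declared needs (the READ / WRITE point raised above).
    public static boolean mayInvokeEndpoint(String user, String table,
                                            EnumSet<Action> endpointNeeds) {
        EnumSet<Action> granted =
            acl.getOrDefault(key(user, table), EnumSet.noneOf(Action.class));
        return granted.contains(Action.EXEC) && granted.containsAll(endpointNeeds);
    }
}
```

Scoping the key by table answers the first question above with "per-table"; a global grant would simply use a wildcard table key in this model.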
[jira] [Commented] (HBASE-10210) during master startup, RS can be you-are-dead-ed by master in error
[ https://issues.apache.org/jira/browse/HBASE-10210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860632#comment-13860632 ] Sergey Shelukhin commented on HBASE-10210: -- We actually recommend running ntpd to avoid clock skew, and clock manipulation is what keeps the clock in sync. Normally ntpd would not move time backwards, but it can happen, and there can also be other tools, especially on different OSes, that we don't know of. It's not useless, we just need to be careful about it. One part where it's definitely harmful is server-generated KV timestamps, but that's a completely separate discussion :) during master startup, RS can be you-are-dead-ed by master in error --- Key: HBASE-10210 URL: https://issues.apache.org/jira/browse/HBASE-10210 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.96.1, 0.99.0, 0.96.1.1 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10210.patch Not sure of the root cause yet; I am at the "how did this ever work" stage. We see this problem in 0.96.1, but didn't in 0.96.0 + some patches. It looks like RS information arriving from two sources (ZK and the server itself) can conflict. The master doesn't handle such cases (timestamp match), and anyway technically timestamps can collide for two separate servers. So the master YouAreDead-s the already-recorded reporting RS, and adds the new one too. Then it discovers that the new server has died with a fatal error! Note the threads: addition is called from master initialization and from RPC. {noformat} 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Finished waiting for region servers count to settle; checked in 2, slept for 18262 ms, expecting minimum of 1, maximum of 2147483647, master is running. 
2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.ServerManager: Registering server=h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,290 INFO [master:h2-ubuntu12-sec-1387431063-hbase-10:6] master.HMaster: Registered server found up in zk but who has not yet reported in: h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Triggering server recovery; existingServer h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 looks stale, new server:h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 2013-12-19 11:16:45,380 INFO [RpcServer.handler=4,port=6] master.ServerManager: Master doesn't enable ServerShutdownHandler during initialization, delay expiring server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 ... 2013-12-19 11:16:46,925 ERROR [RpcServer.handler=7,port=6] master.HMaster: Region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 reported a fatal error: ABORTING region server h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing h2-ubuntu12-sec-1387431063-hbase-8.cs1cloud.internal,60020,1387451803800 as dead server {noformat} Presumably some of the recent ZK listener related changes b -- This message was sent by Atlassian JIRA (v6.1.5#6160)
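Note in the log above that the "stale" existing server and the "new" server carry the identical start code (1387451803800). A minimal, hypothetical sketch of the reconciliation decision involved (not the actual ServerManager code) shows why equal start codes must be treated as a duplicate report rather than a restart:

```java
// Hypothetical sketch of how a master might reconcile a fresh report against
// a recorded server with the same host:port. An HBase server name is
// host,port,startCode (its start timestamp). A restarted server normally has
// a larger startCode; an *equal* startCode means the same incarnation seen
// twice (e.g. once via the ZK scan, once via its own RPC report) and must not
// trigger recovery of the recorded server, which is the failure in the log.
public class StartCodeCheck {
    public enum Decision { DUPLICATE_REPORT, EXPIRE_OLD_THEN_REGISTER, IGNORE_STALE_REPORT }

    public static Decision reconcile(long recordedStartCode, long reportedStartCode) {
        if (reportedStartCode == recordedStartCode) return Decision.DUPLICATE_REPORT;
        if (reportedStartCode > recordedStartCode) return Decision.EXPIRE_OLD_THEN_REGISTER;
        return Decision.IGNORE_STALE_REPORT;
    }
}
```

As the comment notes, even this check is not airtight: two distinct servers can in principle collide on startCode, which is the "timestamps can collide" caveat above.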
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860634#comment-13860634 ] Jimmy Xiang commented on HBASE-8912: It seems to me we replace one race with a new one (OpenRegionHandler). If the new one happens less often, that is fine with me.
[jira] [Commented] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860652#comment-13860652 ] Andrew Purtell commented on HBASE-6104: --- This discussion is about HBASE-9291. Let's take it there. Regarding requiring an EXEC privilege to call coprocessor endpoints, the feedback is that security would be inconvenient. Unfortunately, that's the nature of security. Will commit this to trunk shortly.
[jira] [Commented] (HBASE-9291) Enable client to setAttribute that is sent once to each region server
[ https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860654#comment-13860654 ] Andrew Purtell commented on HBASE-9291: --- Copied from https://issues.apache.org/jira/browse/HBASE-6104?focusedCommentId=13860623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13860623: It's about conserving network bandwidth - we don't want to take the hit of transferring the same data between client and server multiple times. For example, with secondary indexing, we'd be tacking on data for every Put - if you have a batch of 10,000, that's a lot of extra data. We could try to figure out which Put is the first one for each region, but what if a split occurs after we figure this out – this seems too brittle. In the case of a Hash Join, we'd be sending over the compressed results of a scan that ran over the smaller table (which gets joined against in a coprocessor when the scan over the other table is run). This can become very large - imagine you're joining against a table with 10M rows. We would not want to send this data for every region of the region server (or even multiple times per region depending on how the scan gets parallelized on the client). Enable client to setAttribute that is sent once to each region server - Key: HBASE-9291 URL: https://issues.apache.org/jira/browse/HBASE-9291 Project: HBase Issue Type: New Feature Components: IPC/RPC Reporter: James Taylor Currently a Scan and Mutation allow the client to set its own attributes that get passed through the RPC layer and are accessible from a coprocessor. This is very handy, but breaks down if the amount of information is large, since this information ends up being sent again and again to every region. 
Clients can work around this with an endpoint pre and post coprocessor invocation that: 1) sends the information and caches it on the region server in the pre invocation, 2) invokes the Scan or sends the batch of Mutations, and then 3) removes it in the post invocation. In this case, the client is forced to identify all region servers (ideally, all region servers that will be involved in the Scan/Mutation), make extra RPC calls, manage the caching of the information on the region server, age out the information (in case the client dies before step (3) that clears the cached information), and must deal with the possibility of a split occurring while this operation is in progress. Instead, it'd be much better if an attribute could be identified as a region server attribute in OperationWithAttributes and the HBase RPC layer would take care of doing the above. The use cases where the above is necessary in Phoenix include: 1) Hash joins, where the results of the smaller side of a join scan are packaged up and sent to each region server, and 2) Secondary indexing, where the metadata of knowing a) which column family/column qualifier pairs and b) which part of the row key contributes to which indexes is sent to each region server that will process a batched put.
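The age-out requirement in the workaround above (step 3 may never run if the client dies) suggests a TTL-based cache on the region server. A minimal, hypothetical sketch with explicit time injection for testability; the class and method names are illustrative, not an HBase API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the server-side half of the workaround: cached
// payloads expire after a TTL so that a client dying before its
// post-invocation cleanup (step 3) does not leak memory on the server.
public class RsPayloadCache {
    private static final class Entry {
        final byte[] payload;
        final long insertedAtMs;
        Entry(byte[] p, long t) { payload = p; insertedAtMs = t; }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMs;

    public RsPayloadCache(long ttlMs) { this.ttlMs = ttlMs; }

    // Step 1: the pre-invocation installs the payload.
    public void put(String id, byte[] payload, long nowMs) {
        cache.put(id, new Entry(payload, nowMs));
    }

    // Step 2 looks the payload up; expired entries behave as if step 3 ran.
    public byte[] get(String id, long nowMs) {
        Entry e = cache.get(id);
        if (e == null) return null;
        if (nowMs - e.insertedAtMs > ttlMs) {
            cache.remove(id);
            return null;
        }
        return e.payload;
    }

    // Step 3: explicit cleanup by the client's post-invocation.
    public void remove(String id) { cache.remove(id); }
}
```

A production version would also sweep expired entries proactively rather than only on access, but the TTL check on read captures the age-out idea the comment describes.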
[jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860653#comment-13860653 ] Lars Hofhansl commented on HBASE-8912: -- Thanks [~jxiang]. The idea was that we remove the region from RITs *before* we transition the znode. So there should be no race anymore, since the master cannot know about the FAILED_OPEN before the region was removed from RITs, and since the master also avoids concurrent assigns of the same region now. Cool. I'll wait for an hour or two and then commit and resolve.
[jira] [Updated] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-10264: Attachment: HBase-10264.patch Two issues in the mapred phase: a) It misses HBase-specific jars, so the TTs don't get any hbase-xxx jars shipped to them. b) The job also requires Counter.class from the high-scale-lib-xxx library, as the TT instantiates an HRegion object. Here is a patch to fix both. Testing: Ran the patched job on a YARN cluster.
{code}
[jenkins@tarball-target-2 hbase]$ bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred -major hdfs://`hostname`:8020/hbase/data/default/sampleTable_api-compat-8.ent.cloudera.com
...
2014-01-02 11:33:02,550 INFO [main] mapreduce.Job: The url to track the job: http://tarball-target-2.ent.cloudera.com:8088/proxy/application_1388690541295_0011/
2014-01-02 11:33:02,551 INFO [main] mapreduce.Job: Running job: job_1388690541295_0011
2014-01-02 11:33:14,018 INFO [main] mapreduce.Job: Job job_1388690541295_0011 running in uber mode : false
2014-01-02 11:33:14,020 INFO [main] mapreduce.Job: map 0% reduce 0%
2014-01-02 11:33:23,151 INFO [main] mapreduce.Job: map 100% reduce 0%
2014-01-02 11:33:23,172 INFO [main] mapreduce.Job: Job job_1388690541295_0011 completed successfully
2014-01-02 11:33:23,362 INFO [main] mapreduce.Job: Counters: 27
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=109926
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=5749
		HDFS: Number of bytes written=968
		HDFS: Number of read operations=22
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=6
	Job Counters
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7181
		Total time spent by all reduces in occupied slots (ms)=0
	Map-Reduce Framework
		Map input records=1
		Map output records=0
		Input split bytes=154
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=77
		CPU time spent (ms)=1940
		Physical memory (bytes) snapshot=178450432
		Virtual memory (bytes) snapshot=883052544
		Total committed heap usage (bytes)=114360320
	File Input Format Counters
		Bytes Read=143
	File Output Format Counters
		Bytes Written=0
{code}
[MapReduce]: CompactionTool in mapred mode is missing classes in its classpath -- Key: HBASE-10264 URL: https://issues.apache.org/jira/browse/HBASE-10264 Project: HBase Issue Type: Bug Components: Compaction, mapreduce Affects Versions: 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Himanshu Vashishtha Attachments: HBase-10264.patch Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related issues in both MRv1 and MRv2.
{code}
hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868
{code}
Results:
{code}
2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : attempt_1388179525649_0011_m_00_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.TableInfoMissingException
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115)
	at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231)
	at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
	at
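Fixes like this one typically rely on a helper (in HBase, TableMapReduceUtil.addDependencyJars) that resolves, for each class the tasks will need, the jar it was loaded from and ships it with the job. The resolution step itself is plain JDK; a minimal sketch (JarLocator is an illustrative name):

```java
import java.net.URL;
import java.security.CodeSource;

// Sketch of the mechanism behind such classpath fixes: given a class the
// task needs (e.g. TableInfoMissingException or high-scale-lib's Counter),
// find the jar or classes directory it was loaded from so it can be shipped
// to the task trackers.
public class JarLocator {
    // Returns the URL of the jar/classes dir that provided clazz, or null
    // (some bootstrap classes carry no CodeSource).
    public static URL containingJar(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }
}
```

A job-setup helper would call this for each required class, de-duplicate the resulting URLs, and add them to the job's distributed cache so the ClassNotFoundException above cannot occur in the map tasks.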
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860665#comment-13860665 ] Enis Soztutar commented on HBASE-10264: --- +1.
[jira] [Commented] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860683#comment-13860683 ] Andrew Purtell commented on HBASE-6104: --- If we introduce a new client API to the effect of send one RPC to each RS, then this amounts to a modified coprocessor endpoint execution, but with an invocation target that is a singleton to each RS, and should be subject to the same security considerations. Passing an attribute on the first put to a RS sidesteps the need for EXEC grants on any endpoint invocation target. Remainder of response on HBASE-9291.
[jira] [Comment Edited] (HBASE-6104) Require EXEC permission to call coprocessor endpoints
[ https://issues.apache.org/jira/browse/HBASE-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860683#comment-13860683 ] Andrew Purtell edited comment on HBASE-6104 at 1/2/14 8:06 PM: --- If we introduce a new client API to the effect of send one RPC to each RS, then this amounts to a modified coprocessor endpoint execution, but with an invocation target that is a singleton to each RS, and should be subject to the same security considerations. Passing an attribute on the first put to a RS sidesteps the need for EXEC grants on any endpoint invocation target, which is what sounds like the goal you are after. Remainder of response on HBASE-9291.
[jira] [Commented] (HBASE-9291) Enable client to setAttribute that is sent once to each region server
[ https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860731#comment-13860731 ] Andrew Purtell commented on HBASE-9291: --- First let me clarify my second suggestion above: We could hang a map accessible to all CPs in the RS off of RegionServerServices, as was done at the region level in HBASE-6505. Then the client would provide the (large) state as an attribute on only the first mutation sent to each regionserver. (More on this below.) The CP would observe the attribute and apply it to RS-level shared state. Then the mutation and subsequent mutations could be processed referring to the updated RS-level state. The client side is tricky. bq. We could try to figure out which Put is the first one for each region, but what if a split occurs after we figure this out – this seems too brittle. If we introduce a new client API to the effect of send one RPC to each RS, then this amounts to a modified coprocessor endpoint execution, but with an invocation target that is a singleton to each RS, and should be subject to the same security considerations. Passing an attribute on the first put to a RS sidesteps the need for EXEC grants (HBASE-6104) on any endpoint invocation target, which is what sounds like the goal you are after. Whether an endpoint invocation or a mutation, we have the same issue that the local knowledge of cluster state can at any point be stale. Live servers can come and go, and regions can move around, and there is no transactional state update protocol running between clients and servers for updating this information. Even if there were, cluster topology can change mid-flight. A send one RPC to each RS API could miss a newly onlined server that came up after the call(s) started and yet opened some relevant regions asynchronously. 
Whether trying to figure out which put is the first for a RS, or selecting keys for a set of coprocessor endpoints such that you only invoke one per RS, or using a new send one RPC to each RS, on the server you'd have to handle the same set of issues, right? There could be 0, 1, or ~2 large data transfers per RS: - 0 if a new server is onlined and regions are assigned after the put or send one RPC to each RS calls are in progress - 1 if the cluster topology is unchanged over the entire client action - ~2 if a region is moved or split, or even in the case of one-RPC-per-server if there is a RPC retry on account of the failed transmission back to the client of a server side success indication I wouldn't use the word brittle. Messy is better. It always is. Enable client to setAttribute that is sent once to each region server - Key: HBASE-9291 URL: https://issues.apache.org/jira/browse/HBASE-9291 Project: HBase Issue Type: New Feature Components: IPC/RPC Reporter: James Taylor Currently a Scan and Mutation allow the client to set its own attributes that get passed through the RPC layer and are accessible from a coprocessor. This is very handy, but breaks down if the amount of information is large, since this information ends up being sent again and again to every region. Clients can work around this with an endpoint pre and post coprocessor invocation that: 1) sends the information and caches it on the region server in the pre invocation 2) invokes the Scan or sends the batch of Mutations, and then 3) removes it in the post invocation. In this case, the client is forced to identify all region servers (ideally, all region servers that will be involved in the Scan/Mutation), make extra RPC calls, manage the caching of the information on the region server, age-out the information (in case the client dies before step (3) that clears the cached information), and must deal with the possibility of a split occurring while this operation is in-progress. 
Instead, it'd be much better if an attribute could be identified as a region server attribute in OperationWithAttributes and the HBase RPC layer would take care of doing the above. The use cases where the above is necessary in Phoenix include: 1) Hash joins, where the results of the smaller side of a join scan are packaged up and sent to each region server, and 2) Secondary indexing, where the metadata of knowing a) which column family/column qualifier pairs and b) which part of the row key contributes to which indexes is sent to each region server that will process a batched put. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
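The RS-level shared state sketched in the comments above can be modeled in a few lines. The following is a hypothetical, HBase-free sketch (the class and method names are invented, and the real HBase coprocessor plumbing is omitted): the first mutation to reach a server carries the blob as an attribute, a coprocessor-like observer copies it into a server-wide map, and later mutations read the shared copy. Because last-write-wins, a resend of the attribute after a region move or split is harmless.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the RS-level shared state idea: the first mutation
// sent to a regionserver carries a (large) state blob as an attribute; an
// observer copies it into a map shared across all regions on that server
// (standing in for a map hung off RegionServerServices, cf. HBASE-6505 at
// the region level), and later mutations consult the shared copy.
class SharedStateSketch {
    private final Map<String, byte[]> rsSharedState = new ConcurrentHashMap<>();

    // Called for every mutation; only mutations that actually carry the
    // attribute update the shared state, so duplicate transfers after a
    // region move or an RPC retry simply overwrite with the same value.
    void observeMutation(String stateKey, byte[] attributeOrNull) {
        if (attributeOrNull != null) {
            rsSharedState.put(stateKey, attributeOrNull);
        }
    }

    // What subsequent mutation processing would read.
    byte[] lookup(String stateKey) {
        return rsSharedState.get(stateKey);
    }
}
```

Note the ordering caveat James Taylor raises below: nothing in this shape guarantees the attribute-carrying mutation is processed before the others when the client sends to regions in parallel.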
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860733#comment-13860733 ] Nick Dimiduk commented on HBASE-10264: -- +1 as well. I can commit if no one objects. [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath -- Key: HBASE-10264 URL: https://issues.apache.org/jira/browse/HBASE-10264 Project: HBase Issue Type: Bug Components: Compaction, mapreduce Affects Versions: 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Himanshu Vashishtha Attachments: HBase-10264.patch Calling o.a.h.h.regionserver.CompactionTool fails due to classpath-related issues in both MRv1 and MRv2. {code}hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred -major hdfs://`hostname`:8020/hbase/data/default/orig_1388179858868{code} Results: {code}2013-12-27 13:31:49,478 INFO [main] mapreduce.Job: Task Id : attempt_1388179525649_0011_m_00_2, Status : FAILED Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.TableInfoMissingException at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionWorker.compact(CompactionTool.java:115) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:231) at org.apache.hadoop.hbase.regionserver.CompactionTool$CompactionMapper.map(CompactionTool.java:207) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
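While a fix lands, a common way to work around this class of task-side ClassNotFoundException is to put the HBase jars on the job classpath explicitly. A sketch, assuming a standard install where the `hbase` launcher script is on the PATH; this is a generic workaround, not necessarily what the attached patch does:

```shell
# Workaround sketch: expose the HBase jars to the MR tasks before launching
# CompactionTool (the HDFS path below is the example from the report).
export HADOOP_CLASSPATH="$(hbase classpath)"
hbase org.apache.hadoop.hbase.regionserver.CompactionTool \
  -mapred -major "hdfs://$(hostname):8020/hbase/data/default/orig_1388179858868"
```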
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860735#comment-13860735 ] Hudson commented on HBASE-10249: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/46/]) HBASE-10249. Intermittent TestReplicationSyncUpTool failure (Demai Ni) (apurtell: rev 1554867) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860734#comment-13860734 ] Hudson commented on HBASE-10255: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #46 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/46/]) HBASE-10255 Remove dependency on LimitInputStream - revert (Tedyu: rev 1554869) * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/io/LimitInputStream.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt LimitInputStream was copied from Guava v12. It has always been a @Beta API, and beta APIs aren't guaranteed to remain stable over such a long period (v12 to v15). The recommended replacement is to use ByteStreams#limit(java.io.InputStream, long) instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
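The behavior that ByteStreams#limit provides is easy to picture: a wrapper stream that reports EOF once a byte budget is exhausted, no matter how long the underlying stream is. A minimal plain-Java sketch of that idea (an illustration, not Guava's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch of what ByteStreams.limit(in, n) provides: an InputStream
// view over `in` that returns at most `limit` bytes, then a synthetic EOF.
class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // budget exhausted: report EOF
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the underlying stream for more than the budget allows.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n != -1) {
            remaining -= n;
        }
        return n;
    }
}
```

A WAL reader uses exactly this shape to stop a protobuf parser from reading past one record's declared length into the next.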
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860747#comment-13860747 ] Aleksandr Shulman commented on HBASE-10264: --- +1 - looks good to me as well. Smoke tested it against MRv1 as well. [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath -- Key: HBASE-10264 URL: https://issues.apache.org/jira/browse/HBASE-10264 Project: HBase Issue Type: Bug Components: Compaction, mapreduce Affects Versions: 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Himanshu Vashishtha Attachments: HBase-10264.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860746#comment-13860746 ] Hudson commented on HBASE-10259: SUCCESS: Integrated in HBase-0.94-security #376 (See [https://builds.apache.org/job/HBase-0.94-security/376/]) HBASE-10259 [0.94] Upgrade JUnit to 4.11 (larsh: rev 1554879) * /hbase/branches/0.94/pom.xml * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Test Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt Right now we're using a custom version it seems: 4.10-HBASE\-1. Let's upgrade that to 4.11. See parent for rationale. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10264) [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath
[ https://issues.apache.org/jira/browse/HBASE-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860761#comment-13860761 ] stack commented on HBASE-10264: --- +1 for 0.96 -- bugfix. [MapReduce]: CompactionTool in mapred mode is missing classes in its classpath -- Key: HBASE-10264 URL: https://issues.apache.org/jira/browse/HBASE-10264 Project: HBase Issue Type: Bug Components: Compaction, mapreduce Affects Versions: 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Himanshu Vashishtha Attachments: HBase-10264.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9291) Enable client to setAttribute that is sent once to each region server
[ https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860763#comment-13860763 ] James Taylor commented on HBASE-9291: - For the sending of the cache for Put operations, there needs to be a guarantee that the Region Server has the data being cached prior to any calling of the coprocessor hooks on the server side. If this data is added to the first Put for each region server, is there any guarantee that one of the other regions isn't processed first (since these are sent in parallel from the client)? I think the join/scan scenarios may be more complicated, as Phoenix does its own parallelization of the scan by breaking it up into row key ranges. From the POV of the HBase client, these look like separate scans. I think we're stuck establishing the region server cache ourselves for this case. Given the flexibility of region observer coprocessors, I'm sure we can work out a way to send the cache through these rather than an endpoint coprocessor. For example, we can just issue a Get with a single key per region server to get the data over. FWIW, in the case of the data not being on the region server as expected, we'll end up throwing and the client will retry. As far as HBASE-6505, we couldn't take advantage of it since it only allows the shared state to be shared between the same coprocessor. In this case, we have a different one that sends the data to cache versus the ones that use the data (our Put and Scan region observer coprocessors). Enable client to setAttribute that is sent once to each region server - Key: HBASE-9291 URL: https://issues.apache.org/jira/browse/HBASE-9291 Project: HBase Issue Type: New Feature Components: IPC/RPC Reporter: James Taylor -- This message was sent by Atlassian JIRA (v6.1.5#6160)
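The "one Get per region server" trick mentioned above requires the client to choose a single representative row key per server from its (possibly stale) region location cache. A hypothetical helper sketching just that selection step, with invented names; staleness still has to be covered by the throw-and-retry path the comment describes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: given each region's start key and the server the
// client's location cache believes is hosting it, pick one key per server.
// Issuing one Get per chosen key then touches each regionserver once,
// which is enough to ship the cache payload over.
class OneKeyPerServer {
    static Map<String, String> pick(Map<String, String> regionStartKeyToServer) {
        Map<String, String> serverToKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : regionStartKeyToServer.entrySet()) {
            // Keep the first region key seen for each server.
            serverToKey.putIfAbsent(e.getValue(), e.getKey());
        }
        return serverToKey;
    }
}
```

If a region moved after the snapshot was taken, a server may be missed entirely, which is exactly the 0-transfers case Andrew Purtell enumerates above.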
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860774#comment-13860774 ] Hudson commented on HBASE-10249: FAILURE: Integrated in HBase-0.98 #49 (See [https://builds.apache.org/job/HBase-0.98/49/]) HBASE-10249. Intermittent TestReplicationSyncUpTool failure (Demai Ni) (apurtell: rev 1554867) * /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10255) Remove dependency on LimitInputStream
[ https://issues.apache.org/jira/browse/HBASE-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860773#comment-13860773 ] Hudson commented on HBASE-10255: FAILURE: Integrated in HBase-0.98 #49 (See [https://builds.apache.org/job/HBase-0.98/49/]) HBASE-10255 Remove dependency on LimitInputStream - revert (Tedyu: rev 1554869) * /hbase/branches/0.98/hbase-common/src/main/java/org/apache/hadoop/hbase/io/LimitInputStream.java * /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java Remove dependency on LimitInputStream - Key: HBASE-10255 URL: https://issues.apache.org/jira/browse/HBASE-10255 Project: HBase Issue Type: Task Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: 10255-v1.txt, 10255-v2.txt, alternate_lis.txt -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10267) TestNamespaceCommands occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-10267: --- Description: {noformat} junit.framework.AssertionFailedError: Waiting timed out after [10,000] msec at junit.framework.Assert.fail(Assert.java:57) at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193) at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:146) at org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3251) at org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:241) at org.apache.hadoop.hbase.security.access.SecureTestUtil.grantOnNamespace(SecureTestUtil.java:321) at org.apache.hadoop.hbase.security.access.TestNamespaceCommands.beforeClass(TestNamespaceCommands.java:88) {noformat} TestNamespaceCommands occasionally fails Key: HBASE-10267 URL: https://issues.apache.org/jira/browse/HBASE-10267 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.99.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.98.0, 0.99.0 {noformat} junit.framework.AssertionFailedError: Waiting timed out after [10,000] msec at junit.framework.Assert.fail(Assert.java:57) at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193) at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:146) at org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3251) at org.apache.hadoop.hbase.security.access.SecureTestUtil.updateACLs(SecureTestUtil.java:241) at org.apache.hadoop.hbase.security.access.SecureTestUtil.grantOnNamespace(SecureTestUtil.java:321) at org.apache.hadoop.hbase.security.access.TestNamespaceCommands.beforeClass(TestNamespaceCommands.java:88) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-8889) TestIOFencing#testFencingAroundCompaction occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8889: -- Attachment: TestIOFencing.tar.gz Test output captured previously. TestIOFencing#testFencingAroundCompaction occasionally fails Key: HBASE-8889 URL: https://issues.apache.org/jira/browse/HBASE-8889 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: TestIOFencing.tar.gz From https://builds.apache.org/job/PreCommit-HBASE-Build/6232//testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompaction/ : {code} java.lang.AssertionError: Timed out waiting for new server to open region at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestIOFencing.doTest(TestIOFencing.java:269) at org.apache.hadoop.hbase.TestIOFencing.testFencingAroundCompaction(TestIOFencing.java:205) {code} {code} 2013-07-06 23:13:53,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:54,120 INFO [pool-1-thread-1] hbase.TestIOFencing(266): Waiting for the new server to pick up the region tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] hbase.TestIOFencing$CompactionBlockerRegion(102): allowing compactions 2013-07-06 23:13:55,121 INFO [pool-1-thread-1] hbase.HBaseTestingUtility(911): Shutting down minicluster 2013-07-06 23:13:55,121 DEBUG [pool-1-thread-1] util.JVMClusterUtil(237): Shutting down HBase Cluster 2013-07-06 23:13:55,121 INFO [RS:0;asf002:39065-smallCompactions-1373152134716] regionserver.HStore(951): Starting compaction of 2 file(s) in family of tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. into tmpdir=hdfs://localhost:50140/user/jenkins/hbase/tabletest/6e62d3b24ea23160931362b60359ff03/.tmp, totalSize=108.4k ... 
2013-07-06 23:13:55,155 INFO [RS:0;asf002:39065] regionserver.HRegionServer(2476): Received CLOSE for the region: 6e62d3b24ea23160931362b60359ff03 ,which we are already trying to CLOSE 2013-07-06 23:13:55,157 WARN [RS:0;asf002:39065] regionserver.HRegionServer(2414): Failed to close tabletest,,1373152125442.6e62d3b24ea23160931362b60359ff03. - ignoring and continuing org.apache.hadoop.hbase.exceptions.NotServingRegionException: The region 6e62d3b24ea23160931362b60359ff03 was already closing. New CLOSE request is ignored. at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2479) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2409) at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2011) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:903) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:158) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:142) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.util.Methods.call(Methods.java:41) at org.apache.hadoop.hbase.security.User.call(User.java:420) at org.apache.hadoop.hbase.security.User.access$300(User.java:51) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:260) at 
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:140) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
Aleksandr Shulman created HBASE-10269: - Summary: [Nit]: Spelling issue in HFileContext.setCompresssion Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor As part of HBASE-7544, a misspelling was introduced into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10270) Remove DataBlockEncoding from BlockCacheKey
Arjen Roodselaar created HBASE-10270: Summary: Remove DataBlockEncoding from BlockCacheKey Key: HBASE-10270 URL: https://issues.apache.org/jira/browse/HBASE-10270 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.89-fb Reporter: Arjen Roodselaar Assignee: Arjen Roodselaar Priority: Minor Fix For: 0.89-fb When a block is added to the BlockCache its DataBlockEncoding is stored on the BlockCacheKey. This block encoding is used in the calculation of the hashCode and as such matters when cache lookups are done. Because the keys differ for encoded and unencoded (data) blocks, there is a potential for caching them twice or missing the cache. This happens for example when using Scan preloading as AbstractScannerV2.readNextDataBlock() does a read without knowing the block type or the encoding. This patch removes the block encoding from the key and forces the caller of HFileReaderV2.readBlock() to specify the expected BlockType as well as the expected DataBlockEncoding when these matter. This allows for a decision on either of these at read time instead of cache time, puts responsibility where appropriate, fixes some cache misses when using the scan preloading (which does a read without knowing the type or encoding), allows for the BlockCacheKey to be re-used by the L2 BucketCache and sets us up for a future CompoundScannerV2 which can read both un-encoded and encoded data blocks. A gotcha here: ScannerV2 and EncodedScannerV2 expect BlockType.DATA and BlockType.ENCODED_DATA respectively and will throw when given a block of the wrong type. Adding the DataBlockEncoding on the cache key caused a cache miss if the block was cached with the wrong encoding, implicitly defining the BlockType and thus keeping this from happening. It is now the scanner's responsibility to specify both the expected type and encoding (which is more appropriate). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
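The key shape the description argues for can be sketched directly: cache identity becomes (hfile name, offset) only, while the expected BlockType and DataBlockEncoding move to read-time parameters of readBlock(). A minimal, hypothetical model of that key (not the patch's actual BlockCacheKey):

```java
import java.util.Objects;

// Sketch of the proposed key shape: identity is (hfile name, offset) alone.
// With encoding removed from equals/hashCode, an encoded and an unencoded
// read of the same block resolve to the same cache entry; whether the
// retrieved block is acceptable is checked at read time by the caller.
class CacheKeySketch {
    final String hfileName;
    final long offset;

    CacheKeySketch(String hfileName, long offset) {
        this.hfileName = hfileName;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKeySketch)) {
            return false;
        }
        CacheKeySketch k = (CacheKeySketch) o;
        return offset == k.offset && hfileName.equals(k.hfileName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(hfileName, offset);
    }
}
```

The trade-off the description calls out: the old encoding-bearing key silently turned a wrong-encoding hit into a miss, so with this shape the scanner itself must assert the expected type and encoding.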
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Attachment: HBASE-10269-1.patch It looks like this call was not used anywhere, so the change is a one-liner. [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Affects Version/s: 0.99.0 0.98.1 0.98.0 [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10269: -- Status: Patch Available (was: Open) [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860802#comment-13860802 ] Aleksandr Shulman commented on HBASE-10269: --- Should apply cleanly to both 0.98 and trunk. Not necessary for 0.96. [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860805#comment-13860805 ] Jonathan Hsieh commented on HBASE-10269: Are there any calls to this method? [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860805#comment-13860805 ] Jonathan Hsieh edited comment on HBASE-10269 at 1/2/14 9:41 PM: Why aren't there any calls to this method? was (Author: jmhsieh): Are there any calls to this method? [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10259) [0.94] Upgrade JUnit to 4.11
[ https://issues.apache.org/jira/browse/HBASE-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860806#comment-13860806 ] Hudson commented on HBASE-10259: FAILURE: Integrated in HBase-0.94 #1244 (See [https://builds.apache.org/job/HBase-0.94/1244/]) HBASE-10259 [0.94] Upgrade JUnit to 4.11 (larsh: rev 1554879) * /hbase/branches/0.94/pom.xml * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMultiParallel.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java [0.94] Upgrade JUnit to 4.11 Key: HBASE-10259 URL: https://issues.apache.org/jira/browse/HBASE-10259 Project: HBase Issue Type: Test Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.16 Attachments: 10259-v2.txt, 10259-v3.txt, 10259-v4.txt, 10259.txt -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10269) [Nit]: Spelling issue in HFileContext.setCompresssion
[ https://issues.apache.org/jira/browse/HBASE-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860817#comment-13860817 ] Jonathan Hsieh commented on HBASE-10269: anyway, +1. Will commit once the bot +1's it. [Nit]: Spelling issue in HFileContext.setCompresssion - Key: HBASE-10269 URL: https://issues.apache.org/jira/browse/HBASE-10269 Project: HBase Issue Type: Bug Affects Versions: 0.98.0, 0.98.1, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Attachments: HBASE-10269-1.patch As part of HBase-7544, there was introduced a misspelling into HFileContext.java: https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileContext.java#L103 The fix is trivial. Will attach. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10270) Remove DataBlockEncoding from BlockCacheKey
[ https://issues.apache.org/jira/browse/HBASE-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arjen Roodselaar updated HBASE-10270: - Attachment: datablockencoding_blockcachekey.patch Remove DataBlockEncoding from BlockCacheKey --- Key: HBASE-10270 URL: https://issues.apache.org/jira/browse/HBASE-10270 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.89-fb Reporter: Arjen Roodselaar Assignee: Arjen Roodselaar Priority: Minor Fix For: 0.89-fb Attachments: datablockencoding_blockcachekey.patch When a block is added to the BlockCache its DataBlockEncoding is stored on the BlockCacheKey. This block encoding is used in the calculation of the hashCode and as such matters when cache lookups are done. Because the keys differ for encoded and unencoded (data) blocks, there is a potential for caching them twice or missing the cache. This happens for example when using Scan preloading as AbstractScannerV2.readNextDataBlock() does a read without knowing the block type or the encoding. This patch removes the block encoding from the key and forces the caller of HFileReaderV2.readBlock() to specify the expected BlockType as well as the expected DataBlockEncoding when these matter. This allows for a decision on either of these at read time instead of cache time, puts responsibility where appropriate, fixes some cache misses when using the scan preloading (which does a read without knowing the type or encoding), allows for the BlockCacheKey to be re-used by the L2 BucketCache and sets us up for a future CompoundScannerV2 which can read both un-encoded and encoded data blocks. A gotcha here: ScannerV2 and EncodedScannerV2 expect BlockType.DATA and BlockType.ENCODED_DATA respectively and will throw when given a block of the wrong type. Adding the DataBlockEncoding on the cache key caused a cache miss if the block was cached with the wrong encoding, implicitly defining the BlockType and thus keeping this from happening. 
It is now the scanner's responsibility to specify both the expected type and encoding (which is more appropriate).
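The core idea of the patch can be illustrated with a cache key that hashes only the file name and offset; whether the expected type and encoding match then becomes the reader's check at read time, not part of the key. A hypothetical sketch (names are illustrative, not the real BlockCacheKey):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: the cache key identifies a block only by file name and
// offset, so an encoded and an unencoded read of the same block resolve to the
// same cache entry. The caller verifies the expected BlockType/DataBlockEncoding
// at read time instead of baking the encoding into the key.
final class BlockKeySketch {
    final String hfileName;
    final long offset;

    BlockKeySketch(String hfileName, long offset) {
        this.hfileName = hfileName;
        this.offset = offset;
    }

    @Override public int hashCode() {
        return Objects.hash(hfileName, offset); // no DataBlockEncoding in the hash
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof BlockKeySketch)) return false;
        BlockKeySketch k = (BlockKeySketch) o;
        return offset == k.offset && hfileName.equals(k.hfileName);
    }
}
```

With a key like this, a preloading read that does not yet know the block's encoding can still hit an entry cached by an earlier, encoding-aware read.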
[jira] [Created] (HBASE-10271) [regression] Cannot use the wildcard address since HBASE-9593
Jean-Daniel Cryans created HBASE-10271: -- Summary: [regression] Cannot use the wildcard address since HBASE-9593 Key: HBASE-10271 URL: https://issues.apache.org/jira/browse/HBASE-10271 Project: HBase Issue Type: Bug Affects Versions: 0.96.1, 0.94.13 Reporter: Jean-Daniel Cryans Priority: Critical HBASE-9593 moved the creation of the ephemeral znode earlier in the region server startup process such that we don't have access to the ServerName from the Master's POV. HRS.getMyEphemeralNodePath() calls HRS.getServerName() which at that point will return this.isa.getHostName(). If you set hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode with that address. What happens next is that the RS will report for duty correctly but the master will do this: {noformat} 2014-01-02 11:45:49,498 INFO [master:172.21.3.117:6] master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014 2014-01-02 11:45:49,498 INFO [master:172.21.3.117:6] master.HMaster: Registered server found up in zk but who has not yet reported in: 0:0:0:0:0:0:0:0%0,60020,1388691892014 {noformat} The cluster is then unusable. I think a better solution is to track the heartbeats for the region servers and expire those that haven't checked-in for some time. The 0.89-fb branch has this concept, and they also use it to detect rack failures: https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224. In this jira's scope I would just add the heartbeat tracking and add a unit test for the wildcard address. What do you think [~rajesh23]? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
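The heartbeat tracking proposed here can be sketched as a map from server name to last report time, with a sweep that expires servers silent past a timeout. This is a hypothetical illustration under assumed names, not the ServerManager API:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of heartbeat-based expiry: record the last time each
// region server reported in, and expire any server silent longer than the
// timeout. Names are illustrative, not the actual master-side code.
class HeartbeatTrackerSketch {
    private final Map<String, Long> lastReport = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onHeartbeat(String serverName, long nowMillis) {
        lastReport.put(serverName, nowMillis);
    }

    /** Removes servers that have not reported within the timeout; returns how many. */
    int expireDeadServers(long nowMillis) {
        int expired = 0;
        Iterator<Map.Entry<String, Long>> it = lastReport.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }

    boolean isTracked(String serverName) {
        return lastReport.containsKey(serverName);
    }
}
```

Expiry by silence sidesteps the znode-address problem entirely: a server registered under an unusable wildcard address simply never checks in and gets expired.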
[jira] [Updated] (HBASE-10270) Remove DataBlockEncoding from BlockCacheKey
[ https://issues.apache.org/jira/browse/HBASE-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arjen Roodselaar updated HBASE-10270: - Status: Patch Available (was: Open) Remove DataBlockEncoding from BlockCacheKey --- Key: HBASE-10270 URL: https://issues.apache.org/jira/browse/HBASE-10270 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.89-fb Reporter: Arjen Roodselaar Assignee: Arjen Roodselaar Priority: Minor Fix For: 0.89-fb When a block is added to the BlockCache its DataBlockEncoding is stored on the BlockCacheKey. This block encoding is used in the calculation of the hashCode and as such matters when cache lookups are done. Because the keys differ for encoded and unencoded (data) blocks, there is a potential for caching them twice or missing the cache. This happens for example when using Scan preloading as AbstractScannerV2.readNextDataBlock() does a read without knowing the block type or the encoding. This patch removes the block encoding from the key and forces the caller of HFileReaderV2.readBlock() to specify the expected BlockType as well as the expected DataBlockEncoding when these matter. This allows for a decision on either of these at read time instead of cache time, puts responsibility where appropriate, fixes some cache misses when using the scan preloading (which does a read without knowing the type or encoding), allows for the BlockCacheKey to be re-used by the L2 BucketCache and sets us up for a future CompoundScannerV2 which can read both un-encoded and encoded data blocks. A gotcha here: ScannerV2 and EncodedScannerV2 expect BlockType.DATA and BlockType.ENCODED_DATA respectively and will throw when given a block of the wrong type. Adding the DataBlockEncoding on the cache key caused a cache miss if the block was cached with the wrong encoding, implicitly defining the BlockType and thus keeping this from happening. 
It is now the scanner's responsibility to specify both the expected type and encoding (which is more appropriate).
[jira] [Updated] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-9374: --- Fix Version/s: (was: 0.99.0) (was: 0.96.2) (was: 0.98.0) Status: Open (was: Patch Available) Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Assignee: Jimmy Xiang Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DyanamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
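One fix direction implied by the report is to build the classloader lazily, so that touching hbase.local.dir happens only when a dynamic class is actually requested. A hypothetical sketch (not the actual ProtobufUtil code):

```java
// Hypothetical sketch: create the dynamic classloader (the only code path that
// writes under hbase.local.dir) lazily, on the first request for a dynamic
// class, so a client that never deserializes custom Comparators/Filters never
// needs write access. Double-checked locking keeps the construction one-shot.
class LazyDynamicLoaderSketch {
    private volatile ClassLoader dynamicLoader; // not built in a static initializer

    Class<?> loadDynamic(String className) {
        if (dynamicLoader == null) {
            synchronized (this) {
                if (dynamicLoader == null) {
                    // the real fix would construct DynamicClassLoader here
                    dynamicLoader = getClass().getClassLoader();
                }
            }
        }
        try {
            return dynamicLoader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The comment below notes that the alternatives considered all felt like hacks, so whether this direction (or any) lands is open.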
[jira] [Updated] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-9374: --- Assignee: (was: Jimmy Xiang) Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DyanamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9374) Client requires write access to hbase.local.dir unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860828#comment-13860828 ] Jimmy Xiang commented on HBASE-9374: Other solutions I thought about are all hacks. Should we leave this as Won't Fix? Client requires write access to hbase.local.dir unnecessarily - Key: HBASE-9374 URL: https://issues.apache.org/jira/browse/HBASE-9374 Project: HBase Issue Type: Bug Components: Client, Protobufs Affects Versions: 0.95.2 Reporter: Nick Dimiduk Attachments: hbase-9374.patch Per this [thread|http://mail-archives.apache.org/mod_mbox/hbase-dev/201308.mbox/%3cCANZa=GuLO0jTLs1fF+5_NRDczO+M=ssqjeagveeicy8injb...@mail.gmail.com%3e] from the dev list. {quote} It appears that as of HBASE-1936, we now require that client applications have write access to hbase.local.dir. This is because ProtobufUtil instantiates a DyanamicClassLoader as part of static initialization. This classloader is used for instantiating Comparators, Filters, and Exceptions. {quote} Client applications do not need to use DynamicClassLoader and so should not require this write access. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
[ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-8912. -- Resolution: Fixed Assignee: Lars Hofhansl Hadoop Flags: Reviewed Committed this to 0.94 [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE -- Key: HBASE-8912 URL: https://issues.apache.org/jira/browse/HBASE-8912 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.16 Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt AM throws this exception which subsequently causes the master to abort: {code} java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) {code} This exception trace is from the failing test TestMetaReaderEditor which is failing pretty frequently, but looking at the test code, I think this is not a test-only issue, but affects the main code path. https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/ -- This message was sent by Atlassian JIRA (v6.1.5#6160)
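The invariant the exception enforces can be sketched as a transition check: forcing a region OFFLINE is only accepted from a small set of states, and PENDING_OPEN is not among them, so a racing handler trips the check. A hypothetical illustration (the allowed set here is assumed for the sketch, not copied from the 0.94 code):

```java
import java.util.EnumSet;

// Hypothetical sketch of the invariant behind the IllegalStateException: the
// assignment code only permits a transition to OFFLINE from certain states, and
// PENDING_OPEN is rejected, which is exactly what the stack trace above shows.
class RegionTransitionSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }

    // assumed set of states from which a transition to OFFLINE is accepted
    private static final EnumSet<State> OFFLINE_OK = EnumSet.of(State.CLOSED, State.OFFLINE);

    static void transitionToOffline(State current) {
        if (!OFFLINE_OK.contains(current)) {
            throw new IllegalStateException(
                "Unexpected state : state=" + current + " .. Cannot transit it to OFFLINE.");
        }
    }
}
```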
[jira] [Commented] (HBASE-9830) Backport HBASE-9605 to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860834#comment-13860834 ] Lars Hofhansl commented on HBASE-9830: -- It would be a slight behavioral change in 0.94. You need this in 0.94 [~tobe]? Backport HBASE-9605 to 0.94 --- Key: HBASE-9830 URL: https://issues.apache.org/jira/browse/HBASE-9830 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: chendihao Priority: Minor Fix For: 0.94.16 Attachments: HBASE-9830-0.94-v1.patch Backport HBASE-9605, which allows AggregationClient to skip specifying a column family for the row count aggregate. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-8619) [HBase Client]: Improve client to retry if master is down when requesting getHTableDescriptor
[ https://issues.apache.org/jira/browse/HBASE-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-8619: - Fix Version/s: (was: 0.94.16) Unscheduling for now. [HBase Client]: Improve client to retry if master is down when requesting getHTableDescriptor - Key: HBASE-8619 URL: https://issues.apache.org/jira/browse/HBASE-8619 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.8 Reporter: Aleksandr Shulman Assignee: Elliott Clark Priority: Minor In our rolling upgrade testing, running ImportTsv failed in the job submission phase when the master was down. This was when the master was failing over to the backup master. In this case, a retry would have been helpful and made sure that the job would get submitted. A good solution would be to refresh the master information before placing the call to getHTableDescriptor. Command: {code} sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,c1,c2,c3 -Dimporttsv.bulk.output=/user/hbase/storeFiles2_2/import2_table1369439156 import2_table1369439156 /user/hbase/tsv2{code} Here is the stack trace: {code} 13/05/24 16:55:49 INFO compress.CodecPool: Got brand-new compressor [.deflate] 16:45:44 Exception in thread main java.lang.reflect.UndeclaredThrowableException 16:45:44 at $Proxy7.getHTableDescriptors(Unknown Source) 16:45:44 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHTableDescriptor(HConnectionManager.java:1861) 16:45:44 at org.apache.hadoop.hbase.client.HTable.getTableDescriptor(HTable.java:440) 16:45:44 at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureCompression(HFileOutputFormat.java:458) 16:45:44 at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:375) 16:45:44 at org.apache.hadoop.hbase.mapreduce.ImportTsv.createSubmittableJob(ImportTsv.java:280) 16:45:44 at 
org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:424) 16:45:44 Caused by: java.io.IOException: Call to hbase-rolling-6.ent.cloudera.com/10.20.186.99:22001 failed on local exception: java.io.EOFException 16:45:44 at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1030) 16:45:44 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999) 16:45:44 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) 16:45:44 ... 7 more 16:45:44 Caused by: java.io.EOFException 16:45:44 at java.io.DataInputStream.readInt(DataInputStream.java:375) 16:45:44 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646) 16:45:44 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580){code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
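The suggested improvement (refresh the master information, then retry the call) can be sketched as a bounded retry wrapper around the RPC. This is a hypothetical illustration under assumed names, not the actual HConnectionManager API:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the suggested client behavior: on a connection-level
// failure (e.g. the EOFException seen during a master failover), retry a bounded
// number of times instead of failing job submission on the first attempt.
class MasterRetrySketch {
    static <T> T callWithRetries(Callable<T> rpc, int maxAttempts) throws Exception {
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (IOException e) {
                // real code would re-resolve the active master here before retrying
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```

The key design point is where the master location is refreshed: retrying against a stale, cached master address would simply repeat the EOFException, so the invalidation step (elided in the sketch) is what makes the retry useful during a failover.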
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860839#comment-13860839 ] Lars Hofhansl commented on HBASE-10249: --- [~stack], I assume you want this in 0.96? :) Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860844#comment-13860844 ] Lars Hofhansl commented on HBASE-10249: --- Committed to 0.94. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10249) Intermittent TestReplicationSyncUpTool failure
[ https://issues.apache.org/jira/browse/HBASE-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860849#comment-13860849 ] stack commented on HBASE-10249: --- +1 for 0.96. Thanks Lars. Intermittent TestReplicationSyncUpTool failure -- Key: HBASE-10249 URL: https://issues.apache.org/jira/browse/HBASE-10249 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Demai Ni Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0 Attachments: HBASE-10249-trunk-v0.patch New issue to keep track of this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)