Re: Moving 2.0 forward
While I don't disagree that half-finished features are undesirable, I'm not suggesting that as a strategy so much as suggesting we kick out stuff that just doesn't seem to be getting done. Pushing 2.0 out another three months is fine if there's a good chance this is realistic and we won't be having this discussion again then. Let me have a look at the doc and return with specific points for further discussion (if any).

> On Jan 13, 2017, at 11:25 PM, Stack wrote:
>
>> On Sat, Dec 31, 2016 at 12:16 PM, Stephen Jiang wrote:
>> Hello, Andrew, I was a helper to Matteo so that we could help each other
>> while we were focusing on the new Assignment Manager work. Now he is not
>> available (at least in the next few months). I have to be more focused on
>> the new AM work, plus other work in my company; it would be too much for me
>> to be 2.0 RM alone. I am happy for someone to take the primary 2.0 RM role
>> while I still help to make this 2.0 release smooth.
>
> (I could help out Stephen. We could co-RM?)
>
>> For branch-2, I think it is too early to cut it, as we still have a lot of
>> moving parts and on-going projects that need to be part of 2.0. For
>> example, the mentioned new AM (and other projects, such as HBASE-14414,
>> HBASE-15179, HBASE-14070, HBASE-14850, HBASE-16833, HBASE-15531, to name
>> a few). Cutting the branch now would add burden to completing those projects.
>
> Agree with Stephen. A bunch of stuff is half-baked, so a '2.0.0' now would be
> all loose ends and it'd make for a messy narrative.
>
> I started a doc listing the state of 2.0.0:
> https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit?usp=sharing
>
> In the doc I made an estimate of what the community considers core 2.0.0
> items, based in part on old lists and on a survey of the current state of JIRA.
> The doc is open for comment. Please chime in if I am off or if I am missing
> something that should be included. I also made a rough estimate of the state
> of each core item.
>
> I intend to keep up this macro-view doc as we progress on 2.0.0, with
> reflection where pertinent in JIRA. I suggest we branch only when code
> complete on the core set, most of which is complete or near so. End of
> February should be time enough (first 2.0.0 RC at the start of May?).
>
> Thanks,
> St.Ack
>
>> thanks
>> Stephen
>>
>> On Sat, Dec 31, 2016 at 10:54 AM, Andrew Purtell wrote:
>>
>> > Hi all,
>> >
>> > I've heard a rumor the co-RM situation with 2.0 may have changed. Can we
>> > get an update from co-RMs Matteo and Stephen on their availability and
>> > interest in continuing in this role?
>> >
>> > To assist in moving 2.0 forward I intend to branch branch-2 from master
>> > next week. Unless there is an objection I will take this action under the
>> > assumption of lazy consensus. The master branch will be renumbered to
>> > 3.0.0-SNAPSHOT. Once we have a branch-2 I will immediately begin scale
>> > tests and stabilization (via bug fixes or reverts of unfinished work) and
>> > invite interested collaborators to do the same.
Re: Merge and HMerge
On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang wrote: > Revive this thread > > I am in the process of removing Region Server side merge (and split) > transaction code in master branch; as now we have merge (and split) > procedure(s) from master doing the same thing. > > Good (Issue?) > The Merge tool depends on RS-side merge code. I'd like to use this chance > to remove the util.Merge tool. This is for 2.0 and up releases only. > Deprecation does not work here, as keeping the RS-side merge code would > mean duplicate logic in the source code and make the new Assignment Manager > code more complicated. > > Could util.Merge be changed to ask the Master to run the merge (via AMv2)? If you remove the util.Merge tool, how then does an operator ask for a merge in its absence? Thanks Stephen S > Please let me know whether you have an objection. > > Thanks > Stephen > > PS. I could deprecate HMerge code if anyone is really using it. It has > its own logic and is standalone (it is supposed to dangerously work offline and > merge more than 2 regions; util.Merge and the shell do not support this > functionality for now). > > On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar > wrote: > > > @Appy what is not clear from above? > > > > I think we should get rid of both Merge and HMerge. > > > > We should not have any tool which will work in offline mode by going over > > the HDFS data. Seems very brittle, likely to be broken when things get changed. > > The only use case I can think of is that somehow you end up with a lot of > > regions and you cannot bring the cluster back up because of OOMs, etc., and > > you have to reduce the number of regions in offline mode. However, we did > > not see this kind of thing with any of our customers for the last couple of > > years. > > > > I think we should seriously look into improving the normalizer and enabling > > that by default for all the tables. 
Ideally, normalizer should be running > > much more frequently, and should be configured with higher-level goals > and > > heuristics. Like on average how many regions per node, etc., and should be > > looking at the global state (like the balancer) to decide on split / > merge > > points. > > > > Enis > > > > On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma > > wrote: > > > > > bq. HMerge can merge multiple regions by going over the list of > > > regions and checking > > > their sizes. > > > bq. But both of these tools (Merge and HMerge) are very dangerous > > > > > > I came across HMerge and it looks like dead code. It isn't referenced from > > > anywhere except one test. (This is what Lars also pointed out in the > > first > > > email too.) > > > It would make perfect sense if it was a tool or was being referenced > from > > > somewhere, but with the lack of either of those, I am a bit confused here. > > > @Enis, you seem to know everything about them, please educate me. > > > Thanks > > > - Appy > > > > > > > > > > > > On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar > > > wrote: > > > > > > > Merge has very limited usability since it can do a single merge and > can > > > > only run when HBase is offline. > > > > HMerge can merge multiple regions by going over the list of regions > and > > > > checking their sizes. > > > > And of course we have the "supported" online merge, which is the shell > > > > command. > > > > > > > > But both of these tools (Merge and HMerge) are very dangerous I > think. > > I > > > > would say we should deprecate both, to be replaced by the online > merge > > > > tool. We should not allow offline merge at all. I fail to see the > > use case > > > > where you have to use an offline merge. > > > > > > > > Enis > > > > > > > > On Wed, Sep 28, 2016 at 7:32 AM, Lars George > > > > wrote: > > > > > > > > > Hey, > > > > > > > > > > Sorry to resurrect this old thread, but working on the book > update, I > > > > > came across the same today, i.e. 
we have Merge and HMerge. I tried > > and > > > > > Merge works fine now. It is also the only one of the two flagged as > > > > > being a tool. Should HMerge be removed? At least deprecated? > > > > > > > > > > Cheers, > > > > > Lars > > > > > > > > > > > > > > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu > wrote: > > > > > >>> there is already an issue to do this but not revamp of these > > Merge > > > > > > classes > > > > > > I guess the issue is HBASE-1621 > > > > > > > > > > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack wrote: > > > > > > > > > > > >> Yeah, can you file an issue Lars. This stuff is ancient and > needs > > > to > > > > > >> be redone AND redone so we can do merging while table is online > > > (there > > > > > >> is already an issue to do this but not revamp of these Merge > > > classes). > > > > > >> The unit tests for Merge are also all junit3 and do whacky > stuff > > to > > > > > >> put up multiple regions. This should be redone too (they are > > often > > > > > >> first thing broke when major change and putting them back > together > > > is > > > > > >> a headache since they do no
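The size-driven selection described above (HMerge walks an ordered region list and merges neighbors by size) can be modeled outside HBase. Here is an illustrative Python sketch; the function name, the two-region pairing limit, and the size threshold are assumptions for illustration, not HBase code:

```python
def pick_merge_pairs(regions, max_bytes):
    """Given an ordered list of (region_name, size_bytes), return pairs of
    adjacent regions whose combined size stays under max_bytes.
    Each region participates in at most one merge, mirroring the
    two-region limit of the supported online merge."""
    pairs = []
    i = 0
    while i < len(regions) - 1:
        (name_a, size_a), (name_b, size_b) = regions[i], regions[i + 1]
        if size_a + size_b <= max_bytes:
            pairs.append((name_a, name_b))
            i += 2  # both regions are consumed by this merge
        else:
            i += 1
    return pairs
```

A normalizer-style heuristic would run something like this periodically against per-region size metrics instead of requiring an offline pass over HDFS.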
[jira] [Created] (HBASE-17469) Properly handle empty TableName in TablePermission#readFields and #write
Ted Yu created HBASE-17469: -- Summary: Properly handle empty TableName in TablePermission#readFields and #write Key: HBASE-17469 URL: https://issues.apache.org/jira/browse/HBASE-17469 Project: HBase Issue Type: Bug Reporter: Ted Yu HBASE-17450 handles the empty table name in equals(). This JIRA is to properly handle empty TableName in TablePermission#readFields() and TablePermission#write() methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
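The general pattern behind a fix like this is to serialize optional fields behind a presence flag, so that an empty TableName round-trips unambiguously through write() and readFields(). A hedged Python sketch of that pattern follows; it illustrates the idea only and is not the actual TablePermission wire format:

```python
import struct

def write_optional(name):
    """Prefix the payload with a presence flag so an empty/missing table
    name round-trips unambiguously (illustrative, not HBase's format)."""
    if not name:
        return struct.pack(">B", 0)                      # absent
    return struct.pack(">BI", 1, len(name)) + name       # flag + length + bytes

def read_optional(buf):
    """Inverse of write_optional: returns None when the field was absent."""
    present = struct.unpack_from(">B", buf)[0]
    if not present:
        return None
    length = struct.unpack_from(">I", buf, 1)[0]
    return buf[5:5 + length]
```

The reader never has to guess whether zero bytes mean "empty name" or "no name"; the flag decides.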
Re: Merge and HMerge
Revive this thread I am in the process of removing Region Server side merge (and split) transaction code in master branch; as now we have merge (and split) procedure(s) from master doing the same thing. The Merge tool depends on RS-side merge code. I'd like to use this chance to remove the util.Merge tool. This is for 2.0 and up releases only. Deprecation does not work here, as keeping the RS-side merge code would mean duplicate logic in the source code and make the new Assignment Manager code more complicated. Please let me know whether you have an objection. Thanks Stephen PS. I could deprecate HMerge code if anyone is really using it. It has its own logic and is standalone (it is supposed to dangerously work offline and merge more than 2 regions; util.Merge and the shell do not support this functionality for now). On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar wrote: > @Appy what is not clear from above? > > I think we should get rid of both Merge and HMerge. > > We should not have any tool which will work in offline mode by going over > the HDFS data. Seems very brittle, likely to be broken when things get changed. > The only use case I can think of is that somehow you end up with a lot of > regions and you cannot bring the cluster back up because of OOMs, etc., and > you have to reduce the number of regions in offline mode. However, we did > not see this kind of thing with any of our customers for the last couple of > years. > > I think we should seriously look into improving the normalizer and enabling > that by default for all the tables. Ideally, normalizer should be running > much more frequently, and should be configured with higher-level goals and > heuristics. Like on average how many regions per node, etc., and should be > looking at the global state (like the balancer) to decide on split / merge > points. > > Enis > > On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma > wrote: > > bq. 
HMerge can merge multiple regions by going over the list of > > regions and checking > > their sizes. > > bq. But both of these tools (Merge and HMerge) are very dangerous > > > > I came across HMerge and it looks like dead code. It isn't referenced from > > anywhere except one test. (This is what Lars also pointed out in the > first > > email too.) > > It would make perfect sense if it was a tool or was being referenced from > > somewhere, but with the lack of either of those, I am a bit confused here. > > @Enis, you seem to know everything about them, please educate me. > > Thanks > > - Appy > > > > > > > > On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar > > wrote: > > > > > Merge has very limited usability since it can do a single merge and can > > > only run when HBase is offline. > > > HMerge can merge multiple regions by going over the list of regions and > > > checking their sizes. > > > And of course we have the "supported" online merge, which is the shell > > > command. > > > > > > But both of these tools (Merge and HMerge) are very dangerous I think. > I > > > would say we should deprecate both, to be replaced by the online merge > > > tool. We should not allow offline merge at all. I fail to see the > use case > > > where you have to use an offline merge. > > > > > > Enis > > > > > > On Wed, Sep 28, 2016 at 7:32 AM, Lars George > > > wrote: > > > > > > > Hey, > > > > > > > > Sorry to resurrect this old thread, but working on the book update, I > > > > came across the same today, i.e. we have Merge and HMerge. I tried > and > > > > Merge works fine now. It is also the only one of the two flagged as > > > > being a tool. Should HMerge be removed? At least deprecated? 
> > > > > > > > Cheers, > > > > Lars > > > > > > > > > > > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu wrote: > > > > >>> there is already an issue to do this but not revamp of these > Merge > > > > > classes > > > > > I guess the issue is HBASE-1621 > > > > > > > > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack wrote: > > > > > > > > > >> Yeah, can you file an issue Lars. This stuff is ancient and needs > > to > > > > >> be redone AND redone so we can do merging while table is online > > (there > > > > >> is already an issue to do this but not revamp of these Merge > > classes). > > > > >> The unit tests for Merge are also all junit3 and do whacky stuff > to > > > > >> put up multiple regions. This should be redone too (they are > often > > > > >> first thing broke when major change and putting them back together > > is > > > > >> a headache since they do not follow the usual pattern). > > > > >> > > > > >> St.Ack > > > > >> > > > > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George < > lars.geo...@gmail.com > > > > > > > >> wrote: > > > > >> > Hi Ted, > > > > >> > > > > > >> > The log is from an earlier attempt, I tried this a few times. > This > > > is > > > > all > > > > >> local, after rm'ing the /hbase. So the files are all pretty empty, > > but > > > > since > > > > >> I put data in I was assuming it should work. Once you gotten into > > this > > > > >> state, you al
[jira] [Created] (HBASE-17468) unread messages in TCP connections - possible connection leak
Shridhar Sahukar created HBASE-17468: Summary: unread messages in TCP connections - possible connection leak Key: HBASE-17468 URL: https://issues.apache.org/jira/browse/HBASE-17468 Project: HBase Issue Type: Bug Reporter: Shridhar Sahukar Priority: Critical We are running HBase 1.2.0-cdh5.7.1 (Cloudera distribution). On our Hadoop cluster, we are seeing that each HBase region server has a large number of TCP connections to all the HDFS data nodes, and all these connections have unread data in their socket buffers. Some of these connections are in CLOSE_WAIT or FIN_WAIT1 state while the rest are in ESTABLISHED state. It looks like HBase is creating connections requesting data from HDFS, but it is forgetting about those connections before it can read the data. Thus the connections are left lingering around with large amounts of data stuck in their receive buffers. Also, it seems HDFS closes these connections after a while, but since there is data in the receive buffer, the connection is left in the CLOSE_WAIT/FIN_WAIT1 state. 
Below is a snapshot from one of the region servers:

## Total number of connections to HDFS (pid of region server is 143722)
[bda@md-bdadev-42 hbase]$ sudo netstat -anp | grep 143722 | wc -l
827

## Connections that are not in ESTABLISHED state
[bda@md-bdadev-42 hbase]$ sudo netstat -anp | grep 143722 | grep -v ESTABLISHED | wc -l
344

## Snapshot of some of these connections:
tcp   133887   0   146.1.180.43:48533   146.1.180.40:50010   ESTABLISHED   143722/java
tcp    82934   0   146.1.180.43:59647   146.1.180.42:50010   ESTABLISHED   143722/java
tcp        0   0   146.1.180.43:50761   146.1.180.27:2181    ESTABLISHED   143722/java
tcp   234084   0   146.1.180.43:58335   146.1.180.42:50010   ESTABLISHED   143722/java
tcp   967667   0   146.1.180.43:56136   146.1.180.68:50010   ESTABLISHED   143722/java
tcp   156037   0   146.1.180.43:59659   146.1.180.42:50010   ESTABLISHED   143722/java
tcp   212488   0   146.1.180.43:56810   146.1.180.48:50010   ESTABLISHED   143722/java
tcp    61871   0   146.1.180.43:53593   146.1.180.35:50010   ESTABLISHED   143722/java
tcp   121216   0   146.1.180.43:35324   146.1.180.38:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:32982   146.1.180.42:50010   CLOSE_WAIT    143722/java
tcp    82934   0   146.1.180.43:42359   146.1.180.54:50010   ESTABLISHED   143722/java
tcp   159422   0   146.1.180.43:59731   146.1.180.42:50010   ESTABLISHED   143722/java
tcp   134573   0   146.1.180.43:60210   146.1.180.76:50010   ESTABLISHED   143722/java
tcp    82934   0   146.1.180.43:59713   146.1.180.42:50010   ESTABLISHED   143722/java
tcp   135765   0   146.1.180.43:44412   146.1.180.29:50010   ESTABLISHED   143722/java
tcp   161655   0   146.1.180.43:43117   146.1.180.42:50010   ESTABLISHED   143722/java
tcp    75990   0   146.1.180.43:59729   146.1.180.42:50010   ESTABLISHED   143722/java
tcp    78583   0   146.1.180.43:59971   146.1.180.42:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:39893   146.1.180.67:50010   CLOSE_WAIT    143722/java
tcp        1   0   146.1.180.43:38834   146.1.180.47:50010   CLOSE_WAIT    143722/java
tcp        1   0   146.1.180.43:40707   146.1.180.50:50010   CLOSE_WAIT    143722/java
tcp   106102   0   146.1.180.43:48208   146.1.180.75:50010   ESTABLISHED   143722/java
tcp   332013   0   146.1.180.43:34795   146.1.180.37:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:57644   146.1.180.67:50010   CLOSE_WAIT    143722/java
tcp    79119   0   146.1.180.43:54438   146.1.180.70:50010   ESTABLISHED   143722/java
tcp    77438   0   146.1.180.43:35259   146.1.180.38:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:57579   146.1.180.41:50010   CLOSE_WAIT    143722/java
tcp   318091   0   146.1.180.43:60124   146.1.180.42:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:51715   146.1.180.70:50010   CLOSE_WAIT    143722/java
tcp   126519   0   146.1.180.43:36389   146.1.180.49:50010   ESTABLISHED   143722/java
tcp        1   0   146.1.180.43:45656   146.1.180.75:50010   CLOSE_WAIT    143722/java
tcp   113720   0   146.1.180.43:59741   146.1.180.42:50010   ESTABLISHED   143722/java
tcp    74599   0   146.1.180.43:44192   146.1.180.60:50010   ESTABLISHED   143722/java
tcp   131224   0   146.1.180.43:53708   146.1.180.44:50010   ESTABLISHED   143722/java
tcp   1433915
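One way to quantify the pattern shown above is to aggregate the netstat output by TCP state and flag sockets whose receive queue (the second column) is unusually large, i.e. data the kernel accepted but the process never read. An illustrative Python sketch, assuming the `netstat -anp` column layout shown above; the threshold is an arbitrary example value:

```python
from collections import Counter

def summarize(netstat_lines, recv_q_threshold=65536):
    """Count connections per TCP state and collect peers whose Recv-Q
    exceeds the threshold (bytes received but never read by the process)."""
    states = Counter()
    stuck = []
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # skip headers and non-TCP rows
        recv_q, peer, state = int(fields[1]), fields[4], fields[5]
        states[state] += 1
        if recv_q > recv_q_threshold:
            stuck.append((peer, recv_q))
    return states, stuck
```

Feeding it the snapshot above would immediately show which data nodes hold the most unread bytes, a starting point for correlating with HBase client code paths.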
Re: Region compaction failed
w.r.t. #2, I did a quick search for bloom related fixes. I found HBASE-13123 but it was in 1.0.2.

Planning to spend more time on this in the next few days.

On Fri, Jan 13, 2017 at 5:29 PM, Pankaj kr wrote:
> Thanks Ted for replying.
>
> Actually the issue happened in a production environment and there are many
> HFiles in that store (can't get the file). As we don't log the name of the
> corrupted file, is there any way to get the corrupted file name?
>
> Block encoding is "NONE", the table schema has bloom filter as "ROW",
> compression type is "Snappy" and durability is SKIP_WAL.
>
> Regards,
> Pankaj
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Friday, January 13, 2017 10:30 PM
> To: dev@hbase.apache.org
> Cc: u...@hbase.apache.org
> Subject: Re: Region compaction failed
>
> In the second case, the error happened when writing the hfile. Can you
> track down the path of the new file so that further investigation can be
> done?
>
> Does the table use any encoding?
>
> Thanks
>
> > On Jan 13, 2017, at 2:47 AM, Pankaj kr wrote:
> >
> > Hi,
> >
> > We met a weird issue in our production environment.
> >
> > Region compaction is always failing with the following errors:
> >
> > 1.
> > 2017-01-10 02:19:10,427 | ERROR | regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483858654825 | Compaction failed Request = regionName=., storeName=XYZ, fileCount=6, fileSize=100.7 M (3.2 M, 20.8 M, 15.1 M, 20.9 M, 21.0 M, 19.7 M), priority=-5, time=1747414906352088 | org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562)
> > java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column: XXX
> >    at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:114)
> >    at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:457)
> >    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:551)
> >    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:328)
> >    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104)
> >    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133)
> >    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895)
> >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546)
> >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583)
> >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >    at java.util.concurrent.ThreadPoolExecuto
> >
> > 2.
> > 2017-01-10 02:33:53,009 | ERROR | regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483686810953 | Compaction failed Request = regionName=YY, storeName=ABC, fileCount=6, fileSize=125.3 M (20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M), priority=-68, time=1748294500157323 | org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562)
> > java.io.IOException: Non-increasing Bloom keys: XX after 
> >    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:911)
> >    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:947)
> >    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:337)
> >    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104)
> >    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133)
> >    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243)
> >    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895)
> >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546)
> >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583)
> >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >    at java.lang.Thre
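Both errors above are violations of the same invariant: cells must reach the compactor in non-decreasing key order, whether checked by the scanner (case 1) or by the Bloom filter writer (case 2). The check itself is simple; this Python fragment is an illustrative sketch of the invariant only, not the HBase implementation:

```python
def first_order_violation(keys):
    """Return the index of the first key that sorts before its
    predecessor, or None if the stream is non-decreasing. This mirrors
    the ordering invariant the compaction errors above report."""
    prev = None
    for i, key in enumerate(keys):
        if prev is not None and key < prev:
            return i
        prev = key
    return None
```

Running such a check over the keys of each input HFile (which is essentially what HFilePrettyPrinter's row-order check does) narrows down which file carries the out-of-order entries.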
RE: Region compaction failed
Thanks Ted for replying. Actually the issue happened in a production environment and there are many HFiles in that store (can't get the file). As we don't log the name of the corrupted file, is there any way to get the corrupted file name? Block encoding is "NONE", the table schema has bloom filter as "ROW", compression type is "Snappy" and durability is SKIP_WAL. Regards, Pankaj -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Friday, January 13, 2017 10:30 PM To: dev@hbase.apache.org Cc: u...@hbase.apache.org Subject: Re: Region compaction failed In the second case, the error happened when writing the hfile. Can you track down the path of the new file so that further investigation can be done? Does the table use any encoding? Thanks > On Jan 13, 2017, at 2:47 AM, Pankaj kr wrote: > > Hi, > > We met a weird issue in our production environment. > > Region compaction is always failing with the following errors: > > 1. > 2017-01-10 02:19:10,427 | ERROR | > regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483858654825 | Compaction > failed Request = regionName=., storeName=XYZ, fileCount=6, fileSize=100.7 > M (3.2 M, 20.8 M, 15.1 M, 20.9 M, 21.0 M, 19.7 M), priority=-5, > time=1747414906352088 | > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562) > java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column > actually smaller than the previous column: XXX >at > org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:114) >at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:457) >at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:551) >at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:328) >at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104) >at > 
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133) >at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243) >at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >at java.util.concurrent.ThreadPoolExecuto > > 2. > 2017-01-10 02:33:53,009 | ERROR | > regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483686810953 | Compaction > failed Request = regionName=YY, storeName=ABC, fileCount=6, > fileSize=125.3 M (20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M), > priority=-68, time=1748294500157323 | > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562) > java.io.IOException: Non-increasing Bloom keys: XX after > >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:911) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:947) >at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:337) >at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133) >at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243) >at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546) >at > 
org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >at java.lang.Thread.run(Thread.java:745) > > HBase version : 1.0.2 > > We have verified all the HFiles in the store using HFilePrettyPrinter with > "k" (checkrow), all report is normal. Full scan is also successful. > We don't have the access to the actual data and may be customer wont agree to > share that . > > Have anyone f
[jira] [Created] (HBASE-17467) HBase Examples: C# DemoClient
Jeff Saremi created HBASE-17467: --- Summary: HBase Examples: C# DemoClient Key: HBASE-17467 URL: https://issues.apache.org/jira/browse/HBASE-17467 Project: HBase Issue Type: Task Components: Client Affects Versions: 1.1.8 Reporter: Jeff Saremi I am attaching DemoClient.cs, which is taken from the C++ version of the same file, along with the generated HBase Thrift files (0.9.3). Hoping that someone will find them useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17466) [C++] Speed up the tests a bit
Enis Soztutar created HBASE-17466: - Summary: [C++] Speed up the tests a bit Key: HBASE-17466 URL: https://issues.apache.org/jira/browse/HBASE-17466 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar The tests take too long due to sleeps and starting/stopping the cluster. We can do some speed-up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17465) [C++] implement request retry mechanism over RPC
Xiaobing Zhou created HBASE-17465: - Summary: [C++] implement request retry mechanism over RPC Key: HBASE-17465 URL: https://issues.apache.org/jira/browse/HBASE-17465 Project: HBase Issue Type: Sub-task Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17464) Fix HBaseTestingUtility.getNewDataTestDirOnTestFS to always return a unique path
Zach York created HBASE-17464: - Summary: Fix HBaseTestingUtility.getNewDataTestDirOnTestFS to always return a unique path Key: HBASE-17464 URL: https://issues.apache.org/jira/browse/HBASE-17464 Project: HBase Issue Type: Bug Components: test Affects Versions: 2.0.0 Reporter: Zach York Assignee: Zach York Priority: Minor Currently, HBaseTestingUtility.getNewDataTestDirOnTestFS() returns a unique path only on non-local filesystems. This method should always return a unique directory. This bug fix is needed to accurately test HBASE-17437. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
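The usual way to guarantee an always-unique path on any filesystem is to append a random component on every call. A minimal illustrative sketch in Python; the function and directory names are invented for illustration, not the HBaseTestingUtility API:

```python
import uuid

def new_data_test_dir(base):
    """Return a path under `base` that is unique on every call,
    independent of whether the underlying filesystem is local or not."""
    return f"{base}/test-data-{uuid.uuid4().hex}"
```

Because the random component is generated per call rather than per filesystem type, two calls can never collide, which is the property the bug fix restores.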
[jira] [Resolved] (HBASE-17463) [C++] RpcClient should close the thread pool
[ https://issues.apache.org/jira/browse/HBASE-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-17463. --- Resolution: Fixed > [C++] RpcClient should close the thread pool > > > Key: HBASE-17463 > URL: https://issues.apache.org/jira/browse/HBASE-17463 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: HBASE-14850 > > Attachments: hbase-17463_v1.patch > > > RpcClient and connection pool should close their resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17463) [C++] RpcClient should close the thread pools
Enis Soztutar created HBASE-17463: - Summary: [C++] RpcClient should close the thread pools Key: HBASE-17463 URL: https://issues.apache.org/jira/browse/HBASE-17463 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: HBASE-14850 RpcClient and connection pool should close their resources.
[jira] [Created] (HBASE-17462) Investigate using sliding window for read/write request costs in StochasticLoadBalancer
Ted Yu created HBASE-17462: -- Summary: Investigate using sliding window for read/write request costs in StochasticLoadBalancer Key: HBASE-17462 URL: https://issues.apache.org/jira/browse/HBASE-17462 Project: HBase Issue Type: Improvement Reporter: Ted Yu In the thread http://search-hadoop.com/m/HBase/YGbbyUZKXWALkX1, Timothy asked whether the read/write request costs in StochasticLoadBalancer should be calculated as rates. This makes sense, since read/write load on a region server tends to fluctuate over time, and a sliding window would reflect the more recent trend in read/write load. Some factors to consider: the data structure used by StochasticLoadBalancer should be compact, because the number of regions in a cluster can approach 1 million and we cannot afford to store a long history of read/write requests in the master. Efficiency of cost calculation should also be high: the balancer goes through many cost functions, and each is expected to return quickly; otherwise we would not come up with proper region movement plan(s) in time.
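The memory and efficiency constraints above point toward a small, fixed-size window per region. A minimal sketch of the idea (hypothetical class and method names, not the balancer's actual API): keep the last few cumulative request counts and derive a rate from the deltas, so storage stays bounded and the rate query is O(1).

```java
import java.util.ArrayDeque;

/** Hypothetical sketch: a bounded sliding window of cumulative request
 *  counts, sized small so memory stays manageable even at ~1M regions. */
public class RequestRateWindow {
    private final ArrayDeque<Long> samples = new ArrayDeque<>();
    private final int maxSamples;

    public RequestRateWindow(int maxSamples) {
        this.maxSamples = maxSamples;
    }

    /** Record the latest cumulative request count reported for the region. */
    public void record(long cumulativeCount) {
        samples.addLast(cumulativeCount);
        if (samples.size() > maxSamples) {
            samples.removeFirst();  // evict oldest; window stays O(maxSamples)
        }
    }

    /** Average requests per sampling interval over the window (0 if < 2 samples). */
    public double rate() {
        if (samples.size() < 2) return 0.0;
        long delta = samples.peekLast() - samples.peekFirst();
        return (double) delta / (samples.size() - 1);
    }

    public static void main(String[] args) {
        RequestRateWindow w = new RequestRateWindow(3);
        w.record(100); w.record(150); w.record(210);
        System.out.println(w.rate());  // 55.0: (210 - 100) over 2 intervals
    }
}
```

Because only cumulative counts are stored, the window naturally discounts an old burst once its samples are evicted, which is the "more recent trend" behavior the thread asks for.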
Successful: HBase Generate Website
Build status: Successful

If successful, the website and docs have been generated. To update the live site, follow the instructions below. If failed, skip to the bottom of this email.

Use the following commands to download the patch and apply it to a clean branch based on origin/asf-site. If you prefer to keep the hbase-site repo around permanently, you can skip the clone step.

git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
cd hbase-site
wget -O- https://builds.apache.org/job/hbase_generate_website/460/artifact/website.patch.zip | funzip > 2f8ddf6fc5f904f0273b07469286e01aa02c7da5.patch
git fetch
git checkout -b asf-site-2f8ddf6fc5f904f0273b07469286e01aa02c7da5 origin/asf-site
git am --whitespace=fix 2f8ddf6fc5f904f0273b07469286e01aa02c7da5.patch

At this point, you can preview the changes by opening index.html or any of the other HTML pages in your local asf-site-2f8ddf6fc5f904f0273b07469286e01aa02c7da5 branch. There are lots of spurious changes, such as timestamps and CSS styles in tables, so a generic git diff is not very useful. To see a list of files that have been added, deleted, renamed, changed type, or are otherwise interesting, use the following command:

git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these commands:

git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
git push origin asf-site-2f8ddf6fc5f904f0273b07469286e01aa02c7da5:asf-site
git checkout asf-site
git branch -D asf-site-2f8ddf6fc5f904f0273b07469286e01aa02c7da5

Changes take a couple of minutes to be propagated. You can verify whether they have been propagated by looking at the Last Published date at the bottom of http://hbase.apache.org/. It should match the date in the index.html on the asf-site branch in Git.
As a courtesy, reply-all to this email to let other committers know you pushed the site. If the build failed, see https://builds.apache.org/job/hbase_generate_website/460/console
Re: Region compaction failed
In the second case, the error happened when writing hfile. Can you track down the path of the new file so that further investigation can be done ? Does the table use any encoding ? Thanks > On Jan 13, 2017, at 2:47 AM, Pankaj kr wrote: > > Hi, > > We met a weird issue in our production environment. > > Region compaction is always failing with following errors, > > 1. > 2017-01-10 02:19:10,427 | ERROR | > regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483858654825 | Compaction > failed Request = regionName=., storeName=XYZ, fileCount=6, fileSize=100.7 > M (3.2 M, 20.8 M, 15.1 M, 20.9 M, 21.0 M, 19.7 M), priority=-5, > time=1747414906352088 | > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562) > java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column > actually smaller than the previous column: XXX >at > org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:114) >at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:457) >at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:551) >at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:328) >at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133) >at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243) >at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
>at java.util.concurrent.ThreadPoolExecuto > > 2. > 2017-01-10 02:33:53,009 | ERROR | > regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483686810953 | Compaction > failed Request = regionName=YY, storeName=ABC, fileCount=6, > fileSize=125.3 M (20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M), > priority=-68, time=1748294500157323 | > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562) > java.io.IOException: Non-increasing Bloom keys: XX after > >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:911) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:947) >at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:337) >at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133) >at > org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243) >at > org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546) >at > org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583) >at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >at java.lang.Thread.run(Thread.java:745) > > HBase version : 1.0.2 > > We have verified all the HFiles in the store using HFilePrettyPrinter with > "k" (checkrow), all report is normal. Full scan is also successful. > We don't have the access to the actual data and may be customer wont agree to > share that . > > Have anyone faced this issue, any pointers will be much appreciated. 
> > Thanks & Regards, > Pankaj
[jira] [Created] (HBASE-17461) HBase shell *major_compact* command should properly convert *table_or_region_name* parameter to a java byte array before calling *HBaseAdmin.majorCompact*
Wellington Chevreuil created HBASE-17461: Summary: HBase shell *major_compact* command should properly convert *table_or_region_name* parameter to a java byte array before calling the *HBaseAdmin.majorCompact* method Key: HBASE-17461 URL: https://issues.apache.org/jira/browse/HBASE-17461 Project: HBase Issue Type: Bug Components: shell Reporter: Wellington Chevreuil On the HBase shell, the *major_compact* command simply passes the received *table_or_region_name* parameter straight to the java *HBaseAdmin.majorCompact* method. In some corner cases, HBase table row keys may contain special characters. Then, if a region is split in such a way that row keys with special characters become part of the region name, calling *major_compact* on these regions will fail if the special character's ASCII code is higher than 127. This happens because the Java byte type is signed, while the ruby byte type isn't, causing the region name to be converted to a wrong string on the Java side. For example, considering a region named as below: {noformat} test,\xF8\xB9B2!$\x9C\x0A\xFEG\xC0\xE3\x8B\x1B\xFF\x15,1481745228583.b4bc69356d89018bfad3ee106b717285. {noformat} Calling major_compact on it fails as follows: {noformat} hbase(main):008:0* major_compact "test,\xF8\xB9B2!$\x9C\x0A\xFEG\xC0\xE3\x8B\x1B\xFF\x15,1484177359169.8128fa75ae0cd4eba38da2667ac8ec98." ERROR: Illegal character code:44, <,> at 4. User-space table qualifiers can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9-.]: test,�B2!$� �G���1484177359169.8128fa75ae0cd4eba38da2667ac8ec98. {noformat} An easy solution is to convert the *table_or_region_name* parameter properly, prior to calling *HBaseAdmin.majorCompact*, in the same way as is already done in some other shell commands, such as *get*: {noformat} admin.major_compact(table_or_region_name.to_s.to_java_bytes, family) {noformat}
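The underlying signed-byte mismatch can be demonstrated in Java alone. The small demo below (hypothetical illustration, not HBase code) shows why a raw key byte like 0xF8 reads as a negative value in Java, and why round-tripping raw key bytes through a decoded String mangles them — which is why the shell must hand raw bytes to the admin API.

```java
import java.nio.charset.StandardCharsets;

public class SignedByteDemo {
    public static void main(String[] args) {
        // 0xF8 is a legitimate unsigned byte value (248) in a row key, but
        // Java's byte type is signed, so the same bit pattern reads as -8.
        byte b = (byte) 0xF8;
        System.out.println(b);         // -8
        System.out.println(b & 0xFF);  // 248: mask to recover the unsigned value

        // Decoding raw key bytes to a String and re-encoding is lossy:
        // invalid UTF-8 sequences become replacement characters.
        byte[] key = { (byte) 0xF8, (byte) 0xB9, 'B', '2' };
        String decoded = new String(key, StandardCharsets.UTF_8);
        byte[] back = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(back.length == key.length);  // false: bytes were mangled
    }
}
```

This is why `to_s.to_java_bytes` on the Ruby side (passing the bytes through untouched) fixes the problem, while letting the name be re-interpreted as a decoded string does not.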
Region compaction failed
Hi,

We met a weird issue in our production environment. Region compaction is always failing with the following errors:

1.
2017-01-10 02:19:10,427 | ERROR | regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483858654825 | Compaction failed Request = regionName=., storeName=XYZ, fileCount=6, fileSize=100.7 M (3.2 M, 20.8 M, 15.1 M, 20.9 M, 21.0 M, 19.7 M), priority=-5, time=1747414906352088 | org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562)
java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column: XXX
    at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:114)
    at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:457)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:551)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:328)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecuto

2.
2017-01-10 02:33:53,009 | ERROR | regionserver/RS-HOST/RS-IP:PORT-longCompactions-1483686810953 | Compaction failed Request = regionName=YY, storeName=ABC, fileCount=6, fileSize=125.3 M (20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M, 20.9 M), priority=-68, time=1748294500157323 | org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:562)
java.io.IOException: Non-increasing Bloom keys: XX after
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:911)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:947)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:337)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:104)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:133)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1243)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1895)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:546)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:583)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

HBase version: 1.0.2

We have verified all the HFiles in the store using HFilePrettyPrinter with "k" (checkrow); all reports are normal. A full scan is also successful. We don't have access to the actual data, and the customer may not agree to share it.

Has anyone faced this issue? Any pointers will be much appreciated.

Thanks & Regards,
Pankaj
[jira] [Created] (HBASE-17460) enable_table_replication can not perform cyclic replication of a table
NITIN VERMA created HBASE-17460: --- Summary: enable_table_replication can not perform cyclic replication of a table Key: HBASE-17460 URL: https://issues.apache.org/jira/browse/HBASE-17460 Project: HBase Issue Type: Bug Components: Replication Reporter: NITIN VERMA The enable_table_replication operation is broken for cyclic replication of an HBase table, because we compare all the properties of the column families (including REPLICATION_SCOPE). Below is exactly what happens: 1. Running "enable_table_replication 'table1'" on the first cluster will set the REPLICATION_SCOPE of all column families to '1'. This will also create the table on the second cluster, where REPLICATION_SCOPE is still set to '0'. 2. Now when we run "enable_table_replication 'table1'" on the second cluster, we compare all the properties of the table (including REPLICATION_SCOPE), which obviously is different now. I am proposing a fix for this issue: we should avoid comparing REPLICATION_SCOPE inside the HColumnDescriptor::compareTo() method, especially when replication is not already enabled on the desired table. I have made that change and it is working. I will submit the patch soon.
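The proposed fix amounts to comparing schemas with REPLICATION_SCOPE masked out. A hypothetical sketch of that comparison over plain attribute maps (illustrative only, not the real HColumnDescriptor API):

```java
import java.util.HashMap;
import java.util.Map;

public class ScopeInsensitiveCompare {
    /** Hypothetical sketch: compare column-family attribute maps while
     *  ignoring REPLICATION_SCOPE, so two clusters in a cyclic-replication
     *  setup still see otherwise-identical schemas as equal. */
    public static boolean sameExceptScope(Map<String, String> a, Map<String, String> b) {
        Map<String, String> ca = new HashMap<>(a);
        Map<String, String> cb = new HashMap<>(b);
        ca.remove("REPLICATION_SCOPE");  // mask the one attribute that legitimately differs
        cb.remove("REPLICATION_SCOPE");
        return ca.equals(cb);
    }

    public static void main(String[] args) {
        Map<String, String> c1 = new HashMap<>();
        c1.put("BLOOMFILTER", "ROW");
        c1.put("REPLICATION_SCOPE", "1");  // replication already enabled here
        Map<String, String> c2 = new HashMap<>();
        c2.put("BLOOMFILTER", "ROW");
        c2.put("REPLICATION_SCOPE", "0");  // not yet enabled on the peer
        System.out.println(sameExceptScope(c1, c2));  // true
    }
}
```

Any genuine schema difference (a different bloom filter, compression, etc.) still fails the comparison; only the replication scope is exempted.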