RE: Reply: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.
Hi,

Yes, Ming, that's an excellent point. Though I didn't mention it, my first attempt at recovery centered on verifying that the hbase:meta table was okay using the HBase OfflineMetaRepair utility. Even after that tool said the table was fine, I still tried another restart, because the obvious symptom leads you to believe it is the file that is causing the problem. It is very unusual to get into this situation, but when you do, you have a tendency to overreact because HBase was working fine and after the restart no regions can be accessed. So it's important to examine all of the log files looking for the root cause of the problem. The Master log file gave one view, but the Region Server's log file made it very obvious what had to be resolved.

Thanks,
Dennis

From: Amanda Moran [mailto:amanda.mo...@esgyn.com]
Sent: Wednesday, March 09, 2016 12:17 PM
To: user@trafodion.incubator.apache.org
Subject: Re: Reply: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.

Hi there all,

I have made a JIRA for the installer, based on this issue.
https://issues.apache.org/jira/browse/TRAFODION-1884

Thanks!

On Wed, Mar 9, 2016 at 8:41 AM, Liu, Ming (Ming) <mailto:ming@esgyn.cn> wrote:

Thanks, Dennis, for sharing this. We saw this issue during an expansion of Trafodion from 4 nodes to 5 nodes. Since the newly added node was empty, the META region should not have been there, so it did no harm. But the problem is similar: the newly added RS cannot work until we update Trafodion on that RS node.

There are two related JIRAs: TRAFODION-1729 and TRAFODION-1730. We are working on them to solve the issue. Since Trafodion currently modifies the HBase server's hbase-site.xml to add coprocessors, it affects *ALL* regions in HBase, including the META region. This is unnecessary and not good. The META region definitely does not need to load Trafodion coprocessors. It is a system region that Trafodion never needs to access directly, and once it fails to open, the whole HBase system cannot work.
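The log check described in this message (looking past the Master's view to the Region Server log for the real cause) can be roughly sketched as a small scan for classes the server could not load. This is only an illustration: the function name `find_missing_classes` is made up here, and the pattern assumes the "Class ... not found" wording seen in the stack trace quoted in this thread.

```python
import re

# Pattern based on the "ClassNotFoundException: Class <name> not found"
# wording in the Region Server log quoted in this thread (an assumption
# about the message format, not an HBase API).
MISSING_CLASS = re.compile(r"ClassNotFoundException: Class\s+(\S+)\s+not found")


def find_missing_classes(log_text):
    """Return the distinct class names reported as not found, in first-seen order."""
    # Long log lines are often wrapped by mail clients or terminals,
    # so collapse all runs of whitespace before matching.
    flattened = " ".join(log_text.split())
    seen = []
    for name in MISSING_CLASS.findall(flattened):
        if name not in seen:
            seen.append(name)
    return seen
```

Passing it the text of a Region Server log returns a list of class names, which for the failure in this thread would point straight at the Trafodion class that was never copied to the node.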
Re: Reply: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.
Hi there all,

I have made a JIRA for the installer, based on this issue.
https://issues.apache.org/jira/browse/TRAFODION-1884

Thanks!

On Wed, Mar 9, 2016 at 8:41 AM, Liu, Ming (Ming) wrote:

> Thanks, Dennis, for sharing this. We saw this issue during an expansion of Trafodion from 4 nodes to 5 nodes. Since the newly added node was empty, the META region should not have been there, so it did no harm. But the problem is similar: the newly added RS cannot work until we update Trafodion on that RS node.
>
> There are two related JIRAs: TRAFODION-1729 and TRAFODION-1730. We are working on them to solve the issue. Since Trafodion currently modifies the HBase server's hbase-site.xml to add coprocessors, it affects *ALL* regions in HBase, including the META region. This is unnecessary and not good. The META region definitely does not need to load Trafodion coprocessors. It is a system region that Trafodion never needs to access directly, and once it fails to open, the whole HBase system cannot work.
> So with that JIRA fully addressed, we can remove the hbase-site.xml modification from the Trafodion installer, and there will be no need to restart HBase. And in a proper installation, Trafodion should be installed on all RS nodes, so the coprocessor jar files should be copied to all RS nodes. If Trafodion is not installed on all RS nodes, there may still be issues; I assume the installer still needs to consider this. A better approach may be to store the coprocessor jar file on HDFS, but that is just a theory and needs further study.
>
> Thanks,
> Ming
>
> -----Original Message-----
> From: D. Markt [mailto:dmarkt7...@gmail.com]
> Sent: March 9, 2016 15:23
> To: user@trafodion.incubator.apache.org
> Subject: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.
>
> Hi,
>
> I ran into this situation during a recent installation and thought it might be useful if others were to hit a similar situation in the future.
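Ming's point that server-wide settings in hbase-site.xml apply to every region, hbase:meta included, suggests inspecting that file for such overrides before restarting HBase. The sketch below is a guess at how that could look. The two property names are standard HBase keys, but that the Trafodion installer sets exactly these is an assumption (the stack trace in this thread is at least consistent with a custom region class being configured), and `server_wide_overrides` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed list of keys worth watching: both are ordinary HBase settings,
# but whether a given installer touches them is not confirmed here.
WATCHED = ("hbase.hregion.impl", "hbase.coprocessor.region.classes")


def server_wide_overrides(xml_text, watched=WATCHED):
    """Return {property name: value} for watched keys found in hbase-site.xml text."""
    root = ET.fromstring(xml_text)
    found = {}
    # hbase-site.xml is a flat list of <property><name/><value/></property>.
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in watched:
            found[name] = (prop.findtext("value") or "").strip()
    return found
```

Any class named in the returned values has to be loadable on every Region Server, because the setting is applied when any region is opened, system regions as well as user regions.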
Reply: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.
Thanks, Dennis, for sharing this. We saw this issue during an expansion of Trafodion from 4 nodes to 5 nodes. Since the newly added node was empty, the META region should not have been there, so it did no harm. But the problem is similar: the newly added RS cannot work until we update Trafodion on that RS node.

There are two related JIRAs: TRAFODION-1729 and TRAFODION-1730. We are working on them to solve the issue. Since Trafodion currently modifies the HBase server's hbase-site.xml to add coprocessors, it affects *ALL* regions in HBase, including the META region. This is unnecessary and not good. The META region definitely does not need to load Trafodion coprocessors. It is a system region that Trafodion never needs to access directly, and once it fails to open, the whole HBase system cannot work. So with that JIRA fully addressed, we can remove the hbase-site.xml modification from the Trafodion installer, and there will be no need to restart HBase. And in a proper installation, Trafodion should be installed on all RS nodes, so the coprocessor jar files should be copied to all RS nodes. If Trafodion is not installed on all RS nodes, there may still be issues; I assume the installer still needs to consider this. A better approach may be to store the coprocessor jar file on HDFS, but that is just a theory and needs further study.

Thanks,
Ming

-----Original Message-----
From: D. Markt [mailto:dmarkt7...@gmail.com]
Sent: March 9, 2016 15:23
To: user@trafodion.incubator.apache.org
Subject: A failed Trafodion installation can lead to the hbase:meta table staying in the FAILED_OPEN state.

Hi,

I ran into this situation during a recent installation and thought it might be useful if others were to hit a similar situation in the future. This isn't the only way to recover from the situation, but it is one option and was proven to work as expected.

Regards,
Dennis

During a recent Trafodion cluster install, the daily build was broken in such a way that much of the installation proceeded, but the Trafodion files were not copied to each node.
This system was using CDH, but I assume the following would happen for HDP as well. After HBase was restarted as part of the installation, I noticed the HBase icon was red. I know this will likely not look the best in plain text, but the hbase:meta showed (in a red box):

Region:        1588230740
State:         hbase:meta,,1.1588230740 state=FAILED_OPEN, ts=Mon Mar 07 07:19:00 UTC 2016 (1289s ago), server=perf-sles-2.novalocal,60020,1457335120507
RIT time (ms): 1289706

Looking at the Region Server's log file that was assigned the hbase:meta table, there was this output:

2016-03-07 16:45:27,243 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
2016-03-07 16:45:27,249 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
java.lang.IllegalStateException: Could not instantiate a region instance.
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5486)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5793)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5475)
        ... 10 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
        ... 11 more
2016-03-07 16:45:27,250 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 115

After consulting with our installer expert, the issue was in fact that the needed files had not been copied to each node. At that po
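Since the root cause here was a class missing from the Region Server node, one possible pre-restart check is to confirm that some jar on the node actually contains the class named in the log. The following is a minimal sketch only: `jars_providing` is a hypothetical helper, and the directory to scan (wherever the HBase lib jars live on a given distribution) is left to the caller because it is not stated in this thread.

```python
import zipfile
from pathlib import Path


def jars_providing(class_name, lib_dir):
    """Return the jar files directly under lib_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # A truncated or partially copied jar is skipped, not fatal.
            continue
    return hits
```

An empty result for org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion on any Region Server node would indicate the same condition that left hbase:meta in FAILED_OPEN in this report.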