[ https://issues.apache.org/jira/browse/HBASE-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allan Yang updated HBASE-21751:
-------------------------------
    Summary: WAL creation fails during region open may cause region assign forever fail  (was: WAL create fails during region open may cause region assign forever fail)

> WAL creation fails during region open may cause region assign forever fail
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21751
>                 URL: https://issues.apache.org/jira/browse/HBASE-21751
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.2, 2.0.4
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21751.patch, HBASE-21751v2.patch
>
>
> During the first region open on the RS, WALFactory creates a WAL file.
> If the WAL creation fails, HDFS may in some cases leave an empty file in
> the directory (e.g. the disk is full: the file is created successfully but
> block allocation fails). AbstractFSWAL checks whether a WAL belonging to the
> same factory already exists and, if so, throws an error. As a result, the
> region can never be opened on this RS afterwards.
> {code:java}
> 2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
> java.io.IOException: Target WAL already exists within directory hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
>     at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.<init>(AbstractFSWAL.java:382)
>     at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:210)
>     at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
>     at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
>     at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
>     at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
>     at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>     at java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
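
Editor's illustration (not part of the original notification, and not the attached HBASE-21751 patch): the failure mode described in the report can be sketched with plain java.nio on a local directory standing in for HDFS. The class and method names below (WalLeftoverSketch, strictCheck, tolerantCheck) and the file name example.wal are hypothetical. The sketch only shows why a strict "target already exists" check keeps failing once an empty leftover file is present, and one possible way such a leftover could be tolerated; the actual fix may well work differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, local-filesystem stand-in for the WAL existence check.
class WalLeftoverSketch {

    // Strict check: any pre-existing target file is treated as fatal,
    // which is the behaviour the report describes.
    static void strictCheck(Path walFile) throws IOException {
        if (Files.exists(walFile)) {
            throw new IOException(
                "Target WAL already exists within directory " + walFile.getParent());
        }
    }

    // Tolerant variant (an assumption, not the real patch): a zero-length
    // leftover from a failed creation is deleted; a non-empty file still fails.
    static void tolerantCheck(Path walFile) throws IOException {
        if (!Files.exists(walFile)) {
            return;
        }
        if (Files.size(walFile) == 0) {
            Files.delete(walFile);
            return;
        }
        throw new IOException(
            "Target WAL already exists within directory " + walFile.getParent());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Path wal = dir.resolve("example.wal");

        // Simulate a creation attempt that left an empty file behind.
        Files.createFile(wal);

        try {
            strictCheck(wal);
        } catch (IOException e) {
            // Every retry on this server would fail the same way.
            System.out.println("strict check failed: " + e.getMessage());
        }

        tolerantCheck(wal);
        System.out.println("leftover still present: " + Files.exists(wal));
    }
}
```

With the strict check, each retry of the region open on the same server sees the same leftover file and fails again, which matches the "assign forever fail" symptom in the summary.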