[ https://issues.apache.org/jira/browse/PHOENIX-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved PHOENIX-7039.
-----------------------------------
    Resolution: Fixed

> Snapshot scanner should skip replay WAL and update seqid while opening region
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-7039
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7039
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.1.3
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 5.2.0, 5.1.4
>
>
> When PhoenixRecordReader needs to iterate the records from the snapshot
> restored table, it uses TableSnapshotResultIterator to retrieve the snapshot
> manifest and the corresponding region manifests from the snapshot.
> TableSnapshotResultIterator#next initializes ScanningResultIterator using
> SnapshotScanner, which in turn opens the given region to perform the scan.
> However, this region is opened by a client and not by any regionserver, and
> hence if the original region was split or merged, the current region would be
> holding references to parent regions in the hbase archive dir. If the region
> is already removed from meta as well as the file system (hbase data dir) after
> the successful split/merge operations, region initialization by the client
> still leads to the creation of a new seqid file in the region's data dir (on
> the WAL filesystem). While the region data is read from the archive dir, due
> to the region dir creation in the hbase data dir, we get a new orphan region
> with only a .seqid file and no store file. At the same time, the hbase archive
> dir still contains the old region dir with a reference to the parent region.
>
> 1. Snapshot creation:
> {code:java}
> 2023-09-13 01:01:50,103 DEBUG [557)-snapshot-pool-2]
> snapshot.SnapshotManifest - Storing
> 'TABLE1,00DAG00000005sXa07\x80\x00\x01\x87p/\xB38a07AG000017Kx7Z017AG00002j9Jxe,1684558830177.b5d1b622ef045b52aede650db8690d53.'
> region-info for snapshot=SNAPSHOT_TABLE1_1694566851085_1694566876390_0
> {code}
> 2. Region getting archived after merge:
> {code:java}
> 2023-09-13 02:46:58,177 DEBUG [gionserver-4:60020-8] backup.HFileArchiver -
> Archived from FileableStoreFile,
> hdfs://cluster1/hbase/data/default/TABLE1/53161e6b59b7a2dcdb85b26e676fd72a/0/aa5058a23c024463bb33bbb2abc68577.b5d1b622ef045b52aede650db8690d53
> to
> hdfs://cluster1/hbase/archive/data/default/TABLE1/53161e6b59b7a2dcdb85b26e676fd72a/0/aa5058a23c024463bb33bbb2abc68577.b5d1b622ef045b52aede650db8690d53
> {code}
> 3. Region is deleted from meta and file system:
> {code:java}
> 2023-09-13 02:50:26,054 DEBUG [PEWorker-53] backup.HFileArchiver - Deleted
> hdfs://cluster1/hbase/data/default/TABLE1/b5d1b622ef045b52aede650db8690d53
> 2023-09-13 02:50:26,123 INFO [PEWorker-53] hbase.MetaTableAccessor - Deleted
> TABLE1,00DAG00000005sXa07\x80\x00\x01\x87p/\xB38a07AG000017Kx7Z017AG00002j9Jxe,1684558830177.b5d1b622ef045b52aede650db8690d53.
> 2023-09-13 02:50:26,340 INFO [PEWorker-58] procedure2.ProcedureExecutor -
> Finished pid=1006984, state=SUCCESS; GCMultipleMergedRegionsProcedure
> child=53161e6b59b7a2dcdb85b26e676fd72a,
> parents:[b5d1b622ef045b52aede650db8690d53],
> [cbf697faee6a0c3eaf8c17e1bf12239a] in 434 msec
> 2023-09-13 02:50:26,269 INFO [PEWorker-58] hbase.MetaTableAccessor - Deleted
> merge references in
> TABLE1,00DAG00000005sXa07\x80\x00\x01\x87p/\xB38a07AG000017Kx7Z017AG00002j9Jxe,1685345080046.53161e6b59b7a2dcdb85b26e676fd72a.,
> deleted qualifiers merge0000, merge0001
> {code}
> 4. Snapshot scanner region init:
> {code:java}
> 2023-09-13 04:06:27,637 INFO [main]
> org.apache.phoenix.iterate.SnapshotScanner:
> Creating SnapshotScanner for region:
> {ENCODED => b5d1b622ef045b52aede650db8690d53, NAME =>
> 'TABLE1,00DAG00000005sXa07\x80\x00\x01\x87p/\xB38a07AG000017Kx7Z017AG00002j9Jxe,1684558830177.b5d1b622ef045b52aede650db8690d53.',
> STARTKEY =>
> '00DAG00000005sXa07\x80\x00\x01\x87p/\xB38a07AG000017Kx7Z017AG00002j9Jxe',
> ENDKEY =>
> '00DAG00000005sXa07\x80\x00\x01\x87\x80\x02P@a07AG0000183cN3017AG00002lPrRe'}
> {code}
> 5. Region dir with seqid gets created:
> {code:java}
> 2023-09-13 04:06:28,431 INFO [on default port 9000] hdfs.StateChange - DIR*
> completeFile:
> /hbase/data/default/TABLE1/b5d1b622ef045b52aede650db8690d53/recovered.edits/17042749.seqid
> is closed by DFSClient_attempt_1692995189831_25389_m_000797_0_-1558517803_1
> {code}
> 6. Remaining region init with store init completion:
> {code:java}
> 2023-09-13 04:06:28,354 INFO [StoreOpener-b5d1b622ef045b52aede650db8690d53-1]
> org.apache.hadoop.hbase.regionserver.HStore:
> Store=b5d1b622ef045b52aede650db8690d53/0, memstore type=DefaultMemStore,
> storagePolicy=HOT, verifyBulkLoads=false, parallelPutCountPrintThreshold=50,
> encoding=FAST_DIFF, compression=NONE
> 2023-09-13 04:06:28,439 INFO [main]
> org.apache.hadoop.hbase.regionserver.HRegion:
> Opened b5d1b622ef045b52aede650db8690d53;
> next sequenceid=17042750;
> SteppingSplitPolicysuper{IncreasingToUpperBoundRegionSplitPolicy{initialSize=536870912,
> ConstantSizeRegionSplitPolicy{desiredMaxFileSize=11007665920,
> jitterRate=0.025168776512145996}}},
> FlushLargeStoresPolicy{flushSizeLowerBound=-1}
> {code}
> While opening the region from the client side, we should provide a flag to
> ensure the seqid file is not generated, as per HBASE-21977.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)