[ https://issues.apache.org/jira/browse/PHOENIX-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032682#comment-16032682 ]
Ankit Singhal edited comment on PHOENIX-3797 at 6/1/17 9:27 AM:
----------------------------------------------------------------

bq. Actually... Can the same thing happen with just a regular merge operation from HBaseAdmin?

No, this will not happen with a regular merge operation from HBaseAdmin, because reference files are created with the split row as the start key of the second region. That makes it easy to detect the start key of the store file, which can then be used to parse the rows of the daughter region during the scan and rewrite the complete data with the new start key during compaction, using LocalIndexStoreFileScanner (IndexHalfStoreFileReader).

bq. Here's yet another idea: Can we hook a scanner right above the HFiles? That scanner would rewrite the keys based on the new region startkey. So now the store scanner for the index would do the right thing (merge sort between the values from the HFile scanners).

Yes, if we can identify the start key of the second region, we can make use of LocalIndexStoreFileScanner. But with only the new region start key, we can't parse the local index data from the store files of the second region.

bq. So that (v2 approach) can work. For large regions that would lead to a lot of HFiles, though (for a 10g region with 256mb flush size it would lead to 40 files after the major compaction).

Yes, but I think it would be much the same if we had to rebuild the local index for the region from the client or similar. Another problem with repairing only during compaction is that the data will be inconsistent for queries until we find and fix it during compaction.

> Local Index - Compaction fails on table with local index due to non-increasing bloom keys
> -----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3797
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3797
>             Project: Phoenix
>          Issue Type: Bug
>         Environment: Head of 4.x-HBase-0.98 with PHOENIX-3796 patch applied.
>                      HBase 0.98.23-hadoop2
>            Reporter: Mujtaba Chohan
>            Assignee: Ankit Singhal
>            Priority: Blocker
>             Fix For: 4.11.0
>
>         Attachments: PHOENIX-3797.patch, PHOENIX-3797_v2.patch
>
>
> Compaction fails on table with local index.
> {noformat}
> 2017-04-19 16:37:56,521 ERROR [RS:0;host:59455-smallCompactions-1492644947594] regionserver.CompactSplitThread: Compaction failed Request = regionName=FHA,00Dxx0000001gES005001xx000003DGPd,1492644985470.92ec6436984981cdc8ef02388005a957., storeName=L#0, fileCount=3, fileSize=44.4 M (23.0 M, 10.7 M, 10.8 M), priority=7, time=7442973347247614
> java.io.IOException: Non-increasing Bloom keys: 00Dxx0000001gES005001xx000003DGPd\x00\x00\x80\x00\x01H+&\xA1(00Dxx0000001gER001001xx000003DGPb01739544DCtf after 00Dxx0000001gES005001xx000003DGPd\x00\x00\x80\x00\x01I+\xF4\x9Ax00Dxx0000001gER001001xx000003DGPa017115434KTM
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:960)
> 	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:996)
> 	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:428)
> 	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:276)
> 	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:64)
> 	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:121)
> 	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1154)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1559)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:502)
> 	at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:540)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:722)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
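The mechanism discussed in the comments can be sketched in plain Java. This is illustrative code only, not Phoenix's implementation: a local index row key is the region start key followed by the index data, so after a merge the second parent's HFiles still carry the old start-key prefix, the compactor's merge sort emits those keys in the wrong position, and the bloom writer rejects them as non-increasing. The class name, the helper names, and the byte values below are all made up for illustration; `compareUnsigned` mirrors the semantics of HBase's `Bytes.compareTo`.

```java
import java.util.Arrays;

public class LocalIndexKeySketch {

    // Unsigned lexicographic byte comparison, matching HBase row-key ordering.
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // The repair discussed above: strip the stale region-start-key prefix and
    // re-prefix with the merged region's start key. This only works when the
    // old start key is known (e.g. from the reference file's split row) --
    // with only the NEW start key, the prefix boundary cannot be found.
    static byte[] rewritePrefix(byte[] row, byte[] oldStart, byte[] newStart) {
        byte[] suffix = Arrays.copyOfRange(row, oldStart.length, row.length);
        return concat(newStart, suffix);
    }

    public static void main(String[] args) {
        byte[] mergedStart = {0x10};  // start key of the merged region
        byte[] secondStart = {0x20};  // stale start key of the second parent

        // Index suffixes (indexed value || data row key), abbreviated;
        // idxB sorts before idxA.
        byte[] idxA = {0x01, 0x49};
        byte[] idxB = {0x01, 0x48};

        byte[] fromFirstParent  = concat(mergedStart, idxA); // correct prefix
        byte[] fromSecondParent = concat(secondStart, idxB); // stale prefix

        // The second parent's key sorts AFTER the first parent's solely
        // because of its stale prefix, so the compactor emits it later even
        // though its index portion sorts earlier: "non-increasing" keys.
        System.out.println(compareUnsigned(fromFirstParent, fromSecondParent) < 0);

        // Rewriting the prefix restores the true index order.
        byte[] repaired = rewritePrefix(fromSecondParent, secondStart, mergedStart);
        System.out.println(compareUnsigned(repaired, fromFirstParent) < 0);
    }
}
```

This is why the comment insists on recovering the second region's old start key: `rewritePrefix` needs `oldStart.length` to know where the prefix ends, and nothing inside the row key itself marks that boundary.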