[ https://issues.apache.org/jira/browse/HBASE-26567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-26567: ----------------------------- Description: For {{CellChunkMap}}, cells in data chunks in is indexed by index chunks by ChunkID ,not by strong reference, so in order to prevent the data chunks in {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks in {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on the data chunk is pooled or the index type of {{CompactingMemStore}} is {{IndexType.CHUNK_MAP}}. So when we call {{ChunkCreator.getChunk}}, it is very important to specify the IndexType. {{ChunkCreator}} has following package-visible method, which does not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}. {code:java} Chunk getChunk(ChunkType chunkType) { return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType); } {code} This method is used by following line 363 in {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by {{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into {{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}. {code:java} 359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) { 360 switch (chunkType) { 361 case INDEX_CHUNK: 362 case DATA_CHUNK: 363 Chunk c = this.chunkCreator.getChunk(chunkType); 364 chunks.add(c.getId()); 365 return c; 366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size 367 default: 368 return null; 369 } 370 } {code} Even though fortunately this IndexType mismatch does not cause bug now because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than chunkSize,but I think it is dangerous to use this {{ChunkCreator.getChunk(ChunkType)}} method if we want to refactor or add new functionalities in the future. We would better to specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}. My fix is as follows: 1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}}, {{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let {{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to determine the {{IndexType}} explicitly. 2. Unify the determination of which {{IndexType}} to use into {{CompactingMemStore}}. was: For {{CellChunkMap}}, cells in data chunks in is indexed by index chunks by ChunkID ,not by strong reference, so in order to prevent the data chunks in {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks in {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on the data chunk is pooled or the index type of {{CompactingMemStore}} is {{IndexType.CHUNK_MAP}}. So when we call {{ChunkCreator.getChunk}}, it is very important to specify the IndexType. {{ChunkCreator}} has following package-visible method, which does not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}. {code:java} Chunk getChunk(ChunkType chunkType) { return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType); } {code} This method is used by following line 363 in {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by {{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into {{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}. {code:java} 359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) { 360 switch (chunkType) { 361 case INDEX_CHUNK: 362 case DATA_CHUNK: 363 Chunk c = this.chunkCreator.getChunk(chunkType); 364 chunks.add(c.getId()); 365 return c; 366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size 367 default: 368 return null; 369 } 370 } {code} Even though fortunately this IndexType mismatch does not cause bug now because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than chunkSize,but I think it is dangerous to use this {{ChunkCreator.getChunk(ChunkType chunkType)}} method if we want to refactor or add new functionalities in the future. We would better to specify the {{IndexType}} explicitly when we use {{ChunkCreator.getChunk}}. My fix is as follows: 1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}}, {{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let {{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to determine the {{IndexType}} explicitly. 2. Unify the determination of which {{IndexType}} to use into {{CompactingMemStore}}. > Remove high-risk ChunkCreator.getChunk(ChunkType) method > -------------------------------------------------------- > > Key: HBASE-26567 > URL: https://issues.apache.org/jira/browse/HBASE-26567 > Project: HBase > Issue Type: Improvement > Components: in-memory-compaction > Affects Versions: 3.0.0-alpha-1, 2.4.8 > Reporter: chenglei > Priority: Major > > For {{CellChunkMap}}, cells in data chunks in is indexed by index chunks by > ChunkID ,not by strong reference, so in order to prevent the data chunks in > {{CellChunkMap}} from GC by JVM, HBASE-18375 put these data chunks in > {{ChunkCreator.chunkIdMap}} which is a the strong map. Whether or not the > {{ChunkCreator}} put the data chunk in {{ChunkCreator.chunkIdMap}} depends on > the data chunk is pooled or the index type of {{CompactingMemStore}} is > {{IndexType.CHUNK_MAP}}. So when we > call {{ChunkCreator.getChunk}}, it is very important to specify the > IndexType. {{ChunkCreator}} has following package-visible method, which does > not require the {{IndexType}} parameter and use {{IndexType.ARRAY_MAP}}. > {code:java} > Chunk getChunk(ChunkType chunkType) { > return getChunk(CompactingMemStore.IndexType.ARRAY_MAP, chunkType); > } > {code} > This method is used by following line 363 in > {{MemStoreLABImpl.getNewExternalChunk}} now, which is used by > {{CellChunkImmutableSegment.copyCellIntoMSLAB}} to force copy a big cell into > {{MemStoreLAB}},so the index type obviously should be {{IndexType.CHUNK_MAP}}. > {code:java} > 359 public Chunk getNewExternalChunk(ChunkCreator.ChunkType chunkType) { > 360 switch (chunkType) { > 361 case INDEX_CHUNK: > 362 case DATA_CHUNK: > 363 Chunk c = this.chunkCreator.getChunk(chunkType); > 364 chunks.add(c.getId()); > 365 return c; > 366 case JUMBO_CHUNK: // a jumbo chunk doesn't have a fixed size > 367 default: > 368 return null; > 369 } > 370 } > {code} > Even though fortunately this IndexType mismatch does not cause bug now > because {{MemStoreLABImpl.getNewExternalChunk}} in fact is not called by > {{CellChunkImmutableSegment.copyCellIntoMSLAB}} when the cell is bigger than > chunkSize,but I think it is dangerous to use this > {{ChunkCreator.getChunk(ChunkType)}} method if we want to refactor or add new > functionalities in the future. We would better to specify the {{IndexType}} > explicitly when we use {{ChunkCreator.getChunk}}. > My fix is as follows: > 1. Remove the error-prone {{ChunkCreator.getChunk(ChunkType chunkType)}}, > {{ChunkCreator}} does not determine the {{IndexType}} itself at all, and let > {{MemStoreLAB}} or {{CompactingMemStore}} who invokes the {{ChunkCreator}} to > determine the {{IndexType}} explicitly. > 2. Unify the determination of which {{IndexType}} to use into > {{CompactingMemStore}}. -- This message was sent by Atlassian Jira (v8.20.1#820001)