dkranchii opened a new issue, #18404:
URL: https://github.com/apache/pinot/issues/18404

   In `FilePerIndexDirectory`, `removeIndex()` drops the entry from the
   buffer cache without ever closing the underlying `PinotDataBuffer`.
   The class itself even has a `// TODO` flagging the leak, so opening
   an issue to track it.
   
   ### Where
   
`pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/store/FilePerIndexDirectory.java:100-111`
   ```java
   @Override
   public void removeIndex(String columnName, IndexType<?, ?, ?> indexType) {
     // TODO: this leaks the removed data buffer (it's not going to be freed in close() method)
     _indexBuffers.remove(new IndexKey(columnName, indexType));
     if (indexType == StandardIndexes.text()) {
       TextIndexUtils.cleanupTextIndex(_segmentDirectory, columnName);
     } else if (indexType == StandardIndexes.vector()) {
       VectorIndexUtils.cleanupVectorIndex(_segmentDirectory, columnName);
     } else {
       getFilesFor(columnName, indexType).forEach(FileUtils::deleteQuietly);
     }
   }
   ```
   `close()` later iterates `_indexBuffers.values()` and closes whatever is still
   in the map, but the entry we just removed is no longer in there, so it's never
   closed.
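
   A minimal, self-contained sketch of the leak and one possible fix. Nothing
   here is Pinot code: `BufferCacheSketch`, `TrackedBuffer`, `removeLeaky()` and
   `removeAndClose()` are hypothetical stand-ins for `FilePerIndexDirectory`,
   `PinotDataBuffer` and `removeIndex()`. It assumes the fix can be as simple as
   closing the value returned by `Map.remove()`; the real change would also have
   to deal with whatever `PinotDataBuffer.close()` can throw.

   ```java
   import java.io.Closeable;
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical stand-in for the directory's buffer cache; not Pinot code.
   class BufferCacheSketch implements Closeable {

     // Hypothetical stand-in for PinotDataBuffer: records whether close() ran.
     static final class TrackedBuffer implements Closeable {
       boolean closed;

       @Override
       public void close() {
         closed = true;
       }
     }

     private final Map<String, TrackedBuffer> indexBuffers = new HashMap<>();

     TrackedBuffer put(String key) {
       TrackedBuffer buffer = new TrackedBuffer();
       indexBuffers.put(key, buffer);
       return buffer;
     }

     // Mirrors the behavior described in this issue: drop the entry, never close it.
     void removeLeaky(String key) {
       indexBuffers.remove(key);
     }

     // Possible fix: Map.remove returns the evicted value, so close it right away.
     void removeAndClose(String key) {
       TrackedBuffer removed = indexBuffers.remove(key);
       if (removed != null) {
         removed.close();
       }
     }

     // As described above: only buffers still present in the map get closed.
     @Override
     public void close() {
       for (TrackedBuffer buffer : indexBuffers.values()) {
         buffer.close();
       }
     }

     public static void main(String[] args) {
       BufferCacheSketch cache = new BufferCacheSketch();
       TrackedBuffer leaked = cache.put("colA.forward");
       TrackedBuffer freed = cache.put("colB.inverted");
       cache.removeLeaky("colA.forward");
       cache.removeAndClose("colB.inverted");
       cache.close();
       System.out.println("leaky remove closed the buffer: " + leaked.closed);
       System.out.println("remove-and-close closed the buffer: " + freed.closed);
     }
   }
   ```

   In the sketch, the buffer dropped by `removeLeaky()` is still open after
   `close()` runs, while the one dropped by `removeAndClose()` is not.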
   
   ### Why this is a bug
   `PinotDataBuffer` is usually mmap-backed for offline segments. Removing
   it from the cache without calling `close()` means the mmap region
   stays in the JVM's address space until the process exits (or until the
   `Cleaner` happens to fire, which is not something to rely on for
   deterministic cleanup). The on-disk file gets deleted on line 109 (the
   `FileUtils::deleteQuietly` call), but the kernel mapping leaks.
   
   `removeIndex()` is called whenever Pinot decides an index isn't
   needed anymore: segment reloads after index config changes, minion
   segment conversions, and the various `*IndexHandler` drop/replace flows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

