itschrispeck opened a new pull request, #13776:
URL: https://github.com/apache/pinot/pull/13776

   Enhances the existing `noRawDataForTextIndex` config to skip writing raw data 
when re-use of the mutable index is enabled. This PR also fixes the configs 
added in [this PR](https://github.com/apache/pinot/pull/13577), which never 
took effect: only default values were used, so the re-use index feature was 
globally disabled. 
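
   For context, a minimal sketch of how `noRawDataForTextIndex` is typically 
set in a table config, assuming the documented text index properties. The 
column name `logMessage` and the placeholder value are hypothetical, and the 
re-use index configs from the linked PR are not shown here:

```json
{
  "fieldConfigList": [
    {
      "name": "logMessage",
      "encodingType": "RAW",
      "indexTypes": ["TEXT"],
      "properties": {
        "noRawDataForTextIndex": "true",
        "rawValueForTextIndex": "n/a"
      }
    }
  ]
}
```

   With this set, the forward index stores only the placeholder value, so 
the column stays searchable through the text index but its original values 
cannot be read back.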
   
   The implementation is very similar to the existing 
`SegmentColumnarIndexCreator` implementation for this config, which uses a 
`SameValueForwardIndexCreator` class in the same way. I'm open to suggestions 
if there's a better way to accomplish this, though the current implementation 
keeps the logic relatively isolated. It seems a bit odd to me that a text 
index config should affect the forward index. Long term, I think this can be 
accomplished generically with forward index configs. 
   
   In the short term, I think the enhancement provides significant benefits 
(especially when used with `SchemaConformingTransformer`). Some metrics are 
shown below; the improvements should be fairly evident:
   
   <img width="592" alt="image" 
src="https://github.com/user-attachments/assets/e38b53f6-fcf2-4378-bfa2-8254b56b2e0c">
   <img width="592" alt="image" 
src="https://github.com/user-attachments/assets/f9fbcdc6-82ef-4ea1-a2a8-98f81f1f7816">
   <img width="592" alt="image" 
src="https://github.com/user-attachments/assets/ca63140f-26cc-4997-adb1-27dd0bd3aaf2">

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

