[PR] Fix SmartChinese to only serialize data from classpath with a native array filter applied and never cache dictionaries from custom locations [lucene]

via GitHub Fri, 26 Sep 2025 05:46:42 -0700


uschindler opened a new pull request, #15237:
URL: https://github.com/apache/lucene/pull/15237


   This PR changes the SmartChinese analyzer so that Java-serialized `.mem` 
files are loaded only from the classpath and custom dictionaries are never 
cached in serialized form.
   
   This also adds a filter to the `ObjectInputStream` used for loading the 
legacy dictionary format from the classpath, so that it accepts only primitive 
(native) types and arrays of them.
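
   For illustration only, a filter of this kind could look roughly like the 
sketch below. It is not the code from this PR: the class and method names are 
hypothetical, and it uses only the JDK's `ObjectInputFilter` API (Java 9+).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical sketch, not the SmartChinese implementation.
final class PrimitiveArrayFilterSketch {

  /** Allows primitive types and (nested) arrays of them; rejects every other class. */
  static ObjectInputFilter.Status primitiveArraysOnly(ObjectInputFilter.FilterInfo info) {
    Class<?> clazz = info.serialClass();
    if (clazz == null) {
      // Not a class check (depth, reference count, byte count): no opinion.
      return ObjectInputFilter.Status.UNDECIDED;
    }
    // Unwrap int[][] -> int[] -> int, and so on.
    while (clazz.isArray()) {
      clazz = clazz.getComponentType();
    }
    return clazz.isPrimitive()
        ? ObjectInputFilter.Status.ALLOWED
        : ObjectInputFilter.Status.REJECTED;
  }

  /** Reads one object from the stream with the filter installed before any data is read. */
  static Object readFiltered(InputStream in) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(in)) {
      ois.setObjectInputFilter(PrimitiveArrayFilterSketch::primitiveArraysOnly);
      return ois.readObject();
    }
  }

  /** Helper for the demo: serializes an object into a byte array. */
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(o);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    char[][] table = {{'a', 'b'}, {'c'}};
    Object ok = readFiltered(new ByteArrayInputStream(serialize(table)));
    System.out.println("accepted: " + ok.getClass().getSimpleName());
    try {
      readFiltered(new ByteArrayInputStream(serialize(Integer.valueOf(42))));
    } catch (InvalidClassException e) {
      System.out.println("rejected non-array object: " + e.getMessage());
    }
  }
}
```

   With a filter like this, a stream that contains anything other than 
primitive arrays fails with an `InvalidClassException` before the offending 
class is instantiated.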
   
   At a later stage we should remove the serialized data completely, add the 
original sources of the dictionaries (bigram and core) to the Lucene repo, and 
use `DataInput`/`DataOutput` to write our own file format.
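
   As a rough idea of what an explicit format could look like, the sketch 
below writes and reads a `char[][]` table without Java serialization. It is 
an assumption-laden illustration: it uses the JDK's `DataOutputStream` and 
`DataInputStream` as stand-ins for Lucene's `DataOutput`/`DataInput`, and the 
magic number, version, and layout are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical sketch of a hand-written dictionary format.
final class DictionaryFormatSketch {
  static final int MAGIC = 0x534D4344; // made-up header value
  static final int VERSION = 1;        // made-up format version

  /** Layout: magic, version, row count, then per row a length (-1 for null) and its chars. */
  static void write(char[][] table, OutputStream out) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(MAGIC);
    data.writeInt(VERSION);
    data.writeInt(table.length);
    for (char[] row : table) {
      if (row == null) {
        data.writeInt(-1);
        continue;
      }
      data.writeInt(row.length);
      for (char c : row) {
        data.writeChar(c);
      }
    }
    data.flush();
  }

  /** Reads the layout produced by write(), validating the header and lengths. */
  static char[][] read(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    if (data.readInt() != MAGIC) {
      throw new IOException("not a dictionary file");
    }
    int version = data.readInt();
    if (version != VERSION) {
      throw new IOException("unsupported version: " + version);
    }
    int rows = data.readInt();
    if (rows < 0) {
      throw new IOException("negative row count: " + rows);
    }
    char[][] table = new char[rows][];
    for (int i = 0; i < rows; i++) {
      int len = data.readInt();
      if (len == -1) {
        continue; // null row stays null
      }
      if (len < -1) {
        throw new IOException("invalid row length: " + len);
      }
      char[] row = new char[len];
      for (int j = 0; j < len; j++) {
        row[j] = data.readChar();
      }
      table[i] = row;
    }
    return table;
  }

  public static void main(String[] args) throws IOException {
    char[][] table = {{'a', 'b'}, null, {}};
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    write(table, bos);
    char[][] back = read(new ByteArrayInputStream(bos.toByteArray()));
    System.out.println("round trip equal: " + Arrays.deepEquals(table, back));
  }
}
```

   Because the reader only ever constructs `char[]` values, no class from the 
stream can be instantiated. A real format would additionally want to bound 
the row count and row lengths against the file size.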


