jackluo923 opened a new pull request, #13003: URL: https://github.com/apache/pinot/pull/13003
In this pull request, we introduce two feature enhancements that build upon #12027 [Configurable Lucene Analyzer] and make our Lucene-based text processing more flexible:

1. **Enhanced Flexibility for Custom Lucene Analyzers:** We've added backward-compatible support for passing arbitrary arguments, along with their types, to custom Lucene analyzers. Analyzer behavior can now be customized from runtime configuration specified in table configs, so the tokenization process can adapt to varying requirements without changes to the underlying codebase.
2. **Configurable Lucene Query Parser:** We've added the ability to configure which Lucene query parser to use at runtime. This makes it possible to tailor the parser's behavior to specific use cases, which can improve the efficiency and relevance of search operations. At the moment, we only support query parsers that inherit from Lucene's `QueryParserBase` class and implement a `Class(Field f, Analyzer a)` constructor and a `Query parse(String query)` instance method. In the future, we may add the ability to use more complex query parsers such as [MultiFieldQueryParser](https://lucene.apache.org/core/7_2_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html), which does not implement the `Query parse(String query)` instance method.

Together, these enhancements improve our production system's ability to control text tokenization at runtime. This is especially useful for precise log search, such as supporting concurrent case-sensitive and case-insensitive searches using wildcards and regular expressions.
Configuring both the Lucene analyzer and the query parser dynamically lets our application handle diverse and complex search requirements efficiently.

This enhancement was contributed and improved by multiple developers (@jackluo923 @Bill-hbrhbr @itschrispeck @lnbest0707-uber) across multiple iterations; this PR summarizes those internal changes for contribution to OSS.
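For readers unfamiliar with the pattern, here is a minimal sketch of how a class name plus a list of string arguments and argument types from a config could be turned into an instance via reflection. It is not the code in this PR: `ReflectiveAnalyzerFactory`, `DemoAnalyzer`, `resolveType`, `parseValue`, and `create` are hypothetical names, and `DemoAnalyzer` is a plain stand-in so the example runs without Lucene on the classpath. A real custom analyzer would extend Lucene's `Analyzer`.

```java
import java.lang.reflect.Constructor;
import java.util.List;

public class ReflectiveAnalyzerFactory {

    /** Hypothetical stand-in for a custom analyzer (a real one would extend Lucene's Analyzer). */
    public static class DemoAnalyzer {
        public final boolean lowerCase;
        public final int maxTokenLength;

        // No-arg constructor: the path taken when the config supplies no arguments.
        public DemoAnalyzer() {
            this(true, 255);
        }

        public DemoAnalyzer(boolean lowerCase, int maxTokenLength) {
            this.lowerCase = lowerCase;
            this.maxTokenLength = maxTokenLength;
        }
    }

    // Map a configured type name to a Class; primitives need special handling
    // because Class.forName does not resolve names like "int".
    static Class<?> resolveType(String name) throws ClassNotFoundException {
        switch (name) {
            case "boolean": return boolean.class;
            case "int": return int.class;
            case "long": return long.class;
            case "double": return double.class;
            default: return Class.forName(name);
        }
    }

    // Convert a raw config string into a value of the declared type.
    static Object parseValue(String raw, Class<?> type) {
        if (type == boolean.class) return Boolean.parseBoolean(raw);
        if (type == int.class) return Integer.parseInt(raw);
        if (type == long.class) return Long.parseLong(raw);
        if (type == double.class) return Double.parseDouble(raw);
        if (type == String.class) return raw;
        throw new IllegalArgumentException("Unsupported argument type: " + type.getName());
    }

    // Instantiate className using the constructor whose signature matches argTypes.
    // Empty lists select the no-arg constructor, so configs without arguments keep working.
    public static Object create(String className, List<String> args, List<String> argTypes) {
        if (args.size() != argTypes.size()) {
            throw new IllegalArgumentException("args and argTypes must have the same length");
        }
        try {
            Class<?>[] types = new Class<?>[args.size()];
            Object[] values = new Object[args.size()];
            for (int i = 0; i < args.size(); i++) {
                types[i] = resolveType(argTypes.get(i));
                values[i] = parseValue(args.get(i), types[i]);
            }
            Constructor<?> ctor = Class.forName(className).getConstructor(types);
            return ctor.newInstance(values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        DemoAnalyzer analyzer = (DemoAnalyzer) create(
            DemoAnalyzer.class.getName(), List.of("false", "64"), List.of("boolean", "int"));
        System.out.println("lowerCase=" + analyzer.lowerCase
            + " maxTokenLength=" + analyzer.maxTokenLength);
    }
}
```

Passing empty lists falls back to the no-arg constructor, which is one way the backward compatibility described above could be preserved. The same lookup-by-signature approach would presumably apply to a query parser's two-argument constructor, though that is an assumption rather than something this description confirms.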