[PR] [Feature] Support custom Lucene analyzer with args and custom query parser [pinot]

via GitHub Wed, 24 Apr 2024 14:03:44 -0700


jackluo923 opened a new pull request, #13003:
URL: https://github.com/apache/pinot/pull/13003

In this pull request, we introduce two significant feature enhancements that
build upon #12027 [Configurable Lucene Analyzer] and enhance the flexibility
and functionality of our text processing capabilities using Lucene:

1. **Enhanced Flexibility for Custom Lucene Analyzers:**
We've introduced backward-compatible support for passing arbitrary arguments,
of arbitrary types, to custom Lucene analyzers. This allows analyzer behavior
to be customized through the table config at runtime, so the tokenization
process can be adapted to varying requirements without altering the underlying
codebase.
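
To illustrate the first enhancement, one way arguments from a config could be
turned into a constructor call is reflection. This is a minimal sketch, not
Pinot's actual implementation: `ReflectiveFactorySketch`, `resolveType`,
`parseValue`, and `instantiate` are hypothetical names, only a few argument
types are handled, and JDK classes stand in for a Lucene analyzer so the
sketch runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.List;

final class ReflectiveFactorySketch {

    // Map a type name from config to a Class. Only a few primitives are
    // special-cased here; everything else is loaded by its class name.
    static Class<?> resolveType(String name) throws ClassNotFoundException {
        switch (name) {
            case "int":
                return int.class;
            case "boolean":
                return boolean.class;
            default:
                return Class.forName(name);
        }
    }

    // Convert the raw string value from config into the declared type.
    static Object parseValue(String raw, Class<?> type) {
        if (type == int.class || type == Integer.class) {
            return Integer.valueOf(raw);
        }
        if (type == boolean.class || type == Boolean.class) {
            return Boolean.valueOf(raw);
        }
        if (type == String.class) {
            return raw;
        }
        throw new IllegalArgumentException("Unsupported argument type: " + type.getName());
    }

    // Instantiate className with the given typed arguments. Empty lists select
    // the no-arg constructor, which keeps argument-free configs working.
    static Object instantiate(String className, List<String> argTypes, List<String> argValues)
            throws ReflectiveOperationException {
        if (argTypes.size() != argValues.size()) {
            throw new IllegalArgumentException("Each argument needs exactly one type");
        }
        Class<?>[] types = new Class<?>[argTypes.size()];
        Object[] values = new Object[argTypes.size()];
        for (int i = 0; i < types.length; i++) {
            types[i] = resolveType(argTypes.get(i));
            values[i] = parseValue(argValues.get(i), types[i]);
        }
        Constructor<?> ctor = Class.forName(className).getConstructor(types);
        return ctor.newInstance(values);
    }
}
```

In this sketch the type list selects the constructor overload and the value
list is parsed against it, so a mismatch surfaces as an exception at load time
rather than as silently wrong tokenization.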

2. **Configurable Lucene Query Parser:**
We've added the ability to configure which Lucene query parser to use at
run-time. This addition makes it possible to tailor the behavior of the query
parser to better align with specific use cases, enhancing the efficiency and
relevance of search operations. At the moment, we only support query parsers
that inherit from Lucene's `QueryParserBase` class; they must implement a
`Class(Field f, Analyzer a)` constructor and a `Query parse(String query)`
instance method. In the future, we may add the ability to utilize more complex
query parsers such as
[MultiFieldQueryParser](https://lucene.apache.org/core/7_2_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html),
which does not implement the `Query parse(String query)` instance method.
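
The parser contract described above (a two-argument constructor plus a
`parse(String)` instance method) could be exercised reflectively along these
lines. This is a sketch under stated assumptions, not the code in this PR:
`QueryParserContractSketch`, `StubParser`, and `parseWith` are hypothetical,
and plain strings stand in for Lucene's field, `Analyzer`, and `Query` types so
that the example runs with the JDK alone.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

final class QueryParserContractSketch {

    // Hypothetical stand-in for a Lucene query parser (not a Lucene class):
    // a public two-argument constructor and a public parse(String) method.
    public static final class StubParser {
        private final String field;
        private final String analyzerName;

        public StubParser(String field, String analyzerName) {
            this.field = field;
            this.analyzerName = analyzerName;
        }

        public String parse(String query) {
            return field + ":" + query + " [" + analyzerName + "]";
        }
    }

    // Load the configured parser class, build it with the two-argument
    // constructor, and call parse(String). A class that lacks either member
    // fails with NoSuchMethodException instead of being silently accepted.
    static Object parseWith(String parserClassName, String field, String analyzerName, String query)
            throws ReflectiveOperationException {
        Class<?> parserClass = Class.forName(parserClassName);
        Constructor<?> ctor = parserClass.getConstructor(String.class, String.class);
        Object parser = ctor.newInstance(field, analyzerName);
        Method parse = parserClass.getMethod("parse", String.class);
        return parse.invoke(parser, query);
    }
}
```

Looking up both the constructor and the method by signature is what makes the
contract explicit: a parser that only exposes differently shaped `parse`
overloads would be rejected by this lookup.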

The combination of these enhancements significantly improves our production
system's capability to control text tokenization at runtime. This is especially
useful for implementing precise log search functionalities, such as supporting
concurrent case-sensitive and case-insensitive searches using wildcards and
regular expressions. The flexibility to configure both the Lucene analyzer and
query parser dynamically ensures that our application can efficiently handle
diverse and complex search requirements.

This enhancement was developed and refined by multiple developers
(@jackluo923 @Bill-hbrhbr @itschrispeck @lnbest0707-uber) over several
iterations; this PR consolidates those internal changes for contribution to
OSS.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
