jackluo923 opened a new pull request, #13003:
URL: https://github.com/apache/pinot/pull/13003

   In this pull request, we introduce two feature enhancements that build upon
#12027 [Configurable Lucene Analyzer] and make our Lucene-based text processing
more flexible:
   
   1. **Enhanced Flexibility for Custom Lucene Analyzers:**
   We've added backward-compatible support for passing arbitrary arguments, of
varying types, to custom Lucene analyzers. The arguments are specified in the
table config, so analyzer behavior can be customized at runtime and the
tokenization process can be adapted to different requirements without changing
the underlying codebase.
   
   2. **Configurable Lucene Query Parser:**
   We've added the ability to configure which Lucene query parser is used at
runtime. This makes it possible to tailor query parsing to specific use cases
and to improve the efficiency and relevance of searches. At the moment, we only
support query parsers that extend Lucene's `QueryParserBase` class, provide a
constructor taking the field name and an analyzer (`(String field, Analyzer
analyzer)`, the same shape as Lucene's classic `QueryParser`), and expose the
`Query parse(String query)` instance method. In the future, we may add support
for more complex query parsers such as
[MultiFieldQueryParser](https://lucene.apache.org/core/7_2_0/queryparser/org/apache/lucene/queryparser/classic/MultiFieldQueryParser.html),
which does not fit this contract: its constructor and its static `parse`
methods take an array of fields rather than a single field.
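
   The analyzer-argument support described in item 1 can be illustrated with plain Java reflection. The following is a minimal sketch under stated assumptions, not the code in this PR: it assumes the arguments reach the analyzer's constructor as parallel lists of type names and string values taken from the table config, and `ConfigurableInstantiator`, `instantiate` and `ToyAnalyzer` are hypothetical names. `ToyAnalyzer` is a stand-in rather than a Lucene `Analyzer`, so the sketch runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (assumption, not Pinot code): builds an object from a class name
// plus parallel lists of argument type names and raw string values from a config.
final class ConfigurableInstantiator {
  private ConfigurableInstantiator() {}

  // Map a config type name to the Class used for constructor lookup.
  private static Class<?> resolveType(String typeName) {
    switch (typeName) {
      case "boolean": return boolean.class;
      case "int": return int.class;
      case "long": return long.class;
      case "double": return double.class;
      case "java.lang.String": return String.class;
      default: throw new IllegalArgumentException("Unsupported argument type: " + typeName);
    }
  }

  // Parse a raw config string into a value of the resolved type.
  private static Object parseValue(Class<?> type, String raw) {
    if (type == boolean.class) return Boolean.parseBoolean(raw);
    if (type == int.class) return Integer.parseInt(raw);
    if (type == long.class) return Long.parseLong(raw);
    if (type == double.class) return Double.parseDouble(raw);
    return raw; // java.lang.String
  }

  // Look up the public constructor matching the declared types and invoke it.
  // Empty lists select the no-arg constructor, so class-name-only configs keep working.
  static Object instantiate(String className, List<String> argTypes, List<String> argValues) {
    if (argTypes.size() != argValues.size()) {
      throw new IllegalArgumentException("argTypes and argValues must have the same length");
    }
    Class<?>[] types = new Class<?>[argTypes.size()];
    Object[] values = new Object[argTypes.size()];
    for (int i = 0; i < types.length; i++) {
      types[i] = resolveType(argTypes.get(i));
      values[i] = parseValue(types[i], argValues.get(i));
    }
    try {
      Constructor<?> ctor = Class.forName(className).getConstructor(types);
      return ctor.newInstance(values);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  // Toy stand-in for a custom analyzer (NOT a Lucene Analyzer): splits on non-word
  // characters, optionally lowercases, and drops tokens shorter than minTokenLength.
  static final class ToyAnalyzer {
    private final boolean lowercase;
    private final int minTokenLength;

    public ToyAnalyzer() {
      this(true, 1);
    }

    public ToyAnalyzer(boolean lowercase, int minTokenLength) {
      this.lowercase = lowercase;
      this.minTokenLength = minTokenLength;
    }

    List<String> tokenize(String text) {
      List<String> tokens = new ArrayList<>();
      for (String token : text.split("\\W+")) {
        if (!token.isEmpty() && token.length() >= minTokenLength) {
          tokens.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
      }
      return tokens;
    }
  }
}
```

With the types `boolean, int` and the values `false, 3`, the resulting `ToyAnalyzer` tokenizes `Error in DB at node-7` as `[Error, node]`: case is preserved and tokens shorter than three characters are dropped. Passing empty lists falls back to the no-arg constructor, which is one way to stay backward compatible with configs that only name the analyzer class.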
   
   Together, these enhancements give our production system much finer runtime
control over text tokenization. This is especially useful for precise log
search, for example supporting case-sensitive and case-insensitive searches side
by side, including wildcard and regular-expression queries. Because both the
Lucene analyzer and the query parser can be configured dynamically, the
application can handle diverse and complex search requirements efficiently.
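
   How a configurable query parser and a configurable analyzer combine to serve case-sensitive and case-insensitive searches can be sketched with Java reflection. This is a minimal sketch, not the code in this PR: `QueryParserLoader`, `ParserBase` and `TermParser` are hypothetical stand-ins for Lucene's `QueryParserBase` and a concrete parser, the analyzer is modeled as a `UnaryOperator<String>` applied to each term, and `parse` returns a string instead of a Lucene `Query`. The loader mirrors the contract stated for item 2: the configured class must extend the base class and provide a `(String field, analyzer)` constructor.

```java
import java.lang.reflect.Constructor;
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

// Hypothetical loader (assumption, not Pinot or Lucene code).
final class QueryParserLoader {
  private QueryParserLoader() {}

  // Stand-in for Lucene's QueryParserBase: returns a string instead of a Query.
  abstract static class ParserBase {
    abstract String parse(String query);
  }

  // Hypothetical parser following the (String field, analyzer) constructor contract.
  // Each whitespace-separated term is run through the analyzer and OR-ed together.
  static final class TermParser extends ParserBase {
    private final String field;
    private final UnaryOperator<String> analyzer;

    public TermParser(String field, UnaryOperator<String> analyzer) {
      this.field = field;
      this.analyzer = analyzer;
    }

    @Override
    String parse(String query) {
      StringJoiner clauses = new StringJoiner(" OR ");
      for (String term : query.trim().split("\\s+")) {
        clauses.add(field + ":" + analyzer.apply(term));
      }
      return clauses.toString();
    }
  }

  // Resolve the configured class name, reject classes that do not extend ParserBase,
  // and invoke the (String, UnaryOperator) constructor reflectively.
  static ParserBase create(String className, String field, UnaryOperator<String> analyzer) {
    try {
      Class<?> clazz = Class.forName(className);
      if (!ParserBase.class.isAssignableFrom(clazz)) {
        throw new IllegalArgumentException(className + " does not extend ParserBase");
      }
      Constructor<?> ctor = clazz.getConstructor(String.class, UnaryOperator.class);
      return (ParserBase) ctor.newInstance(field, analyzer);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load query parser " + className, e);
    }
  }
}
```

With a lowercasing analyzer, `parse("Timeout DB")` on the field `logLine` returns `logLine:timeout OR logLine:db`; with `UnaryOperator.identity()` it returns `logLine:Timeout OR logLine:DB`. In this sketch only the analyzer changes between the two parsers, which is the kind of runtime switch the paragraph above motivates.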
   
   These enhancements were contributed and refined by multiple developers
(@jackluo923 @Bill-hbrhbr @itschrispeck @lnbest0707-uber) over several
iterations; this PR consolidates those internal changes for contribution to OSS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

