Hello Drillers,

I have been working on a Lucene format plugin. In its current state, the sample query below successfully searches a Lucene index and returns results.

select path from dfs_test.`/search-index` where contents = 'maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'

*High Level Overview of Current Implementation:*

*Parallelization:* A Lucene segment is the lowest level of parallelization.

*Filter Pushdown:* Currently the format plugin is designed to push the complete filter into the scan.

*Filter Evaluation:* Each condition in the filter is treated as a Lucene TermQuery <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>, and multiple conditions are joined using a BooleanQuery <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.

If we *do not* use a TermQuery, then we have to know the exact type of Analyzer <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html> to use with each field in the query. Ex: the 'contents' field might have been analyzed using a StandardAnalyzer <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>, while the 'path' field might not have been analyzed at all.

If desired, support for raw Lucene queries through a reserved word should be easy to add. Ex:

select * from dfs.`search-index` where searchQuery = "+contents:maxItemsPerBlock +path:/home/file.txt";

*Converting SqlFilter to Lucene Query:* Currently only the "=" and "!=" operators are handled when converting a SQL filter into a Lucene query. For indexed fields this might be sufficient to cover a good number of cases. For non-indexed fields, operators such as ">", "<" and "like" still need to be handled.

*FileSystems:* Currently the format plugin only works on a local filesystem.

Though this is far from complete, I want to work with the community to get some feedback and avoid any duplication of work. Kindly let me know your thoughts.

- Rahul
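
P.S. To make the "=" / "!=" conversion a bit more concrete, here is a rough, self-contained sketch. It is not the plugin code. The plugin builds TermQuery and BooleanQuery objects, whereas this sketch only renders the equivalent query string in the "+field:value" / "-field:value" style used in the searchQuery example above. The class name, the Condition holder, the Op enum and the hand-rolled escape helper are all hypothetical names made up for illustration. I am assuming that "=" maps to a required clause ("+") and "!=" to a prohibited clause ("-").

```java
import java.util.ArrayList;
import java.util.List;

public class FilterToLuceneQuerySketch {

    /** The two comparison operators the conversion currently handles. */
    enum Op { EQ, NE }

    /** Hypothetical holder for one SQL condition such as contents = 'foo'. */
    static final class Condition {
        final String field;
        final Op op;
        final String value;

        Condition(String field, Op op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    // Characters treated as syntax by the classic query parser (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character in a term value. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Renders each condition as a required (+) or prohibited (-) clause. */
    static String toQueryString(List<Condition> conditions) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conditions) {
            String prefix = (c.op == Op.EQ) ? "+" : "-";
            clauses.add(prefix + c.field + ":" + escape(c.value));
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<Condition> filter = new ArrayList<>();
        filter.add(new Condition("contents", Op.EQ, "maxItemsPerBlock"));
        filter.add(new Condition("contents", Op.EQ, "BlockTreeTermsIndex"));
        // Prints: +contents:maxItemsPerBlock +contents:BlockTreeTermsIndex
        System.out.println(toQueryString(filter));
    }
}
```

One thing to keep in mind with this mapping: as far as I know, a Lucene boolean query made up only of prohibited clauses matches nothing, so a filter consisting solely of "!=" conditions would probably need an extra match-all clause.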
