Hey Rahul,

This is really cool! Thanks for all the time you put into writing this. I think efforts like this give us a lot of opportunities to reach new communities.
I noticed last week that another contributor opened a JIRA for a Solr plugin. There might be a good opportunity for the two of you to join efforts, as I believe he has likely started working on a Lucene reader as part of his Solr work. Would you like to post a link to your work on GitHub or another public host for your code?

https://issues.apache.org/jira/browse/DRILL-3585

On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <[email protected]> wrote:

> Hi,
>
> I'm pretty new around here but I just wanted to tell you how much your work
> can benefit us. This is great!
>
> Look forward to trying it out.
>
> Regards,
> -Stefán
>
> On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <[email protected]> wrote:
>
> > Hello Drillers,
> >
> > I have been working on a lucene format plugin. In its current state, the
> > sample query below successfully searches a lucene index and returns the
> > results.
> >
> > select path from dfs_test.`/search-index` where contents='maxItemsPerBlock'
> > and contents = 'BlockTreeTermsIndex'
> >
> > *High Level Overview of Current Implementation:*
> >
> > *Parallelization:* A lucene segment is the lowest level of
> > parallelization.
> >
> > *Filter Pushdown:* Currently the format plugin is designed to push the
> > complete filter into the scan.
> >
> > *Filter Evaluation:* Each condition in the filter is treated as a lucene
> > TermQuery
> > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
> > and multiple conditions are joined using a BooleanQuery
> > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
> > If we *do not* use a TermQuery, then we have to know the exact type of
> > Analyzer
> > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
> > to use with each field in the query.
> > Ex: the 'contents' field might have been analyzed using a StandardAnalyzer
> > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
> > and the 'path' field might not have been analyzed at all.
> > If desired, support for raw lucene queries with a reserved word should be
> > easy to add.
> > Ex: select * from dfs.`search-index` where searchQuery =
> > "+contents:maxItemsPerBlock +path:/home/file.txt";
> >
> > *Converting SqlFilter to Lucene Query:* Currently only the "=" and "!="
> > operators are handled while converting a sql filter into a lucene query.
> > For indexed fields this might be sufficient to handle a good number of
> > cases. For non-indexed fields, operators such as ">", "<", and "like"
> > need to be handled.
> >
> > *FileSystems:* Currently the format plugin only works on a local
> > filesystem.
> >
> > Though far from complete, I want to work with the community to get some
> > feedback and avoid any chance of duplication of work. Kindly let me know
> > your thoughts.
> >
> > - Rahul
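
To make the filter conversion described in the quoted message a bit more concrete, here is a minimal sketch of how "=" and "!=" conditions joined by AND could be turned into a Lucene classic query string. This is not the plugin's code (the plugin is Java and builds TermQuery/BooleanQuery objects). The names `sql_filter_to_lucene` and `escape_term` are hypothetical, the (field, operator, value) tuple representation of the filter is an assumption, and mapping "=" to a required "+" clause and "!=" to a prohibited "-" clause is one plausible reading of the description.

```python
import re

# Characters that have special meaning in Lucene's classic query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&])')


def escape_term(value):
    """Backslash-escape Lucene query syntax characters in a term."""
    return _SPECIAL.sub(r'\\\1', value)


def sql_filter_to_lucene(conditions):
    """Convert AND-ed (field, op, value) conditions to a query string.

    Only "=" and "!=" are handled, mirroring the operators the quoted
    message says are currently supported. This is an illustrative
    assumption, not the plugin's actual conversion logic.
    """
    clauses = []
    for field, op, value in conditions:
        if op == "=":
            prefix = "+"   # clause must match
        elif op == "!=":
            prefix = "-"   # clause must not match
        else:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{prefix}{field}:{escape_term(value)}")
    return " ".join(clauses)


if __name__ == "__main__":
    # The two conditions from the sample query in the quoted message.
    print(sql_filter_to_lucene([
        ("contents", "=", "maxItemsPerBlock"),
        ("contents", "=", "BlockTreeTermsIndex"),
    ]))
```

Operators such as ">", "<", and "like" raise an error here on purpose, since the quoted message says they are not yet handled.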
