Hi Rahul,

I will start working on this later this week and over the weekend. I'm not sure how long it will take me to become productive, but hopefully I will be able to share something soon.
I will fork your repo on GitHub. Can you please make sure it's up to date with master? I'm assuming that it runs in its current state so I can get straight to work :).

Best regards,
-Stefan

On Sun, Aug 23, 2015 at 1:28 AM, rahul challapalli <[email protected]> wrote:

> Hi Stefan,
>
> I was not able to make any further progress on this. Below is a list of
> things to do, from a high level:
>
> 1. Clean up LuceneScanSpec: The current implementation serializes a lot of
>    low-level state information to serialize/de-serialize Lucene's
>    SegmentReader. This has to be changed; otherwise the plugin is tightly
>    coupled to Lucene's implementation details.
> 2. Serialization of the Lucene Query object.
> 3. Convert the SQL filter into a Lucene Query object: I just started it and
>    made it work in the simplest case. You can take a look at it here:
>
>    https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
>
>    As part of the ElasticSearch storage plugin, Andrew has converted the
>    SQL filter to an Elasticsearch query. It looks like he handled many
>    cases. We can leverage this for the Lucene format plugin. Below is his
>    code:
>
>    https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
> 4. Currently the Lucene format plugin does not work on HDFS/MaprFs. This
>    should be handled.
> 5. Pushing aggregate functions and limits into the scan. (This will be an
>    improvement.)
> 6. Testing
>
> I want to work on (1) sometime next week.
>
> - Rahul
>
> On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter <[email protected]>
> wrote:
>
>> Hi Rahul,
>>
>> Can you elaborate a bit on the status of the Lucene plugin and what needs
>> to be done before using it?
>>
>> Also let me know if there are specific things that need improving.
>> We want to try using it in our project, and perhaps we can contribute
>> something meaningful.
>>
>> Regards,
>> -Stefan
>>
>> On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee <[email protected]>
>> wrote:
>>
>>> Hi Rahul,
>>>
>>> Thanks for sharing your code. I was trying to build a plugin for the Solr
>>> engine, but I thought of using Solr's REST API to run the queries, get
>>> schema metadata, etc.
>>> The goal for me is to expose a Solr engine to tools like Tableau or MS
>>> Excel so that users can work with it there.
>>>
>>> I am still very new to this and there is a learning curve. It would be
>>> great if you could comment on or review whatever I've done so far.
>>>
>>> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>>>
>>> Thanks,
>>> Sudip
>>>
>>> -----Original Message-----
>>> From: rahul challapalli [mailto:[email protected]]
>>> Sent: 10 August 2015 AM 05:21
>>> To: [email protected]
>>> Subject: Re: Lucene Format Plugin
>>>
>>> Below is the link to my branch which contains the changes related to the
>>> format plugin.
>>>
>>> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>>>
>>> Any thoughts on how to handle contributions like this which still have
>>> some work to be done?
>>>
>>> - Rahul
>>>
>>> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <[email protected]>
>>> wrote:
>>>
>>> > Thanks Jason.
>>> >
>>> > I want to look at the Solr plugin and see where we can collaborate or
>>> > if we already duplicated part of the effort.
>>> >
>>> > I still need to push a few commits. I will share the code once I get
>>> > these changes pushed.
>>> >
>>> > - Rahul
>>> >
>>> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <[email protected]>
>>> > wrote:
>>> >
>>> >> Hey Rahul,
>>> >>
>>> >> This is really cool! Thanks for all of the time you put into writing
>>> >> this. I think we have a lot of available opportunities to reach new
>>> >> communities with efforts like this.
>>> >>
>>> >> I noticed last week another contributor opened a JIRA for a Solr
>>> >> plugin. There might be a good opportunity for the two of you to join
>>> >> efforts, as I believe he likely started working on a Lucene reader as
>>> >> part of his Solr work.
>>> >>
>>> >> Would you like to post a link to your work on GitHub or another
>>> >> public host of your code?
>>> >>
>>> >> https://issues.apache.org/jira/browse/DRILL-3585
>>> >>
>>> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > Hi,
>>> >> >
>>> >> > I'm pretty new around here, but I just wanted to tell you how much
>>> >> > your work can benefit us. This is great!
>>> >> >
>>> >> > Look forward to trying it out.
>>> >> >
>>> >> > Regards,
>>> >> > -Stefán
>>> >> >
>>> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <[email protected]>
>>> >> > wrote:
>>> >> >
>>> >> > > Hello Drillers,
>>> >> > >
>>> >> > > I have been working on a Lucene format plugin. In its current
>>> >> > > state, the sample query below successfully searches a Lucene index
>>> >> > > and returns the results.
>>> >> > >
>>> >> > > select path from dfs_test.`/search-index` where
>>> >> > > contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
>>> >> > >
>>> >> > > *High Level Overview of Current Implementation:*
>>> >> > >
>>> >> > > *Parallelization:* A Lucene segment is the lowest level of
>>> >> > > parallelization.
>>> >> > > *Filter Pushdown:* Currently the format plugin is designed to
>>> >> > > push the complete filter into the scan.
>>> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
>>> >> > > Lucene TermQuery
>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
>>> >> > > and multiple conditions are joined using a BooleanQuery
>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
>>> >> > > If we *do not* use a TermQuery, then we have to know the exact
>>> >> > > type of Analyzer
>>> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
>>> >> > > to use with each field in the query.
>>> >> > > Ex: the 'contents' field might have been analyzed using a
>>> >> > > StandardAnalyzer
>>> >> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
>>> >> > > and the 'path' field might not have been analyzed at all.
>>> >> > > If desired, support for raw Lucene queries with a reserved word
>>> >> > > should be easy to add.
>>> >> > > Ex: select * from dfs.`search-index` where searchQuery =
>>> >> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
>>> >> > > *Converting SqlFilter to Lucene Query:* Currently only the "=" and
>>> >> > > "!=" operators are handled while converting a SQL filter into a
>>> >> > > Lucene query. For indexed fields this might be sufficient to handle
>>> >> > > a good number of cases. For non-indexed fields, operators like ">",
>>> >> > > "<", "like", etc. need to be handled.
>>> >> > > *FileSystems:* Currently the format plugin only works on a local
>>> >> > > filesystem.
>>> >> > >
>>> >> > > Though far from complete, I want to work with the community to
>>> >> > > get some feedback and avoid any chance of duplication of work.
>>> >> > > Kindly let me know your thoughts.
>>> >> > >
>>> >> > > - Rahul
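
Illustrative sketch (not code from this thread): the original message says each "=" or "!=" condition becomes a TermQuery, that conditions are joined with a BooleanQuery, and that a raw query string such as "+contents:maxItemsPerBlock +path:/home/file.txt" could be supported. The dependency-free Java below only renders that mapping as a string in Lucene's classic query syntax, where "+" marks a required clause and "-" a prohibited one. The class FilterToQueryStringSketch, the Condition type, and both methods are hypothetical names made up for this example; they are not part of Drill, Lucene, or Rahul's branch, which builds Query objects instead of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Drill or Lucene. */
final class FilterToQueryStringSketch {

    /** One SQL comparison of a field against a literal; only "=" and "!=" are modelled. */
    static final class Condition {
        final String field;
        final String value;
        final boolean negated; // true for "!=", false for "="

        Condition(String field, String value, boolean negated) {
            this.field = field;
            this.value = value;
            this.negated = negated;
        }
    }

    // Characters that carry special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes special characters in a term. Whitespace is left untouched. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Renders an AND-ed list of conditions: "=" becomes "+field:value", "!=" becomes "-field:value". */
    static String toQueryString(List<Condition> conjunction) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conjunction) {
            String occur = c.negated ? "-" : "+";
            // Field names are assumed to contain no special characters.
            clauses.add(occur + c.field + ":" + escape(c.value));
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // The two conditions from the sample query in the original message.
        List<Condition> filter = new ArrayList<>();
        filter.add(new Condition("contents", "maxItemsPerBlock", false));
        filter.add(new Condition("contents", "BlockTreeTermsIndex", false));
        System.out.println(toQueryString(filter));
    }
}
```

The sketch escapes "/" because the classic query syntax in recent Lucene versions treats it as a regular-expression delimiter, so a path value is rendered as \/home\/file.txt rather than in the unescaped form shown in the message. Values that contain whitespace, and filters that use OR, would need extra handling that is not shown here.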
