Stefan, I rebased my branch on top of latest master. Let me know if you hit any issues.
- Rahul

On Wed, Aug 26, 2015 at 11:46 AM, rahul challapalli <[email protected]> wrote:

> Stefan,
>
> I have some changes to push. I will push them and also rebase the branch
> on top of the latest master. I will do it sometime tomorrow.
>
> - Rahul
>
> On Tue, Aug 25, 2015 at 11:49 PM, Stefán Baxter <[email protected]> wrote:
>
>> Hi Rahul,
>>
>> I will start working on this later this week and over the weekend. I'm
>> not sure how long it will take me to become productive, but hopefully I
>> will be able to share something soon.
>>
>> I will fork your repo on GitHub. Can you please make sure it's up to date
>> with master?
>> I'm assuming that it runs in its current state, so I can get straight to
>> work :).
>>
>> Best regards,
>> -Stefan
>>
>> On Sun, Aug 23, 2015 at 1:28 AM, rahul challapalli <[email protected]> wrote:
>>
>>> Hi Stefan,
>>>
>>> I was not able to make any further progress on this. Below is a
>>> high-level list of things to do:
>>>
>>> 1. Clean up LuceneScanSpec: The current implementation serializes a lot
>>>    of low-level state information in order to serialize/deserialize
>>>    Lucene's SegmentReader. This has to change; otherwise the plugin is
>>>    tightly coupled to Lucene's implementation details.
>>> 2. Serialization of the Lucene Query object.
>>> 3. Convert the SQL filter into a Lucene Query object: I just started it
>>>    and made it work in the simplest case. You can take a look at it here:
>>>    https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
>>>    As part of the ElasticSearch storage plugin, Andrew has converted the
>>>    SQL filter to an ElasticSearch query. It looks like he handled many
>>>    cases, so we can leverage this for the Lucene format plugin. Below is
>>>    his code:
>>>    https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
>>> 4. Currently the Lucene format plugin does not work on HDFS/MapR-FS.
>>>    This should be handled.
>>> 5. Pushing aggregate functions and limits into the scan. (This will be an
>>>    improvement.)
>>> 6. Testing.
>>>
>>> I want to work on (1) sometime next week.
>>>
>>> - Rahul
>>>
>>>
>>> On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter <[email protected]> wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> Can you elaborate a bit on the status of the Lucene plugin and what
>>>> needs to be done before using it?
>>>>
>>>> Also let me know if there are specific things that need improving. We
>>>> want to try using it in our project, and perhaps we can contribute
>>>> something meaningful.
>>>>
>>>> Regards,
>>>> -Stefan
>>>>
>>>>
>>>>
>>>> On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee <[email protected]> wrote:
>>>>
>>>>> Hi Rahul,
>>>>>
>>>>> Thanks for sharing your code. I have been trying to build a plugin for
>>>>> the Solr engine, but I chose to use Solr's REST API to run the queries,
>>>>> get schema metadata, etc.
>>>>> My goal is to expose a Solr engine to tools like Tableau or MS Excel so
>>>>> that users can work with the data there.
>>>>>
>>>>> I am still very new to this and there is a learning curve. It would be
>>>>> great if you could comment on/review what I've done so far.
>>>>>
>>>>> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>>>>>
>>>>> Thanks,
>>>>> Sudip
>>>>>
>>>>> -----Original Message-----
>>>>> From: rahul challapalli [mailto:[email protected]]
>>>>> Sent: 10 August 2015 AM 05:21
>>>>> To: [email protected]
>>>>> Subject: Re: Lucene Format Plugin
>>>>>
>>>>> Below is the link to my branch, which contains the changes related to
>>>>> the format plugin.
>>>>>
>>>>> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>>>>>
>>>>> Any thoughts on how to handle contributions like this, which still have
>>>>> some work to be done?
>>>>>
>>>>> - Rahul
>>>>>
>>>>>
>>>>> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <[email protected]> wrote:
>>>>>
>>>>> > Thanks Jason.
>>>>> >
>>>>> > I want to look at the Solr plugin and see where we can collaborate,
>>>>> > or whether we have already duplicated part of the effort.
>>>>> >
>>>>> > I still need to push a few commits. I will share the code once I get
>>>>> > these changes pushed.
>>>>> >
>>>>> > - Rahul
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <[email protected]> wrote:
>>>>> >
>>>>> >> Hey Rahul,
>>>>> >>
>>>>> >> This is really cool! Thanks for all of the time you put into writing
>>>>> >> this. I think efforts like this give us a lot of opportunities to
>>>>> >> reach new communities.
>>>>> >>
>>>>> >> I noticed last week that another contributor opened a JIRA for a Solr
>>>>> >> plugin. There might be a good opportunity for the two of you to join
>>>>> >> efforts, as I believe he likely started working on a Lucene reader as
>>>>> >> part of his Solr work.
>>>>> >>
>>>>> >> Would you like to post a link to your work on GitHub or another
>>>>> >> public host of your code?
>>>>> >>
>>>>> >> https://issues.apache.org/jira/browse/DRILL-3585
>>>>> >>
>>>>> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <[email protected]> wrote:
>>>>> >>
>>>>> >> > Hi,
>>>>> >> >
>>>>> >> > I'm pretty new around here, but I just wanted to tell you how much
>>>>> >> > your work can benefit us. This is great!
>>>>> >> >
>>>>> >> > I look forward to trying it out.
>>>>> >> >
>>>>> >> > Regards,
>>>>> >> > -Stefán
>>>>> >> >
>>>>> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <[email protected]> wrote:
>>>>> >> >
>>>>> >> > > Hello Drillers,
>>>>> >> > >
>>>>> >> > > I have been working on a Lucene format plugin.
>>>>> >> > > In its current state, the sample query below successfully
>>>>> >> > > searches a Lucene index and returns the results.
>>>>> >> > >
>>>>> >> > > select path from dfs_test.`/search-index` where
>>>>> >> > > contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
>>>>> >> > >
>>>>> >> > >
>>>>> >> > >
>>>>> >> > > *High Level Overview of Current Implementation:*
>>>>> >> > >
>>>>> >> > > *Parallelization:* A Lucene segment is the lowest level of
>>>>> >> > > parallelization.
>>>>> >> > > *Filter Pushdown:* Currently the format plugin is designed to
>>>>> >> > > push the complete filter into the scan.
>>>>> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
>>>>> >> > > Lucene TermQuery
>>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
>>>>> >> > > and multiple conditions are joined using a BooleanQuery
>>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
>>>>> >> > > If we *do not* use a TermQuery, then we have to know the exact
>>>>> >> > > type of Analyzer
>>>>> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
>>>>> >> > > to use with each field in the query.
>>>>> >> > > Ex: the 'contents' field might have been analyzed using a
>>>>> >> > > StandardAnalyzer
>>>>> >> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
>>>>> >> > > and the 'path' field might not have been analyzed at all.
>>>>> >> > > If desired, support for raw Lucene queries with a reserved word
>>>>> >> > > should be easy to add.
>>>>> >> > > Ex: select * from dfs.`search-index` where searchQuery =
>>>>> >> > > '+contents:maxItemsPerBlock +path:/home/file.txt';
>>>>> >> > > *Converting SqlFilter to Lucene Query:* Currently only the "=" and
>>>>> >> > > "!=" operators are handled while converting a SQL filter into a
>>>>> >> > > Lucene query. For indexed fields this might be sufficient to
>>>>> >> > > handle a good number of cases. For non-indexed fields, operators
>>>>> >> > > like ">", "<", "like", etc. need to be handled.
>>>>> >> > > *FileSystems:* Currently the format plugin only works on a local
>>>>> >> > > filesystem.
>>>>> >> > >
>>>>> >> > >
>>>>> >> > > Though it is far from complete, I want to work with the community
>>>>> >> > > to get some feedback and avoid any duplication of work. Kindly let
>>>>> >> > > me know your thoughts.
>>>>> >> > >
>>>>> >> > > - Rahul
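The filter-evaluation notes in the thread describe each "=" condition becoming a Lucene TermQuery, "!=" being supported too, and all conditions being joined in a BooleanQuery. Below is a minimal, Lucene-free sketch of that translation, assuming an AND-only WHERE clause. `LuceneFilterSketch`, `Condition`, `Clause`, and the `searchQuery` pass-through are hypothetical illustrations, not classes from the plugin or from Lucene. The result is rendered in the `+field:term` / `-field:term` notation that Lucene's `BooleanQuery.toString()` prints, which makes the mapping easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of the SQL-filter-to-Lucene translation discussed above.
// Clause stands in for a TermQuery wrapped in a BooleanClause; nothing here is the plugin's code.
final class LuceneFilterSketch {

    /** One SQL comparison from an AND-ed WHERE clause, e.g. contents = 'maxItemsPerBlock'. */
    record Condition(String field, String op, String value) {}

    /** Mirrors the two BooleanClause.Occur values that "=" and "!=" need. */
    enum Occur { MUST, MUST_NOT }

    /** An exact, un-analyzed term on one field (what a TermQuery matches). */
    record Clause(Occur occur, String field, String term) {
        @Override
        public String toString() {
            // Same prefixes BooleanQuery.toString() prints: '+' required, '-' prohibited.
            return (occur == Occur.MUST ? "+" : "-") + field + ":" + term;
        }
    }

    /** Hypothetical reserved column whose value is passed through as a raw Lucene query. */
    static final String RAW_QUERY_COLUMN = "searchQuery";

    /** Maps each conjunct to a clause; any operator other than "=" or "!=" is rejected. */
    static List<Clause> toClauses(List<Condition> conjuncts) {
        List<Clause> clauses = new ArrayList<>();
        for (Condition c : conjuncts) {
            switch (c.op()) {
                case "=" -> clauses.add(new Clause(Occur.MUST, c.field(), c.value()));
                case "!=" -> clauses.add(new Clause(Occur.MUST_NOT, c.field(), c.value()));
                default -> throw new UnsupportedOperationException(
                        "operator not pushed down: " + c.op());
            }
        }
        return clauses;
    }

    /** Renders the conjunction in Lucene's +field:term / -field:term notation. */
    static String render(List<Condition> conjuncts) {
        // A lone searchQuery = '...' condition bypasses translation entirely.
        if (conjuncts.size() == 1) {
            Condition only = conjuncts.get(0);
            if (RAW_QUERY_COLUMN.equals(only.field()) && "=".equals(only.op())) {
                return only.value();
            }
        }
        List<Clause> clauses = toClauses(conjuncts);
        StringBuilder sb = new StringBuilder();
        // A Lucene BooleanQuery matches nothing when every clause is MUST_NOT,
        // so prepend a required match-all clause (*:*) in that case.
        if (clauses.stream().noneMatch(cl -> cl.occur() == Occur.MUST)) {
            sb.append("+*:*");
        }
        for (Clause cl : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(cl);
        }
        return sb.toString();
    }
}
```

For the sample query in the thread, `render` yields `+contents:maxItemsPerBlock +contents:BlockTreeTermsIndex`, and `path != '/home/file.txt'` on its own yields `+*:* -path:/home/file.txt`. A real implementation would build `TermQuery`/`BooleanQuery` objects rather than strings. Lucene's `toString()` output is meant for display and does not escape special characters, so it should not be relied on as the serialization format mentioned in to-do item 2.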

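The overview says a Lucene segment is the lowest level of parallelization. In practice, a scan can be split into at most as many work units as the index has segments. The sketch below shows that constraint by distributing segment names across scan fragments. It assumes a hypothetical round-robin policy, and `SegmentAssignmentSketch` is an illustrative name. Drill's planner would normally also weigh data locality, and the `_0`, `_1`, ... names simply follow Lucene's segment-naming convention as example input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: segments are indivisible, so each one goes to exactly one fragment.
final class SegmentAssignmentSketch {

    /**
     * Distributes segment names across at most maxWidth fragments, round-robin.
     * The effective width is capped by the segment count, since a segment is never split.
     */
    static List<List<String>> assign(List<String> segments, int maxWidth) {
        if (maxWidth < 1) {
            throw new IllegalArgumentException("maxWidth must be >= 1");
        }
        int width = Math.min(maxWidth, segments.size());
        List<List<String>> fragments = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            fragments.add(new ArrayList<>());
        }
        // Segment i goes to fragment i mod width; with no segments this loop never runs.
        for (int i = 0; i < segments.size(); i++) {
            fragments.get(i % width).add(segments.get(i));
        }
        return fragments;
    }
}
```

With five segments and a requested width of two, fragment 0 receives `_0`, `_2`, `_4` and fragment 1 receives `_1`, `_3`. Asking for eight fragments over a two-segment index still yields only two, which is the practical consequence of segment-level parallelization.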