Stefan, I have some changes to push. I will push them and also rebase the branch on top of the latest master. I will do it sometime tomorrow.
- Rahul

On Tue, Aug 25, 2015 at 11:49 PM, Stefán Baxter <ste...@activitystream.com> wrote:

> Hi Rahul,
>
> I will start working on this later this week and over the weekend. I'm not
> sure how long it will take me to become productive but hopefully I will be
> able to share something soon.
>
> I will fork your repo on github. Can you please make sure it's up to date
> with master?
> I'm assuming that it runs in its current state so I can get straight to work
> :).
>
> Best regards,
> -Stefan
>
> On Sun, Aug 23, 2015 at 1:28 AM, rahul challapalli <challapallira...@gmail.com> wrote:
>
>> Hi Stefan,
>>
>> I was not able to make any further progress on this. Below is a list of
>> things to do, at a high level:
>>
>> 1. Clean up LuceneScanSpec: The current implementation serializes a lot
>> of low-level state information to serialize/de-serialize lucene's
>> SegmentReader. This has to be changed; otherwise the plugin is tightly
>> coupled to Lucene's implementation details.
>> 2. Serialization of the Lucene Query object.
>> 3. Convert the Sql filter into a Lucene Query object: I just started it and
>> made it work in the simplest case. You can take a look at it here:
>>
>> https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
>> As part of the ElasticSearch storage plugin, Andrew has converted the
>> sql filter to an Elastic Search Query. It looks like he handled many cases. We
>> can leverage this for the Lucene format plugin. Below is his code:
>>
>> https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
>> 4. Currently the lucene format plugin does not work on HDFS/MaprFs. This
>> should be handled.
>> 5. Pushing Agg functions and Limits into the scan. (This will be an
>> improvement.)
>> 6. Testing
>>
>> I want to work on (1) sometime next week.
>>
>> - Rahul
>>
>>
>> On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter <ste...@activitystream.com> wrote:
>>
>>> Hi Rahul,
>>>
>>> Can you elaborate a bit on the status of the Lucene plugin and what
>>> needs to be done before using it?
>>>
>>> Also let me know if there are specific things that need improving. We
>>> want to try using it in our project and perhaps we can contribute
>>> something meaningful.
>>>
>>> Regards,
>>> -Stefan
>>>
>>>
>>>
>>> On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee <smukher...@commvault.com> wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> Thanks for sharing your code. I was trying to build a plugin for the Solr
>>>> engine, but I thought of using Solr's REST API to do the queries, get
>>>> schema metadata info, etc.
>>>> The goal for me is to expose a Solr engine to tools like Tableau or MS
>>>> Excel so that users can work with the data there.
>>>>
>>>> I am still very new to this and there is a learning curve. It would be
>>>> great if you could comment on/review whatever I've done so far.
>>>>
>>>> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>>>>
>>>> Thanks,
>>>> Sudip
>>>>
>>>> -----Original Message-----
>>>> From: rahul challapalli [mailto:challapallira...@gmail.com]
>>>> Sent: 10 August 2015 AM 05:21
>>>> To: dev@drill.apache.org
>>>> Subject: Re: Lucene Format Plugin
>>>>
>>>> Below is the link to my branch which contains the changes related to
>>>> the format plugin.
>>>>
>>>> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>>>>
>>>> Any thoughts on how to handle contributions like this which still have
>>>> some work to be done?
>>>>
>>>> - Rahul
>>>>
>>>>
>>>> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <challapallira...@gmail.com> wrote:
>>>>
>>>> > Thanks Jason.
>>>> >
>>>> > I want to look at the solr plugin and see where we can collaborate or
>>>> > if we already duplicated part of the effort.
>>>> >
>>>> > I still need to push a few commits.
>>>> > I will share the code once I get these changes pushed.
>>>> >
>>>> > - Rahul
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <altekruseja...@gmail.com> wrote:
>>>> >
>>>> >> Hey Rahul,
>>>> >>
>>>> >> This is really cool! Thanks for all of the time you put into writing
>>>> >> this. I think we have a lot of available opportunities to reach new
>>>> >> communities with efforts like this.
>>>> >>
>>>> >> I noticed last week another contributor opened a JIRA for a solr
>>>> >> plugin. There might be a good opportunity for the two of you to join
>>>> >> efforts, as I believe he likely started working on a lucene reader as
>>>> >> part of his solr work.
>>>> >>
>>>> >> Would you like to post a link to your work on Github or another
>>>> >> public host of your code?
>>>> >>
>>>> >> https://issues.apache.org/jira/browse/DRILL-3585
>>>> >>
>>>> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <ste...@activitystream.com> wrote:
>>>> >>
>>>> >> > Hi,
>>>> >> >
>>>> >> > I'm pretty new around here but I just wanted to tell you how much
>>>> >> > your work can benefit us. This is great!
>>>> >> >
>>>> >> > Look forward to trying it out.
>>>> >> >
>>>> >> > Regards,
>>>> >> > -Stefán
>>>> >> >
>>>> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <challapallira...@gmail.com> wrote:
>>>> >> >
>>>> >> > > Hello Drillers,
>>>> >> > >
>>>> >> > > I have been working on a lucene format plugin. In its current
>>>> >> > > state, the sample query below successfully searches a lucene index
>>>> >> > > and returns the results.
>>>> >> > >
>>>> >> > > select path from dfs_test.`/search-index` where contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > *High Level Overview of Current Implementation:*
>>>> >> > >
>>>> >> > > *Parallelization:* A lucene segment is the lowest level of
>>>> >> > > parallelization.
>>>> >> > > *Filter Pushdown:* Currently the format plugin is designed to
>>>> >> > > push the complete filter into the scan.
>>>> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
>>>> >> > > lucene TermQuery
>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
>>>> >> > > and multiple conditions are joined using a BooleanQuery
>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
>>>> >> > > If we *do not* use a TermQuery, then we have to know the exact
>>>> >> > > type of Analyzer
>>>> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
>>>> >> > > to use with each field in the query.
>>>> >> > > Ex: the 'contents' field might have been analyzed using a
>>>> >> > > StandardAnalyzer
>>>> >> > > <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
>>>> >> > > and the 'path' field might not have been analyzed at all.
>>>> >> > > If desired, support for raw lucene queries with a reserved word
>>>> >> > > should be easy to add.
>>>> >> > > Ex: select * from dfs.`search-index` where searchQuery =
>>>> >> > > "+contents:maxItemsPerBlock +path:/home/file.txt";
>>>> >> > > *Converting SqlFilter to Lucene Query:* Currently only the "=" and
>>>> >> > > "!=" operators are handled while converting a sql filter into a
>>>> >> > > lucene query. For indexed fields this might be sufficient to handle
>>>> >> > > a good number of cases. For non-indexed fields, operators like ">",
>>>> >> > > "<", "like", etc. need to be handled.
>>>> >> > > *FileSystems:* Currently the format plugin only works on a local
>>>> >> > > filesystem.
>>>> >> > >
>>>> >> > >
>>>> >> > > Though far from complete, I want to work with the community to
>>>> >> > > get some feedback and avoid any chance of duplication of work.
>>>> >> > > Kindly let me know your thoughts.
>>>> >> > >
>>>> >> > > - Rahul
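
For readers following the thread, the filter conversion described in the original message (each "=" or "!=" condition becomes one term clause, and the clauses are joined into a boolean query) can be illustrated with a small standalone sketch. This is not code from the plugin, which per the thread builds Lucene TermQuery and BooleanQuery objects in Java. The sketch below instead emits the query-string form used in the searchQuery example, and it assumes the classic Lucene query syntax, where "+" marks a required clause and "-" a prohibited one. The names to_lucene_query and escape_term, and the (field, op, value) tuple shape, are hypothetical.

```python
# Hypothetical sketch: illustrates the idea from the thread, not the plugin's code.
from typing import List, Tuple

# Characters assumed to be special in the classic Lucene query syntax.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(value: str) -> str:
    """Backslash-escape query-syntax characters in a literal term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in value)


def to_lucene_query(conditions: List[Tuple[str, str, str]]) -> str:
    """Turn AND-ed (field, op, value) conditions into a query string.

    "="  becomes a required clause   (+field:value)
    "!=" becomes a prohibited clause (-field:value)
    """
    clauses = []
    for field, op, value in conditions:
        if op == "=":
            prefix = "+"
        elif op == "!=":
            prefix = "-"
        else:
            # Mirrors the limitation stated in the thread: only = and != are handled.
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{prefix}{field}:{escape_term(value)}")
    return " ".join(clauses)


if __name__ == "__main__":
    # The two conditions from the sample SQL query in the thread.
    print(to_lucene_query([
        ("contents", "=", "maxItemsPerBlock"),
        ("contents", "=", "BlockTreeTermsIndex"),
    ]))
```

Note that a query string like this would normally pass through a query parser and an Analyzer, which is exactly the dependency the TermQuery approach in the thread avoids; the sketch only shows how the operators map to clauses.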