Re: Lucene Format Plugin

rahul challapalli Wed, 26 Aug 2015 11:48:18 -0700

Stefan,

I have some changes to push. I will push them and also rebase the branch on
top of the latest master. I will do it sometime tomorrow.


- Rahul

On Tue, Aug 25, 2015 at 11:49 PM, Stefán Baxter <ste...@activitystream.com>
wrote:

> Hi Rahul,
>
> I will start working on this later this week and over the weekend. I'm not
> sure how long it will take me to become productive but hopefully I will be
> able to share something soon.
>
> I will fork your repo on GitHub. Can you please make sure it's up to date
> with master?
> I'm assuming that it runs in its current state so I can get straight to
> work :).
>
> Best regards,
>  -Stefan
>
> On Sun, Aug 23, 2015 at 1:28 AM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
>> Hi Stefan,
>>
>> I was not able to make any further progress on this. Below is a high-level
>> list of things to do:
>>
>> 1. Clean up LuceneScanSpec: The current implementation serializes a lot
>> of low-level state in order to serialize/deserialize Lucene's
>> SegmentReader. This has to change; otherwise the plugin stays tightly
>> coupled to Lucene's implementation details.
>> 2. Serialization of Lucene Query object
>> 3. Convert the SQL filter into a Lucene Query object: I just started on it
>> and made it work in the simplest case. You can take a look at it here:
>>
>> https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
>>     As part of the Elasticsearch storage plugin, Andrew has converted the
>>     SQL filter to an Elasticsearch query. It looks like he handled many
>>     cases. We can leverage this for the Lucene format plugin. Below is his
>>     code:
>>
>> https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
>> 4. Currently the Lucene format plugin does not work on HDFS/MapR-FS. This
>> should be handled.
>> 5. Pushing aggregate functions and limits into the scan. (This will be an
>> improvement.)
>> 6. Testing
>>
>> I want to work on (1) sometime next week.
>>
>> - Rahul
>>
>>
>> On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter <
>> ste...@activitystream.com> wrote:
>>
>>> Hi Rahul,
>>>
>>> Can you elaborate a bit on the status of the Lucene plugin and what
>>> needs to be done before using it?
>>>
>>> Also let me know if there are specific things that need improving. We
>>> want to try using it in our project and perhaps we can contribute
>>> something meaningful.
>>>
>>> Regards,
>>>  -Stefan
>>>
>>>
>>>
>>> On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee <
>>> smukher...@commvault.com> wrote:
>>>
>>>> Hi Rahul,
>>>>
>>>> Thanks for sharing your code. I have been trying to write a plugin for
>>>> the Solr engine, but I thought of using Solr's REST API to run the
>>>> queries, get schema metadata, etc.
>>>> The goal for me is to expose a Solr engine to tools like Tableau or MS
>>>> Excel so that users can work with the data there.
>>>>
>>>> I am still very new to this and there is a learning curve. It would be
>>>> great if you can comment/review whatever I've done so far.
>>>>
>>>> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>>>>
>>>> Thanks,
>>>> Sudip
>>>>
>>>> -----Original Message-----
>>>> From: rahul challapalli [mailto:challapallira...@gmail.com]
>>>> Sent: 10 August 2015 AM 05:21
>>>> To: dev@drill.apache.org
>>>> Subject: Re: Lucene Format Plugin
>>>>
>>>> Below is the link to my branch which contains the changes related to
>>>> the format plugin.
>>>>
>>>> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>>>>
>>>> Any thoughts on how to handle contributions like this which still have
>>>> some work to be done?
>>>>
>>>> - Rahul
>>>>
>>>>
>>>> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <
>>>> challapallira...@gmail.com> wrote:
>>>>
>>>> > Thanks Jason.
>>>> >
>>>> > I want to look at the solr plugin and see where we can collaborate or
>>>> > if we already duplicated part of the effort.
>>>> >
>>>> > I still need to push a few commits. I will share the code once I get
>>>> > these changes pushed.
>>>> >
>>>> > - Rahul
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse
>>>> > <altekruseja...@gmail.com
>>>> > > wrote:
>>>> >
>>>> >> Hey Rahul,
>>>> >>
>>>> >> This is really cool! Thanks for all of the time you put into writing
>>>> >> this. I think we have a lot of opportunities to reach new
>>>> >> communities with efforts like this.
>>>> >>
>>>> >> I noticed last week that another contributor opened a JIRA for a Solr
>>>> >> plugin. There might be a good opportunity for the two of you to join
>>>> >> efforts, as I believe he likely started working on a Lucene reader as
>>>> >> part of his Solr work.
>>>> >>
>>>> >> Would you like to post a link to your work on Github or another
>>>> >> public host of your code?
>>>> >>
>>>> >> https://issues.apache.org/jira/browse/DRILL-3585
>>>> >>
>>>> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter
>>>> >> <ste...@activitystream.com>
>>>> >> wrote:
>>>> >>
>>>> >> > Hi,
>>>> >> >
>>>> >> > I'm pretty new around here but I just wanted to tell you how much
>>>> >> > your work can benefit us. This is great!
>>>> >> >
>>>> >> > Look forward to trying it out.
>>>> >> >
>>>> >> > Regards,
>>>> >> >  -Stefán
>>>> >> >
>>>> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
>>>> >> > challapallira...@gmail.com> wrote:
>>>> >> >
>>>> >> > > Hello Drillers,
>>>> >> > >
>>>> >> > > I have been working on a Lucene format plugin. In its current
>>>> >> > > state, the sample query below successfully searches a Lucene index
>>>> >> > > and returns the results.
>>>> >> > >
>>>> >> > > select path from dfs_test.`/search-index` where
>>>> >> > > contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > *High Level Overview of Current Implementation:*
>>>> >> > >
>>>> >> > > *Parallelization:* A Lucene segment is the lowest level of
>>>> >> > > parallelization.
>>>> >> > > *Filter Pushdown:* Currently the format plugin is designed to
>>>> >> > > push the complete filter into the scan.
>>>> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
>>>> >> > > Lucene TermQuery
>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
>>>> >> > > and multiple conditions are joined using a BooleanQuery
>>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
>>>> >> > > If we *do not* use a TermQuery, then we have to know the exact
>>>> >> > > type of Analyzer
>>>> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
>>>> >> > > to use with each field in the query.
>>>> >> > >     Ex: the 'contents' field might have been analyzed using a
>>>> >> > >     StandardAnalyzer
>>>> >> > >     <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
>>>> >> > >     and the 'path' field might not have been analyzed at all.
>>>> >> > > If desired, support for raw Lucene queries with a reserved word
>>>> >> > > should be easy to add.
>>>> >> > >     Ex: select * from dfs.`search-index` where searchQuery =
>>>> >> > >     "+contents:maxItemsPerBlock +path:/home/file.txt";
>>>> >> > > *Converting SqlFilter to Lucene Query:* Currently only the "=" and
>>>> >> > > "!=" operators are handled while converting a SQL filter into a
>>>> >> > > Lucene query. For indexed fields this might be sufficient to handle
>>>> >> > > a good number of cases. For non-indexed fields, operators like
>>>> >> > > ">", "<", "like", etc. need to be handled.
>>>> >> > > *FileSystems:* Currently the format plugin only works on a local
>>>> >> > > filesystem.
>>>> >> > >
>>>> >> > >
>>>> >> > > Though far from complete, I want to work with the community to
>>>> >> > > get some feedback and avoid any chance of duplication of work.
>>>> >> > > Kindly let me know your thoughts.
>>>> >> > >
>>>> >> > > - Rahul
>>>> >> > >
>>>> >> >
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
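
For readers following the thread, the "=" / "!=" conversion and the raw
query string from the quoted overview can be illustrated with a minimal
sketch. It is not the plugin's code and does not depend on Lucene: the class
LuceneFilterSketch, the Condition type, and the escape and toQueryString
helpers are all hypothetical names made up for this illustration. It assumes
the filter is a flat AND of equality/inequality conditions, maps "=" to a
required ("+") clause and "!=" to a prohibited ("-") clause, and
backslash-escapes the characters that Lucene's classic query parser syntax
treats as special.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LuceneFilterSketch {

    /** One "field = value" or "field != value" condition (hypothetical type). */
    static final class Condition {
        final String field;
        final String value;
        final boolean negated; // true for "!=", false for "="

        Condition(String field, String value, boolean negated) {
            this.field = field;
            this.value = value;
            this.negated = negated;
        }
    }

    static Condition eq(String field, String value) {
        return new Condition(field, value, false);
    }

    static Condition notEq(String field, String value) {
        return new Condition(field, value, true);
    }

    /** Assumption: these are the characters the classic query syntax treats as special. */
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes special characters in a term value. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Joins AND-ed conditions into "+field:value" / "-field:value" clauses. */
    static String toQueryString(List<Condition> conjuncts) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conjuncts) {
            String occur = c.negated ? "-" : "+";
            clauses.add(occur + c.field + ":" + escape(c.value));
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Mirrors the sample query in the thread:
        //   where contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
        List<Condition> filter = Arrays.asList(
                eq("contents", "maxItemsPerBlock"),
                eq("contents", "BlockTreeTermsIndex"));
        // prints: +contents:maxItemsPerBlock +contents:BlockTreeTermsIndex
        System.out.println(toQueryString(filter));
    }
}
```

Note that a string like this would go through a query parser, and therefore
an Analyzer, which is the dependency the TermQuery approach in the thread
avoids. Also, a Lucene BooleanQuery made up only of prohibited clauses
matches no documents, so a filter consisting solely of "!=" conditions would
need an additional match-all clause.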
