Hi Rahul,

I will start working on this later this week and over the weekend. I'm not
sure how long it will take me to become productive but hopefully I will be
able to share something soon.

I will fork your repo on github. Can you please make sure it's up to date
with master?
I'm assuming that it runs in its current state so I can get straight to work :).

Best regards,
 -Stefan

On Sun, Aug 23, 2015 at 1:28 AM, rahul challapalli <
[email protected]> wrote:

> Hi Stefan,
>
> I was not able to make any further progress on this. Below is a high-level
> list of things to do:
>
> 1. Clean up LuceneScanSpec: The current implementation serializes a lot of
> low-level state information in order to serialize/de-serialize lucene's
> SegmentReader. This has to be changed; otherwise the plugin is tightly
> coupled to Lucene's implementation details.
> 2. Serialization of Lucene Query object
> 3. Convert Sql filter into Lucene Query object : I just started it and
> made it work in the simplest case. You can take a look at it here.
>
> https://github.com/rchallapalli/drill/blob/lucene/contrib/format-lucene/src/main/java/org/apache/drill/exec/planner/logical/SqlFilterToLuceneQuery.java
>     As part of the ElasticSearch storage plugin, Andrew has converted the
>     sql filter to an Elastic Search query. It looks like he handled many
>     cases. We can leverage this for the Lucene format plugin. Below is his
>     code:
>
> https://github.com/aleph-zero/drill/blob/elastic/contrib/storage-elasticsearch/src/main/java/org/apache/drill/exec/store/elasticsearch/rules/PredicateAnalyzer.java
> 4. Currently the lucene format plugin does not work on HDFS/MaprFs. This
> should be handled
> 5. Pushing Agg functions and Limits into the scan. (This will be an
> improvement)
> 6. Testing
>
> I want to work on (1) sometime next week.
>
> - Rahul
>
>
> On Sat, Aug 22, 2015 at 12:00 AM, Stefán Baxter <[email protected]>
> wrote:
>
>> Hi Rahul,
>>
>> Can you elaborate a bit on the status of the Lucene plugin and what needs
>> to be done before using it?
>>
>> Also let me know if there are specific things that need improving. We
>> want to try using it in our project and perhaps we can contribute
>> something meaningful.
>>
>> Regards,
>>  -Stefan
>>
>>
>>
>> On Mon, Aug 10, 2015 at 5:01 AM, Sudip Mukherjee <
>> [email protected]> wrote:
>>
>>> Hi Rahul,
>>>
>>> Thanks for sharing your code. I was trying to build a plugin for the solr
>>> engine, but I thought of using solr's REST API to run the queries, get
>>> schema metadata info, etc.
>>> The goal for me is to expose a solr engine to tools like Tableau or MS
>>> Excel so that users can work with the data there.
>>>
>>> I am still very new to this and there is a learning curve. It would be
>>> great if you can comment/review whatever I've done so far.
>>>
>>> https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr
>>>
>>> Thanks,
>>> Sudip
>>>
>>> -----Original Message-----
>>> From: rahul challapalli [mailto:[email protected]]
>>> Sent: 10 August 2015 AM 05:21
>>> To: [email protected]
>>> Subject: Re: Lucene Format Plugin
>>>
>>> Below is the link to my branch which contains the changes related to the
>>> format plugin.
>>>
>>> https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene
>>>
>>> Any thoughts on how to handle contributions like this which still have
>>> some work to be done?
>>>
>>> - Rahul
>>>
>>>
>>> On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli <
>>> [email protected]> wrote:
>>>
>>> > Thanks Jason.
>>> >
>>> > I want to look at the solr plugin and see where we can collaborate or
>>> > if we already duplicated part of the effort.
>>> >
>>> > I still need to push a few commits. I will share the code once I get
>>> > these changes pushed.
>>> >
>>> > - Rahul
>>> >
>>> >
>>> >
>>> > On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse <[email protected]>
>>> > wrote:
>>> >
>>> >> Hey Rahul,
>>> >>
>>> >> This is really cool! Thanks for all of the time you put into writing
>>> >> this. I think we have a lot of opportunities to reach new
>>> >> communities with efforts like this.
>>> >>
>>> >> I noticed last week another contributor opened a JIRA for a solr
>>> >> plugin, there might be a good opportunity for the two of you to join
>>> >> efforts, as I believe he likely started working on a lucene reader as
>>> >> part of his solr work.
>>> >>
>>> >> Would you like to post a link to your work on Github or another
>>> >> public host of your code?
>>> >>
>>> >> https://issues.apache.org/jira/browse/DRILL-3585
>>> >>
>>> >> On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > Hi,
>>> >> >
>>> >> > I'm pretty new around here but I just wanted to tell you how much
>>> >> > your work can benefit us. This is great!
>>> >> >
>>> >> > Look forward to trying it out.
>>> >> >
>>> >> > Regards,
>>> >> >  -Stefán
>>> >> >
>>> >> > On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> > > Hello Drillers,
>>> >> > >
>>> >> > > I have been working on a lucene format plugin. In its current
>>> >> > > state, the sample query below successfully searches a lucene
>>> >> > > index and returns the results.
>>> >> > >
>>> >> > > select path from dfs_test.`/search-index` where
>>> >> > > contents='maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > *High Level Overview of Current Implementation:*
>>> >> > >
>>> >> > > *Parallelization:* A lucene segment is the lowest level of
>>> >> > > parallelization.
>>> >> > > *Filter Pushdown:* Currently the format plugin is designed to
>>> >> > > push the complete filter into the scan.
>>> >> > > *Filter Evaluation:* Each condition in the filter is treated as a
>>> >> > > lucene TermQuery
>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html>
>>> >> > > and multiple conditions are joined using a BooleanQuery
>>> >> > > <http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html>.
>>> >> > > If we *do not* use a TermQuery, then we have to know the exact
>>> >> > > type of Analyzer
>>> >> > > <https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html>
>>> >> > > to use with each field in the query.
>>> >> > >     Ex: 'contents' field might have been analyzed using a
>>> >> > >     StandardAnalyzer
>>> >> > >     <https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html>
>>> >> > >     and the 'path' field might not have been analyzed at all.
>>> >> > > If desired, support for raw lucene queries with a reserved word
>>> >> > > should be easy to add.
>>> >> > >     Ex: select * from dfs.`search-index` where searchQuery =
>>> >> > >     "+contents:maxItemsPerBlock +path:/home/file.txt";
>>> >> > > *Converting SqlFilter to Lucene Query:* Currently only "=" and
>>> >> > > "!=" operators are handled while converting a sql filter into a
>>> >> > > lucene query. For indexed fields this might be sufficient to
>>> >> > > handle a good number of cases. For non-indexed fields, operators
>>> >> > > such as ">", "<" and "like" need to be handled.
>>> >> > > *FileSystems:* Currently the format plugin only works on a local
>>> >> > > filesystem.
>>> >> > >
>>> >> > >
>>> >> > > Though far from complete, I want to work with the community to
>>> >> > > get some feedback and avoid any chance of duplication of work.
>>> >> > > Kindly let me know your thoughts.
>>> >> > >
>>> >> > > - Rahul
>>> >> > >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>>
>
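
The filter conversion described in the original announcement (each "="
condition becomes a required term, each "!=" condition a prohibited one, and
the clauses are joined into one boolean query) can be sketched without any
Lucene dependency by emitting the raw query-string form used in the
searchQuery example. This is only an illustrative sketch, not code from the
plugin: the class LuceneQueryStringSketch, its Condition and Op types, and the
escape helper are hypothetical names, the list of escaped characters is an
assumption modelled on Lucene's classic query syntax, and values are assumed
to be single, un-analyzed terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns "=" / "!=" filter conditions into a raw Lucene-style query string. */
final class LuceneQueryStringSketch {

    /** Only the two operators the thread says are currently handled. */
    enum Op { EQ, NE }

    /** One "field op value" condition taken from a SQL WHERE clause. */
    static final class Condition {
        final String field;
        final Op op;
        final String value;

        Condition(String field, Op op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    // Assumption: characters treated as special by the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character so the text is read as a literal term. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** "=" becomes a required (+) clause, "!=" a prohibited (-) clause; clauses are space-joined. */
    static String toQueryString(List<Condition> conditions) {
        List<String> clauses = new ArrayList<>();
        for (Condition c : conditions) {
            String prefix = (c.op == Op.EQ) ? "+" : "-";
            clauses.add(prefix + escape(c.field) + ":" + escape(c.value));
        }
        return String.join(" ", clauses);
    }
}
```

For the conditions contents = 'maxItemsPerBlock' and path != '/home/file.txt',
toQueryString returns +contents:maxItemsPerBlock -path:\/home\/file.txt. A
Lucene boolean query made up only of prohibited clauses matches nothing, so a
real conversion would need at least one required clause alongside any "!="
condition.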
