date:20150809

Re: Lucene Format Plugin

2015-08-09 Thread rahul challapalli

Below is the link to my branch which contains the changes related to the
format plugin.

https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene

Any thoughts on how to handle contributions like this which still have some
work to be done?

- Rahul

On Mon, Aug 3, 2015 at 12:21 PM, rahul challapalli
challapallira...@gmail.com wrote:

Thanks Jason.

I want to look at the solr plugin and see where we can collaborate or if
we already duplicated part of the effort.

I still need to push a few commits. I will share the code once I get these
changes pushed.

- Rahul

On Mon, Aug 3, 2015 at 11:31 AM, Jason Altekruse altekruseja...@gmail.com
wrote:

Hey Rahul,

This is really cool! Thanks for all of the time you put into writing this,
I think we have a lot of available opportunities to reach new communities
with efforts like this.

I noticed last week another contributor opened a JIRA for a solr plugin,
there might be a good opportunity for the two of you to join efforts, as I
believe he likely started working on a lucene reader as part of his solr
work.

Would you like to post a link to your work on Github or another public
host of your code?

https://issues.apache.org/jira/browse/DRILL-3585

On Mon, Aug 3, 2015 at 2:29 AM, Stefán Baxter ste...@activitystream.com
wrote:

Hi,

I'm pretty new around here but I just wanted to tell you how much your
work can benefit us. This is great!

Look forward to trying it out.

Regards,
-Stefán

On Mon, Aug 3, 2015 at 8:38 AM, rahul challapalli
challapallira...@gmail.com wrote:

Hello Drillers,

I have been working on a lucene format plugin. In its current state, the
below sample query successfully searches a lucene index and returns the
results.

select path from dfs_test.`/search-index`
where contents = 'maxItemsPerBlock' and contents = 'BlockTreeTermsIndex'

*High Level Overview of Current Implementation:*

*Parallelization:* A lucene segment is the lowest level of
parallelization.
*Filter Pushdown:* Currently the format plugin is designed to push the
complete filter into the scan.
*Filter Evaluation:* Each condition in the filter is treated as a lucene
TermQuery
(http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/TermQuery.html)
and multiple conditions are joined using a BooleanQuery
(http://lucene.apache.org/core/5_2_0/core/org/apache/lucene/search/BooleanQuery.html).
If we *do not* use a TermQuery, then we have to know the exact type of
Analyzer
(https://lucene.apache.org/core/5_2_1/core/org/apache/lucene/analysis/Analyzer.html)
to use with each field in the query.
Ex: the 'contents' field might have been analyzed using a StandardAnalyzer
(https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html)
and the 'path' field might not have been analyzed at all.
If desired, support for raw lucene queries with a reserved word should be
easy to add.
Ex: select * from dfs.`search-index` where searchQuery =
+contents:maxItemsPerBlock +path:/home/file.txt;
*Converting SqlFilter to Lucene Query:* Currently only the = and !=
operators are handled while converting a sql filter into a lucene query.
For indexed fields this might be sufficient to handle a good number of
cases. For non-indexed fields, operators like <, >, like etc. need to be
handled.
*FileSystems:* Currently the format plugin only works on a local
filesystem.
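
For illustration only, and not taken from the branch: the = / != mapping
described under "Converting SqlFilter to Lucene Query" could be sketched
roughly as below. The class and method names (FilterToLuceneSketch,
Condition, toLuceneQuery) are hypothetical. To stay free of dependencies,
the sketch emits Lucene query-parser syntax (+ for a required clause, - for
a prohibited one) instead of building TermQuery and BooleanQuery objects,
and it assumes all conditions are AND-ed and that values need no escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterToLuceneSketch {

    /** One SQL comparison such as contents = 'foo'. Hypothetical type, not part of Drill. */
    public static final class Condition {
        final String field;
        final String op;    // only "=" and "!=" are handled in this sketch
        final String value;

        public Condition(String field, String op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    /**
     * Turns AND-ed conditions into Lucene query-parser syntax:
     * "=" becomes a required clause (+field:value),
     * "!=" becomes a prohibited clause (-field:value).
     * Assumption: values contain no characters that need escaping.
     */
    public static String toLuceneQuery(List<Condition> conditions) {
        StringBuilder sb = new StringBuilder();
        for (Condition c : conditions) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            switch (c.op) {
                case "=":
                    sb.append('+');
                    break;
                case "!=":
                    sb.append('-');
                    break;
                default:
                    // Other operators (<, >, like, ...) are not handled yet.
                    throw new UnsupportedOperationException("Operator not handled: " + c.op);
            }
            sb.append(c.field).append(':').append(c.value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The sample filter from the query at the top of this mail.
        List<Condition> filter = new ArrayList<>();
        filter.add(new Condition("contents", "=", "maxItemsPerBlock"));
        filter.add(new Condition("contents", "=", "BlockTreeTermsIndex"));
        // prints: +contents:maxItemsPerBlock +contents:BlockTreeTermsIndex
        System.out.println(toLuceneQuery(filter));
    }
}
```

Any operator other than = and != is rejected here, which mirrors the
current limitation noted above.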

Though far from complete, I want to work with the community to get some
feedback and avoid any chance of duplication of work. Kindly let me know
your thoughts.

- Rahul

RE: Lucene Format Plugin

2015-08-09 Thread Sudip Mukherjee

Hi Rahul,

Thanks for sharing your code. I was trying to build a plugin for the solr
engine, but I thought of using solr's REST API to run the queries, get schema
metadata info, etc.
The goal for me is to expose a solr engine to tools like Tableau or MS Excel
so that users can work with the data there.

I am still very new to this and there is a learning curve. It would be great if
you could comment on or review whatever I've done so far.

https://github.com/sudipmukherjee/drill/tree/master/contrib/storage-solr

Thanks,
Sudip

-Original Message-
From: rahul challapalli [mailto:challapallira...@gmail.com] 
Sent: 10 August 2015 AM 05:21
To: dev@drill.apache.org
Subject: Re: Lucene Format Plugin

Below is the link to my branch which contains the changes related to the format 
plugin.

https://github.com/rchallapalli/drill/tree/lucene/contrib/format-lucene

Any thoughts on how to handle contributions like this which still have some 
work to be done?

- Rahul




[jira] [Created] (DRILL-3618) Documentation on drill.apache.org/docs needs to be corrected

2015-08-09 Thread Abhishek Girish (JIRA)

Abhishek Girish created DRILL-3618:
--

     Summary: Documentation on drill.apache.org/docs needs to be corrected
         Key: DRILL-3618
         URL: https://issues.apache.org/jira/browse/DRILL-3618
     Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
    Reporter: Abhishek Girish
    Assignee: Kristine Hahn
    Priority: Minor


Link: http://drill.apache.org/docs/core-modules/

- Remove mention of M7. 
- Replace diagram with one having no red spellcheck underlines
- Replace Optiq with Calcite. Also provide an external link for reference.

Link: http://drill.apache.org/docs/performance/

- Replace diagram with one having no red spellcheck underlines



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
