[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717231#action_12717231
 ] 

Michael McCandless commented on LUCENE-1453:


Since we've deprecated all methods that use FSDirectory.getDirectory 
under the hood, why do we even need to fix this?  I.e., why replace all of these 
with the new FSDir.open now, when we're just going to remove them in 3.0 anyway?


 When reopen returns a new IndexReader, both IndexReaders may now control the 
 lifecycle of the underlying Directory which is managed by reference counting
 -

 Key: LUCENE-1453
 URL: https://issues.apache.org/jira/browse/LUCENE-1453
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Mark Miller
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4.1, 2.9

 Attachments: Failing-testcase-LUCENE-1453.patch, LUCENE-1453.patch, 
 LUCENE-1453.patch, LUCENE-1453.patch


 Rough summary. Basically, FSDirectory tracks references to FSDirectory and 
 when IndexReader.reopen shares a Directory with a created IndexReader and 
 closeDirectory is true, FSDirectory's ref management will see two decrements 
 for one increment. You can end up getting an AlreadyClosed exception on the 
 Directory when the IndexReader is open.
 I have a test I'll put up. A solution seems fairly straightforward (at least 
 in what needs to be accomplished).
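
 A minimal sketch of the double decrement described in the summary above. The
 classes RefCountedDir and ToyReader are hypothetical stand-ins, not Lucene's
 FSDirectory or IndexReader; they only model "one increment, two decrements".

```java
// Hypothetical reference-counted directory (not Lucene's FSDirectory).
class RefCountedDir {
    private int refCount = 0;

    void acquire() { refCount++; }

    // Releases one reference; fails if there is nothing left to release.
    void release() {
        if (refCount <= 0) throw new IllegalStateException("directory already closed");
        refCount--;
    }

    boolean isOpen() { return refCount > 0; }
}

// Hypothetical reader that releases its directory on close when closeDirectory is true.
class ToyReader {
    private final RefCountedDir dir;
    private final boolean closeDirectory;

    ToyReader(RefCountedDir dir, boolean closeDirectory) {
        this.dir = dir;
        this.closeDirectory = closeDirectory;
    }

    // Models the problematic reopen: the directory is shared, but no new reference is acquired.
    ToyReader reopenSharingDirectory() { return new ToyReader(dir, closeDirectory); }

    void close() { if (closeDirectory) dir.release(); }

    boolean directoryOpen() { return dir.isOpen(); }
}

class RefCountDemo {
    // Returns whether the shared directory is still open after only the old reader was closed.
    static boolean directoryOpenAfterClosingOldReader() {
        RefCountedDir dir = new RefCountedDir();
        dir.acquire();                                  // the single increment
        ToyReader first = new ToyReader(dir, true);
        ToyReader second = first.reopenSharingDirectory();
        first.close();                                  // first decrement, count reaches zero
        return second.directoryOpen();
    }

    // Returns true if closing the second reader trips over the already released directory.
    static boolean secondCloseFails() {
        RefCountedDir dir = new RefCountedDir();
        dir.acquire();
        ToyReader first = new ToyReader(dir, true);
        ToyReader second = first.reopenSharingDirectory();
        first.close();
        try {
            second.close();                             // second decrement for one increment
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("directory open for second reader: " + directoryOpenAfterClosingOldReader());
        System.out.println("second close fails: " + secondCloseFails());
    }
}
```

 In this toy model the second reader loses its directory as soon as the first
 reader is closed, which mirrors the AlreadyClosed symptom reported above.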

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717242#action_12717242
 ] 

Uwe Schindler commented on LUCENE-1453:
---

Because the error sometimes also occurs with the refcounting directories, only 
less often (the refcounting helps to keep the directory open, even when it is 
closed one time too many).
And our problem: we really want to remove this ugly closeDir stuff from 
IndexReaders; the code is sometimes unreadable and it's hard to find out what's 
going on.




[jira] Commented: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717244#action_12717244
 ] 

Michael McCandless commented on LUCENE-1453:


But the refcounting is also deprecated?  And, IndexReader will no longer track 
closeDir in 3.0, since that's only set to true in the deprecated methods?

bq. Because the error sometimes also occurs with the refcounting directories,

Oh, you mean there is an intermittent failure on the current trunk?  (I.e., when 
using FSDir.getDirectory under the hood.)




Re: WebLuke - include Jetty in Lucene binary distribution?

2009-06-08 Thread Grant Ingersoll

Hey John,

I like WebLuke too, but am not sure what ever became of it.  It seemed like it
had a lot of traction
(http://www.lucidimagination.com/search/document/3b06db2b12dffb70/webluke_include_jetty_in_lucene_binary_distribution)
but that the main objection was the size of the GWT stuff and a Web Server as
part of the distribution.

Not sure whether Mark has been maintaining it or not.

In other words, I'm +1 for WebLuke (and Luke, for that matter, although I know
it has some GPL components) being a part of Lucene, even if, just maybe, it
isn't part of the main distribution.


-Grant


On Jun 5, 2009, at 11:27 PM, John Wang wrote:

Hi guys:

 I am interested in what is the latest decision on webluke - I downloaded
the zip, tried it and love it!

Does it support all Luke's functionality? (especially the plugin support)

Thanks

-John

On Sun, Apr 27, 2008 at 7:09 AM, Uwe Schindler u...@thetaphi.de wrote:

Here is another Servlet 2.3 compatible container:

http://panfmp.svn.sourceforge.net/viewvc/panfmp/tools/mini-webserver/trunk/

It does not support web.xml files (instead it uses a simple properties file),
but it supports almost everything needed to get simple servlets running with
path mappings etc. The support for web.xml was left out because of
compatibility with very old Java versions without XML support and to keep it
small. The JAR file is about 39 KB plus servlet.jar version 2.3 without JSP
classes (31 KB) and commons-logging.

We currently use it for a CD-ROM based Lucene search engine. It's licensed
under Apache 2.0 and is Java 1.3 compatible (no generics, StringBuffer). The SVN
currently lacks documentation and startup shell scripts, but a working config
file is supplied.

The SVN contains a few more jar files, but only webserver.jar,
servlet-2.3.jar and commons-logging.jar are needed. One feature is
that the static content servlet can serve files directly from ZIP files
(e.g., http://localhost/file.zip/some/example.txt).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Nadav Har'El [mailto:n...@math.technion.ac.il]
 Sent: Sunday, April 27, 2008 3:08 PM
 To: java-dev@lucene.apache.org
 Subject: Re: WebLuke - include Jetty in Lucene binary distribution?

 On Sun, Dec 09, 2007, markharw00d wrote about WebLuke - include Jetty in
 Lucene binary distribution?:
  The only open question is if we should bundle Jetty in the Lucene binary
  distribution as part of the build packaging. This could be used to
  launch both WebLuke and the existing luceneweb.war but adds about 6 or 7
  meg to the overall zipped download size.
  Thoughts?

 My thought is that 6-7 MB for a tiny HTTP Server and/or servlet engine is
 way, way, too much. I'm surprised that Jetty, originally intended to be simple
 and embeddable, reached that size (which is 10 times larger than Lucene's
 core, for example)!

 For demo purposes, I wrote myself something similar, and its (uncompressed)
 .class size is:
   14 K for the basic HTTP server
   24 K for the servlet container (javax.servlet API support)
 And there's also the Servlet API itself from Sun, at around 40 K (this is part
 of J2EE but not of J2SE, so you need to include this as well if you want to
 use the servlet API). And that's it.

 I'm sure that similar tiny Web Servers can also be found on the Web, but if
 there's interest, I can see about publishing mine.


 --
 Nadav Har'El                |   Sunday, Apr 27 2008, 22 Nisan 5768
 IBM Haifa Research Lab      |
 http://nadav.harel.org.il   |   Why do we drive on a parkway and park on a driveway?

  




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: WebLuke - include Jetty in Lucene binary distribution?

2009-06-08 Thread mark harwood
Hi John/Grant.

I haven't done any more development on WebLuke, although I still use it regularly.
As Grant suggests, there was an unease (mine) about bloating the Lucene 
distribution size with GWT dependencies, so it wasn't rolled into contrib. 
However, I guess I'm comfortable if no one else is concerned about this.

The GWT skin is useful for remote working, but I think Luke could/should be 
built with a front-end-independent back end, leaving the door open for Swing or 
SWT front-ends for work with local indexes.
The current Thinlet skin is the piece that has the unfortunate GPL 
dependency. GWT is Apache licensed and so would be OK.

I would probably need to upgrade WebLuke to the latest version of GWT prior to 
any contribution and would also like to de-GWT-ize the back end. 

I guess the main question is how to manage/build/package the contrib section 
given WebLuke could bring in Jetty and we already have 2 web-based contrib 
demos in there that could use this too.

Cheers
Mark









Some thoughts around the use of reader.isDeleted and hasDeletions

2009-06-08 Thread Shai Erera
Hi

I recently read CHANGES to learn more about the readOnly parameter
IndexReader now supports, and came across LUCENE-1329 with a comment that
isDeleted was made not synchronized if readOnly=true (e.g.
ReadOnlyIndexReader), which can affect search code, as it is usually the
bottleneck for search operations.

I searched the code and was surprised to see isDeleted and hasDeletions are
not called from any search code. Instead, the code, such as SegmentTermDocs,
uses the deletedDocs instance directly. So in fact isDeleted wasn't the
bottleneck (unless this was part of the changes in 1329). Anyway, it doesn't
matter, that's good!

However, I did find some indexing code which calls these two, like
SegmentMerger when it merges fields and term vectors. I think that we can
improve that code by writing some specialized code for merging - if the
reader has no deletions, there's no point checking for each document if
there are deletions and/or if the document was deleted. In fact,
SegmentMerger checks for each document: (1) whether the reader has deletions
and if the document was deleted, (2) if the reader has a matching reader and
(3) if checkAbort is not null.

I have a couple of suggestions to simplify that code:
1. Create two specialized copyFieldsWithDeletions/copyFieldsNoDeletions to
get rid of the unnecessary if (hasDeletions) check for each doc.
2. In these, check if the reader has matching reader or not, and execute the
appropriate code in each case.
3. Also, check up front if checkAbort != null.
3.1 (3) duplicates the code - so perhaps instead I can create a dummy
checkAbort, which does nothing? That way we'll always call
checkAbort.work(). But this adds a method call, so I'm not sure.

(same optimizations for mergeVectors()).
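
The first two suggestions above can be sketched roughly as follows. This is an
illustration only: MergeSketch and its methods are hypothetical names, a BitSet
stands in for the deleted-docs bit vector, and document numbers stand in for
the stored fields that SegmentMerger would actually copy.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class MergeSketch {
    // Baseline: the deletion check is evaluated for every single document.
    static List<Integer> copyPerDocCheck(int maxDoc, BitSet deleted) {
        List<Integer> out = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (deleted != null && deleted.get(doc)) continue;
            out.add(doc);
        }
        return out;
    }

    // Decide once, up front, which specialized loop to run.
    static List<Integer> copy(int maxDoc, BitSet deleted) {
        boolean hasDeletions = deleted != null && !deleted.isEmpty();
        return hasDeletions ? copyWithDeletions(maxDoc, deleted) : copyNoDeletions(maxDoc);
    }

    // Specialized loop for readers with deletions: only the per-doc bit test remains.
    static List<Integer> copyWithDeletions(int maxDoc, BitSet deleted) {
        List<Integer> out = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!deleted.get(doc)) out.add(doc);
        }
        return out;
    }

    // Specialized loop for readers without deletions: no checks at all.
    static List<Integer> copyNoDeletions(int maxDoc) {
        List<Integer> out = new ArrayList<>(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        System.out.println(copy(5, deleted));
        System.out.println(copy(5, null));
    }
}
```

Both variants produce the same result as the per-document check; whether the
specialized loops are measurably faster would still need to be benchmarked.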

In addition, I think something can be done w/ SegmentTermDocs. Either create
a specialized STD based on whether it has deletions or not, or create a
dummy BitVector which returns false for every check. That way we can
eliminate the checks in each next(), skipTo(). Dummy BitVector will leave
the file as-is, but will add a method call, so I think I lean towards the
specialized STD. This can be as simple as making STD abstract, with a static
create() method which creates the right instance.
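
A rough sketch of the "abstract class with a static create()" idea, under the
assumption that a plain document iterator is a fair stand-in for
SegmentTermDocs. DocIterator and its nested classes are hypothetical names, and
a BitSet again plays the role of the deleted-docs BitVector.

```java
import java.util.BitSet;

abstract class DocIterator {
    protected final int maxDoc;
    protected int doc = -1;

    DocIterator(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Advances to the next live document; returns false when exhausted.
    abstract boolean next();

    int doc() { return doc; }

    // Factory: picks the specialized implementation once, instead of testing on every next().
    static DocIterator create(int maxDoc, BitSet deleted) {
        if (deleted == null || deleted.isEmpty()) {
            return new NoDeletions(maxDoc);
        }
        return new WithDeletions(maxDoc, deleted);
    }

    // Helper for the demo: counts the documents an iterator visits.
    static int count(DocIterator it) {
        int n = 0;
        while (it.next()) n++;
        return n;
    }

    private static final class NoDeletions extends DocIterator {
        NoDeletions(int maxDoc) { super(maxDoc); }

        @Override
        boolean next() { return ++doc < maxDoc; }
    }

    private static final class WithDeletions extends DocIterator {
        private final BitSet deleted;

        WithDeletions(int maxDoc, BitSet deleted) {
            super(maxDoc);
            this.deleted = deleted;
        }

        @Override
        boolean next() {
            while (++doc < maxDoc) {
                if (!deleted.get(doc)) return true;  // skip deleted documents
            }
            return false;
        }
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        System.out.println(count(create(4, deleted)));
        System.out.println(count(create(4, null)));
    }
}
```

Callers only ever see DocIterator, so the choice between the two variants stays
an implementation detail of create().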

I believe there are other places where we can make such optimizations. If
the overall direction seems right to you, I can open an issue and start to
work on a patch. Currently I've patched SegmentMerger, and besides the class
getting bigger, nothing bad happened (meaning all tests pass), though it
might be interesting to check how it affects performance.

What do you think?

Shai


[jira] Updated: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1453:
--

Attachment: LUCENE-1453.patch

This is the solution using the FilterIndexReader; all tests now pass (with 
the deprecated refcounting dirs as well as with FSDir.open dirs, see next patch).
The solution consists of two parts:
- All closeDirectory stuff is removed from DirectoryIndexReader (even the ugly 
FSDir cloning) and from ReadOnlyDirectoryIndexReader; the code is now simpler 
to understand. It is now in the state intended for 3.0, with no deprecated 
helper stuff left in these internal classes, so they can be used in 3.0 without 
modifications.
- As the DirectoryIndexReader is not closing the directory anymore, the 
deprecated IndexReader.open methods taking String/File would no longer work 
correctly (because they would fail to close the dir on closing). To fix this 
easily, a deprecated private class extending FilterIndexReader was added that 
wraps the DirectoryIndexReader when File/String opens are used. This class 
keeps a refcounter that is incremented on reopen/clone and decremented on 
doClose(). The last doClose() closes the directory. In 3.0 this class can be 
removed easily along with all File/String open() methods. I could move this 
class out of IndexReader.java into a separate package-private file, if you like.

I would like to have this in 2.9, to get rid of these ugly closeDirectory 
hacks! All tests pass (I retried TestIndexReaderReopen about a hundred times and 
no variant fails anymore). It also works when replacing the refcounting 
FSDir.getDirectory with FSDir.open() calls (see next patch).
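
The wrapper idea described in this comment can be sketched as below. This is
not the code from the patch: ToyDirectory and DirClosingReader are hypothetical
names, and the sketch always returns a new reader from reopen(), whereas the
real method may return the same instance when nothing has changed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical directory that only records whether it was closed.
class ToyDirectory {
    private boolean closed = false;

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

// Hypothetical wrapper reader: every reader derived from one open() call shares one counter.
class DirClosingReader {
    private final ToyDirectory dir;
    private final AtomicInteger refs;

    private DirClosingReader(ToyDirectory dir, AtomicInteger refs) {
        this.dir = dir;
        this.refs = refs;
    }

    // The initial open accounts for exactly one reference.
    static DirClosingReader open(ToyDirectory dir) {
        return new DirClosingReader(dir, new AtomicInteger(1));
    }

    // Each reopen adds a reference before handing out the new reader.
    DirClosingReader reopen() {
        refs.incrementAndGet();
        return new DirClosingReader(dir, refs);
    }

    // Only the last close releases the underlying directory.
    void doClose() {
        if (refs.decrementAndGet() == 0) {
            dir.close();
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        DirClosingReader first = open(dir);
        DirClosingReader second = first.reopen();
        first.doClose();
        System.out.println("closed after first doClose: " + dir.isClosed());
        second.doClose();
        System.out.println("closed after second doClose: " + dir.isClosed());
    }
}
```

With increments and decrements paired this way, closing the old reader leaves
the directory usable for the reopened one.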




[jira] Updated: (LUCENE-1453) When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

2009-06-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1453:
--

Attachment: LUCENE-1453-with-FSDir-open.patch

This is a variant for testing the same with FSDir.open(). As you see, the 
reopening now also works correctly here and the underlying directory is not 
closed too often.

This patch is for demonstration only.




[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717486#action_12717486
 ] 

Michael Busch commented on LUCENE-1567:
---

Is it mostly internal stuff you need to change to compile with 1.4, or do 
a lot of public APIs also use generics?

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From "New flexible query parser" thread by Michael Busch
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
     AND
    /   \
   A     B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow resource bundles to be attached. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g. improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib, however I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the actual code for the first layer is not very much.
 All syntaxes would benefit from patches and improvements we make to the
 underlying layers, which will make supporting different syntaxes much
 more manageable.
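
 The three layers described above can be illustrated with a toy pipeline. All
 names below (Node, SyntaxParser, NodeProcessor, Builder, Pipeline and the
 concrete classes) are hypothetical and are not the classes from the attached
 patches; the builder emits a debug string instead of Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Minimal node tree: a term leaf and an AND node (hypothetical names).
abstract class Node {}

final class TermNode extends Node {
    final String text;
    TermNode(String text) { this.text = text; }
}

final class AndNode extends Node {
    final List<Node> children;
    AndNode(List<Node> children) { this.children = children; }
}

// Layer 1: syntax only, text in, node tree out.
interface SyntaxParser { Node parse(String query); }

// Layer 2: processors rewrite the tree.
interface NodeProcessor { Node process(Node tree); }

// Layer 3: builders turn the tree into a target representation.
interface Builder<T> { T build(Node tree); }

// Syntax 'a AND b'.
class InfixAndParser implements SyntaxParser {
    public Node parse(String query) {
        String[] parts = query.trim().split("\\s+AND\\s+");
        if (parts.length == 1) return new TermNode(parts[0]);
        List<Node> children = new ArrayList<>();
        for (String part : parts) children.add(new TermNode(part));
        return new AndNode(children);
    }
}

// Syntax '+a +b'.
class PlusPrefixParser implements SyntaxParser {
    public Node parse(String query) {
        List<Node> children = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            children.add(new TermNode(part.startsWith("+") ? part.substring(1) : part));
        }
        return children.size() == 1 ? children.get(0) : new AndNode(children);
    }
}

// A processor that normalizes terms to lower case.
class LowercaseProcessor implements NodeProcessor {
    public Node process(Node tree) {
        if (tree instanceof TermNode) {
            return new TermNode(((TermNode) tree).text.toLowerCase(Locale.ROOT));
        }
        List<Node> out = new ArrayList<>();
        for (Node child : ((AndNode) tree).children) out.add(process(child));
        return new AndNode(out);
    }
}

// A builder that renders the tree as a string, standing in for real query objects.
class DebugStringBuilder implements Builder<String> {
    public String build(Node tree) {
        if (tree instanceof TermNode) return "term(" + ((TermNode) tree).text + ")";
        StringBuilder sb = new StringBuilder("and(");
        List<Node> children = ((AndNode) tree).children;
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(build(children.get(i)));
        }
        return sb.append(')').toString();
    }
}

class Pipeline {
    // Runs parser, then the processor chain, then the builder.
    static <T> T run(SyntaxParser parser, List<NodeProcessor> chain, Builder<T> builder, String query) {
        Node tree = parser.parse(query);
        for (NodeProcessor processor : chain) tree = processor.process(tree);
        return builder.build(tree);
    }

    // Convenience entry point with a fixed processor chain and the debug builder.
    static String demo(SyntaxParser parser, String query) {
        List<NodeProcessor> chain = Collections.<NodeProcessor>singletonList(new LowercaseProcessor());
        return run(parser, chain, new DebugStringBuilder(), query);
    }

    public static void main(String[] args) {
        System.out.println(demo(new InfixAndParser(), "A AND b"));
        System.out.println(demo(new PlusPrefixParser(), "+a +B"));
    }
}
```

 Swapping the parser changes only the accepted syntax; the processor chain and
 the builder are reused unchanged, which is the separation the description
 argues for.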

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717492#action_12717492
 ] 

Adriano Crestani commented on LUCENE-1567:
--

It's mostly internal stuff; the only API that uses generics is QueryNode, which 
returns List<QueryNode> and receives it as a param. I actually don't think it's a 
big deal :p

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From New flexible query parser thread by Micheal Busch
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was it to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
   AND
  /   \
 A B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processors can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow to attach resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g. improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib; however, I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the first layer needs very little code.
 All syntaxes would benefit from patches and improvements we make to the
 underlying layers, which will make supporting different syntaxes much
 more manageable.


Re: [jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Luis Alves
I actually think we should give the parser to contrib in 2.9 using JDK 1.5
syntax, and move it to main in 3.0, also using JDK 1.5 syntax.

I don't think it's a small change: it will affect the interfaces
and all implementations of QueryNode objects.

I would see nothing wrong with having a JDK 1.4 version if we were 100%
compatible with the old queryparser, but since that is not the case,
I don't think it is worth it.
(The wrapper we built does not support the case where users extend the
old queryparser class and override methods to add new functionality.)




Adriano Crestani (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717492#action_12717492 ] 


Adriano Crestani commented on LUCENE-1567:
--

It's mostly internal stuff; the only API that uses generics is QueryNode, which returns 
List<QueryNode> and receives it as a param. I actually don't think it's a big 
deal :p

  


[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717535#action_12717535
 ] 

Luis Alves commented on LUCENE-1567:


I actually think we should give the parser to contrib in 2.9 using JDK 1.5 
syntax, and move it to main in 3.0, also using JDK 1.5 syntax.

I don't think it's a small change, and this change (to be 1.4 compatible) will 
affect the interfaces and future versions of the parser.

I would see nothing wrong with having a JDK 1.4 version if we were 100% 
compatible with the old queryparser, but since that is not the case, I don't 
think it is worth it. (The wrapper we built does not support the case where 
users extend the old queryparser class and override methods to add new 
functionality.)

If everyone else thinks making the queryparser interfaces 1.4 compatible is a 
must, I will be OK with it, but only if we actually move the new queryparser to 
main in 2.9 and break compatibility with the old Lucene QueryParser class for 
users who are extending it.

The new queryparser supports 100% of the syntax and passes 100% of the Lucene 
JUnit tests, but it does not support users who extended the QueryParser class 
and overrode some methods.





[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717543#action_12717543
 ] 

Adriano Crestani commented on LUCENE-1567:
--

I went through the new QP and listed exactly what needs to be changed:

QueryNode uses generics in three methods: set(List<QueryNode>), 
add(List<QueryNode>) and List<QueryNode> getChildren(). All the generics would 
be removed. I don't see any back-compatibility problem if we add generics again 
in the future. If we release with 1.4, we could hardcode the type checking, and 
any user implementation of this class would need to do the same and follow the 
documentation.
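
To make the "hardcode the type checking" idea concrete, here is a minimal sketch of a Java 1.4 style node with raw List signatures. SimpleNode is a hypothetical class, not one from the patch; it only illustrates how set(List) and add(List) could verify element types at runtime, and how callers would have to cast on the way out. The @SuppressWarnings annotation is there only to keep a modern compiler quiet and would not exist on 1.4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical 1.4-style node: raw List in the signatures, checks done by hand.
@SuppressWarnings({"rawtypes", "unchecked"})
class SimpleNode {
    private final String text;
    private final List children = new ArrayList();

    SimpleNode(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }

    // Replaces the children; rejects any element that is not a SimpleNode.
    void set(List newChildren) {
        checkAll(newChildren);
        children.clear();
        children.addAll(newChildren);
    }

    // Appends children after the same runtime check.
    void add(List moreChildren) {
        checkAll(moreChildren);
        children.addAll(moreChildren);
    }

    // Callers get a raw List back and must cast each element themselves.
    List getChildren() {
        return Collections.unmodifiableList(children);
    }

    // The hand-written replacement for the compile-time check generics give.
    private static void checkAll(List candidates) {
        for (Iterator it = candidates.iterator(); it.hasNext();) {
            Object o = it.next();
            if (!(o instanceof SimpleNode)) {
                throw new IllegalArgumentException("not a SimpleNode: " + o);
            }
        }
    }
}
```

The cost is visible at the call site: every read needs a cast such as (SimpleNode) node.getChildren().get(0), and a wrong element type is only detected when set() or add() runs.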

ModifierQueryNode has an enum called Modifier with values MOD_NOT, MOD_NONE and 
MOD_REQ. An enum can be almost completely reproduced on 1.4 using: 

...
final public static class Modifier implements Serializable {

   final public static Modifier MOD_NOT = new Modifier();

   final public static Modifier MOD_NONE = new Modifier();

   final public static Modifier MOD_REQ = new Modifier();

   // private constructor: no instances other than the constants above
   private Modifier() {
   }

   // we might add some Enum methods, like name(), etc...

}
...
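
A slightly fuller version of this pattern, as a sketch only: TypesafeModifier is a hypothetical name, and the name field, valueOf() and readResolve() are additions that are not in the snippet above. readResolve() is included because a Serializable class would otherwise produce a fresh instance on deserialization, which breaks == comparisons against the constants. The roundTrip() helper exists purely to demonstrate that.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Hypothetical typesafe-enum replacement that compiles without Java 5 enums.
final class TypesafeModifier implements Serializable {
    private static final long serialVersionUID = 1L;

    static final TypesafeModifier MOD_NOT = new TypesafeModifier("MOD_NOT");
    static final TypesafeModifier MOD_NONE = new TypesafeModifier("MOD_NONE");
    static final TypesafeModifier MOD_REQ = new TypesafeModifier("MOD_REQ");

    private final String name;

    // Private constructor: the three constants are the only instances created here.
    private TypesafeModifier(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    public String toString() {
        return name;
    }

    // Looks up the canonical constant for a name, like Enum.valueOf would.
    static TypesafeModifier valueOf(String name) {
        if (MOD_NOT.name.equals(name)) return MOD_NOT;
        if (MOD_NONE.name.equals(name)) return MOD_NONE;
        if (MOD_REQ.name.equals(name)) return MOD_REQ;
        throw new IllegalArgumentException("unknown modifier: " + name);
    }

    // Maps a deserialized copy back to the canonical constant.
    private Object readResolve() throws ObjectStreamException {
        return valueOf(name);
    }

    // Demonstration helper: serializes and deserializes a value in memory.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(value);
        out.close();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}
```

With readResolve() in place, a value that went through roundTrip() is the same object as the original constant, so code that compares modifiers with == keeps working.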

The only back-compatibility problem I see when we change Modifier back to an 
enum is if a user of the 1.4 version checks 
Modifier.class.isEnum()... does anybody see any other back-compatibility issue?

The last thing that will need to be changed is in QueryBuilder and 
LuceneQueryBuilder. QueryBuilder.build() returns an Object, and when 
LuceneQueryBuilder implements it, it specializes the return type to Query; it 
will start returning Object instead if we change to 1.4. In this case I don't 
see any back-compatibility issue either.
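
The difference is easiest to see side by side. In this sketch FakeQuery stands in for Lucene's Query class, and TreeBuilder, CovariantBuilder and PlainBuilder are hypothetical names rather than the patch's QueryBuilder classes. With Java 5 covariant return types the implementation can narrow the return type; in 1.4 style it has to keep Object and the caller casts.

```java
// Stand-in for Lucene's Query class, to keep the sketch self-contained.
final class FakeQuery {
    final String text;

    FakeQuery(String text) {
        this.text = text;
    }
}

// Generic builder contract: the result is only known to be an Object.
interface TreeBuilder {
    Object build(String node);
}

// Java 5 and later: the implementation narrows the return type.
final class CovariantBuilder implements TreeBuilder {
    public FakeQuery build(String node) {
        return new FakeQuery(node);
    }
}

// Java 1.4 style: the implementation must keep the declared return type.
final class PlainBuilder implements TreeBuilder {
    public Object build(String node) {
        return new FakeQuery(node);
    }
}
```

A caller of CovariantBuilder can write FakeQuery q = builder.build("a"), while a caller of PlainBuilder needs FakeQuery q = (FakeQuery) builder.build("a"). Code written with the cast keeps compiling if the return type is narrowed again later.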

Regarding the new QP framework, I don't see any back-compatibility problem, 
because Lucene will only move to Java 1.5 in version 3.0, and back 
compatibility may be broken there. But...

 I would see nothing wrong with having a jdk 1.4 version if we were 100% 
 compatible with the old queryparser,
 but since that is not the case, I don't think it is worth it. (the wrapper we 
 built does not support the case where users extend the old queryparser class 
 and override methods to add new functionality)

I agree with Luis: if we only release the new QP framework in 2.9, we will 
definitely break the back-compatibility of the old QP. So why not release the 
old and the new QP together in 2.9?

Suggestions? :)

Best Regards,
Adriano Crestani Campos


[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-08 Thread Luis Alves (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717560#action_12717560
 ] 

Luis Alves commented on LUCENE-1567:


A couple more changes will be needed:
We also have to change List<QueryNode> getChildren(); and public 
Map<CharSequence, Object> getTags();
We also have to change QueryNodeImpl and patch all QueryNode 
implementations to perform forced casts, 
and users implementing QueryNode will have to do the same.

It's about 30 changes, not that big a change, I agree. But if we release both 
parsers I see no need to change it.

 I agree with Luis, if we only release the new QP framework 2.9, we will 
 definitely break the back-compatibility of the old QP, 
 so, why not release the old and the new QP together on 2.9?

Some extras:
If we choose to release both parsers, we should deprecate the old one, 
allowing people to migrate to the new one with release 2.9, and drop the old 
queryparser classes in 3.0.
(We can keep the wrappers in 2.9, throwing exceptions in all methods to remind 
people to move to the new framework; 
we can probably also keep the wrapper in 3.0 if we think it is still necessary.)


