DO NOT REPLY [Bug 35284] New: - CoordConstrainedBooleanQuery + QueryParser support

2005-06-09 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35284

   Summary: CoordConstrainedBooleanQuery + QueryParser support
   Product: Lucene
   Version: unspecified
  Platform: Other
OS/Version: other
Status: NEW
  Severity: enhancement
  Priority: P3
 Component: Search
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: [EMAIL PROTECTED]


Attached 2 new classes:

1) CoordConstrainedBooleanQuery
A boolean query that only matches if at least a specified number of the
contained clauses match. An example use might be a query that returns a list of
books where ANY 2 people from a list of people were co-authors, e.g.:
"Lucene In Action" would match ("Erik Hatcher" "Otis Gospodnetić" "Mark Harwood"
"Doug Cutting") with a minRequiredOverlap of 2 because Otis and Erik wrote that.
The book "Java Development with Ant" would not match because only 1 element in
the list (Erik) was matched.
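
The matching rule can be illustrated without Lucene. The sketch below is not
the attached class; it is a plain-Java stand-in, and the names MinOverlapSketch,
countOverlap and matches are hypothetical. It only counts how many of the listed
names appear among a book's authors and compares that count with
minRequiredOverlap. First names are used for brevity, and "Steve" stands for a
co-author who is not in the list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinOverlapSketch {

    // Counts how many of the query terms occur in the document's term set.
    public static int countOverlap(Set<String> docTerms, List<String> queryTerms) {
        int overlap = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                overlap++;
            }
        }
        return overlap;
    }

    // A document matches only if enough of the query terms are present.
    public static boolean matches(Set<String> docTerms, List<String> queryTerms,
                                  int minRequiredOverlap) {
        return countOverlap(docTerms, queryTerms) >= minRequiredOverlap;
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Erik", "Otis", "Mark", "Doug");

        // Illustrative author sets for the two books named in the description.
        Set<String> luceneInAction = new HashSet<>(Arrays.asList("Erik", "Otis"));
        Set<String> antBook = new HashSet<>(Arrays.asList("Erik", "Steve"));

        System.out.println(matches(luceneInAction, people, 2)); // two names overlap
        System.out.println(matches(antBook, people, 2));        // only one name overlaps
    }
}
```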

2) CustomQueryParserExample
A customised QueryParser that allows definition of
CoordConstrainedBooleanQueries. The solution (mis)uses fieldnames to pass
parameters to the custom query.
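
One way to picture the fieldname (mis)use: a token such as min_coord:2 looks
like an ordinary field:term pair, but the "term" is read as a parameter instead
of being searched for. The following is a rough plain-Java illustration, not the
attached CustomQueryParserExample. The whitespace tokenizing, the class name
CoordParamSketch and the fallback of 1 when no parameter is given are all
assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class CoordParamSketch {

    // Pseudo-field name that carries the parameter (as used later in this thread).
    static final String PARAM_FIELD = "min_coord";

    public final int minCoord;
    public final List<String> terms;

    CoordParamSketch(int minCoord, List<String> terms) {
        this.minCoord = minCoord;
        this.terms = terms;
    }

    // Splits a clause group such as "min_coord:2 Erik Otis Doug" into the
    // numeric parameter and the remaining search terms.
    public static CoordParamSketch parse(String clauseGroup) {
        int minCoord = 1; // assumed default when no min_coord token is present
        List<String> terms = new ArrayList<>();
        for (String token : clauseGroup.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith(PARAM_FIELD + ":")) {
                minCoord = Integer.parseInt(token.substring(PARAM_FIELD.length() + 1));
            } else {
                terms.add(token);
            }
        }
        return new CoordParamSketch(minCoord, terms);
    }

    public static void main(String[] args) {
        CoordParamSketch parsed = parse("min_coord:2 Erik Otis Doug");
        System.out.println(parsed.minCoord + " of " + parsed.terms);
    }
}
```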

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



DO NOT REPLY [Bug 35284] - CoordConstrainedBooleanQuery + QueryParser support

2005-06-09 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=35284





--- Additional Comments From [EMAIL PROTECTED]  2005-06-09 13:05 ---
Created an attachment (id=15346)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=15346&action=view)
CoordConstrainedBooleanQuery class





DO NOT REPLY [Bug 35284] - CoordConstrainedBooleanQuery + QueryParser support

2005-06-09 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=35284





--- Additional Comments From [EMAIL PROTECTED]  2005-06-09 13:05 ---
Created an attachment (id=15347)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=15347&action=view)
CustomQueryParserExample class





DO NOT REPLY [Bug 35284] - CoordConstrainedBooleanQuery + QueryParser support

2005-06-09 Thread bugzilla

http://issues.apache.org/bugzilla/show_bug.cgi?id=35284





--- Additional Comments From [EMAIL PROTECTED]  2005-06-09 13:53 ---
Damn. This coord trick works nicely on its own, e.g.
  authors:(min_coord:2 Erik Otis Doug)

but doesn't work when combined with other queries like this:

  title:"lucene in action" AND authors:(min_coord:3 Erik Otis Doug)

A result is returned even though the coord restriction of 3 is NOT satisfied
(Doug was not an author). Needs some more investigation.





Exception in full text search

2005-06-09 Thread avrootshell

Hello,

   I'm able to create the index file for full text search, and I'm sure it
has the required entries, as I have traced the traversal path through the
tables I have specified. Documents are also added to the index file.


But when I specify some string to search for, it throws an exception like this:


.E
Time: 0.234
There was 1 error:
1) testSrch(com.board.fts.FtsSearchCmdTest)java.lang.NullPointerException: null values not allowed
	at org.apache.commons.collections.map.ReferenceMap.put(ReferenceMap.java:571)
	at com.sandra.servicer.txtsrch.SrchMan.search(SrchMan.java:108)
	at com.board.fts.FtsSearchCmd.execute(FtsSearchCmd.java)
	at com.board.fts.FtsSearchCmdTest.testSrch(FtsSearchCmdTest.java)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at com.board.fts.FtsSearchCmdTest.main(FtsSearchCmdTest.java)

FAILURES!!!
Tests run: 1,  Failures: 0,  Errors: 1


Is there any way to view the contents of the index file that has been created?
If anyone has suggestions for this kind of error, I would appreciate them.

TIA,





Re: Exception in full text search

2005-06-09 Thread Bernhard Messer

Hi,

"Luke" is an open-source utility which allows you to analyze and modify
Lucene's index internals. It can be downloaded from
http://www.getopt.org/luke/
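
On the exception itself: the message "null values not allowed" suggests that a
null value reached ReferenceMap.put at SrchMan.java:108. Without the SrchMan
source this is only a guess, but a guard of roughly the following shape would
turn the exception into an explicit check. NullSafePut and putIfPresent are
hypothetical names, and ConcurrentHashMap stands in here only because it is a
standard-library map that also rejects null keys and values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullSafePut {

    // Stores the entry only when both key and value are non-null.
    // Returns true if the entry was stored, false if it was skipped.
    public static <K, V> boolean putIfPresent(Map<K, V> map, K key, V value) {
        if (key == null || value == null) {
            return false;
        }
        map.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();

        // A normal entry is stored.
        System.out.println(putIfPresent(cache, "doc1", "some hit"));

        // A null value is skipped instead of triggering a NullPointerException.
        System.out.println(putIfPresent(cache, "doc2", null));

        System.out.println(cache.size());
    }
}
```

Logging the key whenever the guard returns false is one way to find out which
lookup produced the null in the first place.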


Bernhard





Re: IndexFileNames

2005-06-09 Thread Bernhard Messer

Doug Cutting wrote:


Bernhard Messer wrote:

Sorry for the confusion. At first glance, I thought the new class
IndexFileNames, containing the necessary constant values, fit
perfectly into org.apache.lucene.index. After a more detailed look, I
get the feeling that it would be much better to place the new class
into org.apache.lucene.store. If we do, we can avoid all dependencies
from FSDirectory on org.apache.lucene.index, which is very clean.



I think that's an illusion: the store package would actually become 
more dependent on the index package.  If someone changes the set of 
files in an index then the changes will not be localized to the index 
package. Nothing outside of the index package should know anything 
about the internal structure of an index.


If instead the index package exposes a public API that permits other 
packages to inquire whether particular file names belong to an index 
then only a small dependency on what should be a stable API is 
exposed.  Changes to index structure can be made without changing 
anything outside of the index package.


Why not create a new public final class
org.apache.lucene.store.IndexFileNames and move LuceneFileFilter,
Constants.INDEX_*, SegmentMerger.COMPOUND_EXTENSIONS,
SegmentMerger.VECTOR_EXTENSIONS and IndexReader.FILENAME_EXTENSIONS
to it?



I still think this class should be in the index package.  I'm not 
convinced that anything other than the FileFilter needs to be public.


I finished the changes and committed them. There are two new
classes in package org.apache.lucene.index.
org.apache.lucene.index.IndexFileNames contains common Lucene-related
filenames and extensions; the class itself and its members have package
scope. org.apache.lucene.index.IndexFileFilter is public and is used
in FSDirectory to decide whether a file belongs to a Lucene index and
can be deleted.
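
The arrangement described above could look roughly like the sketch below. It
is not the committed code: the class name IndexFileFilterSketch is made up, and
the extension and file-name lists are an illustrative subset that may differ
from what IndexFileNames actually contains. The point is only that the name
lists stay private while a single public filter answers the question "does
this file belong to an index?".

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IndexFileFilterSketch implements FilenameFilter {

    // Illustrative subset only; the real constants may list different values.
    private static final Set<String> EXTENSIONS = new HashSet<>(
            Arrays.asList("cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del"));

    // Index files that have a fixed name and no extension (also illustrative).
    private static final Set<String> FIXED_NAMES = new HashSet<>(
            Arrays.asList("segments", "deletable"));

    @Override
    public boolean accept(File dir, String name) {
        if (FIXED_NAMES.contains(name)) {
            return true;
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension at all, or a trailing dot
        }
        return EXTENSIONS.contains(name.substring(dot + 1));
    }

    public static void main(String[] args) {
        FilenameFilter filter = new IndexFileFilterSketch();
        File dir = new File(".");
        System.out.println(filter.accept(dir, "_1.cfs"));
        System.out.println(filter.accept(dir, "segments"));
        System.out.println(filter.accept(dir, "notes.txt"));
    }
}
```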


Bernhard





Re: IndexFileNames

2005-06-09 Thread Doug Cutting

Bernhard Messer wrote:
I finished the changes and committed them. There are two new
classes in package org.apache.lucene.index.
org.apache.lucene.index.IndexFileNames contains common Lucene-related
filenames and extensions; the class itself and its members have package
scope. org.apache.lucene.index.IndexFileFilter is public and is used
in FSDirectory to decide whether a file belongs to a Lucene index and
can be deleted.


This looks great!

Thanks,

Doug




Fwd: Re: Optimizing indexes with mulitiple processors?

2005-06-09 Thread Chris Collins
Forwarding to the dev list as I don't know if this is useful data... tell me to
shut up if it isn't.

Chris
Note: forwarded message attached.


--- Begin Message ---
To follow up: I was surprised by the results of an experiment indexing 4k
documents to local disk (Dell PE with onboard RAID with 256MB cache). I got the
following data from my profile:

70% of the time was spent inverting the documents
30% in merge

OK, that part isn't surprising.  However, only about 1% of the 30% spent in
merge was in the OS.flush call (not very IO bound at all with this controller).
And almost all of the invert time was in the StandardAnalyzer, pegged in the
javacc generated code.  The profile was based upon duration and not CPU. The
profiler was JProbe.  I was using a lower case analyzer, and this was a slightly
hacked lucene-1.4.3 source line in which I swapped out some of the synchronized
data structures (Hashtable -> HashMap, Vector -> ArrayList).
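
To put the flush figure in terms of total run time, and reading "about 1% of
30%" as one percent of the merge share (that reading is an assumption), the
arithmetic is 0.30 x 0.01 = 0.003, i.e. roughly 0.3% of the whole run. The tiny
helper below just makes that multiplication explicit; ProfileShare and
shareOfTotal are hypothetical names.

```java
public class ProfileShare {

    // Share of total time spent in a sub-step, given the parent's share of the
    // total and the sub-step's share of the parent (both as fractions 0..1).
    public static double shareOfTotal(double parentShare, double childShareOfParent) {
        return parentShare * childShareOfParent;
    }

    public static void main(String[] args) {
        double mergeShare = 0.30;        // merge: 30% of total time
        double flushShareOfMerge = 0.01; // flush: about 1% of the merge time

        double flushShareOfTotal = shareOfTotal(mergeShare, flushShareOfMerge);
        System.out.println("flush share of total (percent): " + flushShareOfTotal * 100);
    }
}
```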

--- Chris Collins <[EMAIL PROTECTED]> wrote:

> I found with a fast RAID controller that I can easily be CPU bound; some of
> the IO is related to latency.  You can hide the latency by having overlapping
> IO (you get that with multiple indexers going on at the same time).
> 
> I think there possibly could be more horsepower you can get out of the
> inverter and merge aspects of the indexing.  I am JProbe-ing this at the
> moment.
> 
> If you're using high latency disks (such as a filer) during merge you may want
> to consider increasing the size of the buffers to reduce the number of RPCs to
> the filer... however my previous attempts to change this failed.
> 
> C 
> 
> --- Bill Au <[EMAIL PROTECTED]> wrote:
> 
> > Optimize is disk I/O bound.  So I am not sure what multiple CPUs will buy
> > you.
> > 
> > Bill
> > 
> > On 6/9/05, Kevin Burton <[EMAIL PROTECTED]> wrote:
> > > Is it possible to get Lucene to do an index optimize on multiple
> > > processors?
> > > 
> > > It's a single-threaded algorithm currently, right?
> > > 
> > > It's a shame since I have a quad machine but I'm only using 1/4th of the
> > > capacity.  That's a heck of a performance hit.
> > > 
> > > Kevin
> > > 
> > >
> > 
> > 
> > 
> 
> 
> 
> 
> 
> 





--- End Message ---