RE: Lucene Book in UK
You could buy the ebook from Manning. It only costs $22.50, which comes to about 16 euro :) I bought it there; 2 minutes later I was reading it. -Original Message- From: David Townsend [mailto:[EMAIL PROTECTED] Sent: Thursday, January 6, 2005 19:24 To: Lucene Users List (E-mail) Subject: Lucene Book in UK Sorry if this is the wrong forum, but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it, but amazon.co.uk reports it as a 'hard to find' item and is now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com is quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
questions
Hi, I am a newbie and I just installed Tomcat on my machine. When I placed the Luceneweb folder in the webapps folder of Tomcat, why couldn't I conduct the search operation when I tested the website? Did I miss out on anything? It prompts me that there is no c:\opt\index\segment folder... I created it, but I still couldn't get Lucene to work... At http://jakarta.apache.org/lucene/docs/demo.html, under the indexing file instructions, where should I run the following: "java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src"? Is it a must to install Ant? Please kindly help!!! Thanks very much in advance regards, jac
Re: RemoteSearcher
Nutch (nutch.org) has a pretty sophisticated infrastructure for distributed searching, but it doesn't use RemoteSearcher. Otis --- Yura Smolsky <[EMAIL PROTECTED]> wrote: > Hello. > > Does anyone know of an application based on RemoteSearcher to > distribute an index across many servers? > > Yura Smolsky
Re: Lucene Book in UK
The book is $44.95 USD - it's printed on the back cover. Amazon had the correct price (minus their discount) until recently. They are just very slow with their site/book info updates, but I'm sure they'll fix it eventually. Otis --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Jan 6, 2005, at 3:49 PM, Chris Hostetter wrote: > > B&N agrees that the list price is $60.95 ... which may be what Manning > > is citing to resellers. > > This is incorrect information that has somehow gotten out. Amazon and > B&N are slow to update their information, but Manning assures me that > they have provided the correct information to Amazon to update. The > actual price you're paying is certainly not indicative of a $60.95 list > price - Amazon doesn't discount 50%, I'm sure. > > Erik
Re: reading fields selectively
Hi John, There is no API for this, but I recall somebody talking about adding support for this a few months back. I even think that somebody might have contributed a patch for this. I am not certain about this, but check the patch queue (link on the Lucene site). If there is a patch there, even if it no longer applies cleanly, you'll be able to borrow the code for your own patch. Also note that the CVS version has support for field compression, which should help with performance if you are working with large fields. Otis --- John Wang <[EMAIL PROTECTED]> wrote: > Hi: > > Is there some way to read only 1 field value from an index given a > docID? > From the current API, in order to get a field given a docID, I > would call: > > IndexSearcher.document(docID) > > which in turn reads in all fields from the disk. > > Here is my problem: after the search, I have a set of docIDs. For each > document, I have a unique string identifier. At this point I only need > these identifiers, but with the above API, I am forced to read the > entire row of fields for each document in the search result, which in > my case can be very large. > > Is there an alternative? I am thinking more along the lines of a call: > > Field[] getFields(int docID, String fieldName); > > Thanks > > -John
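Until such a patch exists, one workaround is to read the identifier field for every document once, up front, and answer per-hit lookups from memory. A minimal plain-Java sketch of that idea follows - it is not a Lucene API; docStore, uidCache, and the field name "uid" are hypothetical stand-ins for illustration:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the workaround idea: cache the one field you need per docID
 *  once, so search hits never trigger a full stored-document read.
 *  docStore stands in for IndexSearcher.document(docID), which in
 *  Lucene 1.x always loads every stored field of the document. */
public class FieldCacheSketch {
    // docID -> (fieldName -> value): a stand-in for the index's stored fields.
    static Map<Integer, Map<String, String>> docStore = new HashMap<>();

    // docID -> identifier, built once at startup or after an index reopen.
    static Map<Integer, String> uidCache = new HashMap<>();

    static void buildCache(String fieldName) {
        for (Map.Entry<Integer, Map<String, String>> e : docStore.entrySet()) {
            // The expensive full-document reads happen here, exactly once.
            uidCache.put(e.getKey(), e.getValue().get(fieldName));
        }
    }

    public static void main(String[] args) {
        Map<String, String> d0 = new HashMap<>();
        d0.put("uid", "doc-a");
        d0.put("body", "a very large stored field ...");
        docStore.put(0, d0);

        Map<String, String> d1 = new HashMap<>();
        d1.put("uid", "doc-b");
        d1.put("body", "another very large stored field ...");
        docStore.put(1, d1);

        buildCache("uid");
        // Per-hit lookup touches only the cached field, never "body".
        System.out.println(uidCache.get(0)); // doc-a
        System.out.println(uidCache.get(1)); // doc-b
    }
}
```

The trade-off is memory: you hold one string per document, which is usually far cheaper than repeatedly deserializing every stored field of every hit.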
reading fields selectively
Hi: Is there some way to read only 1 field value from an index given a docID? From the current API, in order to get a field given a docID, I would call: IndexSearcher.document(docID) which in turn reads in all fields from the disk. Here is my problem: after the search, I have a set of docIDs. For each document, I have a unique string identifier. At this point I only need these identifiers, but with the above API, I am forced to read the entire row of fields for each document in the search result, which in my case can be very large. Is there an alternative? I am thinking more along the lines of a call: Field[] getFields(int docID, String fieldName); Thanks -John
Re: Problems...
On Jan 6, 2005, at 6:23 PM, Ross Rankin wrote: Could you explain this piece further, Erik "BooleanQuery and AND in TermQuery for resellerId" Your code did a textual concatenation (and I'm paraphrasing as I don't have your previous e-mail handy) of "id:" + resellerId, and then it parsed the expression. This is not necessarily a problem, though I red-flag it because of what QueryParser and Analyzers can do with that resellerId. Regardless of how you indexed the reseller id field, an analyzer will process it when using QueryParser on it. If that id is completely numeric, some analyzers will toss it; others may leave it alone. If it has alpha characters in it, they may be lowercased. In other words, there are lots of variables. This can be avoided by doing this:

TermQuery tq = new TermQuery(new Term("id", resellerId));
Query query = QueryParser.parse(/* the main expression */);
BooleanQuery bq = new BooleanQuery();
bq.add(tq, true, false);
bq.add(query, true, false);

Now use bq as the query passed to search(). Make sense? I would love to improve the code of this piece and understand the engine more. Like for example, if something is indexed, it will be found in the search but what about something that is just in the document and not indexed? If the field is not indexed (but just stored), you cannot search on it. I don't know the difference in Stored, Tokenized, Indexed, and Vector and where I would do what... Is there info on that piece on the web somewhere? Stored = as-is value stored in the Lucene index. Tokenized = field is analyzed using the specified Analyzer - the tokens emitted are indexed. Indexed = the text (either as-is with keyword fields, or the tokens from tokenized fields) is made searchable (aka inverted). Vectored = term frequency is stored in the index in an easily retrievable fashion. Like I have a large (6000 chars) text field I would like to add to the document, it's HTML. I am guessing first it would need to be parsed then added? But added and indexed?
The field contains product specs and product compatibility (most in table form). You definitely want to parse the HTML file (using NekoHTML, perhaps) and extract the text into fields. Maybe the specs and the compatibility should be separated, for example. And yes, you would want these fields indexed since you want to search on them, I presume. Stored, but not indexed, fields are for metadata you want carried along with search results (like the primary key to a database row, or a filename) that you'd use to display the results but is not needed for searching. Sorry for the newbie questions but I am not finding Google very chock full of Lucene info... Have I got a book to sell you! :) http://www.lucenebook.com Erik
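The stored-versus-indexed distinction Erik describes can be sketched in plain Java without Lucene. This is only a toy model for intuition - the class, field names, and whitespace "analyzer" are all made up for illustration, not Lucene internals:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of "stored" vs "indexed" fields: stored values are kept
 *  as-is per document for display; indexed (inverted) fields map each
 *  analyzed token to the set of documents containing it. */
public class StoredVsIndexed {
    static Map<Integer, String> stored = new HashMap<>();        // docID -> raw value
    static Map<String, Set<Integer>> inverted = new HashMap<>(); // token -> docIDs

    static void add(int docID, String text, boolean index, boolean store) {
        if (store) stored.put(docID, text); // retrievable with hits, not searchable by itself
        if (index) {
            // Crude stand-in for an Analyzer: lowercase, split on whitespace.
            for (String tok : text.toLowerCase().split("\\s+")) {
                inverted.computeIfAbsent(tok, k -> new TreeSet<>()).add(docID);
            }
        }
    }

    public static void main(String[] args) {
        add(1, "Product specs table", true, true); // indexed + stored
        add(2, "file-2.html", false, true);        // stored only: display metadata
        System.out.println(inverted.get("specs"));      // [1] -> searchable
        System.out.println(inverted.get("file-2.html")); // null -> unindexed, can't search
        System.out.println(stored.get(2));               // file-2.html -> comes back with hits
    }
}
```

The "stored only" case is exactly Erik's filename/primary-key example: you can display it with results, but a query for it finds nothing because it was never inverted.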
Re: Indexing flat files without .txt extension
On Jan 6, 2005, at 6:49 PM, Hetan Shah wrote: Hi Erik, I got the source downloaded and unpacked. I am having difficulty in building any of the modules. Maybe something's wrong with my Ant installation. LuceneInAction% ant test Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" The good ol' README says this: R E Q U I R E M E N T S --- * JDK 1.4+ * Ant 1.6+ (to run the automated examples) * JUnit 3.8.1+ - junit.jar should be in ANT_HOME/lib You are not running Ant 1.6, I'm sure. Upgrade your version of Ant, and of course follow the rest of the README, and all should be well. Erik
Re: Indexing flat files without .txt extension
Hi Erik, I got the source downloaded and unpacked. I am having difficulty in building any of the modules. Maybe something's wrong with my Ant installation. LuceneInAction% ant test Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" Total time: 5 seconds LuceneInAction% ant Indexer Buildfile: build.xml BUILD FAILED file:/home/hs152827/LuceneInAction/build.xml:12: Unexpected element "available" Total time: 5 seconds ** Can you point me to the proper module for creating my own indexer? I tried looking into the indexing module but was not sure. TIA, -H Erik Hatcher wrote: On Jan 5, 2005, at 6:31 PM, Hetan Shah wrote: How can one index simple text files without the .txt extension? I am trying to use IndexFiles and IndexHTML, but not to my satisfaction. With IndexFiles I do not get any control over the content of the file, and in the case of IndexHTML, files without any extension do not get indexed at all. Any pointers are really appreciated. Try out the Indexer code from Lucene in Action. You can download it from the link here: http://www.lucenebook.com/blog/announcements/sourcecode.html It'll be cleaner to follow and borrow from. The code that ships with Lucene is for demonstration purposes. It surprises me how often folks use that code to build real indexes. It's quite straightforward to create your own Java code to do the indexing in whatever manner you like, borrowing from examples. When you get the download unpacked, simply run "ant Indexer" to see it in action. And then "ant Searcher" to search the index just built. Erik
Multi-threading problem: couldn't delete segments
We are having a problem with Lucene in a high concurrency create/delete/search situation. I thought I had fixed all these problems, but I guess not. Here's what's happening. We are conducting load testing on our application. On a Windows 2000 server using lucene-1.3-final with the compound file format enabled, a worker thread is creating new Documents as it ingests content. Meanwhile, a test script is running that hits the search part of our application (I think the script also updates and deletes Documents, but I am not sure. My colleague who wrote it has left for the day so I can't ask him.). The scripted test passes with 1, 5, and 10 users hitting the application. At 20 users, we get this exception:

[Task Worker1] ERROR com.ancept.ams.search.lucene.LuceneIndexer - Caught exception closing IndexReader in finally block
java.io.IOException: couldn't delete segments
    at org.apache.lucene.store.FSDirectory.renameFile(FSDirectory.java:236)
    at org.apache.lucene.index.SegmentInfos.write(SegmentInfos.java(Compiled Code))
    at org.apache.lucene.index.SegmentReader$1.doBody(SegmentReader.java:179)
    at org.apache.lucene.store.Lock$With.run(Lock.java:148)
    at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java(Compiled Code))
    at org.apache.lucene.index.IndexReader.close(IndexReader.java(Inlined Compiled Code))
    at org.apache.lucene.index.SegmentsReader.doClose(SegmentsReader.java(Compiled Code))
    at org.apache.lucene.index.IndexReader.close(IndexReader.java(Compiled Code))
    at com.ancept.ams.search.lucene.LuceneIndexer.delete(LuceneIndexer.java:266)

All write access to the index is controlled in that LuceneIndexer class by synchronizing on a static lock object. Searching is handled in another part of the code, which creates new IndexSearchers as necessary when the index changes. I do not rely on finalization to clean up these searchers because we found it to be unreliable.
I keep track of threads using each searcher and then close it when that number drops to 0 if the searcher is outdated. My problem seems similar to what Robert Leftwich asked about on this mailing list in January 2001. Google Cache: http://64.233.179.104/search?q=cache:1D4h1vSh5AQJ:www.geocrawler.com/mail/msg.php3%3Fmsg_id%3D5020057++lucene+multithreading+problems+site:geocrawler.com&hl=en Doug Cutting replied to him saying that he should synchronize calls to IndexReader.open() and IndexReader.close(): Google Cache: http://64.233.179.104/search?q=cache:arztiytQ42QJ:www.geocrawler.com/archives/3/2624/2001/1/0/5020870/++lucene+multithreading+problems+site:geocrawler.com&hl=en Robert Leftwich then found a problem with his code and eliminated a second IndexReader that was messing stuff up: Google Cache: http://64.233.179.104/search?q=cache:jSIsi6t9KH8J:www.geocrawler.com/mail/msg.php3%3Fmsg_id%3D5037517++lucene+multithreading+problems+site:geocrawler.com&hl=en However, there are differences between Leftwich's design and mine, and besides, that thread is four years old. (Are there even existing archives for lucene-user throughout 2001 anywhere?) So any advice would be appreciated. Do I need to synchronize _all_ IndexReader.open() and IndexReader.close() calls? Or is it more likely that I'm missing something in my class that modifies the index? The code is attached.
Thank you, Luke Francl

// $Id: LuceneIndexer.java 20473 2004-10-19 17:20:10Z lfrancl $
package com.ancept.ams.search.lucene;

import com.ancept.ams.asset.AssetUtils;
import com.ancept.ams.asset.AttributeValue;
import com.ancept.ams.asset.IAsset;
import com.ancept.ams.asset.IAssetIdentifier;
import com.ancept.ams.asset.IAssetList;
import com.ancept.ams.asset.ITimeMetadataAsset;
import com.ancept.ams.asset.IVideoAssetView;
import com.ancept.ams.controller.RelayFactory;
import com.ancept.ams.enums.AttributeNamespace;
import com.ancept.ams.enums.AttributeType;
import com.ancept.ams.enums.TimeMetadataType;
import com.ancept.ams.relay.IAssetRelay;
import com.ancept.ams.search.Indexer;
import com.ancept.ams.search.Fields;
import com.ancept.ams.util.SystemConfig;
import com.ancept.ams.util.PerformanceMonitor;
import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import java.io.File;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

/**
 * Controls access to the Lucene index.
 *
 * @author Luke Francl
 **/
public final class LuceneIndexer implements Indexer {
    private static final Logger l4j = Logger.getLogger( LuceneIndexer
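The advice Doug gave in the quoted 2001 thread - funnel every reader open/close (and all writes) through one shared lock - can be sketched self-contained, without Lucene. SynchronizedAccess and its openCount field are made-up stand-ins for an IndexReader and the index's shared on-disk state; the point is only the locking pattern:

```java
/** Sketch of the "synchronize all open/close calls" pattern: every thread
 *  takes the same static lock before touching the shared state, so a close
 *  can never race a concurrent segments-file update. */
public class SynchronizedAccess {
    private static final Object LOCK = new Object();
    static int openCount = 0; // stand-in for shared index state

    static void open()  { synchronized (LOCK) { openCount++; } }
    static void close() { synchronized (LOCK) { openCount--; } }

    /** Hammer open()/close() from many threads to show no updates are lost. */
    static void runDemo(int threads, int iters) {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) { open(); close(); }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public static void main(String[] args) {
        runDemo(20, 1000);
        // Balanced open/close pairs under the lock always net to zero;
        // without the synchronized blocks, increments could be lost.
        System.out.println(openCount);
    }
}
```

If the unsynchronized variant were used instead (drop the synchronized blocks), the final count would intermittently drift from zero under load - the same kind of intermittent, load-dependent failure described above.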
RE: Problems...
: Hoss, could you tell me what exceptions I'm missing? Thanks! Anytime you have a "catch" block, you should be doing something with that exception. If possible, you can recover from an exception, but no matter what, you should log the exception in some way so that you know it happened. Your code has two places where it was catching an exception and doing absolutely nothing at all -- allowing processing to continue without even a warning. There was also an area of your code where, if you encountered a parse exception from the user input, you invented your own query instead -- again without any sort of logging to let you know what was happening in the code. Building your own query when the user's query is gibberish isn't necessarily bad, but logging is your friend. It wasn't clear from the description of your problem what you were trying to query for, so it was very possible that there was a problem parsing your query, and it was doing the "default" search in that catch block and giving you back zero results ... hence my question about the System.out.println calls that *were* in your code. Logging is (again) your friend. -Hoss
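Hoss's point - recover if you can, but always record that the exception happened - looks something like this in a minimal sketch. The parseOrDefault method, the "empty query" check, and the "*:*" fallback string are all hypothetical stand-ins for the real parsing code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of "never swallow an exception": the catch block recovers with
 *  a default, but logs what happened and what was substituted, so zero-hit
 *  results can later be traced back to a failed parse. */
public class LoggingCatch {
    static final Logger LOG = Logger.getLogger("search");

    static String parseOrDefault(String userQuery) {
        try {
            // Stand-in for QueryParser.parse(...) throwing ParseException.
            if (userQuery == null || userQuery.isEmpty())
                throw new IllegalArgumentException("empty query");
            return userQuery;
        } catch (IllegalArgumentException e) {
            // Recoverable - but record it instead of continuing silently.
            LOG.log(Level.WARNING,
                    "could not parse '" + userQuery + "', using default query", e);
            return "*:*"; // hypothetical fallback query
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("foo")); // foo
        System.out.println(parseOrDefault(""));    // *:* (plus a WARNING in the log)
    }
}
```

The important part is the WARNING line: with it, "why did this user get zero results?" is answerable from the logs instead of being a mystery.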
RE: Problems...
Thanks for the responses... It took a bit of time, but I'm learning more and more every day on this. To answer Hoss's first question, here are the properties for the engine: lucene.path.to.index=/home/httpd/htdocs/index lucene.time.interval=15000 lucene.paramOffset = 0 Hoss, could you tell me what exceptions I'm missing? Thanks! I figured out my issue, with a lot of help from Luke. (Thanks to the other Luke.) The document I was creating for Lucene to index was missing data due to a size issue with the database records. So Lucene was doing its job; the data just wasn't there in the index. Took a while to figure out why the document was missing the data; it didn't dawn on me that the size and number of the database records would be the issue, but it really was the only thing that changed. Could you explain this piece further, Erik: "BooleanQuery and AND in TermQuery for resellerId"? I would love to improve the code of this piece and understand the engine more. Like for example, if something is indexed, it will be found in the search, but what about something that is just in the document and not indexed? I don't know the difference in Stored, Tokenized, Indexed, and Vector and where I would do what... Is there info on that piece on the web somewhere? Like, I have a large (6000 chars) text field I would like to add to the document; it's HTML. I am guessing first it would need to be parsed, then added? But added and indexed? The field contains product specs and product compatibility (most in table form). Sorry for the newbie questions, but I am not finding Google very chock full of Lucene info... Ross -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter Sent: Tuesday, January 04, 2005 6:48 PM To: Lucene Users List Subject: Re: Problems... To start with, there has to be more to the "search" side of things than what you included.
This search function is not static, which means it's getting called on an object, which obviously has some internal state (paramOffset, hits, and pathToIndex are a few that jump out at me). What are the values of those variables when this method gets called? Second, there are at least two places in your code where potential exceptions get thrown away and execution continues. As a matter of good practice, you should add logging to these spots to make sure you aren't ignoring errors... Third, you said "I'm not getting anything in the log that I can point to that says what is not working," but what about what is/isn't in the log? There are several System.out.println calls in this code ... I'm assuming you're logging STDOUT; what do those messages (with variables) say? What is the value of currentOffset on the initial search? What does the query.toString look like? How many total hits are being found when the search is executed? (Or is that line not getting logged because the search is getting skipped because of some initial state in paramOffset?) -Hoss
Re: Lucene Book in UK
On Jan 6, 2005, at 3:49 PM, Chris Hostetter wrote: B&N agrees that the list price is $60.95 ... which may be what Manning is citing to resellers. This is incorrect information that has somehow gotten out. Amazon and B&N are slow to update their information, but Manning assures me that they have provided the correct information to Amazon to update. The actual price you're paying is certainly not indicative of a $60.95 list price - Amazon doesn't discount 50%, I'm sure. Erik
Re: multi-threaded thru-put in lucene
John Wang wrote: Is the operation IndexSearcher.search I/O or CPU bound if I am doing 100's of searches on the same query? CPU bound. Doug
RE: Lucene Book in UK
: I ordered mine from Amazon a while back and was notified yesterday that it : shipped. Here was my price: really??? .. those bastards. I ordered two copies for my work on December 10th and they still haven't shipped them. : 1 Lucene In Action (In Action) $27.17 1 $27.17 Hmm, they only charged me $26.37 each ... but Amazon has been known to experiment with price points. (On my browser, they're currently showing a discounted price of 38.40.) I can tell you that on December 10th, Amazon's "List" price was roughly the same as Manning's, hence I was about to order from Manning and get the free ebook, when I realized I was looking at the "List price" and not the "Amazon Price". With Amazon's free shipping it was cheaper to buy the two paper copies from Amazon *and* give Manning the $22 for the ebook. : Does anyone know why Amazon.com lists the list price for Lucene in : Action as $60.95? Bookpool.com has the list price as $44.95, which is : the price that Manning is charging. After discounting, bookpool.com has : it on sale for $27.50. B&N agrees that the list price is $60.95 ... which may be what Manning is citing to resellers. -Hoss
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
> > Is this workable for you, Bill? > > No, it doesn't appear to work for me. Whoops! I was testing the wrong jar file. Yes, it *does* appear to work for me. I'll put this in my production code. Thanks again, Erik. Bill
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
Erik, > Is this workable for you, Bill? No, it doesn't appear to work for me. I modified my class to add the extra method, as you suggested. I just forwarded the method to the existing one, as seen below:

protected Query getFieldQuery(String field, Analyzer a, String queryText, int slop)
    throws ParseException {
  return getFieldQuery(field, a, queryText);
}

protected Query getFieldQuery(String field, Analyzer a, String queryText)
    throws ParseException {
  ...
}

It's still not getting called. My query string is of the form: name:"Bill Janssen" which is a little different from the one you were testing with. It does work OK (with both versions of Lucene) on simple queries like the one you tested with. My guess is that somewhere between 1.4.1 and 1.4.3, someone decided that FieldQueries and PhraseQueries should be handled differently. Bill
RE: Lucene Book in UK
I ordered mine from Amazon a while back and was notified yesterday that it shipped. Here was my price: The following items were included in this shipment:

Qty  Item                          Price   Shipped  Subtotal
1    Lucene In Action (In Action)  $27.17  1        $27.17

Item Subtotal: $27.17
Shipping & Handling: $3.99
Super Saver Discount: -$3.99
Total: $27.17

-Original Message- From: Peter Kim [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 2:16 PM To: Lucene Users List Subject: RE: Lucene Book in UK Does anyone know why Amazon.com lists the list price for Lucene in Action as $60.95? Bookpool.com has the list price as $44.95, which is the price that Manning is charging. After discounting, bookpool.com has it on sale for $27.50. Looking forward to getting my copy. Peter -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 1:54 PM To: Lucene Users List Subject: Re: Lucene Book in UK On Jan 6, 2005, at 1:23 PM, David Townsend wrote: > Sorry if this is the wrong forum but I wondered what's happened to > 'Lucene In Action' in the UK. Looking forward to reading it but > amazon.co.uk report it as a 'hard to find' item and are now quoting a > 4-6 week delivery time and tacking on a rare book charge. Amazon.com > are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even, but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
RE: Lucene Book in UK
Does anyone know why Amazon.com lists the list price for Lucene in Action as $60.95? Bookpool.com has the list price as $44.95, which is the price that Manning is charging. After discounting, bookpool.com has it on sale for $27.50. Looking forward to getting my copy. Peter -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, January 06, 2005 1:54 PM To: Lucene Users List Subject: Re: Lucene Book in UK On Jan 6, 2005, at 1:23 PM, David Townsend wrote: > Sorry if this is the wrong forum but I wondered what's happened to > 'Lucene In Action' in the UK. Looking forward to reading it but > amazon.co.uk report it as a 'hard to find' item and are now quoting a > 4-6 week delivery time and tacking on a rare book charge. Amazon.com > are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
Re: multi-threaded thru-put in lucene
Thanks Doug! You are right: adding a Thread.sleep() helped greatly. Mysteries of Java... Another Java threading question. With 1 thread and iterations of 100 searches, it took about 850 ms. After adding a Thread.sleep(10) in the loop, it takes about 2200 ms. The 100 sleeps of 10 ms each account for 1000 ms, so there are 2200 - 1850 = 350 ms unaccounted for. Is that due to thread scheduling/context switching? Thanks -John On Thu, 6 Jan 2005 10:36:12 -0800, John Wang <[EMAIL PROTECTED]> wrote: > Is the operation IndexSearcher.search I/O or CPU bound if I am doing > 100's of searches on the same query? > > Thanks > > -John > > On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote: > > John Wang wrote: > > > 1 thread: 445 ms. > > > 2 threads: 870 ms. > > > 5 threads: 2200 ms. > > > > > > Pretty much the same numbers you'd get if you are running them > > > sequentially. > > > > > > Any ideas? Am I doing something wrong? > > > > If you're performing compute-bound work on a single-processor machine > > then threading should give you no better performance than sequential, > > perhaps a bit worse. If you're performing io-bound work on a > > single-disk machine then threading should again provide no improvement. > > If the task is evenly compute and i/o bound then you could achieve at > > best a 2x speedup on a single CPU system with a single disk. > > > > If you're compute-bound on an N-CPU system then threading should > > optimally be able to provide a factor of N speedup. > > > > Java's scheduling of compute-bound threads when no threads call > > Thread.sleep() can also be very unfair. > > > > Doug
Re: Lucene Book in UK
On Jan 6, 2005, at 1:23 PM, David Townsend wrote: Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? It's news to me that Amazon is shipping it in the U.S. even but I just checked and you're right! They *just* got it in stock though, so I'm sure it takes a bit more time for the U.K. to get it. It's been shipping from Manning's site for a couple of weeks now though, and as noted, it includes the e-book along with it. Erik
Re: multi-threaded thru-put in lucene
Is the operation IndexSearcher.search I/O or CPU bound if I am doing 100's of searches on the same query? Thanks -John On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote: > John Wang wrote: > > 1 thread: 445 ms. > > 2 threads: 870 ms. > > 5 threads: 2200 ms. > > > > Pretty much the same numbers you'd get if you are running them sequentially. > > > > Any ideas? Am I doing something wrong? > > If you're performing compute-bound work on a single-processor machine > then threading should give you no better performance than sequential, > perhaps a bit worse. If you're performing io-bound work on a > single-disk machine then threading should again provide no improvement. > If the task is evenly compute and i/o bound then you could achieve at > best a 2x speedup on a single CPU system with a single disk. > > If you're compute-bound on an N-CPU system then threading should > optimally be able to provide a factor of N speedup. > > Java's scheduling of compute-bound threads when no threads call > Thread.sleep() can also be very unfair. > > Doug
Re: multi-threaded thru-put in lucene
John Wang wrote: 1 thread: 445 ms. 2 threads: 870 ms. 5 threads: 2200 ms. Pretty much the same numbers you'd get if you are running them sequentially. Any ideas? Am I doing something wrong? If you're performing compute-bound work on a single-processor machine then threading should give you no better performance than sequential, perhaps a bit worse. If you're performing io-bound work on a single-disk machine then threading should again provide no improvement. If the task is evenly compute and i/o bound then you could achieve at best a 2x speedup on a single CPU system with a single disk. If you're compute-bound on an N-CPU system then threading should optimally be able to provide a factor of N speedup. Java's scheduling of compute-bound threads when no threads call Thread.sleep() can also be very unfair. Doug
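Doug's point - compute-bound work speeds up by at most a factor of N on N CPUs, and only when it is actually partitioned across threads - can be sketched self-contained. The work() loop below is a made-up stand-in for a CPU-bound search; only the correctness of the partitioned result is checked, since wall-clock speedup depends on the machine:

```java
/** Sketch of partitioning compute-bound work across threads. On an N-CPU
 *  box the parallel version can run up to N times faster; on one CPU it
 *  runs no faster than the sequential loop, exactly as described above. */
public class ParallelSearchSketch {

    // Stand-in for a CPU-bound search: pure computation, no I/O.
    static long work(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += (long) i * i;
        return sum;
    }

    // Partition [0, n) across the given number of threads and combine results.
    static long parallelSum(final int n, int threads) {
        final long[] partial = new long[threads];
        Thread[] ts = new Thread[threads];
        int chunk = n / threads;
        for (int t = 0; t < threads; t++) {
            final int id = t, lo = t * chunk, hi = (t == threads - 1) ? n : lo + chunk;
            ts[t] = new Thread(() -> partial[id] = work(lo, hi));
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        long total = 0;
        for (long p : partial) total += p;
        return total;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long seq = work(0, 1_000_000);
        long par = parallelSum(1_000_000, cpus);
        System.out.println(seq == par); // true: same answer either way
    }
}
```

Note that running the *same* unpartitioned work on every thread (as in the test described earlier in the thread) gives N times the total work, which is why the elapsed times simply scale with the thread count.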
Re: Lucene Book in UK
Have you checked Manning's site (http://www.manning.com), where you can order the book directly from them (the publisher)? They will also provide you with a copy of the eBook in the meantime, until your paperback arrives in the mail. -pedja P.S. two cubes of sugar with that tea, please :) David Townsend said the following on 1/6/2005 1:23 PM: Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David
Lucene Book in UK
Sorry if this is the wrong forum but I wondered what's happened to 'Lucene In Action' in the UK. Looking forward to reading it but amazon.co.uk report it as a 'hard to find' item and are now quoting a 4-6 week delivery time and tacking on a rare book charge. Amazon.com are quoting shipping in 24hrs. Is this a new 'Boston Tea Party'? cheers David - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
I actually ran a few tests, but I am seeing similar behaviors. After removing all the possible variations, this is what I used: 1 index, doc count is 15,000. Using FSDirectory, e.g. new IndexSearcher(String path); by default I think it uses FSDirectory. Each thread is doing 100 iterations of search, e.g. for (int i=0;i<100;++i){ idxSearcher.search(q); } For each thread and each iteration, I am using the same query. I am timing them the following way: long start=System.currentTimeMillis(); for (int i=0;i<100;++i){ idxSearcher.search(q); } ... wrote: > > : This is what we found: > : > : 1 thread, search takes 20 ms. > : > : 2 threads, search takes 40 ms. > : > : 5 threads, search takes 100 ms. > > how big is your index? What are the term frequencies like in your index? > how many different queries did you try? what was the structure of your > query objects like? were you using a RAMDirectory or an FSDirectory? what > hardware were you running on? > > Is your test application small enough that you can post it to the list? > > I haven't done a lot of PMA testing of Lucene, but from what limited > testing I have done I'm a little surprised at those numbers, you'd get > results just as good if you ran the queries sequentially. > > -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
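The timing setup John describes can be sketched as a small harness. This uses the modern java.util.concurrent API (not available on the JDK 1.4 of the era), and doSearch() is a compute-bound stand-in for the hypothetical idxSearcher.search(q) call; only the timing structure is the point. Also note the method name is System.currentTimeMillis(), not currenTimeInMillis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// N threads, each doing 100 iterations of the same task, timed around the
// whole batch. doSearch() is a placeholder for the real search call.
public class ThroughputTest {
    static void doSearch() {              // compute-bound stand-in
        double x = 0;
        for (int i = 0; i < 10_000; i++) x += Math.sqrt(i);
    }

    static long timeThreads(int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        long start = System.currentTimeMillis();
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            futures.add(pool.submit(() -> {
                for (int i = 0; i < 100; i++) doSearch();
            }));
        }
        for (Future<?> f : futures) f.get();  // wait for every thread
        pool.shutdown();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:  " + timeThreads(1) + " ms");
        System.out.println("5 threads: " + timeThreads(5) + " ms");
    }
}
```

On a single CPU the 5-thread run should take roughly 5x the 1-thread run, which is exactly the pattern reported in this thread.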
Re: Lucene appreciation
Hello, Nice work, this mail is just to say that you already have a French competitor ;) www.keljob.com For now, it's a SQL Server search engine, but we are planning to implement Lucene in two or three months. Of course, we don't handle the same job volume (165,000 jobs vs 1,756,000) so it's reasonably "fast". Concerning the crawler, we use a proprietary robot written in C/C++; we plan to move to a Java & open-source solution this year. We already have good experience with Lucene (with Jakarta James), implemented for a recruiter tool (emailed job application management). Also planning to implement it (maybe in association with Carrot²) in a resume search engine. Lots of work to be done this year, then :) -- Sven Duzont [EMAIL PROTECTED] 38, rue du Sentier / 75002 Paris Tel.: 00 33 (1) 40 13 63 30 Fax: 00 33 (1) 40 13 01 84 In October 2004 the Keljob Group is: * the #1 e-recruitment player, * 477,000 email alert subscriptions, * 125,000 CVs less than 6 months old, * 6,300,000 job ads read, * 2,488,599 visits. Thursday, December 16, 2004, 17:26:22, you wrote: RK> Hello fellow Lucene users, RK> I'd like to introduce myself and say thanks. We've recently launched RK> http://www.indeed.com, a search engine for jobs based on Lucene. I'm RK> consistently impressed with the quality, professionalism and support of the RK> Lucene project and the Lucene community. This mailing list has been a great RK> help. I'd also like to give mention to some of the consultants who had a big RK> hand in making our project a reality ... Thank you Otis, Aviran, Sergiu & RK> Dawid. RK> As for our project, we're in beta and would love to get your feedback. The RK> index size is currently ~1.8m jobs. My personal email address is rony a_t RK> indeed.com. If you are interested in Lucene work you can set up an rss feed RK> or email alert from here: RK> http://www.indeed.com/search?q=lucene&sort=date RK> Is it possible to be added to the Wiki Powered By page? 
RK> Thanks Everyone, RK> Rony RK> Indeed.com - one search. all Jobs. RK> http://www.indeed.com RK> - RK> To unsubscribe, e-mail: [EMAIL PROTECTED] RK> For additional commands, e-mail: RK> [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[ANNOUNCE] dotLucene 1.4.3 RC2 (port of Jakarta Lucene to C#)
Hi Folks, I am pleased to announce the availability of "dotLucene 1.4.3 RC2 build-001" This is the second "Release Candidate" release of version 1.4.3 of Jakarta Lucene ported to C# and is intended to be "Final". Please visit http://www.sourceforge.net/projects/dotlucene/ to learn more about dotLucene and to download the source code. Best regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[ANNOUNCE] Highlighter.Net 1.4.0 RC1 (port of lucene Java highlighter to C#)
Hi Folks, I am pleased to announce the availability of "Highlighter.Net 1.4.0 RC1 build 001" This is the first "Release Candidate" release of version 1.4.0 of Lucene's Java Highlighter ported to C# and is intended to be "Final". Please visit http://www.sourceforge.net/projects/dotlucene/ to learn more about Highlighter.Net as well as dotLucene and to download the source code. Best regards, -- George Aroush - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
On Thu, 6 Jan 2005, Erik Hatcher wrote: > > On Jan 6, 2005, at 10:41 AM, Joseph Ottinger wrote: > > SHouldn't Lucene warn the user if they do something like this? > > When a user indexes a null? Or attempts to write to the index from two > different IndexWriter instances? > > I believe you should get an NPE if you try index a null field value? > No? Well, I'd agree - the lack of an exception was rather disturbing, considering how badly it destroyed Lucene for the application (requiring not only restart but cleanup as well.) I don't know Lucene well enough to say "according to the code..." but NOT adding the null managed to correct the problem entirely. --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
On Jan 6, 2005, at 10:41 AM, Joseph Ottinger wrote: SHouldn't Lucene warn the user if they do something like this? When a user indexes a null? Or attempts to write to the index from two different IndexWriter instances? I believe you should get an NPE if you try index a null field value? No? Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
Hi, I have a question. How big (in size and documents) is your index ? How many indexes do you search ? Thanks, Mariella At 10:54 AM 1/5/2005 -0800, you wrote: Hi folks: We are trying to measure thru-put lucene in a multi-threaded environment. This is what we found: 1 thread, search takes 20 ms. 2 threads, search takes 40 ms. 5 threads, search takes 100 ms. Seems like under a multi-threaded scenario, thru-put isn't good, performance is not any better than that of 1 thread. I tried to share an IndexSearcher amongst all threads as well as having an IndexSearcher per thread. Both yield same numbers. Is this consistent with what you'd expect? Thanks -John - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Well, I think I isolated the problem: stupid error on my part, I think. I was adding an indexed field that had, um, a value of null. Correcting that made the process go much more properly - although note that I haven't scaled up to have multiple elements to index. Good milestone, though. Shouldn't Lucene warn the user if they do something like this? On Thu, 6 Jan 2005, Erik Hatcher wrote: > Do you have two threads simultaneously either writing or deleting from > the index? > > Erik > > On Jan 6, 2005, at 9:27 AM, Joseph Ottinger wrote: > > > Sorry to reply to my own post, but I now have a greater understanding > > of > > PART of my problem - my SQLDirectory is not *quite* right, I think. So > > I'm > > rolling back to FSDirectory. > > > > Now, I have a servlet that writes to the filesystem to simplify things > > (as > > I'm not confident enough to debug the RDBMS-based directory yet. That's > > a > > task for later, I think). The servlet says it successfully creates the > > index like so: > > > > try { > >open the index with create=false > > } catch (file not found) { > >open the index with create=true > > } > > index.optimize(); > > index.close(); > > > > Now, when I fire off any messages to the MDB, it yields the following: > > > > java.io.IOException: Lock obtain timed out: > > Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock > > at org.apache.lucene.store.Lock.obtain(Lock.java:58) > > > > Now, this is on only two messages to the MDB, not just a flood of > > messages. Two handlers, so I expect a lock in one's case, but not the > > first MDB call - it should be the one causing the lock for the second > > one, > > if a lock exists at all. > > > > I've verified that when the servlet that initializes the index runs, a > > lock file is NOT present, but again, it looks like every message fired > > through looks for a lock and finds one, when I would think it wouldn't > > be > > there. > > > > What am I not understanding? 
> > > > On Thu, 6 Jan 2005, Joseph Ottinger wrote: > > > >> If this is a stupid question, I deeply apologize. I'm stumped. > >> > >> I have a message-driven EJB using Lucene. In *every* case where the > >> MDB is > >> trying to create an index, I'm getting "Lock obtain timed out." > >> > >> It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the > >> user > >> list has referred to before - but I don't see how the suggestions > >> there > >> apply to what I'm trying to do. (It's creating a lock file in > >> /var/tmp/ > >> properly, from what I can see, so it's not write permissions, I > >> imagine.) > >> > >> I set the infoStream in my index writer to System.out, but I don't > >> see any > >> extra information. > >> > >> I'm using a SQL-based Directory object, but I get the same problem if > >> I > >> refer to a file directly. > >> > >> Is there a way to override the Lock portably so that I can have the > >> lock > >> itself managed in an RDMS? (It's a J2EE project, so relying on file > >> access > >> is problematic; if the beans using lucene to write to the index are on > >> multiple servers, multiple locks could exist anyway.) > >> > >> -- > >> - > >> Joseph B. Ottinger > >> http://enigmastation.com > >> IT Consultant > >> [EMAIL PROTECTED] > >> > >> > >> - > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > > > > --- > > Joseph B. Ottinger http://enigmastation.com > > IT Consultant[EMAIL PROTECTED] > > > > > > - > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Do you have two threads simultaneously either writing or deleting from the index? Erik On Jan 6, 2005, at 9:27 AM, Joseph Ottinger wrote: Sorry to reply to my own post, but I now have a greater understanding of PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm rolling back to FSDirectory. Now, I have a servlet that writes to the filesystem to simplify things (as I'm not confident enough to debug the RDMS-based directory yet. That's a task for later, I think). The servlet says it successfully creates the index like so: try { open the index with create=false } catch (file not found) { open the index with create=true } index.optimize(); index.close(); Now, when I fire off any messages to the MDB, it yields the following: java.io.IOException: Lock obtain timed out: Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:58) Now, this is on only two messages to the MDB, not just a flood of messages. Two handlers, so I expect a lock in one's case, but not the first MDB call - it should be the one causing the lock for the second one, if a lock exists at all. I've verified that when the servlet that initializes the index runs, a lock file is NOT present, but again, it looks like every message fired through looks for a lock and finds one, when I would think it wouldn't be there. What am I not understanding? On Thu, 6 Jan 2005, Joseph Ottinger wrote: If this is a stupid question, I deeply apologize. I'm stumped. I have a message-driven EJB using Lucene. In *every* case where the MDB is trying to create an index, I'm getting "Lock obtain timed out." It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user list has referred to before - but I don't see how the suggestions there apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ properly, from what I can see, so it's not write permissions, I imagine.) 
I set the infoStream in my index writer to System.out, but I don't see any extra information. I'm using a SQL-based Directory object, but I get the same problem if I refer to a file directly. Is there a way to override the Lock portably so that I can have the lock itself managed in an RDMS? (It's a J2EE project, so relying on file access is problematic; if the beans using lucene to write to the index are on multiple servers, multiple locks could exist anyway.) -- - Joseph B. Ottinger http://enigmastation.com IT Consultant [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lock obtain timed out from an MDB
Sorry to reply to my own post, but I now have a greater understanding of PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm rolling back to FSDirectory. Now, I have a servlet that writes to the filesystem to simplify things (as I'm not confident enough to debug the RDMS-based directory yet. That's a task for later, I think). The servlet says it successfully creates the index like so: try { open the index with create=false } catch (file not found) { open the index with create=true } index.optimize(); index.close(); Now, when I fire off any messages to the MDB, it yields the following: java.io.IOException: Lock obtain timed out: Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:58) Now, this is on only two messages to the MDB, not just a flood of messages. Two handlers, so I expect a lock in one's case, but not the first MDB call - it should be the one causing the lock for the second one, if a lock exists at all. I've verified that when the servlet that initializes the index runs, a lock file is NOT present, but again, it looks like every message fired through looks for a lock and finds one, when I would think it wouldn't be there. What am I not understanding? On Thu, 6 Jan 2005, Joseph Ottinger wrote: > If this is a stupid question, I deeply apologize. I'm stumped. > > I have a message-driven EJB using Lucene. In *every* case where the MDB is > trying to create an index, I'm getting "Lock obtain timed out." > > It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user > list has referred to before - but I don't see how the suggestions there > apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ > properly, from what I can see, so it's not write permissions, I imagine.) > > I set the infoStream in my index writer to System.out, but I don't see any > extra information. 
> > I'm using a SQL-based Directory object, but I get the same problem if I > refer to a file directly. > > Is there a way to override the Lock portably so that I can have the lock > itself managed in an RDMS? (It's a J2EE project, so relying on file access > is problematic; if the beans using lucene to write to the index are on > multiple servers, multiple locks could exist anyway.) > > --- > Joseph B. Ottinger http://enigmastation.com > IT Consultant[EMAIL PROTECTED] > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
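The lock file named in the stack trace above is the whole mechanism: Lucene 1.4 obtains its write lock by creating a marker file (e.g. /var/tmp/lucene-<hash>-write.lock), and a stale file left behind by a crashed or non-closed writer blocks every later obtain() until someone deletes it. The class below is a minimal stand-in for that behavior using plain java.io, not Lucene's actual Lock class.

```java
import java.io.File;
import java.io.IOException;

// File.createNewFile() is atomic: it returns true only if the file did
// not already exist. That is why a leftover write.lock makes every
// subsequent obtain() time out until the file is removed by hand.
public class FileLockDemo {
    static boolean obtain(File lockFile) throws IOException {
        return lockFile.createNewFile();   // true only for the first caller
    }

    static void release(File lockFile) {
        lockFile.delete();
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"),
                             "demo-write.lock");
        lock.delete();                      // start clean
        System.out.println(obtain(lock));   // first writer wins: true
        System.out.println(obtain(lock));   // second "writer" blocked: false
        release(lock);
        System.out.println(obtain(lock));   // after release it works again
        release(lock);
    }
}
```

This also explains the multi-server concern in the question: a lock that lives in one machine's tmp directory cannot coordinate writers on other machines.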
Re: Span Query Performance
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user directly: A bit more: On Thursday 06 January 2005 10:22, Paul Elschot wrote: > On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: > > Hi all, > > > > I'm currently doing a query similar to the following: > > > > for w in wordset: > > query = w near (word1 V word2 V word3 ... V word1422); > > perform query > > > > and I am doing this through SpanQuery.getSpans(), iterating through the > > spans and counting > > the matches, which can result in 4782282 matches (essentially I am only > > after the match count). > > The query works but the performance can be somewhat slow; so I am wondering: > > ... > > c) Is there a faster method to what I am doing I should consider? > > Preindexing all word combinations that you're interested in. > In case you know all the words in advance, you could also index a helper word at the same position as each of those words. This requires a custom analyzer that inserts the helper word in the token stream with a zero position increment. The query then simplifies to: query = w near helperword which would probably speed things up significantly. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
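Paul's helper-word idea can be sketched without Lucene's TokenStream API: whenever a token belongs to the word set, emit an extra helper token at the SAME position, which is what a real analyzer filter would do by setting the injected token's position increment to zero. Positions are modeled here as term-to-position-list maps; the class, method, and "_ANY_" token names are all illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simulates indexing a helper token at the same position as each word in
// the word set (zero position increment), so a single span query
// "w near helper" can replace a 1422-clause disjunction.
public class HelperTokenSketch {
    static Map<String, List<Integer>> analyze(String[] tokens,
                                              Set<String> wordset,
                                              String helper) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos],
                                      k -> new ArrayList<>()).add(pos);
            if (wordset.contains(tokens[pos])) {
                // injected helper token shares the position: increment 0
                positions.computeIfAbsent(helper,
                                          k -> new ArrayList<>()).add(pos);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Set<String> wordset = new HashSet<>(Arrays.asList("word1", "word2"));
        Map<String, List<Integer>> idx = analyze(
            new String[]{"foo", "word1", "bar", "word2"}, wordset, "_ANY_");
        System.out.println(idx.get("_ANY_"));  // [1, 3]
    }
}
```

After analysis, "_ANY_" sits at the positions of word1 and word2, so proximity to "_ANY_" is proximity to any word in the set.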
Re: 1.4.3 breaks 1.4.1 QueryParser functionality
On Jan 5, 2005, at 5:04 AM, Erik Hatcher wrote: On Jan 4, 2005, at 9:43 PM, Bill Janssen wrote: Let me be a bit more explicit. My method (essentially an after-method, for those Lisp'rs out there) begins thusly: protected Query getFieldQuery (String field, Analyzer a, String queryText) throws ParseException { Query x = super.getFieldQuery(field, a, queryText); ... } If I remove the "Analyzer a" from both the signature and the super call, the super call won't compile because that method isn't in the QueryParser in 1.4.1. But my getFieldQuery() method won't even be called in 1.4.1, because it doesn't exist in that version of the QueryParser. Will it work if you override this method also? protected Query getFieldQuery(String field, Analyzer analyzer, String queryText, int slop) My head is spinning looking at all the various signatures of this method we have and trying to backtrack where things went awry. I tried out my suggestion (code pasted below) against lucene-1.4-final.jar and lucene-1.4-3.jar (I don't have the 1.4.1 JAR handy) and was successful. If you override both signatures of getFieldQuery it should work fine for you across all 1.4.x versions. Not ideal, but at least a workaround. Is this workable for you, Bill? 
Erik public class CustomQueryParser extends QueryParser { public CustomQueryParser(String field, Analyzer analyzer) { super(field, analyzer); } protected Query getFieldQuery(String field, Analyzer analyzer, String queryText, int slop) throws ParseException { System.out.println("(slop) queryText = " + queryText); return null; } protected Query getFieldQuery (String field, Analyzer a, String queryText) throws ParseException { System.out.println("(no-slop) queryText = " + queryText); return null; } public static void main(String[] args) throws Exception { CustomQueryParser qp = new CustomQueryParser("f", new WhitespaceAnalyzer()); qp.parse("foo bar"); qp.parse("\"foo bar\""); } } The output was identical with both versions of Lucene: (no-slop) queryText = foo (no-slop) queryText = bar (slop) queryText = foo bar - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Lock obtain timed out from an MDB
If this is a stupid question, I deeply apologize. I'm stumped. I have a message-driven EJB using Lucene. In *every* case where the MDB is trying to create an index, I'm getting "Lock obtain timed out." It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user list has referred to before - but I don't see how the suggestions there apply to what I'm trying to do. (It's creating a lock file in /var/tmp/ properly, from what I can see, so it's not write permissions, I imagine.) I set the infoStream in my index writer to System.out, but I don't see any extra information. I'm using a SQL-based Directory object, but I get the same problem if I refer to a file directly. Is there a way to override the Lock portably so that I can have the lock itself managed in an RDBMS? (It's a J2EE project, so relying on file access is problematic; if the beans using lucene to write to the index are on multiple servers, multiple locks could exist anyway.) --- Joseph B. Ottinger http://enigmastation.com IT Consultant[EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Null Pointer Exception
D'oh, please disregard the message: I forgot to assign the return value when I create the IndexReader. Rupinder Singh Mazara wrote: Hi all, while executing a query on Lucene I get the following exception; if I check whether the IndexSearcher object == null or use an assert, I do not get any errors. Please help me out on this. java.lang.NullPointerException at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69) at org.apache.lucene.search.Similarity.idf(Similarity.java:255) at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47) at org.apache.lucene.search.BooleanQuery$BooleanWeight.sumOfSquaredWeights(BooleanQuery.java:110) at org.apache.lucene.search.Query.weight(Query.java:86) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85) To access the Searchable object I use the following lines of code, at various places in my web application; all was fine till this morning, and running command-line test scripts does not show an error public static IndexReader fetchCitationReader(ServletContext context) throws IOException { IndexReader rval = (IndexReader) context.getAttribute("luceneIndexReader"); if (rval == null) { String var = (String) context.getAttribute("luceneRootName"); System.out.println("var = " + var); IndexReader indexReader = IndexReader.open(new File(var)); context.setAttribute("luceneIndexReader", indexReader); } return rval; } public static Searcher fetchCitationSearcher(ServletContext context) throws IOException { Searcher rval = (Searcher) context.getAttribute("luceneSearchable"); if (rval == null) { rval = new IndexSearcher(fetchCitationReader(context)); context.setAttribute("luceneSearchable", rval); } return rval; } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
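The bug the poster found in himself: fetchCitationReader assigns the freshly opened reader to a local variable (indexReader) instead of rval, so the method caches the reader but still returns null, and the IndexSearcher built on that null later throws the NPE. The corrected caching pattern looks like this, with ServletContext stood in for by a plain Map and the reader by a plain Object so the sketch stays self-contained:

```java
import java.util.HashMap;
import java.util.Map;

// Lazy-initialize-and-cache pattern, with the fix: the opened value is
// assigned to rval itself, so the first call returns a non-null reader.
public class ReaderCache {
    static Object fetchReader(Map<String, Object> context) {
        Object rval = context.get("luceneIndexReader");
        if (rval == null) {
            rval = openReader();                   // assign to rval, not a local
            context.put("luceneIndexReader", rval);
        }
        return rval;
    }

    static Object openReader() {   // stand-in for IndexReader.open(new File(...))
        return new Object();
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        Object first = fetchReader(ctx);
        System.out.println(first != null);             // true: no more NPE
        System.out.println(first == fetchReader(ctx)); // true: cached instance
    }
}
```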
Null Pointer Exception
Hi all, while executing a query on Lucene I get the following exception; if I check whether the IndexSearcher object == null or use an assert, I do not get any errors. Please help me out on this. java.lang.NullPointerException at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69) at org.apache.lucene.search.Similarity.idf(Similarity.java:255) at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47) at org.apache.lucene.search.BooleanQuery$BooleanWeight.sumOfSquaredWeights(BooleanQuery.java:110) at org.apache.lucene.search.Query.weight(Query.java:86) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85) To access the Searchable object I use the following lines of code, at various places in my web application; all was fine till this morning, and running command-line test scripts does not show an error public static IndexReader fetchCitationReader(ServletContext context) throws IOException { IndexReader rval = (IndexReader) context.getAttribute("luceneIndexReader"); if (rval == null) { String var = (String) context.getAttribute("luceneRootName"); System.out.println("var = " + var); IndexReader indexReader = IndexReader.open(new File(var)); context.setAttribute("luceneIndexReader", indexReader); } return rval; } public static Searcher fetchCitationSearcher(ServletContext context) throws IOException { Searcher rval = (Searcher) context.getAttribute("luceneSearchable"); if (rval == null) { rval = new IndexSearcher(fetchCitationReader(context)); context.setAttribute("luceneSearchable", rval); } return rval; } - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Span Query Performance
On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: > Hi all, > > I'm currently doing a query similar to the following: > > for w in wordset: > query = w near (word1 V word2 V word3 ... V word1422); > perform query > > and I am doing this through SpanQuery.getSpans(), iterating through the > spans and counting > the matches, which can result in 4782282 matches (essentially I am only > after the match count). > The query works but the performance can be somewhat slow; so I am wondering: > > a) Would the query potentially run faster if I used > Searcher.search(query) with a custom similarity, > or do both methods essentially use the same mechanics It would be somewhat slower, because it loops over the getSpans() and computes document scores and constructs a Hits from the scores. > b) Does using a RAMDirectory improve query performance any significant > amount. That depends on your operating system, the size of the index, the amount of RAM you can use, the file buffering efficiency, other loads on the computer ... > c) Is there a faster method to what I am doing I should consider? Preindexing all word combinations that you're interested in. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: multi-threaded thru-put in lucene
: This is what we found: : : 1 thread, search takes 20 ms. : : 2 threads, search takes 40 ms. : : 5 threads, search takes 100 ms. how big is your index? What are the term frequencies like in your index? how many different queries did you try? what was the structure of your query objects like? were you using a RAMDirectory or an FSDirectory? what hardware were you running on? Is your test application small enough that you can post it to the list? I haven't done a lot of PMA testing of Lucene, but from what limited testing I have done I'm a little surprised at those numbers, you'd get results just as good if you ran the queries sequentially. -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Question about Analyzer and words spelled in different languages
: Is there any already written analyzer that would take that name : (Schäffer or any other name that has entities) so that : the Lucene index could be searched (once the field has been indexed) for the real : version of the name, which is : : Schäffer : : and the English-spelled version of the name, which is : : Schaffer I don't know about the un-xml-escaping part of things (there are lots of xml escaping libraries out there, I'm sure one of them has an unescape) but there was a recent discussion about unicode characters that look similar and writing an analyzer that could know about them. The last message in the thread was from me, pointing out that it should be easy to build the mapping table once, and then write a quick and dirty Analyzer filter to use it ... but no one seemed to have any code handy that already did that... http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]&by=thread&from=962022 -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
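The "quick and dirty mapping table" Hoss mentions can be sketched in plain Java: fold accented characters to their ASCII look-alikes both at index time and at query time, so "Schäffer" and "Schaffer" meet at the same term. The table below is a tiny illustrative sample, not a complete one, and a real filter would wrap a Lucene TokenStream rather than a bare String.

```java
// Folds a handful of accented characters to ASCII equivalents; apply the
// same fold() to indexed tokens and to query terms so both spellings match.
public class AccentFolder {
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            switch (c) {
                case 'ä': case 'à': case 'á': case 'â': out.append('a'); break;
                case 'ö': case 'ò': case 'ó': case 'ô': out.append('o'); break;
                case 'ü': case 'ù': case 'ú': case 'û': out.append('u'); break;
                case 'é': case 'è': case 'ê': case 'ë': out.append('e'); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Schäffer"));                        // Schaffer
        System.out.println(fold("Schäffer").equals(fold("Schaffer"))); // true
    }
}
```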