Hi all,

A good place to look for examples of the many features of Lucene.Net is the NUnit test code. Bring up the "Test" project in VS.NET and search the "Demo" folder for the string "HitCollector"; you will find examples of how to use it.
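To make the idea concrete, here is a minimal plain-Java sketch of the approach discussed in the quoted messages below: a collector receives each matching document number and looks up the primary key in an array that is loaded once, so no stored document is fetched per hit. It deliberately uses no Lucene classes. The names `Collector`, `PrimaryKeyCollector`, `search` and `joinKeys` are hypothetical and made up for this illustration; in real code the callback role would be played by HitCollector and the array would come from FieldCache for the "id" field.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the HitCollector + FieldCache idea; no Lucene classes are used.
class KeyCollectorSketch {

    // Hypothetical callback that receives each matching document number and its score.
    interface Collector {
        void collect(int doc, float score);
    }

    // Maps document numbers to primary keys through an in-memory array.
    static final class PrimaryKeyCollector implements Collector {
        private final String[] keyByDoc;               // assumed to be loaded once per index reader
        private final List<String> keys = new ArrayList<>();

        PrimaryKeyCollector(String[] keyByDoc) {
            this.keyByDoc = keyByDoc;
        }

        @Override
        public void collect(int doc, float score) {
            keys.add(keyByDoc[doc]);                   // array lookup; no stored document is loaded
        }

        List<String> keys() {
            return keys;
        }
    }

    // Stand-in for a search: reports every matching document number to the collector.
    static void search(int[] matchingDocs, Collector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);              // the score is not used in this sketch
        }
    }

    // Collects the primary keys of all matches as a comma-separated string.
    static String joinKeys(String[] keyByDoc, int[] matchingDocs) {
        PrimaryKeyCollector collector = new PrimaryKeyCollector(keyByDoc);
        search(matchingDocs, collector);
        return String.join(",", collector.keys());
    }

    public static void main(String[] args) {
        String[] keyByDoc = {"1001", "1002", "1003", "1004"};  // index = document number
        int[] matchingDocs = {2, 0, 3};                        // pretend search result
        System.out.println(joinKeys(keyByDoc, matchingDocs));  // prints 1003,1001,1004
    }
}
```

Because the internal document numbers can change when the index is updated, an array like `keyByDoc` would have to be rebuilt whenever the index is reopened.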
Also, may I suggest the "Lucene In Action" book? At least visit http://lucenebook.com/ and download the book's Java code, which has a lot of Lucene examples.

Regards,

-- George Aroush

-----Original Message-----
From: Neil Carson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 31, 2006 11:28 AM
To: [email protected]; [email protected]
Subject: RE: Storing primary key / Change lucene's document ID

Sorry, no, haven't written it yet.

________________________________

From: Kaufmann M. [mailto:[EMAIL PROTECTED]
Sent: Tue 10/31/2006 2:50 AM
To: [email protected]
Subject: Re: Storing primary key / Change lucene's document ID

Hello Neil,

Can you send me a link to a sample or similar for using HitCollector & FieldCache? I cannot seem to find anything but the API documentation (simple links) in the DotLucene documentation.

Thanks!
Best Regards, Marc

On 10/30/06, Neil Carson <[EMAIL PROTECTED]> wrote:
>
> We are going through this now.
>
> Having Lucene retrieve the docs is slow.
>
> The recommendation from Doug on some old mailing lists I found was to
> use a HitCollector (since the standard search mechanism re-queries
> after accessing more than doc 100), and to use the FieldCache to
> maintain a mapping of Lucene document ID <-> your primary key.
>
> We are planning to do this soon, for the same reason - search is fast,
> document retrieval is very slow.
>
> I noticed that in the Java version, the FieldCache is implemented with
> a weak hashmap. I don't know if this is the case in .NET or not (it
> looked more like a regular one on a quick initial inspection).
>
> Hope this helps.
>
> Neil
>
> ________________________________
>
> From: Kaufmann M. [mailto:[EMAIL PROTECTED]
> Sent: Mon 10/30/2006 6:44 AM
> To: [email protected]
> Subject: Re: Storing primary key / Change lucene's document ID
>
> Hello Jon,
> The biggest difference in time that I have found was between:
>     console.writeln(hits.id(i))
> and
>     console.writeln(hits.doc(i).get(fieldName))
>
> If I return the internal ID within this code, it is a lot faster than
> returning a field's value through ...get().
>
> Overview of the current code:
>     dim qry as search.query=(...)
>     dim sw as new io.streamwriter(...)
>     dim hits as search.hits
>     hits=lis.search(qry)   ' lis is defined once at the start of the code
>     console.write(hits.length)
>     console.write(" writing file ")
>     dim intposmax as integer=hits.length-1
>     for intpos as integer=0 to intposmax
>         if not intPos=0 then sw.write(",")
>         sw.write(hits.doc(intpos).get("id").tostring)
>     next
>     sw.close
>     console.write(" - bulk insert ")
>
>     ... bulk insert from sw.write file
>
> so you can see the time needed for the search and the bulk insert in
> the console. Bulk insert is not as fast on large result sets, but the
> search is still slower - so it is my primary bottleneck :).
>
> I already did some tests comparing hits.id(intPos) with
> hits.doc(intpos).get("id") - those two had a big difference in the
> time taken...
>
> Best Regards, Marc
>
> On 10/30/06, Jon Palmer <[EMAIL PROTECTED]> wrote:
> >
> > Marc,
> >
> > Can you give a few more details of how you are searching Lucene?
> > Maybe some pseudo code of the method that is fast and the one that
> > is slow. I think you are suggesting that there is a very large
> > performance hit for doing this:
> >
> > DocID = Hits.Doc(i).Get("ID")
> >
> > rather than:
> >
> > DocID = Hits.ID(i)
> >
> > JP
> >
> > P.S. Your numbers suggested that your problem is mostly linear. It
> > looks like your method has some setup cost and then processes approx
> > 300 IDs a second
> >
> > 18260 ID's - 72.2 s -avg 253/s
> > 3000 ID's - 10.02s -avg 294/s
> > 830 ID's - 2.25s -avg 368/s
> > 352 ID's - 1.08s -avg 325/s
> > 350 ID's - 0.98s -avg 357/s
> > 278 ID's - 0.48s -avg 162/s
> > 96 ID's - 1.05s -avg 91/s
> > 29 ID's - 0.66s -avg 43/s
> >
> > Given this linear-ish behavior, are you sure that the bottleneck is
> > not writing back to file or to SQL?
> >
> > -----Original Message-----
> > From: Kaufmann M. [mailto:[EMAIL PROTECTED]
> > Sent: Monday, October 30, 2006 5:11 AM
> > To: [email protected]
> > Subject: Re: Storing primary key / Change lucene's document ID
> >
> > Hello George,
> >
> > The problem is the speed; some samples:
> >
> > All counts include writing IDs to file and BULK insert to SQL:
> >
> > 18260 ID's - 72.2 s
> > 352 ID's - 1.08s
> > 96 ID's - 1.05s
> > 29 ID's - 0.66s
> > 3000 ID's - 10.02s
> > 350 ID's - 0.98s
> > 278 ID's - 0.48s
> > 830 ID's - 2.25s
> >
> > As you can see - the time it takes for records >500 is absolutely
> > slow...
> >
> > If I write back the internal ID - it's a LOT faster...
> >
> > I'm not using the Lucene ordering because this also slowed down the
> > returning process a lot.
> > And I'd like to count the results in different ways (which I was not
> > able to do in Lucene), so I have to give back all IDs into SQL...
> >
> > Thanks for helpin'!
> >
> > On 10/30/06, George Aroush <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Hi Marc,
> > > > >
> > > > > You can't depend on Lucene's internal ID, it will change every
> > > > > time you update the index -- this is something you can't
> > > > > control. The way you are currently doing it, by storing an ID
> > > > > in a field named "id", is the right way to do it. Don't worry
> > > > > about slowing down Lucene if you call the API to get the ID of
> > > > > your field "id". Lucene is super fast.
> > > > >
> > > > > Regards,
> > > > >
> > > > > -- George Aroush
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kaufmann M. [mailto:[EMAIL PROTECTED]
> > > > > Sent: Friday, October 27, 2006 4:20 PM
> > > > > To: [email protected]
> > > > > Subject: Storing primary key / Change lucene's document ID
> > > > >
> > > > > Hello everybody,
> > > > > I've got a little question concerning the unique ID stored in
> > > > > the Lucene index (hits.ID(i)).
> > > > > Is it possible to change this ID, or set it on doc.add?
> > > > >
> > > > > Currently I'm running a test project which stores an external
> > > > > primary key in a field named 'id', but if I call it from the
> > > > > search engine I have to use the get method - which slows it
> > > > > down.
> > > > > If I could use this primary key as the Lucene ID, the whole
> > > > > engine would be a lot faster because I just need the IDs
> > > > > returned...
> > > > >
> > > > > Does anybody know if this is possible?
> > > > >
> > > > > Thanks!
> > > > > Best Regards, Marc
