Apparently tp.nextPosition() is needed after all :(

Any ideas?

-John

On Thu, Apr 3, 2008 at 8:20 AM, John Wang <[EMAIL PROTECTED]> wrote:
> I am loading both from disk.
> But I found the culprit:
>
> My code:
>
> while (tp.next())
> {
>     //assert tp.doc() < maxDoc;
>     tp.nextPosition();  <-- this call is the problem
>     tp.getPayload(payloadBuffer, 0);
>     byter.load(_array, tp.doc(), payloadBuffer);
> }
>
> The way I stored it, there is one position per doc. When I removed the
> call to tp.nextPosition(), performance improved by a multi-digit factor.
>
> I would think this call should be free.
>
> Thanks
> -John
>
> On Thu, Apr 3, 2008 at 8:16 AM, Chris Lu <[EMAIL PROTECTED]> wrote:
>
> > If your index grows larger, the payload method will get slower,
> > because payloads are read from the hard disk. The FieldCache is in
> > memory, which is much faster.
> >
> > Unless you are going with a solid-state disk, you'd better go with
> > the FieldCache for faster search.
> >
> > --
> > Chris Lu
> >
> > On Thu, Apr 3, 2008 at 7:36 AM, John Wang <[EMAIL PROTECTED]> wrote:
> > > Sorry, gmail was screwy and accidentally sent the msg.
> > > Anyway,
> > >
> > > I have a large index, about 30M docs.
> > > I have a date field (by days) with about 1000 distinct values; every
> > > doc has the date field filled in.
> > >
> > > So out of curiosity I indexed the date field two ways:
> > > 1) using "date" as a field, and setting the date value for each doc.
> > > 2) a new term, "_payload:_val", with the date (as a long or 8-byte
> > > array) added to the payload of each doc.
> > >
> > > Loading the dates into a long[] array of length maxDoc, the
> > > performance was surprising:
> > > using the payload is 7 times slower than using the FieldCache.
> > >
> > > At first I thought it was because of the conversion from byte[8] to
> > > a long for each doc, so I changed it to load into byte[8*maxdoc]
> > > without doing the conversion, and the result is the same.
> > >
> > > I then did another experiment:
> > > I lowered the number of dates to a small number, e.g. 50, and timed
> > > the field cache load, and it took much longer than when it had 1000.
> > >
> > > I did some profiling, and the profiler points to TermPositions.next,
> > > TermPositions.nextPosition and TermPositions.getPayload as the
> > > culprits.
> > >
> > > I would think payload would always be faster. Any ideas?
> > >
> > > Thanks
> > > -John
> > >
> > > On Thu, Apr 3, 2008 at 7:27 AM, John Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi:
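
For reference, the byte[8]-to-long conversion discussed in the quoted messages could look roughly like the sketch below. This is only an illustration using java.nio.ByteBuffer from the JDK: the class PayloadDates and its methods are hypothetical names, not part of Lucene and not the code from the thread, and big-endian byte order is an assumption (it is ByteBuffer's default).

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: converts between a date held as a
// long and the 8-byte form that would be stored in a payload.
final class PayloadDates {
    private PayloadDates() {}

    // Encode one long as 8 bytes, big-endian (ByteBuffer's default order).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode the 8 bytes starting at offset back into a long.
    static long decode(byte[] buf, int offset) {
        return ByteBuffer.wrap(buf, offset, Long.BYTES).getLong();
    }

    // Decode a packed byte[8 * n] array into long[n] in one bulk pass,
    // i.e. the "load raw bytes first, convert afterwards" variant.
    static long[] decodeAll(byte[] packed) {
        long[] out = new long[packed.length / Long.BYTES];
        ByteBuffer.wrap(packed).asLongBuffer().get(out);
        return out;
    }
}
```

With something like this, the per-doc variant would call PayloadDates.decode(payloadBuffer, 0) inside the loop, while the byte[8*maxdoc] variant would copy the raw bytes in the loop and call decodeAll once at the end. Either way the conversion is cheap, which fits the observation above that skipping it did not change the timing.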