Re: Structure of .tii-file

Alexander vom Berg Tue, 27 Jul 2010 09:06:57 -0700

Hello Mike,

On 27.07.2010 14:38, Michael McCandless wrote:

On Tue, Jul 27, 2010 at 7:58 AM, Alexander vom Berg<m...@avomberg.de>  wrote:

Hello Mike,


thanks for your answer!
I am currently working with Lucene 3.0.1, and except for the .tii file all
other descriptions are comprehensible.
The idea behind the tii/tis file structure is to retrieve the correct
terms faster.
First I look up the term in memory (tii file) and take the nearest hit. With
this information I can skip to the correct position in the tis file and scan
up to my final hit. I don't exactly understand how this skipping is
realized.
Do I have a direct pointer to the position on the hard drive? Or how do I
find the term without too much file access? :D

Yes, you have to seek the tis file handle, then you do .next() until
the term matches.  Maybe you stop there, eg if you're just looking for
say the docFreq of that term.  Or, if you then need to iterate the
docs/positions, from that term entry you have the long file pointers
of frq and prx files, which you must seek to and decode.
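
The lookup described above (find the nearest preceding entry in the in-memory index, seek, then scan forward) can be sketched in plain Java. This is only a toy model under stated assumptions, not Lucene's actual code: the class name TermIndexSketch, its fields, and the use of array positions in place of real file pointers are all made up for illustration, and the interval of 3 is chosen just to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: a sampled in-memory index (".tii"-like) over a sorted term list (".tis"-like). */
final class TermIndexSketch {
    private final String[] terms;   // stands in for the sorted term entries on disk
    private final int[] docFreqs;   // per-term payload, here just the docFreq
    private final List<String> indexTerms = new ArrayList<>();     // every interval-th term
    private final List<Integer> indexPointers = new ArrayList<>(); // its position in 'terms'
    int scanned;                    // entries read during the last lookup

    TermIndexSketch(String[] sortedTerms, int[] docFreqs, int interval) {
        this.terms = sortedTerms;
        this.docFreqs = docFreqs;
        // Sample every interval-th term together with its "file pointer".
        for (int i = 0; i < sortedTerms.length; i += interval) {
            indexTerms.add(sortedTerms[i]);
            indexPointers.add(i);
        }
    }

    /** Returns the docFreq of the term, or -1 if the term is not present. */
    int docFreq(String term) {
        scanned = 0;
        // 1. Binary search in memory for the last index term <= the target.
        int lo = 0, hi = indexTerms.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexTerms.get(mid).compareTo(term) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return -1; // target sorts before the first term
        }
        // 2. "Seek" to the pointer stored with that index entry.
        int pos = indexPointers.get(slot);
        // 3. Scan forward (the .next() calls) until the term matches or is passed.
        while (pos < terms.length) {
            scanned++;
            int cmp = terms[pos].compareTo(term);
            if (cmp == 0) {
                return docFreqs[pos];
            }
            if (cmp > 0) {
                return -1; // passed the spot where the term would be
            }
            pos++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "elder", "fig", "grape"};
        int[] freqs = {3, 1, 4, 1, 5, 9, 2};
        TermIndexSketch idx = new TermIndexSketch(terms, freqs, 3);
        System.out.println("docFreq(fig) = " + idx.docFreq("fig")
                + ", entries scanned = " + idx.scanned);
    }
}
```

In this model one seek plus a scan of at most one interval's worth of entries is enough per lookup, which is the point of keeping only the sampled terms in memory.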

Btw, what is it that you are doing?  You seem to be re-inventing
Lucene :)  You could simply use Lucene's low level APIs to do this...

This was meant more as a question about whether my assumptions about how Lucene works are correct. :) Sorry for being unclear.

I don't want to implement it myself!

My intention behind this is that I want to run some performance tests on a
created index with different block sizes of the hard drive.
Can I just copy this created index to another drive (with a different
block size) or do I have to generate the whole index again?

Ahhh.

You mean the block size of the underlying filesystem?  If so, then
copying will be fine in that the resulting index will function
correctly.

However, this may not be a fair performance test since with 'cp'
presumably the IO system may have optimized how the files are
allocated to blocks on disk. Ie, you'll get a different allocation
than had Lucene directly opened these files and written them itself on
the 2nd file system.  You could test both approaches and see if
there's a difference!

Do you mean problems with fragmentation here? Or what exactly is the difference after I copy the index (is it faster because it's defragmented)? What happens if I use the copy method from org.apache.lucene.store.Directory?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


Best regards
Alex
