Hello,
Accessing http://incubator.apache.org/lucy/docs/perl/ is spotty at the
moment, so I'm reading the man pages...
I take it $index->add_index($other_index) is the method to merge
multiple indexes?
I'm looking for the most efficient way to merge a batch of thousands
of indexes. In pseudo-code:
# TRY1 - hell for leather
$master_index = Lucy::Index::Indexer->new...
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
}
$master_index->commit;
Now, I imagine this is no problem for a handful of sub-indexes, but
what are the risks when it involves thousands? Are there any
limitations or pitfalls I should be aware of when doing this?
# TRY2 - tippy-toe
$cnt = 0;
$MAX = 1000;
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
    if (++$cnt >= $MAX) {    # commit after every $MAX additions
        $cnt = 0;
        $master_index->commit();
        # open a fresh Indexer on the same index for the next batch
        $master_index = Lucy::Index::Indexer->new($master_index,
            create=>0, truncate=>0...);
    }
}
$master_index->commit unless $already_committed;
or,
# TRY3 - depending on whether I grok prepare_commit()
$cnt = 0;
foreach $sub_index (...) {
    $master_index->add_index($sub_index);
    if (++$cnt >= $MAX) { $cnt = 0; $master_index->prepare_commit(); }
}
$master_index->commit;
The question is also what the most efficient $MAX would be (I imagine
it depends on RAM, if data is held in memory until a commit). Or should
I not overcomplicate things, simply let Lucy worry about the internals,
and go for TRY1?
TRY2 gives me an opportunity to check the on-disk size of $master_index
after each commit. Are the buffers flushed and everything written to
disk after a commit, so that a qx(du -sh $master) reflects the actual
size? I lean towards TRY2, or will TRY3 also commit to disk?
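To make TRY2 a bit more concrete, here is a rough sketch of the batching
logic on its own. It is only a sketch under assumptions: merge_in_batches
and the $new_indexer factory are names I made up, and I am assuming that
an Indexer should not be reused after commit(), so a fresh one is created
for each batch.

```perl
use strict;
use warnings;

# Merge sub-indexes in batches, committing after every $max additions.
# $new_indexer is a hypothetical caller-supplied coderef that returns a
# fresh indexer object supporting add_index() and commit().
sub merge_in_batches {
    my ($new_indexer, $max, @sub_indexes) = @_;
    my ($indexer, $pending, $commits) = (undef, 0, 0);

    for my $sub_index (@sub_indexes) {
        $indexer ||= $new_indexer->();    # lazily open one per batch
        $indexer->add_index($sub_index);
        if (++$pending >= $max) {
            $indexer->commit;
            $commits++;
            undef $indexer;    # assumption: do not reuse after commit
            $pending = 0;
        }
    }

    # Commit the final, partially filled batch, if there is one.
    if ($pending) {
        $indexer->commit;
        $commits++;
    }
    return $commits;    # number of commits performed
}
```

With Lucy, I would expect the factory to look something like
sub { Lucy::Index::Indexer->new(index => $master_path) }, where
$master_path is a placeholder for the path of the master index, and
@sub_indexes would hold the paths of the indexes to merge.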
Comments?
Thanks