[htdig] Understanding htmerge

Jonas Pasche Wed, 04 Jun 2008 09:21:27 -0700

Hi there,

I'm relatively new to ht://Dig and have installed it successfully. To
proceed, though, I'd like to get a better understanding of how the
modules work together.


I'm especially interested in understanding "htmerge" in an installation
with a single database.

http://www.htdig.org/running.html says that "rundig" runs "htdig", then
"htmerge":

        After setting up all the configuration files, you can build the
        required databases simply by running rundig. This script will
        run htdig first to build the initial database, then it runs
        htmerge to create a document index and word database from the
        files that were created by htdig.

http://www.htdig.org/rundig.html says the same:

        It runs htdig first to build the initial database, then it runs
        htmerge to create a document index and word database from the
        files that were created by htdig.

However, when I take a look at the current rundig script ...

http://htdig.cvs.sourceforge.net/htdig/htdig/installdir/rundig?view=markup

... it can be clearly seen that rundig, in fact, does NOT run htmerge.

So what's the point of htmerge? I have no idea what it's for!

Which brings me to the list of files created by htdig. Judging from the
htdig man page, it creates/updates ...

db.docdb
  ("Stores data about each document (title, url, etc.)")

db.words.db
db.words.db_weakcmpr
  ("Record which documents each word occurs in")

db.excerpts
  ("Stores start of each document to show context of matches")

It does *not* mention the index file, db.docs.index.

But when I look into my database directory after running htdig (but NOT
running htmerge!), I see:

-rw-rw-r-- 1 ironkyo ironkyo    663552 Jun  4 12:36 db.docdb
-rw-rw-r-- 1 ironkyo ironkyo     90112 Jun  4 12:36 db.docs.index
-rw-rw-r-- 1 ironkyo ironkyo  15425536 Jun  4 12:36 db.excerpts
-rw-rw-r-- 1 ironkyo ironkyo 155675648 Jun  4 12:36 db.words.db
-rw-rw-r-- 1 ironkyo ironkyo     16384 Jun  4 12:36 db.words.db_weakcmpr

So evidently db.docs.index is up to date, too, even though it is the
_htmerge_ man page that says:

        Htmerge is used to create a document index and word database
        from the files that were created by htdig.

But apparently htdig creates the index itself ... and running htmerge
doesn't seem to do anything: none of the database files or their
timestamps change.
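
For anyone who wants to verify the "nothing changes" observation without
comparing ls output by eye, here is a minimal sketch in Python. It is
only an illustration: the function names snapshot and changed_files are
made up for this example and are not part of ht://Dig, and the database
directory is whatever your own installation uses.

```python
import os


def snapshot(db_dir):
    """Map each regular file in db_dir to its (size, mtime in ns) pair."""
    result = {}
    for entry in os.scandir(db_dir):
        if entry.is_file():
            st = entry.stat()
            result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def changed_files(before, after):
    """Return the sorted names of files that were added, removed,
    or whose size or modification time differs between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

The idea is to call snapshot() on the database directory, run htmerge by
hand, call snapshot() again, and pass both results to changed_files().
An empty list would match what I described above; any file names in the
list would show which databases htmerge actually touched.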

So, once again, what's the point of htmerge?

Thanks a lot for your help with understanding what's going on!

Jonas

