According to Jay Staton:
> Thanks for the help Gilles. Merci.
> 
> This is what I've done so far. i created a test directory with some 
> random files from the main site in it and specified that directory in 
> the htdig.conf file and ran rundig. it worked fine! so now i'm going 
> through each individual file from the site (there are a few hundred) 
> and trying to find out which one is the culprit.

Well, unless the htdig -vvv output you sent me was doctored beyond
recognition, the culprit seemed to be your main (home) page for your
domain.  That seemed to be the only document it looked at, and it
found no words in it.

> do you know which 
> order htdig tries to index the files in? which file is it looking for 
> first? is it alphabetically?

No, it doesn't sort URLs alphabetically or in any other way.  It
essentially queues them up as it finds them.  If you're only indexing
one host, it ends up doing a "depth-order traversal" (level by level,
i.e. breadth-first) of the hierarchy of links it finds.  That means it
takes all the URLs in start_url and
indexes them first, in the order they're given, and for any links it
finds in these documents, it queues them up so they're parsed next,
again in the order it finds the links.  It goes on down levels of links
that way until no new valid links are found.
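
As a rough illustration of that queueing order (a minimal Python sketch,
not htdig's actual code), the traversal can be modelled with a plain
first-in, first-out queue.  The crawl_order function and the links
dictionary are hypothetical stand-ins for fetching and parsing real
documents:

```python
from collections import deque


def crawl_order(start_urls, links):
    """Return URLs in the order a simple FIFO queue would visit them.

    `links` is a hypothetical dict mapping each URL to the links found
    in that document, in document order. It replaces real fetching.
    """
    queue = deque()
    seen = set()
    # The start URLs go in first, in the order they are given.
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        # Newly found links are appended, so they are handled after
        # everything that was already waiting in the queue.
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical three-level site used only for this example.
site = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/a1.html", "/b.html"],
    "/b.html": ["/b1.html"],
}
print(crawl_order(["/"], site))
# → ['/', '/a.html', '/b.html', '/a1.html', '/b1.html']
```

Note that nothing is sorted: the home page comes first only because it
is the start URL, and every other page appears in the order its link
was first discovered.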

> and i concealed the identity of the client 
> because i just started this job and i don't want to do anything too 
> incriminating right away.
> 
> and another thing, it worked fine until a few weeks ago. a few files 
> were modified, but that is all that i am aware of. the site is almost 
> entirely PHP and HTML. i have told htdig to exclude EVERYTHING else.

Well, I suggest you take a closer look at what exactly did change in
the past few weeks.  It doesn't take much to throw a monkey wrench into
the works.  See if htdig -ivvvvvv doesn't give you more clues.

> On Thursday, November 7, 2002, at 04:37 PM, Gilles Detillieux wrote:
> > According to Jay Staton:
> >> I am extremely new to using htdig, and I'm trying to update the index
> >> of one of our client's sites and I get the "DB2 problem...: missing or
> >> empty key value specified" error message. I have read everything
> >> previously posted about this topic and it seems that no one has been
> >> able to give a clear answer to resolving it.
> >
> > This is almost always due to there being no indexable documents, or no
> > indexable texts in the documents that htdig is given.  The tricky part
> > is figuring out why htdig is unable to parse and index anything.
> >
> >> I have tried running
> >> rundig -vvv -s and there is nothing in it that tells me where the
> >> problem is. The results are below:
> > ...
> >> 1:1:http://www.[domain].com/
> >> New server: www. [domain].com, 80
> > ...
> >> Read a total of 18951 bytes
> >>   size = 18951
> >> pick: www. [domain].com, # servers = 1
> >> htdig: Run complete
> >> htdig: 1 server seen:
> >> htdig:     www. [domain].com:80 1 document
> >> htmerge: Sorting...
> >> DB2 problem...: missing or empty key value specified
> >>
> >> htmerge: Total word count: 0
> >> 0/http://www. [domain].com/
> >>
> >> htmerge: Total documents: 1
> >> htmerge: Total size of documents (in K): 18
> >> Preamble text:
> >> =======================================
> >>
> >> If anyone could please help me, or at least tell me what the above
> >> means, I would be forever grateful. This problem is driving me insane!
> >
> > Well, the above tells me that htdig was able to successfully read in an
> > 18951-byte file for the http://www.[domain].com/ URL.  However, it
> > doesn't report any "href" entries, so apparently htdig couldn't find
> > any HTML links in this file.  Also, the "Total word count: 0" message
> > suggests htdig couldn't find any words in the file either.
> >
> > You may get more clues by using more verbosity, e.g. -vvvvvv, but if
> > that doesn't help, you may need to post the contents of the document
> > so that someone can look at it and figure out what's causing htdig to
> > skip over everything.  Given your edits of the htdig output to conceal
> > the domain, though, I would guess that this is something you don't want
> > everyone to see.
> >
> > Also have a look at the FAQ on the htdig.org web site.  htdig doesn't
> > index JavaScript content or links, only HTML (FAQ 5.18).  You should
> > also check to make sure the document doesn't have a meta robots tag
> > that prevents indexing (FAQ 4.15 & 4.22).  You'd need at least -vvvv
> > to have htdig report such tags.
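
For a quick check outside of htdig, the points above (plain HTML links,
visible words, a meta robots tag) can be approximated with a small
Python script using only the standard library.  This is a sketch under
stated assumptions: the PageCheck class and the sample page are made up
for illustration, and Python's html.parser is not htdig's parser, so
the counts are only a hint, not what htdig itself would report:

```python
import re
from html.parser import HTMLParser


class PageCheck(HTMLParser):
    """Collect plain <a href> links, meta robots values and a word count."""

    def __init__(self):
        super().__init__()
        self.hrefs = []    # href values of <a> tags found in the HTML
        self.robots = []   # content of any <meta name="robots"> tags
        self.words = 0     # rough count of words outside script/style
        self._skip = 0     # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots.append((attrs.get("content") or "").lower())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Text inside script/style is ignored, like JavaScript content.
        if not self._skip:
            self.words += len(re.findall(r"[A-Za-z0-9]+", data))


# Made-up page: one robots tag, one script, one real link.
sample = """<html><head>
<meta name="robots" content="noindex, nofollow">
<script>var menu = "menu.html";</script>
</head><body><p>Hello world</p><a href="/about.html">About us</a></body></html>"""

page = PageCheck()
page.feed(sample)
page.close()
print("links: ", page.hrefs)
print("robots:", page.robots)
print("words: ", page.words)
```

If a saved copy of the home page shows no links and no words here, or a
robots value containing "noindex", that would be consistent with htdig
reporting one document and a total word count of 0.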


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

