Hi everybody!
I'm having a problem with htdig not indexing the body of
MOST (but not all) pages using a certain layout. Here's a
good example:
http://search.wested.org/cgi-bin/htsearch?config=smu.net.htdig&words=resiliency
You'll see there's no excerpt for the first match, and in
fact it's clear that htdig didn't index any part of that page
except the title: if you search on keywords from other parts
of that page, it won't come up.
- At first, I thought it was something in the layout's code,
because it seemed to happen to every page with that layout.
But notice the third match: it uses the same layout, and the
entire page is indexed.
- It's also not the size of the pages. You see the third
match, fully indexed, is 32k whereas the first match, not
indexed, is only 27k.
- It's not the <!--htdignoindex--> tag. I've set the noindex
tag to be <htdignoindex>, and I'm using it successfully on
pages using the "NCLB" layout, like the second match.
You can see more of that here:
http://search.wested.org/cgi-bin/htsearch?config=smu.net.htdig&words=ideas%20action
... you see that all the "NCLB"-titled pages are being fully
indexed, and they all use the simple layout. The ones using
the main layout are not being indexed fully, however. The
NCLB pages are also using the <htdignoindex> tag
successfully; you see that it's keeping the "Back to NCLB
Overview" from showing up in the search results.
Can any of you see anything that's causing this problem?
I've been poking at this for days, and now I'm sure it must
be something obvious that's right before my eyes that I'm
just failing to see.
One more thing - the one page using the main layout that
does seem to be indexed fully is also the main page of the
site. So I wondered, is it just showing up because it's the
first indexed page? But I tried starting the indexing from
other pages on the site, and it's still the only page using
the main layout that gets indexed fully.
Right now htdig has no_excerpt_show_top set, which is why
you're seeing blank spaces where the excerpts should be. If I
turn that off, it shows the default message about your search
terms not appearing in the top of the document, so that part
behaves as expected.
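For reference, the settings mentioned above would look roughly
like this in an htdig config file. This is a sketch, not a copy
of the actual config: the closing-tag value for noindex_end is
an assumption, since only the opening tag is given above.

```
# Custom marker that starts a section htdig should skip
noindex_start:        <htdignoindex>
# Assumed closing marker (not stated in this message)
noindex_end:          </htdignoindex>
# Show the top of the document when no excerpt matches
no_excerpt_show_top:  true
```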
Thanks so much for your time!
- Nada O'Neal