Hi everybody!

I'm having a problem with htdig not indexing the body of MOST (but not all) pages using a certain layout. Here's a good example:

http://search.wested.org/cgi-bin/htsearch?config=smu.net.htdig&words=resiliency
You see there's no excerpt for the first match, and in fact, it's clear that htdig didn't index any part of that page except the title - if you search on keywords from other parts of that page, this page won't come up.

- At first, I thought it was something about the code on the layout, because it seemed to be doing the same thing to any page with that layout. But notice the third match. It's using the same layout, but the entire page is indexed.
- It's also not the size of the pages. You see the third match, fully indexed, is 32k whereas the first match, not indexed, is only 27k.
- It's not the <!--htdignoindex--> tag. I've set the noindex tag to be <htdignoindex>, and I'm using it successfully on pages using the "NCLB" layout, like the second match.
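
For context, the noindex setup described in the last point would look roughly like this in the config file. This is only a sketch, not a copy of the real file: noindex_start and noindex_end are the ht://Dig attributes involved, the start tag is the one named above, and the closing tag value is an assumption.

```
# Sketch only; not copied from the actual smu.net.htdig config.
# Start tag as described in the post:
noindex_start:  <htdignoindex>
# Assumed closing tag; the real value is not given in the post:
noindex_end:    </htdignoindex>
```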

You can see more of that here:

http://search.wested.org/cgi-bin/htsearch?config=smu.net.htdig&words=ideas%20action
... you see that all the "NCLB"-titled pages are being fully indexed, and they all use the simple layout. The ones using the main layout are not being indexed fully, however. The NCLB pages are also using the <htdignoindex> tag successfully; you see that it's keeping the "Back to NCLB Overview" from showing up in the search results.

Can any of you see anything that's causing this problem? I've been poking at this for days, and now I'm sure it must be something obvious that's right before my eyes that I'm just failing to see.

One more thing - the one page using the main layout that does seem to be indexed fully is also the main page of the site. So I wondered, is it just showing up because it's the first indexed page? But I tried starting the indexing from other pages on the site, and it's still the only page using the main layout that gets indexed fully.

Right now no_excerpt_show_top is turned on in the htdig config, which is why you're seeing blank spaces where the excerpts should be. If I turn it off, htdig shows the default message about your search terms not appearing in the top of the document. So that part is behaving as expected.
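
As a sketch of that setting (not copied from the real config file), the relevant line would be something like the following. As far as I know, the message shown when it is turned off comes from the no_excerpt_text attribute, which is left at its default here.

```
# Sketch only; current state as described above.
# With this on, results with no matching excerpt show the top of the
# document, which is blank for the pages that are not indexed.
no_excerpt_show_top:  true
```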

Thanks so much for your time!

- Nada O'Neal


_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
