According to Todd Hooge:
>To Whom This May Concern:
>
>Many thanks for an excellent piece of software! I am the Web Designer
>for a very large site, "The Communication Initiative."
>http://www.comminit.com/
>
>There originally was a keywords search box on the Search page
>[http://www.comminit.com/search.html] which I have temporarily removed
>because the search was considered inaccurate.
>
>I am having trouble indexing text within comment tags and have been
>unable to find any information regarding my particular problem. I will
>use http://www.comminit.com/power_point/pdseu_06-28-99/sld025.htm as an
>example. This is one of many pages in the form of a converted Power
>Point presentation that I have modified for our site. You will notice in
>the HTML for this page there is a 'text' version of the page that has
>been commented out about halfway down the source code of the page. I
>need Ht-Dig to index anything within <!-- and -->. I also need to tell
>Ht-Dig to ignore [override] any syntax or grammatical issues within
>these tags. I have read in your mailing list that Ht-Dig ignores syntax
>ridden code and as Power Point tends to convert slides to HTML very
>inefficiently, I have no manual way around this that I know of. There
>is a command to exclude areas within the HTML but not one to 'include.'
>
>The 'illusion' I am trying to create here is that the search mechanism
>is indexing the Power Point slides which would otherwise be impossible
>because the slide is actually one large graphic. If Ht-Dig could index
>the 'hidden' text in the HTML, this would solve my problem.

Well, I had a look at that page and I still cannot see why you would
want to index text such as "<!-- Start of Bottom Navbar -->" or
"<!-- End of Buttons and Credits -->".  Such comments would certainly
spoil the results of any search query on the "true content" of the
page, which is hidden in other comments.  So you cannot index the
"text content" of the page simply by hacking ht://Dig to ignore
comment markers.  Some manual work is still necessary, and in order
to have the text also indexable by other search engines and readable
by text-only browsers, I'd suggest another solution (which of course
is but an ugly hack, too):
  Put each slide not in a single file but in its own directory, and
  produce a trivial frame version of each page.  Cut out the ppt
  graphics and put them into the content frame, and place the text
  content in the <NOFRAMES> area of the index document.
That way you'll get a nicely indexable document for every spider ;-)
Of course, you will need to change URLs in the HREF tags, too, but
I can imagine some nice AWK or Perl scripting to take over most of the
conversion for you.
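
A minimal sketch of what such a conversion script might look like,
written in Python here rather than AWK or Perl.  It is only an
illustration under some assumptions: the function names, the frame
name "content" and the word-count threshold used to skip marker
comments like "Start of Bottom Navbar" are all made up for this
example, and the real pages may need a smarter way to pick out the
comment that holds the text version.

```python
import re

# Matches an HTML comment and captures its body (non-greedy, spans newlines).
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_hidden_text(html, min_words=8):
    """Return the comment bodies that look like real content.

    Short marker comments are dropped by a simple word count;
    min_words is an arbitrary, hypothetical threshold.
    """
    bodies = (m.group(1).strip() for m in COMMENT_RE.finditer(html))
    return [body for body in bodies if len(body.split()) >= min_words]


def build_frameset(title, slide_url, hidden_text):
    """Build an index document: slide in a frame, text in NOFRAMES."""
    text = "\n".join(hidden_text)
    return (
        "<HTML>\n"
        f"<HEAD><TITLE>{title}</TITLE></HEAD>\n"
        '<FRAMESET ROWS="*">\n'
        f'  <FRAME SRC="{slide_url}" NAME="content">\n'
        "  <NOFRAMES>\n"
        f"<BODY>\n{text}\n</BODY>\n"
        "  </NOFRAMES>\n"
        "</FRAMESET>\n"
        "</HTML>\n"
    )
```

For each slide you would read the converted page, call
extract_hidden_text() on it, and write the result of build_frameset()
as the index document of that slide's directory.  Rewriting the URLs
in the HREF tags is not covered by this sketch.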


hth,
  Torsten

P.S.: I have no idea why you use a graphical image to present
a simple text table on that page!  This is the major cause of your
troubles, and you should probably get rid of it asap to make the
document readable for more browsers and indexing spiders.  Moreover,
replicating the document as commented-out text is even worse, since
it produces a higher network load than necessary.

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
