Per Einar Ellefsen wrote:

>I suggest the following solution: if you meet a <pre> tag and no closing
></pre> tag we add it.
>Do you think this is possible Bill? Assuming that all the HTML is proper
>(no text without enclosing <p>),
>we can always tell which text is not HTML (i.e. <pre>) am I right?

It's not that easy.  Swish is what is storing the content.  It's being
parsed by libxml2 and it's just storing the text, not any of the tags.
It's also converting \n into white space, so any formatting would be lost
anyway.
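
To illustrate why the formatting is lost, here is a minimal sketch in Python of a text-only indexing pass. It is not Swish's code and does not use libxml2; Python's standard html.parser stands in for the real parser, and the names TextOnly and stored_text are made up for this example. The assumption is simply that tags are discarded and every run of whitespace, including \n, becomes one space.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and discards every tag (a stand-in, not Swish's code)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only the text between tags is kept; the tags themselves are never stored.
        self.chunks.append(data)


def stored_text(html):
    """Return the text roughly as a text-only index might keep it."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Splitting on whitespace and re-joining turns every run of spaces and
    # newlines into a single space, so the line structure of <pre> is gone.
    return " ".join("".join(parser.chunks).split())


sample = "<p>Set it up:</p>\n<pre>\nPerlModule Foo\nPerlHandler Foo\n</pre>"
print(stored_text(sample))
```

Once the text is in this form there is no <pre> left to close, which is why adding a missing </pre> at display time does not help.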

For HTML in general, it's a fun task to add highlighting code around a
group of words -- and still keep the HTML valid.
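
A rough Python sketch of one way to approach this: re-emit the tags untouched and wrap matches only inside text nodes, so that attribute values and tag names are never altered. This is an illustration under stated assumptions, not how Swish does it; the Highlighter class, the highlight function, and the "hit" class name are all hypothetical. It matches single words only, and it drops comments and doctype declarations.

```python
import re
from html import escape
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Copies tags through unchanged and wraps matching words found in text nodes."""

    def __init__(self, words):
        super().__init__(convert_charrefs=True)
        alternatives = "|".join(re.escape(w) for w in words)
        # One capture group, so re.split() alternates non-match / match.
        self.pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Re-emit the start tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # data arrives with entities decoded, so every piece is re-escaped.
        for i, part in enumerate(self.pattern.split(data)):
            text = escape(part, quote=False)
            if i % 2:  # odd positions are the captured matches
                text = '<b class="hit">' + text + "</b>"
            self.out.append(text)


def highlight(html, words):
    parser = Highlighter(words)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(highlight('<a href="x">href here</a>', ["href"]))
```

The hard part stays unsolved here: a phrase that spans a tag boundary (one word inside <i>, the next outside) cannot be wrapped in a single element without breaking the nesting.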


Yes, I can understand it's very hard.
What I can suggest: as we generate our HTML from POD files, knowing what is code, could there maybe be some possibility of putting some <div> tags around the <pre> ones, and then patch Swish in some way to get it to treat those parts as searchable but not displayable? If I understood it right, it's already using some <div> tags to know what to index, so maybe it would be possible to make it a little more advanced?

I don't think this is possible, since the hit doesn't happen in the sentence itself but in an index, which points to the section that contains the sentence.


Another possibility: I know it's not optimal, but maybe the search results should only display descriptions of the page in question?

This brings out another issue: if the pages were more split out (the guide pages are veeery long), maybe we could get more concise results and descriptions matching more closely.

If you look again at how the new search works, you will see that its results point to individual page sections and not to the whole page, which can be big. This is a great boon compared to the previous situation, and it removed the need to create the split version of the guide which I was producing before, e.g. see here:
http://thingy.kcilink.com/modperlguide/strategy/The_Solution.html
the engine is here:
http://thingy.kcilink.com/modperlguide/index.html#search


In any case, if you try to write descriptions for all the sections, you will find yourself writing another set of documentation. It's feasible, but impractical.

I have another suggestion: is it possible to distinguish between sentences (or parts of sentences) when presenting the hit's context? If so, we could add a <br> after each sentence or part and thereby make the context more readable. I know you said that \n characters are removed, but if there is a way to keep the original strings as tokens in the index, this would improve the readability a lot.
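
As a rough illustration of the <br> idea, here is a Python sketch that works on the flattened context string alone, without any help from the index. It assumes a sentence ends at ".", "!" or "?" followed by whitespace and a capital letter, which is only a heuristic; the names SENTENCE_END and context_with_breaks are made up for this example and are not part of Swish.

```python
import re
from html import escape

# Naive boundary: sentence punctuation, then whitespace, then an uppercase letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def context_with_breaks(text):
    """Escape the hit context for display and put a <br> after each sentence."""
    sentences = SENTENCE_END.split(text.strip())
    return "<br>\n".join(escape(s, quote=False) for s in sentences)


context = "It's not that easy. Swish is what is storing the content. It's parsed by libxml2."
print(context_with_breaks(context))
```

The heuristic splits wrongly after abbreviations followed by a capital letter, and it does nothing for code samples, where the original line breaks are what matter. That is why keeping the original strings in the index would still be the better fix.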

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com

