Bill Moseley wrote:
At 03:35 PM 2/7/2002 +0800, Stas Bekman wrote:
But I think here we talk about highlighting the matches in the actual pages. Which means that the pages will become sort of dynamic. At least I think that's what Randy's search results are.
That's doable, but slow. That's what's happening basically on lii.org.
ok, leave it off.
Can't we just extend the definition of a word, so that $, % and other Perl symbols count as valid parts of a word? The same goes for | ^ and others.
So you could tell swish that a plus sign at the end of a "word" should be removed. Then swish would index $|++ as $|. But then swish would see "$+" as just "$". So that fails.
We can define any symbols we want to be in a word. With swish you define:
WordCharacters - the allowable chars in a word (basically split /[^WordChar]/)
IgnoreFirstChar/IgnoreLastChar - chars that are in WordChars but are
stripped from the start/end of each word. Allows a period inside a word,
but not at the end of a word, for example.
BeginChars/EndChars -- words must start/end with these chars (after Ignore*Chars have been stripped). I've never found a use for that setting.
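To make the interaction of these settings concrete, here is a rough Python sketch of the splitting and stripping described above. It is not swish's actual code: the function name tokenize, its parameters, and the WORD_CHARS set are made up for illustration. It shows why stripping a trailing plus sign turns $|++ into $| but also reduces $+ to $.

```python
import string

# Hypothetical character set for illustration; not a real swish default.
WORD_CHARS = string.ascii_letters + string.digits + "_$|+:"


def tokenize(text, word_chars, ignore_first="", ignore_last=""):
    """Split text on characters outside word_chars, then strip edge chars."""
    words, current = [], []
    for ch in text + " ":  # the trailing space flushes the last word
        if ch != " " and ch in word_chars:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    # Mimic IgnoreFirstChar/IgnoreLastChar: trim those chars from each word.
    stripped = (w.lstrip(ignore_first).rstrip(ignore_last) for w in words)
    return [w for w in stripped if w]


if __name__ == "__main__":
    # Stripping a trailing '+' helps for $|++ but collapses $+ to $.
    print(tokenize("set $|++ and read $+ here", WORD_CHARS, ignore_last="+"))
```

With ignore_last="+", both "$|++" and "$|" end up as the same indexed word, which is the desired effect, but "$+" can no longer be told apart from a bare "$".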
hmm, I read in swish docs that you cannot index ':'.
Also the most critical is to be able to search for foo::bar. Which I think is impossible under swish-e since : is never counted.
e.g. search for URI::URL and you will find URI and URL but not what you are looking for. At least you will get many more hits than there are real matches, and most will be irrelevant.
No, you can put : in WordCharacters and then it will be counted. But if
you do that you can't then find "foo" or "bar" separately there. I
mentioned this before, but in some way (some cases) it's better to not
count some chars and use phrases for searches. So searching for
'"foo::bar"' as a phrase will find "foo::bar", but still allow you to
search for just 'foo' and find the foo in foo::bar.
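A small Python sketch of the phrase idea, assuming ':' is not a word character. The helper names words, has_word and has_phrase are invented for this example and are not part of swish. The query is tokenized the same way as the document, and a phrase matches only when its words appear next to each other in order.

```python
import re


def words(text):
    """Tokenize with ':' deliberately left out of the word characters."""
    return re.findall(r"[A-Za-z0-9_]+", text)


def has_word(doc, term):
    """Single-word search: the term must be one of the document's words."""
    return term in words(doc)


def has_phrase(doc, phrase):
    """Phrase search: the phrase's words must appear adjacent and in order."""
    toks, want = words(doc), words(phrase)
    if not want:
        return False
    return any(
        toks[i:i + len(want)] == want
        for i in range(len(toks) - len(want) + 1)
    )


if __name__ == "__main__":
    doc = "use URI::URL to parse; URL handling with URI is separate"
    print(has_phrase(doc, "URI::URL"), has_word(doc, "URI"))
```

One side effect of this approach is that the phrase "URI::URL" also matches text that merely reads "URI URL", since the separator is thrown away at index time.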
That's not a good case, but $|++ is clearer. You would need to strip ++ to find $|. But then you can't search for $+. That's why parsing perl might be interesting since it could parse into language tokens, and then make those searchable. But I don't know how to do that, especially on queries.
Anyway, we should try indexing most chars. Then foo::bar will be indexed as one word (or foo::bar::new as one word too), and we can try to instruct people to use wildcard searches, such as foo::*. But then there's no way to find *bar in swish due to the way it stores the wildcard index.
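Why a trailing wildcard is easy and a leading one is not can be illustrated with a sorted word list. This Python sketch is only an assumption about the general technique, not a description of how swish actually stores its index; the names prefix_matches and index are made up. A prefix such as foo:: maps to one contiguous run in sorted order, while a suffix such as *bar would need a scan of every word.

```python
import bisect


def prefix_matches(sorted_words, prefix):
    """Return all words starting with prefix, using a sorted word list."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match the prefix
        matches.append(word)
    return matches


# Hypothetical index contents for illustration.
index = sorted([
    "Apache::Registry",
    "foo::bar",
    "foo::bar::new",
    "foo::baz",
    "zoo::bar",
])

if __name__ == "__main__":
    print(prefix_matches(index, "foo::"))
    # A suffix query has no contiguous range; every word must be checked.
    print([w for w in index if w.endswith("bar")])
```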
I guess I wasn't clear to myself and others about what I meant by searching for Perl code. I don't care much about searching the code sections per se. I care much more about Perl strings found in the text. So I want Apache::Registry to be found and I want $| to be found.
If I understand correctly if I search for a sub-pattern it'll be found, right? So if I search for $|, I'll find $| and $|++, no?
Therefore we want most if not all chars to be indexed. Or at least $%@:-> (search for '$r->args' should be successful).
And we don't want to search for Apache AND Registry, nor Apache OR Registry when I ask for Apache::Registry. I think that's what most people will expect without knowing the internals of the search engine.
I've just asked if it's possible and easily doable. If it's a pain, then no problem. It's just that the wrapped code looks bad to me. e.g. search for 'strict' here:
http://perl.apache.org/preview/modperl-site/search/swish.cgi Hit #4 looks bad.
Yes, results are mixing code and text. We can play with it, but if we displayed the text (especially code) formatted the results might be long. Google doesn't do any better for displaying code.
if google cannot handle this we are doomed then ;) it's a minor issue anyway.
let's get the indexing right and we are almost ready to go on.
_____________________________________________________________________
Stas Bekman JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide http://perl.apache.org/guide
mailto:[EMAIL PROTECTED] http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
