Re: [Moin-user] using Xapian search

Thomas Waldmann Wed, 28 Sep 2011 05:14:53 -0700

> So, an uppercase letter is an indicator that the indexer should treat
> this as a word (until the next uppercase letter) as if there was white
> space. It would seem that hyphens and underscores have a similar
> effect.


It's not just upper/lowercase transitions. IIRC, it also tries to split
off numbers, split at blanks, punctuation, etc. - for details maybe have
a look at the tokenizer.

Generally speaking, moin tries its best to have a language-independent
tokenization. It does NOT break words down into arbitrary character
combinations, of course (like splitting foobar into foob and ar); the
cost (time/size) of that is simply much too high, especially if you
consider that there are languages that build rather long words
naturally, like German.

>  The implications of this would need to be considered when
> naming pages. Pages named with acronyms, e.g. IBM*, are a particular
> problem.

If you have a page called "IBM Services", that would work OK.
Same for "IBM-Services". "IBMServices" is maybe a bad idea.
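
To make these page-name cases concrete, here is a minimal sketch of
such splitting rules. It is not moin's actual tokenizer: the function
name sketch_tokenize, the two regular expressions, and the restriction
to ASCII letters are all assumptions made for illustration.

```python
import re

# Hypothetical rules, NOT moin's tokenizer (ASCII only, for brevity).
# Runs of letters or runs of digits; blanks, hyphens, underscores and
# other punctuation simply act as separators.
_RUN_RE = re.compile(r"[A-Za-z]+|\d+")
# One capitalized piece of a CamelCase word, e.g. "Front" in "FrontPage".
_PIECE_RE = re.compile(r"[A-Z][a-z]+")


def sketch_tokenize(text):
    """Split text into tokens using the simplified rules above."""
    tokens = []
    for run in _RUN_RE.findall(text):
        pieces = _PIECE_RE.findall(run)
        # Split only if the whole run consists of two or more
        # capitalized pieces; otherwise keep the run as one token.
        if len(pieces) >= 2 and "".join(pieces) == run:
            tokens.extend(pieces)
        else:
            tokens.append(run)
    return tokens


for name in ("IBM Services", "IBM-Services", "IBMServices", "FrontPage"):
    print(name, "->", sketch_tokenize(name))
```

Under these simplified rules, "IBM Services" and "IBM-Services" both
give the tokens IBM and Services, while "IBMServices" stays a single
token, so a search for IBM alone has nothing to match there. The real
tokenizer may differ in detail.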

> > Well, better get used to that (or refine your search with mime:wiki to
> > only find wiki pages). The different handling of pages and attachments
> > is going away in future anyway.
> 
> Is it possible to have mime:wiki used as the default for title searches?

No, it isn't. It would not help in general anyway, because the
distinction needed does not really exist (or only exists when using
the wiki in some specific way).

In moin2 that "problem" becomes even more visible: you can have
arbitrary items there (there is no distinction between "pages" and
"attachments") and each item has a content-type, e.g.:

text/x.moin.wiki
text/x.rst
text/plain
application/pdf
image/jpeg
application/octet-stream

So, how do you decide now what should be considered "content" and
what is an "additional file"?

We are still looking for a solution for that, but I guess we can't do
any better than offering some groups of mimetypes.
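
One way to picture "groups of mimetypes" is a simple lookup from
content-type to a named group. The sketch below is only an
illustration: the group names, their membership, and the helpers
group_of and items_in_groups are hypothetical and not part of moin2.

```python
# Hypothetical grouping; names and members are assumptions chosen
# only to cover the content-types listed above.
CONTENT_TYPE_GROUPS = {
    "markup": {"text/x.moin.wiki", "text/x.rst"},
    "plain text": {"text/plain"},
    "documents": {"application/pdf"},
    "images": {"image/jpeg"},
}


def group_of(content_type):
    """Return the group name for a content-type, or "other"."""
    for name, members in CONTENT_TYPE_GROUPS.items():
        if content_type in members:
            return name
    return "other"


def items_in_groups(items, wanted_groups):
    """Keep the names of items whose content-type is in a wanted group.

    items is an iterable of (name, content_type) pairs.
    """
    return [name for name, ctype in items if group_of(ctype) in wanted_groups]


items = [
    ("FrontPage", "text/x.moin.wiki"),
    ("manual.pdf", "application/pdf"),
    ("blob.bin", "application/octet-stream"),
]
print(items_in_groups(items, {"markup", "plain text"}))
```

A search could then be restricted to, say, the "markup" and "plain
text" groups instead of a single mimetype.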


