> Can you provide some examples where a
> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
> would not work, in Finnish? I'd like to understand this, seriously.
E.g. "maan alle" vs "maanalle". The first means "into the ground", the second one is "earth bear". Or "kuusi puuta" vs "kuusipuuta": "six trees" vs "at a fir" (or "of fir timber"). Or simply "sivusta katsoja" vs "sivustakatsoja": "a person who looks (literally) from the sides" vs "onlooker". The difference is subtler than in the previous examples, but the presence of the space is significant information.

In fact, mixing up when two words are written together and when they are not is one of the most common grammatical errors. Sometimes the results are fairly hilarious and unintended. Often they just look sad. But the point is that in Finnish (and other languages that build compound words), whitespace is significant, so it should not be ignored arbitrarily.

Besides, I am not aware of any wiki engines that consider whitespace insignificant when determining page name equality. MediaWiki's rules concerning spaces are:

<snip>
Spaces/underscores which are ignored:
* those at the start and end of a full page name
* those at the end of a namespace prefix, before the colon
* those after the colon of the namespace prefix
* duplicate consecutive spaces
<snap>

> FYI, I took a look at JSPWiki.org to see what the scale of the problem
> might be. The site has about 4850 pages. I yanked down all of the page
> names and compared them. I detected exactly ONE name clash: "Text
> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
> different pages. That is a 0.02% collision rate -- and easily handled
> by a rename-on-import or special-page redirection strategy.

That's not what I meant. I meant that we have many links of the form [word1 word2] embedded within running text. If we change those, the running text becomes meaningless and needs to be *checked by hand*.

/Janne
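To make the contrast concrete, here is a minimal Python sketch (not taken from JSPWiki or MediaWiki) comparing the proposed strip-all-whitespace, case-insensitive key with a rough approximation of the MediaWiki space rules quoted in the message. The names `naive_key`, `mediawiki_like_key` and `find_clashes` are hypothetical, and the approximation deliberately leaves out namespace-prefix handling and any case rules.

```python
import re
from collections import defaultdict

def naive_key(name: str) -> str:
    """Proposed strategy: remove all whitespace, then compare case-insensitively."""
    return "".join(name.split()).casefold()

def mediawiki_like_key(name: str) -> str:
    """Rough approximation of the quoted MediaWiki rules (namespaces omitted):
    underscores count as spaces, runs of spaces collapse to one, and
    leading/trailing spaces are dropped. Internal spaces stay significant."""
    spaced = name.replace("_", " ")
    return re.sub(r" {2,}", " ", spaced).strip(" ")

def find_clashes(names, key):
    """Group page names by normalized key; return groups holding distinct names."""
    groups = defaultdict(set)
    for name in names:
        groups[key(name)].add(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# The Finnish pairs from the message: same letters, different meanings.
PAIRS = [
    ("maan alle", "maanalle"),
    ("kuusi puuta", "kuusipuuta"),
    ("sivusta katsoja", "sivustakatsoja"),
]

if __name__ == "__main__":
    for a, b in PAIRS:
        print(f"{a!r} vs {b!r}: "
              f"naive equal={naive_key(a) == naive_key(b)}, "
              f"MediaWiki-like equal={mediawiki_like_key(a) == mediawiki_like_key(b)}")
```

Under these assumptions every Finnish pair collapses to one page with the naive key but stays distinct with the MediaWiki-like key. The same `find_clashes` helper can be run over a list of page names to count collisions, as in the JSPWiki.org check; note that it only finds clashing page names and says nothing about the [word1 word2] links in running text, which is the actual concern raised in the reply.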
