I'd define it as "an uppercase letter that follows a non-whitespace character."
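For illustration only, here is a minimal Java sketch of how that definition could combine with rules (1), (3) and (4) from the quoted proposal below. The class and method names (`PageNameSketch`, `normalize`, `sameName`) are hypothetical and not part of any existing JSPWiki API, and rule (2), trimming around the space name, is left out because the thread does not say how a space name is delimited.

```java
public final class PageNameSketch {

    private PageNameSketch() {}

    /** Rules (1) and (3): trim, collapse whitespace, then split CamelCase. */
    public static String normalize(String name) {
        // Rule (1): trim both ends and collapse each whitespace run to one space.
        String collapsed = name.trim().replaceAll("\\s+", " ");

        StringBuilder out = new StringBuilder(collapsed.length() + 8);
        for (int i = 0; i < collapsed.length(); i++) {
            char c = collapsed.charAt(i);
            // Rule (3), using the definition above: an uppercase letter that
            // follows a non-whitespace character gets a space inserted before it.
            if (i > 0 && Character.isUpperCase(c)
                    && !Character.isWhitespace(collapsed.charAt(i - 1))) {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Rule (4): normalize both names, then compare case-insensitively. */
    public static boolean sameName(String a, String b) {
        return normalize(a).equalsIgnoreCase(normalize(b));
    }
}
```

With this sketch, `normalize("CamelCasePageNames")` yields "Camel Case Page Names", while "maan alle" and "maanalle" stay distinct because existing spaces are never removed. One consequence of taking the definition literally is that runs of capitals are split letter by letter, so "JSPWiki" becomes "J S P Wiki".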
On Wed, Nov 4, 2009 at 2:52 PM, Harry Metske <[email protected]> wrote:
> agreed on the 1) and 2)
>
> But how exactly do you define "adding a space before each uppercase letter
> that starts a word" ?
> How do you find this "uppercase letter that starts a word" in a pagename or
> link ?
> Can you give a few samples ?
>
> /Harry
>
> 2009/11/2 Andrew Jaquith <[email protected]>
>
>> Ok, that makes sense. I can think of cases in English too, like
>> "averse" (opposed to) and "a verse" (a portion of a song or poem). I
>> just decided that I didn't care. :)
>>
>> But assuming we do care...
>>
>> ...what about going the other way: on import, or on page save, or page
>> lookup, forcibly expanding CamelCasePageNames (and inline page links)
>> so that they have one space in between the words? That way,
>> case-insensitive matching with spaces preserved (trimmed to one space)
>> would work.
>>
>> So, the rules would be this:
>>
>> (1) When links in pages are parsed, or page names are saved, leading
>> and trailing spaces will be trimmed, and all whitespace between words
>> will be replaced with one space character.
>> (2) Whitespace before and after the space name will be removed.
>> (3) CamelCase page links or page names will be normalized by adding a
>> space before each uppercase letter that starts a word
>> (4) Tests for page name equality are done by applying rules (1), (2)
>> and (3) and making a case-insensitive comparison.
>>
>> That seems simple enough, no?
>>
>> Andrew
>>
>> On Mon, Nov 2, 2009 at 2:44 PM, Janne Jalkanen <[email protected]>
>> wrote:
>> >> Can you provide some examples where a
>> >> strip-the-whitespace-and-do-a-case-insensitive-comparison strategy
>> >> would not work, in Finnish? I'd like to understand this, seriously.
>> >
>> > E.g. "maan alle" vs "maanalle". First means "into the ground", the
>> > next one is "earth bear".
>> >
>> > Or "kuusi puuta" vs "kuusipuuta" - "six trees" vs "at a fir" (or "of
>> > fir timber").
>> >
>> > Or simply "sivusta katsoja" vs "sivustakatsoja" - "a person who looks
>> > (literally) from the sides" vs "onlooker". The difference is subtler
>> > than with the previous ones, but the existence of the space is
>> > significant information.
>> >
>> > In fact, getting mixed up when two words go together and when they do
>> > not is one of the most common grammatical errors. Sometimes the
>> > results can be fairly hilarious and unintended. Often it looks just
>> > sad.
>> >
>> > But the point being that in Finnish (and other so-called constructed
>> > languages), whitespace is significant. So it should not be ignored
>> > arbitrarily.
>> >
>> > Besides, I am not aware of any wikiengines that would consider
>> > whitespace insignificant in determining pagename equality. mediawiki's
>> > rules concerning spaces are:
>> >
>> > <snip>
>> > Spaces/underscores which are ignored:
>> > * those at the start and end of a full page name
>> > * those at the end of a namespace prefix, before the colon
>> > * those after the colon of the namespace prefix
>> > * duplicate consecutive spaces
>> > <snap>
>> >
>> >> FYI, I took a look at JSPWiki.org to see what the scale of the problem
>> >> might be. The site has about 4850 pages. I yanked down all of the page
>> >> names and compared them. I detected exactly ONE name clash: "Text
>> >> formatting rulesKorean" and "TextformattingrulesKorean" appear to be
>> >> different pages. That is a 0.02% collision rate -- and easily handled
>> >> by a rename-on-import or special-page redirection strategy.
>> >
>> > That's not what I meant. I meant that we have many links of the form
>> > [word1 word2] embedded within running text. If we change those, then
>> > the running text becomes meaningless and needs to be *checked by
>> > hand*.
>> >
>> > /Janne
