If this is solely for the 3.0 timeframe, using a wiki:title property makes the problem much more tractable, since a simple "trimmed, de-duped, and lowercased" name policy works just fine.
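To make that policy concrete, here is a minimal sketch in Java. It assumes "de-duped" means collapsing runs of whitespace into a single space, and the class and method names (SimpleNamePolicy, normalize) are hypothetical, not anything that exists in JSPWiki today.

```java
import java.util.Locale;

// Hypothetical helper, not part of JSPWiki: a sketch of the
// "trimmed, de-duped, and lowercased" name policy.
final class SimpleNamePolicy {
    private SimpleNamePolicy() {}

    static String normalize(String title) {
        if (title == null) {
            throw new IllegalArgumentException("title must not be null");
        }
        // Trim leading/trailing whitespace, then collapse inner runs
        // of whitespace to one space (assumed meaning of "de-duped").
        String collapsed = title.trim().replaceAll("\\s+", " ");
        // Locale.ROOT keeps lowercasing independent of the server locale.
        return collapsed.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, "  Main   Page " and "main page" both normalize to "main page", so either spelling would resolve to the same page while wiki:title keeps the display form.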
Reading over the proposal, I tried to paraphrase it a bit at the end; edit at will. A few minor tweaks, maybe? I think it's right about on target.

My only disappointment is that the reliance on JCR properties means that normalization cannot be introduced prior to 3.0... Or can it? Is there any desire to see this introduced in the 2.8 branch? If this is just a 3.0 task, then it will likely lie fallow for quite some time.

<side-note> There was mention of a repository export/import tool. The tool should apply the normalizations and alert if more than one page in the set normalizes to the same name. </side-note>

This could be introduced in 2.8 by stubbing in a WikiName class that pulls in all .cleanLink and .wikifyLink calls, but this would trend toward bigger modifications for 2.8 than I thought would be entertained. (I thought we were looking for a moderately clean and simple hack for 2.8.)

In 2.8 there is no difference between a page's name and its title (except if breakTitlesWithSpaces is enabled). My wikis all have allowCamelCase=false, matchEnglishPlurals=false and breakTitlesWithSpaces=true; for me this means that, with rare exceptions, normalization is possible by thunking everything to CamelCase-format names. But as Janne points out, this doesn't help with some languages where CamelCase titles look bizarre.

For now, I might take my current hack and push it into a custom 2.8 build and see how it plays live for a week or so. My biggest problem will be that I'll have to clean my repository of "duplicate" names (users have managed to get "Test%20Name", "Test+Name" and "TestName" all existing on disk, believe it or not). But this is a bit of housekeeping I should do anyway.

Let me know if there is any desire to push this further at this time.
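For the housekeeping step, a rough sketch of how duplicate names could be found: URL-decode each stored name, thunk it to a CamelCase form, and report any group where more than one stored name lands on the same result. This is only an illustration under my own assumptions; DuplicateNameCheck, toCamelCase and findCollisions are hypothetical names, and this is not what .cleanLink or .wikifyLink actually do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JSPWiki.
final class DuplicateNameCheck {
    private DuplicateNameCheck() {}

    // Decode "%20" and "+" to spaces, then join the words CamelCase-style.
    // URLDecoder throws IllegalArgumentException on malformed % escapes.
    static String toCamelCase(String storedName) {
        String decoded;
        try {
            decoded = URLDecoder.decode(storedName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
        StringBuilder sb = new StringBuilder();
        for (String word : decoded.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(word.charAt(0)));
            sb.append(word.substring(1));
        }
        return sb.toString();
    }

    // Map each normalized name to the stored names that collapse onto it,
    // keeping only groups with two or more members.
    static Map<String, List<String>> findCollisions(List<String> storedNames) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : storedNames) {
            groups.computeIfAbsent(toCamelCase(name), k -> new ArrayList<>()).add(name);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }
}
```

Fed the three names above, this groups "Test%20Name", "Test+Name" and "TestName" under the single key "TestName", which is the kind of alert the export/import tool would need to raise.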
Regards,
John Volkar

-----Original Message-----
From: Janne Jalkanen [mailto:[email protected]]
Sent: Sunday, December 28, 2008 9:50 AM
To: [email protected]
Subject: Re: WikiName normalization

> I had a lot of trouble with WikiNormalization in 2.4. I strongly
> support the idea of removing most constraints (for instance, most
> Unicode printable characters should be allowed). I support the idea of
> removing duplicated spaces.
>
> A small addition to your rules: I think it most of the time helps to
> remove spaces at the beginning and at the end of page names.

Yes, of course. We, in fact, should be doing this right now, but it's not been written anywhere. You can probably create a page like that through the API or by cleverly manipulating the Edit.jsp parameters.

> Maybe the normalizer should be made "pluggable" with CamelCase being
> one of those provided?

Mediawiki (which powers Wikipedia) has a simple regexp for filtering out illegal characters. At the moment it is as follows:

"The list of illegal characters is as follows: #<>[]|{}, non-printable characters 0 through 31, and 'delete' character 127)."

Everything else is allowed in wiki names. I think that if they can do it and have no problems, we can do it too.

My proposal, basing it on John's and Murray's input:

http://www.jspwiki.org/wiki/WikiNameNormalizationProposal

> One problem is links from outside applications: when you rename a
> page, JspWiki renames the internal links but obviously cannot rename
> external references. With all JCR and WikiNames discussions, may I
> suggest to have a version number attached to a name and to use it
> within a "PermaLink" URL? This version number would be the version at
> which the page name was last changed (0 would indicate the page never
> changed its name, none would indicate the "latest"
> name for backward compatibility). A separate history of page names
> would be kept and would allow to redirect external links to the right
> latest page name.
>
> Does such a mechanism exist in JCR?

I am not sure whether that makes sense - if you reference Wikipedia's article on "Motley Crue", I would assume that you would want to reference that particular page in general, not just a version of it.

JCR uses UUIDs to reference pages, so version histories do also store renames. If we use that feature, that is.

/Janne
