If this is solely for the 3.0 timeframe, using a wiki:title property
makes the problem much more tractable, as a simple "trimmed, de-duped,
and lowercased" name policy works just fine.
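A minimal Java sketch of that policy follows. It assumes the title is read from a property such as wiki:title, and it assumes "de-duped" means collapsing repeated whitespace. TitleNormalizer is a hypothetical helper, not part of JSPWiki:

```java
import java.util.Locale;

/** Hypothetical helper sketching the "trimmed, de-duped, lowercased" policy. */
final class TitleNormalizer {

    private TitleNormalizer() {}

    /** Returns the lookup key for a title (e.g. one stored in wiki:title). */
    static String normalize(String title) {
        return title
                .trim()                      // drop leading/trailing whitespace
                .replaceAll("\\s+", " ")     // collapse runs of whitespace to one space
                .toLowerCase(Locale.ROOT);   // locale-independent lowercasing
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Main   Page "));  // main page
    }
}
```

Locale.ROOT keeps lowercasing stable regardless of the server's default locale. The Turkish dotted/dotless i is the classic trap there.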

Reading over the proposal, I tried to paraphrase it a bit at the end;
edit at will.  A few minor tweaks, maybe?

I think it's just about on target.  My only disappointment is that the
reliance on JCR properties means that normalization cannot be introduced
prior to 3.0...

Or can it?

Is there any desire to see this introduced in the 2.8 branch?  If this
is just a 3.0 task, then it will likely lie fallow for quite some time.

<side-note>
There was mention of a repository export/import tool.  The tool should
apply the normalizations and alert if more than one page in the set
normalizes to the same name.
</side-note>

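For the export/import tool, the check could be as simple as grouping page names by their normalized form and reporting any group with more than one member. This is a Java sketch. NormalizationCollisions and findCollisions are hypothetical names, and the policy passed in is only an example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical export/import helper: finds page names that would collide. */
final class NormalizationCollisions {

    private NormalizationCollisions() {}

    /** Groups names by normalized form; returns only groups with 2+ pages. */
    static Map<String, List<String>> findCollisions(List<String> pageNames,
                                                    Function<String, String> normalizer) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : pageNames) {
            groups.computeIfAbsent(normalizer.apply(name), k -> new ArrayList<>()).add(name);
        }
        groups.values().removeIf(list -> list.size() < 2);  // keep only real collisions
        return groups;
    }

    public static void main(String[] args) {
        // Example policy only: trim, collapse whitespace, lowercase.
        Function<String, String> policy =
                s -> s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        Map<String, List<String>> clashes =
                findCollisions(List.of("Main Page", "main  page", "Other"), policy);
        clashes.forEach((key, names) ->
                System.err.println("WARNING: " + names + " all normalize to \"" + key + "\""));
    }
}
```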

This could be introduced in 2.8 by stubbing in a WikiName class that
pulls in all the .cleanLink and .wikifyLink calls, but this would lead
to bigger modifications for 2.8 than I thought would be entertained.  (I
thought we were looking for a moderately clean and simple hack for 2.8.)
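The sketch below shows roughly what I have in mind. It is hypothetical, not existing JSPWiki code. The idea is a small value class that computes the canonical form once and bases equals()/hashCode() on it, so every lookup goes through the same rule. The canonicalize() rule here is an assumption for illustration; in 2.8, that method is where the .cleanLink/.wikifyLink logic would be consolidated.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical WikiName value class: one owner for name normalization. */
final class WikiName {

    private final String displayName;   // the name as the user typed it
    private final String canonical;     // the form used for equality and lookup

    WikiName(String displayName) {
        this.displayName = Objects.requireNonNull(displayName, "displayName");
        this.canonical = canonicalize(displayName);
    }

    /** Assumed rule for illustration only: trim, collapse whitespace, ignore case. */
    private static String canonicalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    String displayName() { return displayName; }

    String canonical() { return canonical; }

    @Override
    public boolean equals(Object other) {
        return other instanceof WikiName && ((WikiName) other).canonical.equals(canonical);
    }

    @Override
    public int hashCode() { return canonical.hashCode(); }

    @Override
    public String toString() { return displayName; }
}
```

With this, new WikiName("Main Page") and new WikiName(" main  page") are equal and hash alike, so a Map keyed on WikiName treats them as one page. Meanwhile, displayName() still returns what the user typed.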

In 2.8 there is no difference between a page's name and its title
(except when breakTitlesWithSpaces is enabled).  My wikis all have
allowCamelCase=false, matchEnglishPlurals=false and
breakTitlesWithSpaces=true; for me this means that, with rare
exceptions, normalization is possible by thunking all names to
CamelCase format.  But as Janne points out, this doesn't help with
some languages, where CamelCase titles look bizarre.
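Here is a Java sketch of what "thunking to CamelCase" means in this context: capitalize the first letter of each whitespace-separated word and drop the spaces. CamelCaseNames is a hypothetical helper, and the real .cleanLink handling of punctuation may differ:

```java
/** Hypothetical helper: "test name" -> "TestName". */
final class CamelCaseNames {

    private CamelCaseNames() {}

    static String toCamelCase(String title) {
        StringBuilder out = new StringBuilder();
        for (String word : title.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;  // splitting an empty string yields one empty token
            }
            int first = word.codePointAt(0);
            out.appendCodePoint(Character.toUpperCase(first))            // capitalize first letter
               .append(word, Character.charCount(first), word.length()); // keep the rest as-is
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("test name"));  // TestName
        System.out.println(toCamelCase("TestName"));   // TestName
    }
}
```

This is lossy in exactly the way Janne describes. "test name" and "Test Name" both become "TestName", and that joined form is what looks bizarre in some languages.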

For now, I might push my current hack into a custom 2.8 build and see
how it plays live for a week or so.  My biggest problem will be
cleaning my repository of "duplicate" names (believe it or not, users
have managed to get "Test%20Name", "Test+Name", and "TestName" all
existing on disk).  But this is a bit of housekeeping I should do
anyway.
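For that cleanup, a key that URL-decodes the on-disk name and drops whitespace is enough to spot the three variants above. This assumes "+" and "%20" in file names were always meant as spaces. That is worth confirming before merging anything, because a page with a literal "+" in its name would be folded in too. DiskNameKey is a hypothetical helper, and URLDecoder.decode with a Charset argument needs Java 10 or later:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for spotting on-disk duplicates of one page name. */
final class DiskNameKey {

    private DiskNameKey() {}

    /** Decodes %XX escapes and '+' (both become a space), then drops whitespace. */
    static String key(String fileName) {
        String decoded;
        try {
            decoded = URLDecoder.decode(fileName, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException malformedEscape) {
            decoded = fileName;  // e.g. a literal "100%" in a name; keep it undecoded
        }
        return decoded.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Test%20Name", "Test+Name", "TestName"}) {
            System.out.println(name + " -> " + key(name));  // each maps to TestName
        }
    }
}
```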

Let me know if there is any desire to push this further at this time.

Regards,
John Volkar

-----Original Message-----
From: Janne Jalkanen [mailto:[email protected]] 
Sent: Sunday, December 28, 2008 9:50 AM
To: [email protected]
Subject: Re: WikiName normalization

> I had a lot of trouble with WikiNormalization in 2.4. I strongly 
> support the idea of removing most constraints (for instance, most 
> Unicode printable characters should be allowed). I support the idea of
> removing duplicated spaces.
>
> A small addition to your rules: I think it usually helps to remove
> spaces at the beginning and at the end of page names.

Yes, of course.  In fact, we should be doing this right now, but it
hasn't been written down anywhere.  You can probably create a page like
that through the API or by cleverly manipulating the Edit.jsp parameters.

> Maybe the normalizer should be made "pluggable", with CamelCase being
> one of the provided normalizers?
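One possible shape for a pluggable normalizer is a functional interface whose implementations can be chained, sketched here in Java. WikiNameNormalizer, WHITESPACE, and LOWER_CASE are hypothetical names, not an existing JSPWiki API. A CamelCase normalizer would simply be one more implementation:

```java
import java.util.Locale;

/** Hypothetical pluggable normalizer; not an existing JSPWiki interface. */
@FunctionalInterface
interface WikiNameNormalizer {

    String normalize(String name);

    /** Chains two normalizers: this one first, then {@code next}. */
    default WikiNameNormalizer andThen(WikiNameNormalizer next) {
        return name -> next.normalize(normalize(name));
    }

    /** Trims the ends and collapses internal runs of whitespace. */
    WikiNameNormalizer WHITESPACE = name -> name.trim().replaceAll("\\s+", " ");

    /** Case-insensitive matching via locale-independent lowercasing. */
    WikiNameNormalizer LOWER_CASE = name -> name.toLowerCase(Locale.ROOT);
}
```

For example, WikiNameNormalizer.WHITESPACE.andThen(WikiNameNormalizer.LOWER_CASE) turns "  Main   Page " into "main page". A site that prefers case-sensitive names would configure WHITESPACE alone.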

MediaWiki (which powers Wikipedia) has a simple regexp for filtering out
illegal characters.  At the moment it is as follows:

"The list of illegal characters is as follows: #<>[]|{}, non-printable
characters 0 through 31, and 'delete' character 127."

Everything else is allowed in wiki names.  I think that if they can do
it and have no problems, we can do it too.
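Below is a Java sketch of a MediaWiki-style check. It rejects only the characters listed above and accepts everything else, including non-ASCII letters. LegalNameCheck is a hypothetical name, and treating an empty name as illegal is an added assumption:

```java
/** Hypothetical check: rejects #<>[]|{}, characters 0-31 and DEL (127). */
final class LegalNameCheck {

    private static final String ILLEGAL = "#<>[]|{}";

    private LegalNameCheck() {}

    static boolean isLegal(String name) {
        if (name.isEmpty()) {
            return false;  // assumption: an empty name is never a valid page
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c <= 31 || c == 127 || ILLEGAL.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;  // every other character is allowed
    }

    public static void main(String[] args) {
        System.out.println(isLegal("M\u00f6tley Cr\u00fce"));  // true: non-ASCII is fine
        System.out.println(isLegal("Foo[bar]"));               // false: '[' is reserved
    }
}
```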

My proposal, based on John's and Murray's input:

http://www.jspwiki.org/wiki/WikiNameNormalizationProposal

> One problem is links from outside applications: when you rename a
> page, JSPWiki renames the internal links but obviously cannot rename
> external references. Given all the JCR and WikiName discussions, may I
> suggest attaching a version number to a name and using it within a
> "PermaLink" URL? This version number would be the version at which
> the page name was last changed (0 would indicate the page never
> changed its name; no number would indicate the "latest" name, for
> backward compatibility). A separate history of page names would be
> kept, allowing external links to be redirected to the latest page
> name. Does such a mechanism exist in JCR?

I am not sure whether that makes sense - if you reference Wikipedia's
article on "Motley Crue", I would assume that you would want to
reference that particular page in general, not just a version of it.

JCR uses UUIDs to reference pages, so version histories do also store
renames.  If we use that feature, that is.
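The sketch below illustrates why a stable identifier makes a versioned name unnecessary. It is a hypothetical in-memory example in plain Java, not the JCR API. Links resolve through a page's UUID, so a rename only updates the UUID-to-name mapping, and old links still reach the latest name:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch of UUID-based permalinks; not the JCR API. */
final class PermalinkIndex {

    private final Map<UUID, String> currentName = new HashMap<>();

    /** Registers a new page and returns its stable identifier. */
    UUID create(String name) {
        UUID id = UUID.randomUUID();
        currentName.put(id, name);
        return id;
    }

    /** A rename changes only the name; the identifier (and links to it) stay valid. */
    void rename(UUID id, String newName) {
        if (currentName.replace(id, newName) == null) {
            throw new IllegalArgumentException("Unknown page id: " + id);
        }
    }

    /** Resolves a permalink's identifier to the page's latest name, if any. */
    Optional<String> resolve(UUID id) {
        return Optional.ofNullable(currentName.get(id));
    }
}
```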

/Janne

