So do 35 failures and 7 errors sound right for the 2.8 branch? I
don't have RCS set up.
On my computer, all of them run. So you still probably have some
problems. Not having installed RCS would cause those tests to fail,
but certainly not others.
That's fine, but it makes no attempt to help with "Test some name",
"Test some Name" and "Test SomeName" being treated as different pages.
Here's a problem: it *does* happen on some platforms. The page names
are partly case-sensitive on a platform-dependent basis, and that
can't be helped outside of ripping out the entire backend and
replacing it with something more sane.
Um, how does it lose information? It *adds* spaces (fairly nicely
too[1]). What information does it *lose*? (Maybe I'm being dense, can
you give me an example?)
[1] The only corner case I've ever noticed that bothered me is "PDFs
Are Nice", which turns out as "PD Fs Are Nice".
That would be a good example of where it loses information - PDF is a
single word, and it arbitrarily removes that information.
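To make the "PD Fs" case concrete, here is a minimal sketch of a
CamelCase splitter. It is NOT the actual beautifyString() code; the
class name NaiveBeautifier and the rule (insert a space before any
capital that is followed by a lowercase letter) are assumptions chosen
only because they reproduce the behaviour described above.

```java
// Hypothetical sketch, not JSPWiki's beautifyString() implementation.
final class NaiveBeautifier {

    // Inserts a space before every uppercase letter that is followed by
    // a lowercase letter, unless it is the first character or already
    // preceded by a space.
    static String beautify(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean startsWord = i > 0
                    && Character.isUpperCase(c)
                    && i + 1 < name.length()
                    && Character.isLowerCase(name.charAt(i + 1))
                    && name.charAt(i - 1) != ' ';
            if (startsWord) {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(beautify("PDFsAreNice"));  // PD Fs Are Nice
        System.out.println(beautify("TestSomeName")); // Test Some Name
    }
}
```

A splitter of this kind cannot know that "PDF" is one word; it only
sees the capitals.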
The allowed punctuation chars " ()&+,-=._$" greatly raise the
complexity (and flexibility) of WikiNames. Again, "Test(Name)" and
"Test (Name)" are two different pages, as are "Test 2+2=4" and
"Test 2 + 2 = 4". These punctuation chars could have rules for
normalization expressed easily for English, but I'm completely unsure
how those rules would work for other languages (the decimal separator
rule would at least need to be platform based).
Which is exactly why I think the only sane normalization would be to
compress spaces in the sense of
1..N spaces => 1 space
0 spaces => 0 spaces
with no other normalizations. If we can figure out a way to get rid
of any other normalizations, that would be great (but I don't think
that's really possible). This includes English plurals,
beautifyString(), etc.
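As a minimal sketch of that rule (the class and method names are made
up for illustration and do not exist in JSPWiki), space compression
alone could look like this. Note that it does no trimming and never
touches case or punctuation:

```java
// Hypothetical sketch of "1..N spaces => 1 space, 0 spaces => 0 spaces".
final class SpaceCompressor {

    // Collapses every run of two or more spaces into a single space.
    // Names without spaces come back unchanged; no spaces are added.
    static String compress(String name) {
        return name.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        System.out.println(compress("Test   Some  Name")); // Test Some Name
        System.out.println(compress("TestSomeName"));      // TestSomeName
        System.out.println(compress("Test 2+2=4"));        // Test 2+2=4
    }
}
```

Under this rule "Test 2+2=4" and "Test 2 + 2 = 4" remain two different
pages, which is the point: nothing is guessed.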
Note that Wikipedia works well with punctuation characters. They are
just titles.
In wikis the link should always be equal to the title of the page.
The reason why we're having this discussion is the unfortunate
decision I made a long time ago to allow freelinks to map to
CamelCase names. If that had not been done, there would be no
problems whatsoever.
The 2.8 branch's JSPWikiMarkupParserTest has 8 failures as it stands
in svn; they appear to be "%20"-related in some checked URLs. I assume
these were known and accepted?
Nope. They all run 100% for me. Otherwise we wouldn't have released ;-)
I think before you hack anything, you should probably check what is
going on...
JSPWiki 2.8 and all earlier versions:
1) are case-sensitive when it comes to wiki page names. ("Test name"
isn't the same page as "Test Name")
They are partly case sensitive.
2) allow spaces in names to differentiate pages. ("Test SomeName"
isn't the same page as "Test Some Name")
Yes, but on some platforms "Test Somename" and "Test SomeName" are
equal.
It is a good question which behaviour we should standardize on. I
think I prefer the case insensitivity.
I chimed in on this normalization stuff because you mentioned creating
a WikiName class or some such a while back. Looking through the
codebase yesterday and today shows a zillion places where the paradigm
"String pageName" is used. The testcases especially have hardcoded
page names in them, and the tests in many cases dip under the covers
for setup & scaffolding work... Ick, but fixable.
There's actually a good reason for a lot of that stuff; it's fast to
write the tests, and also, they try to isolate the components so that
any failures in other components would not affect the current test.
There is one area that is hard to unit test, and that deals with
handling "legacy" pages in the provider's repository. For instance,
this work shows that a user can have multiple pages on disk for the
*same* normalized wiki name.
Correct. Which is a problem.
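For what it's worth, spotting such legacy collisions could be sketched
roughly as below. Everything here is hypothetical: the class name, the
idea of feeding it a plain list of on-disk page names, and the
normalizer passed in (the example uses lowercase with spaces removed)
are assumptions, not anything that exists in AbstractFileProvider.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: find page names that share one normalized name.
final class LegacyPageCollisions {

    // Groups names by their normalized form and keeps only the groups
    // that contain more than one original name.
    static Map<String, List<String>> find(Collection<String> names,
                                          Function<String, String> normalizer) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : names) {
            groups.computeIfAbsent(normalizer.apply(name),
                                   k -> new ArrayList<>()).add(name);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        // Assumed normalizer: lowercase, all spaces removed.
        Function<String, String> norm =
                n -> n.toLowerCase(Locale.ROOT).replace(" ", "");
        List<String> onDisk = List.of("Test SomeName", "Test Some Name", "Main");
        // Prints {testsomename=[Test SomeName, Test Some Name]}
        System.out.println(find(onDisk, norm));
    }
}
```

What to do with a colliding group (merge, rename, or refuse to start)
is exactly the open question.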
How should this be handled going forward? I think it *has* to be
handled, because I think case-sensitive wikinames are too confusing to
casual users. I think AbstractFileProvider.findPage() is a place where
this could be handled. But I am unwilling to proceed further without
input from the dev team.
I personally think that case insensitivity is the way to go.
Unfortunately, that means that title beautification has to go, simply
because it would mean that
"Test Somename", "Test SomeName", and "Test Some Name" would need to
be equal, BUT it also means that "Testsomename", "test So Me Nam E",
"Test Som Ename", "Test S Omename" and all the other possible
variants would need to be considered equal too. This is just too
much variance, IMHO.
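To illustrate the variance: assuming a comparison key that lowercases
the name and drops all spaces (an assumption for this sketch only, not
existing JSPWiki code), every variant listed above collapses to the
same key.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a case- and space-insensitive comparison key.
final class CaseInsensitiveKey {

    // Lowercases the name (locale-independent) and removes all spaces.
    static String key(String name) {
        return name.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    public static void main(String[] args) {
        List<String> variants = List.of(
                "Test Somename", "Test SomeName", "Test Some Name",
                "Testsomename", "test So Me Nam E",
                "Test Som Ename", "Test S Omename");
        Set<String> keys = new TreeSet<>();
        for (String v : variants) {
            keys.add(key(v));
        }
        System.out.println(keys); // [testsomename]
    }
}
```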
!!!Proposal:
JSPWiki user-visible page names should be clean & normal __and__ allow
spaces in them.
JSPWiki internal page names should be clean & normal and __not__ have
spaces in them.
I don't think this is possible, simply because of the above
limitation. It means that all pagenames should be stored in lowercase,
space-compressed in the repository (i.e. "testsomename"), since JCR is
case sensitive. Which means that beautifyString() cannot have any
capital letters to work with, unless we start storing the page title
outside of its WikiName, which is of course possible, but kinda
against Wiki ideals.
BTW, this would then also have to be true for attachments as well,
since in 3.0 they are treated exactly like pages.
Is the above proposal tracking toward what you wanted? Or do you want
something more prose-like? Basically this would be putting
beautifyString() on steroids. Oddly though, it gets used to break
apart names and add capitalization, but then the spaces get stripped
right back out.
beautifyString() is a problem for us Finns, since it guesses the
proper capitalization wrong all the time. In Finnish, headlines don't
have Every First Letter In Capital; we would write "Every first
letter in capital".
I think that it might be better to stop guessing what the user wants,
and just be as simple as possible. Get rid of our overly complicated
normalization, and just keep links from breaking.
/Janne