Yeah, I've seen the recommendation to use redirects. It's not a good solution. 
For example, say we have two pages:
mysite.com/wiki/Don't Speak (song)
mysite.com/wiki/Don'Amos
Both of these URLs would break at:
mysite.com/wiki/Don
and a redirect would not work, because it can't tell which of the two pages the 
visitor wanted. I could create a "help" page for the "Don" entry, but by then 
I've already lost the visitor's interest. On my site I have many pages 
breaking "early" at the same location in this way, so this is an actual problem 
for me.
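
A rough way to see how many pages share an early break point is sketched below. It is only an illustration: the set of break characters and the names `truncated` and `collisions` are assumptions made for this example, not anything MediaWiki or a particular forum package provides.

```python
import re
from collections import defaultdict

# Assumed set of characters at which a link renderer might cut a URL.
# The colon is left out so that a leading "http:" would not count as a break.
BREAK_CHARS = re.compile(r"['(),;]")

def truncated(url):
    """Return the part of the URL before the first break character."""
    return BREAK_CHARS.split(url, maxsplit=1)[0]

def collisions(urls):
    """Group URLs by truncated form and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for url in urls:
        groups[truncated(url)].append(url)
    return {prefix: members for prefix, members in groups.items()
            if len(members) > 1}

pages = [
    "mysite.com/wiki/Don't Speak (song)",
    "mysite.com/wiki/Don'Amos",
]
# Both pages collapse to "mysite.com/wiki/Don", so one redirect cannot serve both.
print(collisions(pages))
```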

Sometimes we would need multiple redirects for a page, e.g. the same title:
mysite.com/wiki/Don't Speak (song)
There are three possible places where the link could break:

mysite.com/wiki/Don
mysite.com/wiki/Don't Speak (
mysite.com/wiki/Don't Speak (song

Again, the first break might not be enough to tell where the user wanted 
to go. So now we need to create multiple redirects for every page that has this 
problem, and we're still not sure the visitor will arrive where they wanted to. 
So it's not feasible to create redirects. Plus, suppose I do have a 
redirect for a certain entry. If a user posts a link on a forum, the forum 
software renders it as:
mysite.com/wiki/Don't Speak (song
This will confuse the user and make them wonder if the link will work or not.

The only real solution is to not allow these characters in a URL at all:
- commas, apostrophes, brackets, colons, semicolons and so on.
That is the real problem that needs to be dealt with somehow. There are 
thousands of URL rendering routines all over the web, and each one makes its 
own guess about where a URL ends. The only URLs we can count on to work 
everywhere are ones made of just letters, numbers, underscores and dashes.
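
As a minimal sketch of that rule, a page name could be checked against the safe set before the page is created. The function name `is_link_safe` is hypothetical, and the allowed set is simply the one listed above (letters, numbers, underscores and dashes), not an official MediaWiki setting.

```python
import re

# Letters, digits, underscores and dashes only (the set proposed above).
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")

def is_link_safe(page_name):
    """True if every character of the page name is in the safe set."""
    return SAFE_NAME.fullmatch(page_name) is not None

print(is_link_safe("Dont_Speak_-_song"))    # True
print(is_link_safe("Don't_Speak_(song)"))   # False
```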
Suppose someone is writing a URL rendering routine and sees this:
"Hey, did you see the site I sent you 
(http://mediawiki.org/wiki/extensions(safe)), was actually not working?"
The software guy will break the URL before the ending bracket, while MediaWiki 
wants the bracket to be part of the URL.
Or take this case:
"Hey, I went to http://www.mediawiki.org/wiki/blah, had coffee and then went to 
bed".
Here again the software guy will break it at the comma, while MW might want us 
to include the comma in the URL.
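
The two cases above can be reproduced with a deliberately naive link detector. This is a sketch of one plausible heuristic (take everything up to the next whitespace, then drop trailing punctuation), not the code of any real forum or mail package; `find_urls` and `TRAILING` are names made up for the example.

```python
import re

URL_RE = re.compile(r"https?://\S+")

# Punctuation that a renderer is assumed to treat as "not part of the link"
# when it appears at the very end of the match.
TRAILING = ".,;:!?)"

def find_urls(text):
    """Return the link targets a naive renderer would pick out of the text."""
    return [match.rstrip(TRAILING) for match in URL_RE.findall(text)]

msg1 = ("Hey, did you see the site I sent you "
        "(http://mediawiki.org/wiki/extensions(safe)), was actually not working?")
msg2 = ("Hey, I went to http://www.mediawiki.org/wiki/blah, had coffee "
        "and then went to bed")

print(find_urls(msg1))  # ['http://mediawiki.org/wiki/extensions(safe']
print(find_urls(msg2))  # ['http://www.mediawiki.org/wiki/blah']
```

The heuristic gets the second message right only if the comma is not part of the page name, and it cuts the first link short because it cannot know that the page name ends with a bracket.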

So in my opinion, MW should take care of this one way or the other and not 
allow people to use these forbidden characters in the URL, while allowing them 
to be used in the page heading. 
{{DISPLAYTITLE}} works, but I have to take some additional steps. It should work 
like this:
- For a page with the name (or URL) "Foo", if we use {{DISPLAYTITLE:Bar}} on 
that page, then:
--- If we make internal links to [[Foo]] or [[Bar]], they should automatically 
always link to wiki/Foo, but display as "Bar". *
--- Any automatically generated links (page logs, contributions and so on) 
should always link to Foo, but display as Bar.
--- If we want different link text, we can use [[Foo|Blah blah]] 
as usual.
The URL (Foo) is where we're restricted with characters, and Bar is where we 
have complete freedom to display anything in the page heading.
*: Once again, our first priority is to prevent broken links. Although this 
creates an inconsistency compared to other pages on the site, it is the 
only option we have.
Also, having a link that doesn't match the page heading is very 
common on non-wiki sites, so it's not a problem.

I'm OK with the solution I posted. I would use DISPLAYTITLE, and use 
[[Foo|Bar]] for internal links. I would format Foo to be as close to Bar as 
possible, but not use any problematic characters.
So the page name would be:
Dont Speak - song  (actual name of the page)
The page heading, set with {{DISPLAYTITLE:Don't Speak (song)}}, would be as I 
wanted it to be, and links to the page would be written as:
[[Dont Speak - song|Don't Speak (song)]]
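
Deriving the page name by hand gets tedious, so here is a small sketch of how it could be automated. The rules (drop apostrophes, turn any other unsafe run into " - ") and the names `safe_page_name` and `wikilink` are assumptions chosen to reproduce the example above, not a built-in MediaWiki feature.

```python
import re

def safe_page_name(title):
    """Derive a link-safe page name from the heading we want to display."""
    name = title.replace("'", "")                    # "Don't" becomes "Dont"
    name = re.sub(r"[^A-Za-z0-9_ -]+", " - ", name)  # brackets etc. become " - "
    name = re.sub(r"\s+", " ", name)                 # collapse repeated spaces
    return name.strip(" -")                          # no stray dash at the ends

def wikilink(title):
    """Build a piped link: safe page name as target, full heading as text."""
    return "[[{}|{}]]".format(safe_page_name(title), title)

print(safe_page_name("Don't Speak (song)"))  # Dont Speak - song
print(wikilink("Don't Speak (song)"))        # [[Dont Speak - song|Don't Speak (song)]]
```

Two different headings can still map to the same safe name, so the result would need a uniqueness check before it is used as a page title.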
I have no option but to drop the apostrophe. Let's see how Yahoo mail and the 
list software format this URL:
http://en.wikipedia.org/wiki/Don't_Speak            - [1]
I know it won't work if it's posted on topix.com and many other sites.
Another problem is what these characters look like in the URL when they're 
percent-encoded, e.g.:
http://en.wikipedia.org/wiki/Don%27t_Speak
That doesn't look good.
The % encoding does make the URL work, but it's not guaranteed that the 
user will get it the way I get it. Many times I've seen people copy and paste 
links to my site that didn't have the % encoding, and they broke. I don't know 
how that happened, but I won't blame the user. They simply copied and pasted in 
a different environment. A URL should work if copied and pasted, and if we're 
tolerating a failure rate here, it should be extremely low (say, 1%). For 
example, the word "Apple" can be copied and pasted with the same results in 
every environment, but the URL [1] has a high failure rate, which is why I've 
seen many broken links. So to me, the % encoding is not a solution that 
prevents failure and therefore should not be relied on.
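
The difference between the two kinds of page name shows up clearly with Python's standard `urllib.parse` functions. This is only a demonstration of percent-encoding in general; the helper name `same_when_encoded` is made up for the example.

```python
from urllib.parse import quote, unquote

def same_when_encoded(page_name):
    """True if percent-encoding leaves the page name unchanged."""
    return quote(page_name) == page_name

print(quote("Don't_Speak"))               # Don%27t_Speak
print(unquote("Don%27t_Speak"))           # Don't_Speak
print(same_when_encoded("Don't_Speak"))   # False
print(same_when_encoded("Dont_Speak"))    # True
```

A name that is identical in both forms has only one spelling to copy and paste, so there is no encoded/unencoded variant to get lost along the way.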

First, we are a website to the world, and then we are a wiki to the people who 
work on the site. So my first priority is to have links that don't break. If 
that means having links which will work, but don't look perfect (e.g. using 
"Dont", which is not grammatically correct), I would rather do that than have a 
grammatically correct link that will break when posted on some websites.

In any case, I think this is something that should have been dealt with, so 
that these problematic characters would never be seen in the URL of Wikipedia 
or any other MediaWiki site. 
It's not a big problem for me to use the DISPLAYTITLE feature, do the 
workarounds, and tolerate some non-ideal page logs which will show the page URL 
instead of the page title (I will try to keep the difference as small as 
possible). I'm glad that solution is an option and the feature is built in.
I do think the performance of URLs on a website is a serious issue. They 
should always work, and if I have to do some extra work to make them work, 
that's fine with me.

I wish I didn't have to use these characters in the page heading, but many times 
I have to, and that freedom should be there, as it is on a non-wiki website. 
At the same time, I should not have a URL that might break, and it's OK for 
the page heading and URL to differ from each other. I can imagine millions 
of MediaWiki links breaking every day due to the presence of these characters. 
If the MW software people decided to deal with this, they would have to figure 
out a way to keep the page heading separate from the URL and still have 
everything work fine.
Now, my site wouldn't exist without the MW software, so I'm very thankful to all 
those who have worked on it.
But anyway, yeah, these are some of my thoughts on URL breaks and page headings.

Eric

________________________________
 From: Kilian <drehbue...@texttheater.net>
To: mediawiki-l@lists.wikimedia.org 
Sent: Monday, January 9, 2012 12:47 PM
Subject: Re: [Mediawiki-l] Links with (, ), :, ' - break all the time
 
On 01/09/2012 03:02 AM, Benjamin Lees wrote:
> But why don't you just use redirects?

Redirects wouldn't solve the problem. Users would be redirected to URLs
with spaces/punctuation, copy them from their browser's location bar and
still post them elsewhere.

_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l