Re: URL Theory & Best Practices

2002-11-30 Thread J.Pietschmann
Kjetil Kjernsmo wrote:

> So, I've got this bad feeling that IE is going
> to ignore the content-type header ...
> But I can't for the life of me understand how it can be
> standards-compliant...

Well, IEx does not in general ignore the content-type
header, and it is, more or less, standards-compliant,
just in a somewhat special way.
From various rumours and gossip I compiled the following
story: IEx uses a variety of COM components for handling
content. A correct implementation would be to open the
network connection, read the headers including the content
type header, decide which component handles the content,
and then hand over the relevant headers and the open
connection to the component. It seems that handing open
connections to arbitrary COM components is difficult, or
was difficult at the time the architecture of IEx was decided,
therefore the browser component takes a look at the URL,
extracts what it thinks could be a "file extension", then
looks up whatever component is registered for this string
in the Windows registry (note that MIME types are not keys
there) and then hands the URL to the component. Obviously
it's up to the component what happens if the content type
does not match one of the possible types the component can
handle, or whether it even honors the content-type header.
In many cases a mismatch causes the connection to be closed
and another component determined by the content-type gets
the URL. BTW this is the mechanism the Klez virus uses
to get into Windows systems. Some components seem to take a
second look at the URL, and sometimes they return errors or
something which causes the browser component to fall back
to the default HTML renderer which then most often draws a
blank. Caching plays a role too. Also, the algorithms for
extracting a "file extension" and perhaps content negotiation
seem to be implemented multiple times and probably in
different ways in various components, or perhaps the
components don't have access to necessary data (like
cookies) all the time.
The user usually doesn't notice anything. Problems arise
if the URL points to dynamic content where a second GET
can cause different stuff to be retrieved, in particular if
the content wasn't completely read or wasn't cached for other
reasons (like SSL).
Disclaimer: most of the above is second hand knowledge.
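The difference between header-driven and extension-driven dispatch described above can be sketched like this (a purely hypothetical illustration; the handler names and registry contents are made up, and, per the disclaimer, the real IE internals are second-hand knowledge):

```python
# Hypothetical sketch (not IE's actual code) contrasting the two
# dispatch strategies described above.

REGISTRY = {            # stand-in for the Windows registry mapping
    ".pdf": "pdf-component",
    ".html": "html-component",
}
MIME_HANDLERS = {       # what a header-driven client would consult
    "application/pdf": "pdf-component",
    "text/html": "html-component",
}

def dispatch_by_header(content_type):
    """Correct behaviour: pick the handler from the Content-Type header."""
    return MIME_HANDLERS.get(content_type, "html-component")

def dispatch_by_extension(url):
    """IE-style behaviour: guess a 'file extension' from the URL."""
    path = url.split("?")[0].split("#")[0]
    dot = path.rfind(".")
    ext = path[dot:] if dot > path.rfind("/") else ""
    return REGISTRY.get(ext, "html-component")

# A PDF served from an extensionless URL: the header-driven client picks
# the right handler, the extension-driven one falls back to HTML.
print(dispatch_by_header("application/pdf"))     # pdf-component
print(dispatch_by_extension("/reports/2002q3"))  # html-component
```

This also illustrates why the thread's PDF-without-extension problem appears: the second client never looks at the header at all.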

HTH
J.Pietschmann


-
Please check that your question has not already been answered in the
FAQ before posting. <http://xml.apache.org/cocoon/faq/index.html>

To unsubscribe, e-mail: <[EMAIL PROTECTED]>
For additional commands, e-mail: <[EMAIL PROTECTED]>




Re: URL Theory & Best Practices

2002-11-11 Thread Kjetil Kjernsmo
On Sunday 10 November 2002 01:23, Justin Fagnani-Bell wrote:
> file extensions can (and IMO should) exist side-by-side. 

Is there a well-recognized standardization of file extensions? RFC? ISO 
standard? W3C recommendation? I'm curious, because I'm not aware of any 
such standard. 

Best,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/






Re: URL Theory & Best Practices

2002-11-09 Thread Miles Elam
Never mind.  I take it all back (well...some of it).  I admit that the 
trailing slash is an artifact of my experience and not of utility.

Damned mailing lists...you can't remove things you wish you hadn't said. 
Accountability's overrated.  ;-)

- Miles

Miles Elam wrote:

Justin Fagnani-Bell wrote:


This is definitely where we differ. I don't see why an intrinsic 
resource should always end in a '/'. If /a/b.pdf is the PDF 
representation then why shouldn't /a/b be the intrinsic resource? The 
only reason I see why the trailing slash is recommended is because 
developers are used to having their URI space tied to their 
filesystem structure with a static server like Apache. The trailing 
slash, from our experience with filesystems,  indicates that 
something is a directory, that it has children. But in a URI a 
resource can be both a viewable resource and a container node at the 
same time. There's certainly nothing stopping /a/b/, /a/b, /a/b.pdf 
and /a/b/c.pdf from all being valid URI's in the same space. To me 
the trailing slash simply indicates that there's more to come at 
lower levels, and the absence of it means the resource is a leaf.

You're right in that it is what we are used to but not necessarily 
because of the filesystem.  I misspoke in this case where /a/b could 
indeed be a resource in some cases.  One major problem lies in clients 
like IE (for better or for worse the dominant viewer) which don't 
always behave correctly even when the correct MIME type is sent.  The 
other is when the resource references other resources.

Take a web article by O'Reilly, for example.  These articles have 
images, multiple pages, talkbacks, etc.  If /a/b is the intrinsic 
resource, how do we logically access the first figure in that 
article?  How do we access the third page?  Aren't multiple pages just 
another representation of the resource?  PDFs can encompass multiple 
pages.  A web page made for printout would encompass only one long 
page.  Would it be /a/b/printable.html?

Is one more correct than another?  I don't think so -- it seems to 
come down to personal preference, all other things being equal.  I 
think IE would have fewer problems with the slash.  I personally don't 
view the trailing slash as a directory but as a resource collection.  
Perhaps a collection of representations?  But that's just semantics 
and I'm grasping here.

In the example of the O'Reilly article, I think that there is more to 
come at the lower levels, there is no absence of lower levels when 
representations are considered lower levels, and that it's a node and 
not a leaf.  I can only think that a resource would be a leaf if it 
and its siblings never have inline constituents like images, multiple 
pages, plugins, etc.








Re: URL Theory & Best Practices

2002-11-09 Thread Miles Elam
Justin Fagnani-Bell wrote:


This is where we differ slightly.  In my mind /a/b/ is the 
intrinsic resource.  /a/b/index.html is the explicit call for HTML 
representation of /a/b/.  If you redirect a client to /a/b/index.html 
and the client bookmarks it, they are bookmarking the HTML 
representation, not the intrinsic resource.


This is definitely where we differ. I don't see why an intrinsic 
resource should always end in a '/'. If /a/b.pdf is the PDF 
representation then why shouldn't /a/b be the intrinsic resource? The 
only reason I see why the trailing slash is recommended is because 
developers are used to having their URI space tied to their filesystem 
structure with a static server like Apache. The trailing slash, from 
our experience with filesystems,  indicates that something is a 
directory, that it has children. But in a URI a resource can be both a 
viewable resource and a container node at the same time. There's 
certainly nothing stopping /a/b/, /a/b, /a/b.pdf and /a/b/c.pdf from 
all being valid URI's in the same space. To me the trailing slash 
simply indicates that there's more to come at lower levels, and the 
absence of it means the resource is a leaf.

You're right in that it is what we are used to but not necessarily 
because of the filesystem.  I misspoke in this case where /a/b could 
indeed be a resource in some cases.  One major problem lies in clients 
like IE (for better or for worse the dominant viewer) which don't always 
behave correctly even when the correct MIME type is sent.  The other is 
when the resource references other resources.

Take a web article by O'Reilly, for example.  These articles have images, 
multiple pages, talkbacks, etc.  If /a/b is the intrinsic resource, how 
do we logically access the first figure in that article?  How do we 
access the third page?  Aren't multiple pages just another 
representation of the resource?  PDFs can encompass multiple pages.  A 
web page made for printout would encompass only one long page.  Would it 
be /a/b/printable.html?

Is one more correct than another?  I don't think so -- it seems to come 
down to personal preference, all other things being equal.  I think IE 
would have fewer problems with the slash.  I personally don't view the 
trailing slash as a directory but as a resource collection.  Perhaps a 
collection of representations?  But that's just semantics and I'm 
grasping here.

In the example of the O'Reilly article, I think that there is more to 
come at the lower levels, there is no absence of lower levels when 
representations are considered lower levels, and that it's a node and 
not a leaf.  I can only think that a resource would be a leaf if it and 
its siblings never have inline constituents like images, multiple pages, 
plugins, etc.

P.S.  Thank god for the mailing lists.  They actually encourage me 
to write down some of my thoughts, even if they are off the mark more 
often than not...  Does this make email better than the web, or simply 
justify the need for more discussion on the web? 


But it apparently has a detrimental effect on my proper use of English 
grammar.  *sigh*

- Miles






Re: URL Theory & Best Practices

2002-11-09 Thread Justin Fagnani-Bell

On Saturday, November 9, 2002, at 12:08  PM, Miles Elam wrote:


Tony Collen wrote:


Comments inline...

Miles Elam wrote:


But can't delivered types differ by the incoming client?


Yes, but a problem then arises when someone is using IE and they want 
a PDF, when your user-agent rules will only serve a PDF for FooCo PDF 
Browser 1.0.  IMO browsers should respect the mime-type header.  I 
believe the mime-type header is very useful when you want to use 
something like a PHP script to send an image or a .tar.gz file.  In 
fact, it's essential for it to work, otherwise the browser interprets 
the data as garbage.

No, that wasn't my intention at all.  If someone is using IE and 
they want a pdf (not a default expectation for that particular browser 
like html or xml), then the URL they would get directed to would be 
*.pdf. This is not the intrinsic resource.  You are explicitly asking 
for the PDF representation of that resource.

If the browser's default expectation is PDF (like in your FooCo PDF 
Browser 1.0 example), the trailing slash resource would give it PDF. 
However, it could still be pointed to *.pdf if you wanted to make it 
explicit.

This is very well put Miles, and was my intention with the previous 
email I wrote. Content negotiation and file extensions can (and IMO 
should) exist side-by-side. There is no precedent for a browser 
changing its accept header on a per-request basis, as someone 
suggested, nor is there a way to specify this behavior in a hyperlink. 
If you have a link on a site that says "Click here for a PDF" then I 
would expect that the URI would end in .pdf, at least that's what makes 
the most sense to me.


In those cases where only PDF is available (common when it's not 
dynamically generated), I see no reason why the URI wouldn't be *.pdf.

Exactly.


This is where we differ slightly.  In my mind /a/b/ is the intrinsic 
resource.  /a/b/index.html is the explicit call for HTML 
representation of /a/b/.  If you redirect a client to /a/b/index.html 
and the client bookmarks it, they are bookmarking the HTML 
representation, not the intrinsic resource.

This is definitely where we differ. I don't see why an intrinsic 
resource should always end in a '/'. If /a/b.pdf is the PDF 
representation then why shouldn't /a/b be the intrinsic resource? The 
only reason I see why the trailing slash is recommended is because 
developers are used to having their URI space tied to their filesystem 
structure with a static server like Apache. The trailing slash, from 
our experience with filesystems,  indicates that something is a 
directory, that it has children. But in a URI a resource can be both a 
viewable resource and a container node at the same time. There's 
certainly nothing stopping /a/b/, /a/b, /a/b.pdf and /a/b/c.pdf from 
all being valid URI's in the same space. To me the trailing slash 
simply indicates that there's more to come at lower levels, and the 
absence of it means the resource is a leaf.

As for redirects, I don't see it being too much of a problem with more 
recent protocols. Also it should only happen when a visitor is being 
referred from an external page, since all the URL's in your site should 
be in the correct form. If you are linking to the intrinsic resource, I 
don't see the need for a redirect (as long as the browser correctly 
understands the mime-type header), so I don't see a problem with 
bookmarking.

- Miles

P.S.  Thank god for the mailing lists.  They actually encourage me to 
write down some of my thoughts, even if they are off the mark more often 
than not...  Does this make email better than the web, or simply justify 
the need for more discussion on the web?

Hear, hear. I say both: linear discussion is good, and so is collaboration. 
A discussion board combined with a wiki would be awesome. Discuss a 
topic and collaborate on a document summing up the ideas at the same 
time. Hmm... Cocoon could do that :)

-Justin





Re: URL Theory & Best Practices

2002-11-09 Thread Antonio A. Gallardo Rivera


Erik Bruchez dijo:
>> Uh-oh, I'm catching some bad vibes... Can someone do me a favour and
>> go to http://www.kjernsmo.net/ with IE6 and see what happens?
>>
> I don't think IE ignores the content-type header. It may be more lax
> than NS when it sees a file extension, but the home page of your site
> does not have any. The page displays fine with IE 6.
>
> -Erik

I was talking about the case of PDF serialization!

Antonio Gallardo








Re: URL Theory & Best Practices

2002-11-09 Thread Antonio A. Gallardo Rivera
I don't know if MS IE 6.0 is standards-compliant or not, and I don't much
care. I think MS IE tries to be compliant. I develop under Linux, but I
also know that MS IE 6.0 is the most used browser in the world. So what
then? I also check the presentation in MS IE.

I don't want to continue this polemic about whether this will work or
not. Make a simple example and you will see it for yourself:

Serving a PDF file without the .pdf extension currently does not work in
MS IE 6.0 SP1. This is a fact!

Maybe in the future we can make use of URIs without extensions. But I am
developing for current browsers. Every developer must take care of that.

Again, I agree with the theory of not using extensions. But for now, it
does not work for the PDF case under MS Windows.

I already make use of FOP and PDF serialization. Why make a wrong
affirmation to people (mainly newbies) about how to do it, when it does
not work?

In the end, please don't take me wrong. I am just trying to help. :-D

Regards,

Antonio Gallardo.

Kjetil Kjernsmo dijo:
> On Saturday 09 November 2002 23:41, Antonio A. Gallardo Rivera wrote:
>> What I wrote is true. If you don't believe me, I recommend you
>> check the archive of this mailing list. This was not my fault: not
>> only did I find this error, many other people had the same problem
>> with IE 6.0 SP1. I fought with PDF generation for a day before I
>> realized that the extension must be .pdf or it will not work!
>>
>> This is why I told you about the fine theory and the cruel reality.
>> :-D
>
> Uh-oh, I'm catching some bad vibes... Can someone do me a favour and
> go to http://www.kjernsmo.net/ with IE6 and see what happens?
>
> The main page isn't a big thing; it is pure XHTML, but per the XHTML 1.0
> spec, it is served as text/html, using simple Apache content
> negotiation to set that. So, I've got this bad feeling that IE is going
> to ignore the content-type header and just list it as raw XML with no
> stylesheet, because that is what would be a logical consequence of what
> you write. But I can't for the life of me understand how it can be
> standards-compliant...
>
> Best,
>
> Kjetil
> --
> Kjetil Kjernsmo
> Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
> [EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
> Homepage: http://www.kjetil.kjernsmo.net/
>








Re: URL Theory & Best Practices

2002-11-09 Thread Kjetil Kjernsmo
On Saturday 09 November 2002 23:57, Barbara Post wrote:
> Oh, I get 406 code, I didn't know this one !!
> I have IE6 SP1 on Windows 2000 Pro.

Hehe, oh well, that's another browser quirk, but a much less serious one. 
I use language negotiation too, so what everybody _should_ do is go into 
their settings and make sure they enable all the languages they can 
read... Check out http://www.debian.org/intro/cn for a howto... 
Failing that, browser vendors should add an *;q=0.001 to their language 
strings to avoid this error, but that is much more a matter of opinion 
than the other things I've written in this thread... :-) 

I tried to talk the Mozilla folks into that in Bug 55800, 
http://bugzilla.mozilla.org/show_bug.cgi?id=55800 and the fact that 
you're getting 406 is proof that they are wrong... :-)
Anyway, I've been logging language settings for a long time on one of my 
sites, and in fact very few users had browsers where it would break, 
and language negotiation is quite cool, so I decided to use it. 
Besides, those users who have it wrong could be catered for with a good 
error handler, if I had bothered... :-) 
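The *;q=0.001 trick can be illustrated with a small sketch (a hypothetical, deliberately naive parser; real Accept-Language handling is more involved):

```python
# Sketch of the fallback suggested above: a trailing "*;q=0.001" in
# Accept-Language means "any language, as a last resort", so the server
# can always pick something instead of answering 406.

def parse_accept_language(header):
    """Return (language, q) pairs, defaulting q to 1.0."""
    out = []
    for part in header.split(","):
        fields = part.split(";")
        lang = fields[0].strip()
        q = 1.0
        for f in fields[1:]:
            if f.strip().startswith("q="):
                q = float(f.strip()[2:])
        out.append((lang, q))
    return out

def pick(available, header):
    """Choose a variant from `available`, or None (i.e. a 406)."""
    prefs = sorted(parse_accept_language(header), key=lambda p: -p[1])
    for lang, _q in prefs:
        if lang == "*":
            return available[0]   # last-resort wildcard: anything goes
        if lang in available:
            return lang
    return None                   # no acceptable variant: 406

print(pick(["no", "en"], "fr;q=1.0"))             # None -> 406
print(pick(["no", "en"], "fr;q=1.0, *;q=0.001"))  # no
```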

Thanks for checking! :-)

Best,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/






Re: URL Theory & Best Practices

2002-11-09 Thread Barbara Post
Oh, I get a 406 code, I didn't know that one!!
I have IE6 SP1 on Windows 2000 Pro.

Babs
--
website : www.babsfrance.fr.st
ICQ : 135868405
- Original Message -
From: "Kjetil Kjernsmo" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, November 09, 2002 11:42 PM
Subject: Re: URL Theory & Best Practices


>
> Uh-oh, I'm catching some bad vibes... Can someone do me a favour and
> go to http://www.kjernsmo.net/ with IE6 and see what happens?
>
> The main page isn't a big thing; it is pure XHTML, but per the XHTML 1.0
> spec, it is served as text/html, using simple Apache content
> negotiation to set that. So, I've got this bad feeling that IE is going
> to ignore the content-type header and just list it as raw XML with no
> stylesheet, because that is what would be a logical consequence of what
> you write. But I can't for the life of me understand how it can be
> standards-compliant...
>
> Best,
>
> Kjetil
> --
> Kjetil Kjernsmo
> Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
> [EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
> Homepage: http://www.kjetil.kjernsmo.net/
>








Re: URL Theory & Best Practices

2002-11-09 Thread Erik Bruchez
Uh-oh, I'm catching some bad vibes... Can someone do me a favour and
go to http://www.kjernsmo.net/ with IE6 and see what happens?


I don't think IE ignores the content-type header. It may be more lax 
than NS when it sees a file extension, but the home page of your site 
does not have any. The page displays fine with IE 6.

-Erik






Re: URL Theory & Best Practices

2002-11-09 Thread Kjetil Kjernsmo
On Saturday 09 November 2002 23:41, Antonio A. Gallardo Rivera wrote:
> What I wrote is true. If you don't believe me, I recommend you
> check the archive of this mailing list. This was not my fault: not
> only did I find this error, many other people had the same problem
> with IE 6.0 SP1. I fought with PDF generation for a day before I
> realized that the extension must be .pdf or it will not work!
>
> This is why I told you about the fine theory and the cruel reality.
> :-D

Uh-oh, I'm catching some bad vibes... Can someone do me a favour and 
go to http://www.kjernsmo.net/ with IE6 and see what happens?

The main page isn't a big thing; it is pure XHTML, but per the XHTML 1.0 
spec, it is served as text/html, using simple Apache content 
negotiation to set that. So, I've got this bad feeling that IE is going 
to ignore the content-type header and just list it as raw XML with no 
stylesheet, because that is what would be a logical consequence of what 
you write. But I can't for the life of me understand how it can be 
standards-compliant...
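For reference, "simple Apache content negotiation" of this kind can be set up roughly like this (a hypothetical httpd.conf fragment using the Apache 1.3/2.0-era mod_mime and mod_negotiation directives; file names are made up):

```
# Hypothetical fragment: let MultiViews pick among index.html.en,
# index.html.no, ... for an extensionless request, and pin the
# Content-Type so the XHTML goes out as text/html.
Options +MultiViews
AddType text/html .html
AddLanguage en .en
AddLanguage no .no
```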

Best,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/






Re: URL Theory & Best Practices

2002-11-09 Thread Antonio A. Gallardo Rivera
What I wrote is true. If you don't believe me, I recommend you check
the archive of this mailing list. This was not my fault: not only did I
find this error, many other people had the same problem with IE 6.0 SP1.
I fought with PDF generation for a day before I realized that
the extension must be .pdf or it will not work!

This is why I told you about the fine theory and the cruel reality. :-D


Antonio Gallardo.

Kjetil Kjernsmo dijo:
> On Saturday 09 November 2002 21:33, Miles Elam wrote:
>> Antonio A. Gallardo Rivera wrote:
>> >Kjetil Kjernsmo dijo:
>> >>On Thursday 07 November 2002 23:57, Tony Collen wrote:
>> >>>However, later I realize that using file extensions is "bad".
>> >>> Read http://www.alistapart.com/stories/slashforward/ for more info
>> on this idea.
>> >
>> >I know about that. The theory is fine, but in the real world... Have
>> > you tried to open a PDF file without the .pdf extension with MS IE
>> 6.0 SP1?
>
> No, I have barely touched IE since 3.0.
>
>> > It does not work. MS IE relies mainly on the extension of
>> > the file to open a PDF file.
>
> What?!? What you're saying is that IE is ignoring the content-type?
> That's just incredibly silly...
>
>> How can we address this? I already
>> > know that Carsten and Matthew in their book don't recommend the use of
>> extensions, and I agree. But how can we tell MS Internet Explorer
>> about that?
>>
>> PDF isn't IE's normal method of receiving information (ease of use
>> with Acrobat aside).  If you specifically want the PDF
>> representation, specify *.pdf.  If what you want is the resource, then
>> you aren't asking specifically for PDF.  If all you have is PDF and
>> PDF is the only representation, then having your URL specify that you
>> are serving PDF hurts no one and corrupts no URLs.
>
> Yes it does! What representation is chosen should only depend on the
> Accept header, and what the UA should do with a file it receives should
> have nothing to do with the filename whatsoever, it should be based on
> the Content-Type-header in the response, solely. It's been a while
> since I read the HTTP 1.1 spec, but IIRC, it is pretty clearly spelled
> out there. It should only depend on the MIME type. On the server and
> client sides, separately, how it is done is of no concern to anybody,
> but that the client depends on what file extension the server uses has
> to be a violation of the spec, again IIRC.
>
> During content negotiation, an extensionless URL should be responded to
> with 200 if the server has a representation which is acceptable
> according to the client's Accept*-headers, with a Location-header
> saying where to find the best file, and that file may well have a .pdf
> extension. If no appropriate representation is found, the server should
> respond with 406.
>
> 
>
> Cheers,
>
> Kjetil
> --
> Kjetil Kjernsmo
> Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
> [EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
> Homepage: http://www.kjetil.kjernsmo.net/
>








Re: URL Theory & Best Practices

2002-11-09 Thread Kjetil Kjernsmo
On Saturday 09 November 2002 21:33, Miles Elam wrote:
> Antonio A. Gallardo Rivera wrote:
> >Kjetil Kjernsmo dijo:
> >>On Thursday 07 November 2002 23:57, Tony Collen wrote:
> >>>However, later I realize that using file extensions is "bad". 
> >>> Read http://www.alistapart.com/stories/slashforward/ for more
> >>> info on this idea.
> >
> >I know about that. The theory is fine, but in the real world... Have
> > you tried to open a PDF file without the .pdf extension with MS IE
> > 6.0 SP1?

No, I have barely touched IE since 3.0.

> > It does not work. MS IE relies mainly on the extension of
> > the file to open a PDF file. 

What?!? What you're saying is that IE is ignoring the content-type? 
That's just incredibly silly... 

> How can we address this? I already
> > know that Carsten and Matthew in their book don't recommend the use of
> > extensions, and I agree. But how can we tell MS Internet Explorer
> > about that?
>
> PDF isn't IE's normal method of receiving information (ease of use
> with Acrobat aside).  If you specifically want the PDF
> representation, specify *.pdf.  If what you want is the resource,
> then you aren't asking specifically for PDF.  If all you have is PDF
> and PDF is the only representation, then having your URL specify that
> you are serving PDF hurts no one and corrupts no URLs.

Yes it does! What representation is chosen should only depend on the 
Accept header, and what the UA should do with a file it receives should 
have nothing to do with the filename whatsoever, it should be based on 
the Content-Type-header in the response, solely. It's been a while 
since I read the HTTP 1.1 spec, but IIRC, it is pretty clearly spelled 
out there. It should only depend on the MIME type. On the server and 
client sides, separately, how it is done is of no concern to anybody, 
but that the client depends on what file extension the server uses has 
to be a violation of the spec, again IIRC. 

During content negotiation, an extensionless URL should be responded to 
with 200 if the server has a representation which is acceptable 
according to the client's Accept*-headers, with a Location-header 
saying where to find the best file, and that file may well have a .pdf 
extension. If no appropriate representation is found, the server should 
respond with 406. 
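The negotiation flow just described can be sketched as toy server logic (hypothetical code, not Cocoon's; the variant map and the naive Accept parsing are made up for illustration):

```python
# Toy sketch: pick the best variant acceptable to the client, or
# answer 406 if nothing matches.

VARIANTS = {  # representations available for the extensionless URL /a/b
    "application/pdf": "/a/b.pdf",
    "text/html": "/a/b.html",
}

def negotiate(accept_header):
    # Very naive Accept parsing: ignore q-values, honour listed types
    # and the */* wildcard.
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for mime, location in VARIANTS.items():
        if mime in accepted or "*/*" in accepted:
            # 200 plus a Location header pointing at the chosen file,
            # which may well carry a .pdf extension.
            return 200, location
    return 406, None  # no acceptable representation

print(negotiate("text/html,application/xhtml+xml"))  # (200, '/a/b.html')
print(negotiate("image/png"))                        # (406, None)
```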



Cheers,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/






Re: URL Theory & Best Practices

2002-11-09 Thread Miles Elam
Antonio A. Gallardo Rivera wrote:


Kjetil Kjernsmo dijo:
 

On Thursday 07 November 2002 23:57, Tony Collen wrote:
   

However, later I realize that using file extensions is "bad".  Read
http://www.alistapart.com/stories/slashforward/ for more info on this
idea.
 


I know about that. The theory is fine, but in the real world... Have you
tried to open a PDF file without the .pdf extension with MS IE 6.0 SP1?
It does not work. MS IE relies mainly on the extension of the file to
open a PDF file. How can we address this? I already know that Carsten and
Matthew in their book don't recommend the use of extensions, and I agree.
But how can we tell MS Internet Explorer about that?


PDF isn't IE's normal method of receiving information (ease of use with 
Acrobat aside).  If you specifically want the PDF representation, 
specify *.pdf.  If what you want is the resource, then you aren't asking 
specifically for PDF.  If all you have is PDF and PDF is the only 
representation, then having your URL specify that you are serving PDF 
hurts no one and corrupts no URLs.

- Miles






Re: URL Theory & Best Practices

2002-11-09 Thread Antonio A. Gallardo Rivera
Kjetil Kjernsmo dijo:
> Hi!
>
> Interesting thread! Most things have been said already, but I'll just
> add a little .02 (whatever currency) :-)
>
> On Thursday 07 November 2002 23:57, Tony Collen wrote:
>
>> However, later I realize that using file extensions is "bad".  Read
>> http://www.alistapart.com/stories/slashforward/ for more info on this
>> idea.

I know about that. The theory is fine, but in the real world... Have you
tried to open a PDF file without the .pdf extension with MS IE 6.0 SP1?
It does not work. MS IE relies mainly on the extension of the file to
open a PDF file. How can we address this? I already know that Carsten and
Matthew in their book don't recommend the use of extensions, and I agree.
But how can we tell MS Internet Explorer about that?

Antonio Gallardo.
>
> The article is interesting, but a little too narrow. Indeed, using file
> extensions is Bad[tm] for URIs because you tie the address to a
> specific technology, which you may not be using in some years. For that
> reason, not changing the default "cocoon" in Cocoon URIs is also a Bad
> Thing[tm].
>
> The authoritative reference on this topic is TimBL's rant "Cool URIs don't
> change": http://www.w3.org/Provider/Style/URI :-)
>
> So, using directories for everything is one possibility, and if you do,
> make sure to include the trailing /, to avoid useless 301 redirects.
> Another option is to use Content Negotiation, which is well defined in
> HTTP 1.1 (and earlier, IIRC); it's weird that it isn't more widely
> used.
>
> But both content negotiation and using directories for everything are
> solutions that exist mainly because URIs have been so strongly
> tied to the file system of the server, and the mentioned article seems
> to take for granted that this connection is a necessity; but as Cocoon
> proves, this is not so. Just use sensible matches. It also means that
> requiring a trailing slash on every URI is a bit too much; I only do
> that if there is logically a hierarchical substructure.
>
> As for the problem of serving different formats to the client, I really
> have no good solution. What user agents should do is let the
> user easily manipulate the Accept header: if the user wants a
> PDF file, he would send only application/pdf in the Accept header, and
> the server would know that the user wanted a PDF file and send that.
> Given that this doesn't exist, appending the type to the URI is probably
> not too bad.
>
> Best,
>
> Kjetil
> --
> Kjetil Kjernsmo
> Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
> [EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
> Homepage: http://www.kjetil.kjernsmo.net/
>
>








Re: URL Theory & Best Practices

2002-11-09 Thread Miles Elam
Tony Collen wrote:


Comments inline...

Miles Elam wrote:


But can't delivered types differ by the incoming client?


Yes, but a problem then arises when someone is using IE and they want 
a PDF, when your user-agent rules will only serve a PDF for FooCo PDF 
Browser 1.0.  IMO browsers should respect the mime-type header.  I 
believe the mime-type header is very useful when you want to use 
something like a PHP script to send an image or a .tar.gz file.  In 
fact, it's essential for it to work; otherwise the browser interprets 
the data as garbage. 

No, that wasn't my intention at all.  If someone is using IE and they 
want a pdf (not a default expectation for that particular browser like 
html or xml), then the URL they would get directed to would be *.pdf. 
This is not the intrinsic resource.  You are explicitly asking for the 
PDF representation of that resource.

If the browser's default expectation is PDF (like in your FooCo PDF 
Browser 1.0 example), the trailing slash resource would give it PDF. 
However, it could still be pointed to *.pdf if you wanted to make it 
explicit.

In those cases where only PDF is available (common when it's not 
dynamically generated), I see no reason why the URI wouldn't be *.pdf. 
In fact, if in the future more presentation types are added, a special 
case for *.pdf to return a static resource and all other variations 
being dynamically generated (or some other mixing and matching) would 
still be valid and a stable URI space.

As far as a php script returning an image, that's fine, but if the URL 
ends with (or even contains) any reference to "php", you are tying your 
URI to a particular technology/delivery method.  With Cocoon, why not 
map /foo/bar/alpha.png to the PHP script that returns a PNG image?  In 
this case, I'm not advocating the trailing slash.  I am advocating that 
you not have PHP even mentioned in the URL.  In this case, the resource 
is a PNG image without regard to client -- have the URL reflect this.
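The mapping Miles describes can be sketched in Cocoon sitemap terms; a minimal, hypothetical example (the paths and the use of a reader are my assumptions, not from the original mail):

```xml
<!-- Hypothetical sketch: a technology-neutral URL for a PNG image.
     Nothing in the URL reveals how the bytes are produced. -->
<map:match pattern="foo/bar/alpha.png">
  <!-- read the image from wherever it lives and label it correctly -->
  <map:read src="images/alpha.png" mime-type="image/png"/>
</map:match>
```

Swapping the static source for a dynamically generated image would not change the URL at all, which is the point.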

This is where we differ slightly.  In my mind /a/b/ is the intrinsic 
resource.  /a/b/index.html is the explicit call for the HTML representation 
of /a/b/.  If you redirect a client to /a/b/index.html and the client 
bookmarks it, they are bookmarking the HTML representation, not the 
intrinsic resource.  I understand the efficiency issues, but a user 
agent match when viewed in the context of sitemap matches, 
server-side logic, servlet request and response object creation and 
other assorted method calls is just a couple of string comparisons.

This is pretty much the original problem I was trying to solve.  Sure, 
having a clean URL space that always ends in a / is useful, but if you 
look at how that would work on the server side, it means you create a 
physical directory for each page and then create an index.html.  You 
have tons of files named index.html on your web server, but at least 
it's all organized with the directories. 

Hmmm...  Why is it that your physical directory structure must have 
ANYTHING to do with the URL?  This flies right in the face of the reason 
for Cocoon's sitemap and the resources made available from Apache's 
httpd.conf.  You would indeed have many URLs that point to a resource 
called index.html, but your filesystem need not have any.  Your 
filesystem could be flat without any directories at all.  It could be 
replaced with a database.  ...or LDAP or xmldb or PHP...

If your filesystem is to be 1:1 with your URLs, why use Cocoon and a 
servlet engine at all?  A flat file webserver would serve things much 
faster.  The reason I want to use Cocoon is that it makes things 
*better* and not faster -- although I have methods for getting extra speed.

In my opinion, URLs should not change.


As further explained at http://www.useit.com/alertbox/990321.html 
The rundown:

   - URLs should not change
   - URLs are easy to remember (and therefore are organized logically)
   - URLs are easy to type and are generally all in lowercase

That is one of the main things that drew me to Cocoon: URI 
abstraction.  Once the URL is abstracted enough to act as a true URI, 
it can start acting as a true identifier instead of ad hoc, vague 
gobbledygook.  Of course this also assumes that the URL/URI remains 
set in stone and not a moving target.


Yes! This is exactly the conclusion I was coming to on my own. URIs 
are no more than data abstractions.  They usually provide a view to 
some data, and more often than not, a URL on a web server directly 
correlates with a physical file on a disk (e.g. index.html).  Cocoon 
allows one to create a purely virtual URL space in which no real files 
on the server could exist.  It probably doesn't matter how the 
underlying data is abstracted, whether it be a one-to-one correlation 
to a directory tree on a disk somewhere, or an xpath statement into an 
xml file, or arguments to a CGI script that accesses a database 
depending on the order of the items in the request. Imagine a request 
for

Re: URL Theory & Best Practices

2002-11-09 Thread Kjetil Kjernsmo
Hi!

Interesting thread! Most things have been said already, but I'll just 
add my .02 (whatever currency) :-)

On Thursday 07 November 2002 23:57, Tony Collen wrote:

> However, later I realize that using file extensions is "bad".  Read
> http://www.alistapart.com/stories/slashforward/ for more info on this
> idea.

The article is interesting, but a little too narrow. Indeed, using file 
extensions is Bad[tm] for URIs because you tie the address to a 
specific technology, which you may not be using in some years. For that 
reason, not changing the default "cocoon" in Cocoon URIs is also a 
Bad Thing[tm].

The authoritative reference on this topic is TimBL's rant "Cool URIs don't 
change": http://www.w3.org/Provider/Style/URI :-)

So, using directories for everything is one possibility, and if you do, 
make sure to include the trailing /, to avoid useless 301 redirects. 
Another option is to use Content Negotiation, which is well defined in 
HTTP 1.1 (and earlier, IIRC); it's weird that it isn't more widely 
used. 

But both content negotiation and using directories for everything are 
solutions that exist mainly because URIs have been so strongly 
tied to the file system of the server, and the mentioned article seems 
to take for granted that this connection is a necessity; but as Cocoon 
proves, this is not so. Just use sensible matches. It also means that 
requiring a trailing slash on every URI is a bit too much; I only do 
that if there is logically a hierarchical substructure. 

As for the problem of serving different formats to the client, I really 
have no good solution. What user agents should do is let the 
user easily manipulate the Accept header: if the user wants a 
PDF file, he would send only application/pdf in the Accept header, and 
the server would know that the user wanted a PDF file and send that. 
Given that this doesn't exist, appending the type to the URI is probably 
not too bad. 
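If a client did send such a narrow Accept header, a sitemap could branch on it. A hedged sketch, assuming Cocoon's header selector (real Accept headers list many types, so the exact-match tests below are only illustrative, as are the stylesheet names):

```xml
<map:match pattern="a/b/">
  <map:generate src="documents/a/b.xml"/>
  <map:select type="header">
    <map:parameter name="header-name" value="Accept"/>
    <!-- client asked only for PDF -->
    <map:when test="application/pdf">
      <map:transform src="stylesheets/doc2fo.xsl"/>
      <map:serialize type="fo2pdf"/>
    </map:when>
    <!-- default: HTML -->
    <map:otherwise>
      <map:transform src="stylesheets/doc2html.xsl"/>
      <map:serialize type="html"/>
    </map:otherwise>
  </map:select>
</map:match>
```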

Best,

Kjetil
-- 
Kjetil Kjernsmo
Astrophysicist/IT Consultant/Skeptic/Ski-orienteer/Orienteer/Mountaineer
[EMAIL PROTECTED]  [EMAIL PROTECTED]  [EMAIL PROTECTED]
Homepage: http://www.kjetil.kjernsmo.net/






Re: URL Theory & Best Practices

2002-11-08 Thread Tony Collen
Comments inline...

Miles Elam wrote:


Justin Fagnani-Bell wrote:


  I've wrestled with similar problems for a while with my content 
management system, which uses a database for content and structure. 
I'm in the process of setting the system to use file extensions for 
the client to specify the file type and have Cocoon return that type. 
If they request /a.html, they get html, /a.pdf and they get pdf, and 
so on. This seems elegant, but it has problems when you consider the 
points covered in the slashforward article. Here's the compromise 
I've come up with so far, adapted to a filesystem like you're using. 
I'm still toying with these ideas, so I'd like to hear comments.

1) Instead of having directories with index.xml files, have a 
directory and an xml file with the same name at the same level.
So you have /a/b/ actually returning /a/b.xml; you could map a 
request for /a/b/index.html to /a/b.xml as well. This way you can add 
a leaf, and if you later need to add sub-nodes and turn the leaf 
into a node, you just add a directory and some files underneath it. 


sounds good to me


2) Redirect all URLs to *not* end in a slash. I see the point of the 
article you've linked to, and agree with it, but the file extension 
is the only form of file metadata that's pretty standard. Ending all 
URLs in slashes only works, in my opinion, if all the files are the 
same type; if not, it's really nice to have a way of identifying the 
type from the URL, not just the mime-type response header. So, 
considering that any request is going to point to a leaf (or an error 
page), I would redirect /a/b/ to /a/b.html 


But can't delivered types differ by the incoming client?


Yes, but a problem then arises when someone is using IE and they want a 
PDF, when your user-agent rules will only serve a PDF for FooCo PDF 
Browser 1.0.  IMO browsers should respect the mime-type header.  I 
believe the mime-type header is very useful when you want to use 
something like a PHP script to send an image or a .tar.gz file.  In 
fact, it's essential for it to work; otherwise the browser interprets 
the data as garbage.

This is where we differ slightly.  In my mind /a/b/ is the intrinsic 
resource.  /a/b/index.html is the explicit call for the HTML representation 
of /a/b/.  If you redirect a client to /a/b/index.html and the client 
bookmarks it, they are bookmarking the HTML representation, not the 
intrinsic resource.  I understand the efficiency issues, but a user 
agent match when viewed in the context of sitemap matches, server-side 
logic, servlet request and response object creation and other assorted 
method calls is just a couple of string comparisons.

This is pretty much the original problem I was trying to solve.  Sure, 
having a clean URL space that always ends in a / is useful, but if you 
look at how that would work on the server side, it means you create a 
physical directory for each page and then create an index.html.  You 
have tons of files named index.html on your web server, but at least 
it's all organized with the directories.

In particular, as new clients become more and more capable, a give and 
take can take place when the resource identifier is left ambiguous: 
for example, giving Opera the XHTML/CSS version and IE6 the XML with an 
XSLT processing instruction.  I'm sure we're all aware of IE's fixation on 
file extensions (or at least anyone who's fought with serving PDFs when 
the URL didn't end in .pdf is).  If you pass XML with a processing 
instruction from a URL tagged with .html, I'm not entirely convinced that 
IE will get this straight.  The file extension can become a straitjacket.

As clients become more advanced, some work (i.e. XSLT processing, 
XInclude work, etc.) can be offloaded from the server.  If someone has 
the .html version bookmarked or copied into email, we have basically 
made a contract with the user that they will always receive HTML for 
this resource no matter the capabilities of the client.



In my opinion, URLs should not change. 

As further explained at http://www.useit.com/alertbox/990321.html  

The rundown:

   - URLs should not change
   - URLs are easy to remember (and therefore are organized logically)
   - URLs are easy to type and are generally all in lowercase

That is one of the main things that drew me to Cocoon: URI 
abstraction.  Once the URL is abstracted enough to act as a true URI, 
it can start acting as a true identifier instead of ad hoc, vague 
gobbledygook.  Of course this also assumes that the URL/URI remains 
set in stone and not a moving target.

Yes! This is exactly the conclusion I was coming to on my own. URIs are 
no more than data abstractions.  They usually provide a view to some 
data, and more often than not, a URL on a web server directly correlates 
with a physical file on a disk (e.g. index.html).  Cocoon allows one to 
create a purely virtual URL space in which no real files on the server 
could exist.  It probably doesn't matter how the underlying data is 
abstracted, whether it be a one-to-one correlation to a directory tree 
on a disk somewhere, or an xpath statement into an xml file, or 
arguments to a CGI script that accesses a database depending on the 
order of the items in the request.

Re: URL Theory & Best Practices

2002-11-07 Thread Miles Elam
Justin Fagnani-Bell wrote:


  I've wrestled with similar problems for a while with my content 
management system, which uses a database for content and structure. 
I'm in the process of setting the system to use file extensions for 
the client to specify the file type and have Cocoon return that type. 
If they request /a.html, they get html, /a.pdf and they get pdf, and 
so on. This seems elegant, but it has problems when you consider the 
points covered in the slashforward article. Here's the compromise I've 
come up with so far, adapted to a filesystem like you're using. I'm 
still toying with these ideas, so i'd like to hear comments.

1) Instead of having directories with index.xml files, have a 
directory and an xml file with the same name at the same level.
so you have /a/b/ actually returning /a/b.xml. you could map a request 
for /a/b/index.html to /a/b.xml as well. This way you can add a leaf, 
and if you need to later add sub-nodes, and turn the leaf into a node, 
you just add a directory and some files underneath it. 

sounds good to me


2) Redirect all urls to *not* end in a slash. I see the point of the 
article you've linked to, and agree with it, but the file extension is 
the only form of file meta data that's pretty standard. Ending all 
urls in slashes only works, in my opinion, if all the files are the 
same type, if not it's really nice to have a way of identifying the 
type from the url, not just the mime-type response header. So 
considering that any request is going to point to a leaf (or an error 
page), then I would redirect /a/b/ to /a/b.html 

But can't delivered types differ by the incoming client?

This is where we differ slightly.  In my mind /a/b/ is the intrinsic 
resource.  /a/b/index.html is the explicit call for the HTML representation of 
/a/b/.  If you redirect a client to /a/b/index.html and the client 
bookmarks it, they are bookmarking the HTML representation, not the 
intrinsic resource.  I understand the efficiency issues, but a user 
agent match when viewed in the context of sitemap matches, server-side 
logic, servlet request and response object creation and other assorted 
method calls is just a couple of string comparisons.

In particular, as new clients become more and more capable, a give and 
take can take place when the resource identifier is left ambiguous: for 
example, giving Opera the XHTML/CSS version and IE6 the XML with an XSLT 
processing instruction.  I'm sure we're all aware of IE's fixation on 
file extensions (or at least anyone who's fought with serving PDFs when 
the URL didn't end in .pdf is).  If you pass XML with a processing 
instruction from a URL tagged with .html, I'm not entirely convinced that 
IE will get this straight.  The file extension can become a straitjacket.

As clients become more advanced, some work (i.e. XSLT processing, 
XInclude work, etc.) can be offloaded from the server.  If someone has 
the .html version bookmarked or copied into email, we have basically made 
a contract with the user that they will always receive HTML for this 
resource no matter the capabilities of the client.

In my opinion, URLs should not change.  That is one of the main things 
that drew me to Cocoon: URI abstraction.  Once the URL is abstracted 
enough to act as a true URI, it can start acting as a true identifier 
instead of an ad hoc, vague gobbledygook.  Of course this also assumes 
that the URL/URI remains set in stone and not a moving target.

This way the extension isn't revealing the underlying technology of 
the site, but the type of file the client is expecting, and this goes 
for directories too.

Yup, although I think people underestimate the utility of the default 
directory listing when there is no index.html (or default.htm, 
home.html, etc.).  If you think back to the beginnings of the web, what 
was index.html but a dressed up view of all resources in the general area?

The matchers would look something like this (I might have this wrong):

[sitemap matchers stripped by the list archiver]

Shouldn't this be [element stripped by the archiver]?  But 
yeah, that's assuming that the resource will be HTML.  A valid 
assumption for most sites... for the time being.  A lot has changed in 
the last few years, and a lot of new clients have jumped on the scene. 
As I mentioned before, I believe URLs should be as permanent as 
possible.  This has no flexibility for the future.

This is based upon the sitemap we're using as a working model (I might 
also have something wrong):

[sitemap markup stripped by the list archiver]

This all works on the following assumptions:

 "/a/b/d/" refers to a resource independent of presentation.  From 
here, we do browser type checking for the appropriate output type.

 "/a/b/d/index.xml" refers to a list of resources associated with "/a/b/d/"

 "/a/b/d/page.xml" refers to the resource explicitly as XML.

 "/a/b/d/page.html" refers to the resource explicitly as HTML.
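Since the archiver stripped the sitemap markup above, here is a hedged reconstruction of matchers implementing these four assumptions (generator types, selector tests and stylesheet names are guesses, not Miles's actual configuration):

```xml
<!-- explicit HTML representation -->
<map:match pattern="**/page.html">
  <map:generate src="documents/{1}/page.xml"/>
  <map:transform src="stylesheets/page2html.xsl"/>
  <map:serialize type="html"/>
</map:match>

<!-- explicit XML representation -->
<map:match pattern="**/page.xml">
  <map:generate src="documents/{1}/page.xml"/>
  <map:serialize type="xml"/>
</map:match>

<!-- list of resources associated with the directory -->
<map:match pattern="**/index.xml">
  <map:generate type="directory" src="documents/{1}"/>
  <map:serialize type="xml"/>
</map:match>

<!-- presentation-independent resource: choose output by user agent -->
<map:match pattern="**/">
  <map:generate src="documents/{1}/page.xml"/>
  <map:select type="browser">
    <map:when test="explorer">
      <map:transform src="stylesheets/page2html.xsl"/>
      <map:serialize type="html"/>
    </map:when>
    <map:otherwise>
      <map:serialize type="xml"/>
    </map:otherwise>
  </map:select>
</map:match>
```

Note that in a Cocoon sitemap the first matching pattern wins, so the specific patterns must precede the catch-all.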

--

This also refle

Re: URL Theory & Best Practices

2002-11-07 Thread Justin Fagnani-Bell
Tony,

  I've wrestled with similar problems for a while with my content 
management system, which uses a database for content and structure. I'm 
in the process of setting the system to use file extensions for the 
client to specify the file type and have Cocoon return that type. If 
they request /a.html, they get html, /a.pdf and they get pdf, and so 
on. This seems elegant, but it has problems when you consider the 
points covered in the slashforward article. Here's the compromise I've 
come up with so far, adapted to a filesystem like you're using. I'm 
still toying with these ideas, so I'd like to hear comments.

1) Instead of having directories with index.xml files, have a directory 
and an xml file with the same name at the same level.
So you have /a/b/ actually returning /a/b.xml; you could map a request 
for /a/b/index.html to /a/b.xml as well. This way you can add a leaf, 
and if you later need to add sub-nodes and turn the leaf into a node, 
you just add a directory and some files underneath it.

2) Redirect all URLs to *not* end in a slash. I see the point of the 
article you've linked to, and agree with it, but the file extension is 
the only form of file metadata that's pretty standard. Ending all URLs 
in slashes only works, in my opinion, if all the files are the same 
type; if not, it's really nice to have a way of identifying the type 
from the URL, not just the mime-type response header. So, considering 
that any request is going to point to a leaf (or an error page), I 
would redirect /a/b/ to /a/b.html

This way the extension isn't revealing the underlying technology of the 
site, but the type of file the client is expecting, and this goes for 
directories too.

The matchers would look something like this (I might have this wrong):

[sitemap matchers stripped by the list archiver]
Add matchers, or use selectors, for more file types.
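A hedged sketch of what the matchers Justin describes might look like (his markup was stripped by the archiver; source paths and stylesheet names are illustrative):

```xml
<!-- /a.html: the HTML representation of content/a.xml -->
<map:match pattern="**.html">
  <map:generate src="content/{1}.xml"/>
  <map:transform src="stylesheets/doc2html.xsl"/>
  <map:serialize type="html"/>
</map:match>

<!-- /a.pdf: the PDF representation of the same source -->
<map:match pattern="**.pdf">
  <map:generate src="content/{1}.xml"/>
  <map:transform src="stylesheets/doc2fo.xsl"/>
  <map:serialize type="fo2pdf"/>
</map:match>
```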



-Justin


On Thursday, November 7, 2002, at 02:57  PM, Tony Collen wrote:

Apologies for the extra long post, but this has been bugging me for a 
while.

First, some background:

I'm attempting to put together a URL space using Cocoon that will 
allow users to drop an XML file into a directory, say 
$TOMCAT_HOME/webapps/cocoon/documents/, and have it published.  This is 
easy enough:

[sitemap snippet stripped by the list archiver]

So then I decide that, for organization's sake, I want to allow people 
to create subdirectories under documents/ any number of levels deep, 
and still have Cocoon publish them. This is also fairly simple:

[sitemap snippet stripped by the list archiver]

However, later I realize that using file extensions is "bad".  Read 
http://www.alistapart.com/stories/slashforward/ for more info on this 
idea.
This creates problems with how I automatically generate content using 
Cocoon.  I want to allow people to create content arbitrarily deep in 
the documents/ directory, but I run into a bunch of questions.

Should trailing slashes always be used? I think so.
Therefore: Consider an HTTP request for "/a/b/c/".

   1. Is it a request for the discrete resource named "c" which is 
contained in "b"?
   2. Is it a request for the listing of all the contents of the "c" 
resource (which is in turn contained within "b")?
   3. Is this equivalent to a request for "/a/b/c"?
   3b. Should a request for something w/o a trailing slash be redirected 
to the same URL, but with a trailing slash added?

Using the "best practice" of always having trailing slashes creates 
problems when mapping the virtual URL space to a physical directory 
structure.  Considering a request for "/a/b/c/", do I go into 
documents/a/b/c/ and generate from index.xml?  Or do I go to 
documents/a/b/ and generate from c.xml?  Having every "leaf" be a 
directory with an index.xml gets to be unmaintainable, IMO.

Likewise, do I generate from documents/a/b/d.xml or 
documents/a/b/d/index.xml for a request of "/a/b/d"?  Additionally, 
what should happen when there's a request for "/a/b/"?  Obviously, if 
the subdirectory "b" exists, it would not be correct to go to 
documents/a/ and look for b.xml.

Part of my reasoning behind all these questions lies in my quest for 
creating an uber-flexible "drop-in" directory structure where people 
can simply add their .xml files to the "documents" directory and have 
Cocoon automagically publish them, as I stated above.  The other 
reason for this is that I'm trying to devise a system which 
automatically creates navigation, as well. I've looked at the 
Bonebreaker example, and it's good, but has some limitations.  What if 
I don't want to use the naming scheme they have?

Oh well, thanks for listening to my ramblings, and hopefully I can get 
some light shed on this situation, as well as have a nifty autonavbar 
work eventually :)
Regards,
Tony



URL Theory & Best Practices

2002-11-07 Thread Tony Collen
Apologies for the extra long post, but this has been bugging me for a 
while.

First, some background:

I'm attempting to put together a URL space using Cocoon that will allow 
users to drop an XML file into a directory, say 
$TOMCAT_HOME/webapps/cocoon/documents/, and have it published.  This is 
easy enough:

[sitemap snippet stripped by the list archiver]
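A guess at what the stripped "easy enough" matcher may have looked like (the pattern and stylesheet name are assumptions; the actual markup was lost in archiving):

```xml
<map:match pattern="documents/*.html">
  <map:generate src="documents/{1}.xml"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize type="html"/>
</map:match>
```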

So then I decide that, for organization's sake, I want to allow people to 
create subdirectories under documents/ any number of levels deep, and 
still have Cocoon publish them. This is also fairly simple:

[sitemap snippet stripped by the list archiver]
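And a guess at the arbitrary-depth variant, using Cocoon's ** wildcard, which matches across path separators (again, an assumption rather than the lost original):

```xml
<map:match pattern="documents/**.html">
  <map:generate src="documents/{1}.xml"/>
  <map:transform src="stylesheets/document2html.xsl"/>
  <map:serialize type="html"/>
</map:match>
```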

However, later I realize that using file extensions is "bad".  Read 
http://www.alistapart.com/stories/slashforward/ for more info on this 
idea.  

This creates problems with how I automatically generate content using 
Cocoon.  I want to allow people to create content arbitrarily deep in 
the documents/ directory, but I run into a bunch of questions.

Should trailing slashes always be used? I think so.  

Therefore: Consider an HTTP request for "/a/b/c/".

   1. Is it a request for the discrete resource named "c" which is 
contained in "b"?
   2. Is it a request for the listing of all the contents of the "c" 
resource (which is in turn contained within "b")?
   3. Is this equivalent to a request for "/a/b/c"?  
   3b. Should a request for something w/o a trailing slash be 
redirected to the same URL, but with a trailing slash added?

Using the "best practice" of always having trailing slashes creates 
problems when mapping the virtual URL space to a physical directory 
structure.  Considering a request for "/a/b/c/", do I go into 
documents/a/b/c/ and generate from index.xml?  Or do I go to 
documents/a/b/ and generate from c.xml?  Having every "leaf" be a 
directory with an index.xml gets to be unmaintainable, IMO.

Likewise, do I generate from documents/a/b/d.xml or 
documents/a/b/d/index.xml for a request of "/a/b/d"?  Additionally, what 
should happen when there's a request for "/a/b/"?  Obviously, if the 
subdirectory "b" exists, it would not be correct to go to documents/a/ 
and look for b.xml.

Part of my reasoning behind all these questions lies in my quest for 
creating an uber-flexible "drop-in" directory structure where people can 
simply add their .xml files to the "documents" directory and have Cocoon 
automagically publish them, as I stated above.  The other reason for 
this is that I'm trying to devise a system which automatically creates 
navigation, as well. I've looked at the Bonebreaker example, and it's 
good, but has some limitations.  What if I don't want to use the naming 
scheme they have?

Oh well, thanks for listening to my ramblings, and hopefully I can get 
some light shed on this situation, as well as have a nifty autonavbar 
work eventually :)  

Regards,
Tony

