Pier,
As a coincidence, we had a similar post last week on the xreporter-list (xreporter uses cocoon).
The bad news is that I haven't tracked it down to the bottom yet; just some findings below:
(in fact the odd-char-in-filename for map:read and map:mount was one of the first things I was going to test, seems I'm already presented with the results)
what I did find already was this:
Cocoon's Request.getSitemapURI() will return an assembly of javax.servlet.http.HttpServletRequest.getServletPath()
+ javax.servlet.http.HttpServletRequest.getPathInfo()
The servlet spec states that both of these will be (url-)decoded.
Thus each 3-character sequence of the form "%XX" (two hex digits) will have been translated into a single byte. The resulting byte sequence is then decoded into characters using SOME_DECODING (my guess would be ISO-8859-1, but I haven't found yet whether this is container-specific, configurable or hard-coded in some spec. The only thing I found is this: http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars, but I'm not yet sure how this influences the servlet specs, or actual container and even browser implementations for that matter).
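To make the effect of SOME_DECODING concrete, here is a minimal Java sketch. It is not Cocoon or container code: the class and method names are made up for illustration, and java.net.URLDecoder only approximates what a container does with a path (it also turns '+' into a space). It decodes the same percent-encoded path once as UTF-8 and once as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PathDecodingSketch {

    // Hypothetical helper: turn %XX sequences into bytes, then
    // interpret those bytes as characters in the given charset.
    public static String decodePath(String rawPath, String charset) {
        try {
            return URLDecoder.decode(rawPath, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The raw path as it appears in the access log further down this thread.
        String raw = "/%e8%b0%b7%e7%90%86%e5%ad%90";

        String asUtf8 = decodePath(raw, "UTF-8");        // slash + 3 characters
        String asLatin1 = decodePath(raw, "ISO-8859-1"); // slash + 9 characters

        System.out.println(asUtf8.length());
        System.out.println(asLatin1.length());
        System.out.println(asUtf8.equals(asLatin1));
    }
}
```

The two results differ in both length and content, so a matcher pattern written with the real characters can only match the UTF-8 variant.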
Alternatively there is: Cocoon's Request.getRequestURI() which maps onto the javax.servlet.http.HttpServletRequest.getRequestURI()
This one resembles the URI as transferred over the wire: ie. not (url-)decoded, or in other words still holding the %XX sequences
As an extra clarification on all of this, the servlet spec explicitly states (2.3 version, page 34, section SRV4.4 Request Path Elements):
<quote>
It is important to note that, *except for URL encoding differences* between the request URI and the path parts, the following equation is always true:
requestURI = contextPath + servletPath + pathInfo
</quote>
I (for now) assume that SOME_DECODING is the same encoding we expect people deploying cocoon to specify in the 'container-encoding' init-parameter in the web.xml (which allows us to correctly re-decode request parameter values when the form and container encodings do not match).
Ok, above is dull data, and not much into a direction of any solution yet. My current feeling (long shot, needs time to test and try, and based on above assumption) is that we should
In terms of backwards compatibility I'm unsure whether we could just go about changing the semantics (the historically implied use of ISO-8859-1 encoding) of getSitemapURI(), or should rather deprecate it and/or have a different method next to it?
In any case this new implementation should then probably apply the same kind of dirty en-re-decoding-trick
return new String(getSitemapURI().getBytes(container_encoding), form_encoding);
as we do today with the request param values?
(see http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/cocoon/environment/http/HttpRequest.java?annotate=1.11#391
sorry for the old cvs-style link, the svn version of viewcvs doesn't seem to support 'annotate' ?)
For the record: the fast hack/workaround in the xreporter case was exactly to apply this.
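As a rough, self-contained illustration of that re-decoding trick (a sketch only, not the actual HttpRequest code; the class and method names are hypothetical): the mis-decoded string is turned back into its original bytes with the container encoding, and those bytes are then read again with the intended encoding. This is only lossless when the container encoding maps every byte value to a character, which ISO-8859-1 does in Java.

```java
import java.nio.charset.Charset;

public class RedecodeSketch {

    // Hypothetical helper: undo a decode done with containerEncoding
    // and redo it with formEncoding.
    public static String redecode(String value, String containerEncoding, String formEncoding) {
        byte[] rawBytes = value.getBytes(Charset.forName(containerEncoding));
        return new String(rawBytes, Charset.forName(formEncoding));
    }

    public static void main(String[] args) {
        // The three-character name from this thread, written with Unicode escapes.
        String intended = "\u8c37\u7406\u5b50";

        // Simulate a container that read the UTF-8 bytes as ISO-8859-1.
        String misdecoded = new String(
                intended.getBytes(Charset.forName("UTF-8")),
                Charset.forName("ISO-8859-1"));

        String repaired = redecode(misdecoded, "ISO-8859-1", "UTF-8");
        System.out.println(repaired.equals(intended));
    }
}
```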
Related to this, I'm also seeing trouble with mount-points in cocoon. I've seen a number of installations needing (well, 'using' at least) some insertion of the part of the URL that maps to the mounted sitemap, so that links in source xml files can refer to other resources managed by the same mounted sitemap without explicitly mentioning that part (having it dynamically inserted by some xsl instead).
On those occasions I've mostly seen people subtract the sitemapURI from the requestURI to obtain that prefix part. Given the above observations, this algorithm will fail because of the encoding differences: the requestURI still holds the %XX sequences while the sitemapURI is decoded.
My proposal would be to not only add a method for decoding the sitemapURI properly, but at the same time add a convenience method on cocoon's request that returns the mounted-sitemap part as well.
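A minimal sketch of what such a convenience method could compute, under some loud assumptions: the class and method names are hypothetical (this is not an existing Cocoon API), the sitemapURI passed in has already been decoded correctly, and the request URI is decoded with java.net.URLDecoder using the same encoding before the subtraction.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class MountPrefixSketch {

    // Hypothetical helper: decode the raw request URI first, then strip the
    // (already decoded) sitemap URI from its end to obtain the mount prefix.
    public static String mountPrefix(String requestURI, String sitemapURI, String encoding) {
        String decoded;
        try {
            decoded = URLDecoder.decode(requestURI, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + encoding, e);
        }
        if (!decoded.endsWith(sitemapURI)) {
            throw new IllegalArgumentException("sitemapURI is not a suffix of the decoded requestURI");
        }
        return decoded.substring(0, decoded.length() - sitemapURI.length());
    }

    public static void main(String[] args) {
        // Assumed example values: a sitemap mounted under /cocoon/docs/.
        String requestURI = "/cocoon/docs/%e8%b0%b7%e7%90%86%e5%ad%90";
        String sitemapURI = "\u8c37\u7406\u5b50";

        // The naive subtraction cannot work: the raw URI does not end with the decoded name.
        System.out.println(requestURI.endsWith(sitemapURI));

        System.out.println(mountPrefix(requestURI, sitemapURI, "UTF-8"));
    }
}
```

Doing the subtraction on the decoded side keeps the comparison between two strings in the same form; comparing on the encoded side would work too, but then differences in hex case or in which characters were escaped would have to be normalised first.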
Above are early observations that need some backing, so comments welcome. (and hoping someone beats me to this since I'm lacking the time to pursue myself)
-marc=
Pier Fumagalli wrote:
On 12 Aug 2004, at 12:45, roy huang wrote:
Hi,all:
Using a reader to display a jpg or gif is quite simple, like:
<map:match pattern="*.jpg">
<map:read mime-type="image/jpeg" src="jpg/{1}.jpg" />
</map:match>
But if the file name is not ASCII but UTF-8 or another encoding, like è.jpg (simplified Chinese), the resolver doesn't resolve the name correctly and an error occurs:
org.apache.cocoon.ResourceNotFoundException: Error during resolving of the input stream: org.apache.excalibur.source.SourceNotFoundException: file:/C:/My Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/ÃÂÂ.jpg doesn't exist.
How can I use non-ASCII file names in cocoon? I can't find any description or help in the wiki or the archived mailing list.
Roy Huang
It appears indeed as a bug...
I have this sitemap snippet:
<map:match pattern="谷*">
  <map:generate src="谷{1}.xml"/>
  <map:transform src="welcome.xslt">
    <map:parameter name="contextPath" value="{request:contextPath}"/>
  </map:transform>
  <map:serialize type="xhtml"/>
</map:match>
and a file on the disk called "谷理子.xml". Somewhere, when I make a request for "http://localhost:8888/谷理子", the whole thing goes berserk...
Now, the URL is passed correctly, as I see that in the access log:
INFO (2004-08-16) 10:26.36:538 [access] (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????' Processed by Apache Cocoon 2.1.5 in 27 milliseconds.
The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0 B7 E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow it gets lost in the process.
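The byte sequence quoted above can be checked with a few lines of Java. This is only an illustrative sketch (the class and method names are made up, and the helper percent-encodes every byte, which a real URL encoder would not do for plain ASCII): it prints the UTF-8 bytes of the three-character name in the same %xx form as the access log.

```java
import java.nio.charset.Charset;

public class Utf8BytesSketch {

    // Hypothetical helper: write every UTF-8 byte of the input as %xx.
    public static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName("UTF-8"))) {
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append('%').append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+8C37 U+7406 U+5B50, the name used in the request.
        System.out.println(percentEncode("\u8c37\u7406\u5b50"));
        // prints %e8%b0%b7%e7%90%86%e5%ad%90
    }
}
```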
Now, if I modify my sitemap to
<map:match pattern="tanisatoko">
  <map:generate src="谷理子.xml"/>
  <map:transform src="welcome.xslt">
    <map:parameter name="contextPath" value="{request:contextPath}"/>
  </map:transform>
  <map:serialize type="xhtml"/>
</map:match>
And I make a request to "http://localhost:8888/tanisatoko", the thing works perfectly. We can safely exclude the fact that it's the generation process.
Now, the _odd_ thing I noticed is that in those cases, I get an error of "PipelineNotFound", not a "ResourceNotFound", which means that the matcher seriously doesn't see that request.
Changing over the matcher to a 'regexp' matcher doesn't change, so, I bet it's the data we feed to the matcher.
Now, changing that matcher to the mis-decoded form of the name (the UTF-8 bytes of "谷理子" read as single-byte characters) and running it again, I get my nice page correctly.
I bet that somewhere (I don't know where, but surely somewhere), the UTF-8 encoded URL is converted into a string using the current locale (MacRoman on my system), or a default of "ISO-8859-1", before the string is actually given to the sitemap.
Not having the sources at hand at the moment, I can't do a quick build to put out some debugging instruction, but you get the idea.
Pier
-- Marc Portier http://outerthought.org/ Outerthought - Open Source, Java & XML Competence Support Center Read my weblog at http://blogs.cocoondev.org/mpo/