On 12 Aug 2004, at 12:45, roy huang wrote:

Hi all,
Using a reader to display a JPG or GIF is quite simple, like:
<map:match pattern="*.jpg">
  <map:read mime-type="image/jpeg" src="jpg/{1}.jpg"/>
</map:match>
But if the file name is not ASCII but UTF-8 or some other encoding, like è.jpg (simplified Chinese), the resolver doesn't resolve the name correctly and an error occurs:
org.apache.cocoon.ResourceNotFoundException: Error during resolving of the input stream: org.apache.excalibur.source.SourceNotFoundException: file:/C:/My Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/ÃÂÂ.jpg doesn't exist.

How can I use non-ASCII file names in Cocoon? I can't find any description or help in the wiki or the mailing list archives.

Roy Huang

It does indeed appear to be a bug...

I have this sitemap snippet:

<map:match pattern="谷*">
  <map:generate src="谷{1}.xml"/>
  <map:transform src="welcome.xslt">
    <map:parameter name="contextPath" value="{request:contextPath}"/>
  </map:transform>
  <map:serialize type="xhtml"/>
</map:match>

and a file on disk called "谷理子.xml". When I make a request for "http://localhost:8888/谷理子", somewhere along the way the whole thing goes berserk...

Now, the URL is passed correctly, as I can see in the access log:

INFO (2004-08-16) 10:26.36:538 [access] (/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????' Processed by Apache Cocoon 2.1.5 in 27 milliseconds.

The UTF-8 encoding of the above-mentioned string is, in fact, "E8 B0 B7 E7 90 86 E5 AD 90", so Cocoon receives it correctly, but somehow it gets lost along the way.
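As a quick cross-check that the logged path really is the file name, here is a small Java sketch (not Cocoon code; the class and method names are made up for illustration) that percent-decodes the path from the access log as UTF-8. The nine bytes decode to three characters, U+8C37 U+7406 U+5B50 (谷理子). It assumes Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeAccessLogPath {

    // Percent-decode a path segment, treating the %XX bytes as a UTF-8 sequence.
    // URLDecoder.decode(String, Charset) needs Java 10 or later.
    static String decodeUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Render each Unicode code point as U+XXXX so the result can be checked by eye.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Path copied from the access log line above, leading slash dropped.
        String decoded = decodeUtf8("%e8%b0%b7%e7%90%86%e5%ad%90");
        System.out.println(decoded.length());    // prints 3
        System.out.println(codePoints(decoded)); // prints U+8C37 U+7406 U+5B50
    }
}
```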

Now, if I modify my sitemap to

<map:match pattern="tanisatoko">
  <map:generate src="谷理子.xml"/>
  <map:transform src="welcome.xslt">
    <map:parameter name="contextPath" value="{request:contextPath}"/>
  </map:transform>
  <map:serialize type="xhtml"/>
</map:match>

and make a request to "http://localhost:8888/tanisatoko", everything works perfectly, so we can safely rule out the generation step.

Now, the _odd_ thing I noticed is that in the failing cases I get a "PipelineNotFound" error, not a "ResourceNotFound", which means that the matcher really doesn't recognize the request at all.

Switching to a 'regexp' matcher doesn't change anything, so I bet the problem is the data we feed to the matcher.

Now, if I change the matcher pattern to "&#xe8;&#xb0;&#xb7;&#xe7;&#x90;&#x86;&#xe5;&#xad;&#x90;", i.e. the UTF-8 bytes written out as individual characters, and run it again, I get my nice page correctly.

I bet that somewhere (I don't know where, but surely somewhere), the UTF-8 encoded URL is converted into a string using the current locale (MacRoman on my system), or a default of "ISO-8859-1", before the string is actually handed to the sitemap.
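The hypothesis can be reproduced outside Cocoon with a short Java sketch (illustrative only; the class and method names are hypothetical, and this is not the code path Cocoon actually uses). It assumes ISO-8859-1 is the culprit: the fact that the pattern written with those nine code points matched is consistent with ISO-8859-1, since MacRoman would map bytes such as 0xE8 to different characters. The sketch also shows a commonly used workaround, re-encoding the mis-decoded string as ISO-8859-1 and decoding the bytes again as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodedUri {

    // The suspected bug: UTF-8 bytes from the URL turned into a String with
    // ISO-8859-1, which maps every byte to exactly one char (0x00..0xFF).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Workaround: ISO-8859-1 turns those chars back into the original bytes
    // losslessly, and the bytes can then be decoded properly as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u8c37\u7406\u5b50"; // the three-character file name from the log
        String broken = misdecode(name);

        // The nine characters written as &#xe8;&#xb0;... in the matcher that worked.
        String latin1Pattern = "\u00e8\u00b0\u00b7\u00e7\u0090\u0086\u00e5\u00ad\u0090";

        System.out.println(broken.length());              // prints 9
        System.out.println(broken.equals(latin1Pattern)); // prints true
        // Roughly what a wildcard pattern beginning with the first character checks.
        System.out.println(broken.startsWith("\u8c37"));  // prints false
        System.out.println(repair(broken).equals(name));  // prints true
    }
}
```

Using explicit `Charset` arguments on both `getBytes` and `new String` avoids depending on the platform default encoding, which is exactly the kind of dependency suspected here.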

I don't have the sources at hand at the moment, so I can't do a quick build with some debugging statements, but you get the idea.

Pier
