Content-Disposition and utf8 filenames

2010-12-04 Thread Bill Moseley
I would like to use a user-supplied filename when returning a download (e.g.
pdf).  For example, it might be filename=$title.pdf.  But $title can include
any character.

It seems that browser support for this is spotty.  See:
http://greenbytes.de/tech/tc2231/

Is anyone aware of a way to set this header to allow utf8 filenames that is
supported across browsers?

Also, my assumption is that HTTP::Headers expects encoded values (that is,
the values are octets, not characters), so I should always
encode( 'US-ASCII' ) the value.

I just tried with Google Apps and it seems they turn any character outside
A-Za-z into an underscore.  Not sure whether that means they just didn't try
hard or they felt it was not possible to use non-ASCII characters in
suggested filenames.

-- 
Bill Moseley
mose...@hank.org


Re: Content-Disposition and utf8 filenames

2010-12-04 Thread Bjoern Hoehrmann
* Bill Moseley wrote:
> I would like to use a user-supplied filename when returning a download (e.g.
> pdf).  For example, it might be filename=$title.pdf.  But $title can include
> any character.
>
> It seems that browser support for this is spotty.  See:
> http://greenbytes.de/tech/tc2231/
>
> Is anyone aware of a way to set this header to allow utf8 filenames that is
> supported across browsers?

No, as you can tell from the results, there is no one way supported by
all major browsers. The IETF HTTPbis Working Group is currently revising
the specification for it, and the recommended way to do this will be the
RFC 5987 style notation, where you can also specify a fallback value by
using both the filename* and filename parameters. But there is no silver
bullet.
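
A minimal sketch of the filename*/filename combination described above,
assuming the RFC 5987 ext-value syntax (UTF-8'' followed by percent-encoded
UTF-8 octets). The helper name content_disposition() and the rule used for
the ASCII fallback are made up for illustration; they are not part of
HTTP::Headers or any other module.

```perl
use strict;
use warnings;
use utf8;
use Encode qw( encode );

# Hypothetical helper: builds a Content-Disposition value from a
# character string that may contain any Unicode character.
sub content_disposition {
    my ($name) = @_;

    # ASCII fallback for clients that ignore filename*.
    ( my $fallback = $name ) =~ s/[^A-Za-z0-9._-]/_/g;

    # RFC 5987 ext-value: percent-encode every UTF-8 octet that is
    # not an attr-char.
    my $ext = encode( 'UTF-8', $name );
    $ext =~ s/([^A-Za-z0-9!#\$&+.^_`|~-])/sprintf '%%%02X', ord $1/ge;

    return qq{attachment; filename="$fallback"; filename*=UTF-8''$ext};
}

print content_disposition('Prüfung 2010.pdf'), "\n";
```

Clients that understand filename* should prefer it; the others see only the
plain filename parameter, so the fallback has to be acceptable on its own.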

> Also, my assumption is that HTTP::Headers expects encoded values (that is,
> the values are octets, not characters), so I should always
> encode( 'US-ASCII' ) the value.

(That is generally correct, yes).
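
To illustrate the octets-versus-characters point, here is a sketch that
rejects a header value containing non-ASCII characters instead of silently
mangling it. It uses only the core Encode module; the helper name
header_octets() is an assumption for this example.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: returns ASCII octets, or dies if the value
# contains a character outside US-ASCII.
sub header_octets {
    my ($value) = @_;

    # FB_CROAK makes encode() die on unmappable characters;
    # LEAVE_SRC keeps encode() from modifying $value in place.
    return encode( 'US-ASCII', $value,
        Encode::FB_CROAK() | Encode::LEAVE_SRC() );
}

my $ok  = header_octets('attachment; filename="report.pdf"');
my $bad = eval { header_octets("attachment; filename=\"r\x{E9}sum\x{E9}.pdf\"") };

print "accepted: $ok\n";
print defined $bad ? "accepted\n" : "rejected: value is not pure ASCII\n";
```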

> I just tried with Google Apps and it seems they turn any character outside
> A-Za-z into an underscore.  Not sure whether that means they just didn't try
> hard or they felt it was not possible to use non-ASCII characters in
> suggested filenames.

As I understand it, Google is currently investigating switching to RFC
5987 style encoding in some of their applications.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


Re: Content-Disposition and utf8 filenames

2010-12-04 Thread Bill Moseley
On Sat, Dec 4, 2010 at 6:40 AM, Bjoern Hoehrmann derhoe...@gmx.net wrote:


> No, as you can tell from the results, there is no one way supported by
> all major browsers. The IETF HTTPbis Working Group is currently revising
> the specification for it, and the recommended way to do this will be the
> RFC 5987 style notation, where you can also specify a fallback value by
> using both the filename* and filename parameters. But there is no silver
> bullet.


Thanks very much for the pointers.

Oh, that RFC says ISO-8859-1 is supported, not just ASCII.  So, I suppose I
could encode to ISO-8859-1 and have encode() use an underscore as the
substitution character.  But then do I also have to remove any characters
that might not be appropriate for the client's file system?  That is,
remove slashes?
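
A sketch of that idea, assuming Encode's support for a code reference as
the CHECK argument to supply the substitution. The helper name
latin1_filename() and the set of characters stripped afterwards are
illustrative choices, not something the RFC prescribes.

```perl
use strict;
use warnings;
use utf8;
use Encode qw( encode );

# Hypothetical helper: ISO-8859-1 octets for use inside filename="...".
sub latin1_filename {
    my ($title) = @_;

    # The code reference is called for each character that has no
    # ISO-8859-1 mapping and returns the octets to use instead.
    my $octets = encode( 'ISO-8859-1', $title, sub { '_' } );

    # Replace path separators, double quotes and control characters,
    # which are unsafe in a quoted value or on common file systems.
    $octets =~ s{[/\\"\x00-\x1F\x7F]}{_}g;

    return $octets;
}

# The snowman has no ISO-8859-1 mapping and the slash is a path separator.
my $name = latin1_filename('Résumé ☃ a/b');
print length($name), " octets\n";
```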

Maybe I should just stick to something basic like:

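use Encode qw( encode );

$filename = substr( $title, 0, 50 );  # truncate to 50 (but 78 is allowed)
$filename =~ s/[^A-Za-z0-9]/_/g;  # replace everything outside A-Za-z0-9
$filename = 'document' unless $filename =~ /[^_]/;  # nothing usable left
$filename =~ s/^_{2,}/_/;  # collapse leading underscores to make less ugly
$filename =~ s/_{2,}$/_/;  # same for trailing underscores
$filename = encode( 'US-ASCII', $filename );  # characters to octets
$filename .= '.pdf';  # for example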



> As I understand it, Google is currently investigating switching to RFC
> 5987 style encoding in some of their applications.


That would require some kind of browser detection, right?

Hopefully, that might all get abstracted out into a HTTP::Headers
method/subclass in the future.

Thanks,

-- 
Bill Moseley
mose...@hank.org