What one can use for an HTTP field name is dictated by:

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

When you step through the various standards, you end up with:

        The  field-name must be composed of printable ASCII characters
        (i.e., characters that  have  values  between  33.  and  126.,
        decimal, except colon).

That restriction covers only the HTTP header name, though.

Either way, a header name definitely can't contain arbitrary bytes, so
it can't carry the byte string version of a Unicode string.
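
To make the rule quoted above concrete, here is a minimal Python sketch
of the check (every byte between 33 and 126 decimal, colon excluded).
The function name and the 'X-Object-Meta-Color' header are made up for
illustration; the Greek key is the one from the original report.

```python
def is_valid_field_name(name: bytes) -> bool:
    """True if every byte is printable ASCII (33..126) and not a colon."""
    return len(name) > 0 and all(
        33 <= b <= 126 and b != ord(":") for b in name
    )


# Hypothetical all-ASCII header name.
ascii_name = "X-Object-Meta-Color".encode("ascii")

# Header name from the original report, as UTF-8 bytes.
utf8_name = "X-Object-Meta-ασδφ".encode("utf-8")

print(is_valid_field_name(ascii_name))
print(is_valid_field_name(utf8_name))

# Every byte of a multi-byte UTF-8 sequence is above 126, so it fails.
print([b for b in utf8_name if b > 126])
```

The UTF-8 form fails because each non-ASCII character encodes to bytes
with values of 128 or more.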

For WSGI, that header name gets converted to a CGI meta variable name
as defined in:

    http://www.ietf.org/rfc/rfc3875

as:

      meta-variable-name = "AUTH_TYPE" | "CONTENT_LENGTH" |
                           "CONTENT_TYPE" | "GATEWAY_INTERFACE" |
                           "PATH_INFO" | "PATH_TRANSLATED" |
                           "QUERY_STRING" | "REMOTE_ADDR" |
                           "REMOTE_HOST" | "REMOTE_IDENT" |
                           "REMOTE_USER" | "REQUEST_METHOD" |
                           "SCRIPT_NAME" | "SERVER_NAME" |
                           "SERVER_PORT" | "SERVER_PROTOCOL" |
                           "SERVER_SOFTWARE" | scheme |
                           protocol-var-name | extension-var-name
      protocol-var-name  = ( protocol | scheme ) "_" var-name
      scheme             = alpha *( alpha | digit | "+" | "-" | "." )
      var-name           = token
      extension-var-name = token

Working back through the grammar for token, you get:

      alpha         = lowalpha | hialpha
      lowalpha      = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                      "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                      "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                      "y" | "z"
      hialpha       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                      "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                      "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                      "Y" | "Z"

      digit         = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                      "8" | "9"
      alphanum      = alpha | digit
      OCTET         = <any 8-bit byte>
      CHAR          = alpha | digit | separator | "!" | "#" | "$" |
                      "%" | "&" | "'" | "*" | "+" | "-" | "." | "`" |
                      "^" | "_" | "{" | "|" | "}" | "~" | CTL
      CTL           = <any control character>

      token         = 1*<any CHAR except CTLs or separators>
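
As a rough cross-check of that grammar, the set of characters allowed
in a token can be computed in Python. Note the assumption: the
separator list is not quoted in this message, so the one below is
filled in from memory of RFC 3875 and should be verified against the
RFC. All names here are illustrative.

```python
import string

# ASSUMPTION: separator set as recalled from RFC 3875 (not quoted above).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

# The punctuation listed explicitly under CHAR in the grammar above.
CHAR_EXTRAS = set("!#$%&'*+-.`^_{|}~")

ALNUM = set(string.ascii_letters + string.digits)

# token = 1*<any CHAR except CTLs or separators>
TOKEN_CHARS = (ALNUM | CHAR_EXTRAS) - SEPARATORS


def is_token(name: str) -> bool:
    """True if name is non-empty and uses only token characters."""
    return bool(name) and all(c in TOKEN_CHARS for c in name)


print("%" in TOKEN_CHARS)
print("/" in TOKEN_CHARS, "?" in TOKEN_CHARS)
print(is_token("X-Object-Meta-%CE%B1"))
```

Under that assumption '%' is a token character, while '/' and '?' are
separators and so are not.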

So, technically, the code borrowed from Apache could well be too
restrictive, since on a first read the grammar would appear to allow '%'.
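
For reference, the conversion as described in this thread (an HTTP_
prefix, letters uppercased, digits kept, everything else replaced by
'_') can be modelled roughly as below. This is a Python sketch of the
described behaviour, not the actual C code from Apache or mod_wsgi,
and the name http2env_sketch is made up.

```python
import string

# Byte values of ASCII letters and digits only, so bytes >= 128 never match.
ASCII_ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def http2env_sketch(name: bytes) -> str:
    """Model of the header-name to CGI-variable-name conversion."""
    out = ["HTTP_"]
    for b in name:
        # Keep ASCII letters (uppercased) and digits; replace the rest.
        out.append(chr(b).upper() if b in ASCII_ALNUM else "_")
    return "".join(out)


# The dash and each of the 8 UTF-8 bytes become one underscore apiece.
print(http2env_sketch("X-Object-Meta-ασδφ".encode("utf-8")))

# URL-encoding does not survive either: '%' is replaced too.
print(http2env_sketch(b"X-Object-Meta-%CE%B1"))
```

The second call shows why URL-encoding the key does not help with this
conversion: the hex digits survive, but every '%' is lost.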

I would have to investigate further why Apache does it that way. Since
for CGI the name becomes a process environment variable, maybe the
restriction exists for cross-platform compatibility.

As far as what is accepted practice, I have never ever seen anyone
using anything for header names that wasn't alphanumeric and dash.

Graham

2011/8/14 Antony Chazapis <[email protected]>:
> Thanks for the reply Graham.
>
> I can easily change the code generating UTF-8 in header names, but the
> question here is - with what?
>
> I have found this:
> http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q6
> Where it is suggested that url-encoding is used to send arbitrary
> characters in headers.
>
> The real problem here is that the isalnum() function in wsgi_http2env
> will not even allow '%'s to pass through - or '/'s, or '?'s (used in
> other UTF-8 encoding schemes). I can patch my mod_wsgi installation to
> overcome this, but I asked here in case someone else had the same
> problem and found a better solution.
>
> So, is anybody here aware of an "official" or "standard" guideline on
> how to send UTF-8 in headers?
>
> Antony
>
> On Aug 11, 12:49 pm, Graham Dumpleton <[email protected]>
> wrote:
>> HTTP header names by the HTTP RFC must be ASCII so the code generating
>> headers with full UTF-8 in header names is violating the
>> specification.
>>
>> FWIW, wsgi_http2env is more or less an exact copy of a similar
>> routine in Apache itself, used by its mod_cgi modules when generating
>> similar variable names for CGI. WSGI basically adheres to that
>> encoding convention.
>>
>> Graham
>>
>> 2011/8/11 Antony Chazapis <[email protected]>:
>>
>> > Hello.
>>
>> > I'm using apache2/mod_wsgi to drive a django project that aims to
>> > implement/extend the OpenStack Object Storage API. In OpenStack Object
>> > Storage they use arbitrary X-Object-Meta-<key>=<value> headers to
>> > assign metadata to objects.
>>
>> > While the embedded django server allows utf8 characters in the
>> > headers, I have found that when I post utf8 to the apache/mod_wsgi
>> > installation, I receive an underscore ('_') in place of every non-
>> > ascii character. I traced this to the wsgi_http2env() function, which
>> > converts all non letter or number characters to '_'.
>>
>> > For example, when posting 'X-Object-Meta-ασδφ=a', I get
>> > 'HTTP_X_OBJECT_META_________=a'.
>>
>> > Is wsgi_http2env() really the source of this? If yes, why does
>> > mod_wsgi keep only letters and numbers?
>>
>> > This is really a problem, as I can not even use url encoding - '%'s
>> > are converted to '_' as well.
>>
>> > Thanks,
>>
>> > Antony
>>
>> > --
>> > You received this message because you are subscribed to the Google Groups 
>> > "modwsgi" group.
>> > To post to this group, send email to [email protected].
>> > To unsubscribe from this group, send email to 
>> > [email protected].
>> > For more options, visit this group at
>> > http://groups.google.com/group/modwsgi?hl=en.
