Hi Addison,

> A couple of notes:
>
> 1. The 3066 pattern is language-region, not the other way around.

Oops, that's what I meant :P

> 2. Don't reference RFC 4646. RFCs get obsoleted over time. Instead, reference
> BCP 47 (RFC 4646's designation in the IETF standards hierarchy).

Ok, good point. Mark Davis also pointed this out. Done.

> 3. Do reference RFC 4647 (as part of BCP 47) and, in particular, the Lookup
> matching scheme. I think you'll find that this is simple and consistent with
> existing practice.

Ok, I'll have a read of 4647 and see how to articulate that in the spec.

> 4. You may find that, if you recommend what you intend to, certain
> applications are hindered. In particular, some languages (Chinese!) use
> varying scripts and need the script subtag from RFC 4646. Your
> recommendation will stand in the way of that. Although a validating
> implementation of 4646 adds a bit of overhead, a "well-formed"
> implementation isn't nearly as difficult (it can be done with an
> admittedly-very-long regular expression). A better suggestion might be to
> recommend using the 3066 ABNF for "validation" (for its simplicity).

Thanks for pointing that out. We certainly don't want to hinder any
applications or exclude any languages.

Regarding the regex, I found this:

http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagRegex.txt

If it is known to be suitable, I can add a note to the spec that implementers
might like to look at the Unicode regex. At least it takes the pain out of
translating the ABNF into a regex by hand (or having to implement an ABNF
parser).

Kind regards,
Marcos

--
Marcos Caceres
http://datadriven.com.au
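
For illustration only, here is a rough Python sketch of the two ideas discussed in points 3 and 4. It is not taken from the spec or from either RFC's text: the function names `is_well_formed_3066` and `lookup` are made up for this example. The first check follows the RFC 3066 ABNF as I understand it (a primary subtag of 1 to 8 letters, followed by any number of subtags of 1 to 8 letters or digits), which is loose enough to accept a script subtag such as "Hant". The second is a simplified take on the RFC 4647 Lookup scheme: each range is truncated from the right, one subtag at a time, until something matches; a lone "*" range is skipped.

```python
import re

# RFC 3066 syntax only (an assumption of this sketch: no registry validation):
#   Language-Tag = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA ; Subtag = 1*8(ALPHA / DIGIT)
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed_3066(tag):
    """Return True if `tag` matches the RFC 3066 ABNF shape."""
    return _RFC3066_TAG.fullmatch(tag) is not None


def lookup(priority_list, available_tags, default=None):
    """Simplified RFC 4647 Lookup: progressively truncate each range.

    `priority_list` holds the user's language ranges, most preferred first.
    `available_tags` holds the tags the application actually has content for.
    """
    # Matching is case-insensitive, but return the tag as the caller spelled it.
    available = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        if language_range == "*":
            continue  # a bare wildcard is skipped in this sketch
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return available[candidate]
            subtags.pop()
            # A trailing single-character subtag (e.g. "x") is dropped too.
            while subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

With these definitions, `lookup(["zh-Hant-CN"], ["en", "zh", "zh-Hant"])` falls back from "zh-Hant-CN" to "zh-Hant", and `is_well_formed_3066("zh-Hant-CN")` is true while `is_well_formed_3066("en_US")` is false.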
