Martin v. Löwis wrote:

>> Because you can force the encoder to use a specified encoding. If you do
>> this and the unicode string starts with an XML declaration
>
> So what if the unicode string doesn't start with an XML declaration?
> Will it add one?

No.

> If so, what version number will it use?

If we added this we could add an extra argument version to the encoder
constructor defaulting to '1.0'.

>>>> OK, so should I put the C code into a _xml module?
>>> I don't see the need for C code at all.
>> Doing the bit fiddling for
>> Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
>> right thing to do.
>
> Hmm. I don't think a sequence like
>
> +    if (strlen>0)
> +    {
> +        if (*str++ != '<')
> +            return 1;
> +        if (strlen>1)
> +        {
> +            if (*str++ != '?')
> +                return 1;
> +            if (strlen>2)
> +            {
> +                if (*str++ != 'x')
> +                    return 1;
> +                if (strlen>3)
> +                {
> +                    if (*str++ != 'm')
> +                        return 1;
> +                    if (strlen>4)
> +                    {
> +                        if (*str++ != 'l')
> +                            return 1;
> +                        if (strlen>5)
> +                        {
> +                            if (*str != ' ' && *str != '\t' && *str != '\r' && *str != '\n')
> +                                return 1;
>
> is well-maintainable C. I feel it is much better writing
>
>     if not s.startswith("<?xml"):
>         return 1

The point of this code is not just to return whether the string starts
with "<?xml" or not. There are actually three cases:

* The string does start with "<?xml"
* The string starts with a prefix of "<?xml", i.e. we can only decide
  if it starts with "<?xml" if we have more input.
* The string definitely doesn't start with "<?xml".

> What bit fiddling are you referring to specifically that you think
> is better done in C than in Python?

The code that checks the byte signature, i.e. the first part of
detect_xml_encoding_str().

Servus,
   Walter
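
P.S. For illustration, the three cases listed above could be sketched roughly like this in Python. This is only a sketch, not the code from the patch: the names Match and check_xml_decl_start are made up for this example, and requiring a whitespace character after "<?xml" is taken from the quoted C fragment.

```python
from enum import Enum


class Match(Enum):
    # Hypothetical result type, introduced only for this sketch.
    YES = "starts with <?xml"
    MAYBE = "need more input to decide"
    NO = "definitely does not start with <?xml"


_PREFIX = "<?xml"
_WHITESPACE = " \t\r\n"


def check_xml_decl_start(s):
    """Classify s against '<?xml' followed by one whitespace character."""
    # Compare only as many characters as are available so far.
    head = s[:len(_PREFIX)]
    if not _PREFIX.startswith(head):
        return Match.NO
    # Everything seen so far matches, but the character after
    # '<?xml' has not arrived yet, so no decision is possible.
    if len(s) <= len(_PREFIX):
        return Match.MAYBE
    # The full prefix is present; the next character decides.
    if s[len(_PREFIX)] in _WHITESPACE:
        return Match.YES
    return Match.NO
```

An incremental decoder would keep buffering input while the result is Match.MAYBE and only commit to an encoding once it gets Match.YES or Match.NO.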