> Using any guessing based on the locale (which describes the codec used
> by the user's console, but is completely uncorrelated to any particular
> file on the user's filesystem)
No, it's not just the encoding of the console. It is also the encoding
that text editors will use, in absence of a mo
"Martin v. Löwis" writes:
> My bet is that the majority of Python applications written today do
> "web" stuff. In the web, input encoding and output encoding are
> fairly decorrelated - in particular for databases and files read
> from disk.
Sure. Which means that programmers have to do a lo
Antoine Pitrou writes:
> Stephen J. Turnbull writes:
> >
> > But it *does* determine the charset of ErrorDocuments displayed by
> > Apache. Users are likely to get somewhat confused if the
> > ErrorDocuments are in a different charset from your dynamic HTML.
>
> Why would the
Stephen J. Turnbull wrote:
> You just can't get away from the need for explicit management of
> codecs if you want a robust internationalized application. I don't
> object to giving users an easy way to get the behavior Michael
> proposes; it just sh
Subject: [ANN] Python 2.5.5 Release Candidate 2.
On behalf of the Python development team and the Python community, I'm
happy to announce the release candidate 2 of Python 2.5.5.
This is a source-only release that only includes security fixes. The
last full bug-fix release of Python 2.5 was Pytho
On Sun, Jan 24, 2010 at 1:54 PM, Oleg Broytman wrote:
..
> Depends on the kind of cat and especially on the ways of using it. If
> you ask cat to number lines (see manual for GNU cat) - what do "lines" mean
> for binary IO?
Maybe this is yet another reason why some kinds of cat are a bad idea:
Oleg Broytman writes:
>
> Depends on the kind of cat and especially on the ways of using it. If
> you ask cat to number lines (see manual for GNU cat) - what do "lines" mean
> for binary IO?
b"\n"-separated chunks of data. See the docs:
http://docs.python.org/3.1/library/io.html#io
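To illustrate the answer above (a sketch, not code from the thread): in binary mode a "line" is simply a chunk ending in b"\n", so a line-numbering cat can work on bytes without choosing a codec. The helper name number_lines is hypothetical, and the six-column prefix only approximates what GNU cat -n prints.

```python
import io

def number_lines(stream):
    # Iterating a binary stream yields chunks that end in b"\n"
    # (the last chunk may lack the terminator). No decoding happens.
    for n, line in enumerate(stream, start=1):
        yield f"{n:6d}\t".encode("ascii") + line

# Mixed Latin-1 and UTF-8 bytes, plus a CRLF: none of it matters in binary mode.
data = io.BytesIO(b"caf\xe9\nna\xc3\xafve\r\nlast")
numbered = list(number_lines(data))
```

Note that b"\r\n" is not treated specially here; the b"\r" simply stays at the end of the chunk.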
> I concede that I have no better statistics on the matter than you do,
> but I think that's wishful thinking. It is quite common for "pure
> output" to be mixed with "echoed input", for example. Even if a file
> is converted to another format (eg, restructured text to LaTeX), it's
> very common
On Sun, Jan 24, 2010 at 07:45:20PM +0100, "Martin v. Löwis" wrote:
> This may be a bit out of context - however, a simple cat program should
> open files in binary, and be done.
>
> (not sure whether the average naive programmer is able to grasp the
> notion of binary IO and to oppose to text IO,
On 24/01/2010 18:41, "Martin v. Löwis" wrote:
However it is likely to be often wrong, and where the user's locale
specifies an encoding like CP1252 then it will result in silent
corruption rather than an immediate exception.
Why do you say that? Why do you think it will likely be often wro
> So what is your naive programmer supposed to expect
> when writing a cat program?
This may be a bit out of context - however, a simple cat program should
open files in binary, and be done.
(not sure whether the average naive programmer is able to grasp the
notion of binary IO and to oppose to t
> However it is likely to be often wrong, and where the user's locale
> specifies an encoding like CP1252 then it will result in silent
> corruption rather than an immediate exception.
Why do you say that? Why do you think it will likely be often wrong?
Most likely, encoding text files with cp1252
2010/1/24 Floris Bruynooghe
> Introducing C++ is a big step, but I disagree that it means C++ should
> be allowed in the other CPython code. C++ can be problematic on more
> obscure platforms (certainly when static initialisers are used) and
> being able to build a python without C++ (no JIT/LLV
Stephen J. Turnbull writes:
>
> But it *does* determine the charset of ErrorDocuments displayed by
> Apache. Users are likely to get somewhat confused if the
> ErrorDocuments are in a different charset from your dynamic HTML.
Why would they? The browser picks the encoding from eithe
Antoine Pitrou writes:
> Perhaps you are speaking with your emacs hat, where the purpose is
> to output to the same file that serves as input.
No, I'm not wearing my Emacs hat. If I was, there would be no
problem. You just use binary for most such purposes. Historically
that was how even Ema
Stephen J. Turnbull writes:
>
> That's throwing the baby out with the bathwater. Very few practical
> applications that care about the input encoding are going to be
> willing to accept an output encoding that doesn't correspond to the
> input encoding in an appropriate way.
Perhaps
Michael Foord writes:
> When reading text files the presence of the UTF-8 signature *almost
> invariably* means a UTF-8 encoding. Honouring this will almost always be
> better than using the wrong encoding. Of course there are caveats, but
> it will be a substantial improvement.
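A minimal sketch of what "honouring the signature" could look like in user code, assuming the whole file is already in memory as bytes. The function name decode_with_signature and its fallback parameter are hypothetical; this is not the behaviour proposed for open() itself, only an illustration using the standard "utf-8-sig" codec.

```python
import codecs

def decode_with_signature(raw, fallback="utf-8"):
    # If the data starts with the UTF-8 signature (BOM), decode as UTF-8
    # and let the "utf-8-sig" codec strip the signature.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    # Without a signature the encoding is still a guess; the caller decides.
    return raw.decode(fallback)

with_sig = codecs.BOM_UTF8 + "Löwis".encode("utf-8")
without_sig = "Löwis".encode("cp1252")
```

Decoding with_sig with plain "utf-8" would instead leave a leading U+FEFF character in the text, and data without a signature still needs an explicit choice of codec.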
Sure, that
On 24/01/2010 14:23, Stephen J. Turnbull wrote:
Michael Foord writes:
> This is why I'm keen that by *default* Python should honour the UTF8
> signature when reading files;
Unfortunately, your caveat about "a lot of the time it will *seem* to
work" applies to this as well. The only way t
Michael Foord writes:
> This is why I'm keen that by *default* Python should honour the UTF8
> signature when reading files;
Unfortunately, your caveat about "a lot of the time it will *seem* to
work" applies to this as well. The only way that "honoring
signatures" really works is if Python
On 23 Jan 2010, at 07:53, "Martin v. Löwis" wrote:
[snip...]
Yes, definitely. It is this very reasoning that caused Python 2.x to
use ASCII as the default encoding (when mixing strings and unicode),
and, for the entire lifetime of 2.x, has caused endless pain for
developers, which simply fai
On Sat, Jan 23, 2010 at 10:09:14PM +0100, Cesare Di Mauro wrote:
> Introducing C++ is a big step, also. Aside the problems it can bring on some
> platforms, it means that C++ can now be used by CPython developers. It
> doesn't make sense to force people use C for everything but the JIT part. In
> t
21 matches