On 03/01/13 23:52, eryksun wrote:
On Tue, Jan 1, 2013 at 1:29 AM, Steven D'Aprano wrote:
>
> 2 Since "wide builds" use so much extra memory for the average ASCII
> string, hardly anyone uses them.
On Windows (and I think OS X, too) a narrow build has been practical
since the wchar_t type is 16-bit. As to Linux I'm most familiar
I'm digging out an old email which I saved as a draft almost a month ago
but never got around to sending, because I think the new Unicode
implementation in Python 3.3 is one of the coolest things ever.
On 03/12/12 16:56, eryksun wrote:
CPython 3.3 has a new implementation that angles for the b
On Sun, Dec 2, 2012 at 8:34 AM, Albert-Jan Roskam wrote:
>
> As I emailed earlier today to Peter Otten, I thought unicode_internal means
> UCS-2 or UCS-4, depending on the size of sys.maxunicode? How is this related
> to UTF-16 and UTF-32?
UCS is the universal character set. Some highlights of th
On 12/02/2012 08:34 AM, Albert-Jan Roskam wrote:
>
> Hi Eryksun,
>
> Observation #1: Yes, makes perfect sense. I should have thought about that.
> Observation #2:
> As I emailed earlier today to Peter Otten, I thought unicode_internal means
> UCS-2 or UCS-4,
> depending on the size o
> to make is that the transform formats are multibyte encodings (except
> ASCII in UTF-8), which means the expression str(len(hello)) is using
> the wrong length; it needs to use the length of the encoded string.
> Also, UTF-16 and UTF-32 typically have very many null bytes. Together,
> the
>
> * some encodings are more compact than others (e.g. Latin-1 uses
> one byte per character, while UTF-32 uses four bytes per
> character).
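The size difference between encodings, and the null bytes that UTF-16 and UTF-32 produce for ASCII text, can be checked directly. This is only a minimal sketch, assuming Python 2.7 or 3.3+; the sample string and the `sizes` dict are made up for illustration.

```python
# Compare encoded size and null-byte count for the same five characters.
text = u"hello"

sizes = {}
for codec in ("latin-1", "utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    # Store (number of bytes, number of null bytes) per codec.
    sizes[codec] = (len(data), data.count(b"\x00"))
    print(codec, sizes[codec])
```

For this ASCII-only string, Latin-1 and UTF-8 give 5 bytes with no nulls, UTF-16 gives 10 bytes with 5 nulls, and UTF-32 gives 20 bytes with 15 nulls.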
I read that performance of UTF-32 is better ("UTF-32 advantage: you don't need
to decode stored data to the 32-bit Unicode code point for e.g. char
>> How can I pack a unicode string using the struct module? If I simply use
>> packed = struct.pack(fmt, hello) in the code below (and 'hello' is a
>> unicode string), I get this: "error: argument for 's' must be a string". I
>> keep reading that I have to encode it to a utf-8 bytestring, but this
On Sat, Dec 1, 2012 at 2:30 AM, Steven D'Aprano wrote:
>
>> The length and order of the optional byte order mark (BOM)
>> distinguishes UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE.
>
> That's not quite right. The UTF-16BE and UTF-16LE character sets do
> not take BOMs, because the encoding already
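The point about BOMs can be seen with Python's own codecs. This is a rough sketch, assuming Python 2.7 or 3.3+; the variable names are illustrative only.

```python
import codecs

text = u"a"

# The endian-specific codecs write no BOM; the byte order is in the name.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# The generic codec prepends a BOM in the platform's native byte order.
generic = text.encode("utf-16")

print(repr(le), repr(be), repr(generic))
```

Here `le` and `be` are two bytes each, while `generic` is four bytes and starts with `codecs.BOM_UTF16`.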
On 01/12/12 12:28, eryksun wrote:
UTF-8 was designed to encode all of Unicode in a way that can seamlessly
pass through libraries that process C strings (i.e. an array of non-null
bytes terminated by a null byte). Byte values less than 128 are ASCII;
beyond ASCII, UTF-8 uses 2-4 bytes, and all b
A clarification: in the default mode ('@'), struct uses native
alignment padding, but not if you override this with <, >, =, or !, as
you did.
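The effect of the byte-order prefix on padding can be checked with `struct.calcsize`. This is a small sketch; the format `bi` (one signed char followed by one int) is chosen only for illustration, and the native size depends on the platform.

```python
import struct

# '@' (the default): native sizes and native alignment, so pad bytes
# may be inserted between the char and the int.
native = struct.calcsize("@bi")

# '<': standard sizes, no alignment padding (1-byte char + 4-byte int).
standard = struct.calcsize("<bi")

print(native, standard)
```

On many common platforms `native` is larger than `standard` because the int is aligned to a 4-byte boundary.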
>> fmt = endianness + str(len(hello)) + "s"
>
> That's the wrong length. Use the length of the encoded string.
Generally, however, you'd use a fixed size
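The advice about using the encoded length can be sketched as follows. This assumes Python 2.7 or 3.3+ and UTF-8 as the encoding; the sample string is made up, and `fmt` and `hello` just mirror the names in the quoted code.

```python
import struct

hello = u"h\u00e9llo"                # 5 characters
encoded = hello.encode("utf-8")      # 6 bytes: the accented e takes two

# Build the format from the byte length, not the character length.
fmt = "<" + str(len(encoded)) + "s"
packed = struct.pack(fmt, encoded)

# Round trip: unpack the bytes and decode them back to text.
(unpacked,) = struct.unpack(fmt, packed)
assert unpacked.decode("utf-8") == hello
```

Using `len(hello)` here would give a `5s` field, which silently drops the last byte of the encoded string.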
On Fri, Nov 30, 2012 at 11:43 AM, Albert-Jan Roskam wrote:
>
> How can I pack a unicode string using the struct module?
struct.pack is for packing an arbitrary sequence of data into a C-like
struct. You have to manually add pad bytes. Alternatively you can use
a ctypes.Structure.
The struct modu
On 01/12/12 03:43, Albert-Jan Roskam wrote:
> Hi,
> How can I pack a unicode string using the struct module? If I
> simply use packed = struct.pack(fmt, hello) in the code below
> (and 'hello' is a unicode string), I get this:
> "error: argument for 's' must be a string".
To be precise, it must be a *by
Albert-Jan Roskam wrote:
> How can I pack a unicode string using the struct module? If I simply use
> packed = struct.pack(fmt, hello) in the code below (and 'hello' is a
> unicode string), I get this: "error: argument for 's' must be a string". I
> keep reading that I have to encode it to a utf-8
Hi,
How can I pack a unicode string using the struct module? If I simply use packed
= struct.pack(fmt, hello) in the code below (and 'hello' is a unicode string),
I get this: "error: argument for 's' must be a string". I keep reading that I
have to encode it to a utf-8 bytestring, but this does