[issue10435] Document unicode C-API in reST

2016-04-30 Thread Berker Peksag

Berker Peksag added the comment:

This is a duplicate of issue 1944.

--
nosy: +berker.peksag
resolution:  -> duplicate
stage: patch review -> resolved
status: open -> closed
superseder:  -> Document PyUnicode_* API

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10435] Document unicode C-API in reST

2015-05-03 Thread Mark Lawrence

Mark Lawrence added the comment:

Py_UNICODE_TOLOWER, Py_UNICODE_TOUPPER and Py_UNICODE_TOTITLE are all labelled 
deprecated in 3.3 and presumably can be removed completely.  Alternatively, 
should these, like many others, be scheduled for removal in 4.0?

--




[issue10435] Document unicode C-API in reST

2015-04-30 Thread Mark Lawrence

Mark Lawrence added the comment:

Here is a list of just about everything that's in the header file but not in 
the rst file, as I'm not sure which bits you normally wouldn't bother with.

Py_USING_UNICODE
Py_UNICODE_SIZE
Py_UNICODE_WIDE
Py_UNICODE_COPY
Py_UNICODE_FILL
Py_UNICODE_HIGH_SURROGATE
Py_UNICODE_LOW_SURROGATE
Py_UNICODE_MATCH
PyUnicode_WSTR_LENGTH
PyUnicode_AS_DATA
PyUnicode_IS_ASCII
PyUnicode_IS_COMPACT
PyUnicode_IS_COMPACT_ASCII
PyUnicode_IS_READY
Py_UNICODE_REPLACEMENT_CHARACTER 
PyUnicode_FromString
PyUnicode_GetMax
PyUnicode_Resize
PyUnicode_InternImmortal
PyUnicode_CHECK_INTERNED
PyUnicode_FromOrdinal
PyUnicode_GetDefaultEncoding
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
PyUnicode_DecodeCodePageStateful
PyUnicode_EncodeDecimal
PyUnicode_Append
PyUnicode_AppendAndDel
PyUnicode_Partition
PyUnicode_RPartition
PyUnicode_RSplit
PyUnicode_IsIdentifier
Py_UNICODE_strlen
Py_UNICODE_strcpy
Py_UNICODE_strcat
Py_UNICODE_strncpy
Py_UNICODE_strcmp
Py_UNICODE_strncmp
Py_UNICODE_strchr
Py_UNICODE_strrchr

--




[issue10435] Document unicode C-API in reST

2015-04-21 Thread Mark Lawrence

Mark Lawrence added the comment:

Okay Alexander I'll give it a go, but not tonight :)

--




[issue10435] Document unicode C-API in reST

2015-04-21 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

Sorry for the broken link, the correct header file is Include/unicodeobject.h

--




[issue10435] Document unicode C-API in reST

2015-04-21 Thread Alexander Belopolsky

Alexander Belopolsky added the comment:

Mark,

Unicode C-APIs have changed a lot since this issue was opened, but I think many 
of the listed functions are still present but not properly documented.

You can help by checking the Include/unicode.h file and compiling a list of 
functions that are there, do not start with _ and are not documented in the 
reference manual.
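
A rough way to produce such a list is sketched below in Python. It is only an illustration: the regular expression, the helper name undocumented_names and the two sample strings are assumptions made for this sketch, not part of any CPython tooling. In practice the two strings would be read from the header file and from Doc/c-api/unicode.rst.

```python
import re

# Match PyUnicode_* and Py_UNICODE_* names that are NOT preceded by an
# underscore or another identifier character, so _PyUnicode_* is skipped.
NAME_RE = re.compile(r"(?<![A-Za-z0-9_])(Py(?:Unicode|_UNICODE)_\w+)")


def undocumented_names(header_text, rst_text):
    """Return the sorted public names found in the header but not in the docs."""
    in_header = set(NAME_RE.findall(header_text))
    in_docs = set(NAME_RE.findall(rst_text))
    return sorted(in_header - in_docs)


# Hypothetical, shortened stand-ins for the real files.
sample_header = """
PyAPI_FUNC(PyObject*) PyUnicode_FromOrdinal(int ordinal);
PyAPI_FUNC(PyObject*) PyUnicode_Concat(PyObject *left, PyObject *right);
PyAPI_FUNC(int) _PyUnicode_Init(void);
"""
sample_rst = """
.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
"""

missing = undocumented_names(sample_header, sample_rst)
print(missing)
```

A plain text match like this will also pick up names mentioned only in comments, so the result still needs a manual pass.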

--




[issue10435] Document unicode C-API in reST

2015-04-21 Thread Mark Lawrence

Mark Lawrence added the comment:

I've looked at c-api/unicode.rst and I can't see any correlation between it and 
the names listed here in msg121302.  So either this was never completed or 
everything has changed in the meantime; could somebody take a look, please?

--
nosy: +BreamoreBoy




[issue10435] Document unicode C-API in reST

2010-11-23 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky  added the comment:
> 
> On Wed, Nov 17, 2010 at 5:20 PM, Marc-Andre Lemburg
>  wrote:
> ..
>> -/* Encodes a Unicode object and returns the result as Python string
>> +/* Encodes a Unicode object and returns the result as Python bytes
>>object. */
>>
>>
>> PyUnicode_AsEncodedObject() encodes the Unicode object to
>> whatever the codec returns, so the "bytes" is wrong in the
>> above line.
>>
> 
> The above line describes PyUnicode_AsEncodedString(), not
> PyUnicode_AsEncodedObject().  The former has PyBytes_Check(v) after
> calling  v = PyCodec_Encode(..).  As far as I can tell this is the
> only difference that makes PyUnicode_AsEncodedObject() not redundant.

In that case, the change is fine.

> ..
>> +.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, 
>> const char *encoding, const char *errors)
>>
>> +   Create a Unicode object by decoding the encoded Unicode object
>> +   *unicode*.
>>
>> The function does not guarantee that a Unicode object will be
>> returned. It merely passes a Unicode object to a codec's
>> decode function and returns whatever the codec returns.
>>
> 
> Good point.  I am changing "Unicode object" to "Python object".
> 
> ..
>> +   Note that Python codecs do not accept Unicode objects for decoding,
>> +   so this method is only useful with user or 3rd party codecs.
>>
>> Please strike the last sentence. The codecs that were wrongly removed
>> from Python3 will get added back and provide such functionality.
>>
> 
> Would it be acceptable to keep this note, but add "as of version 3.2"
> or something like that?   I don't think there is a chance that these
> codecs will be added in 3.2 given the current schedule.

Please remove the sentence or change it to:

 Note that most Python codecs only accept Unicode objects for
 decoding.

> ..
>> This should read:
>>
>>   Decodes a Unicode object by passing the given Unicode object
>>   *unicode* to the codec for *encoding*.
>>   *encoding* and *errors* have the same meaning as the
>>   parameters of the same name in the :func:`unicode` built-in
>>   function.  The codec to be used is looked up using the Python codec
>>   registry.  Return *NULL* if an exception was raised by the codec.
>>
> 
> Is the following better?
> 
> """
> Decodes a Unicode object by passing the given Unicode object
> *unicode* to the codec for *encoding*.  *encoding* and *errors*
> have the same meaning as the parameters of the same name in the
> :func:`unicode` built-in  function. The codec to be used is
> looked up using the Python codec registry. Return *NULL* if an
> exception was raised by the codec.
> 
> As of Python 3.2, this method is only useful with user or 3rd
> party codec that encodes string into something other than bytes.

Same as above.

> For encoding to bytes, use c:func:`PyUnicode_AsEncodedString`
> instead.
> """
> ..
>>
>> +.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
> ..
>> +
>> +.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject 
>> *right)
> ..
>>
>> Please don't document these two obscure APIs. Instead we should
>> make them private functions by prepending them with an underscore.
>> If you look at the implementations of those two APIs, they
>> are little more than macros around PyUnicode_Concat().
>>
> 
> I don't agree that they are obscure.  Python uses them in multiple
> places and developers seem to know about them.  See patches submitted
> to issue4113 and issue7584.

I found these references:

http://osdir.com/ml/python.python-3000.cvs/2007-11/msg00270.html

and

http://riverbankcomputing.co.uk/hg/sip/annotate/91a545605044/siplib/siplib.c

so you're right: they are already in use in the wild. Too bad...

Please add these porting notes to the documentation:

PyUnicode_Append() works like the PyString_Concat(), while
PyUnicode_AppendAndDel() works like PyString_ConcatAndDel().

>> 3rd party extensions should use PyUnicode_Concat() to achieve
>> the same effect.
>>
> 
> Hmm.  I would not be surprised if current 3rd party extensions used
> PyUnicode_AppendAndDel() more often than PyUnicode_Concat().  (I know
> that I learned about PyUnicode_AppendAndDel()  before
> PyUnicode_Concat().)

Certainly not more often. PyUnicode_Concat() has been around much
longer than the other two APIs which are only available in Python3.

> Is there anything that makes PyUnicode_AppendAndDel() undesirable?   I
> don't mind adding a recommendation to use PyUnicode_Concat() if there
> is a practical reason for it or even a warning that
> PyUnicode_AppendAndDel() may be deprecated in the future, but renaming
> it to _PyUnicode_AppendAndDel() seems premature.

Both APIs are just slight variants of the PyUnicode_Concat()
API. They change parameters in-place which is rather uncommon
for the Unicode API and don't return their result - in fact the
error reporting

[issue10435] Document unicode C-API in reST

2010-11-22 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

On Wed, Nov 17, 2010 at 5:20 PM, Marc-Andre Lemburg
 wrote:
..
> -/* Encodes a Unicode object and returns the result as Python string
> +/* Encodes a Unicode object and returns the result as Python bytes
>    object. */
>
>
> PyUnicode_AsEncodedObject() encodes the Unicode object to
> whatever the codec returns, so the "bytes" is wrong in the
> above line.
>

The above line describes PyUnicode_AsEncodedString(), not
PyUnicode_AsEncodedObject().  The former has PyBytes_Check(v) after
calling  v = PyCodec_Encode(..).  As far as I can tell this is the
only difference that makes PyUnicode_AsEncodedObject() not redundant.
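
The distinction can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the C implementation: the two helper names are made up for the illustration, and codecs.encode() is assumed to stand in for PyCodec_Encode().

```python
import codecs


def as_encoded_object(text, encoding, errors="strict"):
    """Model of PyUnicode_AsEncodedObject(): return whatever the codec returns."""
    return codecs.encode(text, encoding, errors)


def as_encoded_string(text, encoding, errors="strict"):
    """Model of PyUnicode_AsEncodedString(): same call plus a bytes check."""
    v = codecs.encode(text, encoding, errors)
    if not isinstance(v, bytes):
        raise TypeError(
            "encoder did not return a bytes object (type=%s)" % type(v).__name__
        )
    return v


# A str -> bytes codec passes both variants.
utf8_result = as_encoded_string("abc", "utf-8")

# A str -> str codec such as rot_13 only passes the unchecked variant.
rot_result = as_encoded_object("abc", "rot_13")
try:
    as_encoded_string("abc", "rot_13")
    checked_variant_rejected = False
except TypeError:
    checked_variant_rejected = True
```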

..
> +.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const 
> char *encoding, const char *errors)
>
> +   Create a Unicode object by decoding the encoded Unicode object
> +   *unicode*.
>
> The function does not guarantee that a Unicode object will be
> returned. It merely passes a Unicode object to a codec's
> decode function and returns whatever the codec returns.
>

Good point.  I am changing "Unicode object" to "Python object".

..
> +   Note that Python codecs do not accept Unicode objects for decoding,
> +   so this method is only useful with user or 3rd party codecs.
>
> Please strike the last sentence. The codecs that were wrongly removed
> from Python3 will get added back and provide such functionality.
>

Would it be acceptable to keep this note, but add "as of version 3.2"
or something like that?   I don't think there is a chance that these
codecs will be added in 3.2 given the current schedule.

..
> This should read:
>
>   Decodes a Unicode object by passing the given Unicode object
>   *unicode* to the codec for *encoding*.
>   *encoding* and *errors* have the same meaning as the
>   parameters of the same name in the :func:`unicode` built-in
>   function.  The codec to be used is looked up using the Python codec
>   registry.  Return *NULL* if an exception was raised by the codec.
>

Is the following better?

"""
Decodes a Unicode object by passing the given Unicode object
*unicode* to the codec for *encoding*.  *encoding* and *errors*
have the same meaning as the parameters of the same name in the
:func:`unicode` built-in  function. The codec to be used is
looked up using the Python codec registry. Return *NULL* if an
exception was raised by the codec.

As of Python 3.2, this method is only useful with user or 3rd
party codec that encodes string into something other than bytes.
For encoding to bytes, use c:func:`PyUnicode_AsEncodedString`
instead.
"""
..
>
> +.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
..
> +
> +.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject 
> *right)
..
>
> Please don't document these two obscure APIs. Instead we should
> make them private functions by prepending them with an underscore.
> If you look at the implementations of those two APIs, they
> are little more than macros around PyUnicode_Concat().
>

I don't agree that they are obscure.  Python uses them in multiple
places and developers seem to know about them.  See patches submitted
to issue4113 and issue7584.

> 3rd party extensions should use PyUnicode_Concat() to achieve
> the same effect.
>

Hmm.  I would not be surprised if current 3rd party extensions used
PyUnicode_AppendAndDel() more often than PyUnicode_Concat().  (I know
that I learned about PyUnicode_AppendAndDel()  before
PyUnicode_Concat().)

Is there anything that makes PyUnicode_AppendAndDel() undesirable?   I
don't mind adding a recommendation to use PyUnicode_Concat() if there
is a practical reason for it or even a warning that
PyUnicode_AppendAndDel() may be deprecated in the future, but renaming
it to _PyUnicode_AppendAndDel() seems premature.

..
>
> I don't think it's a good idea to make this a public API.
> 3rd party extensions should not need to make use of such
> APIs.
>
> Instead, we should make this a private API.

I agree, but isn't it prudent to document it as deprecated for 3rd
party use first?

--




[issue10435] Document unicode C-API in reST

2010-11-20 Thread Simon Cross

Changes by Simon Cross :


--
nosy: +hodgestar




[issue10435] Document unicode C-API in reST

2010-11-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Thanks for your work on this.

Please see my comments below:

--- Include/unicodeobject.h (revision 86478)
+++ Include/unicodeobject.h (working copy)
@@ -737,7 +737,7 @@
 const char *errors  /* error handling */
 );
 
-/* Encodes a Unicode object and returns the result as Python string
+/* Encodes a Unicode object and returns the result as Python bytes
object. */
 

PyUnicode_AsEncodedObject() encodes the Unicode object to
whatever the codec returns, so the "bytes" is wrong in the
above line.


--- Doc/c-api/unicode.rst   (revision 86477)
+++ Doc/c-api/unicode.rst   (working copy)
@@ -528,7 +567,22 @@
using the Python codec registry.  Return *NULL* if an exception was raised 
by
the codec.
 
+.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const 
char *encoding, const char *errors)
 
+   Create a Unicode object by decoding the encoded Unicode object
+   *unicode*.

The function does not guarantee that a Unicode object will be
returned. It merely passes a Unicode object to a codec's
decode function and returns whatever the codec returns.

+   *encoding* and *errors* have the same meaning as the
+   parameters of the same name in the :func:`unicode` built-in
+   function.  The codec to be used is looked up using the Python codec
+   registry.  Return *NULL* if an exception was raised by the codec.
+   Note that Python codecs do not accept Unicode objects for decoding,
+   so this method is only useful with user or 3rd party codecs.

Please strike the last sentence. The codecs that were wrongly removed
from Python3 will get added back and provide such functionality.

+.. c:function:: PyObject* PyUnicode_AsEncodedObject(PyObject *unicode, const 
char *encoding, const char *errors)
+
+   Use c:func:`PyUnicode_AsEncodedString` instead.

That's not a useful hint as PyUnicode_AsEncodedString() does something
different than PyUnicode_AsEncodedObject().

+   Same as c:func:`PyUnicode_AsEncodedString`, but without shortcuts
+   for common built-in encodings and without checking the type of the
+   object returned by encoding via the codec registry.  This method is
+   only useful with user or 3rd party codec that encodes string into
+   something other than bytes.

This should read:

   Decodes a Unicode object by passing the given Unicode object
   *unicode* to the codec for *encoding*.
   *encoding* and *errors* have the same meaning as the
   parameters of the same name in the :func:`unicode` built-in
   function.  The codec to be used is looked up using the Python codec
   registry.  Return *NULL* if an exception was raised by the codec.

+.. c:function:: PyObject* PyUnicode_AsEncodedUnicode(PyObject *unicode, const 
char *encoding, const char *errors)
+   
+   Use c:func:`PyUnicode_AsEncodedString` instead.

Please remove this as well.

+   Same as c:func:`PyUnicode_AsEncodedObject`, but raises
+   :exc:`TypeError` if encoding via the codec registry returns an
+   object other than string.  This method is only useful with user or
+   3rd party codec that encodes string into string.

Please remove the last sentence.

+.. c:function: int PyUnicode_EncodeDecimal(Py_UNICODE *s, Py_ssize_t length,
+   char *output,  const char *errors)
+
+   Takes a Unicode string holding a decimal value and writes it into
+   an output buffer using standard ASCII digit codes.
+
+   The output buffer has to provide at least length+1 bytes of storage
+   area. The output string is 0-terminated.
+
+   The encoder converts whitespace to ' ', decimal characters to their
+   corresponding ASCII digit and all other Latin-1 characters except
+   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
+   are treated as errors. This includes embedded NULL bytes.
+
+   Error handling is defined by the errors argument:
+
+  NULL or "strict": raise a ValueError
+  "ignore": ignore the wrong characters (these are not copied to the
+output buffer)
+  "replace": replaces illegal characters with '?'
+
+   Returns 0 on success, -1 on failure.
+   

+.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft. Sets *pleft to
+   NULL on error.
+
+.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
+
+   Concat two strings and put the result in *pleft and drop the right
+   object. Sets *pleft to NULL on error.
+
+

Please don't document these two obscure APIs. Instead we should
make them private functions by prepending them with an underscore.
If you look at the implementations of those two APIs, they
are little more than macros around PyUnicode_Concat().

3rd party extensions should use PyUnicode_Concat() to achieve
the same effect.
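
For comparison, PyUnicode_Concat() returns a new object and leaves both arguments untouched. A small sketch that calls it from Python through ctypes, assuming a CPython interpreter where ctypes.pythonapi is available (the variable names are only for illustration):

```python
import ctypes

# Bind PyUnicode_Concat(PyObject *left, PyObject *right) -> PyObject*.
# ctypes.py_object lets ctypes pass and return Python objects directly.
concat = ctypes.pythonapi.PyUnicode_Concat
concat.argtypes = [ctypes.py_object, ctypes.py_object]
concat.restype = ctypes.py_object

left = "spam"
result = concat(left, " eggs")

print(result)  # the concatenated string
print(left)    # the left operand is not modified
```

Unlike PyUnicode_Append(), nothing is written back through a pointer; the caller receives the result as the return value.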


+.. c:function:: void PyUnicode_InternImmortal(PyObject **string)
+ 
+   Use :c:func:`PyUnicode_InternInPlace` instead.
+
+   Same as :c:func

[issue10435] Document unicode C-API in reST

2010-11-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> If you have time, please take a look at
> PyUnicode_As{En,De}codedObject() and
> PyUnicode_As{En,De}codedUnicode() documentation in the attached
> patch.

Thanks. I'll try to have a look later tonight.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

It looks like I misunderstood what PyUnicode_As{En,De}codedObject() and
PyUnicode_As{En,De}codedUnicode() functions are designed to do.  Attaching a 
corrected patch, issue10435a.diff.

--
Added file: http://bugs.python.org/file19622/issue10435a.diff




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

> I agree and will handle this in #10435 because codecs.h

s/#10435/#10439/

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Ezio Melotti

Changes by Ezio Melotti :


--
nosy: +ezio.melotti




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

On Tue, Nov 16, 2010 at 7:19 PM, Marc-Andre Lemburg
 wrote:
..
>> * Decoding converts a bytes object encoded using a particular
>> character set encoding to a string object.
>> """ 
>> http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode
>
> That's another documentation bug, then. The codec system has always
> supported other type combinations for encoding/decoding as well.
>
> Only certain methods on str and bytes objects in 3.x limit the possible
> types to either str or bytes - which probably results in the
> idea that Python codecs don't support anything else.
>
> The text from the 2.7 documentation is correct, also for 3.x:
>
> http://docs.python.org/library/codecs.html#codec-objects
>

I agree and will handle this in #10435 because codecs.h
(unsurprisingly) supports your POV and we don't want C-API docs to be
in conflict with Py-API docs.

If you have time, please take a look at
PyUnicode_As{En,De}codedObject() and
PyUnicode_As{En,De}codedUnicode() documentation in the attached
patch.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky  added the comment:
> 
> On Tue, Nov 16, 2010 at 5:54 PM, Marc-Andre Lemburg
>  wrote:
>>
>> Marc-Andre Lemburg  added the comment:
>>
>> Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the 
>> codec returns for these operations.
>>
>> The codec system is not limited to converting between Unicode and bytes only.
> 
> Not according to the latest reST documentation:
> 
> """
> * Encoding converts a string object to a bytes object using a
> particular character set encoding (e.g., cp1252 or iso-8859-1).
> 
> * Decoding converts a bytes object encoded using a particular
> character set encoding to a string object.
> """ 
> http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode

That's another documentation bug, then. The codec system has always
supported other type combinations for encoding/decoding as well.

Only certain methods on str and bytes objects in 3.x limit the possible
types to either str or bytes - which probably results in the
idea that Python codecs don't support anything else.

The text from the 2.7 documentation is correct, also for 3.x:

http://docs.python.org/library/codecs.html#codec-objects

>> A typical example is a same-type codec such as rot13 that only transforms 
>> Unicode data.
> 
> I thought rot13 would only transform the English (or Latin) alphabet.

Right, everything else passes through as-is.

Other examples are codecs that escape certain code points using e.g.
XML entity sequences, backslash notations or other such techniques.

For bytes, you have the zip, base64 and hex codecs which work in
a similar way.
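
The same-type codecs mentioned above can be tried from Python with codecs.encode() and codecs.decode(). The sketch below assumes a recent Python 3 (3.4 or later), where the rot_13, hex_codec, base64_codec and zlib_codec names are available; the variable names are only for illustration.

```python
import codecs

# str -> str: rot_13 rotates ASCII letters and passes everything else through.
rot = codecs.encode("Hello, 123", "rot_13")

# bytes -> bytes: hex, base64 and zlib work on bytes in both directions.
hexed = codecs.encode(b"hi", "hex_codec")
b64 = codecs.encode(b"hi", "base64_codec")
packed = codecs.encode(b"aaaa", "zlib_codec")

# The str.encode() method is stricter than the codec machinery itself:
# it refuses codecs that do not produce bytes.
try:
    "Hello".encode("rot_13")
    str_method_ok = True
except (LookupError, TypeError):
    str_method_ok = False

print(rot, hexed, str_method_ok)
```

The last part shows the restriction on the str and bytes methods; the codec registry itself accepts the other type combinations.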

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

Attached patch documents all previously undocumented unicode C API functions.  
Note that for the PyUnicode_As{En,De}codedObject() and 
PyUnicode_As{En,De}codedUnicode() functions I attempted to capture what they 
are supposed to do rather than what the current implementation does.

--
keywords: +patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file19621/issue10435.diff




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

On Tue, Nov 16, 2010 at 5:54 PM, Marc-Andre Lemburg
 wrote:
>
> Marc-Andre Lemburg  added the comment:
>
> Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the 
> codec returns for these operations.
>
> The codec system is not limited to converting between Unicode and bytes only.

Not according to the latest reST documentation:

"""
* Encoding converts a string object to a bytes object using a
particular character set encoding (e.g., cp1252 or iso-8859-1).

* Decoding converts a bytes object encoded using a particular
character set encoding to a string object.
""" 
http://docs.python.org/dev/library/codecs.html?highlight=codecs#codecs.Codec.encode

> A typical example is a same-type codec such as rot13 that only transforms 
> Unicode data.

I thought rot13 would only transform the English (or Latin) alphabet.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

Please note that PyCodec_Encode()/PyCodec_Decode() will return whatever the 
codec returns for these operations.

The codec system is not limited to converting between Unicode and bytes only.

A typical example is a same-type codec such as rot13 that only transforms 
Unicode data.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

PyUnicode_AsDecodedObject() and PyUnicode_AsDecodedUnicode() appear to be 
broken as well: both start with a PyUnicode_Check(unicode) and then pass 
unicode to PyCodec_Decode() which expects bytes.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Changes by Alexander Belopolsky :


--
nosy: +haypo, lemburg, loewis




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

Alexander Belopolsky  added the comment:

On Tue, Nov 16, 2010 at 10:38 AM, M.-A. Lemburg  wrote:
> Alexander Belopolsky wrote:
..
>> I also have a similar question about C API.  Here, in absence of
>> __all__, the answer should be clear: all symbols in public header
>> files should start with either _Py_ or Py_ and those that start with
>> Py_ are public.   The question is what should be done with names that
>> start with Py_, but are not documented?  Can we add an underscore to
>> those names?  If so, should a (deprecated) alias be made available?
>> Should they be documented as deprecated?
>>
>> I think these questions can only be answered on a case-by-case basis,
>> with the choices being:
>>
>> 1. Document.
>> 2. Document as deprecated.
>> 3. Document as deprecated, add underscore prefix and retain a deprecated 
>> alias.
>> 4. Add an underscore prefix.
>>
>> The specific set of names that I would like to consider is the
>> following from unicodeobject.h.  I am marking with (*) the names that I
>> think should be documented and with (D) those that should be
>> deprecated:
>>
>> PyUnicode_GetMax
>> PyUnicode_Resize (*)
>> PyUnicode_InternImmortal
>> PyUnicode_FromOrdinal (*)
>> PyUnicode_GetDefaultEncoding (D)
>> PyUnicode_AsDecodedObject
>> PyUnicode_AsDecodedUnicode
>> PyUnicode_AsEncodedObject
>> PyUnicode_AsEncodedUnicode
>> PyUnicode_BuildEncodingMap
>> PyUnicode_EncodeDecimal (*)
>> PyUnicode_Append (*)
>> PyUnicode_AppendAndDel (*)
>> PyUnicode_Partition (*)
>> PyUnicode_RPartition (*)
>> PyUnicode_RSplit (*)
>> PyUnicode_IsIdentifier (*)
>> Py_UNICODE_strlen
>> Py_UNICODE_strcpy
>> Py_UNICODE_strcat
>> Py_UNICODE_strncpy
>> Py_UNICODE_strcmp
>> Py_UNICODE_strncmp
>> Py_UNICODE_strchr
>> Py_UNICODE_strrchr
>
> For Unicode, unicodeobject.h defines which APIs are private or not.
> APIs which don't appear in the header file are either private or
> need to be added to the header file (but I don't think there are
> any in this category).
>
> All APIs in the header that do not appear in the documentation,
> should be added there as well. unicodeobject.h already provides
> documentation for most of the APIs you've listed above (except some
> new ones that were added later on).
>
> One API I'm not sure about is PyUnicode_AppendAndDel(). It's somewhat
> obscure and given that we already have PyUnicode_Concat(), I think
> it should be made private and eventually dropped.
>

I would also like to nominate PyUnicode_AsEncodedObject and 
PyUnicode_AsEncodedUnicode.  The latter is a particularly attractive candidate 
for removal because it appears to be broken:

    v = PyCodec_Encode(unicode, encoding, errors);
    if (v == NULL)
        goto onError;
    if (!PyUnicode_Check(v)) {
        PyErr_Format(PyExc_TypeError,
                     "encoder did not return an str object (type=%.400s)",
                     Py_TYPE(v)->tp_name);

Since PyCodec_Encode() returns bytes in 3.x, the code above will always raise 
an error.

--




[issue10435] Document unicode C-API in reST

2010-11-16 Thread Alexander Belopolsky

New submission from Alexander Belopolsky :

The following C-APIs are only documented in comments inside unicodeobject.h:

PyUnicode_GetMax
PyUnicode_Resize
PyUnicode_InternImmortal
PyUnicode_FromOrdinal
PyUnicode_GetDefaultEncoding
PyUnicode_AsDecodedObject
PyUnicode_AsDecodedUnicode
PyUnicode_AsEncodedObject
PyUnicode_AsEncodedUnicode
PyUnicode_BuildEncodingMap
PyUnicode_EncodeDecimal
PyUnicode_Append
PyUnicode_AppendAndDel
PyUnicode_Partition
PyUnicode_RPartition
PyUnicode_RSplit
PyUnicode_IsIdentifier
Py_UNICODE_strlen
Py_UNICODE_strcpy
Py_UNICODE_strcat
Py_UNICODE_strncpy
Py_UNICODE_strcmp
Py_UNICODE_strncmp
Py_UNICODE_strchr
Py_UNICODE_strrchr

--
assignee: belopolsky
components: Documentation
messages: 121302
nosy: belopolsky
priority: normal
severity: normal
stage: needs patch
status: open
title: Document unicode C-API in reST
versions: Python 3.2
