[issue10156] Initialization of globals in unicodeobject.c

2013-01-26 Thread Stefan Krah

Changes by Stefan Krah:


--
resolution:  -> fixed
stage: patch review -> committed/rejected

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10156] Initialization of globals in unicodeobject.c

2013-01-26 Thread Stefan Krah

Changes by Stefan Krah:


--
status: open -> closed




[issue10156] Initialization of globals in unicodeobject.c

2013-01-26 Thread Stefan Krah

Stefan Krah added the comment:

The buildbots etc. all look good. Thanks for fixing this.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Committed. Thank you for the review, Stefan. Please close this issue if the
work is finished.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 7c8ad0d02664 by Serhiy Storchaka in branch '2.7':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/7c8ad0d02664

New changeset f7eda8165e6f by Serhiy Storchaka in branch '3.2':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/f7eda8165e6f

New changeset 01d4dd412581 by Serhiy Storchaka in branch '3.3':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/01d4dd412581

New changeset cb12d642eed2 by Serhiy Storchaka in branch 'default':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/cb12d642eed2

--
nosy: +python-dev




[issue10156] Initialization of globals in unicodeobject.c

2013-01-25 Thread Stefan Krah

Stefan Krah added the comment:

Nice. I think the latest patches are commit-ready.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
> (without the _sre changes :) looks good to me.

Patches updated to address Stefan's new comments. Unicode globals are no longer
reinitialized in _PyUnicode_Init(). Note that I have added a consistency check
to the macro in 3.3+.

I hope Rietveld will accept this set.
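
For readers without the patches at hand, the idea of creating the shared
empty object on first use, rather than relying on an init function having
run, can be sketched in standalone C roughly as follows. The names Obj,
obj_new, shared_empty and get_empty are hypothetical stand-ins and not
CPython identifiers, and the assert is only an assumed illustration of what
a consistency check might look like, not the check from the patches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object; not CPython's type. */
typedef struct {
    long refcnt;
    size_t length;
} Obj;

/* Statically initialized, so it is valid before any init function runs. */
static Obj *shared_empty = NULL;

/* Allocate an object holding one reference; returns NULL on failure. */
static Obj *obj_new(size_t length)
{
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        o->length = length;
    }
    return o;
}

/* Return a new reference to the shared empty object, creating it on first
   use. Returns NULL only if the allocation fails. */
static Obj *get_empty(void)
{
    if (shared_empty == NULL)
        shared_empty = obj_new(0);          /* the cache keeps this reference */
    if (shared_empty != NULL) {
        assert(shared_empty->length == 0);  /* illustrative consistency check */
        shared_empty->refcnt++;             /* extra reference for the caller */
    }
    return shared_empty;
}
```

With this shape every caller gets the same object no matter how early it
runs, and a later init function has nothing left to reset.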

--
Added file: http://bugs.python.org/file28826/unicode_globals-2.7_3.patch
Added file: http://bugs.python.org/file28827/unicode_globals-3.2_3.patch
Added file: http://bugs.python.org/file28828/unicode_globals-3.3_3.patch
Added file: http://bugs.python.org/file28829/unicode_globals-3.4_3.patch

diff -r 864b9836dae6 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c   Fri Jan 25 17:55:39 2013 +0100
+++ b/Objects/unicodeobject.c   Fri Jan 25 21:34:01 2013 +0200
@@ -82,8 +82,9 @@
 
 /* --- Globals 
 
-   The globals are initialized by the _PyUnicode_Init() API and should
-   not be used before calling that API.
+NOTE: In the interpreter's initialization phase, some globals are currently
+  initialized dynamically as needed. In the process Unicode objects may
+  be created before the Unicode type is ready.
 
 */
 
@@ -93,15 +94,27 @@
 #endif
 
 /* Free list for Unicode objects */
-static PyUnicodeObject *free_list;
-static int numfree;
+static PyUnicodeObject *free_list = NULL;
+static int numfree = 0;
 
 /* The empty Unicode object is shared to improve performance. */
-static PyUnicodeObject *unicode_empty;
+static PyUnicodeObject *unicode_empty = NULL;
+
+#define _Py_RETURN_UNICODE_EMPTY()  \
+do {\
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+else {  \
+unicode_empty = _PyUnicode_New(0);  \
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+}   \
+return (PyObject *)unicode_empty;   \
+} while (0)
 
 /* Single character Unicode strings in the Latin-1 range are being
shared as well. */
-static PyUnicodeObject *unicode_latin1[256];
+static PyUnicodeObject *unicode_latin1[256] = {NULL};
 
 /* Default encoding to use and assume when NULL is passed as encoding
parameter; it is initialized by _PyUnicode_Init().
@@ -110,7 +123,7 @@
PyUnicode_GetDefaultEncoding() APIs to access this global.
 
 */
-static char unicode_default_encoding[100];
+static char unicode_default_encoding[100 + 1] = "ascii";
 
 /* Fast detection of the most frequent whitespace characters */
 const unsigned char _Py_ascii_whitespace[] = {
@@ -204,7 +217,7 @@
 
 #define BLOOM_MASK unsigned long
 
-static BLOOM_MASK bloom_linebreak;
+static BLOOM_MASK bloom_linebreak = ~(BLOOM_MASK)0;
 
 #define BLOOM_ADD(mask, ch) ((mask |= (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
 #define BLOOM(mask, ch) ((mask &  (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
@@ -448,10 +461,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single character Unicode objects in the Latin-1 range are
shared when using this constructor */
@@ -497,10 +508,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single characters are shared when using this constructor.
Restrict to ASCII, since the input must be UTF-8. */
@@ -1162,13 +1171,10 @@
 }
 
 /* Convert to Unicode */
-if (len == 0) {
-Py_INCREF(unicode_empty);
-v = (PyObject *)unicode_empty;
-}
-else
-v = PyUnicode_Decode(s, len, encoding, errors);
-
+if (len == 0)
+_Py_RETURN_UNICODE_EMPTY();
+
+v = PyUnicode_Decode(s, len, encoding, errors);
 return v;
 
   onError:
@@ -1381,7 +1387,7 @@
 Py_DECREF(v);
 strncpy(unicode_default_encoding,
 encoding,
-sizeof(unicode_default_encoding));
+sizeof(unicode_default_encoding) - 1);
 return 0;
 
   onError:
@@ -8850,8 +8856,6 @@
 
 void _PyUnicode_Init(void)
 {
-int i;
-
 /* XXX - move this array to unicodectype.c ? */
 Py_UNICODE linebreak[] = {
 0x000A, /* LINE FEED */
@@ -8865,15 +8869,12 @@
 };
 

[issue10156] Initialization of globals in unicodeobject.c

2013-01-25 Thread Stefan Krah

Stefan Krah added the comment:

The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
(without the _sre changes :) looks good to me.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-25 Thread Stefan Krah

Stefan Krah added the comment:

Since Rietveld didn't mail me this time: I left some comments on the 2.7 patch.

--
versions: +Python 3.3, Python 3.4




[issue10156] Initialization of globals in unicodeobject.c

2013-01-24 Thread Nick Coghlan

Nick Coghlan added the comment:

Serhiy's general approach here looks good to me (although there seem to be some 
unrelated changes to the re module in the current 3.2 patch).

For PEP 432, I want to try to rearrange things so that _PyUnicode_Init is one 
of the *first* calls made in Py_BeginInitialization (even before the general 
call to Py_ReadyTypes), but that still won't invalidate the work done here.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a set of updated patches.

--
Added file: http://bugs.python.org/file28815/unicode_globals-2.7_2.patch
Added file: http://bugs.python.org/file28816/unicode_globals-3.2_2.patch
Added file: http://bugs.python.org/file28817/unicode_globals-3.3_2.patch
Added file: http://bugs.python.org/file28818/unicode_globals-3.4_2.patch

diff -r 8f2edea69d5d Objects/unicodeobject.c
--- a/Objects/unicodeobject.c   Thu Jan 24 07:28:33 2013 -0800
+++ b/Objects/unicodeobject.c   Thu Jan 24 22:14:14 2013 +0200
@@ -93,15 +93,27 @@
 #endif
 
 /* Free list for Unicode objects */
-static PyUnicodeObject *free_list;
-static int numfree;
+static PyUnicodeObject *free_list = NULL;
+static int numfree = 0;
 
 /* The empty Unicode object is shared to improve performance. */
-static PyUnicodeObject *unicode_empty;
+static PyUnicodeObject *unicode_empty = NULL;
+
+#define _Py_RETURN_UNICODE_EMPTY()  \
+do {\
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+else {  \
+unicode_empty = _PyUnicode_New(0);  \
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+}   \
+return (PyObject *)unicode_empty;   \
+} while (0)
 
 /* Single character Unicode strings in the Latin-1 range are being
shared as well. */
-static PyUnicodeObject *unicode_latin1[256];
+static PyUnicodeObject *unicode_latin1[256] = {NULL};
 
 /* Default encoding to use and assume when NULL is passed as encoding
parameter; it is initialized by _PyUnicode_Init().
@@ -110,7 +122,7 @@
PyUnicode_GetDefaultEncoding() APIs to access this global.
 
 */
-static char unicode_default_encoding[100];
+static char unicode_default_encoding[100 + 1] = "ascii";
 
 /* Fast detection of the most frequent whitespace characters */
 const unsigned char _Py_ascii_whitespace[] = {
@@ -204,7 +216,7 @@
 
 #define BLOOM_MASK unsigned long
 
-static BLOOM_MASK bloom_linebreak;
+static BLOOM_MASK bloom_linebreak = ~(BLOOM_MASK)0;
 
 #define BLOOM_ADD(mask, ch) ((mask |= (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
 #define BLOOM(mask, ch) ((mask &  (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
@@ -448,10 +460,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single character Unicode objects in the Latin-1 range are
shared when using this constructor */
@@ -497,10 +507,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single characters are shared when using this constructor.
Restrict to ASCII, since the input must be UTF-8. */
@@ -1162,13 +1170,10 @@
 }
 
 /* Convert to Unicode */
-if (len == 0) {
-Py_INCREF(unicode_empty);
-v = (PyObject *)unicode_empty;
-}
-else
-v = PyUnicode_Decode(s, len, encoding, errors);
-
+if (len == 0)
+_Py_RETURN_UNICODE_EMPTY();
+
+v = PyUnicode_Decode(s, len, encoding, errors);
 return v;
 
   onError:
@@ -1381,7 +1386,7 @@
 Py_DECREF(v);
 strncpy(unicode_default_encoding,
 encoding,
-sizeof(unicode_default_encoding));
+sizeof(unicode_default_encoding) - 1);
 return 0;
 
   onError:
@@ -8850,8 +8855,6 @@
 
 void _PyUnicode_Init(void)
 {
-int i;
-
 /* XXX - move this array to unicodectype.c ? */
 Py_UNICODE linebreak[] = {
 0x000A, /* LINE FEED */
@@ -8865,15 +8868,10 @@
 };
 
 /* Init the implementation */
-free_list = NULL;
-numfree = 0;
 unicode_empty = _PyUnicode_New(0);
 if (!unicode_empty)
 return;
 
-strcpy(unicode_default_encoding, "ascii");
-for (i = 0; i < 256; i++)
-unicode_latin1[i] = NULL;
 if (PyType_Ready(&PyUnicode_Type) < 0)
 Py_FatalError("Can't initialize 'unicode'");
 
@@ -8918,15 +8916,11 @@
 {
 int i;
 
-Py_XDECREF(unicode_empty);
-unicode_empty = NULL;
-
-for (i = 0; i < 256; i++) {
-if (unicode_latin1[i]) {
-Py_DECREF(unicode_latin1[i]);
-unicode_latin1[i] = NULL;
-}
-}
+Py_CLEAR(unicode_empty);
+
+for (i = 0; i < 256; i++)
+Py_CLEAR(unicode_latin

[issue10156] Initialization of globals in unicodeobject.c

2013-01-09 Thread Stefan Krah

Stefan Krah added the comment:

Nick Coghlan wrote:
> There should still be a check in tp_new (IIRC) that calls PyType_Ready on
> unready types.

Indeed there is one in type_new(), but that isn't used here AFAICS. If
you apply this patch and start up python, there are many "str: not ready"
instances:

diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -14282,6 +14282,10 @@
 PyUnicode_InternFromString(const char *cp)
 {
 PyObject *s = PyUnicode_FromString(cp);
+
+fprintf(stderr, "%s: %s\n", PyUnicode_Type.tp_name,
+   (PyUnicode_Type.tp_flags & Py_TPFLAGS_READY) ? "ready" : "not ready");
+
 if (s == NULL)
 return NULL;
 PyUnicode_InternInPlace(&s);

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-09 Thread Nick Coghlan

Nick Coghlan added the comment:

There should still be a check in tp_new (IIRC) that calls PyType_Ready on
unready types.

While doing something systematic about this kind of problem is part of the
rationale of PEP 432, that won't help earlier versions.
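
A "make the type ready on demand" guard of the kind mentioned above could
look roughly like this in isolation. This is only a sketch under stated
assumptions: Type, TYPE_READY, type_ready and ensure_ready are hypothetical
names that mimic the roles of a type object, Py_TPFLAGS_READY and
PyType_Ready(); it is not the code in type_new().

```c
#include <assert.h>
#include <stddef.h>

#define TYPE_READY 0x1u   /* hypothetical flag in the role of Py_TPFLAGS_READY */

typedef struct {
    const char *name;
    unsigned flags;
    int ready_calls;      /* counts how often the one-time setup ran */
} Type;

/* Stand-in for PyType_Ready(): one-time setup, returns 0 on success. */
static int type_ready(Type *t)
{
    t->ready_calls++;
    t->flags |= TYPE_READY;
    return 0;
}

/* Guard for a constructor: make the type ready if nobody has done so yet. */
static int ensure_ready(Type *t)
{
    assert(t != NULL);
    if (t->flags & TYPE_READY)
        return 0;         /* already set up, nothing to do */
    return type_ready(t);
}
```

The guard only helps on code paths that actually call it, which is the gap
a constructor that bypasses it would leave open.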

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-09 Thread Stefan Krah

Stefan Krah added the comment:

Nick, I'm adding you to the nosy list since this issue seems related to PEP 432.

Quick summary: Globals are used in unicodeobject.c before they are initialized.
Also, Unicode objects are created before PyType_Ready(&PyUnicode_Type) has been
called.


This happens during startup:


_Py_InitializeEx_Private():

  _Py_ReadyTypes():

    PyType_Ready(&PyType_Type);

    [...]

    Many Unicode objects like "" or "__add__" are created. Uninitialized
    globals have led to a crash (#16143). This is fixed by Serhiy's patch,
    which always dynamically checks all globals for NULL before using them.
    However, Unicode objects are still created at this point.

    [...]

    PyType_Ready(&PyUnicode_Type); /* Called for the first time */

    [...]

  _PyUnicode_Init:

    for (i = 0; i < 256; i++)       /* Could leak if latin1 strings */
        unicode_latin1[i] = NULL;   /* have already been created. */

    PyType_Ready(&PyUnicode_Type);  /* Called a second time! */


So, considering PEP 432:  Are these "pre-type-ready" Unicode objects
safe to use, or should something be done about it?

--
nosy: +ncoghlan




[issue10156] Initialization of globals in unicodeobject.c

2013-01-07 Thread Stefan Krah

Changes by Stefan Krah:


--
priority: high -> critical




[issue10156] Initialization of globals in unicodeobject.c

2013-01-07 Thread Stefan Krah

Changes by Stefan Krah:


--
nosy: +Gregory.Andersen, georg.brandl, kushou, pitrou




[issue10156] Initialization of globals in unicodeobject.c

2013-01-07 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
stage: commit review -> patch review




[issue10156] Initialization of globals in unicodeobject.c

2013-01-07 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here are patches for all four Python versions. They fix possible use of the
following uninitialized global variables: free_list, numfree, interned,
unicode_empty, static_strings, unicode_latin1, bloom_linebreak,
unicode_default_encoding.
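
As a rough standalone illustration of why static initializers help for the
globals whose useful default is not zero: a Bloom-style mask that starts
with all bits set can never wrongly reject a character before the real mask
is built, and a buffer with one spare byte stays NUL-terminated under
strncpy(). Every name below (mask_t, linebreak_mask, default_encoding,
build_mask and so on) is a hypothetical stand-in, and the two-character
line break set is deliberately tiny.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

typedef unsigned long mask_t;
#define MASK_BITS (sizeof(mask_t) * CHAR_BIT)
#define MASK_BIT(ch) ((mask_t)1 << ((ch) % MASK_BITS))

/* All bits set: until the real mask is built, every character "may match",
   so callers always fall through to the exact check. */
static mask_t linebreak_mask = ~(mask_t)0;

/* Usable default before any setter has run; one spare byte for the NUL. */
static char default_encoding[100 + 1] = "ascii";

/* Exact (slow path) test; only LF and CR in this sketch. */
static int is_linebreak_exact(unsigned ch)
{
    return ch == 0x0A || ch == 0x0D;
}

/* Fast pre-filter followed by the exact test. */
static int is_linebreak(unsigned ch)
{
    if (!(linebreak_mask & MASK_BIT(ch)))
        return 0;                 /* definitely not a line break */
    return is_linebreak_exact(ch);
}

/* Build the real mask; may run arbitrarily late without breaking callers. */
static void build_mask(void)
{
    linebreak_mask = 0;
    linebreak_mask |= MASK_BIT(0x0Au);
    linebreak_mask |= MASK_BIT(0x0Du);
}

static void set_default_encoding(const char *name)
{
    /* Copy at most sizeof - 1 bytes; the last byte is never written and
       stays '\0' from static initialization, so the result is terminated. */
    strncpy(default_encoding, name, sizeof(default_encoding) - 1);
}
```

Note that static objects without an initializer already start out as zero
or NULL in C, so the explicit "= NULL" forms mainly document intent; the
non-zero defaults are the ones that change behavior.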

--
Added file: http://bugs.python.org/file28607/unicode_globals-2.7.patch
Added file: http://bugs.python.org/file28608/unicode_globals-3.2.patch
Added file: http://bugs.python.org/file28609/unicode_globals-3.3.patch
Added file: http://bugs.python.org/file28610/unicode_globals-3.4.patch

diff -r 0f24c65fb7e5 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c   Sat Jan 05 07:37:47 2013 +0200
+++ b/Objects/unicodeobject.c   Mon Jan 07 13:26:16 2013 +0200
@@ -93,15 +93,26 @@
 #endif
 
 /* Free list for Unicode objects */
-static PyUnicodeObject *free_list;
-static int numfree;
+static PyUnicodeObject *free_list = NULL;
+static int numfree = 0;
 
 /* The empty Unicode object is shared to improve performance. */
-static PyUnicodeObject *unicode_empty;
+static PyUnicodeObject *unicode_empty = NULL;
+
+#define _Py_RETURN_UNICODE_EMPTY()  do {\
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+else {  \
+unicode_empty = _PyUnicode_New(0);  \
+if (unicode_empty != NULL)  \
+Py_INCREF(unicode_empty);   \
+}   \
+return (PyObject *)unicode_empty;   \
+} while (0)
 
 /* Single character Unicode strings in the Latin-1 range are being
shared as well. */
-static PyUnicodeObject *unicode_latin1[256];
+static PyUnicodeObject *unicode_latin1[256] = {NULL};
 
 /* Default encoding to use and assume when NULL is passed as encoding
parameter; it is initialized by _PyUnicode_Init().
@@ -110,7 +121,7 @@
PyUnicode_GetDefaultEncoding() APIs to access this global.
 
 */
-static char unicode_default_encoding[100];
+static char unicode_default_encoding[100 + 1] = "ascii";
 
 /* Fast detection of the most frequent whitespace characters */
 const unsigned char _Py_ascii_whitespace[] = {
@@ -204,7 +215,7 @@
 
 #define BLOOM_MASK unsigned long
 
-static BLOOM_MASK bloom_linebreak;
+static BLOOM_MASK bloom_linebreak = ~(BLOOM_MASK)0;
 
 #define BLOOM_ADD(mask, ch) ((mask |= (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
 #define BLOOM(mask, ch) ((mask &  (1UL << ((ch) & (BLOOM_WIDTH - 1)))))
@@ -448,10 +459,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single character Unicode objects in the Latin-1 range are
shared when using this constructor */
@@ -497,10 +506,8 @@
 if (u != NULL) {
 
 /* Optimization for empty strings */
-if (size == 0 && unicode_empty != NULL) {
-Py_INCREF(unicode_empty);
-return (PyObject *)unicode_empty;
-}
+if (size == 0)
+_Py_RETURN_UNICODE_EMPTY();
 
 /* Single characters are shared when using this constructor.
Restrict to ASCII, since the input must be UTF-8. */
@@ -1162,13 +1169,10 @@
 }
 
 /* Convert to Unicode */
-if (len == 0) {
-Py_INCREF(unicode_empty);
-v = (PyObject *)unicode_empty;
-}
-else
-v = PyUnicode_Decode(s, len, encoding, errors);
-
+if (len == 0)
+_Py_RETURN_UNICODE_EMPTY();
+
+v = PyUnicode_Decode(s, len, encoding, errors);
 return v;
 
   onError:
@@ -1381,7 +1385,7 @@
 Py_DECREF(v);
 strncpy(unicode_default_encoding,
 encoding,
-sizeof(unicode_default_encoding));
+sizeof(unicode_default_encoding) - 1);
 return 0;
 
   onError:
@@ -8838,8 +8842,6 @@
 
 void _PyUnicode_Init(void)
 {
-int i;
-
 /* XXX - move this array to unicodectype.c ? */
 Py_UNICODE linebreak[] = {
 0x000A, /* LINE FEED */
@@ -8853,15 +8855,10 @@
 };
 
 /* Init the implementation */
-free_list = NULL;
-numfree = 0;
 unicode_empty = _PyUnicode_New(0);
 if (!unicode_empty)
 return;
 
-strcpy(unicode_default_encoding, "ascii");
-for (i = 0; i < 256; i++)
-unicode_latin1[i] = NULL;
 if (PyType_Ready(&PyUnicode_Type) < 0)
 Py_FatalError("Can't initialize 'unicode'");
 
@@ -8906,15 +8903,11 @@
 {
 int i;
 
-Py_XDECREF(unicode_empty);
-unicode_empty = NULL;
-
-for (i = 0; i < 256; i++) {
-if (unicode_latin1[i]) {
-Py_DECREF(unicode_latin1[i]);
-  

[issue10156] Initialization of globals in unicodeobject.c

2013-01-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> unicode-leak.patch doesn't fix #16143 though. unicode_empty and
> unicode_latin1 need to be initialized, too.

Indeed. I'll upload patches tomorrow.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-06 Thread Stefan Krah

Stefan Krah added the comment:

unicode-leak.patch doesn't fix #16143 though. unicode_empty and
unicode_latin1 need to be initialized, too.

Actually we could close this in favor of #16143.

--




[issue10156] Initialization of globals in unicodeobject.c

2013-01-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Daniel's patch looks good to me. 2.7 looks affected too.

--
nosy: +serhiy.storchaka
stage: patch review -> commit review
versions: +Python 2.7 -Python 3.3, Python 3.4




[issue10156] Initialization of globals in unicodeobject.c

2013-01-04 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis:


--
nosy: +Arfrever




[issue10156] Initialization of globals in unicodeobject.c

2012-10-05 Thread Stefan Krah

Stefan Krah added the comment:

See also #16143.

--
versions: +Python 3.3, Python 3.4




[issue10156] Initialization of globals in unicodeobject.c

2012-04-20 Thread Mark Dickinson

Changes by Mark Dickinson:


--
nosy:  -mark.dickinson




[issue10156] Initialization of globals in unicodeobject.c

2011-10-01 Thread Stefan Krah

Stefan Krah added the comment:

The PEP-393 changes apparently fix this leak; at least I can't reproduce
it in default any longer (but still in 3.2).

--




[issue10156] Initialization of globals in unicodeobject.c

2011-04-11 Thread Stefan Krah

Stefan Krah added the comment:

Stefan Krah wrote:
> Is the module initialization procedure documented somewhere? I get
> the impression that unicodeobject.c depends on dict.c and dict.c
> depends on unicodeobject.c.

s/dict.c/dictobject.c/

--




[issue10156] Initialization of globals in unicodeobject.c

2011-04-11 Thread Stefan Krah

Stefan Krah added the comment:

[Merging with issue 11402]

Daniel's patch is much simpler, but I think that unicode_empty and
unicode_latin1 would need to be protected before _PyUnicode_Init
is called.

Is the module initialization procedure documented somewhere? I get
the impression that unicodeobject.c depends on dict.c and dict.c
depends on unicodeobject.c.

--
nosy: +stutzbach
Added file: http://bugs.python.org/file21611/unicode-leak.patch




[issue10156] Initialization of globals in unicodeobject.c

2010-10-24 Thread Stefan Krah

Stefan Krah added the comment:

> why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?


For the embedding scenario (when only Py_Initialize() is called) I wanted
to preserve the old behavior of _PyUnicode_Init().

But this is not really enough. I wrote a new patch that also calls 
_PyUnicode_InitGlobals() at the beginning of Py_Initialize().


I don't like the fact that even more clutter is added to Py_Main(). Perhaps
Py_Initialize() could be moved up or the Unicode functions could be moved
down.
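
To make the "call it from more than one place" idea concrete, here is a
small standalone sketch of an init helper that is safe to call repeatedly
because only the first call does any work. It is not the patch itself:
init_globals, globals_ready, cache and the two entry points are hypothetical
names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

static int globals_ready = 0;   /* static storage: 0 before any code runs */
static void *cache[256];        /* likewise all NULL at program start */
static int init_runs = 0;       /* how many times real work was done */

/* Safe to call from several entry points: later calls are no-ops, so
   state that was populated in between is never wiped. */
static void init_globals(void)
{
    size_t i;

    if (globals_ready)
        return;
    for (i = 0; i < sizeof cache / sizeof cache[0]; i++)
        cache[i] = NULL;
    init_runs++;
    globals_ready = 1;
}

/* Two callers standing in for an early and a late initialization path. */
static void early_entry_point(void) { init_globals(); }
static void late_entry_point(void)  { init_globals(); }
```

The cost is one extra flag check per call; the benefit is that the order of
the callers stops mattering.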

--
Added file: http://bugs.python.org/file19351/unicode_init_globals2.patch




[issue10156] Initialization of globals in unicodeobject.c

2010-10-22 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

About the patch: why should _PyUnicode_Init() try to call 
_PyUnicode_InitGlobals() again?

--




[issue10156] Initialization of globals in unicodeobject.c

2010-10-22 Thread Stefan Krah

Stefan Krah added the comment:

I've verified the leak manually. The cause is that global variables in
unicodeobject.c, e.g. free_list, are used before _PyUnicode_Init() is
called. Later on _PyUnicode_Init() sets these variables to NULL, losing
the allocated memory.
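
The mechanism can be reproduced in a few lines of standalone C. This is a
simplified sketch, not the unicodeobject.c code: Node, node_release and
late_init_unconditional are hypothetical names, and the free list holds
plain malloc'ed blocks.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { struct Node *next; } Node;

static Node *free_list;   /* zero-initialized by the C runtime */
static int numfree;

/* "Deallocate" by parking the block on the free list for later reuse. */
static void node_release(Node *n)
{
    assert(n != NULL);
    n->next = free_list;
    free_list = n;
    numfree++;
}

/* The problematic pattern: unconditionally resetting state that may
   already be in use. Returns how many parked blocks became unreachable
   through the list. */
static int late_init_unconditional(void)
{
    int lost = numfree;

    free_list = NULL;     /* blocks on the old list can no longer be found */
    numfree = 0;
    return lost;
}
```

If any block was released before the late init runs, the return value is
nonzero, which is exactly the memory Valgrind reports as lost.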

Here is an example of the earliest use of free_list during _Py_ReadyTypes(),
well before _PyUnicode_Init():


Breakpoint 1, unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
392 switch (PyUnicode_CHECK_INTERNED(unicode)) {
(gdb) bt
#0  unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
#1  0x0044fc69 in PyUnicode_InternInPlace (p=0x7fff303852b8) at Objects/unicodeobject.c:9991
#2  0x0044fed3 in PyUnicode_InternFromString (cp=0x568861 "__len__") at Objects/unicodeobject.c:10025
#3  0x004344d0 in init_slotdefs () at Objects/typeobject.c:5751
#4  0x00434840 in add_operators (type=0x7be260) at Objects/typeobject.c:5905
#5  0x0042eec8 in PyType_Ready (type=0x7be260) at Objects/typeobject.c:3810
#6  0x0042edfc in PyType_Ready (type=0x7bde60) at Objects/typeobject.c:3774
#7  0x0041aa5f in _Py_ReadyTypes () at Objects/object.c:1514
#8  0x004992ff in Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:232
#9  0x0049957f in Py_Initialize () at Python/pythonrun.c:321
#10 0x004b289f in Py_Main (argc=1, argv=0x1afa010) at Modules/main.c:590
#11 0x00417dcc in main (argc=1, argv=0x7fff30385758) at ./Modules/python.c:59
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&
(gdb) 
414 if (unicode->length >= KEEPALIVE_SIZE_LIMIT) {
(gdb) 
419 if (unicode->defenc) {
(gdb) 
423 *(PyUnicodeObject **)unicode = free_list;
(gdb) n
424 free_list = unicode;
(gdb) n
425 numfree++;
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&


A possible fix could be to initialize the globals right at the start
in main.c. Note that there are still several Unicode API functions in
main.c before PyType_Ready has been called on the Unicode type.


With the patch, Valgrind does not show the leak any longer.

--
keywords: +patch
priority: normal -> high
stage:  -> patch review
title: Memory leak (r70459) -> Initialization of globals in unicodeobject.c
Added file: http://bugs.python.org/file19336/unicode_init_globals.patch
