Control: forwarded -1 https://github.com/python/cpython/issues/92132

Hi,

On Fri, 29 Apr 2022 21:00:52 -0700 Keith Amling <m...@amling2.org> wrote:
> From skimming some of cpython's "marshal" code [1] my best guess is that
> first difference is between it thinking the `_m` string might have
> another reference to it (and thus adding 0x80, or FLAG_REF to it) or
> not.  This seems driven by whether or not python's object for the string
> has other references (it calls Py_REFCNT(v) to decide, see line 302).
> 
> I assume the difference is whether or not python has bothered to collect
> some other reference to the string or not.  Type "Z" is an interned
> string type, TYPE_SHORT_ASCII_INTERNED, which therefore makes sense that
> it might be shared with who knows what else.  I'm assuming this stops
> reproducing when you change it to a unique name since no one else will share
> the reference and you'll just deterministically get no FLAG_REF.

thank you! It was indeed about that line and there exists a pull request
upstream that fixes this issue:

https://github.com/python/cpython/pull/8226

Specifically, the following patch to python3.10 in Debian seems to solve this.
I also attached a full debdiff for your convenience. Thanks!

cheers, josch

>From 6c8ea7c1dacd42f3ba00440231ec0e6b1a38300d Mon Sep 17 00:00:00 2001
From: Inada Naoki <songofaca...@gmail.com>
Date: Sat, 14 Jul 2018 00:46:11 +0900
Subject: [PATCH] Use FLAG_REF always for interned strings

---
 Python/marshal.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/Python/marshal.c
+++ b/Python/marshal.c
@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
     if (p->version < 3 || p->hashtable == NULL)
         return 0; /* not writing object references */

-    /* if it has only one reference, it definitely isn't shared */
-    if (Py_REFCNT(v) == 1)
+    /* If it has only one reference, it definitely isn't shared.
+     * But we use TYPE_REF always for interned string, to PYC file stable
+     * as possible.
+     */
+    if (Py_REFCNT(v) == 1 &&
+            !(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
         return 0;
+    }

     entry = _Py_hashtable_get_entry(p->hashtable, v);
     if (entry != NULL) {
diff -Nru python3.10-3.10.4/debian/changelog python3.10-3.10.4/debian/changelog
--- python3.10-3.10.4/debian/changelog	2022-04-02 11:04:19.000000000 +0200
+++ python3.10-3.10.4/debian/changelog	2022-05-02 13:25:08.000000000 +0200
@@ -1,3 +1,10 @@
+python3.10 (3.10.4-3.1) UNRELEASED; urgency=medium
+
+  * Non-maintainer upload.
+  * Add patch to set FLAG_REF always for interned strings. Closes: #1010368
+
+ -- Johannes Schauer Marin Rodrigues <jo...@debian.org>  Mon, 02 May 2022 13:25:08 +0200
+
 python3.10 (3.10.4-3) unstable; urgency=medium
 
   * Build a python3.10-nopie package, diverting the python3.10
diff -Nru python3.10-3.10.4/debian/patches/series python3.10-3.10.4/debian/patches/series
--- python3.10-3.10.4/debian/patches/series	2022-04-02 11:04:19.000000000 +0200
+++ python3.10-3.10.4/debian/patches/series	2022-05-02 13:24:24.000000000 +0200
@@ -39,3 +39,4 @@
 fix-py_compile.diff
 reproducible-pyc.diff
 fix-ia64.diff
+use-FLAG_REF-always-for-interned-strings.diff
diff -Nru python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff
--- python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff	1970-01-01 01:00:00.000000000 +0100
+++ python3.10-3.10.4/debian/patches/use-FLAG_REF-always-for-interned-strings.diff	2022-05-02 13:25:07.000000000 +0200
@@ -0,0 +1,37 @@
+From 5fe4c1442ded6bdc4c724935d118e996fa022eac Mon Sep 17 00:00:00 2001
+From: Inada Naoki <songofaca...@gmail.com>
+Date: Sat, 14 Jul 2018 00:46:11 +0900
+Subject: [PATCH] Use FLAG_REF always for interned strings
+
+https://bugs.debian.org/1010368
+https://github.com/python/cpython/pull/8226
+https://github.com/python/cpython/issues/92132
+
+---
+ Python/marshal.c | 9 +++++++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/Python/marshal.c b/Python/marshal.c
+index 4125240..341c9aa 100644
+--- a/Python/marshal.c
++++ b/Python/marshal.c
+@@ -298,9 +298,14 @@ w_ref(PyObject *v, char *flag, WFILE *p)
+     if (p->version < 3 || p->hashtable == NULL)
+         return 0; /* not writing object references */
+ 
+-    /* if it has only one reference, it definitely isn't shared */
+-    if (Py_REFCNT(v) == 1)
++    /* If it has only one reference, it definitely isn't shared.
++     * But we use TYPE_REF always for interned string, to PYC file stable
++     * as possible.
++     */
++    if (Py_REFCNT(v) == 1 &&
++            !(PyUnicode_CheckExact(v) && PyUnicode_CHECK_INTERNED(v))) {
+         return 0;
++    }
+ 
+     entry = _Py_hashtable_get_entry(p->hashtable, v);
+     if (entry != NULL) {
+-- 
+2.35.1
+

Attachment: signature.asc
Description: signature

Reply via email to