[ https://issues.apache.org/jira/browse/PROTON-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923771#comment-17923771 ]
ASF GitHub Bot commented on PROTON-2531:
----------------------------------------

pjfawcett commented on PR #438:
URL: https://github.com/apache/qpid-proton/pull/438#issuecomment-2634465212

I share your indecision. The current usage of this binding by user code is unknown, so we can't be certain what a single "right" solution would be.

I put the 'expected exception' in the unit test to highlight the issue with encoding with Latin-1.

I decided to go back and dig into the behaviour of version 0.38.0, which uses SWIG instead of CFFI. After picking through the code, including the SWIG-generated C wrapper code, I ascertained that:

1. When setting the tag bytes from a Python string, [PyUnicode_AsUTF8AndSize](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize) is called, which has the same behaviour as `encode("utf-8", errors="strict")`
2. When reading the tag bytes into a Python string, [PyUnicode_DecodeUTF8](https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_DecodeUTF8) is called with `errors="surrogateescape"`, which has the same behaviour as `decode("utf-8", errors="surrogateescape")`

This combination allows Unicode strings to round-trip when going string -> tag -> string, and also accepts non-UTF-8 byte sequences in incoming tags. (Although the resultant strings from such tags would cause exceptions if used as a new tag, since the encoding "fails if the string contains surrogate code points".)

So, in order to maintain backwards compatibility, and to fix the problem I encountered with some tags causing exceptions when read, I propose to implement the encoding/decoding using the parameters outlined above.

Having said that, it is annoying that one can't easily get to the raw byte values for the tags. The current implementation, 0.40.0, allows the tag to be set from a `bytes` value and so bypasses the codecs. I think it would be good to have more direct access to the incoming tag bytes.
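As a rough sketch of the codec behaviour described above (an illustration in plain Python only, not the binding's actual code; the helper names `tag_to_bytes` and `bytes_to_tag` are hypothetical), the two settings behave like this:

```python
def tag_to_bytes(tag: str) -> bytes:
    # Hypothetical helper: strict UTF-8 encoding, as when setting a tag from a str.
    # Raises UnicodeEncodeError if the string contains surrogate code points.
    return tag.encode("utf-8", errors="strict")


def bytes_to_tag(raw: bytes) -> str:
    # Hypothetical helper: lenient decoding, as when reading a tag into a str.
    # Bytes that are not valid UTF-8 become lone surrogates instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")


# string -> tag -> string round-trips for ordinary Unicode text
text = "délivery-✓"
assert bytes_to_tag(tag_to_bytes(text)) == text

# A non-UTF-8 incoming tag is accepted when read...
raw = b"\xfftag"
as_str = bytes_to_tag(raw)

# ...and the original bytes are still recoverable with the same error handler
assert as_str.encode("utf-8", errors="surrogateescape") == raw

# ...but the resulting string cannot be reused as a new tag under strict encoding
try:
    tag_to_bytes(as_str)
except UnicodeEncodeError:
    reusable = False
else:
    reusable = True
assert reusable is False
```

The last step shows why a string obtained from a non-UTF-8 tag is awkward to work with: the raw bytes can only be recovered by encoding with `surrogateescape` explicitly.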
Could I suggest that a `btag` property be added to the `Delivery` class, alongside the existing `tag` property, to get the tag as `bytes`? I'm happy to do this, and to raise a separate ticket / PR if required.

> Delivery tag is str while it should be bytes
> --------------------------------------------
>
>                 Key: PROTON-2531
>                 URL: https://issues.apache.org/jira/browse/PROTON-2531
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: python-binding
>            Reporter: Ievgen Popovych
>            Assignee: Pete Fawcett
>            Priority: Major
>
> According to AMQP standard delivery tag is ??up to 32 octets of binary
> data??. Proton C library also has it in binary format.
> But in the Python binding {{Delivery.tag}} is a string, which causes issues
> when trying to use it (i.e. print/visualize).
> As far as I understand this is down to Swig {{python/cproton.i}}
> {{wrap_pn_delivery_tag}} (since typemap for {{pn_delivery_tag_t}} seems to be
> correct)?