Some sources, in case they help:

Message.get() calls policy.header_fetch_parse:
https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/message.py#L471
Compat32.header_fetch_parse calls self._sanitize_header:
https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/_policybase.py#L311
_sanitize_header calls _has_surrogates:
https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/_policybase.py#L287
_has_surrogates check:
https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/utils.py#L51
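
The chain above can be exercised directly, without waiting for a real
message to turn up. This is only a minimal sketch: it assumes the default
compat32 policy (what email.message_from_bytes uses when no policy is
given), and the RAW message with a stray 0xE9 byte in the Subject is made
up for illustration.

```python
import email
from email.header import Header

# Hypothetical message: the Subject holds a raw 0xE9 byte that is neither
# ASCII nor RFC 2047 encoded.
RAW = b"Subject: caf\xe9 offer\n\nbody\n"

msg = email.message_from_bytes(RAW)  # no policy given, so compat32
subject = msg["subject"]
print(type(subject))  # a Header instance here, not a str

# A membership test against the Header object fails ...
try:
    "offer" in subject
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# ... while the same test against str(subject) works.
as_text = str(subject)
found = "offer" in as_text
print(repr(as_text), found)
```

The undecodable byte does not survive the str() call intact, so a search
term that itself contains non-ASCII characters may still fail to match.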


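The test mbox suggested in the quoted message below could look roughly
like this. It is a sketch under assumptions: both messages and the helper
name odd_subjects are invented for illustration, and the mbox lives in a
temporary directory. The helper reports only subjects that are not plain
str, so it would not produce a log line for every message.

```python
import mailbox
import os
import tempfile

# Two hypothetical messages: one plain ASCII subject, one with a raw 0xE9 byte.
MESSAGES = [
    b"From: a@example.com\nSubject: plain offer\n\nbody one\n",
    b"From: b@example.com\nSubject: caf\xe9 offer\n\nbody two\n",
]


def odd_subjects(path):
    """Return (key, type name, text) for every Subject that is not a str."""
    box = mailbox.mbox(path)
    try:
        return [
            (key, type(msg["subject"]).__name__, str(msg["subject"]))
            for key, msg in box.items()
            if not isinstance(msg["subject"], str)
        ]
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.mbox")
    box = mailbox.mbox(path)  # created on first use
    for raw in MESSAGES:
        box.add(raw)
    box.close()  # close() also flushes pending writes
    report = odd_subjects(path)

print(report)
```

A message with no Subject at all would also be reported, since
msg["subject"] is None in that case.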

On Wed, Feb 17, 2021 at 5:42 PM Stestagg <stest...@gmail.com> wrote:

> I don't particularly like to encourage this shotgun help request because,
> as the previous commenter suggests, debugging this yourself is best.
>
> Sometimes debugging is super hard, and especially so when uncommon
> situations occur, but it's always far easier to debug things when you have
> visibility into the system under test.
>
> However, in this case, the email code is super complex, and this scenario
> also looks very uncommon, but not unique: (
> https://github.com/Sydius/mbox-to-txt/issues/2), so you successfully
> nerd-sniped me :).
>
> My *guess*, from reading the Python standard library source code, is that
> you came across an email with some content in the subject line that is
> considered a "surrogate", roughly, some badly encoded unicode or binary
> data in it.
>
> When this happens, the code in some situations (depending on the
> policy...) may return a header.Header() instance, rather than a
> headerregistry.UniqueUnstructuredHeader (which would have had a
> headerregistry.BaseHeader (mro: str) dynamically attached).
>
> header.Header() does not inherit from str, and thus would throw the
> traceback you observed.
>
> Your suggestion of a try: except: may make sense; alternatively, you could
> wrap the result in a call to str():
>
> if sbstrip in str(msghdr["subject"]):
>
> which should attempt to decode the raw bytes into some form of string
> object for comparison (I haven't checked exactly what would happen,
> beyond: it tries).
>
> It should be possible to create a test mbox with some funky bytes in the
> subject, and try to reproduce it that way.
>
> Steve
>
>
> On Wed, Feb 17, 2021 at 5:07 PM Chris Green <c...@isbd.net> wrote:
>
>> Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>> > Chris Green <c...@isbd.net> writes:
>> > >But msghdr["subject"] is surely just a string isn't it?  Why is it
>> > >complaining about something of type 'Header'?
>> >
>> >   What would you do to debug-print the type of an object?
>> >
>> I don't know, what would I do?  :-)
>>
>> Without knowing what provokes the problem I could be waiting for days
>> or weeks even before I see the error again.  As I'd need to print the
>> type for every message I'd get some big logs or I'd need to add a try:
>> except to trap the specific error.
>>
>> --
>> Chris Green
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
