Some sources, in case they help:

Message.get() calls policy.header_fetch_parse
( https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/message.py#L471 )

Compat32.header_fetch_parse calls self._sanitize_header
( https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/_policybase.py#L311 )

_sanitize_header calls _has_surrogates
( https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/_policybase.py#L287 )

_has_surrogates check:
https://github.com/python/cpython/blob/cd80f430daa7dfe7feeb431ed34f88db5f64aa30/Lib/email/utils.py#L51
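
For what it's worth, here is a minimal sketch of how that call chain could be reproduced without waiting for a real message to trigger it. It assumes the default compat32 policy, and the message bytes (a subject with one raw, undeclared 0xE9 byte) and the helper name subject_contains are made up for illustration; they are not from the original script.

```python
import email
from email.header import Header

# Hypothetical test message: the subject carries one raw 8-bit byte (0xE9)
# with no RFC 2047 encoding, i.e. the "funky bytes" case.
RAW = b"Subject: caf\xe9 news\n\nbody\n"


def subject_contains(msg, needle):
    """Substring test that works whether the header is a str or a Header."""
    return needle in str(msg["subject"])


# message_from_bytes uses the compat32 policy unless told otherwise.
msg = email.message_from_bytes(RAW)
subject = msg["subject"]

# Debug-print the type; with compat32 this is expected to be a Header,
# not a str, because the parsed value contains surrogate escapes.
print(type(subject))

try:
    "news" in subject  # Header is not a str, so "in" is not supported
except TypeError as exc:
    print("plain 'in' failed:", exc)

# Wrapping the header in str() makes the comparison safe.
print(subject_contains(msg, "news"))
```

The str() wrapper is the smallest change to the original comparison; catching TypeError around the comparison would also work, but it would skip the message rather than search its subject.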

On Wed, Feb 17, 2021 at 5:42 PM Stestagg <stest...@gmail.com> wrote:

> I don't particularly like to encourage this shotgun help request because,
> as previous commenter suggests, debugging this yourself is best.
>
> Sometimes debugging is super hard, and especially so when uncommon
> situations occur, but it's always far easier to debug things when you have
> visibility into the system under test.
>
> However, in this case, the email code is super complex, and this scenario
> also looks very uncommon, but not unique:
> (https://github.com/Sydius/mbox-to-txt/issues/2), so you successfully
> nerd-sniped me :).
>
> My *guess*, from reading the python standard library source code is that
> you came across an email with some content in the subject line that is
> considered a "surrogate", roughly, some badly encoded unicode or binary
> data in it.
>
> When this happens, the code in some situations (depending on the
> policy...) may return a header.Header() instance, rather than a
> headerregistry.UniqueUnstructuredHeader (which would have had a
> headerregistry.BaseHeader (mro: str) dynamically attached).
>
> header.Header() does not inherit from str, and thus would throw the
> traceback you observed.
>
> Your suggestion of a try: catch: may make sense, alternately, you could
> wrap the result in a call to str():
>
>     if sbstrip in str(msghdr["subject"]):
>
> which should attempt to encode the binary into some form of string object
> for comparison (I haven't checked exactly what would happen, beyond: it
> tries).
>
> It should be possible to create a test mbox with some funky bytes in the
> subject, and try to reproduce it that way.
>
> Steve
>
>
> On Wed, Feb 17, 2021 at 5:07 PM Chris Green <c...@isbd.net> wrote:
>
>> Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>> > Chris Green <c...@isbd.net> writes:
>> > >But msghdr["subject"] is surely just a string isn't it? Why is it
>> > >complaining about something of type 'Header'?
>> >
>> > What would you do to debug-print the type of an object?
>> >
>> I don't know, what would I do? :-)
>>
>> Without knowing what provokes the problem I could be waiting for days
>> or weeks even before I see the error again. As I'd need to print the
>> type for every message I'd get some big logs or I'd need to add a try:
>> except to trap the specific error.
>>
>> --
>> Chris Green
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
--
https://mail.python.org/mailman/listinfo/python-list