Hi Paul,

How are you? I hope you have had a chance to look at the issue that we reported. To reiterate the summary of the problem:

There are some transport headers starting with “ARC-Seal: ”. These transport headers also contain the To, CC and BCC addresses, with both the display names and the corresponding email addresses. However, `readpst` discards these transport headers when it creates the EML file with MIME content, so in the final MIME content we get only the display names for all the To, CC and BCC addresses. It is possible that readpst is instead using the canonical properties to extract the To, CC and BCC addresses from the PST file.

After looking at readpst.c (see below), we understand that readpst discards any transport headers that do not start with one of the specified strings.

int valid_headers(char *header)
{
    // headers are sometimes really bogus - they seem to be fragments of the
    // message body, so we only use them if they seem to be real rfc822 headers.
    // this list is composed of ones that we have seen in real pst files.
    // there are surely others. the problem is - given an arbitrary character
    // string, is it a valid (or even reasonable) set of rfc822 headers?
    if (header) {
        if (header_match(header, "Content-Type: "                 )) return 1;
        if (header_match(header, "Date: "                         )) return 1;
        if (header_match(header, "From: "                         )) return 1;
        if (header_match(header, "MIME-Version: "                 )) return 1;
        if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
        if (header_match(header, "Received: "                     )) return 1;
        if (header_match(header, "Return-Path: "                  )) return 1;
        if (header_match(header, "Subject: "                      )) return 1;
        if (header_match(header, "To: "                           )) return 1;
        if (header_match(header, "X-ASG-Debug-ID: "               )) return 1;
        if (header_match(header, "X-Barracuda-URL: "              )) return 1;
        if (header_match(header, "X-x: "                          )) return 1;
        if (strlen(header) > 2) {
            DEBUG_INFO(("Ignore bogus headers = %s\n", header));
        }
        return 0;
    }
    else return 0;
}

As per our understanding, ARC headers (which help preserve email authentication results and verify the identity of the email intermediaries that forward a message on to its final destination) were introduced in 2016, and it looks like readpst does not handle them.

We would appreciate it if you could clarify:

1. Is our understanding correct?
2. If yes, can we expect a patch from you?
3. If our understanding is not correct, can we expect a patch with the proper fix, or can you let us know where to fix the problem?
4. Are there any other headers, like ARC, that are not handled?

We look forward to your reply so that we can commit to a date for our customers.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:28 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' <984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Outlook is blocking the PST; please find the zipped PST file.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:27 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' <984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Sorry, it looks like Outlook blocked this PST.

From: Surla, Sai Kalyan
Sent: 08 March 2021 01:45 PM
To: Paul Wise <p...@debian.org>; 984...@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Hi Paul,

Please find the PST, which contains a single email for which we also faced the problem of extracting the email addresses under the ‘To:’ header.

Thank you
Sai Kalyan

From: Paul Wise <p...@debian.org>
Sent: 08 March 2021 07:02 AM
To: Surla, Sai Kalyan <saikalyan.su...@arcserve.com>; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails having ARC headers from PST file

Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +0000, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing this too.

> Please find the changes in the attached file. (readpst.c line no.: 1238)

It is traditional to provide changes in the patch format by using the `diff -u` command or the corresponding commands from the version control system that the upstream project is using. Below is the output from the Mercurial diff for your change.

$ hg diff
diff -r 7200790e46ac src/readpst.c
--- a/src/readpst.c Tue Jun 16 17:18:28 2020 -0700
+++ b/src/readpst.c Mon Mar 08 09:20:50 2021 +0800
@@ -1235,7 +1235,7 @@
 int header_match(char *header, char*field) {
     int n = strlen(field);
-    if (strncasecmp(header, field, n) == 0) return 1; // tag:{space}
+    if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1; // tag:{space}
     if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
         char *crlftab = "\r\n\t";
         DEBUG_INFO(("Possible wrapped header = %s\n", header));

I am fairly certain that this is not the correct fix for this issue.

> ARC headers are kind of email authentication headers.

Thanks for the info.

> For some security reasons we cannot share the original

Understood.

> if possible we will try to share the inhouse sample pst.

That would be necessary to be able to fix the issue.

> Meanwhile our observation is if the headers start with the following
> headers (...) it is treated as bogus, this email is starting with
> some header which is not one of the listed.

That does look like what the code does indeed, probably the right fix is to scan through all of the headers instead of just the first one.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise