Hi Paul,

How are you? I hope you got a chance to look at the issue that we reported. I am 
reiterating the summary of the problem.
Some messages have transport headers starting with “ARC-Seal: ”. These transport 
headers also contain the To, CC and BCC addresses with both display names and 
the corresponding email addresses. However, `readpst` discards these transport 
headers while creating the EML file with MIME content, so the final MIME 
content contains only the display names for all the To, CC and BCC 
addresses. Possibly you are using the canonical properties to 
extract the To, CC and BCC addresses from the PST file.


After looking at the readpst.c file (see below), we understood that readpst 
discards any transport headers that don’t start with one of the specified strings.



int  valid_headers(char *header)
{
     // headers are sometimes really bogus - they seem to be fragments of the
     // message body, so we only use them if they seem to be real rfc822 headers.
     // this list is composed of ones that we have seen in real pst files.
     // there are surely others. the problem is - given an arbitrary character
     // string, is it a valid (or even reasonable) set of rfc822 headers?
     if (header) {
         if (header_match(header, "Content-Type: "                 )) return 1;
         if (header_match(header, "Date: "                         )) return 1;
         if (header_match(header, "From: "                         )) return 1;
         if (header_match(header, "MIME-Version: "                 )) return 1;
         if (header_match(header, "Microsoft Mail Internet Headers")) return 1;
         if (header_match(header, "Received: "                     )) return 1;
         if (header_match(header, "Return-Path: "                  )) return 1;
         if (header_match(header, "Subject: "                      )) return 1;
         if (header_match(header, "To: "                           )) return 1;
         if (header_match(header, "X-ASG-Debug-ID: "               )) return 1;
         if (header_match(header, "X-Barracuda-URL: "              )) return 1;
         if (header_match(header, "X-x: "                          )) return 1;
         if (strlen(header) > 2) {
             DEBUG_INFO(("Ignore bogus headers = %s\n", header));
         }
         return 0;
     }
     else return 0;
}

As per our understanding, the ARC headers (which help preserve email 
authentication results and verify the identity of email intermediaries that 
forward a message on to its final destination) were introduced in 2016, and it 
looks like readpst does not handle them.

We would appreciate it if you could clarify:

  1.  Is our understanding correct?
  2.  If yes, can we expect a patch from you?
  3.  If our understanding is not correct, can we expect a patch with proper 
fixes, or can you let us know where to fix the problem?
  4.  Are there any other headers, like ARC, that are not handled?



We look forward to your reply so that we can commit to a date with our customers.

Thank you
Sai Kalyan


From: Surla, Sai Kalyan
Sent: 08 March 2021 07:28 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' 
<984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Outlook is blocking the PST, so please find the zipped PST file attached.

Thank you
Sai Kalyan

From: Surla, Sai Kalyan
Sent: 08 March 2021 07:27 PM
To: 'Paul Wise' <p...@debian.org>; '984...@bugs.debian.org' 
<984...@bugs.debian.org>
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Sorry, it looks like Outlook blocked this PST.

From: Surla, Sai Kalyan
Sent: 08 March 2021 01:45 PM
To: Paul Wise <p...@debian.org>; 984...@bugs.debian.org
Subject: RE: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Hi Paul,

Please find attached a PST containing a single email with which we also faced 
the problem of extracting email addresses under the ‘To:’ header.

Thank you
Sai Kalyan

From: Paul Wise <p...@debian.org>
Sent: 08 March 2021 07:02 AM
To: Surla, Sai Kalyan <saikalyan.su...@arcserve.com>; 984...@bugs.debian.org
Subject: Re: Bug#984581: pst-utils: Fails to extract email addresses for emails 
having ARC headers from PST file

Control: found -1 0.6.75-1

On Sun, 2021-03-07 at 17:42 +0000, Surla, Sai Kalyan wrote:

> Already tried with version 0.6.75-1.

Thanks, marking the bug as found in that version.

> Also compiled the latest code available and tried with it, still the
> same results.

Thanks for testing this too.

> Please find the changes in the attached file. (readpst.c line no. : 1238)

It is traditional to provide changes in the patch format by using the
`diff -u` command or the corresponding commands from the version
control system that the upstream project is using.

Below is the output from the Mercurial diff for your change.

$ hg diff
diff -r 7200790e46ac src/readpst.c
--- a/src/readpst.c Tue Jun 16 17:18:28 2020 -0700
+++ b/src/readpst.c Mon Mar 08 09:20:50 2021 +0800
@@ -1235,7 +1235,7 @@

 int header_match(char *header, char*field) {
     int n = strlen(field);
-    if (strncasecmp(header, field, n) == 0) return 1; // tag:{space}
+    if (strstr(header,field) != NULL || strncasecmp(header, field, n) == 0) return 1; // tag:{space}
     if ((field[n-1] == ' ') && (strncasecmp(header, field, n-1) == 0)) {
         char *crlftab = "\r\n\t";
         DEBUG_INFO(("Possible wrapped header = %s\n", header));


I am fairly certain that this is not the correct fix for this issue.
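
To illustrate the concern, here is a small sketch comparing the two tests. The helper names `prefix_match` and `substring_match` are hypothetical simplifications for illustration only, not libpst code, and the wrapped-header handling of the real `header_match` is left out. The point is that `strstr` accepts the field name anywhere in the string, so a fragment of a message body would likely pass as well.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical stand-in for the original test: the string must START
 * with the field name, compared case-insensitively. */
static int prefix_match(const char *header, const char *field) {
    assert(header != NULL && field != NULL);
    return strncasecmp(header, field, strlen(field)) == 0;
}

/* Hypothetical stand-in for the proposed change: the field name may
 * appear ANYWHERE in the string (strstr is case-sensitive). */
static int substring_match(const char *header, const char *field) {
    assert(header != NULL && field != NULL);
    return strstr(header, field) != NULL || prefix_match(header, field);
}
```

With a header block that begins with an ARC-Seal line, `prefix_match(block, "To: ")` fails while `substring_match(block, "To: ")` succeeds, which is the behaviour the change was after. The same `substring_match` call also succeeds on ordinary body text that happens to contain "To: ", which is the kind of bogus input the check exists to reject.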

> ARC headers are kind of email authentication headers.

Thanks for the info.

> For some security reasons we cannot share the original

Understood.

> if possible we will try to share the inhouse sample pst.

That would be necessary to be able to fix the issue.

> Meanwhile our observation is if the headers start with the following
> headers (...) it is treated as bogus, this email is starting with
> some header which is not one of the listed.

That does look like what the code does indeed, probably the right fix
is to scan through all of the headers instead of just the first one.
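
A rough sketch of what scanning every header line could look like follows. This is only an assumption about the shape of such a fix, not a tested libpst patch: the function name `any_header_line_matches` and its parameters are made up for illustration, and it is still a heuristic, since body text with a line that starts with a known field name would also pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper: returns 1 if any line in `headers` starts with
 * one of the known field names, 0 otherwise. Lines are separated by
 * '\n' (a preceding '\r' is harmless for a prefix comparison). */
static int any_header_line_matches(const char *headers,
                                   const char *const *fields,
                                   size_t nfields) {
    assert(fields != NULL || nfields == 0);
    if (!headers) return 0;

    const char *line = headers;
    while (*line) {
        /* Folded continuation lines begin with a space or tab and
         * belong to the previous header, so they are not tested. */
        if (*line != ' ' && *line != '\t') {
            for (size_t i = 0; i < nfields; i++) {
                if (strncasecmp(line, fields[i], strlen(fields[i])) == 0)
                    return 1;
            }
        }
        const char *eol = strchr(line, '\n');
        if (!eol) break;      /* last line had no terminator */
        line = eol + 1;       /* advance to the start of the next line */
    }
    return 0;
}
```

Called with the list of known field names, this accepts a block whose first line is "ARC-Seal: ..." as long as a later line starts with, say, "To: ", while a single line of text that merely contains "To: " in the middle is still rejected.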

--
bye,
pabs

https://wiki.debian.org/PaulWise
