Michael Forney wrote:
> On 12/24/16, Cág <c...@riseup.net> wrote:
> > Markus Wichmann wrote:
> >
> >> Well, that looks like it might be problematic, doesn't it? Especially
> >> when you find out, that the size of h->name there is 100 bytes. path
> >> contains, of course, the entire file path relative to the starting
> >> directory. In short, you will get this error message whenever trying to
> >> package files with a total relative path length of more than 100
> >> characters.
> >
> > Indeed, I've just tried to compress an extracted Linux kernel
> > (that doesn't have .git folder), it went without errors. Thanks for
> > pointing out.
> >
> > But when I tried to extract it, it still said "malformed tar archive".
> > Here's the part with it: http://git.suckless.org/sbase/tree/tar.c#n404
> 
> Fixing up tar bugs in sbase has been on my TODO list for a while. It's
> the one tool I'm not using from sbase.
> 
> One thing that might be related is that in various places, it uses
> eread(..., BLKSIZE), expecting that exactly BLKSIZE bytes are written
> (skipblk, xt, unarchive). But, if you are extracting from a pipe
> hooked up to a decompression program, it may be less than that. When
> this happens, it calls chktar on a random piece of data from the
> archive, which fails the checksum check.
> 
> I think we should add a readall function to libutil, similar to the
> writeall function I sent in a patch set a few weeks ago.
> 
> I think I have a pending patch to make the "malformed tar archive"
> errors more specific, but it's on a different computer and I am
> visiting family for the holidays. If you want to try and debug the
> error, I'd start with trying to figure out why it thinks it is
> malformed. I suspect it is due to a bad checksum.

Bumping this thread because I've been testing it a bit. I attached a
patch to make the error more descriptive, and this patch shows that the
most common error (to me) is a bad magic and that is certainly because the
magic is checked before the checksum. Sometimes the magic check pass,
but it then fails on the checksum.

This is indeed due to a short read from a pipe, because if you try to
print the said "bad magic", you end up with a chunk of the previously
extracted file. I will look into this "readall" function to submit
a patch.

Something worth noticing is that I only encountered this bug with bzip2
when it's compiled against musl. I couldn't reproduce it with glibc.
diff --git a/tar.c b/tar.c
index 71719b0..e274874 100644
--- a/tar.c
+++ b/tar.c
@@ -29,6 +29,12 @@ enum Type {
        RESERVED  = '7'
 };
 
+enum Error {
+       BADCHKSUM,
+       BADMAGIC,
+       BADPATH
+};
+
 struct header {
        char name[100];
        char mode[8];
@@ -71,6 +77,12 @@ static const char *filtertools[] = {
        ['z'] = "gzip",
 };
 
+static const char *errors[] = {
+       [BADCHKSUM] = "bad checksum",
+       [BADMAGIC]  = "bad magic",
+       [BADPATH]   = "empty filename"
+};
+
 static void
 pushdirtime(char *name, time_t mtime)
 {
@@ -351,8 +363,10 @@ skipblk(ssize_t l)
        char b[BLKSIZ];
 
        for (; l > 0; l -= BLKSIZ)
-               if (!eread(tarfd, b, BLKSIZ))
+               if (!eread(tarfd, b, BLKSIZ)) {
+                       fprintf(stderr, "skipblk: failed to read %d bytes\n", 
BLKSIZ);
                        break;
+               }
 }
 
 static int
@@ -407,27 +421,36 @@ chktar(struct header *h)
        char tmp[8], *err;
        char *p = (char *)h;
        long s1, s2, i;
+       int errnum;
 
-       if (h->prefix[0] == '\0' && h->name[0] == '\0')
+       if (h->prefix[0] == '\0' && h->name[0] == '\0') {
+               errnum = BADPATH;
                goto bad;
-       if (h->magic[0] && strncmp("ustar", h->magic, 5))
+       }
+       if (h->magic[0] && strncmp("ustar", h->magic, 5)) {
+               errnum = BADMAGIC;
                goto bad;
+       }
        memcpy(tmp, h->chksum, sizeof(tmp));
        for (i = 0; i < sizeof(tmp); i++)
                if (tmp[i] == ' ')
                        tmp[i] = '\0';
        s1 = strtol(tmp, &err, 8);
-       if (s1 < 0 || *err != '\0')
+       if (s1 < 0 || *err != '\0') {
+               errnum = BADCHKSUM;
                goto bad;
+       }
        memset(h->chksum, ' ', sizeof(h->chksum));
        for (i = 0, s2 = 0; i < sizeof(*h); i++)
                s2 += (unsigned char)p[i];
-       if (s1 != s2)
+       if (s1 != s2) {
+               errnum = BADCHKSUM;
                goto bad;
+       }
        memcpy(h->chksum, tmp, sizeof(h->chksum));
        return;
 bad:
-       eprintf("malformed tar archive\n");
+       eprintf("malformed tar archive: %s\n", errors[errnum]);
 }
 
 static void

Reply via email to