Kent Boortz <kent.boo...@sun.com> wrote:

>> the resulting archive might not unpack correctly using some other TAR
>> implementations,

Sergey Poznyakoff <g...@gnu.org.ua> writes:

> Thanks for reporting. Looks like a bug in directory name splitting
> algorithm. Please try the attached patch.

Examining the USTAR package produced before your patch, it had three
TAR headers, the second one with an empty "name". I guess that made the
other TAR implementations think we were done and skip all but the
first header.

After your patch, GNU TAR and Solaris TAR produce the same three
headers. The only difference is that the resulting files are padded
differently, making the GNU TAR one slightly larger. I have no idea
what this padding is for, it is just zero bytes.

But I think I might have found more bugs :(

This is how I interpret the USTAR format, but I could be wrong of course:

  - We have 155 characters in the "prefix" field

  - We have 100 characters in the "name" field

  - The strings in "prefix" and "name" can fill the fields, but if
    the string is shorter than the limit, it is null terminated

  - The field "name" can't be empty (is this stated in the standard?)

  - We split on dir/dir or dir/file boundaries (is this stated in the
    standard?)

  - If the path ends in a directory, "name" ends with a slash (is this
    stated in the standard?)

Some test results are below, using GNU TAR + the patch. I also added
Solaris results as a reference. Note that I do expect some of the tests
below to fail, I am just trying out "border cases". In some cases just
the error message is a bit misleading.
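
To make the interpretation listed above concrete, here is a minimal Python sketch of it. It is not taken from the GNU TAR or Solaris TAR sources. The function name split_ustar, the choice to try the rightmost slash first, and counting characters rather than bytes (fine for ASCII names only) are all assumptions made for illustration.

```python
NAME_MAX = 100    # size of the "name" field
PREFIX_MAX = 155  # size of the "prefix" field


def split_ustar(path):
    """Return (prefix, name) for path, or raise ValueError if no split fits.

    Hypothetical helper: it follows the rules as interpreted in this mail,
    not any particular TAR implementation.
    """
    # A path that fits in "name" needs no prefix at all.
    if 0 < len(path) <= NAME_MAX:
        return "", path
    # A trailing slash marks a directory and has to stay in "name",
    # so it is never considered as the split point.
    end = len(path) - 1 if path.endswith("/") else len(path)
    # Try the rightmost slash first, so "prefix" gets as much as possible.
    i = path.rfind("/", 0, end)
    while i > 0:
        prefix, name = path[:i], path[i + 1:]
        if len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
            return prefix, name
        i = path.rfind("/", 0, i)  # next slash to the left
    raise ValueError("file name is too long (cannot be split)")
```

Under these rules a 154 character directory followed by a 100 character file splits cleanly, while a 155 character directory followed by a 100 character subdirectory does not, because the trailing slash makes "name" 101 characters long.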

  GNU directory 155 + file 100 => pass
  Sol directory 155 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 155 + file 99 => pass
  Sol directory 155 + file 99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 154 + file 100 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 154 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 100 + file 100 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 99 + file 100 => pass
  Sol directory 99 + file 100 => pass

So somehow both Solaris and GNU TAR have a 99 character limit on the
first directory part that is to be put into "prefix", EXCEPT that GNU
TAR seems to accept exactly 155 characters as well. I expected the
first directory part to be accepted if it is 155 characters or fewer,
with no 99 character limit.

Now the same test, but directory + subdirectory:

  GNU directory 155 + subdir 100 => aaaa...aaaaa/bbbb...bbbb/: file name is too long (max 256); not dumped
  Sol directory 155 + subdir 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 155 + subdir 99 => pass
  Sol directory 155 + subdir 99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 154 + subdir 99 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 154 + subdir 99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 100 + subdir 99 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + subdir 99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 99 + subdir 100 => aaaa...aaaaa/bbbb...bbbb/: file name is too long (max 256); not dumped
  Sol directory 99 + subdir 100 => bbbb...bbbbb: filename is greater than 100

  GNU directory 99 + subdir 99 => pass
  Sol directory 99 + subdir 99 => pass

It seems that if the path ends with a directory, the "name" field is to
end with a slash.
This is why the limit is different from when the ending part is a file.

Now, two directory parts and a file part:

  GNU directory 100 + 55 + file 100 => aaaa...aaaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + 55 + file 100 => aaaa...aaaaaa: filename is greater than 100

  GNU directory 100 + 55 + file 99 => aaaa...aaaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + 55 + file 99 => aaaa...aaaaaa: filename is greater than 100

  GNU directory 99 + 55 + file 100 => pass (1)
  Sol directory 99 + 55 + file 100 => pass

  GNU directory 99 + 54 + file 100 => pass
  Sol directory 99 + 54 + file 100 => pass

  GNU directory 99 + 50 + file 50 => pass
  Sol directory 99 + 50 + file 50 => pass

The third test, marked (1), triggers the bug again, leaving the second
header with nothing in the "name" field. Worse, in test (1) the third
record is corrupt: "name" is "cccc....bbbb...." with no "/", i.e. the
file name is wrong. But "prefix" is correct.

Finally, three directory parts and a file part:

  GNU directory 99 + 54 + file 50 + 50 => aaa.../bbb.../ddd: file name is too long (cannot be split); not dumped
  Sol directory 99 + 54 + file 50 + 50 => aaa.../bbb.../ddd: prefix is greater than 155

  GNU directory 99 + 54 + file 49 + 50 => pass
  Sol directory 99 + 54 + file 49 + 50 => pass

The "name" field needs a "/" in it, which takes one character.

Maybe it would be a good idea to include some sort of test suite for
this with the GNU TAR sources? Maybe even a test generator that
generates all/most of the permutations that should pass, and border
cases for those that should fail? Including tests that verify that the
headers are correct?

In any case, we have two problems here:

  - GNU TAR should do the splitting "correctly". To me it seems it is
    not working that well, and it likely needs a complete rewrite.

  - GNU TAR should produce USTAR packages that other USTAR
    implementations can read.
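
As a rough idea of what such a header check could look like, here is a small Python sketch. It reads the raw 512 byte header blocks and verifies that "name" is not empty and that "prefix" plus "/" plus "name" rebuilds the original path. The field offsets (name at 0, size at 124, prefix at 345) are those of the ustar header layout. The helper names are hypothetical, and python_ustar uses Python's own tarfile module only as a stand-in archiver so the sketch is self-contained. For a real test the data would come from an archive written by the TAR under test. Sizes stored in a non-octal encoding are not handled.

```python
import io
import tarfile

BLOCK = 512  # tar header and data block size


def field(block, offset, size):
    """Return one header field as text, cut at the first NUL if present."""
    return block[offset:offset + size].split(b"\0", 1)[0].decode("ascii", "replace")


def ustar_headers(data):
    """Yield (prefix, name) for every header block in a raw tar archive."""
    pos = 0
    while pos + BLOCK <= len(data):
        block = data[pos:pos + BLOCK]
        if block == bytes(BLOCK):  # an all-zero block marks the end
            break
        size = int(field(block, 124, 12).strip() or "0", 8)  # octal text
        yield field(block, 345, 155), field(block, 0, 100)
        # Skip the header plus the file data, rounded up to whole blocks.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


def make_path(dir_lengths, last_length):
    """Hypothetical generator helper: build a path from component lengths."""
    parts = [chr(ord("a") + i) * n for i, n in enumerate(dir_lengths)]
    parts.append("z" * last_length)
    return "/".join(parts)


def verify(data, paths):
    """Return a list of problems found when comparing headers against paths."""
    problems = []
    headers = list(ustar_headers(data))
    if len(headers) != len(paths):
        problems.append("expected %d headers, found %d" % (len(paths), len(headers)))
    for path, (prefix, name) in zip(paths, headers):
        if not name:
            problems.append("empty name field for " + path[:20] + "...")
        elif (prefix + "/" + name if prefix else name) != path:
            problems.append("prefix/name do not rebuild " + path[:20] + "...")
    return problems


def python_ustar(paths):
    """Stand-in archiver: empty files written by Python's tarfile as ustar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(path))
    return buf.getvalue()
```

A generator could loop over component lengths around the 99, 100, 154 and 155 boundaries, build each path with make_path, archive it, and report whatever verify returns.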

Let's say USTAR allows the first directory part to be up to 155
characters. If most other USTAR implementations think the limit is 99
when unpacking, maybe that is the limit GNU TAR should use as well. It
is no fun if what GNU TAR produces using --format=ustar is not readable
by other USTAR implementations.

That is just what I think. You are the experts, what do you think?

kent
tarheader.c
Description: Binary data
--
Kent Boortz, Senior Production Engineer
Sun Microsystems Inc., the MySQL team
Office: +46 863 11 363  Mobile: +46 70 279 11 71