Re: Disk read error interpreted as "File shrank" + not saying it is "Padding with zeros", but padds silently

Ondrej Dubaj Fri, 19 Mar 2021 04:25:03 -0700

Hello,
any update here?

Thanks.
Ondrej


On Mon, Mar 1, 2021 at 11:05 AM Ondrej Dubaj <odu...@redhat.com> wrote:

> Ping, any updates here?
>
> Thanks.
>
> On Mon, Feb 15, 2021 at 5:07 PM Ondrej Dubaj <odu...@redhat.com> wrote:
>
>> Gentle ping
>>
>> On Mon, Jan 18, 2021 at 12:02 PM Ondrej Dubaj <odu...@redhat.com> wrote:
>>
>>> One of our customers hit I/O errors while archiving a huge (11 TB) file. 
>>> Using strace, they observed that after tar hit a read I/O error on the XFS 
>>> filesystem, it continued writing zeros to the archive. However, tar gave no 
>>> indication that it was writing zeros once the error had occurred.
>>>
>>> Later it was found that writing zeros is expected behavior: the file 
>>> header has already been written, so the data needs to be padded with zeros.
>>>
>>> Using the reproducer steps provided by the customer, we can see this behavior.
>>>
>>> Padding with zeros is expected behavior, but tar does so silently (in the 
>>> "Read error at byte..." case). It should say that it is padding with zeros, 
>>> similar to how it reports "File shrank by ... bytes; padding with zeros".
>>>
>>> With the reproducer provided by the customer, we also see that tar sometimes 
>>> reports a read I/O error as "File shrank by ... bytes; padding with zeros"; 
>>> see case (2) in the reproducer comments.
>>>
>>> Reproducer available here:
>>>
>>> #!/bin/bash
>>> # Reproducer "tardust"
>>> #
>>> # When "tar create" reads a file, there are several shortcomings when it hits a read error
>>> #
>>> # 1) When read() fails with EIO due to a read error, this happens:
>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # read(4, 0x563adef7b000, 3584) = -1 EIO (Input/output error)
>>> # write(2, "tar: ", 5tar: ) = 5
>>> # write(2, "/mntx/testfile: Read error at by"..., 70/mntx/testfile: Read error at byte 260653056, while reading 3584 bytes) = 70
>>> # write(2, ": Input/output error", 20: Input/output error) = 20
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # Actual behaviour: it prints a "Read error" message, but does not say that it will pad the output with zeros
>>> # Expected behaviour: it should also print "padding with zeros"
>>> # 2) A second shortcoming: tar does not differentiate between a "read error" and "file shrinkage".
>>> # That means when it sees a short read caused by a read error, it does not report a read error.
>>> # It looks like this:
>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 2560          <<< HERE
>>> # write(2, "tar: ", 5tar: ) = 5
>>> # write(2, "/mntx/testfile: File shrank by 5"..., 65/mntx/testfile: File shrank by 53927936 bytes; padding with zeros) = 65
>>> # write(2, "\n", 1
>>> # ) = 1
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # Summary: no read error is reported here, although at least it now says "padding with zeros".
>>> # Expected behaviour: it should report a read error, so the user knows what is going on.
>>> #
>>> # 3) Side note:
>>> # The blocking factor is applied to the output. When reading a file, all reads are misaligned by 512 bytes.
>>> # This is because tar writes a 512-byte header for every archived file.
>>> # That means the first read from the file is 512 bytes too short:
>>> # Running with tar-blocking-factor=7
>>> # fstat(1, {st_mode=S_IFREG|0644, st_size=17827, ...}) = 0
>>> # write(1, "/mntx/testfile\n", 15/mntx/testfile
>>> # ) = 15
>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3072) = 3072   # 1st read is 512 bytes too short
>>> # write(3, "mntx/testfile\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>> #
>>> # 4) Reproducer overview:
>>> # - Create a 500 MB test image, then create a testfile in the image
>>> # - Use losetup/dmsetup to inject I/O errors at a specified block number
>>> #   (the "dust" target type can do this; the table below uses the "error" target)
>>> # - You must hit a 4K boundary to see EIO, so use tar-blocking-factor=7 and
>>> #   vary the bad block number to find case (1)
>>> echo Step 1 Create disk image
>>> dd if=/dev/zero of=/tmp/testimage bs=1M count=500 || exit
>>> echo Step 2 Create XFS in image
>>> mkfs.xfs /tmp/testimage || exit
>>> echo Step 3 Use losetup so the file can be used as a block device
>>> losetup /dev/loop1 /tmp/testimage || exit
>>> losetup
>>> echo Step 4 Now create the testfile, this will have a read error injected later
>>> mkdir -p /mntx
>>> mount /dev/loop1 /mntx || exit
>>> dd if=/dev/zero of=/mntx/testfile bs=1M count=300 || exit
>>> umount /mntx
>>> echo Step 5 Now iterating through bad blocks
>>> echo As a result, there are strace output files /tmp/a1000 ... /tmp/a1040
>>> for i in $(seq 1000 1040)
>>> do
>>> echo
>>> echo Badblock $i
>>> let ERR=i
>>> let ERR1=i+1
>>> let NUMSECTOR2=1024000-ERR1
>>> #echo ERR1 is $ERR1
>>> #echo NUMSECTOR2 is $NUMSECTOR2
>>> dmsetup create tardust <<EOF
>>> 0 $ERR linear /dev/loop1 0
>>> $ERR 1 error
>>> $ERR1 $NUMSECTOR2 linear /dev/loop1 $ERR1
>>> EOF
>>> #dmsetup ls
>>> #dmsetup status
>>> #dmsetup table
>>> mount /dev/mapper/tardust /mntx || exit
>>> strace tar cvbf 7 /tmp/tardust.tar /mntx/testfile >&/tmp/a$i
>>> umount /mntx
>>> dmsetup remove tardust
>>> grep -e error -e shrank /tmp/a$i
>>> done
>>> echo "Done: inspect the strace output files for error behaviour (grep error; look at the last read() call)"
>>> losetup -d /dev/loop1
>>>
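The sector arithmetic in the loop above can be checked in isolation. The following is only a sketch: `make_table` is a hypothetical helper that is not part of the reproducer, and it assumes the 500 MiB image and /dev/loop1 from the script, with a bad sector number greater than 0. It prints the same three-segment table that the here-document feeds to dmsetup, without touching any device.

```shell
#!/bin/bash
# Sketch: print the three-segment device-mapper table built in the reproducer loop.
# make_table is a hypothetical helper; it only prints text and needs no root access.
make_table() {
    dev=$1; bad=$2; total=$3
    next=$((bad + 1))
    # Sectors before the bad one map straight through to the backing device.
    printf '0 %d linear %s 0\n' "$bad" "$dev"
    # Exactly one sector is handed to the "error" target, which fails all I/O.
    printf '%d 1 error\n' "$bad"
    # The remaining sectors map straight through again, at the same offset.
    printf '%d %d linear %s %d\n' "$next" "$((total - next))" "$dev" "$next"
}

# A 500 MiB image expressed in 512-byte sectors (the script hardcodes 1024000).
total_sectors=$((500 * 1024 * 1024 / 512))
make_table /dev/loop1 1000 "$total_sectors"
```

The three segment lengths add up to the full device size, which is why NUMSECTOR2 is computed as 1024000 minus ERR1 in the script.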
>>> =================
>>>
>>> Actual results:
>>> - When tar hits a disk read error while reading a file from disk to create 
>>> an archive, it prints "File shrank"
>>> - It then writes zeros (padding) up to the initial file size (but in the 
>>> "Read error" case it does not print the padding message)
>>> - This happens in most cases (due to the interaction of tar block size, 
>>> disk block size and the 512-byte shift of the reads)
>>> - I provided a reproducer which shows under which circumstances it 
>>> correctly prints "Read error at byte…"
>>>
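The interaction between the tar record size and the 512-byte read shift can be worked through numerically. This is a rough sketch under stated assumptions: blocking factor 7, one 512-byte header before the file data, and a 4096-byte filesystem block size (the 4096 is an assumption, not taken from the report). `read_start` is a hypothetical helper. It only does the arithmetic and does not explain why EIO surfaces at such boundaries.

```shell
#!/bin/bash
# Sketch: at which file offsets do tar's reads start with blocking factor 7?
# Assumptions: 512-byte tar blocks, one 512-byte header, 4096-byte filesystem blocks.
record=$((7 * 512))        # 3584 bytes per output record
first=$((record - 512))    # 3072: the first read shares its record with the header

# read_start K prints the file offset at which the K-th read (0-based) begins.
read_start() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo $((first + ($1 - 1) * record))
    fi
}

# List the reads among the first 16 that begin exactly on a 4 KiB boundary.
k=1
while [ "$k" -le 16 ]; do
    off=$(read_start "$k")
    if [ $((off % 4096)) -eq 0 ]; then
        echo "read $k starts at byte $off (4 KiB aligned)"
    fi
    k=$((k + 1))
done

# Check the offset from the "Read error at byte 260653056" trace against both grids.
err=260653056
echo "record remainder: $(( (err - first) % record )), 4 KiB remainder: $((err % 4096))"
```

Under these assumptions only every eighth read starts on a 4 KiB boundary, and the offset in the "Read error at byte 260653056" trace is one of them; both remainders printed for it are zero.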
>>> Expected results:
>>> - When there is a read error, THEN tar shall report a read error
>>> - When there is a read error, THEN tar shall NOT report a "file shrank"
>>> - In addition it SHALL print "Padding with zeros". This is missing 
>>> currently.
>>>
>>>
>>> Regards,
>>>
>>> Ondrej Dubaj
>>>
>>>
