Re: Wrong file position after writing 65537 bytes to block device

2017-12-20 Thread Kaz Kylheku

On 18.12.2017 16:27, Steven Penny wrote:
> On Mon, 18 Dec 2017 14:10:35, Corinna Vinschen wrote:
> > In general, writes on disk devices are sector-oriented.  However,
> > in this case ftell should have returned 65536.  The problem here is
> > that the newlib implementation of ftell/ftello performs an fflush
> > when called on a write stream since about 2008 to adjust for appending
> > streams.  Given your example (thanks for the testcase!) this seems
> > pretty wrong.  Looking further it turns out that neither glibc nor BSD
> > actually calls fflush in this case.  There's only a special case for
> > appending streams, but this calls lseek, not fflush.
> >
> > Looks like a patch is required.  Stay tuned.
>
> is it though? he says "write 65536 + 1 bytes", but as far as i can tell, you
> cant do that. quoting myself:
>
> > Seeking, reading and writing must all be done in multiples of sector size,
> > in my case 512 bytes
>
> http://web.archive.org/web/stackoverflow.com/questions/37228874/how-to-fwrite-to-removable-volume
>
> so it would make sense that the position becomes "65536 + 512"


You can do that on a "block" device. It's "raw" devices that have
transfer unit restrictions.

A block device creates an abstraction over a disk, dividing it into
blocks. Those blocks are not related to the underlying sector size;
they could be larger (e.g. 4096 byte block size, versus 512 byte
sectors) or even smaller (e.g. 4096 byte block size, versus 65536
byte flash erase block size).

Unix block devices let you read, write and seek using byte offsets
and sizes.  The bytes which are affected by a write operation map
to one or more ranges in one or more blocks. All of the blocks have to
be read into memory (if they aren't already). The bytes are updated,
and then the blocks are marked dirty and written out (eventually).
More changes can take place before that happens.
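
As a rough illustration of the mapping described above, the following
Python sketch computes which blocks a byte-range write overlaps and
whether any of them is only partly overwritten (and so would have to be
read in first). The function names and the 4096-byte default are
assumptions made for the example, not anything taken from a kernel.

```python
def blocks_touched(offset, length, block_size=4096):
    """Return the indices of the blocks that a byte-range write overlaps."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def needs_read_first(offset, length, block_size=4096):
    """True if the write only partly covers its first or last block."""
    return offset % block_size != 0 or (offset + length) % block_size != 0


# Writing 65537 bytes from offset 0 covers blocks 0..15 completely
# and touches a single byte of block 16.
print(blocks_touched(0, 65537))
print(needs_read_first(0, 65537))
```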

So for instance if we have a block device (4096 bytes) over a flash
device with 64 kB erase blocks, we can write just one byte somewhere
in a block. When this change is flushed, the entire erase block has to
be erased and rewritten.
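
The arithmetic behind that example can be sketched as follows. The sizes
are the ones named above (4096-byte blocks, 64 kB erase blocks); the
function name and the returned fields are made up for illustration.

```python
BLOCK_SIZE = 4096
ERASE_BLOCK_SIZE = 64 * 1024


def flush_cost(offset, block_size=BLOCK_SIZE, erase_size=ERASE_BLOCK_SIZE):
    """For a one-byte write at `offset`, report what gets dirtied and rewritten."""
    return {
        "dirty_block": offset // block_size,    # block marked dirty in memory
        "erase_block": offset // erase_size,    # flash erase block holding it
        "bytes_rewritten": erase_size,          # the whole erase block is rewritten
    }


print(flush_cost(70000))
```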

Because of the abstract nature of block devices, it's largely
pointless to use the "dd" utility; you can use "cp" to copy them.
"dd" is required when you need to control the exact size of the
read and write calls. Thus "cat /dev/zero > /dev/blockdevice"
has the same effect as "dd if=/dev/zero of=/dev/blockdevice".


--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Ivan Kozik
On Tue, Dec 19, 2017 at 6:19 PM, Corinna Vinschen wrote:
> On Dec 19 16:35, Ivan Kozik wrote:
>> From what I observe on Linux, it supports writing at any offset to the
>> block device because it does a read-modify-write behind the scenes,
>> with accompanying nasty overhead (e.g. writes going at 64MB/s instead
>> of an "expected" 180MB/s).
>
> That's what Cygwin was trying to emulate as well.  Debugging pointed out
> that it only works for reading, not for writing, because the latter
> neglected to fix up buffer pointers.  Those are used in lseek to report
> the Linux-like byte-exact file position.
>
> I pushed a patch and uploaded new developer snapshots to
> https://cygwin.com/snapshots/
>
> Please give them a test.

Hi Corinna,

It is writing correctly now, thank you for the fix!

Ivan




Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Corinna Vinschen
On Dec 19 16:35, Ivan Kozik wrote:
> On Tue, Dec 19, 2017 at 4:13 PM, Eric Blake wrote:
> > Can block devices report an unaligned offset to lseek()?  If not, then when
> > writing an unaligned value to a block device, don't we have to do a
> > read-modify-write of the larger aligned cluster, and then put lseek() back
> > to the unaligned boundary, and have extra magic in ftell() to track that we
> > are at an unaligned position within the block device?  But that sounds like
> > a lot of nasty overhead; and that it would be better to make sure that block
> > devices can report unaligned lseek() locations (caveat: I haven't tested
> > what Linux does in that regards).
> 
> From what I observe on Linux, it supports writing at any offset to the
> block device because it does a read-modify-write behind the scenes,
> with accompanying nasty overhead (e.g. writes going at 64MB/s instead
> of an "expected" 180MB/s).

That's what Cygwin was trying to emulate as well.  Debugging pointed out
that it only works for reading, not for writing, because the latter
neglected to fix up buffer pointers.  Those are used in lseek to report
the Linux-like byte-exact file position.
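
A minimal sketch of that kind of emulation, in Python rather than in
Cygwin's own code: a device that only accepts whole sectors, wrapped by a
layer that does read-modify-write and keeps its own byte-exact position.
SectorDevice and ByteExactFile are hypothetical names invented for this
example; the real Cygwin implementation is not shown in this thread and
may look quite different.

```python
SECTOR = 512


class SectorDevice:
    """Hypothetical raw device: transfers whole, aligned sectors only."""

    def __init__(self, sectors):
        self.data = bytearray(sectors * SECTOR)

    def read_sector(self, n):
        return bytes(self.data[n * SECTOR:(n + 1) * SECTOR])

    def write_sector(self, n, payload):
        assert len(payload) == SECTOR
        self.data[n * SECTOR:(n + 1) * SECTOR] = payload


class ByteExactFile:
    """Byte-granular writes on top of a sector-only device."""

    def __init__(self, dev):
        self.dev = dev
        self.pos = 0  # logical byte position, the value lseek should report

    def write(self, payload):
        done = 0
        while done < len(payload):
            sector, within = divmod(self.pos, SECTOR)
            chunk = payload[done:done + SECTOR - within]
            # For simplicity every sector is read first, even when it is
            # about to be overwritten completely.
            buf = bytearray(self.dev.read_sector(sector))   # read
            buf[within:within + len(chunk)] = chunk         # modify
            self.dev.write_sector(sector, bytes(buf))       # write
            self.pos += len(chunk)  # advance by bytes, not by sectors
            done += len(chunk)
        return done

    def tell(self):
        return self.pos


f = ByteExactFile(SectorDevice(200))
f.write(b"x" * 65537)
print(f.tell())
```

The point of the sketch is the last line of the loop: if the position
were advanced by whole sectors instead of by len(chunk), tell() would
report the next sector boundary rather than the byte-exact offset.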

I pushed a patch and uploaded new developer snapshots to
https://cygwin.com/snapshots/

Please give them a test.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat




Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Ivan Kozik
On Tue, Dec 19, 2017 at 4:13 PM, Eric Blake wrote:
> Can block devices report an unaligned offset to lseek()?  If not, then when
> writing an unaligned value to a block device, don't we have to do a
> read-modify-write of the larger aligned cluster, and then put lseek() back
> to the unaligned boundary, and have extra magic in ftell() to track that we
> are at an unaligned position within the block device?  But that sounds like
> a lot of nasty overhead; and that it would be better to make sure that block
> devices can report unaligned lseek() locations (caveat: I haven't tested
> what Linux does in that regards).

From what I observe on Linux, it supports writing at any offset to the
block device because it does a read-modify-write behind the scenes,
with accompanying nasty overhead (e.g. writes going at 64MB/s instead
of an "expected" 180MB/s).

I think you can observe this behavior on Linux by piping this
program's stdout to a block device (note: must be python3, not
python2):

#!/usr/bin/python3

import sys

block = b" " * 4096
while True:
    sys.stdout.buffer.write(block)
    sys.stdout.buffer.write(b" ")

and watching the block device activity with `dstat -d -D sdN` - you
should see a lot of reads.
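
To see why that pattern would provoke reads, here is a small model of the
stream offsets it produces. It is a simplification: it ignores any
buffering between the program and the device (which changes the actual
write sizes) and only tracks where each 4096-byte write would start. The
function name is made up for the example.

```python
def straddling_writes(iterations, block_size=4096):
    """Count the block_size-byte writes that start off a block boundary."""
    pos = 0
    straddling = 0
    for _ in range(iterations):
        if pos % block_size != 0:
            straddling += 1   # this write covers parts of two blocks
        pos += block_size     # the 4096-byte write
        pos += 1              # the 1-byte write that shifts everything after it
    return straddling


# After the first iteration, every large write is misaligned.
print(straddling_writes(10))
```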

Ivan




Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Eric Blake

On 12/19/2017 09:46 AM, Ivan Kozik wrote:
> Thanks, I can confirm that the 2017-12-18 snapshot fixed the test
> program I posted.
>
> What about the harder case where the program calls fflush, though?
>
> #include <stdio.h>
>
> int main(int argc, char *argv[]) {
>     FILE *f = fopen(argv[1], "w");
>     char x[65536 + 1];
>     fwrite(x, 1, 65536 + 1, f);
>     fflush(f);
>     printf("%ld", ftell(f));

Can block devices report an unaligned offset to lseek()?  If not, then 
when writing an unaligned value to a block device, don't we have to do a 
read-modify-write of the larger aligned cluster, and then put lseek() 
back to the unaligned boundary, and have extra magic in ftell() to track 
that we are at an unaligned position within the block device?  But that 
sounds like a lot of nasty overhead; and that it would be better to make 
sure that block devices can report unaligned lseek() locations (caveat: 
I haven't tested what Linux does in that regards).


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Ivan Kozik
On Tue, Dec 19, 2017 at 9:14 AM, Corinna Vinschen wrote:
> Neither glibc nor FreeBSD show this behaviour.  Keep in mind that stdio
> is designed for buffered I/O.  What should happen, basically, is that a
> multiple of the stdio buffersize is written and the remainder is kept in
> the stdio buffer:
>
>   fwrite(65537)
>   -> write(65536)
>   -> store 1 byte in FILE._buf
>
> ftell calls lseek which returns 65536.  It adds the number of bytes
> still in the buffer, so it should return 65537.  Further fwrite's
> seamlessly append to the bytes already written, as expected.  ftell
> calling fflush and thus setting the current file position to the next
> sector boundary breaks this expectation.
>
> I pushed a patch yesterday and uploaded new developer snapshots to
> https://cygwin.com/snapshots/
>
> Please test.

Thanks, I can confirm that the 2017-12-18 snapshot fixed the test
program I posted.

What about the harder case where the program calls fflush, though?

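#include <stdio.h>

int main(int argc, char *argv[]) {
    FILE *f = fopen(argv[1], "w");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char x[65536 + 1] = {0};       /* contents are irrelevant, only the size matters */
    fwrite(x, 1, 65536 + 1, f);    /* one byte more than 64 KiB */
    fflush(f);                     /* push the unaligned tail out to the device */
    printf("%ld", ftell(f));
    return 0;
}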

cygwin reports 66048, while Linux reports 65537.  In cygwin, if such a
write is done in a loop, for example, you can get garbled output on
disk.
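
The 66048 figure is consistent with the position being rounded up to the
next 512-byte sector boundary, i.e. the "65536 + 512" from the original
report. A quick check of that arithmetic (round_up is just a helper
written for this example):

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return -(-n // multiple) * multiple


print(round_up(65537, 512))   # 66048
print(65536 + 512)            # 66048
```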

fflush can be visibly unnecessary when done from C, but python3 (where
I originally observed the problem) appears to do implicit flushing.

If this is annoying to fix and I am the only one who notices, please
don't worry about it, I can just write in proper block sizes to block
devices.

Best regards,

Ivan




Re: Wrong file position after writing 65537 bytes to block device

2017-12-19 Thread Corinna Vinschen
On Dec 18 16:27, Steven Penny wrote:
> On Mon, 18 Dec 2017 14:10:35, Corinna Vinschen wrote:
> > In general, writes on disk devices are sector-oriented.  However,
> > in this case ftell should have returned 65536.  The problem here is
> > that the newlib implementation of ftell/ftello performs an fflush
> > when called on a write stream since about 2008 to adjust for appending
> > streams.  Given your example (thanks for the testcase!) this seems
> > pretty wrong.  Looking further it turns out that neither glibc nor BSD
> > actually calls fflush in this case.  There's only a special case for
> > appending streams, but this calls lseek, not fflush.
> > 
> > Looks like a patch is required.  Stay tuned.
> 
> is it though? he says "write 65536 + 1 bytes", but as far as i can tell, you
> cant do that. quoting myself:
> 
> > Seeking, reading and writing must all be done in multiples of sector size,
> > in my case 512 bytes
> 
> http://web.archive.org/web/stackoverflow.com/questions/37228874/how-to-fwrite-to-removable-volume
> 
> so it would make sense that the position becomes "65536 + 512"

Neither glibc nor FreeBSD show this behaviour.  Keep in mind that stdio
is designed for buffered I/O.  What should happen, basically, is that a
multiple of the stdio buffersize is written and the remainder is kept in
the stdio buffer:

  fwrite(65537)
  -> write(65536)
  -> store 1 byte in FILE._buf

ftell calls lseek which returns 65536.  It adds the number of bytes
still in the buffer, so it should return 65537.  Further fwrite's
seamlessly append to the bytes already written, as expected.  ftell
calling fflush and thus setting the current file position to the next
sector boundary breaks this expectation.
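
The expected bookkeeping can be modelled in a few lines of Python. This
is only a sketch of the behaviour described above, not newlib or glibc
code: FakeFd and BufferedStream are invented names, only byte counts are
tracked, and the 65536-byte buffer size is an assumption chosen to match
the write(65536) in the example.

```python
class FakeFd:
    """Stand-in for a file descriptor; it only tracks the file position."""

    def __init__(self):
        self.pos = 0

    def write(self, nbytes):
        self.pos += nbytes

    def lseek_cur(self):
        return self.pos


class BufferedStream:
    """Counts bytes the way a buffered stdio stream would."""

    def __init__(self, fd, bufsize=65536):
        self.fd = fd
        self.bufsize = bufsize
        self.buffered = 0  # bytes still sitting in the stdio buffer

    def fwrite(self, nbytes):
        total = self.buffered + nbytes
        flushable = total - total % self.bufsize  # whole buffers go to write()
        if flushable:
            self.fd.write(flushable)
        self.buffered = total - flushable         # the remainder stays buffered

    def ftell(self):
        # No flush here: file position plus what is still buffered.
        return self.fd.lseek_cur() + self.buffered


fd = FakeFd()
stream = BufferedStream(fd)
stream.fwrite(65537)
print(fd.lseek_cur(), stream.buffered, stream.ftell())
```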

I pushed a patch yesterday and uploaded new developer snapshots to
https://cygwin.com/snapshots/

Please test.


Thanks,
Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat




Re: Wrong file position after writing 65537 bytes to block device

2017-12-18 Thread Steven Penny

On Mon, 18 Dec 2017 14:10:35, Corinna Vinschen wrote:
> In general, writes on disk devices are sector-oriented.  However,
> in this case ftell should have returned 65536.  The problem here is
> that the newlib implementation of ftell/ftello performs an fflush
> when called on a write stream since about 2008 to adjust for appending
> streams.  Given your example (thanks for the testcase!) this seems
> pretty wrong.  Looking further it turns out that neither glibc nor BSD
> actually calls fflush in this case.  There's only a special case for
> appending streams, but this calls lseek, not fflush.
>
> Looks like a patch is required.  Stay tuned.

is it though? he says "write 65536 + 1 bytes", but as far as i can tell, you
cant do that. quoting myself:

> Seeking, reading and writing must all be done in multiples of sector size, in
> my case 512 bytes


http://web.archive.org/web/stackoverflow.com/questions/37228874/how-to-fwrite-to-removable-volume

so it would make sense that the position becomes "65536 + 512"





Re: Wrong file position after writing 65537 bytes to block device

2017-12-18 Thread Corinna Vinschen
On Dec 16 02:07, Ivan Kozik wrote:
> Hello,
> 
> I have discovered that if you write 65536 + 1 bytes to a block device
> in cygwin, the file position can become 65536 + 512.
> 
> With /dev/sdc as a throwaway USB block device:
> 
> (cygwin_write.c is pasted below)
> # gcc -O2 -Wall -o cygwin_write cygwin_write.c
> # ./cygwin_write /dev/sdc
> 66048
> 
> I am running 64-bit cygwin 2.9.0 on an updated Windows 8.1.  I saw the
> same results with an 8TB drive and a 512MB USB stick.

In general, writes on disk devices are sector-oriented.  However,
in this case ftell should have returned 65536.  The problem here is
that the newlib implementation of ftell/ftello performs an fflush
when called on a write stream since about 2008 to adjust for appending
streams.  Given your example (thanks for the testcase!) this seems
pretty wrong.  Looking further it turns out that neither glibc nor BSD
actually calls fflush in this case.  There's only a special case for
appending streams, but this calls lseek, not fflush.

Looks like a patch is required.  Stay tuned.


Thanks,
Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat

