Juergen Keil wrote:
> 2009/3/11 Dina <dina.nimeh at sun.com>
>> Juergen Keil wrote:
>>> Hmm, but why does the following work using the lofi block device?
>>> Seems to be another bug... I'd expect that this gets an EROFS error, too.
>>>
>>> # dd if=/dev/zero of=/dev/lofi/5 count=2
>>> 2+0 records in
>>> 2+0 records out
>> I don't know if this is related to an existing bug:
>> 6717722 lofi must not write to R/O file systems
>
> I think that's different.
>
> lofi_strategy_task does fail the writes on the block device with EROFS
> errors, but the write is async, so the user program doesn't notice the
> errors?
>
>         /*
>          * Compressed files can only be read from and
>          * not written to
>          */
>         if (!(bp->b_flags & B_READ)) {
>                 bp->b_resid = bp->b_bcount;
>                 error = EROFS;
>                 goto done;
>         }
>
>
I think what happens is that a number of disk blocks are cached by UFS
as soon as the lofi file is mapped. (I was using UFS when I was
seeing this behavior; I don't know about ZFS.)
112 disk blocks (7 * 8K chunks) were read into memory very shortly
after the lofi file was mapped. The lofi block I/O seems to be
directed to the blocks mapped into memory. When blocks are written,
it seemed that the mirrored block in memory is updated and also
written to disk -- however, I can't recall right now whether the disk
write was immediate or not.
For the most part, as the user app requests more blocks from the
lofi file, UFS appears to anticipate this by serving up another 112
blocks, and so on. It's not the lofi driver doing this.
> The same problem exists with sd(7D) when trying to write to a
> DVD-ROM media:
>
> # dd if=/dev/zero of=/dev/rdsk/c0t1d0p0 count=64 bs=2k
> write: I/O error
> 1+0 records in
> 1+0 records out
> # dd if=/dev/zero of=/dev/dsk/c0t1d0p0 count=64 bs=2k
> 64+0 records in
> 64+0 records out
>
>
> Using the ps/2 floppy driver and a read/only floppy media:
> In this case the char and block device open() fails with EROFS.
>
> # dd if=/dev/zero of=/dev/rdiskette count=64
> dd: /dev/rdiskette: open: Read-only file system
> # dd if=/dev/zero of=/dev/diskette count=64
> dd: /dev/diskette: open: Read-only file system
>
> I think lofi could do the same, and fail open for write
> on a non-master control device when a compressed
> file is mapped.
>
From the behavior I described above, maybe lofi is showing
a bug from elsewhere? Whatever is serving up the file blocks
needs to be examined further.
>
>>>> It releases the memory at unmap time, is unmapping the file a guaranteed
>>>> event at this point? If not, then does this lead to a stale segment in
>>>> the cache?
>>> No, patching / changing the lofi_max_comp_cache variable doesn't
>>> result in unmapping the file (that is, calling lofi_free_handle()).
>>> You would have to use lofiadm -d and close all references to the
>>> /dev/lofi/N file to get the cache memory released.
>>>
>> Did not mean that changing lofi_max_comp_cache unmaps the file, I know
>> that. Rephrasing: "Lofi releases the memory at unmap time, ok, however
>> what guarantees that an unmap happens between the time the caching is
>> disabled and the time the caching is re-enabled?"
>
> That would be a problem if it were possible to map a new file for an
> active lofi device, but it isn't.
>
> # mount -F hsfs /files2/media/os200906.iso /mnt
> { change lofi_max_comp_cache from 1 => 0 here }
> # lofiadm
> Block Device File Options
> /dev/lofi/1 /files2/media/os200906.iso -
> # lofiadm -f -d /dev/lofi/1
> # lofiadm -a /files2/media/os200811.iso /dev/lofi/1
> lofiadm: could not map file /files2/media/os200811.iso to /dev/lofi/1:
> File exists
> { assuming the above lofiadm -a would be possible (but it isn't);
> change lofi_max_comp_cache from 0 => 1 here }
>
>
> lofi_map_file() checks that there is no existing state before reusing a
> lofi device, lines 1798,1799:
>
>
> 1787         if (pickminor) {
> ...
> 1796         } else {
> 1797                 newminor = klip->li_minor;
> 1798                 if (ddi_get_soft_state(lofi_statep, newminor) != NULL) {
> 1799                         error = EEXIST;
> 1800                         goto out;
> 1801                 }
> 1802         }
>
> The soft state is freed in lofi_free_handle() - at the same time
> the cache with decompressed data is freed.
>
>> It seems the answer is related to the compressed lofi being read-only.
>> Inconsistency is not an issue, is that correct?
>>
>>> Hmm, you could construct a case with attaching a compressed lofi file,
>>> and write to the underlying vnode.
>>>
>>>>> Do we really need code to shrink the lofi decompress cache at runtime?
>>>> I think I understand that 1 cached segment gives a nice boost. And that
>>>> the improvement is probably not linear. At some point it doesn't matter
>>>> if you have 5 or 10 for your max cache size, the incremental performance
>>>> improvement may no longer be noticeable.
>>>>
>>>> That's why I asked what would happen if I set it 10240. Would it really
>>>> grow to 10240, or would it cut me off somewhere and not grow anymore, to
>>>> prevent running out of heap?
>>> It would grow to 10240 cached segments.
>> Is there any value to put a hard limit on how big lofi_max_comp_cache
>> can ever be?
>
> Hmm, we currently have no upper limit for the
> compression segment size, either. The following
> commands could have bad effects on a system:
>
> mkfile -n 3G /var/tmp/foobar
> lofiadm -C 1g /var/tmp/foobar
> lofiadm -a /var/tmp/foobar /dev/lofi/99
> od -xc /dev/lofi/99
>
> And with s/3G/4G/, the compressed lofi file after lofiadm -a is empty?
> Another >= 4 gigabyte bug?
>
Yes, it's the CR you mentioned in your subsequent email.
>> D.
>>