Hi Moinak,

Is the final webrev ready?

See responses inline,

Thanks !
FrankH.


On Wed, 29 Aug 2007, Moinak Ghosh wrote:

> Hi Frank,
>
>  Thanks a lot for the comments. Please see inline ...
>
[ ... ]
>  The global variable hsfs_readahead is just incremented and not used in any
>  computation. It is just an indicator that readahead is occurring at all,
>  more from a debugging perspective.

SDT probe?
(No need to change the code just to get that in; just a thought to keep in
mind if you're making such changes anyway.)
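Something along these lines is all I have in mind (the probe name and the
arguments are made up for illustration, not taken from your webrev):

    #include <sys/types.h>
    #include <sys/vnode.h>
    #include <sys/sdt.h>

    /*
     * Illustrative sketch only: fire a static DTrace probe at the spot
     * where hsfs_readahead is incremented today.  The probe name and
     * the arguments (vnode and starting offset) are assumptions.
     */
    static void
    hsfs_ra_mark(struct vnode *vp, u_offset_t off)
    {
            DTRACE_PROBE2(hsfs__readahead, struct vnode *, vp,
                u_offset_t, off);
    }

That keeps the observability without carrying a global counter that is
never read.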

>
>  Readahead stats will not be 100% accurate in the face of multiple readers
>  on a single file.
>  Readahead is actually called from hsfs_getapage only when a page is
>  satisfied from the cache. This is to prevent unnecessary readahead calls if
>  it is not resulting in actual cache hits. In addition, readahead is
>  essentially additional getpage calls being issued ahead of time and so
>  should have the same behavior as when multiple threads are requesting the
>  same page[s] and they are not in cache. In addition, multiple readers on
>  the same file will actually cause the readahead counters to fall back to 0
>  quite fast.

OK, I was assuming that. If we want reliable statistics, we should hook that
stuff into a kstat, or just put SDT probes at the relevant points.
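For the kstat variant, something like the following is what I'm thinking of
(all names here are placeholders, not a proposal for the actual
module/instance naming):

    #include <sys/kstat.h>

    typedef struct hsfs_ra_kstats {
            kstat_named_t ra_issued;        /* readaheads dispatched */
            kstat_named_t ra_wasted;        /* readaheads never used  */
    } hsfs_ra_kstats_t;

    static kstat_t *hsfs_ra_ksp;
    static hsfs_ra_kstats_t hsfs_ra_stats;

    static void
    hsfs_ra_kstat_init(void)
    {
            hsfs_ra_ksp = kstat_create("hsfs", 0, "readahead", "misc",
                KSTAT_TYPE_NAMED,
                sizeof (hsfs_ra_stats) / sizeof (kstat_named_t), 0);
            if (hsfs_ra_ksp == NULL)
                    return;
            kstat_named_init(&hsfs_ra_stats.ra_issued, "issued",
                KSTAT_DATA_UINT64);
            kstat_named_init(&hsfs_ra_stats.ra_wasted, "wasted",
                KSTAT_DATA_UINT64);
            hsfs_ra_ksp->ks_data = &hsfs_ra_stats;
            kstat_install(hsfs_ra_ksp);
    }

The counters would then just be bumped with atomic_inc_64() on the
.value.ui64 members and show up via kstat(1M), no DTrace needed.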

[ ... ]
>> - same code, should also zero out the trailing parts of the non-coalesced
>>    buffers. Or else, kmem_zalloc() 'iodata' to prevent incidental copyout
>>    of garbage when I/O was only partially successful. Again, in that sense
>>    _only_ the 'error' case block is correct, since on a read you shouldn't
>>    make assumptions about data in a buf beyond the transferred
>>    b_bcount - b_resid bytes (or rather, never copy such data out to the
>>    user). I.e. the trailing b_resid bytes of 'nbuf' should be considered
>>    invalid.
>> 
>
>  Actually data is not copied for the errored blocks. It just signals a
>  bioerror. Should the target bufs be zeroed out as well?

To be sure there's no "leaking" of data into userspace, it might be a 
reasonable thing to do. I think setting b_resid = b_bcount would achieve 
the same (nothing transferred). I wanted to stress that in the error case, we 
shouldn't "leak" stuff that's none of the reading app's business.

[ ... ]
>> - do we really need eight threads per filesystem? Why wouldn't eight per
>>    machine, NCPUS per machine, 'tunable' per machine or something like that
>>    be sufficient?
>>    Along the same lines, can we not "bother" system_taskq with the HSFS
>>    readahead work? A high number of proprietary/private taskqs that are
>>    infrequently used - seems not like a good idea to me.
>>    For a box with one CD drive, all is ok, but I'm rather wondering about
>>    a situation like "sunray server with 100 drives [ on the sunrays ]".
>> 
>
>  Readahead should complete fast enough to be useful, ideally before or a
>  little after the caller comes back to read those pages. So using a slower
>  general-purpose queue may not give sufficient throughput, especially with
>  faster DVD drives. I have seen that in my tests and changed to a dynamic
>  taskq. Since this is a dynamic taskq, 8 threads will be created only if
>  required.
>  That said, I see your point: it should be enough to just use one dynamic
>  taskq for all hsfs mounts on the machine, with a max of, say, 16 threads.
>  Since this is dynamic, using a larger max should be okay and desirable to
>  support a large number of sequential threads.

The advantage of the system taskq would be that it's generically "fair" 
and doesn't promote HSFS readahead over other system activities. Yes, it's 
important that the readahead actually gets done in time.

I've done a bit of browsing through other code ...
You're right, we can go by precedent. The LOFI driver creates a 4-thread 
taskq per lofi device, and no one has ever complained about that. So go 
with the eight threads per fs instance; if we ever receive a report about 
that causing too high a system load, changing it to one N-thread taskq for 
all mounts would be easy enough.
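For reference, the per-mount variant is essentially this (names, priority
and thread count are placeholders, not the webrev's actual code):

    #include <sys/taskq.h>
    #include <sys/disp.h>

    #define HSFS_RA_NTHREADS    8   /* per-mount upper bound */

    /*
     * Illustrative: one dynamic taskq per hsfs mount.  With
     * TASKQ_DYNAMIC the worker threads only exist while readahead
     * work is actually queued, so an idle mount costs next to nothing.
     */
    static taskq_t *
    hsched_taskq_create(const char *mntname)
    {
            return (taskq_create(mntname, HSFS_RA_NTHREADS, minclsyspri,
                1, HSFS_RA_NTHREADS, TASKQ_DYNAMIC));
    }

Dispatching a readahead request is then just a taskq_dispatch() of the
worker function with the extent to prefetch as its argument.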

>
>  BTW, are there SunRay servers with 100 CD-ROM drives?

Every SunRay appliance could (theoretically) have one attached, and these 
show up on the SunRay server as USB devices. I'm not sure what customers 
use in actual deployment, though.

>
>> - Also, hsched_fini() doesn't make attempts to suspend/wait on the taskq;
>>    I wonder if the situation could ever arise that we manage to unmount the
>>    fs while there are still pending/unserviced readahead tasks. If not, I'd
>>    also like to know what makes sure that situation couldn't arise.
>> 
>
>  Won't taskq_destroy wait for the threads to complete? But then, if we use
>  one per-machine taskq, an explicit check needs to be added.

I checked the semantics; you're right, taskq_destroy will wait for
already-dispatched tasks to complete before returning. Thanks for
the clarification.
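So the teardown can stay as simple as this (sketch; the per-mount taskq and
the signature are assumed):

    /*
     * taskq_destroy() waits for tasks already dispatched to run to
     * completion before it returns, so pending readaheads cannot
     * outlive the unmount.
     */
    static void
    hsched_fini(taskq_t *ra_tq)
    {
            if (ra_tq != NULL)
                    taskq_destroy(ra_tq);
    }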

>
>> 
>> Generically - I like the code. As said above, I'd like to see some
>> explanation of why the code does [ not ] race in various places. And the
>> 'partial buf copyout' should be modified not to use b_bcount but always
>> b_resid when copying. All else are nits.
>> 
>> Ah, one final 'nits' thing: there are quite a few non-cstyle-clean spots
>> in the new code. Pls. clean up.
>> 
>
>  Yes, cstyle and lint cleanup needs to be done.


:)

Best regards,
FrankH.

>
> Regards,
> Moinak.
>
>> Thanks,
>> FrankH.
>>
>> 
>>> Regards,
>>> Moinak.
>>> 
>> _______________________________________________
>> caiman-discuss mailing list
>> caiman-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
>> 
>
>

