I hadn't really looked at the code. You raise a good point.

Changing it isn't super simple.  The info.blocks variable is passed through 
cli_scandesc_callback() and scan_common(), where it's placed into the scan 
context.  When data is scanned, the amount scanned is divided by 
CL_COUNT_PRECISION (also found in clamav.h) before it's added to the count, 
so you multiply the count by CL_COUNT_PRECISION to get the value in bytes. 
Provided that all downstream applications use CL_COUNT_PRECISION as clamscan 
does, we could shrink the count precision from 4k to something lower, but that 
would also decrease the max amount of data that could be counted before the 
variable overflows.  
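
To make the arithmetic concrete, here is a minimal sketch of the block 
counting described above.  It is not the libclamav code; the two helper names 
are made up for illustration, and the 4096 is just the value quoted from 
clamav.h further down in this thread.

```c
#include <stdint.h>

/* Assumption: same value as the define quoted from libclamav/clamav.h below. */
#define CL_COUNT_PRECISION 4096

/* Hypothetical helper: whole blocks recorded for `bytes` of scanned data.
 * Integer division truncates, so anything under one block counts as 0. */
static unsigned long bytes_to_blocks(unsigned long bytes)
{
    return bytes / CL_COUNT_PRECISION;
}

/* Hypothetical helper: what a downstream application does to get back to
 * bytes.  The result is widened to 64 bits so the multiply can't wrap. */
static uint64_t blocks_to_bytes(unsigned long blocks)
{
    return (uint64_t)blocks * CL_COUNT_PRECISION;
}
```

With these, a 68-byte file like EICAR comes out as 0 blocks, and a 10000-byte 
file comes out as 2 blocks, i.e. 8192 bytes once converted back.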

If the variable were a uint64_t, that'd probably be fine... but it's an 
unsigned long int... aka maybe 4 bytes or maybe 8 bytes (don't you love C?).  
On systems where an unsigned long is 4 bytes, counting in single bytes would 
cap the count at 4 GiB.  Changing the variable to a uint64_t would be "best", 
but it would be a non-backwards-compatible change to the API, which is very 
much not worth it. 
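
A rough sketch of where that cap comes from, assuming the worst case of a 
4-byte counter.  The function name is hypothetical and this only models the 
ceiling; it is not taken from the ClamAV sources.

```c
#include <stdint.h>

/* Hypothetical helper: the largest byte total a 32-bit block counter can
 * represent when each count stands for `precision` bytes. */
static uint64_t max_bytes_32bit_counter(uint64_t precision)
{
    return (uint64_t)UINT32_MAX * precision;
}
```

At a precision of 1 that is 4294967295 bytes (just under 4 GiB); at the 
current 4096 it is 17592186040320 bytes (just under 16 TiB).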

Sigh :-/

> -----Original Message-----
> From: clamav-users <clamav-users-boun...@lists.clamav.net> On Behalf Of
> Paul Kosinski via clamav-users
> Sent: Monday, November 2, 2020 5:23 PM
> To: clamav-users@lists.clamav.net
> Cc: Paul Kosinski <clamav-us...@iment.com>
> Subject: Re: [clamav-users] ClamAV Scan - Data Read vs Data Scanned
> 
> Can this really be done? I was looking at the code referred to by G.W.
> Haywood, and I see that it uses "info.blocks" and "info.rblocks".
> Looking at the definitions in "clamav-0.103.0/clamscan/", I see the
> following:
> 
> struct s_info {
>     unsigned int sigs;         /* number of signatures */
>     unsigned int dirs;         /* number of scanned directories */
>     unsigned int files;        /* number of scanned files */
>     unsigned int ifiles;       /* number of infected files */
>     unsigned int errors;       /* number of errors */
>     unsigned long int blocks;  /* number of *scanned* 16kb blocks */
>     unsigned long int rblocks; /* number of *read* 16kb blocks */
> };
> 
> This suggests that the counts for "scanned" and "read" are not really byte
> counts, and EICAR's 68 bytes would always be recorded as 0 (if normal
> rounding rules are applied).
> 
> 
> 
> On Mon, 2 Nov 2020 23:59:20 +0000
> "Micah Snyder \(micasnyd\) via clamav-users" <clamav-users@lists.clamav.net>
> wrote:
> 
> > I agree.  We already have some logic in freshclam to convert bytes to human
> > readable B / KiB / MiB / GiB format.  It should be pretty much a copypaste
> > effort to improve the data scanned/read output.
> >
> > -Micah
> >
> > On 11/2/20, 9:47 AM, "clamav-users on behalf of G.W. Haywood via
> > clamav-users" <clamav-users-boun...@lists.clamav.net on behalf of
> > clamav-us...@lists.clamav.net> wrote:
> >
> >     Hi there,
> >
> >     On Mon, 2 Nov 2020, Paul Kosinski via clamav-users wrote:
> >
> >     > ... I still think it is a bad message that should be fixed.
> >
> >     +1
> >
> >     If you want to try a very quick and dirty tweak to get more precise
> >     numbers, change the value of
> >
> >     1) CL_COUNT_PRECISION in .../libclamav/clamav.h from 4096 to 1
> >
> >     2) replace '1024' with '1' in four places in clamscan/clamscan.c
> >
> >     3) change 'MB' to 'Bytes' in two places in clamscan/clamscan.c and
> >
> >     4) rebuild.
> >
> >     8<----------------------------------------------------------------------
> >     ~/clamav-0.103.0-rc2: $ grep -C3 -r CL_COUNT_PRECISION clamscan libclamav | ...
> >     ...
> >     ...
> >     clamscan/clamscan.c:        mb = info.blocks * (CL_COUNT_PRECISION / 1024) / 1024.0;
> >     clamscan/clamscan.c:        logg("Data scanned: %2.2lf MB\n", mb);
> >     clamscan/clamscan.c:        rmb = info.rblocks * (CL_COUNT_PRECISION / 1024) / 1024.0;
> >     clamscan/clamscan.c:        logg("Data read: %2.2lf MB (ratio %.2f:1)\n", rmb, info.rblocks ? (double)info.blocks / (double)info.rblocks : 0);
> >     ...
> >     ...
> >     libclamav/clamav.h:#define CL_COUNT_PRECISION 4096
> >     ...
> >     ...
> >
> >     8<----------------------------------------------------------------------
> >
> >     This is untested, YMMV.  Obviously, if you're skilled in the art, this
> >     can be done better.  Note that 'MB' should in any case be 'MiB' as the
> >     values printed are the counts divided by 2^20 and not by 10^6.
> >
> >     --
> >
> >     73,
> >     Ged.
> 
> _______________________________________________
> 
> clamav-users mailing list
> clamav-users@lists.clamav.net
> https://lists.clamav.net/mailman/listinfo/clamav-users
> 
> 
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
> 
> http://www.clamav.net/contact.html#ml
