Re: [Clamav-users] XML and large file scan performance

2005-12-19 Thread GiM
Chuck Swiger in message 'Re: [Clamav-users] XML and large file scan 
performance' wrote:
> 
> 
> The 133MB/s # you mentioned is the bus speed, or what you can do ideally if you
> only make small transactions which stay in cache.  In practice, you top out at
> 90% of the IDE bus speed.  And a real world virus scanner is going to always be
> dealing with new incoming data, most probably with multiple scanner processes or
> threads going on (so I/O is heavily multithreaded), all of which means numbers
> around 10-15 MB/s for IDE drives are realistic.  :-)
> 

Yeah, I know; I was thinking of the abstract maximum, not the real one ;)

 main(int a[puts("Michał 'GiM' Spadliński")]){}
-- 
int main(void)<%long c<::>=<%0x49008639,0x50001000,0xEFC7D296,0x8B037A
%>;char *a=".:@aegiklmnprsyz",*b=(char *)c;char i=sizeof c-1;while(i--
)<%putchar(*((*b&0xf)+a));putchar((*b>>4&0xf)<:a:>);b++;%>return(0);%>
..little-endian-code..
___
http://lurker.clamav.net/list/clamav-users.html


Re: [Clamav-users] XML and large file scan performance

2005-12-19 Thread GiM
des in message 'Re: [Clamav-users] XML and large file scan performance' wrote:
> On 12/17/05, GiM <[EMAIL PROTECTED]> wrote:
> 
> The 7MB file took 1.5s with the other scanner. Looking at disk
> transfer speeds doesn't come close to explaining the PowerPoint scan
> time (66s).
> 

> 750KB Excel file: 0.36s
> 2MB Word doc: 5.0s
> 3MB binary file: 0.75s
> 3MB XML file: 1.28s
> 6MB SO library: 1.51s
> 7MB XML file: 3.60s
> 7.5MB PowerPoint file: 66s
>

Your problem is quite interesting, so I ran a few tests.
Ten runs on each file:
17M EXE  : 15.8411 s (  1M/s)
4.2M PPT : 6.55250 s (640k/s)
3.4M XLS : 9.56440 s (360k/s)
2.4M DOC : 2.99640 s (808k/s)

Tests were run on a 333MHz Celeron with 128M of RAM,
almost completely idle:
load average: 0.00, 0.12, 0.18

> So you are seeing much lower scan times with similar sized files of
> these types? You don't see the big difference between file types? If

I don't know how ClamAV treats different file types, so I can't tell you
where these differences come from; maybe one of the developers will answer
you.

> so, that's great, I can investigate the OS/hardware setup. I did some
> brief tests on a 3.2 GHz Xeon (same OS) and saw the same
> pattern, but didn't look in detail.
> 

 main(int a[puts("Michał 'GiM' Spadliński")]){}
-- 
"You have conquered, and I yield. Yet henceforward you are dead to Heaven and to Hope!
 In me you existed, and - look upon my death, look through this image,
 which is your own - how utterly you have murdered yourself!"
- E. A. Poe - William Wilson



Re: [Clamav-users] XML and large file scan performance

2005-12-19 Thread des
On 12/17/05, GiM <[EMAIL PROTECTED]> wrote:
> des in message '[Clamav-users] XML and large file scan performance' wrote:
> > In investigating heavy load during clamd scanning on an email server,
> > I noticed that scanning XML files appears to take longer than similar
> > sized binary files. I'm also seeing a big drop off in performance when
> > scanning certain large files, e.g. PowerPoint, Word. Tests with
> > clamdscan on a dual PIII 1.2GHz:
> >
>
> 1. MS Office files are treated differently than normal files.

Order of magnitude slower though?

> 2. Are you scanning under Windows or a *nix system?

Linux.

> 3. How many tests have you run on each file?

The results were an average of 5 runs per file.

> 4. Your scanning rates seem to be below 2MB/s;
>    this means you probably haven't compiled DMA into the kernel.

This was with a SCSI disk and 2GB RAM, running clamdscan against files
on the filesystem (eliminating any email part of the equation). The
other scanner was running from command line against the same files on
disk and wasn't daemonised. The second scanner isn't so much the point
though, I'm interested in improving clamd performance. :)

The 7MB file took 1.5s with the other scanner. Looking at disk
transfer speeds doesn't come close to explaining the PowerPoint scan
time (66s).

So you are seeing much lower scan times with similar sized files of
these types? You don't see the big difference between file types? If
so, that's great, I can investigate the OS/hardware setup. I did some
brief tests on a 3.2 GHz Xeon (same OS) and saw the same
pattern, but didn't look in detail.

Thanks,
--
des -- http://frommars.org/


Re: [Clamav-users] XML and large file scan performance

2005-12-17 Thread Chuck Swiger
GiM wrote:
> To ClamAV users ML in message 'Re: [Clamav-users] XML and large file scan 
> performance' wrote:
>> 7MB XML file in 0.3s?
>>
>> Let's do some math:
>> 7*1024^2 * 10 /3 / 1024^2
>> 23.3 MB/s
>>
>> Do you have a SCSI drive?
>> Because, correct me if I'm wrong, the fastest transfer rate on ATA is about
>> 17MB/s, and 40MB/s on a SCSI device.
> 
> I've checked: UDMA6 should be able to do 133M/s and
> SATA even 150M/s, but this depends on both hardware and system.

It has more to do with the type of test (sequential or random) than with the
environment.  Take a look at the difference, using a Western Digital WD1200JB
(7200 RPM, 8MB cache, UDMA5) with Sandra:

Benchmark Breakdown
Buffered Read : 80 MB/s
Sequential Read : 45 MB/s
Random Read : 7 MB/s
Buffered Write : 89 MB/s
Sequential Write : 42 MB/s
Random Write : 13 MB/s
Average Access Time : 8 ms (estimated)

The 133MB/s # you mentioned is the bus speed, or what you can do ideally if you
only make small transactions which stay in cache.  In practice, you top out at
90% of the IDE bus speed.  And a real world virus scanner is going to always be
dealing with new incoming data, most probably with multiple scanner processes or
threads going on (so I/O is heavily multithreaded), all of which means numbers
around 10-15 MB/s for IDE drives are realistic.  :-)
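
The gap between sequential and random reads can be approximated with a
simple per-request cost model: each request pays one access time plus its
transfer time. This is only a rough sketch. The 8 ms access time and
45 MB/s sequential rate come from the Sandra output above; the function
name and the request sizes are assumptions made for illustration, and
nothing here says what request size Sandra actually uses.

```python
def random_read_rate(block_kb, access_ms, sequential_mb_s):
    """Estimated random-read throughput in MB/s.

    Each block costs one access (seek + rotation) plus its transfer time.
    """
    block_mb = block_kb / 1024
    seconds_per_block = access_ms / 1000 + block_mb / sequential_mb_s
    return block_mb / seconds_per_block


if __name__ == "__main__":
    # 8 ms and 45 MB/s are taken from the benchmark above;
    # the block sizes are assumed values, not measured ones.
    for block_kb in (8, 64, 512):
        rate = random_read_rate(block_kb, access_ms=8, sequential_mb_s=45)
        print(f"{block_kb:4d} KB blocks: {rate:5.1f} MB/s")
```

With an assumed 64 KB request size the model gives a little under 7 MB/s,
close to the random-read figure in the benchmark; with 8 KB requests it
drops below 1 MB/s.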

> I'm quite curious what environment you ran your tests on.

Something like this on Unix:

  dd if=/dev/disk of=/dev/null bs=8k

...is a reasonable benchmark (at least for ballpark # purposes), so long as you
read enough data to exceed any caching in RAM.
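
Where dd isn't handy, roughly the same measurement can be sketched in
Python. This is an illustration only, not part of ClamAV or of any
benchmark tool; the function name is made up, and the 8k block size simply
mirrors the bs=8k above. The same caveat applies: the file has to be larger
than what the OS can cache in RAM, or the number will be too optimistic.

```python
import time


def sequential_read_rate(path, block_size=8 * 1024):
    """Read `path` from start to end in fixed-size blocks; return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering only;
    # the operating system's page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed
```

Calling sequential_read_rate() on a large file that has not been read
recently gives a ballpark sequential figure for that disk.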

[ Now, if you've got enough RAM, perhaps you can do your virus scanning entirely
via a pipeline or socket, without ever hitting disk, which could explain a
faster time you saw, but the Subject of the thread suggests we're dealing with a
file on disk.  Besides, the MTA shouldn't confirm receipt of a message until
it's written to permanent storage. ]

-- 
-Chuck


Re: [Clamav-users] XML and large file scan performance

2005-12-17 Thread GiM
To ClamAV users ML in message 'Re: [Clamav-users] XML and large file scan 
performance' wrote:
> 
> 7MB XML file in 0.3s?
> 
> Let's do some math:
> 7*1024^2 * 10 /3 / 1024^2
> 23.3 MB/s
> 
> Do you have a SCSI drive?
> Because, correct me if I'm wrong, the fastest transfer rate on ATA is about
> 17MB/s, and 40MB/s on a SCSI device.
>

I've checked: UDMA6 should be able to do 133M/s and
SATA even 150M/s, but this depends on both hardware and system.

I'm quite curious what environment you ran your tests on.

 main(int a[puts("Michał 'GiM' Spadliński")]){}
-- 
Linux: gawk, date, finger, wait, unzip, touch, nice,
suck, strip, mount, fsck, umount, make clean, sleep.
(Who needs porn when you have /usr/bin?) - stolen from usenet



Re: [Clamav-users] XML and large file scan performance

2005-12-17 Thread GiM
des in message '[Clamav-users] XML and large file scan performance' wrote:
> In investigating heavy load during clamd scanning on an email server,
> I noticed that scanning XML files appears to take longer than similar
> sized binary files. I'm also seeing a big drop off in performance when
> scanning certain large files, e.g. PowerPoint, Word. Tests with
> clamdscan on a dual PIII 1.2GHz:
> 

1. MS Office files are treated differently than normal files.
2. Are you scanning under Windows or a *nix system?
3. How many tests have you run on each file?
4. Your scanning rates seem to be below 2MB/s;
   this means you probably haven't compiled DMA into the kernel.

> 750KB Excel file: 0.36s
> 2MB Word doc: 5.0s
> 3MB binary file: 0.75s
> 3MB XML file: 1.28s
> 6MB SO library: 1.51s
> 7MB XML file: 3.60s
> 7.5MB PowerPoint file: 66s
> 
> A different scanner tested on the same hardware stayed below 1.5s for
> all the tests (below 0.3s for most of them). Are there any options for
> tuning clamd, short of buying oodles more and faster cpus?
> 

7MB XML file in 0.3s?

Let's do some math:
7*1024^2 * 10 /3 / 1024^2
23.3 MB/s

Do you have a SCSI drive?
Because, correct me if I'm wrong, the fastest transfer rate on ATA is about
17MB/s, and 40MB/s on a SCSI device.
(( I'm omitting the fact that ClamAV needs to compare it with over ))
(( 4 virus signatures... ))

I really don't believe your 'different scanner' scans a 7MB file
in < 0.3secs, and if it does, it's cheating im(ns)ho...
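
The same arithmetic can be turned around: how long does it take just to
move 7MB at a given transfer rate, before any scanning happens? A minimal
sketch, assuming the file really is read from disk rather than from cache;
the function name is made up, and the 17 and 40 MB/s rates are the ATA and
SCSI figures quoted above, not measurements.

```python
def min_read_seconds(size_mb, rate_mb_s):
    """Time needed just to transfer size_mb at rate_mb_s, ignoring scanning."""
    return size_mb / rate_mb_s


if __name__ == "__main__":
    # Rates are the ATA and SCSI figures quoted in the message above.
    for name, rate in (("ATA", 17.0), ("SCSI", 40.0)):
        print(f"{name}: {min_read_seconds(7.0, rate):.2f} s to read 7MB")
```

At 17MB/s the transfer alone takes about 0.41 s, which is already longer
than 0.3 s.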

 main(int a[puts("Michał 'GiM' Spadliński")]){}
-- 
"And where do you think acorns come from?"
- T.P. - The Colour of Magic



[Clamav-users] XML and large file scan performance

2005-12-17 Thread des
In investigating heavy load during clamd scanning on an email server,
I noticed that scanning XML files appears to take longer than similarly
sized binary files. I'm also seeing a big drop-off in performance when
scanning certain large files, e.g. PowerPoint, Word. Tests with
clamdscan on a dual PIII 1.2GHz:

750KB Excel file: 0.36s
2MB Word doc: 5.0s
3MB binary file: 0.75s
3MB XML file: 1.28s
6MB SO library: 1.51s
7MB XML file: 3.60s
7.5MB PowerPoint file: 66s

A different scanner tested on the same hardware stayed below 1.5s for
all the tests (below 0.3s for most of them). Are there any options for
tuning clamd, short of buying oodles more and faster CPUs?
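
To compare file types directly, the timings above can be converted into
effective scan rates. A minimal sketch, assuming the listed sizes are exact
and that 1MB = 1024KB; the function name is made up for illustration.

```python
# Timings from the list above: (label, size in MB, scan time in seconds).
# Sizes are taken at face value; 750KB is converted assuming 1MB = 1024KB.
TIMINGS = [
    ("750KB Excel file", 750 / 1024, 0.36),
    ("2MB Word doc", 2.0, 5.0),
    ("3MB binary file", 3.0, 0.75),
    ("3MB XML file", 3.0, 1.28),
    ("6MB SO library", 6.0, 1.51),
    ("7MB XML file", 7.0, 3.60),
    ("7.5MB PowerPoint file", 7.5, 66.0),
]


def scan_rate_mb_per_s(size_mb, seconds):
    """Effective scan rate in MB/s for one file."""
    return size_mb / seconds


if __name__ == "__main__":
    for label, size_mb, seconds in TIMINGS:
        print(f"{label:24s} {scan_rate_mb_per_s(size_mb, seconds):6.2f} MB/s")
```

The 3MB binary file works out to 4 MB/s, while the PowerPoint file comes in
at roughly 0.11 MB/s.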

Thanks,
--
des -- http://frommars.org/