Dear Ged/All,
After a beer things started to look clearer :)
You were right about something: clamav does look for something before
it starts looking for URLs, but what it actually looks for is the start
of email headers. In short, it is looking for:
"From someone".
Basically the test can be:
echo -e "From test\n\n http://www.google.com/" | clamscan -d bla.gdb -
or
echo -e "From test\n\n<a href=http://www.google.com/>test</a>" |
clamscan -d bla.gdb -
with the following result:
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.102.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.051 sec (0 m 0 s)
I totally agree with you that "Known viruses" should be 1, but that is
another story for another time.
Now comes the funny part, which explains why I didn't find the sha256
hash in my MySQL database and also why the above test will fail if you
don't create the hash correctly.
If you read https://developers.google.com/safe-browsing/v4/urls-hashing
(very carefully, not the way I did in the beginning) you will see that
you can create multiple hashes for the same URL, but you first need to
strip http[s]://
The same is also seen in the clamav debugging.
Take for example the URL
"http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt". The debug output
is:
LibClamAV debug: getHrefs: html_normalise_mem returned
LibClamAV debug: Phishcheck:Checking url http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>->
LibClamAV debug: Looking up hash DDEF6ACD0DF553A77CBC6B3537BDAA766E0CD819733D0B712AFD9A41B5888AB5 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash B8047D0B3763184FF29E17D4F649BA05E469538C40018FBB901437822F0066C6 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash 6D92531661EBF105F3C03BE8EA6C7E585F2A1603B5FF4D501BC0846755355018 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash DA983C0FAA7401A96BBBF6068F29762557B63F0811A0418BC046D95795999AFB for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash 88981E6263BE34A6C0B53ADA73D168B68828DD643723D34A812E9F8A6ABB5EE9 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: Looking up hash BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: This hash matched: BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5
LibClamAV debug: Hash matched for: http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>
LibClamAV debug: Phishcheck: Phishing scan result: Blacklisted
LibClamAV debug: blobDestroy
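The full debug output is long, so a small filter helps to pick out the lines shown above. This is only a sketch: phish_debug is a made-up helper name, the patterns are taken from the messages quoted in this mail, and it assumes the debug messages are written to stderr (hence the 2>&1 in the commented usage line).

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of ClamAV.
# Reads debug output on stdin and keeps only the lines that belong
# to the phishing/safebrowsing check.
phish_debug() {
    grep -Ei 'phishcheck|looking up hash|hash matched'
}

# Possible usage, assuming clamscan writes debug messages to stderr:
# printf 'From test\n\n http://www.google.com/\n' |
#     clamscan --debug -d bla.gdb - 2>&1 | phish_debug
```

If "Looking up hash" lines appear but no "hash matched" line follows, the URL was checked and simply is not in the database.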
Long story short, safebrowsing is working OK, but there are no hits,
which is quite surprising given the size of the database and the amount
of scam/phishing flowing through email nowadays.
---
Best regards,
Iulian Stan
On 2020-10-19 20:01, G.W. Haywood via clamav-users wrote:
Hi there,
Just some thoughts, as you asked. Sorry it isn't more helpful.
On Mon, 19 Oct 2020, iulian stan via clamav-users wrote:
#cat bla.gdb
S1:F:dd014af5ed6b38d9130e3f466f850e46d21b951199d53a18ef29ee9341614eaf
S1:P:dd014af5
Creating file to be tested:
#cat /tmp/clam.txt
http://www.google.com/
www.google.com
http://www.google.com/asdasdasd
I repeated your tests with 0.103-rc2 and got the same results. I
looked for obvious things like line terminators being included by
accident, but I didn't find anything.
Running scanner: clamscan --debug -d bla.gdb /tmp/clam.txt
LibClamAV debug: Module <....> On
I wondered if there's a module that should be being loaded and isn't.
LibClamAV debug: Recognized ASCII text
I wondered does it need to recognize the file as HTML, and also if
there's some length limit below which the scanner won't bother doing
the scan (I've seen mention of something like that when I've been
reading the code looking for something else) but I tried wrapping your
text in some html tags, and added some padding, and it made no
difference. This is incidentally one of those cases where the values
printed in the output for "Data scanned" and "Data read" could be more
useful...
8<----------------------------------------------------------------------
...
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
...
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.103.0-rc2
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.20 MB
Data read: 0.10 MB (ratio 2.00:1)
8<----------------------------------------------------------------------
Lastly
----------- SCAN SUMMARY -----------
Known viruses: 2
This doesn't seem right to me. There's really only one signature.
Basically I haven't seen anything here which might make me think the
problem is you, but I don't use the safebrowsing stuff so I don't have
the experience (and I don't have the time right now) to investigate it
further. It seems to me that even if there isn't something wrong with
clamd (which I guess means that it's faulty documentation) it really
shouldn't be this difficult - that alone would make it worth a report
to the ClamAV Bugzilla.
--
73,
Ged.
_______________________________________________
clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users
Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq
http://www.clamav.net/contact.html#ml