Dear Ged/All,

After a beer, things started to look clearer :)

You were right about one thing: clamav does look for something before it starts looking for URLs, but what it actually looks for is what should be the start of the email headers. In short, it is looking for "From someone".
Basically the test can be:
echo -e "From test\n\n http://www.google.com/" | clamscan -d bla.gdb -
or
echo -e "From test\n\n<a href=http://www.google.com/>test</a>" | clamscan -d bla.gdb -

with the following result:
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.102.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.051 sec (0 m 0 s)

I totally agree with you that "Known viruses" should be 1, but that is a story for another time.


Now comes the funny part, which explains why I didn't find the sha256 hash in my MySQL database and also why the above test will fail if you don't create the hash correctly.

If you read https://developers.google.com/safe-browsing/v4/urls-hashing (very carefully, not like I did in the beginning) you will see that you can create multiple hashes for the same URL, but you first need to strip http[s]://

The same can be seen in the clamav debug output.
If we take for example the URL "http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt", the debug output will be:

LibClamAV debug: getHrefs: html_normalise_mem returned
LibClamAV debug: Phishcheck:Checking url http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>->
LibClamAV debug: Looking up hash DDEF6ACD0DF553A77CBC6B3537BDAA766E0CD819733D0B712AFD9A41B5888AB5 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash B8047D0B3763184FF29E17D4F649BA05E469538C40018FBB901437822F0066C6 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(31)
LibClamAV debug: Looking up hash 6D92531661EBF105F3C03BE8EA6C7E585F2A1603B5FF4D501BC0846755355018 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash DA983C0FAA7401A96BBBF6068F29762557B63F0811A0418BC046D95795999AFB for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(16)
LibClamAV debug: Looking up hash 88981E6263BE34A6C0B53ADA73D168B68828DD643723D34A812E9F8A6ABB5EE9 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: Looking up hash BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt</p>(0)
LibClamAV debug: This hash matched: BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5
LibClamAV debug: Hash matched for: http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt</p>
LibClamAV debug: Phishcheck: Phishing scan result: Blacklisted
LibClamAV debug: blobDestroy
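
For anyone who wants to build candidate hashes by hand, here is a rough shell sketch of the idea: drop the http[s]:// scheme, then hash every host-suffix/path-prefix combination. The names url_hashes and hash_line are made up for illustration and are not part of ClamAV, and the script assumes GNU sha256sum is installed. It ignores query strings, IP-address hosts and any URL canonicalisation, so it is only an approximation and may not reproduce the exact hashes clamav looks up.

```shell
#!/bin/sh
# Sketch only. hash_line and url_hashes are hypothetical helpers, not ClamAV code.
# Assumes GNU coreutils sha256sum is available.

# Print "<sha256>  <expression>" for one host/path expression.
hash_line() {
    printf '%s  %s\n' "$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)" "$1"
}

# Print one hash line per host-suffix/path-prefix combination of a URL.
url_hashes() {
    # Strip the scheme first; hashing the full URL gives a different digest.
    rest=${1#http://}
    rest=${rest#https://}

    # Split into host and path (the path may be empty).
    case $rest in
        */*) host=${rest%%/*}; path=${rest#*/} ;;
        *)   host=$rest; path= ;;
    esac

    h=$host
    while :; do
        # Path prefix 1: the root, i.e. "host/".
        hash_line "$h/"

        # Further prefixes: each directory level, then the full path.
        acc=
        p=$path
        while [ -n "$p" ]; do
            case $p in
                */*) acc=$acc${p%%/*}/; p=${p#*/} ;;
                *)   acc=$acc$p; p= ;;
            esac
            hash_line "$h/$acc"
        done

        # Next host suffix: drop the leading label while two labels remain
        # (www.google.com -> google.com, then stop).
        case ${h#*.} in
            *.*) h=${h#*.} ;;
            *)   break ;;
        esac
    done
}
```

Running url_hashes "http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt" prints six lines (two host suffixes times three path prefixes), the same number of lookups as in the debug output above. The digests are lowercase hex, as in the .gdb file.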



Long story short, safebrowsing is working OK, but there are no hits, which is quite surprising given the size of the database and the amount of scam/phishing flowing through email nowadays.

---
Best regards,
Iulian Stan


On 2020-10-19 20:01, G.W. Haywood via clamav-users wrote:
Hi there,

Just some thoughts, as you asked.  Sorry it isn't more helpful.

On Mon, 19 Oct 2020, iulian stan via clamav-users wrote:

#cat bla.gdb
S1:F:dd014af5ed6b38d9130e3f466f850e46d21b951199d53a18ef29ee9341614eaf
S1:P:dd014af5

Creating file to be tested:
#cat /tmp/clam.txt
http://www.google.com/
www.google.com
http://www.google.com/asdasdasd

I repeated your tests with 0.103-rc2 and got the same results.  I
looked for obvious things like line terminators being included by
accident, but I didn't find anything.

Running scanner: clamscan --debug -d bla.gdb /tmp/clam.txt
LibClamAV debug: Module <....> On

I wondered if there's a module that should be being loaded and isn't.

LibClamAV debug: Recognized ASCII text

I wondered does it need to recognize the file as HTML, and also if
there's some length limit below which the scanner won't bother doing
the scan (I've seen mention of something like that when I've been
reading the code looking for something else) but I tried wrapping your
text in some html tags, and added some padding, and it made no
difference.  This is incidentally one of those cases where the values
printed in the output for "Data scanned" and "Data read" could be more
useful...

8<----------------------------------------------------------------------
...
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
...
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.103.0-rc2
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.20 MB
Data read: 0.10 MB (ratio 2.00:1)
8<----------------------------------------------------------------------

Lastly

----------- SCAN SUMMARY -----------
Known viruses: 2

This doesn't seem right to me.  There's really only one signature.

Basically I haven't seen anything here which might make me think the
problem is you, but I don't use the safebrowsing stuff so I don't have
the experience (and I don't have the time right now) to investigate it
further.  It seems to me that even if there isn't something wrong with
clamd (which I guess means that it's faulty documentation) it really
shouldn't be this difficult - that alone would make it worth a report
to the ClamAV Bugzilla.

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
