Re: [clamav-users] Google safebrowsing types and usage questions

iulian stan via clamav-users Mon, 19 Oct 2020 14:56:13 -0700

Dear Ged/All,

After a beer, things started to look clearer :)

You were right about one thing: ClamAV does look for something before it starts looking for URLs, but what it actually looks for is the start of the email headers. In short, it is looking for: "From someone".

Basically, the test can be:

echo -e "From test\n\n http://www.google.com/" | clamscan -d bla.gdb -

or

echo -e "From test\n\n<a href=http://www.google.com/>test</a>" | clamscan -d bla.gdb -


with the following result:
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.102.4
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.051 sec (0 m 0 s)

I totally agree with you that "Known viruses" should be 1, but that is a story for another time.
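The same test can be wrapped in a couple of shell functions, as a rough sketch: printf replaces echo -e (whose escape handling differs between shells), and the clamscan exit status is checked (0 = nothing matched, 1 = a signature matched, 2 = error). The names make_test_message and scan_url are hypothetical, and the one-line "From test" header is only the minimal header that worked in the test above, not a complete mail message.

```shell
#!/bin/sh
# Sketch only: make_test_message and scan_url are hypothetical helper names.

# Print a minimal message: a "From " line, a blank line, then one HTML link.
# printf is used instead of echo -e because echo's escape handling varies.
make_test_message() {
    printf 'From test\n\n<a href="%s">test</a>\n' "$1"
}

# Scan one URL against a given .gdb file, reading the message from stdin ("-").
# clamscan exits 0 when nothing matched, 1 when a signature matched, 2 on error.
scan_url() {
    make_test_message "$1" | clamscan --no-summary -d "$2" -
    case $? in
        0) echo "clean: $1" ;;
        1) echo "MATCHED: $1" ;;
        *) echo "error scanning: $1" >&2; return 2 ;;
    esac
}
```

Usage: scan_url http://www.google.com/ bla.gdb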

Now comes the funny part, which explains why I didn't find the SHA-256 hash in my MySQL database, and also why the test above will fail if you don't create the hash correctly.

If you read https://developers.google.com/safe-browsing/v4/urls-hashing (very carefully, not like I did in the beginning), you will see that you can create multiple hashes for the same URL, but you first need to strip the http[s]:// prefix.
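A minimal sketch of producing the two .gdb lines for one URL expression, assuming the S1:F:<full SHA-256> / S1:P:<first 8 hex digits> layout of the bla.gdb quoted further down this thread. It only strips the scheme; the other canonicalisation steps in Google's document (lowercasing the host, unescaping, dropping fragments, and so on) are not implemented, so pass an already-canonical URL. gdb_lines is a hypothetical helper name, and sha256sum from GNU coreutils is assumed.

```shell
#!/bin/sh
# Sketch: emit .gdb lines for ONE already-canonicalised URL expression.
# Assumptions: the S1:F:/S1:P: layout copied from the bla.gdb in this thread,
# and sha256sum from GNU coreutils. gdb_lines is a hypothetical helper name.
gdb_lines() {
    gl_expr=${1#http://}            # strip the scheme before hashing
    gl_expr=${gl_expr#https://}
    # printf '%s' so that no trailing newline ends up in the hashed string
    gl_full=$(printf '%s' "$gl_expr" | sha256sum | cut -d ' ' -f 1)
    # first 4 bytes of the hash = 8 hex digits, like the prefix in bla.gdb
    gl_prefix=$(printf '%s' "$gl_full" | cut -c 1-8)
    printf 'S1:F:%s\n' "$gl_full"
    printf 'S1:P:%s\n' "$gl_prefix"
}
```

For example, gdb_lines http://www.google.com/ > bla.gdb hashes the string "www.google.com/" rather than "http://www.google.com/".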


The same can be seen in the ClamAV debug output.

If we take, for example, the URL "http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt", the debug output will be:


LibClamAV debug: getHrefs: html_normalise_mem returned
LibClamAV debug: Phishcheck:Checking url http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt->
LibClamAV debug: Looking up hash DDEF6ACD0DF553A77CBC6B3537BDAA766E0CD819733D0B712AFD9A41B5888AB5 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt(31)
LibClamAV debug: Looking up hash B8047D0B3763184FF29E17D4F649BA05E469538C40018FBB901437822F0066C6 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt(31)
LibClamAV debug: Looking up hash 6D92531661EBF105F3C03BE8EA6C7E585F2A1603B5FF4D501BC0846755355018 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt(16)
LibClamAV debug: Looking up hash DA983C0FAA7401A96BBBF6068F29762557B63F0811A0418BC046D95795999AFB for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt(16)
LibClamAV debug: Looking up hash 88981E6263BE34A6C0B53ADA73D168B68828DD643723D34A812E9F8A6ABB5EE9 for google.com/(11)jhgfedwsqasdfgh/234tewdas.txt(0)
LibClamAV debug: Looking up hash BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5 for www.google.com/(15)jhgfedwsqasdfgh/234tewdas.txt(0)
LibClamAV debug: This hash matched: BC9A8F2B6FFFD58571E188BB110545F8FB3AF51CDF1A63696D505A9870A85BE5
LibClamAV debug: Hash matched for: http://www.google.com/jhgfedwsqasdfgh/234tewdas.txt
LibClamAV debug: Phishcheck: Phishing scan result: Blacklisted
LibClamAV debug: blobDestroy
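The lookup order in that log can be approximated with a rough sketch that lists host-suffix/path-prefix combinations: every path variant (full path, then each parent directory, then nothing) is tried against every host suffix (shortest first). The log prints each candidate as a host part and a path part with lengths in parentheses; this sketch assumes the two are simply concatenated before hashing. lookup_expressions is a hypothetical helper name; it handles only plain hostnames (no IP addresses, ports, or query strings), does not apply Google's limits on the number of host and path variants, and the ordering for deeper paths is inferred from this one log rather than checked against the ClamAV source.

```shell
#!/bin/sh
# Sketch: list the "host/path" strings looked up for one URL, in the order the
# debug log above shows. lookup_expressions is a hypothetical helper name.
# Assumes a plain hostname, no port, no query string or fragment.
lookup_expressions() {
    le_rest=${1#*://}                 # drop the scheme
    le_host=${le_rest%%/*}            # e.g. www.google.com
    le_path=${le_rest#"$le_host"}     # e.g. /jhgfedwsqasdfgh/234tewdas.txt
    le_path=${le_path#/}              # the "/" is re-added when printing

    # Host suffixes, shortest first: "google.com www.google.com".
    le_hosts=$le_host
    le_h=$le_host
    while :; do
        case $le_h in
            *.*.*) le_h=${le_h#*.}; le_hosts="$le_h $le_hosts" ;;
            *) break ;;
        esac
    done

    # Paths, longest first: full path, each parent directory, then "".
    le_p=$le_path
    while :; do
        for le_x in $le_hosts; do
            printf '%s/%s\n' "$le_x" "$le_p"
        done
        if [ -z "$le_p" ]; then break; fi
        le_p=${le_p%/}                # "a/b/" -> "a/b"
        case $le_p in
            */*) le_p=${le_p%/*}/ ;;  # "a/b"  -> "a/"
            *) le_p= ;;               # "a"    -> ""
        esac
    done
}
```

Feeding each printed line through printf '%s' "$line" | sha256sum should give the hashes from the log (in lowercase), provided ClamAV really hashes exactly these strings, as the log suggests.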

Long story short, Safe Browsing is working OK, but there are no hits, which I must say is quite surprising given the size of the database and the amount of scam/phishing email flowing around nowadays.


---
Best regards,
Iulian Stan


On 2020-10-19 20:01, G.W. Haywood via clamav-users wrote:

Hi there,

Just some thoughts, as you asked.  Sorry it isn't more helpful.

On Mon, 19 Oct 2020, iulian stan via clamav-users wrote:

#cat bla.gdb
S1:F:dd014af5ed6b38d9130e3f466f850e46d21b951199d53a18ef29ee9341614eaf
S1:P:dd014af5

Creating file to be tested:
#cat /tmp/clam.txt
http://www.google.com/
www.google.com
http://www.google.com/asdasdasd


I repeated your tests with 0.103-rc2 and got the same results.  I
looked for obvious things like line terminators being included by
accident, but I didn't find anything.

Running scanner: clamscan --debug -d bla.gdb /tmp/clam.txt
LibClamAV debug: Module <....> On


I wondered if there's a module that should be loaded but isn't.

LibClamAV debug: Recognized ASCII text


I wondered whether it needs to recognize the file as HTML, and also whether
there's some length limit below which the scanner won't bother doing
the scan (I've seen mention of something like that when I've been
reading the code looking for something else) but I tried wrapping your
text in some html tags, and added some padding, and it made no
difference.  This is incidentally one of those cases where the values
printed in the output for "Data scanned" and "Data read" could be more
useful...

8<----------------------------------------------------------------------
...
LibClamAV debug: Recognized ASCII text
LibClamAV debug: Matched signature for file type HTML data at 0
...
----------- SCAN SUMMARY -----------
Known viruses: 2
Engine version: 0.103.0-rc2
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.20 MB
Data read: 0.10 MB (ratio 2.00:1)
8<----------------------------------------------------------------------

Lastly

----------- SCAN SUMMARY -----------
Known viruses: 2


This doesn't seem right to me.  There's really only one signature.

Basically I haven't seen anything here which might make me think the
problem is you, but I don't use the safebrowsing stuff so I don't have
the experience (and I don't have the time right now) to investigate it
further.  It seems to me that even if there isn't something wrong with
clamd (which I guess means that it's faulty documentation) it really
shouldn't be this difficult - that alone would make it worth a report
to the ClamAV Bugzilla.

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml

