Ciao my friends,
I am glad to inform you that the first version (beta-beta-beta-...) of
htcheck is nearly ready. The program uses a big slice of the htdig library
and technology, and stores its information in a MySQL database. It goes
without saying that it needs improvement (it is my biggest piece of work so
far), and if any of you with MySQL experience would like to test it, I can
send you the sources. I need new ideas and advice (who doesn't?).
Its purpose is to help webmasters maintain the sites they manage. Of
course, checking websites on a local network is faster; what makes htcheck
slow is checking "external" URLs. It can answer most questions you might
ask about a website (a set of HTML documents, or of documents that can be
parsed, which means only HTML for now): which URLs failed, which URLs link
to a given URL, and which tags and attributes are involved. If you know
SQL, this list can grow ...
In short, htcheck can store every tag found in an HTML document on the web
(only HTML so far; I'll see about other formats in the future), every
attribute of every HTML element parsed, and every link they create.
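To make the storage idea concrete, here is a minimal sketch in Python of
collecting tags, attributes and links from one document. It is only an
illustration, not htcheck's code: the class name, the record layouts and
the set of link-creating attributes are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice of attributes that are treated as creating a link.
LINK_ATTRS = {"href", "src", "action", "background"}


class TagCollector(HTMLParser):
    """Records every start tag, every attribute, and every link found."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.statements = []  # (tag_id, tag_name)
        self.attributes = []  # (tag_id, attribute_name, attribute_value)
        self.links = []       # (tag_id, attribute_name, absolute_url)

    def handle_starttag(self, tag, attrs):
        # Number the tags in document order so attributes and links
        # can point back to the tag that produced them.
        tag_id = len(self.statements)
        self.statements.append((tag_id, tag))
        for name, value in attrs:
            self.attributes.append((tag_id, name, value))
            if name in LINK_ATTRS and value:
                # Resolve relative references against the document URL.
                target = urljoin(self.base_url, value)
                self.links.append((tag_id, name, target))


def collect(html, base_url):
    """Parse one HTML document and return the filled collector."""
    parser = TagCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser
```

Each of the three lists maps naturally onto one database table, with the
tag number tying an attribute or link back to its element.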
As of today, I have developed the crawling and storing system, and I still
need to build the interface for querying the resulting database (I can
choose the name of the DB for every scan). This is the easiest part,
because I only have to write SQL statements. I have two ideas: a standalone
program and a PHP interface. I am using the latter right now (a very, very
simple one) to check the results htcheck produces.
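As a rough idea of what such SQL statements could look like, here is a
small Python sketch. Everything in it is an assumption made for the
example: the table and column names are invented, not htcheck's real
schema, and SQLite (from the Python standard library) stands in for MySQL
so that the sketch runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE url (id INTEGER PRIMARY KEY, url TEXT, status INTEGER);
        CREATE TABLE link (src INTEGER, dest INTEGER, tag TEXT, attribute TEXT);
    """)
    db.executemany("INSERT INTO url VALUES (?, ?, ?)", [
        (1, "http://www.example.org/", 200),
        (2, "http://www.example.org/news.html", 200),
        (3, "http://www.example.org/old.html", 404),
    ])
    db.executemany("INSERT INTO link VALUES (?, ?, ?, ?)", [
        (1, 2, "a", "href"),
        (1, 3, "a", "href"),
        (2, 3, "img", "src"),
    ])
    return db


def failed_urls(db):
    """URLs whose HTTP status code signals an error."""
    rows = db.execute(
        "SELECT url, status FROM url WHERE status >= 400 ORDER BY url")
    return rows.fetchall()


def referrers(db, target):
    """Documents linking to `target`, with the tag and attribute used."""
    rows = db.execute("""
        SELECT s.url, l.tag, l.attribute
        FROM link AS l
        JOIN url AS s ON s.id = l.src
        JOIN url AS d ON d.id = l.dest
        WHERE d.url = ?
        ORDER BY s.url
    """, (target,))
    return rows.fetchall()
```

A standalone program or a PHP page would mostly wrap queries of this
kind and format the rows they return.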
The crawling system is very similar to htdig's. It uses HTTP/1.1 (which I
partially developed for htdig) and the same kind of inclusion/exclusion
configuration. htcheck tries to retrieve the included URLs and parses them
depending on the content type returned. "External" URLs can optionally be
checked for existence (this is a configuration attribute).
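The decision described above could be sketched in Python roughly as
follows. The configuration field names, the prefix/substring matching
rules and the three outcome labels are assumptions made for the
illustration; they are not htcheck's or htdig's actual attributes.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlConfig:
    # Hypothetical configuration fields, invented for this sketch.
    include_prefixes: list = field(default_factory=list)
    exclude_substrings: list = field(default_factory=list)
    check_external: bool = True


def classify(url, config):
    """Return 'retrieve', 'check' or 'skip' for a URL.

    'retrieve': the URL is included, so fetch it and maybe parse it.
    'check':    the URL is external, so only test whether it exists.
    'skip':     the URL is excluded, or external checks are turned off.
    """
    included = any(url.startswith(p) for p in config.include_prefixes)
    excluded = any(s in url for s in config.exclude_substrings)
    if included and not excluded:
        return "retrieve"
    if not included and config.check_external:
        return "check"
    return "skip"


def should_parse(content_type):
    """Parse only HTML, judging by the Content-Type header value."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

For example, with "http://www.example.org/" as the only included prefix
and "/cgi-bin/" excluded, a page on that server is retrieved, a CGI URL on
it is skipped, and a URL on any other server is only checked.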
The last indexing run, on Red Hat Linux 6.0 with an Intel Pentium II at
333 MHz, gave these results:
- about 18,000 URLs seen
- 12 minutes and 46 seconds
- about 30,000 HTTP/1.1 requests over only 366 connections to 15 servers
- 100,000 links stored
- 102,000 HTML statements and attributes stored
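To put the figures above in perspective, here is a small Python
calculation of the rates they imply. The inputs are the approximate
numbers from the list, so the derived rates are only rough estimates.

```python
# Figures quoted in the list above; all of them are approximate.
urls_seen = 18_000
requests = 30_000
connections = 366
elapsed_seconds = 12 * 60 + 46  # 12 minutes and 46 seconds

urls_per_second = urls_seen / elapsed_seconds
requests_per_second = requests / elapsed_seconds
requests_per_connection = requests / connections

print(f"{urls_per_second:.1f} URLs per second")
print(f"{requests_per_second:.1f} requests per second")
print(f"{requests_per_connection:.0f} requests per connection")
```

That works out to roughly 23 URLs per second and about 82 requests per
connection, which shows how much the persistent connections of HTTP/1.1
are being reused.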
All that remains is to get it ready for the open source community, and I
think Geoff, Gilles, Loic and the rest of you can help me with that.
Ciao and thanks to all of you
-Gabriele
P.S.: Happy Easter, in case we don't get the chance to talk again before then ...
-------------------------------------------------
Gabriele Bartolini
Computer Programmer (are U sure?)
U.O. Rete Civica - Comune di Prato
Prato - Italia - Europa
e-mail: [EMAIL PROTECTED]
http://www.po-net.prato.it
-------------------------------------------------
Zinedine "Zizou" Zidane. Just for soccer lovers.
-------------------------------------------------
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.