Hi,

I am the maintainer of the GIFT (GNU Image Finding Tool). The GIFT offers 
visual similarity search for images. Try the demo on http://viper.unige.ch if 
you are interested in what this means exactly.

Currently I am looking into improving (or rather making) our web-site 
indexing capabilities. As of the current version, the GIFT does the 
equivalent of a "find -exec" on a directory tree to index it. 

For indexing websites: why not just wget all the stuff and then index it 
locally using a program that visits all the files? This would be simple, but 
imagine you are indexing a large site with large images, while all you need 
for the GUI and the index are the image thumbnails and image features. With 
the get-first-index-later approach, we would have to store the whole 
collection locally at one time.

It would be much more practical to have a program that fetches each 
document, indexes it, deletes the local copy, and then moves on to the next 
image.

Something like that should be easy to add to wget or htdig. I am thinking 
of a plugin mechanism that provides hooks which are called

1) before crawling is started
2) before a given document is loaded (e.g. to decide if it will be loaded)
3) after a given document has been loaded
4) after crawling has finished.
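To make the four hooks concrete, here is a minimal C sketch of what such an interface could look like. All names (`crawl_plugin`, `run_crawl`, `gift_ctx`, the `g_*` callbacks) are hypothetical and not part of wget or htdig. `run_crawl` is only a stand-in for the crawler's main loop: a real crawler would discover URLs by following links and download each document to a temporary file before calling hook 3.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table; one function pointer per hook, plus an opaque
   context pointer owned by the plugin. Any hook may be NULL. */
typedef struct crawl_plugin {
    void (*before_crawl)(void *ctx);                       /* hook 1 */
    int  (*should_fetch)(void *ctx, const char *url);      /* hook 2: nonzero = fetch */
    void (*after_fetch)(void *ctx, const char *url,
                        const char *local_path);           /* hook 3 */
    void (*after_crawl)(void *ctx);                        /* hook 4 */
    void *ctx;
} crawl_plugin;

/* Stand-in crawler loop: walks a fixed URL list and calls the hooks.
   Returns the number of documents "fetched". */
static int run_crawl(const crawl_plugin *p, const char *const *urls, size_t n)
{
    int fetched = 0;
    if (p->before_crawl) p->before_crawl(p->ctx);
    for (size_t i = 0; i < n; i++) {
        if (p->should_fetch && !p->should_fetch(p->ctx, urls[i]))
            continue;                          /* plugin vetoed this URL */
        const char *local = "/tmp/crawl-doc";  /* placeholder: download would go here */
        if (p->after_fetch) p->after_fetch(p->ctx, urls[i], local);
        fetched++;
    }
    if (p->after_crawl) p->after_crawl(p->ctx);
    return fetched;
}

/* Example (hypothetical) GIFT-style plugin state. */
typedef struct { const char *prefix; int started, indexed, finished; } gift_ctx;

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

static void g_before(void *c) { ((gift_ctx *)c)->started = 1; }

/* Stay on one site: only fetch URLs under the configured prefix. */
static int g_should_fetch(void *c, const char *url)
{
    const gift_ctx *g = c;
    return strncmp(url, g->prefix, strlen(g->prefix)) == 0;
}

/* Index images only; HTML pages are fetched just so links can be followed. */
static void g_after_fetch(void *c, const char *url, const char *path)
{
    (void)path; /* a real plugin would build the thumbnail and features from
                   path here, then delete the local copy with remove(path) */
    if (ends_with(url, ".jpg") || ends_with(url, ".png"))
        ((gift_ctx *)c)->indexed++;
}

static void g_after(void *c) { ((gift_ctx *)c)->finished = 1; }
```

Passing an opaque `ctx` pointer lets each plugin keep its own state (counters here, presumably an open index in the GIFT's case) without globals, and only one document needs to exist locally at a time.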

I was thinking of shared libs that can be loaded on startup, with the shared 
lib to use given as a command-line option to wget. The GIFT would wrap this 
up in a small shell script, hiding the ugly details from the user.
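Loading such a plugin at startup could be done with the POSIX `dlopen`/`dlsym` interface. The sketch below assumes a hypothetical convention that the shared lib exports an entry point named `crawl_plugin_init` that fills in the hook table; neither that symbol nor any plugin option exists in wget or htdig today. On older glibc versions the program must be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hook table type; only a pointer to it is needed here. */
typedef struct crawl_plugin crawl_plugin;

/* Hypothetical entry point every plugin would export; returns 0 on success. */
typedef int (*plugin_init_fn)(crawl_plugin *out);

/* Load the plugin at `path` and let it fill in `out`.
   Returns the library handle (to dlclose() after the crawl) or NULL. */
static void *load_plugin(const char *path, crawl_plugin *out)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "cannot load plugin %s: %s\n", path, dlerror());
        return NULL;
    }
    void *sym = dlsym(handle, "crawl_plugin_init");
    if (!sym) {
        fprintf(stderr, "plugin %s has no crawl_plugin_init\n", path);
        dlclose(handle);
        return NULL;
    }
    /* POSIX guarantees that a dlsym() result can be converted to a function pointer. */
    plugin_init_fn init = (plugin_init_fn)sym;
    if (init(out) != 0) {
        fprintf(stderr, "plugin %s failed to initialise\n", path);
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

Using `RTLD_NOW` makes missing symbols fail at load time rather than halfway through a long crawl.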

IMHO something like this could be useful for both htdig and wget, and I 
would like to find out

1) whether any of you htdig/wget guys is already doing this,
2) whether you would be interested in my adding something like this to 
wget/htdig, or alternatively, whether somebody would volunteer...
3) whether someone would be willing and able to point me to the right places,
4) how to do this so that it is as useful as possible for everybody.

Cheers,
Wolfgang

-- 
Dr. Wolfgang Müller, assistant == teaching assistant
Personal page: http://cui.unige.ch/~vision/members/WolfgangMueller.html 
Maintainer, GNU Image Finding Tool (http://www.gnu.org/software/gift)


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev