Hi,

I am the maintainer of the GIFT (GNU Image Finding Tool). The GIFT offers visual similarity search for images. Try the demo at http://viper.unige.ch if you are interested in what this means exactly.

Currently I am looking into improving (or rather, creating) our web-site indexing capabilities. As of the current version, the GIFT does the equivalent of a "find -exec" on a directory tree to index it.

For indexing websites, why not just wget everything and then index it locally with a program that visits all the files? This would be simple. But imagine you are indexing a large site with large images, while all you need for the GUI and the index are the image thumbnails and image features. The get-first-index-later approach would mean that at some point we have to hold the whole collection locally. It would be much more practical to have a program that gets a document, indexes it, deletes the local copy, then gets the next document, and so on.

Something like that should be easy to add to wget or htdig. I am thinking of adding a plugin mechanism that provides hooks which are called
1) before crawling is started,
2) before a given document is loaded (e.g. to decide whether it will be loaded),
3) after a given document has been loaded,
4) after crawling has finished.

I was thinking of shared libs that are loaded on startup. The shared lib to use could be given as an option to wget. The GIFT would wrap this up in a small shell script, hiding the ugly details from the user.

IMHO things like that could be interesting for both htdig and wget, and I would like to find out
1) whether any of you htdig/wget guys is doing that already,
2) whether you are interested in me adding something like that to wget/htdig, or alternatively, whether somebody volunteers...
3) whether someone would be willing or able to point me to the right places,
4) how to do things in order to maximize the usefulness for everybody.

Cheers,
Wolfgang

-- 
Dr. Wolfgang Müller, assistant == teaching assistant
Personal page: http://cui.unige.ch/~vision/members/WolfgangMueller.html
Maintainer, GNU Image Finding Tool (http://www.gnu.org/software/gift)
_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
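
P.S. To make the hook idea a bit more concrete, here is a rough sketch of the four hook points and the get-index-delete loop. It is written in Python only for brevity; the actual proposal is about shared libs loaded by wget or htdig. All names here (CrawlHooks, ImageIndexer, crawl, fake_fetch) are made up for illustration and are not part of wget, htdig, or the GIFT. The "indexing" just records the file size, and the download is faked so that the sketch runs without network access.

```python
import os
import tempfile
from typing import Callable, Dict, Iterable, List


class CrawlHooks:
    """Hypothetical plugin interface: one method per proposed hook point."""

    def before_crawl(self) -> None:                       # hook 1
        pass

    def should_fetch(self, url: str) -> bool:             # hook 2
        return True

    def after_fetch(self, url: str, path: str) -> None:   # hook 3
        pass

    def after_crawl(self) -> None:                        # hook 4
        pass


class ImageIndexer(CrawlHooks):
    """Toy plugin: 'indexes' an image by recording its size in bytes."""

    IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.finished = False

    def before_crawl(self) -> None:
        self.index.clear()
        self.finished = False

    def should_fetch(self, url: str) -> bool:
        # Only load documents that look like images.
        return url.lower().endswith(self.IMAGE_SUFFIXES)

    def after_fetch(self, url: str, path: str) -> None:
        # A real plugin would extract thumbnails and features here.
        self.index[url] = os.path.getsize(path)

    def after_crawl(self) -> None:
        self.finished = True


def crawl(urls: Iterable[str],
          fetch: Callable[[str], bytes],
          hooks: CrawlHooks) -> List[str]:
    """Get one document, hand it to the plugin, delete the local copy, repeat.

    Returns the temporary paths that were used; each one has already been
    deleted by the time this function returns.
    """
    used_paths: List[str] = []
    hooks.before_crawl()
    for url in urls:
        if not hooks.should_fetch(url):
            continue
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(fetch(url))
            hooks.after_fetch(url, path)
        finally:
            os.remove(path)  # never keep more than one document locally
        used_paths.append(path)
    hooks.after_crawl()
    return used_paths


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real download: returns dummy bytes, no network access."""
    return b"x" * len(url)
```

Calling crawl(["http://example.org/a.jpg", "http://example.org/index.html"], fake_fetch, ImageIndexer()) would skip the HTML page in hook 2 and keep at most one temporary file on disk at any time, which is the whole point of the approach.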
