Welcome, Kev! :-)

At 10.20 29/01/2003 +1100, you wrote:
> I have been running HtDig for years, and a number of times have thought about harnessing what people are searching for as symptoms of desirable knowledge in an organisation. Looking through the logs, there seemed to be valuable information buried away.
Yes ... it's our 'mine'. :-)
> I recently modified my HtDig configuration so that I could use PHP4 search and results forms. I began to think that now I could change the results URLs into PHP links to capture the information into a log of what links were followed. The log would simply contain the URL, the keyword(s) used, and today's date.

I have done something similar, but I also add both a session cookie (at least!) and a permanent one, in order to allow more precise queries. I did some research on Web usage mining for my thesis, and there is a lot of potential knowledge in such a simple database.
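Just to give the idea, here is a rough sketch of the kind of redirect script I mean. It only logs the click and sends the user on; the file names, cookie names and log format are my own choices, nothing official from ht://Dig:

  <?php
  // click.php - logs which result the user followed, then redirects.
  // Called as: click.php?words=<query>&url=<clicked result URL>
  session_start();                              // browsing-session cookie
  if (isset($_COOKIE['htdig_uid'])) {           // permanent cookie, to recognise returning users
      $uid = $_COOKIE['htdig_uid'];
  } else {
      $uid = md5(uniqid(rand(), true));
      setcookie('htdig_uid', $uid, time() + 365 * 24 * 3600, '/');
  }
  $url   = $_GET['url'];                        // the result the user clicked
  $words = $_GET['words'];                      // the keywords that produced it
  $line  = implode("\t", array(date('Y-m-d H:i:s'), session_id(),
                               $uid, $words, $url)) . "\n";
  $fp = fopen('/var/log/htdig/clicks.log', 'a'); // tab-separated click log
  if ($fp) { fwrite($fp, $line); fclose($fp); }
  header('Location: ' . $url);                  // in real use you would validate $url first
  exit;
  ?>

In the PHP results page you would then emit each result link as click.php?words=...&url=... instead of linking to the URL directly.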
> I know there will be users who click on all links until they find what they wanted, but I suspect that on average, the weightings would lead to higher initial win rates.

I had thought about this a little differently, but unfortunately (again!) I had no time to turn those ideas into something real. I was thinking of considering a URL as 'effectively found' when the next click in the same session comes more than, say, 10 seconds later (that's what the browsing-session cookie is for!). Of course, some heuristic assumption has to be made for the last entry of a session.
This way you could get rid of some of the false results you were talking about; something along the lines of the sketch below.
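For what it's worth, a post-processing sketch of that heuristic could look like this. It assumes the tab-separated log written by the previous script; the 10-second threshold and the "last click of a session counts" rule are, again, only heuristic assumptions:

  <?php
  // dwell.php - count "effective" clicks per URL from the click log.
  // A click is effective if the next click in the same session comes
  // more than $threshold seconds later; the last click of a session is
  // assumed effective (one possible choice for the tail).
  $threshold = 10;
  $clicks = array();                       // session_id => list of (timestamp, url)
  foreach (file('/var/log/htdig/clicks.log') as $line) {
      list($ts, $sid, $uid, $words, $url) = explode("\t", rtrim($line, "\n"));
      $clicks[$sid][] = array(strtotime($ts), $url);
  }
  $effective = array();                    // url => number of effective clicks
  foreach ($clicks as $sid => $list) {
      for ($i = 0; $i < count($list); $i++) {
          $isLast = ($i == count($list) - 1);
          if ($isLast || $list[$i + 1][0] - $list[$i][0] > $threshold) {
              $url = $list[$i][1];
              $effective[$url] = isset($effective[$url]) ? $effective[$url] + 1 : 1;
          }
      }
  }
  arsort($effective);                      // most convincing URLs first
  print_r($effective);
  ?>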
> While I am competent at PHP, I don't think I could tackle a patch for HtDig so I'll throw this idea open for discussion.

Thanks for your suggestion. However, let me 'come back down to earth' (as we say in Italy - I hope I don't get misunderstood).
Using usage information internally is what ht://Dig currently lacks: we already exploit both content and structure in some way (through the 'backlink_factor' attribute, for instance). IMHO this could be very hard to implement, at least without rethinking the current design of the whole system.
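From memory, this is the kind of knob I mean: htsearch already lets you weight content and structure signals in the configuration file. Attribute names as I recall them; the values are purely illustrative, not the defaults:

  # excerpt from an htdig.conf - ranking factors (illustrative values)
  text_factor:      1
  title_factor:     10
  backlink_factor:  0.1
  date_factor:      0

A usage-based signal would be a third kind of factor next to these, and that is exactly what we have no place for today.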
I guess people like Geoff and Neal can be more precise than me, but an internal usage module would have to be more than a URL/frequency archive, since, for instance, not every clicked URL is what the user actually wanted; the learning mechanism would have to be designed from scratch for ht://Dig, and consistency across different crawls and incremental updates would have to be preserved. I am also afraid that such a module would affect the current design.
Still, this is something I'd really love to see implemented in ht://Dig, just not in this phase: I think we have to be realistic, and our next aim is the first stable release of the 3.2 branch. That must be our first step.
Ciao ciao and thanks!
-Gabriele
--
Gabriele Bartolini - Web Programmer - ht://Dig & IWA Member - ht://Check maintainer
Current Location: Prato, Tuscany, Italia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447