On Thu, 15 Nov 2001, Gregory Kozlovsky wrote:

> One document that exhibited this problem was
> http://www.icrc.org/icrcfre.nsf/c125629700325427c12561740044a4f7/323ed98e7ee34ae6412565560025f227?OpenDocument
> 
> Look at one of the pictures, for example
> http://www.icrc.org/icrcfre.nsf/0/323ed98e7ee34ae6412565560025f227/Content/2.14A2?OpenElement&FieldElemFormat=gif
> In the source the reference is
> <IMG SRC="/icrcfre.nsf/0/323ed98e7ee34ae6412565560025f227/Content/2.14A2?OpenElement&FieldElemFormat=gif" WIDTH=198 HEIGHT=181>
> 
> When indexing, ASPSeek outputs the message:
> Adding URL:
> http://www.icrc.org/icrcfre.nsf/0/323ed98e7ee34ae6412565560025f227/Content/2.14A2?OpenElement&FieldElemFormat=gif
> 
> What happens, I think, is that you weed out image files by Disallowing .gif
> and other graphical extensions. In this case, however, the webmaster (not
> me) uses some fancy content management with dynamic access to images. It
> seems that it would be more logical and reliable simply not to follow <IMG
> SRC> links, unless we want to index images.

Ok, I stand corrected! >:^p

I had a prod at the parser code and yes, <IMG SRC> links are "followed" provided
they are not disallowed.  Overall, the number of images that actually get
indexed is relatively low, since in general only dynamically served images such
as the one quoted above get past the Disallow rules.  Granted, if you are
indexing a single site that uses this technique, your figures might be quite
high.  But when indexing 40,000 or more unique sites, the proportion drops off.
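
To illustrate why the quoted URL gets through, here is a minimal Python sketch.
It is not ASPSeek code: the extension list, the function names and the sample
page (including the /icrcfre.nsf/page.html and /images/logo.gif links) are
made up for the example, and it assumes the Disallow rule only looks at the
extension at the end of the URL path.  It compares that rule with simply not
following <IMG SRC> links at all.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed list of "graphical" extensions; not taken from any ASPSeek config.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def blocked_by_extension(url):
    """Return True if the URL path ends in one of the image extensions."""
    return urlsplit(url).path.lower().endswith(IMAGE_EXTENSIONS)


class LinkCollector(HTMLParser):
    """Collect (tag, url) pairs so we know which tag each link came from."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(("a", attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.links.append(("img", attrs["src"]))


def follow_unless_extension_blocked(links):
    """Extension-based filtering: follow every link not ending in .gif etc."""
    return [url for _tag, url in links if not blocked_by_extension(url)]


def follow_unless_img(links):
    """Tag-based filtering: never follow links that came from <IMG SRC>."""
    return [url for tag, url in links if tag != "img"]


# Hypothetical page: one ordinary link, the dynamic image from the report,
# and one conventional .gif image.
PAGE = (
    '<A HREF="/icrcfre.nsf/page.html">text</A>\n'
    '<IMG SRC="/icrcfre.nsf/0/323ed98e7ee34ae6412565560025f227/Content/'
    '2.14A2?OpenElement&FieldElemFormat=gif" WIDTH=198 HEIGHT=181>\n'
    '<IMG SRC="/images/logo.gif">\n'
)

collector = LinkCollector()
collector.feed(PAGE)
by_extension = follow_unless_extension_blocked(collector.links)
by_tag = follow_unless_img(collector.links)
```

With these assumptions, by_extension still contains the dynamic image URL,
because its path ends in "2.14A2" and "gif" only appears in the query string,
while by_tag contains only the ordinary link.  The tag-based rule would of
course also rule out any future image indexing.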

There is actually some longer-term benefit to this capability.  A feature such
as href text would potentially allow ASPSeek to provide image search.


Matt. 

--
>     Gregory Kozlovsky
> 
> -----Original Message-----
> From: Kir Kolyshkin [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, 15 November 2001 17:37
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-users] Some small problems
> 
> 
> Gregory Kozlovsky wrote:
> > 
> > Here are some small problems I found that would be nice to have fixed.
> > 
> > 1. ASPSeek tries to follow <IMG SRC ...> links to images. This wastes time
> > and space.
> 
> I believe you are wrong here. Please re-check this and prove it ;)
> 
