Steve Severance wrote:
> I am not looking to really make an image retrieval engine. During indexing 
> referencing docs will be analyzed and text content will be associated with 
> the image. Currently I want to keep this in a separate index. So despite the 
> fact that images will be returned the search will be against text data.

So do you just want to be able to reference the cached images?  In that 
case, I think the images should stay in the content directory and be 
accessed like cached pages.  The parse should contain just enough 
metadata to index so that the images can be located in the cache.  I 
don't see a reason to keep this in a separate index, but perhaps a 
separate field instead?  Then when displaying hits you can look up 
associated images and display them too.  Does that work?
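
To make the separate-field idea concrete, here is a minimal sketch.  It 
uses plain Java maps as a stand-in for index documents, so nothing here 
is actual Nutch or Lucene API.  The field name "image" and the methods 
indexPage and imagesForHit are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for index documents.
class ImageFieldSketch {
    // Assumed field name for image cache keys; not an existing Nutch field.
    static final String IMAGE_FIELD = "image";

    // Build one "document" for a page: the usual text fields plus a
    // multi-valued field holding the URLs of the images it references.
    static Map<String, List<String>> indexPage(String url, String text,
                                               List<String> imageUrls) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("url", Collections.singletonList(url));
        doc.put("content", Collections.singletonList(text));
        doc.put(IMAGE_FIELD, new ArrayList<>(imageUrls));
        return doc;
    }

    // At display time: read the image keys stored with a hit so that the
    // images themselves can be looked up in the cache.
    static List<String> imagesForHit(Map<String, List<String>> hit) {
        return hit.getOrDefault(IMAGE_FIELD, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = indexPage(
            "http://example.com/", "a page about cats",
            Arrays.asList("http://example.com/cat.png"));
        for (String image : imagesForHit(doc)) {
            System.out.println(image);
        }
    }
}
```

The search still runs against the text fields only; the image field is 
just carried along with each hit and read back when hits are displayed.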

Steve Severance wrote:
> I like Mathijs's suggestion about using a DB for holding thumbnails. I just 
> want access to be in constant time since I am going to probably need to grab 
> at least 10 and maybe 50 for each query. That can be kept in the plugin as an 
> option or something like that. Does that have any ramifications for being run 
> on Hadoop?

I'm not sure how a database solves scalability issues.  It seems to me 
that thumbnails should be handled similarly to summaries.  They should 
be retrieved in parallel from segment data in a separate pass once the 
final set of hits to be displayed has been determined.  Thumbnails could 
be written to a per-segment directory by a separate MapReduce pass.  I 
don't see this as a parser issue, although perhaps it could be 
piggybacked on the parse MapReduce pass, which also processes content.
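
A rough sketch of that retrieval pass, under stated assumptions: the 
Hit class, the "thumbnails" directory name, the ".png" suffix, and the 
loader function are all hypothetical, and the loader stands in for 
whatever actually reads segment data.  Only the final hits are fetched, 
and the fetches run in parallel.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of fetching thumbnails for the final hit list.
class ThumbnailFetchSketch {
    // A displayed hit: the segment it lives in plus a key for its thumbnail.
    static final class Hit {
        final String segment;
        final String key;
        Hit(String segment, String key) {
            this.segment = segment;
            this.key = key;
        }
    }

    // Assumed layout: <segmentsRoot>/<segment>/thumbnails/<key>.png
    static Path thumbnailPath(Path segmentsRoot, Hit hit) {
        return segmentsRoot.resolve(hit.segment)
                           .resolve("thumbnails")
                           .resolve(hit.key + ".png");
    }

    // Fetch one thumbnail per hit in parallel; results keep the hit order.
    static List<byte[]> fetchAll(List<Hit> hits, Function<Hit, byte[]> loader,
                                 int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (Hit h : hits) {
                final Hit hit = h;
                Callable<byte[]> task = () -> loader.apply(hit);
                futures.add(pool.submit(task));
            }
            List<byte[]> thumbnails = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                thumbnails.add(f.get());
            }
            return thumbnails;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("thumbnail fetch failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 10 to 50 hits per query, the cost is bounded by the slowest single 
fetch rather than the sum, which is the same reasoning that applies to 
summaries.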

Doug

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
