[ 
https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083289#comment-13083289
 ] 

Lewis John McGibbney commented on NUTCH-296:
--------------------------------------------

I haven't looked too deeply into this, but I think we have just missed the 
window in which this could have been easily integrated into Nutch 1.2. As it 
concerns searching and viewing images as described here [1], I think it 
would mainly be of interest to the folks at Solr, although Nutch would 
obviously need to do some image processing of its own, in the form of a 
plugin. My main concern is dealing with the API changes...

Comments are welcome from anyone familiar with the original project, or from 
anyone who has time to look through the README linked below.

[1] 
http://archive-access.svn.sourceforge.net/svnroot/archive-access/trunk/archive-access/projects/nutchwax/imagesearch/README.txt

> Image Search
> ------------
>
>                 Key: NUTCH-296
>                 URL: https://issues.apache.org/jira/browse/NUTCH-296
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Thomas Delnoij
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>
> Per the discussion in the Nutch-User mailing list, there is a wish for an 
> "Image Search" add-on component that will index images.
> Must have:
> - retrieve outlinks to image files from fetched pages
> - generate thumbnails from images
> - thumbnails are stored in the segments as ImageWritable that contains the 
> compressed binary data and some meta data 
> Should have:
> - implemented as hadoop map reduce job
> - should be separate from main Nutch codeline as it breaks general Nutch 
> logic of one url == one index document.
> Could have:
> - store the original image in the segments
> Would like to have:
> - search interface for image index
> - parameterizable thumbnail generation (width, height, quality)
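
To make the thumbnail requirements above more concrete, here is a minimal 
sketch in plain Java of parameterizable thumbnail generation (width, height, 
quality) plus a record holding the compressed bytes and some metadata. It is 
only an illustration under stated assumptions: ThumbnailSketch, 
ThumbnailRecord and makeThumbnail are hypothetical names, not the NutchWAX 
code and not an existing Nutch API. ThumbnailRecord merely mirrors the 
write/readFields shape of a Hadoop Writable without depending on Hadoop, and 
JPEG is an assumed choice of thumbnail format.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical sketch; none of these names exist in Nutch or NutchWAX.
class ThumbnailSketch {

    /** Stand-in for the "ImageWritable" from the issue: bytes plus metadata. */
    public static class ThumbnailRecord {
        public String sourceUrl = "";
        public int width;
        public int height;
        public byte[] jpegData = new byte[0];

        // Same method shape as a Hadoop Writable, without the dependency.
        public void write(DataOutput out) throws IOException {
            out.writeUTF(sourceUrl);
            out.writeInt(width);
            out.writeInt(height);
            out.writeInt(jpegData.length);
            out.write(jpegData);
        }

        public void readFields(DataInput in) throws IOException {
            sourceUrl = in.readUTF();
            width = in.readInt();
            height = in.readInt();
            jpegData = new byte[in.readInt()];
            in.readFully(jpegData);
        }
    }

    /** Scales src to fit inside maxWidth x maxHeight, keeping aspect ratio. */
    public static BufferedImage scale(BufferedImage src, int maxWidth, int maxHeight) {
        double ratio = Math.min((double) maxWidth / src.getWidth(),
                                (double) maxHeight / src.getHeight());
        ratio = Math.min(ratio, 1.0); // never upscale small images
        int w = Math.max(1, (int) Math.round(src.getWidth() * ratio));
        int h = Math.max(1, (int) Math.round(src.getHeight() * ratio));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                           RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    /** Encodes img as JPEG; quality runs from 0.0 (smallest) to 1.0 (best). */
    public static byte[] encodeJpeg(BufferedImage img, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("no JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Builds a record for one image: scale it, compress it, attach metadata. */
    public static ThumbnailRecord makeThumbnail(String url, BufferedImage src,
            int maxWidth, int maxHeight, float quality) throws IOException {
        BufferedImage thumb = scale(src, maxWidth, maxHeight);
        ThumbnailRecord rec = new ThumbnailRecord();
        rec.sourceUrl = url;
        rec.width = thumb.getWidth();
        rec.height = thumb.getHeight();
        rec.jpegData = encodeJpeg(thumb, quality);
        return rec;
    }
}
```

In a real plugin the record would presumably implement 
org.apache.hadoop.io.Writable so it could be stored in the segments, and the 
three parameters would come from the Nutch configuration rather than method 
arguments; both points are assumptions rather than anything decided in this 
issue.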


        
