[ 
https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478920
 ] 

Steve Severance commented on NUTCH-296:
---------------------------------------

I know the committers are hard at work on the 0.9.0 release, but I have begun 
work on the first piece of this, the parser. I am looking for guidance on how 
the images and thumbnails should be stored. One file per image is probably too 
inefficient. Are there existing file formats that the community would like to 
use?
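
To make the storage question concrete, here is a minimal sketch of one 
alternative to one file per image: many (URL, image bytes) records appended 
to a single stream. It uses only the JDK. The class name ImagePackSketch and 
the record layout are assumptions made for illustration, not an existing 
Nutch format. Hadoop's SequenceFile is an existing key/value container in the 
same spirit and may be the more natural choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical container: many (url, image bytes) records in one stream. */
class ImagePackSketch {

    /** Appends one record: the URL, then the byte count, then the raw bytes. */
    static void writeRecord(DataOutputStream out, String url, byte[] data)
            throws IOException {
        out.writeUTF(url);
        out.writeInt(data.length);
        out.write(data);
    }

    /** Reads records until the stream ends, keeping insertion order. */
    static Map<String, byte[]> readAll(DataInputStream in) throws IOException {
        Map<String, byte[]> records = new LinkedHashMap<>();
        while (true) {
            String url;
            try {
                url = in.readUTF();
            } catch (EOFException endOfPack) {
                return records; // clean end of stream: no more records
            }
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            records.put(url, data);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // Placeholder bytes stand in for real image data.
            writeRecord(out, "http://example.com/a.jpg", new byte[] {1, 2, 3});
            writeRecord(out, "http://example.com/b.gif", new byte[] {4, 5});
        }
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(readAll(in).keySet());
    }
}
```

A real format would also need an index or sorted keys for random access, 
which this sketch leaves out.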

I am building a parser that can handle most image types. Should I break it 
out into individual plugins, one per file type? For example, jpg would have 
its own extension, gif would have a separate one, and so on. This may be more 
flexible in the long run. This is the first project I am undertaking on the 
Nutch codebase, so any guidance would be great.
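
For context on the single-parser option, the sketch below shows how one piece 
of code can detect the image format at runtime with javax.imageio from the 
JDK, rather than relying on one plugin per file type. The class name 
ImageFormatProbe and the method detectFormat are hypothetical. This does not 
settle the plugin question; it only illustrates that a single entry point is 
feasible for the formats ImageIO supports.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.awt.image.BufferedImage;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: asks ImageIO which reader recognises the content. */
class ImageFormatProbe {

    /** Returns the detected format name, or null if no reader matches. */
    static String detectFormat(byte[] content) throws IOException {
        try (ImageInputStream in =
                ImageIO.createImageInputStream(new ByteArrayInputStream(content))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                return null; // no installed reader claims this content
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Encode a tiny in-memory image as PNG, then probe the bytes.
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(image, "png", png);
        System.out.println(detectFormat(png.toByteArray()));
    }
}
```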

Steve

> Image Search
> ------------
>
>                 Key: NUTCH-296
>                 URL: https://issues.apache.org/jira/browse/NUTCH-296
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Thomas Delnoij
>            Priority: Minor
>
> Per the discussion in the Nutch-User mailing list, there is a wish for an 
> "Image Search" add-on component that will index images.
> Must have:
> - retrieve outlinks to image files from fetched pages
> - generate thumbnails from images
> - thumbnails are stored in the segments as ImageWritable that contains the 
> compressed binary data and some metadata
> Should have:
> - implemented as a Hadoop MapReduce job
> - should be separate from the main Nutch codeline, as it breaks the general 
> Nutch logic of one url == one index document.
> Could have:
> - store the original image in the segments
> Would like to have:
> - search interface for image index
> - parameterizable thumbnail generation (width, height, quality)
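
The "parameterizable thumbnail generation (width, height, quality)" item 
above can be sketched with the JDK's java.awt and javax.imageio classes. The 
class name ThumbnailSketch and the method toJpegThumbnail are hypothetical, 
and JPEG output is an assumption; the issue does not name an encoding. The 
sketch scales to exactly the requested size and does not preserve the aspect 
ratio.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

/** Hypothetical helper: scales an image and encodes it as JPEG bytes. */
class ThumbnailSketch {

    /** quality runs from 0.0 (smallest output) to 1.0 (best quality). */
    static byte[] toJpegThumbnail(BufferedImage source, int width, int height,
                                  float quality) throws IOException {
        // Draw the source into a new RGB image of the requested size.
        BufferedImage scaled =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }

        // Encode with an explicit compression quality.
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(scaled, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage original =
                new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        byte[] thumbnail = toJpegThumbnail(original, 40, 20, 0.8f);
        System.out.println("thumbnail bytes: " + thumbnail.length);
    }
}
```

The returned byte array is the kind of compressed binary data that the 
proposed ImageWritable could carry alongside its metadata.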

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
