[ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478920 ]
Steve Severance commented on NUTCH-296:
---------------------------------------

I know the committers are hard at work on the 0.9.0 release, but I have begun work on the first piece of this, the parser. I am looking for guidance on how the images and thumbnails should be stored. One file per image is probably too inefficient. Are there existing file formats that the community would like to use?

I am building a parser that can handle most image types. Should I break it out into individual plugins so there is one per file type? E.g. jpg would have its own extension, gif would have a separate extension, and so on. This may be more flexible in the long run.

This is the first project that I am undertaking on the Nutch codebase, so any guidance would be great.

Steve

> Image Search
> ------------
>
>                 Key: NUTCH-296
>                 URL: https://issues.apache.org/jira/browse/NUTCH-296
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Thomas Delnoij
>            Priority: Minor
>
> Per the discussion in the Nutch-User mailing list, there is a wish for an
> "Image Search" add-on component that will index images.
> Must have:
> - retrieve outlinks to image files from fetched pages
> - generate thumbnails from images
> - thumbnails are stored in the segments as ImageWritable that contains the
>   compressed binary data and some metadata
> Should have:
> - implemented as a Hadoop MapReduce job
> - should be separate from the main Nutch codeline, as it breaks the general
>   Nutch logic of one URL == one index document.
> Could have:
> - store the original image in the segments
> Would like to have:
> - search interface for image index
> - parameterizable thumbnail generation (width, height, quality)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
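
As a rough illustration of the thumbnail requirements quoted above (parameterizable width, height and quality, stored as compressed binary data plus some metadata), here is a minimal sketch that uses only the JDK. It is not taken from Nutch or from any patch on this issue: the names ThumbnailSketch, makeThumbnail and ThumbnailRecord are hypothetical, JPEG is an assumed thumbnail format, and the write/read pair only imitates the general shape of a Hadoop Writable without depending on Hadoop.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Hypothetical sketch; none of these names exist in Nutch. */
class ThumbnailSketch {

    /** Scales src to fit inside maxWidth x maxHeight (aspect ratio kept) and returns JPEG bytes. */
    static byte[] makeThumbnail(BufferedImage src, int maxWidth, int maxHeight, float quality)
            throws IOException {
        // Never upscale: cap the scale factor at 1.0.
        double scale = Math.min(1.0, Math.min(
                (double) maxWidth / src.getWidth(),
                (double) maxHeight / src.getHeight()));
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));

        // JPEG has no alpha channel, so draw into a plain RGB image.
        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();

        // Ask the JPEG writer for an explicit quality in the range 0.0 to 1.0.
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(scaled, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Hypothetical record: source URL, thumbnail dimensions, and the compressed bytes. */
    static final class ThumbnailRecord {
        final String url;
        final int width;
        final int height;
        final byte[] data;

        ThumbnailRecord(String url, int width, int height, byte[] data) {
            this.url = url;
            this.width = width;
            this.height = height;
            this.data = data;
        }

        /** Writes the metadata first, then the length-prefixed image bytes. */
        void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeInt(width);
            out.writeInt(height);
            out.writeInt(data.length);
            out.write(data);
        }

        /** Reads a record back in the same field order that write() used. */
        static ThumbnailRecord read(DataInput in) throws IOException {
            String url = in.readUTF();
            int width = in.readInt();
            int height = in.readInt();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return new ThumbnailRecord(url, width, height, data);
        }
    }
}
```

Packing many such length-prefixed records into one stream is one way to avoid the one-file-per-image layout raised in the comment; whether that suits the segment format is a question for the committers.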