[ https://issues.apache.org/jira/browse/TIKA-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986574#comment-16986574 ]
Hudson commented on TIKA-2630:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1757 (See [https://builds.apache.org/job/Tika-trunk/1757/])
TIKA-2630 -- add defensive null check and fix "if (...width)" to "if (tallison: [https://github.com/apache/tika/commit/8c0d7feb55988c2d6b34a96d8b2ab73884989bd4])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java

> Wrong height and width metadata for JPEG images
> -----------------------------------------------
>
>                 Key: TIKA-2630
>                 URL: https://issues.apache.org/jira/browse/TIKA-2630
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ancuta Morarasu
>            Assignee: Dave Meikle
>            Priority: Major
>         Attachments: Tika-metadata.txt, metadata-exctractor-metadata.txt, sizesampleissue.jpg
>
>
> According to [Exif specs|http://www.exif.org/Exif2-2.PDF#page=73&zoom=auto,-176,103], for compressed images the values for width and height should come from the tags:
> * *PixelXDimension* mapped in metadata-extractor to {{com.drew.metadata.Directory.ExifDirectoryBase.TAG_EXIF_IMAGE_WIDTH}} and
> * *PixelYDimension* mapped to {{ExifDirectoryBase.TAG_EXIF_IMAGE_HEIGHT}}.
> {{ImageMetadataExtractor$ExifHandler.[handlePhotoTags(...)|https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java#L487]}} should extract and set these in the metadata:
> {code:java}
> if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)) {
>     metadata.set(Metadata.IMAGE_WIDTH,
>         trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_WIDTH)));
> }
> if (directory.containsTag(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)) {
>     metadata.set(Metadata.IMAGE_LENGTH,
>         trimPixels(directory.getDescription(ExifSubIFDDirectory.TAG_EXIF_IMAGE_HEIGHT)));
> }
> {code}
> Also, the {{CopyUnknownFieldsHandler}} overrides the values for "Image Width" ({{JpegDirectory.TAG_IMAGE_WIDTH}}) and "Image Height" ({{JpegDirectory.TAG_IMAGE_HEIGHT}}) with the values from {{ExifIFD0Descriptor.TAG_IMAGE_WIDTH}} and {{ExifIFD0Descriptor.TAG_IMAGE_HEIGHT}} because they have the same tag name.
> I attached a sample image; these are the metadata values:
> * extracted by metadata-extractor:
> [JPEG] Image Height = 367 pixels
> [JPEG] Image Width = 1535 pixels
> [Exif IFD0] Image Width = 2173 pixels
> [Exif IFD0] Image Height = 520 pixels
> [Exif SubIFD] Exif Image Width = 1535 pixels
> [Exif SubIFD] Exif Image Height = 367 pixels
> * Tika metadata:
> Image Height: 520 pixels
> Image Width: 2173 pixels
> tiff:ImageLength: 520
> tiff:ImageWidth: 2173
> Exif Image Height: 367 pixels
> Exif Image Width: 1535 pixels
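
The tag-name collision described in the report can be reproduced in isolation. The following is a minimal sketch using plain Java collections only; it does not call Tika or metadata-extractor. The class {{TagCollisionSketch}} and its methods are hypothetical names introduced for illustration, and the directory-qualified keys are an assumption about one way to avoid the overwrite, not a description of what the Tika commit does. The values are the ones listed for the sample image.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: not part of Tika or metadata-extractor.
public class TagCollisionSketch {

    // Sample values from the report, keyed by directory name, then by tag name.
    static Map<String, Map<String, String>> sampleDirectories() {
        Map<String, Map<String, String>> dirs = new LinkedHashMap<>();

        Map<String, String> jpeg = new LinkedHashMap<>();
        jpeg.put("Image Height", "367 pixels");
        jpeg.put("Image Width", "1535 pixels");
        dirs.put("JPEG", jpeg);

        Map<String, String> ifd0 = new LinkedHashMap<>();
        ifd0.put("Image Width", "2173 pixels");
        ifd0.put("Image Height", "520 pixels");
        dirs.put("Exif IFD0", ifd0);

        Map<String, String> subIfd = new LinkedHashMap<>();
        subIfd.put("Exif Image Width", "1535 pixels");
        subIfd.put("Exif Image Height", "367 pixels");
        dirs.put("Exif SubIFD", subIfd);

        return dirs;
    }

    // Copies every tag into one flat map keyed by tag name only.
    // Directories processed later silently overwrite earlier ones.
    static Map<String, String> flattenByTagName(Map<String, Map<String, String>> dirs) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map<String, String> tags : dirs.values()) {
            flat.putAll(tags);
        }
        return flat;
    }

    // Assumed alternative: qualify each key with its directory name
    // so that identically named tags cannot collide.
    static Map<String, String> flattenQualified(Map<String, Map<String, String>> dirs) {
        Map<String, String> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> dir : dirs.entrySet()) {
            for (Map.Entry<String, String> tag : dir.getValue().entrySet()) {
                flat.put(dir.getKey() + ":" + tag.getKey(), tag.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> dirs = sampleDirectories();

        // The Exif IFD0 value replaces the JPEG value under the shared name.
        System.out.println(flattenByTagName(dirs).get("Image Width"));

        // With qualified keys both values survive.
        Map<String, String> qualified = flattenQualified(dirs);
        System.out.println(qualified.get("JPEG:Image Width"));
        System.out.println(qualified.get("Exif IFD0:Image Width"));
    }
}
```

In the flat map, "Image Width" ends up holding the Exif IFD0 value (2173 pixels) rather than the JPEG value (1535 pixels), which matches the Tika output quoted in the report. The Exif SubIFD tags are unaffected because their names start with "Exif" and do not collide.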