[ https://issues.apache.org/jira/browse/TIKA-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4209:
------------------------------
    Description:
[~johanvanderknijff] recently published a great post on multi-image TIFFs: [https://www.bitsgalore.org/2024/03/11/multi-image-tiffs-subfiles-and-image-file-directories]

I hadn't worked on TIFF in a while. I tried out a few sample multi-image TIFFs and found that we are not processing anything beyond the first page/image in a TIFF. Even worse, we're not populating our "imagereader:NumImages" metadata value for TIFFs.

It looks like Drew Noakes' metadata-extractor is not yet handling these well: [https://github.com/drewnoakes/metadata-extractor/issues/648]

There's an example file on that issue: [https://github.com/drewnoakes/metadata-extractor/files/14052854/color-pages-jpg.zip]

And [~johanvanderknijff] also pointed to TIFFs available here: [https://www.leadtools.com/support/forum/posts/t10960-]

  was:
[~johanvanderknijff] recently published a great post on multipage TIFFs: [https://www.bitsgalore.org/2024/03/11/multi-image-tiffs-subfiles-and-image-file-directories]

I hadn't worked on TIFF in a while. I tried out a few sample multipage TIFFs and found that we are not processing anything beyond the first page/image in a TIFF. Even worse, we're not populating our "imagereader:NumImages" metadata value for TIFFs.

It looks like Drew Noakes' metadata-extractor is not yet handling these well: [https://github.com/drewnoakes/metadata-extractor/issues/648]

There's an example file on that issue: [https://github.com/drewnoakes/metadata-extractor/files/14052854/color-pages-jpg.zip]

And [~johanvanderknijff] also pointed to TIFFs available here: [https://www.leadtools.com/support/forum/posts/t10960-]

> Improve handling of multi-image tiffs
> -------------------------------------
>
>                 Key: TIKA-4209
>                 URL: https://issues.apache.org/jira/browse/TIKA-4209
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tim Allison
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
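
For reference, the number of images in a TIFF can be estimated by walking the file's chain of image file directories (IFDs), which is the structure the linked post describes. Below is a minimal Java sketch of that walk. It is not Tika's or metadata-extractor's code: the class name TiffIfdCounter and its methods are hypothetical, it assumes a classic (non-BigTIFF) file held fully in memory, and it counts only the top-level IFD chain, so images stored in SubIFDs (tag 330) are not included.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class TiffIfdCounter {

    /** Counts the top-level IFDs of a classic TIFF held in memory. */
    public static int countIfds(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte-order mark");
        }
        // Bytes 2-3 hold the magic number 42 (BigTIFF uses 43 and is not handled).
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF");
        }
        // Bytes 4-7 hold the unsigned offset of the first IFD.
        long offset = buf.getInt(4) & 0xFFFFFFFFL;
        Set<Long> seen = new HashSet<>(); // guards against circular IFD chains
        int count = 0;
        while (offset != 0) {
            if (!seen.add(offset) || offset + 2 > tiff.length) {
                throw new IllegalArgumentException("Corrupt IFD chain");
            }
            // Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next offset.
            int entries = buf.getShort((int) offset) & 0xFFFF;
            long nextPointer = offset + 2 + 12L * entries;
            if (nextPointer + 4 > tiff.length) {
                throw new IllegalArgumentException("Truncated IFD");
            }
            count++;
            offset = buf.getInt((int) nextPointer) & 0xFFFFFFFFL;
        }
        return count;
    }

    /** Builds a synthetic 44-byte TIFF skeleton with two chained IFDs (no pixel data). */
    static byte[] sampleTwoImageTiff() {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8); // header, first IFD at 8
        b.putShort((short) 1);                                            // IFD 0: one entry
        b.putShort((short) 256).putShort((short) 3).putInt(1).putInt(1);  // ImageWidth, SHORT, 1 value
        b.putInt(26);                                                     // next IFD at offset 26
        b.putShort((short) 1);                                            // IFD 1: one entry
        b.putShort((short) 256).putShort((short) 3).putInt(1).putInt(1);
        b.putInt(0);                                                      // 0 ends the chain
        return b.array();
    }

    public static void main(String[] args) {
        System.out.println(countIfds(sampleTwoImageTiff())); // prints 2
    }
}
```

A count obtained this way could be compared against whatever a parser reports for the same file; a mismatch on the sample files linked above would point to images being skipped.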