[ https://issues.apache.org/jira/browse/TIKA-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch resolved TIKA-1212.
------------------------------
    Resolution: Invalid

The problem is that you're not tracking how far down the rabbit hole you've gone when you recurse. When your recursing parser is processing a resource and wants to recurse again, it needs to track where it is and tell the next parser down where it came from.

I've added a simple example of this to the wiki - https://wiki.apache.org/tika/RecursiveMetadata#Tracking_how_far_down_the_Rabbit_Hole_you_have_gone

Various other approaches will work too. The trick is that each time you recurse, you need to track where you came from if you want relative paths.

> Recursive Extraction of Archive File
> ------------------------------------
>
>                 Key: TIKA-1212
>                 URL: https://issues.apache.org/jira/browse/TIKA-1212
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Vikram
>            Priority: Critical
>         Attachments: RecursiveMetadataParserZukka.java, TIKA-Output.xlsx,
> abc.zip, abc.zip
>
>
> Please refer to the code:
> http://wiki.apache.org/tika/RecursiveMetadata#Main_from_Jukka.27s_Example
> Requirement:
> -----------------
> abc.zip
>   ---> a.doc
>   ---> b.xls
>   ---> pqr.zip
>   -------------> m.ppt
> There are two issues with TIKA:
> 1. How to optionally block separate extraction of embedded docs?
> 2. When I extract recursively, the file name / resourceKeyName does not come
> through properly. For example:
> --> a.doc should have the value abc.zip/a.doc. Similarly for b.xls. This is
> fine, BUT m.ppt has the resource file name pqr/m.ppt, which is WRONG. It
> should have the value abc.zip/pqr.zip/m.ppt.
> --> Even for the embedded doc, only a random name comes through, without even
> a proper file path.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
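
To illustrate the path-tracking idea from the resolution, here is a minimal sketch. It is not the wiki example and does not use the Tika API: it relies only on java.util.zip, and the class name NestedPathWalker and its methods walk and zipOf are hypothetical names chosen for this illustration. Each level of recursion receives the full path of its parent and passes its own full path down to the next level, which is what yields abc.zip/pqr.zip/m.ppt for the nested entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika: it shows only the path-tracking idea.
final class NestedPathWalker {

    /**
     * Lists every file entry below parentPath, descending into nested .zip entries.
     * The given stream is closed when the walk finishes. Requires Java 9+ (readAllBytes).
     */
    static List<String> walk(String parentPath, InputStream in) {
        List<String> paths = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // The key step: prefix the entry name with where we came from.
                String fullPath = parentPath + "/" + entry.getName();
                paths.add(fullPath);
                if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                    // Buffer the nested archive, then recurse, handing our own
                    // full path to the next level down.
                    byte[] nested = zip.readAllBytes();
                    paths.addAll(walk(fullPath, new ByteArrayInputStream(nested)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return paths;
    }

    /** Builds an in-memory zip from name/content pairs, for trying out walk(). */
    static byte[] zipOf(Map<String, byte[]> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> item : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(item.getKey()));
                zip.write(item.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For an archive laid out like the one in the report, NestedPathWalker.walk("abc.zip", stream) returns abc.zip/a.doc, abc.zip/b.xls, abc.zip/pqr.zip and abc.zip/pqr.zip/m.ppt. In a Tika-based parser the same idea would mean carrying the parent's path alongside the recursing parser instead of reading only the embedded resource's own name.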