Hi, I'm using Tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like to access the metadata for the individual files inside each package.
It looks like there has been some discussion about how to provide the metadata, and from looking at the code I don't think any of the proposed solutions have been implemented yet:

http://mail-archives.apache.org/mod_mbox/lucene-tika-dev/200906.mbox/%[email protected]%3e
http://mail-archives.apache.org/mod_mbox/lucene-tika-dev/200907.mbox/%[email protected]%3e

It looks like the last suggestion was to add an attribute for each metadata entry to each file's <div> element. Unfortunately I don't think the code does this today.

I'm left with the following questions:

- Has a consensus been reached on how to provide access to the metadata?
- If consensus has been reached, will this be implemented soon, or can I help by implementing the preferred solution?
- If consensus has not been reached, how can it be reached so that someone can implement this functionality?

Please let me know how I can help move this forward.

Paul
