Hi,

I'm using Tika to parse packages (zip, tar.gz, tar.bz2, etc.), and I'd like
to get access to the metadata for the individual files inside the package.

It looks like there has been some discussion about how to provide this
metadata, but from looking at the code, I don't think any of the proposed
solutions have been implemented yet:
http://mail-archives.apache.org/mod_mbox/lucene-tika-dev/200906.mbox/%3c3949e4f8-0acf-4ba4-8ffc-57af8a783...@soe.ucsc.edu%3e
http://mail-archives.apache.org/mod_mbox/lucene-tika-dev/200907.mbox/%3c510143ac0907300409u699a3953t9b2dfbd6bb633...@mail.gmail.com%3e

The most recent suggestion seems to have been to add an attribute to each
file's <div> element for every metadata entry. Unfortunately, I don't think
the code does this today.
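For illustration only, here is a minimal sketch of what that proposal could
produce. It uses plain JDK SAX/JAXP classes, not Tika's own API, and the helper
names (writeEntry, render), the "package-entry" class, and the metadata keys
are hypothetical choices of mine, not anything Tika defines.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class EntryMetadataSketch {

    // Hypothetical helper (not part of Tika): writes one <div> for a package
    // entry and copies every metadata key/value onto it as an attribute.
    // Keys must be valid XML attribute names (e.g. "size", "Content-Type").
    static void writeEntry(ContentHandler handler, String entryName,
                           Map<String, String> metadata) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "class", "class", "CDATA", "package-entry");
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            attrs.addAttribute("", e.getKey(), e.getKey(), "CDATA", e.getValue());
        }
        handler.startElement("", "div", "div", attrs);
        char[] text = entryName.toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "div", "div");
    }

    // Serializes the SAX events to a string with the JDK identity transformer
    // so the resulting markup can be inspected.
    static String render(String entryName, Map<String, String> metadata) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));
            handler.startDocument();
            writeEntry(handler, entryName, metadata);
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not render entry", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("size", "1024");
        metadata.put("modified", "2009-07-30T11:04:09Z");
        System.out.println(render("docs/readme.txt", metadata));
    }
}
```

One caveat of this design is that metadata keys become attribute names, so
every key has to be a valid XML name.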

I'm left with the following questions:
- Has a consensus been reached for how to provide access to the metadata?
- If consensus has been reached, will this be implemented soon, or can I
help by implementing the preferred solution?
- If consensus has not been reached, how can we reach one so that someone
can implement this functionality?

Please let me know how I can help move this forward.

Paul
