Title: Message
Hi,
 
I need to convert hundreds of unstructured documents (Word, PDF, plain text, and email) into a
structured repository that holds each document together with an XML metadata record describing it.
 
I need to parse each of these documents and extract the relevant information to build an XML metadata
document for each one.
 
The XML metadata for each document will contain fields such as Keywords, Category, Doc Name,
Author, etc.
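
To make the target concrete, here is a rough sketch of the kind of metadata record I have in mind, built with the standard JAXP classes that ship with the JDK. The element names (document, docName, author, category, keywords, keyword), the class name MetadataSketch, and the sample values are only placeholders for illustration, not a fixed schema, and the extraction of the values from the source files is not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetadataSketch {

    // Builds one metadata record as an XML string.
    // All element names here are hypothetical placeholders.
    static String buildMetadata(String docName, String author,
                                String category, String[] keywords) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        addField(doc, root, "docName", docName);
        addField(doc, root, "author", author);
        addField(doc, root, "category", category);

        // One <keyword> child per extracted keyword.
        Element keywordList = doc.createElement("keywords");
        root.appendChild(keywordList);
        for (String keyword : keywords) {
            addField(doc, keywordList, "keyword", keyword);
        }

        // Serialize the DOM tree to a string without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Appends <name>value</name> under the given parent element.
    static void addField(Document doc, Element parent, String name, String value) {
        Element element = doc.createElement(name);
        element.setTextContent(value);
        parent.appendChild(element);
    }

    public static void main(String[] args) throws Exception {
        // Sample values only; real values would come from the parsed document.
        System.out.println(buildMetadata("report.doc", "A. Author", "Finance",
                new String[] {"budget", "forecast"}));
    }
}
```

The open part of my question is how to fill in these values from the Word/PDF/text/email sources.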
 
Is it possible to use Cocoon and/or POI to do this? If so, how would I use Cocoon to do the extraction?
 
I am new to Cocoon and am still trying to understand the world of transformers, generators, etc.
 
Also, could I use Lucene to index the XML documents and build a search engine around them?
 
I would like to know about the possible ways to do this.
 
Regards,
 
rajesh.
 
