Hi Ankit,

you've already received good, well-informed advice. Here are just a few links to other Apache projects you might find useful:
- http://commons.apache.org/fileupload/
- http://tika.apache.org/1.0/formats.html#Portable_Document_Format
- http://pdfbox.apache.org/
- http://lucene.apache.org/ (as others pointed out)

And maybe (but this is probably too much in your case):

- http://chemistry.apache.org/
- http://jackrabbit.apache.org/

I would keep things very simple:

- store the PDF files in the file system
- extract metadata (when available) from the PDFs using Tika and store it as RDF in Jena TDB
- extract text from the PDFs using PDFBox and index it using Lucene|Solr
- provide free-text search capabilities using Lucene|Solr

People often need to deal with both metadata and content/blobs. Storing content/blobs in the file system or in a remote content store (such as Amazon S3) is quite common. Jena helps you only with the metadata part (and only if you model the metadata in RDF).

My 2 cents,
Paolo

Ankit Verma wrote:
> Hi all,
>
> Can we persist any other document like .pdf rather than .rdf file
> using jena .
> Thanks in advance for the reply.
>
>
> Thanks
> Ankit
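A minimal, JDK-only sketch of the "keep things very simple" split suggested above: the PDF bytes go to a plain directory on disk, and only a few metadata statements about the stored file are produced as RDF (N-Triples) for Jena. The class name `PdfStoreSketch`, the `pdf-store` directory, naming files by their SHA-256 digest, and the use of Dublin Core terms (`dcterms:title`, `dcterms:format`) are all illustrative assumptions, not anything Jena prescribes. In the setup described above the title would come from Tika and the full text would go through PDFBox into Lucene; neither step is shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class: blobs on the file system, metadata as RDF text.
public class PdfStoreSketch {

    // Copies a PDF into the content store, named by its SHA-256 digest
    // (an assumed naming scheme), and returns the path of the stored copy.
    public static Path storePdf(Path source, Path storeDir) throws IOException {
        byte[] bytes = Files.readAllBytes(source);
        Files.createDirectories(storeDir);
        Path target = storeDir.resolve(sha256Hex(bytes) + ".pdf");
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Describes the stored file as N-Triples; the file: URI links the
    // RDF metadata (for Jena) to the blob (on disk). Dublin Core terms
    // are an assumed vocabulary choice.
    public static String toNTriples(Path stored, String title) {
        String subject = "<" + stored.toAbsolutePath().toUri() + ">";
        return subject + " <http://purl.org/dc/terms/format> \"application/pdf\" .\n"
             + subject + " <http://purl.org/dc/terms/title> \"" + escape(title) + "\" .\n";
    }

    // Escapes a string for use inside an N-Triples literal
    // (backslash first, so later escapes are not doubled).
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Lowercase hex SHA-256 of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should always be available", e);
        }
    }

    // Usage: java PdfStoreSketch <file.pdf> [title]
    public static void main(String[] args) throws IOException {
        Path stored = storePdf(Path.of(args[0]), Path.of("pdf-store"));
        String title = args.length > 1 ? args[1] : stored.getFileName().toString();
        System.out.print(toNTriples(stored, title));
    }
}
```

The printed N-Triples could then be loaded into a TDB dataset, for example with Jena's `tdbloader` command-line tool, while the PDF itself never enters the triple store. Naming files by digest is just one way to avoid collisions; keeping the original file name works equally well for a small collection.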
