Thanks, Ernesto. The issue I'm working on now (this is more a lack of experience than anything) is getting an input I can index. All my indexing classes (doc, pdf, xml, ppt) take a File object as a parameter and return a Lucene Document containing all the fields I need.
I'm struggling with how to work with an array of bytes instead of a Java File. It would be easier to unzip the archive to a temp directory, parse the files, and then delete the directory, but that would greatly slow indexing and use up disk space.

Luke

----- Original Message -----
From: "Ernesto De Santis" <[EMAIL PROTECTED]>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, March 01, 2005 10:48 AM
Subject: Re: Zip Files

> Hello
>
> First, you need a parser for each file type: pdf, txt, word, etc.
> Then use a Java API to iterate over the zip content; see:
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
>
> Use the getNextEntry() method.
>
> A little example:
>
> ZipInputStream zis = new ZipInputStream(fileInputStream);
> ZipEntry zipEntry;
> while ((zipEntry = zis.getNextEntry()) != null) {
>     // use zipEntry to get the name, etc.
>     // get the proper parser for the current entry
>     // use the parser with zis (ZipInputStream)
> }
>
> Good luck
> Ernesto
>
> Luke Shannon wrote:
>
> >Hello;
> >
> >Does anyone have any ideas on how to index the contents within zip files?
> >
> >Thanks,
> >
> >Luke
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
>
> --
> Ernesto De Santis - Colaborativa.net
> Córdoba 1147 Piso 6 Oficinas 3 y 4
> (S2000AWO) Rosario, SF, Argentina.
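
One way to avoid the temp directory, sketched here under some assumptions: read each zip entry into a byte array and wrap it in a ByteArrayInputStream, so that each parser gets its own stream. This assumes the indexing classes can be given an overload that takes an InputStream instead of a File, which the thread does not confirm. The names ZipEntryReader and readEntries are hypothetical, and the sketch uses current Java syntax (generics, try-with-resources) rather than the 1.4.2 API level linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; the class and method names are illustrative only.
final class ZipEntryReader {

    /** Reads each file entry of a zip stream into memory, keyed by entry name. */
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to index
                }
                entries.put(entry.getName(), readCurrentEntry(zis));
            }
        }
        return entries;
    }

    /** Copies the current entry's bytes; read() returns -1 at the end of that entry. */
    private static byte[] readCurrentEntry(InputStream zis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = zis.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a small zip in memory so the sketch runs without files on disk.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            zos.putNextEntry(new ZipEntry("notes.txt"));
            zos.write("hello zip".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }

        Map<String, byte[]> entries =
                readEntries(new ByteArrayInputStream(zipped.toByteArray()));
        for (Map.Entry<String, byte[]> e : entries.entrySet()) {
            // A parser with an InputStream overload (an assumption) would take this stream.
            InputStream entryStream = new ByteArrayInputStream(e.getValue());
            System.out.println(e.getKey() + ": " + entryStream.available() + " bytes");
        }
    }
}
```

Buffering each entry costs memory in proportion to the entry size, but it keeps a parser that closes its input from also closing the shared ZipInputStream. For large archives it may be better to parse each entry right after reading it rather than collecting them all in a map first.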