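I'm not sure what your indexing classes look like, but if you changed them to take an InputStream instead of a File, I think it would go a long way towards making them more flexible and solving your problem. A ZipInputStream is itself an InputStream, so each zip entry could be handed straight to the parser without unzipping anything to disk.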
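Here is a rough sketch of the idea, assuming each parser can be reduced to "read from a stream, return something indexable". The names StreamParser, PLAIN_TEXT, parseZip and sampleZip are made up for illustration and are not part of Lucene or of your code; in a real indexer the parser would build a Lucene Document rather than return a String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryIndexing {

    /** Hypothetical stand-in for a per-format parser that used to take a File. */
    interface StreamParser {
        String parse(InputStream in) throws IOException;
    }

    /** Trivial parser for illustration only: reads the whole stream as UTF-8 text. */
    static final StreamParser PLAIN_TEXT = in -> {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    };

    /** Walks the zip and hands each entry to the parser; nothing is written to disk. */
    static Map<String, String> parseZip(InputStream zipData, StreamParser parser)
            throws IOException {
        Map<String, String> contentByName = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Reads on zis stop at the end of the current entry. The wrapper
                // ignores close(), so a parser that closes its input cannot end
                // the iteration over the remaining entries.
                InputStream entryStream = new FilterInputStream(zis) {
                    @Override
                    public void close() {
                        // intentionally left open; parseZip closes zis itself
                    }
                };
                contentByName.put(entry.getName(), parser.parse(entryStream));
            }
        }
        return contentByName;
    }

    /** Builds a two-entry zip in memory so the sketch runs without any files. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("notes/a.txt"));
            zos.write("first document".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("second document".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> parsed =
                parseZip(new ByteArrayInputStream(sampleZip()), PLAIN_TEXT);
        parsed.forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```

If your existing classes have to keep their File-based methods, one option is to have those open a FileInputStream and delegate to the stream version, so both paths share the same parsing code.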
> -----Original Message-----
> From: Luke Shannon [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 01, 2005 12:39 PM
> To: Lucene Users List
> Subject: Re: Zip Files
>
> Thanks Ernesto.
>
> The issue I'm working with now (this is more lack of experience than
> anything) is getting an input I can index. All my indexing classes (doc,
> pdf, xml, ppt) take a File object as a parameter and return a Lucene
> Document containing all the fields I need.
>
> I'm struggling with how I can work with an array of bytes instead of a
> Java File.
>
> It would be easier to unzip the zip to a temp directory, parse the files
> and then delete the directory. But this would greatly slow indexing and
> use up disk space.
>
> Luke
>
> ----- Original Message -----
> From: "Ernesto De Santis" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, March 01, 2005 10:48 AM
> Subject: Re: Zip Files
>
> > Hello
> >
> > First, you need a parser for each file type: pdf, txt, word, etc.
> > Then use a Java API to iterate over the zip content; see:
> >
> > http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
> >
> > Use the getNextEntry() method.
> >
> > A little example:
> >
> > ZipInputStream zis = new ZipInputStream(fileInputStream);
> > ZipEntry zipEntry;
> > while ((zipEntry = zis.getNextEntry()) != null) {
> >     // use zipEntry to get the name, etc.
> >     // pick the proper parser for the current entry
> >     // run the parser on zis (the ZipInputStream)
> > }
> >
> > good luck
> > Ernesto
> >
> > Luke Shannon wrote:
> >
> > >Hello;
> > >
> > >Anyone have any ideas on how to index the contents within zip files?
> > >
> > >Thanks,
> > >
> > >Luke
> > >
> > >---------------------------------------------------------------------
> > >To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >For additional commands, e-mail: [EMAIL PROTECTED]
> >
> > --
> > Ernesto De Santis - Colaborativa.net
> > Córdoba 1147 Piso 6 Oficinas 3 y 4
> > (S2000AWO) Rosario, SF, Argentina.