Zip Files
Hello; Anyone have an ideas on how to index the contents within zip files? Thanks, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Zip Files
Hello first, you need a parser for each file type: pdf, txt, word, etc. and use a java api to iterate zip content, see: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html use getNextEntry() method little example: ZipInputStream zis = new ZipInputStream(fileInputStream); ZipEntry zipEntry; while(zipEntry = zis.getNextEntry() != null){ //use zipEntry to get name, etc. //get properly parser for current entry //use parser with zis (ZipInputStream) } good luck Ernesto Luke Shannon escribió: Hello; Anyone have an ideas on how to index the contents within zip files? Thanks, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Ernesto De Santis - Colaborativa.net Córdoba 1147 Piso 6 Oficinas 3 y 4 (S2000AWO) Rosario, SF, Argentina. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Zip Files
Thanks Ernesto. The issue I'm working with now (this is more lack of experience than anything) is getting an input I can index. All my indexing classes (doc, pdf, xml, ppt) take a File object as a parameter and return a Lucene Document containing all the fields I need. I'm struggling with how I can work with an array of bytes instead of a Java File. It would be easier to unzip the zip to a temp directory, parse the files and than delete the directory. But this would greatly slow indexing and use up disk space. Luke - Original Message - From: Ernesto De Santis [EMAIL PROTECTED] To: Lucene Users List lucene-user@jakarta.apache.org Sent: Tuesday, March 01, 2005 10:48 AM Subject: Re: Zip Files Hello first, you need a parser for each file type: pdf, txt, word, etc. and use a java api to iterate zip content, see: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html use getNextEntry() method little example: ZipInputStream zis = new ZipInputStream(fileInputStream); ZipEntry zipEntry; while(zipEntry = zis.getNextEntry() != null){ //use zipEntry to get name, etc. //get properly parser for current entry //use parser with zis (ZipInputStream) } good luck Ernesto Luke Shannon escribió: Hello; Anyone have an ideas on how to index the contents within zip files? Thanks, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Ernesto De Santis - Colaborativa.net Córdoba 1147 Piso 6 Oficinas 3 y 4 (S2000AWO) Rosario, SF, Argentina. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Zip Files
Luke, Look at the javadocs for java.io.ByteArrayInputStream - it wraps a byte array and makes it accessible as an InputStream. Also see java.util.zip.ZipFile. You should be able to read and parse all contents of the zip file in memory. http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayInputStream.html On Tue, 1 Mar 2005 12:39:17 -0500, Luke Shannon [EMAIL PROTECTED] wrote: Thanks Ernesto. I'm struggling with how I can work with an array of bytes instead of a Java File. It would be easier to unzip the zip to a temp directory, parse the files and than delete the directory. But this would greatly slow indexing and use up disk space. Luke - Original Message - From: Ernesto De Santis [EMAIL PROTECTED] To: Lucene Users List lucene-user@jakarta.apache.org Sent: Tuesday, March 01, 2005 10:48 AM Subject: Re: Zip Files Hello first, you need a parser for each file type: pdf, txt, word, etc. and use a java api to iterate zip content, see: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html use getNextEntry() method little example: ZipInputStream zis = new ZipInputStream(fileInputStream); ZipEntry zipEntry; while(zipEntry = zis.getNextEntry() != null){ //use zipEntry to get name, etc. //get properly parser for current entry //use parser with zis (ZipInputStream) } good luck Ernesto Luke Shannon escribió: Hello; Anyone have an ideas on how to index the contents within zip files? Thanks, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Ernesto De Santis - Colaborativa.net Córdoba 1147 Piso 6 Oficinas 3 y 4 (S2000AWO) Rosario, SF, Argentina. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]