Thanks Ernesto.

The issue I'm working with now (this is more lack of experience than
anything) is getting an input I can index. All my indexing classes (doc,
pdf, xml, ppt) take a File object as a parameter and return a Lucene
Document containing all the fields I need.

I'm struggling with how I can work with an  array of bytes  instead of a
Java File.

It would be easier to unzip the zip to a temp directory, parse the files and
than delete the directory. But this would greatly slow indexing and use up
disk space.

Luke

----- Original Message ----- 
From: "Ernesto De Santis" <[EMAIL PROTECTED]>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, March 01, 2005 10:48 AM
Subject: Re: Zip Files


> Hello
>
> first, you need a parser for each file type: pdf, txt, word, etc.
> and use a java api to iterate zip content, see:
>
> http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
>
> use getNextEntry() method
>
> little example:
>
> ZipInputStream zis = new ZipInputStream(fileInputStream);
> ZipEntry zipEntry;
> while(zipEntry = zis.getNextEntry() != null){
>     //use zipEntry to get name, etc.
>     //get properly parser for current entry
>     //use parser with zis (ZipInputStream)
> }
>
> good luck
> Ernesto
>
> Luke Shannon escribió:
>
> >Hello;
> >
> >Anyone have an ideas on how to index the contents within zip files?
> >
> >Thanks,
> >
> >Luke
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
> >
> >
>
> -- 
> Ernesto De Santis - Colaborativa.net
> Córdoba 1147 Piso 6 Oficinas 3 y 4
> (S2000AWO) Rosario, SF, Argentina.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to