Zip Files

2005-03-01 Thread Luke Shannon
Hello;

Does anyone have any ideas on how to index the contents of zip files?

Thanks,

Luke


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Zip Files

2005-03-01 Thread Ernesto De Santis
Hello,
First, you need a parser for each file type: PDF, TXT, Word, etc.
Then use the Java API to iterate over the zip contents; see:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html
Use the getNextEntry() method.
A small example:
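import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

ZipInputStream zis = new ZipInputStream(fileInputStream);
ZipEntry zipEntry;
// getNextEntry() returns null once there are no more entries
while ((zipEntry = zis.getNextEntry()) != null) {
    // use zipEntry to get the name, etc.
    // pick the proper parser for the current entry
    // run the parser on zis (the ZipInputStream)
}
zis.close();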
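
The loop above can be fleshed out a little. What follows is only a sketch under assumptions: ZipEntryLister, extensionOf and listEntries are hypothetical names, not part of Lucene or the JDK, and the code merely records each entry name with its extension at the point where a real indexer would choose a parser and hand it the ZipInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only.
class ZipEntryLister {

    // Returns the lower-case extension of an entry name, or "" if it has none.
    static String extensionOf(String entryName) {
        int slash = entryName.lastIndexOf('/');
        int dot = entryName.lastIndexOf('.');
        if (dot > slash && dot < entryName.length() - 1) {
            return entryName.substring(dot + 1).toLowerCase(Locale.ROOT);
        }
        return "";
    }

    // Walks the archive and records "name -> extension" for every file entry.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> seen = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to index
                }
                // A real indexer would select a parser by extension here
                // and let it read the entry's data from zis.
                seen.add(entry.getName() + " -> " + extensionOf(entry.getName()));
            }
        }
        return seen;
    }
}
```

Calling ZipEntryLister.listEntries(new FileInputStream("archive.zip")) would walk the archive once, front to back, without unpacking anything to disk; the file name here is a placeholder.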
Good luck,
Ernesto

--
Ernesto De Santis - Colaborativa.net
Córdoba 1147 Piso 6 Oficinas 3 y 4
(S2000AWO) Rosario, SF, Argentina.



Re: Zip Files

2005-03-01 Thread Luke Shannon
Thanks Ernesto.

The issue I'm working on now (this is more a lack of experience than
anything) is getting an input I can index. All my indexing classes (doc,
pdf, xml, ppt) take a File object as a parameter and return a Lucene
Document containing all the fields I need.

I'm struggling with how to work with an array of bytes instead of a
Java File.

It would be easier to unzip the archive to a temp directory, parse the files,
and then delete the directory. But this would greatly slow indexing and use up
disk space.
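
One way to get at the bytes without a temp directory might look like the sketch below. It assumes the archive fits comfortably in memory; ZipEntryBytes, readFully and readAll are hypothetical names. Reading from a ZipInputStream returns -1 at the end of the current entry, so copying "to end of stream" yields exactly one entry's data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only.
class ZipEntryBytes {

    // Copies the stream into a byte array. On a ZipInputStream, read()
    // returns -1 at the end of the current entry, not of the whole archive.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Maps each file entry's name to its uncompressed bytes, in archive order.
    static Map<String, byte[]> readAll(InputStream zipData) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    contents.put(entry.getName(), readFully(zis));
                }
            }
        }
        return contents;
    }
}
```

The indexing classes would still need a variant that accepts bytes or an InputStream instead of a File; holding every entry in a map is the simplest form, and a real indexer would more likely parse each entry as it is read.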

Luke




Re: Zip Files

2005-03-01 Thread Chris Lamprecht
Luke,

Look at the javadocs for java.io.ByteArrayInputStream - it wraps a
byte array and makes it accessible as an InputStream.  Also see
java.util.zip.ZipFile.  You should be able to read and parse all
contents of the zip file in memory.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayInputStream.html
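
A rough sketch of how these two classes might fit together follows. InMemoryZipReader and its methods are hypothetical names, and parseText is only a stand-in for a real parser that accepts an InputStream; it assumes UTF-8 text, which will not hold for formats such as PDF or Word.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
class InMemoryZipReader {

    // Stand-in for a parser that takes a stream instead of a File.
    // Assumption: the content is UTF-8 text.
    static String parseText(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            text.append(buffer, 0, n);
        }
        return text.toString();
    }

    // ByteArrayInputStream lets the same parser run on a byte array.
    static String parseBytes(byte[] data) throws IOException {
        return parseText(new ByteArrayInputStream(data));
    }

    // ZipFile gives a separate InputStream per entry; nothing is extracted to disk.
    static Map<String, String> readTextEntries(File zip) throws IOException {
        Map<String, String> texts = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                try (InputStream in = zipFile.getInputStream(entry)) {
                    texts.put(entry.getName(), parseText(in));
                }
            }
        }
        return texts;
    }
}
```

ZipFile needs the archive itself to exist as a file, which is the case here; for an archive that is only available as a stream, ZipInputStream is the alternative.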

