RE: HDFS losing blocks or connection error

2009-01-23 Thread Zak, Richard [USA]
? Thx, J-D On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA] zak_rich...@bah.com wrote: 4 slaves, 1 master, all are the m1.xlarge instance type. Richard J. Zak -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Friday

Hadoop with many input/output files?

2009-01-22 Thread Zak, Richard [USA]
I am seeing the MultiFileInputFormat and MultipleOutputFormat input/output formats for the Job configuration. How can I use them properly? I had previously used the default input and output formats, which, for my PDF concatenation project, merely reduced Hadoop to a scheduler. The idea
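Roughly, MultiFileInputFormat lets one map task receive several whole files in a single split instead of one split per file. The plain-Java sketch below illustrates that packing idea only; it uses no Hadoop classes, and the greedy "largest file to the emptiest split" strategy and all names (`SplitPacker`, `FileEntry`) are assumptions for illustration, not Hadoop's own `getSplits` logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration: pack many small files into a fixed number of splits. */
class SplitPacker {

    /** A file path paired with its size in bytes. */
    record FileEntry(String path, long size) {}

    /** One split: the files assigned to it and their combined size. */
    static final class Split {
        final List<String> paths = new ArrayList<>();
        long totalBytes = 0;
    }

    /**
     * Greedy packing: take files largest first and give each one to the split
     * with the fewest bytes so far. An assumed strategy, not Hadoop's algorithm.
     */
    static List<Split> pack(List<FileEntry> files, int numSplits) {
        PriorityQueue<Split> bySize =
            new PriorityQueue<>(Comparator.comparingLong((Split s) -> s.totalBytes));
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            Split s = new Split();
            splits.add(s);
            bySize.add(s);
        }
        List<FileEntry> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(FileEntry::size).reversed());
        for (FileEntry f : sorted) {
            Split smallest = bySize.poll();   // split holding the fewest bytes
            smallest.paths.add(f.path());
            smallest.totalBytes += f.size();
            bySize.add(smallest);             // re-insert with its updated size
        }
        return splits;
    }

    public static void main(String[] args) {
        List<FileEntry> pdfs = List.of(
            new FileEntry("dir1/a.pdf", 500), new FileEntry("dir1/b.pdf", 300),
            new FileEntry("dir2/c.pdf", 400), new FileEntry("dir2/d.pdf", 200));
        for (Split s : pack(pdfs, 2)) {
            System.out.println(s.totalBytes + " bytes: " + s.paths);
        }
    }
}
```

With four files of 500, 400, 300 and 200 bytes and two splits, each split ends up with 700 bytes, so two map tasks would do roughly equal work.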

RE: Suitable for Hadoop?

2009-01-21 Thread Zak, Richard [USA]
You can do that. I did a Map/Reduce job for about 6 GB of PDFs to concatenate them, and the New York Times used Hadoop to process a few TB of PDFs. What I would do is this:
- Use the iText library, a Java library for PDF manipulation (I don't know what you would use for reading Word docs)
- Don't

RE: Suitable for Hadoop?

2009-01-21 Thread Zak, Richard [USA]
can sit on the same block. Rgds, Ricky -Original Message- From: Zak, Richard [USA] [mailto:zak_rich...@bah.com] Sent: Wednesday, January 21, 2009 6:42 AM To: core-user@hadoop.apache.org Subject: RE: Suitable for Hadoop? You can do that. I did a Map/Reduce job for about 6 GB

RE: Concatenating PDF files

2009-01-06 Thread Zak, Richard [USA]
/instance-types/). And you should configure them to take advantage of multiple disks - https://issues.apache.org/jira/browse/HADOOP-4745. Tom On Fri, Jan 2, 2009 at 8:50 PM, Zak, Richard [USA] zak_rich...@bah.com wrote: All, I have a project that I am working on involving PDF files in HDFS. There are X
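In Hadoop releases of that era, `dfs.data.dir` (HDFS block storage) and `mapred.local.dir` (MapReduce scratch space) each accept a comma-separated list of directories, which is how a node spreads its I/O over several disks. The sketch below just renders those two properties for `hadoop-site.xml`; the mount points (`/mnt` through `/mnt4`), the subdirectory names, and the `MultiDiskConfig` helper are hypothetical and must be matched to the disks actually mounted on the instance.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: render multi-disk directory properties for hadoop-site.xml. */
class MultiDiskConfig {

    /** Places one subdirectory under each mount point and joins them with commas. */
    static String dirList(List<String> mounts, String subdir) {
        return mounts.stream()
                .map(m -> m + "/" + subdir)
                .collect(Collectors.joining(","));
    }

    /** Formats a single <property> element. */
    static String property(String name, String value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>\n";
    }

    public static void main(String[] args) {
        // Assumed mount points for the instance's local disks; adjust to the real layout.
        List<String> mounts = List.of("/mnt", "/mnt2", "/mnt3", "/mnt4");
        System.out.print(property("dfs.data.dir", dirList(mounts, "hadoop/dfs/data")));
        System.out.print(property("mapred.local.dir", dirList(mounts, "hadoop/mapred/local")));
    }
}
```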

Concatenating PDF files

2009-01-02 Thread Zak, Richard [USA]
All, I have a project that I am working on involving PDF files in HDFS. There are X directories, each containing Y PDFs, and the PDFs in each directory are to be concatenated. At the moment I am running a test with 5 directories and 15 PDFs in each directory. I am
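One way to express "concatenate all PDFs per directory" as a Map/Reduce job is to key each file path by its parent directory, so a single reduce call sees every PDF from one directory. The plain-Java sketch below simulates that map and grouping step without Hadoop classes; `GroupByDirectory` and the `dir_name.pdf` output naming are hypothetical. Note that Hadoop does not guarantee the order of values within a key, so a real job would need to sort the paths if page order matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-Java simulation of grouping PDF paths by parent directory. */
class GroupByDirectory {

    /** "Map" key: everything before the last '/', i.e. the parent directory. */
    static String parentDir(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    /** "Shuffle": TreeMap sorts the keys; each list keeps input order here. */
    static Map<String, List<String>> group(List<String> paths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String p : paths) {
            groups.computeIfAbsent(parentDir(p), k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    /** Hypothetical output naming: one concatenated PDF per input directory. */
    static String outputName(String dir) {
        return dir.replace('/', '_') + ".pdf";
    }

    public static void main(String[] args) {
        List<String> pdfs = List.of("in/d1/1.pdf", "in/d2/1.pdf", "in/d1/2.pdf");
        group(pdfs).forEach((dir, files) ->
            System.out.println(outputName(dir) + " <- " + files));
    }
}
```

For the test described above (5 directories of 15 PDFs each), this keying would produce 5 groups of 15 paths, i.e. 5 reduce calls and 5 output files.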