Thx,
J-D
On Fri, Jan 23, 2009 at 12:55 PM, Zak, Richard [USA]
<zak_rich...@bah.com> wrote:
4 slaves, 1 master, all are the m1.xlarge instance type.
Richard J. Zak
-----Original Message-----
From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
Jean-Daniel Cryans
Sent: Friday
I see the MultiFileInputFormat and MultipleOutputFormat
input/output formats for the job configuration. How can I properly use
them? I had previously used the default input and output format types,
which, for my PDF concatenation project, merely reduced Hadoop to a
scheduler.
The idea
You can do that. I did a Map/Reduce job for about 6 GB of PDFs to
concatenate them, and the New York Times used Hadoop to process a few TB of
PDFs.
What I would do is this:
- Use the iText library, a Java library for PDF manipulation (don't know
what you would use for reading Word docs)
- Don't
can sit on the same block.
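The per-directory concatenation described above can be sketched without a cluster: in the map phase, key each PDF path by its parent directory; in the reduce phase, merge each directory's files in a deterministic order. Below is a minimal plain-Java sketch of that keying and grouping logic only — a real job would use the Hadoop Map/Reduce API and iText for the actual PDF merge, and the paths here are made up for illustration.

```java
import java.nio.file.Paths;
import java.util.*;

public class ConcatByDirSketch {

    // Map phase analogue: emit (parentDirectory, filePath) for every PDF.
    // Reduce phase analogue: per directory, merge its files in sorted order.
    static Map<String, String> concatByDirectory(List<String> pdfPaths) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String p : pdfPaths) {
            String dir = Paths.get(p).getParent().toString();
            grouped.computeIfAbsent(dir, k -> new ArrayList<>()).add(p);
        }
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            Collections.sort(e.getValue()); // deterministic page order
            // Stand-in for the real iText merge of the group's PDFs:
            merged.put(e.getKey(), String.join("+", e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> pdfs = Arrays.asList(
                "/data/dir1/b.pdf", "/data/dir1/a.pdf", "/data/dir2/c.pdf");
        System.out.println(concatByDirectory(pdfs));
    }
}
```

Because the directory is the key, all PDFs of one directory arrive at a single reducer, which is exactly what makes the per-directory merge work in parallel across directories.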
Rgds,
Ricky
-----Original Message-----
From: Zak, Richard [USA] [mailto:zak_rich...@bah.com]
Sent: Wednesday, January 21, 2009 6:42 AM
To: core-user@hadoop.apache.org
Subject: RE: Suitable for Hadoop?
/instance-types/). And you should configure
them to take advantage of multiple disks -
https://issues.apache.org/jira/browse/HADOOP-4745.
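For illustration, using multiple disks boils down to giving the DataNode and the TaskTracker a comma-separated list of directories, one per physical disk. A hedged sketch (property names from the 0.x-era configuration; the mount points are hypothetical, not what your AMI necessarily uses):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/hdfs,/mnt2/hdfs,/mnt3/hdfs,/mnt4/hdfs</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/mapred,/mnt2/mapred,/mnt3/mapred,/mnt4/mapred</value>
</property>
```

Hadoop round-robins block and spill writes across the listed directories, so I/O is spread over all of the instance's local disks instead of hammering one.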
Tom
On Fri, Jan 2, 2009 at 8:50 PM, Zak, Richard [USA]
<zak_rich...@bah.com> wrote:
All, I have a project that I am working on involving PDF files in HDFS.
There are X number of directories and each directory contains Y number
of PDFs, and per directory all the PDFs are to be concatenated. At the
moment I am running a test with 5 directories and 15 PDFs in each
directory. I am