subject:"Suitable for Hadoop\?"

Suitable for Hadoop?

2009-01-21 Thread Darren Govoni

Hi, I have a task to process large quantities of files by converting them into other formats. Each file is processed as a whole and converted to a target format. Since there are 100s of GB of data, I thought it suitable for Hadoop, but the problem is that I don't think the files can be broken apart

RE: Suitable for Hadoop?

2009-01-21 Thread Zak, Richard [USA]

: Wednesday, January 21, 2009 08:08 To: core-user@hadoop.apache.org Subject: Suitable for Hadoop? Hi, I have a task to process large quantities of files by converting them into other formats. Each file is processed as a whole and converted to a target format. Since there are 100s of GB of data, I thought

RE: Suitable for Hadoop?

2009-01-21 Thread Ricky Ho

- From: Zak, Richard [USA] [mailto:zak_rich...@bah.com] Sent: Wednesday, January 21, 2009 6:42 AM To: core-user@hadoop.apache.org Subject: RE: Suitable for Hadoop? You can do that. I did a Map/Reduce job for about 6 GB of PDFs to concatenate them, and the New York Times used Hadoop to process

RE: Suitable for Hadoop?

2009-01-21 Thread Darren Govoni

in a beneficial manner, and the distributed part is very helpful! Richard J. Zak -----Original Message----- From: Darren Govoni [mailto:dar...@ontrenet.com] Sent: Wednesday, January 21, 2009 08:08 To: core-user@hadoop.apache.org Subject: Suitable for Hadoop? Hi, I have a task to process

Re: Suitable for Hadoop?

2009-01-21 Thread Jim Twensky

@hadoop.apache.org Subject: RE: Suitable for Hadoop? You can do that. I did a Map/Reduce job for about 6 GB of PDFs to concatenate them, and the New York Times used Hadoop to process a few TB of PDFs. What I would do is this: - Use the iText library, a Java library for PDF manipulation (don't

RE: Suitable for Hadoop?

2009-01-21 Thread Ricky Ho

Twensky [mailto:jim.twen...@gmail.com] Sent: Wednesday, January 21, 2009 11:47 AM To: core-user@hadoop.apache.org Subject: Re: Suitable for Hadoop? Ricky, Hadoop was formerly optimized for large files, usually files of size larger than one input split. However, there is an input format called

RE: Suitable for Hadoop?

2009-01-21 Thread Zak, Richard [USA]

@hadoop.apache.org Subject: RE: Suitable for Hadoop? Jim, thanks for your explanation. But isn't isSplittable an option in writing output rather than reading input? There are two phases. 1) Upload the data from a local file to HDFS. Is there an option in the hadoop fs copy to pack multiple small files


7 matches
