Re: wholeTextFiles like for binary files ?

2014-06-26 Thread Akhil Das
You cannot read image files with wholeTextFiles because it uses
CombineFileInputFormat, which cannot read gzipped files because they are not
splittable (http://www.bigdataspeak.com/2013_01_01_archive.html). The source
that shows this:

  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[String, String] = {

    new CombineFileRecordReader[String, String](
      split.asInstanceOf[CombineFileSplit],
      context,
      classOf[WholeTextFileRecordReader])
  }

You may be able to use newAPIHadoopFile with WholeFileInputFormat
(https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java;
not built into Hadoop, but widely available online) to get this to work
correctly. I don't think WholeFileInputFormat will work as-is, since it just
gets the raw bytes of the file, so you may have to write your own class,
possibly extending WholeFileInputFormat.
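
To make that suggestion concrete, here is a rough, untested sketch of what such a class might look like in Scala. The names WholeBinaryFileInputFormat, WholeBinaryFileRecordReader and ReadImages are made up for illustration; they are not part of Spark or Hadoop. Unlike the WholeFileInputFormat linked above, this variant uses the file path as the key, which is an assumption on my part about what is convenient for images. It also assumes each file fits in memory and is smaller than 2 GB.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, IOUtils, Text}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical input format: one record per file (key = path, value = raw bytes).
class WholeBinaryFileInputFormat extends FileInputFormat[Text, BytesWritable] {
  // Never split a file, so every record holds one complete image.
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(
      split: InputSplit,
      context: TaskAttemptContext): RecordReader[Text, BytesWritable] =
    new WholeBinaryFileRecordReader
}

// Hypothetical record reader: emits exactly one (path, bytes) pair per split.
class WholeBinaryFileRecordReader extends RecordReader[Text, BytesWritable] {
  private var fileSplit: FileSplit = _
  private var conf: Configuration = _
  private val key = new Text()
  private val value = new BytesWritable()
  private var processed = false

  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
    fileSplit = split.asInstanceOf[FileSplit]
    conf = context.getConfiguration
  }

  override def nextKeyValue(): Boolean = {
    if (processed) {
      false
    } else {
      val path = fileSplit.getPath
      // Assumes the file is small enough for a single in-memory array.
      val contents = new Array[Byte](fileSplit.getLength.toInt)
      val in = path.getFileSystem(conf).open(path)
      try {
        IOUtils.readFully(in, contents, 0, contents.length)
      } finally {
        IOUtils.closeStream(in)
      }
      key.set(path.toString)
      value.set(contents, 0, contents.length)
      processed = true
      true
    }
  }

  override def getCurrentKey(): Text = key
  override def getCurrentValue(): BytesWritable = value
  override def getProgress(): Float = if (processed) 1.0f else 0.0f
  override def close(): Unit = ()
}

object ReadImages {
  // `dir` is whatever directory holds the images; the caller supplies it.
  def load(sc: SparkContext, dir: String): RDD[(String, Array[Byte])] =
    sc.newAPIHadoopFile(
        dir,
        classOf[WholeBinaryFileInputFormat],
        classOf[Text],
        classOf[BytesWritable])
      // Copy out of the Writables, since Hadoop may reuse the same objects.
      .map { case (path, bytes) =>
        (path.toString, java.util.Arrays.copyOf(bytes.getBytes, bytes.getLength))
      }
}
```

Decoding the byte arrays into actual images would still be up to you, for example in a further map over the resulting RDD.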

Thanks
Best Regards


On Thu, Jun 26, 2014 at 3:31 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:

 Is there an equivalent of wholeTextFiles for binary files, for example a
 set of images?

 Cheers,

 Jaonary



wholeTextFiles like for binary files ?

2014-06-25 Thread Jaonary Rabarisoa
Is there an equivalent of wholeTextFiles for binary files, for example a set
of images?

Cheers,

Jaonary