Re: Reading sequencefile

2014-03-11 Thread Jaonary Rabarisoa
Thank you. I forgot the classOf[*] arguments.


On Tue, Mar 11, 2014 at 10:46 AM, Shixiong Zhu  wrote:

> Hi Jaonary,
>
> You can use "sc.sequenceFile" to load your file. E.g.,
>
> scala> import org.apache.hadoop.io._
> import org.apache.hadoop.io._
>
> scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text], classOf[BytesWritable])
> rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at <console>:15
>
>
> Best Regards,
> Shixiong Zhu
>
>
> 2014-03-11 16:54 GMT+08:00 Jaonary Rabarisoa :
>
>> Hi all,
>>
>> I'm trying to read a SequenceFile that represents a set of JPEG images
>> generated using this tool:
>> http://stuartsierra.com/2008/04/24/a-million-little-files . According to
>> the documentation: "Each key is the name of a file (a Hadoop “Text”),
>> the value is the binary contents of the file (a BytesWritable)"
>>
>> How do I load the generated file into Spark?
>>
>> Cheers,
>>
>> Jaonary
>>
>
>


Re: Reading sequencefile

2014-03-11 Thread Shixiong Zhu
Hi Jaonary,

You can use "sc.sequenceFile" to load your file. E.g.,

scala> import org.apache.hadoop.io._
import org.apache.hadoop.io._

scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text], classOf[BytesWritable])
rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at <console>:15
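A slightly fuller sketch, assuming Spark 1.3 or later (where the implicit Writable converters are found without extra imports). The classOf[...] form above returns the raw Hadoop Writables, and Hadoop's RecordReader reuses the same Text/BytesWritable objects for every record, so those pairs should be copied with a map before you cache, sort or collect them. Passing Scala types as type parameters lets Spark do that conversion per record. The object name SeqFileImages, the helper loadImages and the use of args(0) as the input path are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SeqFileImages {
  /** Loads (file name, file bytes) pairs from a SequenceFile of Text keys and BytesWritable values. */
  def loadImages(sc: SparkContext, path: String): RDD[(String, Array[Byte])] =
    // The type parameters select Spark's implicit WritableConverters:
    // Text -> String and BytesWritable -> Array[Byte]. The conversion runs once
    // per record, so each element is a fresh copy, not a reused Writable.
    sc.sequenceFile[String, Array[Byte]](path)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-image-seqfile"))
    try {
      // args(0): path to the SequenceFile (hypothetical command-line layout)
      loadImages(sc, args(0))
        .mapValues(_.length) // keep only the size of each JPEG for this demo
        .take(5)
        .foreach { case (name, size) => println(s"$name: $size bytes") }
    } finally {
      sc.stop()
    }
  }
}
```

If you stay with the classOf form, something like rdd.map { case (k, v) => (k.toString, java.util.Arrays.copyOf(v.getBytes, v.getLength)) } does the same copying by hand; trimming to getLength matters because BytesWritable.getBytes may return a padded buffer.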


Best Regards,
Shixiong Zhu


2014-03-11 16:54 GMT+08:00 Jaonary Rabarisoa :

> Hi all,
>
> I'm trying to read a SequenceFile that represents a set of JPEG images
> generated using this tool:
> http://stuartsierra.com/2008/04/24/a-million-little-files . According to
> the documentation: "Each key is the name of a file (a Hadoop "Text"),
> the value is the binary contents of the file (a BytesWritable)"
>
> How do I load the generated file into Spark?
>
> Cheers,
>
> Jaonary
>


Reading sequencefile

2014-03-11 Thread Jaonary Rabarisoa
Hi all,

I'm trying to read a SequenceFile that represents a set of JPEG images
generated using this tool:
http://stuartsierra.com/2008/04/24/a-million-little-files . According to
the documentation: "Each key is the name of a file (a Hadoop “Text”), the
value is the binary contents of the file (a BytesWritable)"

How do I load the generated file into Spark?

Cheers,

Jaonary