Re: Reading sequencefile
Thank you. I forgot the classOf[*] arguments.

On Tue, Mar 11, 2014 at 10:46 AM, Shixiong Zhu wrote:

> Hi Jaonary,
>
> You can use "sc.sequenceFile" to load your file. E.g.,
>
>     scala> import org.apache.hadoop.io._
>     import org.apache.hadoop.io._
>
>     scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text], classOf[BytesWritable])
>     rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at <console>:15
>
> Best Regards,
> Shixiong Zhu
>
> 2014-03-11 16:54 GMT+08:00 Jaonary Rabarisoa:
>
>> Hi all,
>>
>> I'm trying to read a SequenceFile that represents a set of JPEG images generated using this tool: http://stuartsierra.com/2008/04/24/a-million-little-files. According to the documentation: "Each key is the name of a file (a Hadoop "Text"), the value is the binary contents of the file (a BytesWritable)"
>>
>> How do I load the generated file into Spark?
>>
>> Cheers,
>>
>> Jaonary
Re: Reading sequencefile
Hi Jaonary,

You can use "sc.sequenceFile" to load your file. E.g.,

    scala> import org.apache.hadoop.io._
    import org.apache.hadoop.io._

    scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text], classOf[BytesWritable])
    rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at <console>:15

Best Regards,
Shixiong Zhu

2014-03-11 16:54 GMT+08:00 Jaonary Rabarisoa:

> Hi all,
>
> I'm trying to read a SequenceFile that represents a set of JPEG images generated using this tool: http://stuartsierra.com/2008/04/24/a-million-little-files. According to the documentation: "Each key is the name of a file (a Hadoop "Text"), the value is the binary contents of the file (a BytesWritable)"
>
> How do I load the generated file into Spark?
>
> Cheers,
>
> Jaonary
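[A fuller sketch of the approach above, for reference: it loads the SequenceFile and maps each (Text, BytesWritable) pair to a plain (String, Array[Byte]). The path "seqfile_of_images" and the local master are placeholders. Note that Hadoop's record reader reuses Writable objects, and BytesWritable's backing array can be longer than the payload, so the bytes should be copied out, keeping only the first getLength() bytes, as BytesWritable.copyBytes() does.]

```scala
import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.SparkContext

object ReadImages {
  // Keep only the valid prefix of a possibly over-sized backing array;
  // this mirrors what BytesWritable.copyBytes() does.
  def validBytes(buf: Array[Byte], length: Int): Array[Byte] =
    buf.take(length)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "read-images")

    // "seqfile_of_images" is a placeholder path to the SequenceFile.
    val images: org.apache.spark.rdd.RDD[(String, Array[Byte])] =
      sc.sequenceFile("seqfile_of_images", classOf[Text], classOf[BytesWritable])
        .map { case (name, bytes) =>
          // Writables are reused between records, so copy the data out
          // before collecting or caching.
          (name.toString, validBytes(bytes.getBytes, bytes.getLength))
        }

    println(images.count())
    sc.stop()
  }
}
```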
Reading sequencefile
Hi all,

I'm trying to read a SequenceFile that represents a set of JPEG images generated using this tool: http://stuartsierra.com/2008/04/24/a-million-little-files. According to the documentation: "Each key is the name of a file (a Hadoop "Text"), the value is the binary contents of the file (a BytesWritable)"

How do I load the generated file into Spark?

Cheers,

Jaonary