Hi
        Recently I wanted to save a big RDD[(k,v)] as an index plus data, so I
decided to use a Hadoop MapFile. I tried some examples like this one:
https://gist.github.com/airawat/6538748
        The code runs fine and generates an index file and a data file. I can open
the data file with the command
"hadoop fs -text /spark/out2/mapFile/data". But when I run the command
"hadoop fs -text /spark/out2/mapFile/index", I can't see the index content;
there is only this output in the console:
        14/11/10 16:11:04 INFO zlib.ZlibFactory: Successfully loaded & 
initialized
native-zlib library
        14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
        14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
        14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
        14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]

and the command "hadoop fs -ls /spark/out2/mapFile/" shows the following:
-rw-r--r--   3 spark hdfs      24002 2014-11-10 15:19
/spark/out2/mapFile/data
-rw-r--r--   3 spark hdfs        136 2014-11-10 15:19
/spark/out2/mapFile/index

        I don't think "INFO compress.CodecPool: Got brand-new decompressor [.deflate]"
should prevent the index content from being shown, so this really confuses me.
My code is as follows:
        import java.net.URI
        import org.apache.hadoop.fs.FileSystem
        import org.apache.hadoop.io.{IntWritable, IOUtils, MapFile, Text}
        import org.apache.hadoop.io.MapFile.Writer
        import org.apache.spark.{SparkConf, SparkContext}

        def try_Map_File(writePath: String) = {
            val uri = writePath + "/mapFile"
            val data = Array(
              "One, two, buckle my shoe", "Three, four, shut the door",
              "Five, six, pick up sticks",
              "Seven, eight, lay them straight", "Nine, ten, a big fat hen")

            val con = new SparkConf()
            con.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
            val sc = new SparkContext(con)

            val conf = sc.hadoopConfiguration
            val fs = FileSystem.get(URI.create(uri), conf)
            val key = new IntWritable()
            val value = new Text()
            var writer: MapFile.Writer = null
            try {
              val writer = new Writer(conf, fs, uri, key.getClass, value.getClass)
              writer.setIndexInterval(64)
              for (i <- Range(0, 512)) {
                key.set(i + 1)
                value.set(data(i % data.length))
                writer.append(key, value)
              }
            } finally {
              IOUtils.closeStream(writer)
            }
        }
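        In case it helps, here is a rough, untested sketch (not part of the job
above) of how I intend to read the MapFile and its index back programmatically.
It assumes the same old-style Hadoop APIs as the writer code, and that the index
is itself just a SequenceFile of (key, byte-offset) pairs:

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}
        import org.apache.hadoop.io.{IntWritable, LongWritable, MapFile, SequenceFile, Text}

        def try_Read_Map_File(writePath: String) = {
            val uri = writePath + "/mapFile"
            val conf = new Configuration()
            val fs = FileSystem.get(URI.create(uri), conf)

            // Read the whole MapFile (index + data) back through MapFile.Reader.
            val reader = new MapFile.Reader(fs, uri, conf)
            val key = new IntWritable()
            val value = new Text()
            try {
              while (reader.next(key, value)) {
                println(s"$key\t$value")
              }
            } finally {
              reader.close()
            }

            // The index is a SequenceFile of (key, byte offset into the data
            // file), so it can also be dumped directly.
            val idxReader = new SequenceFile.Reader(fs, new Path(uri, "index"), conf)
            val idxKey = new IntWritable()
            val offset = new LongWritable()
            try {
              while (idxReader.next(idxKey, offset)) {
                println(s"$idxKey\t$offset")
              }
            } finally {
              idxReader.close()
            }
        }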
        Can anyone give me some idea, or suggest another method to use instead of MapFile?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/index-File-create-by-mapFile-can-t-read-tp18471.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
