yunzhi.lyz created SPARK-6316:
---------------------------------

             Summary: Add an encoding parameter to SparkContext(conf).textFile() to support multi-language HDFS files, e.g. "gbk"
                 Key: SPARK-6316
                 URL: https://issues.apache.org/jira/browse/SPARK-6316
             Project: Spark
          Issue Type: New Feature
         Environment: Linux, LANG=en_US.UTF-8
            Reporter: yunzhi.lyz


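        Add an encoding parameter to the SparkContext(conf).textFile() method so that HDFS files in encodings other than UTF-8 (multi-language files, e.g. Chinese text stored as GBK) are decoded correctly.

       e.g. val file = new SparkContext(conf).textFile(args(0), 10, "gbk")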
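For illustration, here is a minimal JVM-only sketch (no Spark or Hadoop on the classpath) of the problem. The current textFile calls Text.toString, which always decodes as UTF-8, so GBK bytes come out garbled. The sketch assumes the JDK provides the GBK charset, which standard JDK builds do through their extended charsets. The object name GbkDecodeDemo is hypothetical.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object GbkDecodeDemo {
  val gbk: Charset = Charset.forName("GBK")

  // One line of a GBK-encoded file, as the raw bytes a record reader would hand over.
  // "\u4e2d\u6587\u65e5\u5fd7" is the Chinese text "中文日志".
  val original: String = "\u4e2d\u6587\u65e5\u5fd7"
  val lineBytes: Array[Byte] = original.getBytes(gbk)

  // What the current textFile effectively does: decode the bytes as UTF-8.
  // Malformed sequences are replaced with U+FFFD, so the text is lost.
  val asUtf8: String = new String(lineBytes, StandardCharsets.UTF_8)

  // What the proposed encoding parameter would do: decode with the file's real charset.
  val asGbk: String = new String(lineBytes, gbk)

  def main(args: Array[String]): Unit = {
    println(s"decoded as UTF-8: $asUtf8") // garbled, contains U+FFFD
    println(s"decoded as GBK:   $asGbk")  // 中文日志
  }
}
```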

Modify the code:
       
      org.apache.spark.SparkContext

     + def defaultEncoding: String = "utf-8"

     - def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = {
     -   hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
     -     minPartitions).map(pair => pair._2.toString).setName(path)
     - }

     + def textFile(
     +     path: String,
     +     minPartitions: Int = defaultMinPartitions,
     +     encoding: String = defaultEncoding): RDD[String] = {
     +   hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
     +     minPartitions)
     +     // Text.getBytes returns the backing array, which may be longer than the
     +     // current line; only the first getLength bytes are valid.
     +     .map(pair => new String(pair._2.getBytes, 0, pair._2.getLength, encoding))
     +     .setName(path)
     + }
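A sketch of why the proposed map passes 0 and getLength instead of decoding the whole array. The record reader behind hadoopFile reuses one Text object across records, and Text.getBytes returns its backing array, of which only the first getLength bytes are valid. Hadoop is not assumed to be on the classpath, so ReusedLineBuffer below is a hypothetical stand-in that mimics that reuse. It is not Hadoop's implementation.

```scala
import java.nio.charset.Charset

// Hypothetical stand-in for org.apache.hadoop.io.Text: the backing array is
// reused and only ever grows, so bytes past getLength may be left over from
// an earlier record.
final class ReusedLineBuffer {
  private var bytes: Array[Byte] = new Array[Byte](0)
  private var length: Int = 0

  def set(src: Array[Byte]): Unit = {
    if (src.length > bytes.length) bytes = new Array[Byte](src.length)
    System.arraycopy(src, 0, bytes, 0, src.length)
    length = src.length
  }
  def getBytes: Array[Byte] = bytes
  def getLength: Int = length
}

object DecodeWithLength {
  val gbk: Charset = Charset.forName("GBK")

  // The decoding step from the proposed textFile: decode only [0, getLength).
  def decode(t: ReusedLineBuffer, encoding: String): String =
    new String(t.getBytes, 0, t.getLength, encoding)

  val buf = new ReusedLineBuffer
  buf.set("\u4e2d\u6587\u65e5\u5fd7".getBytes(gbk)) // first record: 4 characters, 8 bytes
  buf.set("\u4e2d".getBytes(gbk))                   // second record: 1 character, 2 bytes

  val correct: String = decode(buf, "GBK")
  val wholeArray: String = new String(buf.getBytes, "GBK") // ignores getLength

  def main(args: Array[String]): Unit = {
    println(correct)    // 中
    println(wholeArray) // stale bytes from the first record: 中文日志
  }
}
```

Note that the record reader still splits lines on the bytes '\n' and '\r' before any decoding happens. This is safe for GBK, whose multi-byte sequences never contain those bytes, but it would not work for encodings such as UTF-16.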

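One possible refinement, offered as an assumption rather than as part of the proposal: check the encoding name once on the driver so that a typo fails before any task runs, and keep passing the name (a String) into the map closure, since java.nio.charset.Charset does not implement Serializable. validateEncoding is a hypothetical helper, not a Spark API.

```scala
import java.nio.charset.{Charset, IllegalCharsetNameException}

object EncodingCheck {
  // Hypothetical helper, not part of Spark: validate an encoding name on the driver.
  // Charset.isSupported returns false for a legal but unknown name and throws
  // IllegalCharsetNameException for a syntactically illegal one (e.g. with spaces).
  def validateEncoding(encoding: String): String = {
    val supported =
      try Charset.isSupported(encoding)
      catch { case _: IllegalCharsetNameException => false }
    require(supported, s"Unsupported encoding: $encoding")
    encoding // return the name, not a Charset, so task closures stay serializable
  }

  def main(args: Array[String]): Unit = {
    println(validateEncoding("gbk")) // gbk
  }
}
```

In the proposed textFile, such a check could run as validateEncoding(encoding) before the call to hadoopFile(...).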

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
