yunzhi.lyz created SPARK-6316:
---------------------------------

             Summary: Add an encoding parameter to SparkContext.textFile() to support HDFS files in other encodings, e.g. "gbk"
                 Key: SPARK-6316
                 URL: https://issues.apache.org/jira/browse/SPARK-6316
             Project: Spark
          Issue Type: New Feature
         Environment: Linux, LANG=en_US.UTF-8
            Reporter: yunzhi.lyz

Add an optional encoding parameter to the SparkContext.textFile() method so that HDFS text files stored in encodings other than UTF-8 (for example GBK) can be read correctly. The current implementation converts each record with Text.toString, which always decodes the bytes as UTF-8.

e.g.

    val file = new SparkContext(conf).textFile(args(0), 10, "gbk")

Proposed change to org.apache.spark.SparkContext:

    +  def defaultEncoding: String = "utf-8"

    -  def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = {
    -    hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
    -      minPartitions).map(pair => pair._2.toString).setName(path)
    -  }

    +  def textFile(path: String, minPartitions: Int = defaultMinPartitions,
    +      encoding: String = defaultEncoding): RDD[String] = {
    +    hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
    +      minPartitions).map(pair => new String(pair._2.getBytes(), 0, pair._2.getLength(), encoding)).setName(path)
    +  }
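
The decoding step in the proposed textFile() can be tried without a cluster. The following is only a minimal sketch, not the Spark implementation: DecodeSketch and decode are hypothetical names, and a plain byte array with two stale trailing bytes stands in for the backing buffer of a Hadoop Text record, whose getBytes() result is only valid up to getLength(). It assumes the JDK in use ships the GBK charset.

```scala
import java.nio.charset.Charset

object DecodeSketch {
  // Hypothetical helper mirroring the expression in the proposed patch:
  // decode only the first `length` bytes of `buffer` using the named charset.
  def decode(buffer: Array[Byte], length: Int, encoding: String): String =
    new String(buffer, 0, length, Charset.forName(encoding))

  def main(args: Array[String]): Unit = {
    // Two Chinese characters, written as escapes so the source file stays ASCII.
    val original = "\u4e2d\u6587"
    val gbkBytes = original.getBytes("GBK")

    // Stand-in for a reused record buffer: valid data followed by stale bytes.
    val buffer = gbkBytes ++ Array[Byte](0x41, 0x42)

    val asGbk  = decode(buffer, gbkBytes.length, "GBK")
    val asUtf8 = decode(buffer, gbkBytes.length, "UTF-8")

    println(s"decoded as GBK matches original:   ${asGbk == original}")
    println(s"decoded as UTF-8 matches original: ${asUtf8 == original}")
  }
}
```

Passing the explicit length is what keeps the stale bytes out of the result, and decoding GBK bytes as UTF-8 yields replacement characters rather than the original text, which is the problem this issue describes.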