[ 
https://issues.apache.org/jira/browse/SPARK-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunzhi.lyz updated SPARK-6316:
------------------------------
    Description: 
        Add an encoding parameter to the SparkContext(conf).textFile() method to support HDFS files in encodings other than UTF-8 (multi-language files).

       e.g.     val file = new SparkContext(conf).textFile(args(0), 10, "gbk")

Modified code in org.apache.spark.SparkContext:

     +  def defaultEncoding: String = "utf-8"

     -  def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String] = {
          hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
            minPartitions).map(pair => pair._2.toString).setName(path)
        }

     +  def textFile(path: String, minPartitions: Int = defaultMinPartitions,
            encoding: String = defaultEncoding): RDD[String] = {
          // Decode the raw bytes of the Hadoop Text record with the requested
          // charset; Text.toString (above) always decodes as UTF-8.
          hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
            minPartitions).map(pair =>
              new String(pair._2.getBytes(), 0, pair._2.getLength(), encoding)).setName(path)
        }
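The decode step in the proposed overload can be illustrated without Spark or Hadoop. This is a minimal standalone sketch (the sample string and object name are illustrative, not from the patch) of why passing the real charset matters: decoding GBK bytes as UTF-8, which is what Text.toString effectively does, garbles the text.

```scala
// Standalone illustration of the decode step in the proposed textFile() overload.
object EncodingDemo {
  def main(args: Array[String]): Unit = {
    val original = "中文"                    // sample non-ASCII text
    val gbkBytes = original.getBytes("GBK")  // bytes as stored in a GBK-encoded file

    // Current behaviour: the bytes are decoded as UTF-8 (mojibake).
    val decodedAsUtf8 = new String(gbkBytes, 0, gbkBytes.length, "UTF-8")
    // Proposed behaviour: decode with the caller-supplied encoding.
    val decodedAsGbk = new String(gbkBytes, 0, gbkBytes.length, "GBK")

    println(s"UTF-8 decode: $decodedAsUtf8") // replacement characters, not 中文
    println(s"GBK decode:   $decodedAsGbk")  // round-trips to 中文
  }
}
```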


   

       

        


> add a parameter for  SparkContext(conf).textFile() method , support for 
> multi-language  hdfs file ,   e.g. "gbk"
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6316
>                 URL: https://issues.apache.org/jira/browse/SPARK-6316
>             Project: Spark
>          Issue Type: New Feature
>         Environment: linux   
> LANG=en_US.UTF-8
>            Reporter: yunzhi.lyz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
