Hi Fokko, Maxim, Long: Thanks!
This happens in a custom datasource, as below:

    override def createRelation(…) {
      …
      blocks.map(block => (block.bytes)).saveAsTextFile(parameters("path"))
      ...
    }

I am new to Spark and will try the methods you all suggested.

Best!
Bing.

On Jan 17, 2020, at 4:28 AM, Maxim Gekk <maxim.g...@databricks.com> wrote:

Hi Bing,

You can try the Text datasource. It shouldn't modify strings:

    scala> Seq(""""201900002_1",1,24,0,2,"S66.000x001"""").toDS.write.text("tmp/text.txt")

    $ cat tmp/text.txt/part-00000-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt
    "201900002_1",1,24,0,2,"S66.000x001"

Maxim Gekk
Software Engineer
Databricks B. V.

On Thu, Jan 16, 2020 at 10:02 PM Long, Andrew <loand...@amazon.com.invalid> wrote:

Hey Bing,

There are a couple of different approaches you could take. The quickest and easiest would be to use the existing APIs:

    val bytes = spark.range(1000)
    bytes.foreachPartition(bytes => {
      // WARNING: anything used in here will need to be serializable.
      // There's some magic to serializing the hadoop conf; see the hadoop wrapper class in the source.
      // null is only a placeholder here: pass a real Hadoop Configuration.
      val writer = FileSystem.get(null).create(new Path("s3://..."))
      bytes.foreach(b => writer.write(b))
      writer.close()
    })

The more complicated but prettier approach would be to implement a custom datasource.

From: "Duan,Bing" <duanb...@baidu.com>
Date: Thursday, January 16, 2020 at 12:35 AM
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: How to implement a "saveAsBinaryFile" function?

Hi all:

I read binary data (protobuf format) from the filesystem with the binaryFiles function into an RDD[Array[Byte]], and it works fine.
But when I save it to the filesystem with saveAsTextFile, the quotation marks are escaped like this:

    "\"201900002_1\"",1,24,0,2,"\"S66.000x001\""

when it should be:

    "201900002_1",1,24,0,2,"S66.000x001"

Could anyone give me a tip on implementing a function like saveAsBinaryFile to persist the RDD[Array[Byte]]?

Bests!
Bing
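
For reference, the foreachPartition approach suggested in the thread could be sketched roughly as below for an RDD[Array[Byte]]. This is only a minimal sketch, not an existing Spark API: the object name BinarySave, the helper saveAsBinaryFile, and the part-NNNNN file naming are all hypothetical. It also assumes that a default Hadoop Configuration created on each executor is enough to resolve the target filesystem; if it is not, the driver's configuration would have to be shipped to the tasks in serializable form, as the warning in the thread notes.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

object BinarySave {
  // Hypothetical helper (not part of Spark): writes the byte arrays of each
  // partition back to back, unmodified, into one file per partition under `dir`.
  def saveAsBinaryFile(rdd: RDD[Array[Byte]], dir: String): Unit = {
    rdd.foreachPartition { records =>
      // Assumption: the executor's default Configuration can resolve `dir`.
      val file = new Path(dir, f"part-${TaskContext.getPartitionId()}%05d")
      val out = file.getFileSystem(new Configuration()).create(file, true)
      try {
        // Raw bytes only: no escaping, no newline, no record delimiter.
        records.foreach(bytes => out.write(bytes))
      } finally {
        out.close()
      }
    }
  }
}
```

Usage would look like BinarySave.saveAsBinaryFile(rdd, parameters("path")). Two caveats: records are concatenated without any delimiter, so variable-length messages such as protobuf would need a length prefix or similar framing to be read back individually; and the sketch bypasses Spark's output commit protocol, so a retried or speculative task may overwrite or leave behind a partial file.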