Hi Bing,

You can try the text data source. It shouldn't modify strings:

scala> Seq(""""201900002_1",1,24,0,2,”S66.000x001”""").toDS.write.text("tmp/text.txt")

$ cat tmp/text.txt/part-00000-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt
"201900002_1",1,24,0,2,”S66.000x001”
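Bing's data is an `RDD[Array[Byte]]` rather than a `Seq[String]`, so using the text data source requires a conversion step first. Below is a minimal sketch that assumes each byte array is one UTF-8 encoded text record, as the sample line suggests. `SaveRecordsAsText` and `saveAsPlainText` are hypothetical names, not part of Spark.

```scala
import java.nio.charset.StandardCharsets

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not a Spark API): routes an RDD[Array[Byte]] through
// the text data source, assuming every record is UTF-8 encoded text.
object SaveRecordsAsText {
  def saveAsPlainText(spark: SparkSession, records: RDD[Array[Byte]], path: String): Unit = {
    import spark.implicits._ // provides .toDS() and the String encoder

    records
      .map(bytes => new String(bytes, StandardCharsets.UTF_8)) // decode each record
      .toDS()                                                  // Dataset[String] with a single "value" column
      .write
      .text(path) // one record per line, written without quoting or escaping
  }
}
```

The text source writes a line separator after each record and does not escape quotes, which is why the output in the example above matches the input. Records that contain newline characters, or that are genuinely binary (such as raw protobuf messages), would not round-trip through it.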
Maxim Gekk
Software Engineer
Databricks B. V. <http://databricks.com/>

On Thu, Jan 16, 2020 at 10:02 PM Long, Andrew <loand...@amazon.com.invalid> wrote:
> Hey Bing,
>
> There are a couple of different approaches you could take. The quickest and
> easiest would be to use the existing APIs:
>
> val bytes = spark.range(1000)
>
> bytes.foreachPartition(bytes => {
>   // WARNING: anything used in here will need to be serializable.
>   // There's some magic to serializing the Hadoop conf; see the Hadoop
>   // wrapper class in the source.
>   val writer = FileSystem.get(null).create(new Path("s3://..."))
>   bytes.foreach(b => writer.write(b))
>   writer.close()
> })
>
> The more complicated but prettier approach would be to implement a
> custom data source.
>
> From: "Duan,Bing" <duanb...@baidu.com>
> Date: Thursday, January 16, 2020 at 12:35 AM
> To: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: How to implement a "saveAsBinaryFile" function?
>
> Hi all:
>
> I read binary data (protobuf format) from the filesystem with the binaryFiles
> function into an RDD[Array[Byte]], and it works fine. But when I save it to
> the filesystem with saveAsTextFile, the quotation marks are escaped like this:
>
> "\"201900002_1\"",1,24,0,2,"\"S66.000x001\””, which should
> be "201900002_1",1,24,0,2,”S66.000x001”.
>
> Could anyone give me some tips on implementing a function
> like saveAsBinaryFile to persist the RDD[Array[Byte]]?
>
> Bests!
>
> Bing
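Andrew's first suggestion (writing each partition directly through the Hadoop FileSystem API) can be fleshed out into a small helper for the `RDD[Array[Byte]]` in Bing's question. This is a sketch under stated assumptions, not a Spark API: `BinaryFileWriter`, `saveAsBinaryFile`, and the `part-NNNNN.bin` file naming are hypothetical. To get around the serialization issue the quoted comment mentions, it copies the driver's Hadoop configuration into a plain, serializable `Map`, rebuilds a `Configuration` on each executor, and writes every partition's bytes unmodified to its own file.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

// Hypothetical helper (not a Spark API): writes each partition's records,
// byte for byte and back to back, to <dir>/part-NNNNN.bin.
object BinaryFileWriter {
  def saveAsBinaryFile(records: RDD[Array[Byte]], dir: String): Unit = {
    // Hadoop's Configuration is not java.io.Serializable, so copy its
    // entries on the driver into an immutable Scala Map that can be shipped.
    val hadoopConf = records.sparkContext.hadoopConfiguration
    val confEntries: Map[String, String] = {
      val builder = Map.newBuilder[String, String]
      val it = hadoopConf.iterator()
      while (it.hasNext) {
        val entry = it.next()
        builder += entry.getKey -> entry.getValue
      }
      builder.result()
    }

    records.foreachPartition { iter =>
      // Rebuild the configuration on the executor from the copied entries.
      val conf = new Configuration(false)
      confEntries.foreach { case (k, v) => conf.set(k, v) }

      // One output file per partition, named after the partition index.
      val file = new Path(dir, f"part-${TaskContext.getPartitionId()}%05d.bin")
      val fs = file.getFileSystem(conf)
      // overwrite = true lets a retried task replace a partial file left by a failed attempt.
      val out = fs.create(file, true)
      try iter.foreach(bytes => out.write(bytes)) // raw bytes, no escaping or separators
      finally out.close()
    }
  }
}
```

Because records are written back to back, their boundaries are not preserved in the output. If a reader needs to split them again, a length prefix (for example `out.writeInt(bytes.length)` before each `out.write(bytes)`) or a container format such as a Hadoop SequenceFile would be needed. There is also no output commit protocol here, so speculative execution should stay disabled (`spark.speculation=false`, which is the default).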