Re: How to implement a "saveAsBinaryFile" function?

Maxim Gekk Thu, 16 Jan 2020 12:29:31 -0800

Hi Bing,

You can try the Text datasource. It shouldn't modify strings:
scala> Seq(""""201900002_1",1,24,0,2,”S66.000x001”""").toDS.write.text("tmp/text.txt")
$ cat tmp/text.txt/part-00000-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt
"201900002_1",1,24,0,2,”S66.000x001”


Maxim Gekk

Software Engineer

Databricks B. V.  <http://databricks.com/>


On Thu, Jan 16, 2020 at 10:02 PM Long, Andrew <loand...@amazon.com.invalid>
wrote:

> Hey Bing,
>
>
>
> There are a couple of different approaches you could take.  The quickest and
> easiest would be to use the existing APIs:
>
>
>
> val bytes = spark.range(1000)
>
> bytes.foreachPartition(bytes => {
>   // WARNING: anything used in here will need to be serializable.
>   // There's some magic to serializing the hadoop conf. See the hadoop
>   // wrapper class in the source.
>   // null is a placeholder: pass a Hadoop Configuration here.
>   val writer = FileSystem.get(null).create(new Path("s3://..."))
>   // range() yields Longs, so write each one as an 8-byte long.
>   bytes.foreach(b => writer.writeLong(b))
>   writer.close()
> })
>
>
>
> The more complicated but prettier approach would be to implement a
> custom datasource.
>
>
>
> *From: *"Duan,Bing" <duanb...@baidu.com>
> *Date: *Thursday, January 16, 2020 at 12:35 AM
> *To: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *How to implement a "saveAsBinaryFile" function?
>
>
>
> Hi all:
>
>
>
> I read binary data (protobuf format) from the filesystem with the binaryFiles
> function into an RDD[Array[Byte]], and that works fine. But when I save it to
> the filesystem with saveAsTextFile, the quotation marks are escaped like this:
>
> "\"201900002_1\"",1,24,0,2,"\"S66.000x001\””,    which  should
> be "201900002_1",1,24,0,2,”S66.000x001”.
>
>
>
> Could anyone give me some tips on implementing a function
> like saveAsBinaryFile to persist the RDD[Array[Byte]]?
>
>
>
> Best,
>
>
>
> Bing
>
