Hi Fokko, Maxim, Long:

Thanks!

This happens inside a custom datasource, as shown below:

override def createRelation(…) {
…
blocks.map(block => (block.bytes)).saveAsTextFile(parameters("path"))
...
}

I am new to Spark and will try the methods you suggested.

Best!

Bing.

On Jan 17, 2020, at 4:28 AM, Maxim Gekk 
<maxim.g...@databricks.com<mailto:maxim.g...@databricks.com>> wrote:

Hi Bing,

You can try the Text datasource. It shouldn't modify strings:
scala> Seq(""""201900002_1",1,24,0,2,"S66.000x001"""").toDS.write.text("tmp/text.txt")
$ cat tmp/text.txt/part-00000-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt
"201900002_1",1,24,0,2,"S66.000x001"

Maxim Gekk
Software Engineer
Databricks B. V. 


On Thu, Jan 16, 2020 at 10:02 PM Long, Andrew 
<loand...@amazon.com.invalid<mailto:loand...@amazon.com.invalid>> wrote:
Hey Bing,

There are a couple of different approaches you could take. The quickest and
easiest would be to use the existing APIs:

import org.apache.hadoop.fs.{FileSystem, Path}

val bytes = spark.range(1000)

bytes.foreachPartition { (rows: Iterator[java.lang.Long]) =>
  // WARNING: anything used in here will need to be serializable.
  // There's some magic to serializing the Hadoop conf; see the Hadoop
  // wrapper class in the source.
  // `null` is a placeholder here: pass a real Hadoop Configuration instead.
  val writer = FileSystem.get(null).create(new Path("s3://..."))
  rows.foreach(b => writer.writeLong(b))
  writer.close()
}
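
For an RDD[Array[Byte]] like the one in the original question, the
foreachPartition idea could look roughly like the sketch below. It is only a
sketch under stated assumptions: SerializableConf and
BinarySave.saveAsBinaryFile are hypothetical names introduced here, not Spark
APIs, and the records of each partition are written back to back with no
delimiter, so a reader would need its own framing (for example a length
prefix) to split them again.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.TaskContext
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: Hadoop's Configuration is Writable but not
// java.io.Serializable, so it cannot be captured in a Spark closure directly.
class SerializableConf(@transient var value: Configuration) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out) // Configuration knows how to write itself to a DataOutput
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    value = new Configuration(false)
    value.readFields(in)
  }
}

object BinarySave {
  // Hypothetical helper, not a Spark API: writes the raw bytes of every
  // record in a partition, unmodified, to <dir>/part-NNNNN.
  def saveAsBinaryFile(rdd: RDD[Array[Byte]], dir: String): Unit = {
    val conf = new SerializableConf(rdd.sparkContext.hadoopConfiguration)
    rdd.foreachPartition { records =>
      val file = new Path(dir, f"part-${TaskContext.getPartitionId()}%05d")
      // overwrite = true so that a retried task replaces its earlier output
      val out = file.getFileSystem(conf.value).create(file, true)
      try records.foreach(bytes => out.write(bytes))
      finally out.close()
    }
  }
}
```

It would be called as BinarySave.saveAsBinaryFile(rdd, "s3://..."). Because
the bytes never pass through a text format, nothing is quoted or escaped on
the way out.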

The more complicated but cleaner approach would be to implement a custom
datasource.

From: "Duan,Bing" <duanb...@baidu.com<mailto:duanb...@baidu.com>>
Date: Thursday, January 16, 2020 at 12:35 AM
To: "dev@spark.apache.org<mailto:dev@spark.apache.org>" 
<dev@spark.apache.org<mailto:dev@spark.apache.org>>
Subject: How to implement a "saveAsBinaryFile" function?

Hi all:

I read binary data (protobuf format) from the filesystem with the binaryFiles
function into an RDD[Array[Byte]], and that works fine. But when I save it back
to the filesystem with saveAsTextFile, the quotation marks are escaped like this:
"\"201900002_1\"",1,24,0,2,"\"S66.000x001\"",
when it should be:
"201900002_1",1,24,0,2,"S66.000x001".

Could anyone give me some tips on implementing a function like saveAsBinaryFile
to persist the RDD[Array[Byte]]?

Best!

Bing
