Re: How to implement a "saveAsBinaryFile" function?

2020-01-18 Thread jelmer
I think you could also try saveAsHadoopFile with a custom output format like https://github.com/amutu/tdw/blob/master/qe/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/protobuf/mapred/ProtobufOutputFormat.java On Thu, 16 Jan 2020 at 09:34, Duan,Bing wrote: > Hi all: > > I read

Re: How to implement a "saveAsBinaryFile" function?

2020-01-17 Thread Duan,Bing
ch would be to either implement a custom datasource. From: "Duan,Bing" <duanb...@baidu.com> Date: Thursday, January 16, 2020 at 12:35 AM To: "dev@spark.apache.org" <dev@spark.apache.org> Subject: How to impl

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Maxim Gekk
the source > val writer = FileSystem.get(null).create(new Path("s3://...")) > bytes.foreach(b => writer.write(b)) > writer.close() > }) > > > > The more complicated but pretty approach would be to either implement a > custom datasource. > > > > *Fro

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Long, Andrew
"Duan,Bing" Date: Thursday, January 16, 2020 at 12:35 AM To: "dev@spark.apache.org" Subject: How to implement a "saveAsBinaryFile" function? Hi all: I read binary data(protobuf format) from filesystem by binaryFiles function to a RDD[Array[Byte]] it wo

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Driesprong, Fokko
Hi Bing, Good question, and the answer is: it depends on your use case. If you really just want to write raw bytes, then you could create a .foreach where you open an OutputStream and write it to some file. But this is probably not what you want, and in practice not very handy since you
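
The "open an OutputStream and write the bytes" idea from this reply can be sketched roughly as below. This is only a minimal illustration using JDK streams, not code from the thread: the names `RawBytesWriter` and `writeAll` are hypothetical, and the records are written back to back with no delimiter. In a Spark job one would presumably call such a helper from something like `rdd.foreachPartition`, passing a stream opened on the target filesystem; that wiring is an assumption and is not shown.

```scala
import java.io.OutputStream

// Hypothetical helper (not part of Spark or Hadoop): copies every record's
// bytes to the given stream, unmodified and without any separator.
object RawBytesWriter {
  /** Writes all records to `out`, closes it, and returns the record count. */
  def writeAll(records: Iterator[Array[Byte]], out: OutputStream): Long = {
    var count = 0L
    try {
      records.foreach { bytes =>
        out.write(bytes) // raw bytes, no escaping or text encoding applied
        count += 1
      }
    } finally {
      out.close() // always release the stream, even if a write fails
    }
    count
  }
}
```

Because nothing marks where one record ends and the next begins, a reader of the resulting file needs some other way to split it back into records.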

How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Duan,Bing
Hi all: I read binary data (protobuf format) from the filesystem with the binaryFiles function into an RDD[Array[Byte]], and it works fine. But when I save it to the filesystem with saveAsTextFile, the quotation marks are escaped like this: "\"20192_1\"",1,24,0,2,"\"S66.000x001\"", which should be