Re: How to implement a "saveAsBinaryFile" function?

2020-01-18 Thread jelmer
I think you could also try saveAsHadoopFile with a custom output format like https://github.com/amutu/tdw/blob/master/qe/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/protobuf/mapred/ProtobufOutputFormat.java On Thu, 16 Jan 2020 at 09:34, Duan,Bing wrote: > Hi all: > > I read
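A minimal sketch of the saveAsHadoopFile suggestion above, assuming the records are already an RDD[Array[Byte]] and that writing them back to back with no delimiter is acceptable. The class name RawBytesOutputFormat is hypothetical and is not taken from the linked ProtobufOutputFormat; it only uses the old `org.apache.hadoop.mapred` API that saveAsHadoopFile expects.

```scala
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, RecordWriter, Reporter}
import org.apache.hadoop.util.Progressable

// Hypothetical output format: writes each value's bytes verbatim, ignoring the key.
class RawBytesOutputFormat extends FileOutputFormat[NullWritable, BytesWritable] {
  override def getRecordWriter(
      ignored: FileSystem,
      job: JobConf,
      name: String,
      progress: Progressable): RecordWriter[NullWritable, BytesWritable] = {
    // Resolve the per-task output file the same way the built-in formats do.
    val file = FileOutputFormat.getTaskOutputPath(job, name)
    val out = file.getFileSystem(job).create(file, progress)
    new RecordWriter[NullWritable, BytesWritable] {
      // BytesWritable's backing array may be padded, so only write getLength bytes.
      override def write(key: NullWritable, value: BytesWritable): Unit =
        out.write(value.getBytes, 0, value.getLength)
      override def close(reporter: Reporter): Unit = out.close()
    }
  }
}
```

With an RDD[Array[Byte]] called rdd (an assumed name), this could be used as `rdd.map(b => (NullWritable.get(), new BytesWritable(b))).saveAsHadoopFile[RawBytesOutputFormat](path)`. Each partition then produces one part file containing the concatenated bytes.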

Re: How to implement a "saveAsBinaryFile" function?

2020-01-17 Thread Duan,Bing
Hi Fokko, Maxim, Long: Thanks! This happens in a custom datasource, as shown below: override def createRelation(…) { … blocks.map(block => (block.bytes)).saveAsTextFile(parameters("path")) ... } I am new to Spark and will try the methods you suggested. Best! Bing. On Jan

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Maxim Gekk
Hi Bing, You can try the Text datasource. It shouldn't modify strings: scala> Seq(""""20192_1",1,24,0,2,"S66.000x001"""").toDS.write.text("tmp/text.txt") $ cat tmp/text.txt/part-0-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt "20192_1",1,24,0,2,"S66.000x001" Maxim Gekk Software Engineer

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Long, Andrew
Hey Bing, There are a couple of different approaches you could take. The quickest and easiest would be to use the existing APIs: val bytes = spark.range(1000) bytes.foreachPartition(bytes => { // WARNING: anything used in here will need to be serializable. // There's some magic to serializing the
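A minimal sketch of the per-partition write described above, assuming each record is already an Array[Byte]. The object and method names (RawBytesWriter, writePartition) are hypothetical, and the helper uses only the JDK, so it captures nothing that has to be serialized beyond the target path.

```scala
import java.io.{BufferedOutputStream, OutputStream}
import java.nio.file.{Files, Path}

object RawBytesWriter {
  /** Writes every byte array in `records` to `target`, back to back,
    * and returns the total number of bytes written. */
  def writePartition(records: Iterator[Array[Byte]], target: Path): Long = {
    val out: OutputStream = new BufferedOutputStream(Files.newOutputStream(target))
    try {
      var written = 0L
      records.foreach { bytes =>
        out.write(bytes)
        written += bytes.length
      }
      written
    } finally out.close() // always release the file handle, even on failure
  }
}
```

Inside Spark this might be called from `rdd.foreachPartition`, passing the partition's iterator and a distinct path per partition (for example one derived from `TaskContext.getPartitionId()`). A local path like this only makes sense in local mode or on a filesystem shared by all executors; for HDFS or S3 the stream would need to come from the Hadoop FileSystem API instead.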

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Driesprong, Fokko
Hi Bing, Good question, and the answer is: it depends on what your use case is. If you really just want to write raw bytes, then you could create a .foreach where you open an OutputStream and write it to some file. But this is probably not what you want, and in practice not very handy since you