I think you could also try saveAsHadoopFile with a custom output format, like:
https://github.com/amutu/tdw/blob/master/qe/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/protobuf/mapred/ProtobufOutputFormat.java
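As a hedged sketch (not from the thread), wiring a custom output format into saveAsHadoopFile might look like the following; BytesOutputFormat here is a hypothetical class in the spirit of the linked ProtobufOutputFormat, and the output path is assumed:

```scala
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.rdd.RDD

// Sketch only: BytesOutputFormat is a hypothetical custom
// org.apache.hadoop.mapred.OutputFormat (analogous to the linked
// ProtobufOutputFormat) that writes the value bytes verbatim.
val blocks: RDD[Array[Byte]] = ???  // your byte blocks

blocks
  .map(bytes => (NullWritable.get(), new BytesWritable(bytes)))
  .saveAsHadoopFile(
    "/tmp/out",                    // assumed output path
    classOf[NullWritable],
    classOf[BytesWritable],
    classOf[BytesOutputFormat])    // hypothetical custom format
```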
On Thu, 16 Jan 2020 at 09:34, Duan,Bing wrote:
Hi Fokko, Maxim, Long:
Thanks!
This reading occurs in a custom datasource, as below:
override def createRelation(…) {
…
blocks.map(block => block.bytes).saveAsTextFile(parameters("path"))
...
}
I am new to Spark and will try the methods you all provided.
Best!
Bing.
Hi Bing,
You can try the Text datasource. It shouldn't modify strings:
scala> Seq(""""20192_1",1,24,0,2,"S66.000x001"""").toDS.write.text("tmp/text.txt")
$ cat tmp/text.txt/part-0-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt
"20192_1",1,24,0,2,"S66.000x001"
Maxim Gekk
Software Engineer
Hey Bing,
There are a couple of different approaches you could take. The quickest and easiest
would be to use the existing APIs:
val bytes = spark.range(1000)
bytes.foreachPartition(bytes => {
// WARNING: anything used in here will need to be serializable.
// There's some magic to serializing the
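To flesh out that foreachPartition idea, here is a hedged sketch of my own (the output path is an assumption, and writing one file per partition via the Hadoop FileSystem API is one possible design, not necessarily the author's):

```scala
import java.io.BufferedOutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.rdd.RDD

// Sketch only: write each partition's raw bytes to its own file.
// The FileSystem handle is opened inside foreachPartition so it is
// created on the executor; nothing non-serializable is captured.
val blocks: RDD[Array[Byte]] = ???  // your byte blocks

blocks.foreachPartition { iter =>
  val fs   = FileSystem.get(new Configuration())
  val path = new Path(s"/tmp/raw-${java.util.UUID.randomUUID()}")  // assumed path
  val out  = new BufferedOutputStream(fs.create(path))
  try iter.foreach(bytes => out.write(bytes))
  finally out.close()
}
```

One caveat with this design: you get one opaque file per partition and lose the usual `_SUCCESS` marker and atomic-commit behavior that Spark's built-in writers provide.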
Hi Bing,
Good question, and the answer is: it depends on what your use-case is.
If you really just want to write raw bytes, then you could create a
.foreach where you open an OutputStream and write it to some file. But this
is probably not what you want, and in practice not very handy since you