Re: OutOfDirectMemoryError for Spark 2.2

2018-03-12 Thread Dave Cameron
t;>> >>>> 917: 8 448 io.netty.buffer.UnpooledHeapByteBuf >>>> >>>> 1018: 20 320 io.netty.buffer.PoolThreadCache$1 >>>> >>>> 1305: 4 128 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry >>>> >>>> 1404

[Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-02-09 Thread Dave Cameron
Hi, I have a Spark Structured Streaming job that reads from Kafka and writes Parquet files to Hive/HDFS. The files are not very large, but the Kafka source is noisy, so each Spark job takes a long time to complete. There is a significant window during which the Parquet files are incomplete and
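
The idea in the subject line (write to a temporary location, then move to the destination path only once the file is complete) can be sketched roughly as below. This is a minimal illustration on a local filesystem, not the code from the thread and not Spark's commit protocol API; the function name `write_then_commit` and the `.tmp-` prefix are hypothetical. It assumes the temporary file lives in the same directory as the destination, so the final rename stays on one filesystem.

```python
import os
import tempfile


def write_then_commit(dest_path, data):
    """Write bytes to a temp file next to dest_path, then rename it into place.

    Hypothetical helper for illustration: readers of dest_path see either no
    file or the complete file, never a partially written one.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the destination directory so the rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before committing
        # Commit step: os.replace renames the file, overwriting any
        # existing file at dest_path.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On failure, remove the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

On HDFS the analogous step would be a rename within the same filesystem; whether that fits a given Hive table layout is something to verify rather than assume.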