Re: [Structured Streaming] Checkpoint file compact file grows big

2020-04-19 Thread Jungtaek Lim
Deleting the latest .compact file would break exactly-once semantics and cause Spark to fail to read from the output directory. If you're reading the output directory from a non-Spark application, the metadata in the output directory doesn't matter, but you don't get exactly-once (exactly-once is achieved leveraging
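For context, the file sink's metadata log (`_spark_metadata/<batch>.compact`) is what lets Spark readers see only committed files. A minimal parsing sketch is below; the layout (a version header line such as "v1", then one JSON-encoded entry per tracked file) and the field names (`path`, `size`, `action`) follow Spark's FileStreamSinkLog/SinkFileStatus in recent releases, but treat the exact schema as an assumption and verify against your Spark version:

```python
import json

def parse_compact_log(text):
    """Parse a (assumed) Spark file-sink compact metadata log.

    Assumed layout: first line is a version marker (e.g. "v1");
    each following non-empty line is one JSON entry describing an
    output file. Entries with action "add" are visible to Spark
    readers; "delete" tombstones mark files to be ignored.
    """
    lines = text.strip().splitlines()
    version = lines[0]
    entries = [json.loads(line) for line in lines[1:] if line.strip()]
    visible = [e["path"] for e in entries if e.get("action") == "add"]
    return version, entries, visible

# Hypothetical sample data; bucket/paths are placeholders.
sample = "\n".join([
    "v1",
    json.dumps({"path": "s3://bucket/out/part-0000.parquet",
                "size": 1024, "action": "add"}),
    json.dumps({"path": "s3://bucket/out/part-0001.parquet",
                "size": 2048, "action": "delete"}),
])
version, entries, visible = parse_compact_log(sample)
```

This illustrates why deleting the latest .compact file is destructive: Spark reconstructs the set of valid output files from it, so without it the reader cannot tell committed files from leftovers of failed batches.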

Re: [Structured Streaming] Checkpoint file compact file grows big

2020-04-15 Thread Kelvin Qin
See: http://spark.apache.org/docs/2.3.1/streaming-programming-guide.html#checkpointing "Note that checkpointing of RDDs incurs the cost of saving to reliable storage. This may cause an increase in the processing time of those batches where RDDs get checkpointed." As far as I know, the