Re: Sorting on a streaming dataframe

2018-04-26 Thread Michael Armbrust
The basic tenet of structured streaming is that a query should return the same answer in streaming or batch mode. We support sorting in complete mode because we have all the data and can sort it correctly and return the full answer. In update or append mode, sorting would only return a correct
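
The distinction between output modes can be illustrated without Spark. What follows is a minimal pure-Python sketch, not Structured Streaming code: the names `micro_batches`, `append_mode_sorted`, and `complete_mode_sorted` and the sample data are hypothetical. It models only one idea, namely that sorting each new batch in isolation can differ from the batch answer, while re-sorting everything seen so far matches it.

```python
from typing import Iterable, List

# Hypothetical micro-batches arriving over time (illustrative data only).
micro_batches: List[List[int]] = [[5, 1, 9], [3, 7], [2, 8]]


def append_mode_sorted(batches: Iterable[List[int]]) -> List[int]:
    """Sort each batch on its own and emit only the new rows,
    roughly as an append-style sink would receive them."""
    emitted: List[int] = []
    for batch in batches:
        emitted.extend(sorted(batch))
    return emitted


def complete_mode_sorted(batches: Iterable[List[int]]) -> List[List[int]]:
    """After each batch, re-sort everything seen so far and
    emit the full result, as a complete-style sink would."""
    seen: List[int] = []
    outputs: List[List[int]] = []
    for batch in batches:
        seen.extend(batch)
        outputs.append(sorted(seen))
    return outputs


# The answer a batch job over the same data would give.
batch_answer = sorted(x for b in micro_batches for x in b)

append_answer = append_mode_sorted(micro_batches)
complete_answer = complete_mode_sorted(micro_batches)[-1]

print("batch:   ", batch_answer)
print("append:  ", append_answer)
print("complete:", complete_answer)
```

In this toy model the concatenated per-batch output is not globally ordered, whereas the last full result of the complete-style function equals the batch answer.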

Re: saveAsNewAPIHadoopDataset must not enable speculation for parquet file?

2018-04-26 Thread Steve Loughran
Sorry, I had not noticed this follow-up; I have been busy with other issues. On 3 Apr 2018, at 11:19, cane wrote: Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause data loss. I checked the comment of this API: We should