Re: Appending a static dataframe to a stream-created Parquet file fails

2021-09-06 Thread Jungtaek Lim
I'd recommend getting in touch with the Delta Lake community (Google Groups), https://groups.google.com/forum/#!forum/delta-users, to get more feedback from experts on Delta Lake specific issues. On Mon, Sep 6, 2021 at 1:56 AM wrote: > Hi Jungtaek, > thanks for your reply. I was afraid that the

Re: Appending a static dataframe to a stream-created Parquet file fails

2021-09-05 Thread eugen . wintersberger
Hi Jungtaek, thanks for your reply. I was afraid that the problem is not only on my side but rather conceptual in nature. I guess I have to rethink my approach. However, since you mentioned Delta Lake: I have the same problem, but the other way around, with Delta Lake. I cannot write with a

Re: Appending a static dataframe to a stream-created Parquet file fails

2021-09-02 Thread Jungtaek Lim
Hi, the file stream sink maintains metadata in the output directory. The metadata retains the list of files written by the streaming query, and Spark reads this metadata when listing the files to read. This is how Spark guarantees end-to-end exactly-once semantics when writing files from a streaming query. There

Appending a static dataframe to a stream-created Parquet file fails

2021-09-02 Thread eugen . wintersberger
Hi all, I recently stumbled upon a rather strange problem with streaming sources in one of my tests. I am writing a Parquet file from a streaming source and subsequently try to append the same data, this time from a static dataframe. Surprisingly, the number of rows in the Parquet file