Re: dropDuplicates and watermark in structured streaming

2020-02-27 Thread Tathagata Das
1. Yes. All times are in event time, not processing time. So you may get 10AM event time data at 11AM processing time, but it will still be compared against all data within 9-10AM event times. 2. Show us your code. On Thu, Feb 27, 2020 at 2:30 AM lec ssmi wrote: > Hi: > I'm new to structured
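
The point about event time versus processing time can be illustrated with a toy model. The sketch below is plain Python, not Spark code and not Spark's implementation: the class `EventTimeDeduplicator`, its methods, and the one-hour delay are all hypothetical names and values chosen for illustration. It also advances the watermark after every record, which is a simplification (Spark advances it between micro-batches). What it shows is that only the event time carried by each record decides whether the record is late or a duplicate; the wall-clock time at which `process` is called is never consulted.

```python
from datetime import datetime, timedelta


class EventTimeDeduplicator:
    """Toy model of watermark-bounded deduplication (hypothetical, not Spark's code)."""

    def __init__(self, delay):
        self.delay = delay          # allowed lateness, measured in event time
        self.max_event_time = None  # largest event time seen so far
        self.seen = {}              # key -> event time of its first occurrence

    def watermark(self):
        # The watermark trails the largest event time seen by the delay.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def process(self, key, event_time):
        """Return True if the record is emitted, False if it is dropped."""
        wm = self.watermark()
        if wm is not None and event_time < wm:
            return False  # older than the watermark: dropped as too late
        is_new = key not in self.seen
        if is_new:
            self.seen[key] = event_time
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        # Evict state that has fallen behind the (possibly advanced) watermark.
        wm = self.watermark()
        self.seen = {k: t for k, t in self.seen.items() if t >= wm}
        return is_new


def at(hour, minute=0):
    # Helper that builds an event timestamp on the day of the thread.
    return datetime(2020, 2, 27, hour, minute)


dedup = EventTimeDeduplicator(delay=timedelta(hours=1))
results = [
    dedup.process("a", at(9, 30)),  # first "a": emitted
    dedup.process("b", at(10, 0)),  # first "b": emitted, watermark moves to 9:00
    dedup.process("a", at(9, 45)),  # within the watermark, but a duplicate of "a"
    dedup.process("c", at(8, 0)),   # event time is behind the watermark: dropped
]
print(results)
```

The third record is compared against state from the 9-10AM event-time range no matter when it arrives, which mirrors the behaviour described in the reply.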

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Matei Zaharia
+1 on this new rubric. It definitely captures the issues I’ve seen in Spark and in other projects. If we write down this rubric (or something like it), it will also be easier to refer to it during code reviews or in proposals of new APIs (we could ask “do you expect to have to change this API

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Michael Armbrust
Thanks for the discussion! A few responses:
> The decision needs to happen at api/config change time, otherwise the deprecated warning has no purpose if we are never going to remove them.
Even if we never remove an API, I think deprecation warnings (when done right) can still serve a purpose.

Re: Clarification on the commit protocol

2020-02-27 Thread Michael Armbrust
No, it is not. Although the commit protocol has mostly been superseded by Delta Lake, which is available as a separate open source project that works natively with Apache Spark. In contrast to the commit protocol, Delta can guarantee full ACID (rather than just partition level

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Tom Graves
In general +1. I think these are good guidelines, and making it easier to upgrade is beneficial to everyone. The decision needs to happen at api/config change time, otherwise the deprecated warning has no purpose if we are never going to remove them. That said we still need to be able to remove

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Sean Owen
Those are all quite reasonable guidelines and I'd put them into the contributing or developer guide, sure. Although not argued here, I think we should go further than codifying and enforcing common-sense guidelines like these. I think bias should shift in favor of retaining APIs going forward, and