Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Hyukjin Kwon
Yeah, I have been thinking about this too, and Holden did some work here that this SPIP will reuse. I support this. On Wed, 14 Jun 2023 at 08:10, Amanda Liu wrote: > Hi all, > > I'd like to start a discussion about implementing an official PySpark test > framework. Currently, there's no

[no subject]

2023-06-13 Thread Amanda Liu

[DISCUSS] SPIP: Add PySpark Test Framework

2023-06-13 Thread Amanda Liu
Hi all, I'd like to start a discussion about implementing an official PySpark test framework. Currently, there's no official test framework, but only various open-source repos and blog posts. Many of these open-source resources are very popular, which demonstrates user demand for PySpark testing

Re: Apache Spark 4.0 Timeframe?

2023-06-13 Thread Dongjoon Hyun
It's great to hear from you that you are interested in the discussion of Apache Spark 4.0 scope and timeframe. :) This is the initial thread, and upgrading the default Scala version to 2.13 or 3.3 is one of the reasons. As we know, Apache Spark 4.0 is not limited to this. For

Re: JDK version support policy?

2023-06-13 Thread David Li
Thanks all for the discussion here. Based on this I think we'll stick with Java 8 for now and then upgrade to Java 11 around or after Spark 4. On Thu, Jun 8, 2023, at 07:17, Sean Owen wrote: > Noted, but for that you'd simply run your app on Java 17. If Spark works, and > your app's

Re: Data Contracts

2023-06-13 Thread Mich Talebzadeh
From my limited understanding of data contracts, there are two factors that deem necessary. 1. procedure matter 2. technical matter I mean this is nothing new. Some tools like Cloud data fusion can assist when the procedures are validated. Simply "The process of integrating multiple data

Re: Data Contracts

2023-06-13 Thread Phillip Henry
Hi, Fokko and Deepak. The problem with DBT and Great Expectations (and Soda too, I believe) is that by the time they find the problem, the error is already in production - and fixing production can be a nightmare. What's more, we've found that nobody ever looks at the data quality reports we

Re: Data Contracts

2023-06-13 Thread Fokko Driesprong
Hey Phillip, Thanks for raising this. I like the idea. The question is, should this be implemented in Spark or some other framework? I know that dbt has a fairly extensive way of testing your data, and making sure that you can enforce assumptions on