Re: Issue with Materialized Views in Spark SQL

2024-05-02 Thread Walaa Eldin Moustafa
I do not think the issue is with DROP MATERIALIZED VIEW only, but also with CREATE MATERIALIZED VIEW, because neither is supported in Spark. I guess you must have created the view from Hive and are trying to drop it from Spark, and that is why you are running into the issue with DROP first. There is

Re: Profiling data quality with Spark

2022-12-27 Thread Walaa Eldin Moustafa
Rajat, You might want to read about Data Sentinel, a data validation tool on Spark developed at LinkedIn. https://engineering.linkedin.com/blog/2020/data-sentinel-automating-data-validation The project is not open source, but the blog post might give you insights into how such a system

Re: [EXTERNAL] Parse Execution Plan from PySpark

2022-05-03 Thread Walaa Eldin Moustafa
Hi Pablo, Do you mean an in-memory plan? You can access one by implementing a Spark Listener. Here is an example from the Datahub project [1]. If you end up parsing the SQL plan string, you may consider using or extending Coral [2, 3]. There is already a POC for that. See some test cases [4].