Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
Sadly Apache Spark sounds like it has nothing to do within materialised
views. I was hoping it could read it!

>>> *spark.sql("SELECT * FROM test.mv ").show()*
Traceback (most recent call last):
  File "", line 1, in 
  File "/opt/spark/python/pyspark/sql/session.py", line 1440, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File
"/usr/src/Python-3.9.16/venv/venv3.9/lib/python3.9/site-packages/py4j/java_gateway.py",
line 1321, in __call__
return_value = get_return_value(
  File "/opt/spark/python/pyspark/errors/exceptions/captured.py", line 175,
in deco
raise converted from None
*Pyspark.errors.exceptions.captured.AnalysisException: Hive materialized
view is not supported.*


HTH

Mch Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Fri, 3 May 2024 at 11:03, Mich Talebzadeh 
wrote:

> Thanks for the comments I received.
>
> So in summary, Apache Spark itself doesn't directly manage materialized
> views,(MV)  but it can work with them through integration with the
> underlying data storage systems like Hive or through iceberg. I believe
> databricks through unity catalog support MVs as well.
>
> Moreover, there is a case for supporting MVs. However, Spark can utilize
> materialized views even though it doesn't directly manage them.. This came
> about because someone in the Spark user forum enquired about "Spark
> streaming issue to Elastic data*". One option I thought of was that uUsing
> materialized views with Spark Structured Streaming and Change Data Capture
> (CDC) is a potential solution for efficiently streaming view data updates
> in this scenario. .
>
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  Von
> Braun )".
>
>
> On Fri, 3 May 2024 at 00:54, Mich Talebzadeh 
> wrote:
>
>> An issue I encountered while working with Materialized Views in Spark
>> SQL. It appears that there is an inconsistency between the behavior of
>> Materialized Views in Spark SQL and Hive.
>>
>> When attempting to execute a statement like DROP MATERIALIZED VIEW IF
>> EXISTS test.mv in Spark SQL, I encountered a syntax error indicating
>> that the keyword MATERIALIZED is not recognized. However, the same
>> statement executes successfully in Hive without any errors.
>>
>> pyspark.errors.exceptions.captured.ParseException:
>> [PARSE_SYNTAX_ERROR] Syntax error at or near 'MATERIALIZED'.(line 1, pos
>> 5)
>>
>> == SQL ==
>> DROP MATERIALIZED VIEW IF EXISTS test.mv
>> -^^^
>>
>> Here are the versions I am using:
>>
>>
>>
>> *Hive: 3.1.1Spark: 3.4*
>> my Spark session:
>>
>> spark = SparkSession.builder \
>>   .appName("test") \
>>   .enableHiveSupport() \
>>   .getOrCreate()
>>
>> Has anyone seen this behaviour or encountered a similar issue or if there
>> are any insights into why this discrepancy exists between Spark SQL and
>> Hive.
>>
>> Thanks
>>
>> Mich Talebzadeh,
>>
>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>>
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> Disclaimer: The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner Von Braun)".
>>
>


Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
Thanks for the comments I received.

So in summary, Apache Spark itself doesn't directly manage materialized
views,(MV)  but it can work with them through integration with the
underlying data storage systems like Hive or through iceberg. I believe
databricks through unity catalog support MVs as well.

Moreover, there is a case for supporting MVs. However, Spark can utilize
materialized views even though it doesn't directly manage them.. This came
about because someone in the Spark user forum enquired about "Spark
streaming issue to Elastic data*". One option I thought of was that uUsing
materialized views with Spark Structured Streaming and Change Data Capture
(CDC) is a potential solution for efficiently streaming view data updates
in this scenario. .


Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Fri, 3 May 2024 at 00:54, Mich Talebzadeh 
wrote:

> An issue I encountered while working with Materialized Views in Spark SQL.
> It appears that there is an inconsistency between the behavior of
> Materialized Views in Spark SQL and Hive.
>
> When attempting to execute a statement like DROP MATERIALIZED VIEW IF
> EXISTS test.mv in Spark SQL, I encountered a syntax error indicating that
> the keyword MATERIALIZED is not recognized. However, the same statement
> executes successfully in Hive without any errors.
>
> pyspark.errors.exceptions.captured.ParseException:
> [PARSE_SYNTAX_ERROR] Syntax error at or near 'MATERIALIZED'.(line 1, pos 5)
>
> == SQL ==
> DROP MATERIALIZED VIEW IF EXISTS test.mv
> -^^^
>
> Here are the versions I am using:
>
>
>
> *Hive: 3.1.1Spark: 3.4*
> my Spark session:
>
> spark = SparkSession.builder \
>   .appName("test") \
>   .enableHiveSupport() \
>   .getOrCreate()
>
> Has anyone seen this behaviour or encountered a similar issue or if there
> are any insights into why this discrepancy exists between Spark SQL and
> Hive.
>
> Thanks
>
> Mich Talebzadeh,
>
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>
> London
> United Kingdom
>
>
>view my Linkedin profile
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> Disclaimer: The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>


Re: Issue with Materialized Views in Spark SQL

2024-05-02 Thread Jungtaek Lim
(removing dev@ as I don't think this is dev@ related thread but more about
"question")

My understanding is that Apache Spark does not support Materialized View.
That's all. IMHO it's not a proper expectation that all operations in
Apache Hive will be supported in Apache Spark. They are different projects
and Apache Spark does not aim to be 100% compatible with Apache Hive. There
was a time the community tried to provide some sort of compatibility, but
both projects are 10+ years old, and mature enough to have their own
roadmap to drive.

That said, that's not a bug or an issue. You can initiate a feature request
and wish the community to include that into the roadmap.

On Fri, May 3, 2024 at 12:01 PM Mich Talebzadeh 
wrote:

> An issue I encountered while working with Materialized Views in Spark SQL.
> It appears that there is an inconsistency between the behavior of
> Materialized Views in Spark SQL and Hive.
>
> When attempting to execute a statement like DROP MATERIALIZED VIEW IF
> EXISTS test.mv in Spark SQL, I encountered a syntax error indicating that
> the keyword MATERIALIZED is not recognized. However, the same statement
> executes successfully in Hive without any errors.
>
> pyspark.errors.exceptions.captured.ParseException:
> [PARSE_SYNTAX_ERROR] Syntax error at or near 'MATERIALIZED'.(line 1, pos 5)
>
> == SQL ==
> DROP MATERIALIZED VIEW IF EXISTS test.mv
> -^^^
>
> Here are the versions I am using:
>
>
>
> *Hive: 3.1.1Spark: 3.4*
> my Spark session:
>
> spark = SparkSession.builder \
>   .appName("test") \
>   .enableHiveSupport() \
>   .getOrCreate()
>
> Has anyone seen this behaviour or encountered a similar issue or if there
> are any insights into why this discrepancy exists between Spark SQL and
> Hive.
>
> Thanks
>
> Mich Talebzadeh,
>
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>
> London
> United Kingdom
>
>
>view my Linkedin profile
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> Disclaimer: The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>


Re: Issue with Materialized Views in Spark SQL

2024-05-02 Thread Walaa Eldin Moustafa
I do not think the issue is with DROP MATERIALIZED VIEW only, but also with
CREATE MATERIALIZED VIEW, because neither is supported in Spark. I guess
you must have created the view from Hive and are trying to drop it from
Spark and that is why you are running to the issue with DROP first.

There is some work in the Iceberg community to add the support to Spark
through SQL extensions, and Iceberg support for views and
materialization tables. Some recent discussions can be found here [1] along
with a WIP Iceberg-Spark PR.

[1] https://lists.apache.org/thread/rotmqzmwk5jrcsyxhzjhrvcjs5v3yjcc

Thanks,
Walaa.

On Thu, May 2, 2024 at 4:55 PM Mich Talebzadeh 
wrote:

> An issue I encountered while working with Materialized Views in Spark SQL.
> It appears that there is an inconsistency between the behavior of
> Materialized Views in Spark SQL and Hive.
>
> When attempting to execute a statement like DROP MATERIALIZED VIEW IF
> EXISTS test.mv in Spark SQL, I encountered a syntax error indicating that
> the keyword MATERIALIZED is not recognized. However, the same statement
> executes successfully in Hive without any errors.
>
> pyspark.errors.exceptions.captured.ParseException:
> [PARSE_SYNTAX_ERROR] Syntax error at or near 'MATERIALIZED'.(line 1, pos 5)
>
> == SQL ==
> DROP MATERIALIZED VIEW IF EXISTS test.mv
> -^^^
>
> Here are the versions I am using:
>
>
>
> *Hive: 3.1.1Spark: 3.4*
> my Spark session:
>
> spark = SparkSession.builder \
>   .appName("test") \
>   .enableHiveSupport() \
>   .getOrCreate()
>
> Has anyone seen this behaviour or encountered a similar issue or if there
> are any insights into why this discrepancy exists between Spark SQL and
> Hive.
>
> Thanks
>
> Mich Talebzadeh,
>
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>
> London
> United Kingdom
>
>
>view my Linkedin profile
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> Disclaimer: The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>


Issue with Materialized Views in Spark SQL

2024-05-02 Thread Mich Talebzadeh
An issue I encountered while working with Materialized Views in Spark SQL.
It appears that there is an inconsistency between the behavior of
Materialized Views in Spark SQL and Hive.

When attempting to execute a statement like DROP MATERIALIZED VIEW IF
EXISTS test.mv in Spark SQL, I encountered a syntax error indicating that
the keyword MATERIALIZED is not recognized. However, the same statement
executes successfully in Hive without any errors.

pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'MATERIALIZED'.(line 1, pos 5)

== SQL ==
DROP MATERIALIZED VIEW IF EXISTS test.mv
-^^^

Here are the versions I am using:



*Hive: 3.1.1Spark: 3.4*
my Spark session:

spark = SparkSession.builder \
  .appName("test") \
  .enableHiveSupport() \
  .getOrCreate()

Has anyone seen this behaviour or encountered a similar issue or if there
are any insights into why this discrepancy exists between Spark SQL and
Hive.

Thanks

Mich Talebzadeh,

Technologist | Architect | Data Engineer  | Generative AI | FinCrime

London
United Kingdom


   view my Linkedin profile


 https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge
but of course cannot be guaranteed . It is essential to note that, as with
any advice, quote "one test result is worth one-thousand expert opinions
(Werner Von Braun)".