Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Mich Talebzadeh
Thanks

In the context of stored procedures API for Catalogs, this approach
deviates from the traditional definition of stored procedures in RDBMS for
two key reasons:

   - Compilation vs. Interpretation: Traditional stored procedures are
   typically pre-compiled into machine code for faster execution. This
   approach, however, focuses on loading and interpreting the code on demand,
   similar to how scripts are run in some programming languages like Python.
   - Schema Changes and Invalidation: In RDBMS, changes to the underlying
   tables can invalidate compiled procedures as they might reference
   non-existent columns or have incompatible data types. This approach aims to
   avoid invalidation by potentially adapting to minor schema changes.

So, while it leverages the concept of pre-defined procedures stored within
the database and accessible through the Catalog API, it is evident that
this approach functions more like dynamic scripts than traditional compiled
stored procedures.

HTH

Mich Talebzadeh,Technologist | Architect | Data Engineer  | Generative AI |
FinCrime

London
United Kingdom

   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh




Disclaimer: The information provided is correct to the best of my knowledge
but of course cannot be guaranteed . It is essential to note that, as with
any advice, quote "one test result is worth one-thousand expert opinions
(Werner Von Braun)".


Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Sat, 11 May 2024 at 19:25, Anton Okolnychyi 
wrote:

> Mich, I don't think the invalidation will be necessary in our case as
> there is no plan to preprocess or compile the procedures into executable
> objects. They will be loaded and executed on demand via the Catalog API.
>
> пт, 10 трав. 2024 р. о 10:37 Mich Talebzadeh 
> пише:
>
>> Hi,
>>
>> If the underlying table changes (DDL), if I recall from RDBMSs like
>> Oracle, the stored procedure will be invalidated as it is a compiled
>> object. How is this going to be handled? Does it follow the same mechanism?
>>
>> Thanks
>>
>> Mich Talebzadeh,
>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner
>> Von Braun
>> )".
>>
>>
>> On Sat, 20 Apr 2024 at 02:34, Anton Okolnychyi 
>> wrote:
>>
>>> Hi folks,
>>>
>>> I'd like to start a discussion on SPARK-44167 that aims to enable
>>> catalogs to expose custom routines as stored procedures. I believe this
>>> functionality will enhance Spark’s ability to interact with external
>>> connectors and allow users to perform more operations in plain SQL.
>>>
>>> SPIP [1] contains proposed API changes and parser extensions. Any
>>> feedback is more than welcome!
>>>
>>> Unlike the initial proposal for stored procedures with Python [2], this
>>> one focuses on exposing pre-defined stored procedures via the catalog API.
>>> This approach is inspired by a similar functionality in Trino and avoids
>>> the challenges of supporting user-defined routines discussed earlier [3].
>>>
>>> Liang-Chi was kind enough to shepherd this effort. Thanks!
>>>
>>> - Anton
>>>
>>> [1] -
>>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>>> [2] -
>>> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
>>> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>>>
>>>
>>>
>>>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Anton Okolnychyi
Mich, I don't think the invalidation will be necessary in our case as there
is no plan to preprocess or compile the procedures into executable objects.
They will be loaded and executed on demand via the Catalog API.

пт, 10 трав. 2024 р. о 10:37 Mich Talebzadeh 
пише:

> Hi,
>
> If the underlying table changes (DDL), if I recall from RDBMSs like
> Oracle, the stored procedure will be invalidated as it is a compiled
> object. How is this going to be handled? Does it follow the same mechanism?
>
> Thanks
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner  Von
> Braun )".
>
>
> On Sat, 20 Apr 2024 at 02:34, Anton Okolnychyi 
> wrote:
>
>> Hi folks,
>>
>> I'd like to start a discussion on SPARK-44167 that aims to enable
>> catalogs to expose custom routines as stored procedures. I believe this
>> functionality will enhance Spark’s ability to interact with external
>> connectors and allow users to perform more operations in plain SQL.
>>
>> SPIP [1] contains proposed API changes and parser extensions. Any
>> feedback is more than welcome!
>>
>> Unlike the initial proposal for stored procedures with Python [2], this
>> one focuses on exposing pre-defined stored procedures via the catalog API.
>> This approach is inspired by a similar functionality in Trino and avoids
>> the challenges of supporting user-defined routines discussed earlier [3].
>>
>> Liang-Chi was kind enough to shepherd this effort. Thanks!
>>
>> - Anton
>>
>> [1] -
>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>> [2] -
>> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
>> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>>
>>
>>
>>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-10 Thread Mich Talebzadeh
Hi,

If the underlying table changes (DDL), if I recall from RDBMSs like Oracle,
the stored procedure will be invalidated as it is a compiled object. How is
this going to be handled? Does it follow the same mechanism?

Thanks

Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed . It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions (Werner  Von
Braun )".


On Sat, 20 Apr 2024 at 02:34, Anton Okolnychyi 
wrote:

> Hi folks,
>
> I'd like to start a discussion on SPARK-44167 that aims to enable catalogs
> to expose custom routines as stored procedures. I believe this
> functionality will enhance Spark’s ability to interact with external
> connectors and allow users to perform more operations in plain SQL.
>
> SPIP [1] contains proposed API changes and parser extensions. Any feedback
> is more than welcome!
>
> Unlike the initial proposal for stored procedures with Python [2], this
> one focuses on exposing pre-defined stored procedures via the catalog API.
> This approach is inspired by a similar functionality in Trino and avoids
> the challenges of supporting user-defined routines discussed earlier [3].
>
> Liang-Chi was kind enough to shepherd this effort. Thanks!
>
> - Anton
>
> [1] -
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> [2] -
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>
>
>
>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread huaxin gao
Thanks Anton for the updated proposal -- it looks great! I appreciate the
hard work put into refining it. I am looking forward to the upcoming vote
and moving forward with this initiative.

Thanks,
Huaxin

On Thu, May 9, 2024 at 7:30 PM L. C. Hsieh  wrote:

> Thanks Anton. Thank you, Wenchen, Dongjoon, Ryan, Serge, Allison and
> others if I miss those who are participating in the discussion.
>
> I suppose we have reached a consensus or close to being in the design.
>
> If you have some more comments, please let us know.
>
> If not, I will go to start a vote soon after a few days.
>
> Thank you.
>
> On Thu, May 9, 2024 at 6:12 PM Anton Okolnychyi 
> wrote:
> >
> > Thanks to everyone who commented on the design doc. I updated the
> proposal and it is ready for another look. I hope we can converge and move
> forward with this effort!
> >
> > - Anton
> >
> > пт, 19 квіт. 2024 р. о 15:54 Anton Okolnychyi 
> пише:
> >>
> >> Hi folks,
> >>
> >> I'd like to start a discussion on SPARK-44167 that aims to enable
> catalogs to expose custom routines as stored procedures. I believe this
> functionality will enhance Spark’s ability to interact with external
> connectors and allow users to perform more operations in plain SQL.
> >>
> >> SPIP [1] contains proposed API changes and parser extensions. Any
> feedback is more than welcome!
> >>
> >> Unlike the initial proposal for stored procedures with Python [2], this
> one focuses on exposing pre-defined stored procedures via the catalog API.
> This approach is inspired by a similar functionality in Trino and avoids
> the challenges of supporting user-defined routines discussed earlier [3].
> >>
> >> Liang-Chi was kind enough to shepherd this effort. Thanks!
> >>
> >> - Anton
> >>
> >> [1] -
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> >> [2] -
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
> >> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
> >>
> >>
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread Wenchen Fan
Thanks for leading this project! Let's move forward.

On Fri, May 10, 2024 at 10:31 AM L. C. Hsieh  wrote:

> Thanks Anton. Thank you, Wenchen, Dongjoon, Ryan, Serge, Allison and
> others if I miss those who are participating in the discussion.
>
> I suppose we have reached a consensus or close to being in the design.
>
> If you have some more comments, please let us know.
>
> If not, I will go to start a vote soon after a few days.
>
> Thank you.
>
> On Thu, May 9, 2024 at 6:12 PM Anton Okolnychyi 
> wrote:
> >
> > Thanks to everyone who commented on the design doc. I updated the
> proposal and it is ready for another look. I hope we can converge and move
> forward with this effort!
> >
> > - Anton
> >
> > пт, 19 квіт. 2024 р. о 15:54 Anton Okolnychyi 
> пише:
> >>
> >> Hi folks,
> >>
> >> I'd like to start a discussion on SPARK-44167 that aims to enable
> catalogs to expose custom routines as stored procedures. I believe this
> functionality will enhance Spark’s ability to interact with external
> connectors and allow users to perform more operations in plain SQL.
> >>
> >> SPIP [1] contains proposed API changes and parser extensions. Any
> feedback is more than welcome!
> >>
> >> Unlike the initial proposal for stored procedures with Python [2], this
> one focuses on exposing pre-defined stored procedures via the catalog API.
> This approach is inspired by a similar functionality in Trino and avoids
> the challenges of supporting user-defined routines discussed earlier [3].
> >>
> >> Liang-Chi was kind enough to shepherd this effort. Thanks!
> >>
> >> - Anton
> >>
> >> [1] -
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> >> [2] -
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
> >> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
> >>
> >>
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread L. C. Hsieh
Thanks Anton. Thank you, Wenchen, Dongjoon, Ryan, Serge, Allison and
others if I miss those who are participating in the discussion.

I suppose we have reached a consensus or close to being in the design.

If you have some more comments, please let us know.

If not, I will go to start a vote soon after a few days.

Thank you.

On Thu, May 9, 2024 at 6:12 PM Anton Okolnychyi  wrote:
>
> Thanks to everyone who commented on the design doc. I updated the proposal 
> and it is ready for another look. I hope we can converge and move forward 
> with this effort!
>
> - Anton
>
> пт, 19 квіт. 2024 р. о 15:54 Anton Okolnychyi  пише:
>>
>> Hi folks,
>>
>> I'd like to start a discussion on SPARK-44167 that aims to enable catalogs 
>> to expose custom routines as stored procedures. I believe this functionality 
>> will enhance Spark’s ability to interact with external connectors and allow 
>> users to perform more operations in plain SQL.
>>
>> SPIP [1] contains proposed API changes and parser extensions. Any feedback 
>> is more than welcome!
>>
>> Unlike the initial proposal for stored procedures with Python [2], this one 
>> focuses on exposing pre-defined stored procedures via the catalog API. This 
>> approach is inspired by a similar functionality in Trino and avoids the 
>> challenges of supporting user-defined routines discussed earlier [3].
>>
>> Liang-Chi was kind enough to shepherd this effort. Thanks!
>>
>> - Anton
>>
>> [1] - 
>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>> [2] - 
>> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
>> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>>
>>
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread Anton Okolnychyi
Thanks to everyone who commented on the design doc. I updated the proposal
and it is ready for another look. I hope we can converge and move forward
with this effort!

- Anton

пт, 19 квіт. 2024 р. о 15:54 Anton Okolnychyi  пише:

> Hi folks,
>
> I'd like to start a discussion on SPARK-44167 that aims to enable catalogs
> to expose custom routines as stored procedures. I believe this
> functionality will enhance Spark’s ability to interact with external
> connectors and allow users to perform more operations in plain SQL.
>
> SPIP [1] contains proposed API changes and parser extensions. Any feedback
> is more than welcome!
>
> Unlike the initial proposal for stored procedures with Python [2], this
> one focuses on exposing pre-defined stored procedures via the catalog API.
> This approach is inspired by a similar functionality in Trino and avoids
> the challenges of supporting user-defined routines discussed earlier [3].
>
> Liang-Chi was kind enough to shepherd this effort. Thanks!
>
> - Anton
>
> [1] -
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> [2] -
> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>
>
>
>


[DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-04-19 Thread Anton Okolnychyi
Hi folks,

I'd like to start a discussion on SPARK-44167 that aims to enable catalogs
to expose custom routines as stored procedures. I believe this
functionality will enhance Spark’s ability to interact with external
connectors and allow users to perform more operations in plain SQL.

SPIP [1] contains proposed API changes and parser extensions. Any feedback
is more than welcome!

Unlike the initial proposal for stored procedures with Python [2], this one
focuses on exposing pre-defined stored procedures via the catalog API. This
approach is inspired by a similar functionality in Trino and avoids the
challenges of supporting user-defined routines discussed earlier [3].

Liang-Chi was kind enough to shepherd this effort. Thanks!

- Anton

[1] -
https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
[2] -
https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
[3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l