[DISCUSS] Updating documentation hosted for EOL and maintenance releases

2023-08-30 Thread Hyukjin Kwon
Hi all, I would like to raise a discussion about updating documentation hosted for EOL and maintenance versions. To provide some context, we currently host the documentation for EOL versions of Apache Spark, which can be found at links like

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
> Which Python version will run that stored procedure? > > All Python versions supported in PySpark > Where does the stored procedure define the exact Python version that will run the code? That was the question. > How to manage external dependencies? > > Existing way we have >

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Hyukjin Kwon
Which Python version will run that stored procedure? All Python versions supported in PySpark How to manage external dependencies? Existing way we have https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html . In fact, this will use the external dependencies within your

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Alexander Shorin
-1 Great idea to ignore the experience of others and copy bad practices back for nothing. If you are familiar with the Python ecosystem then you should answer the questions: 1. Which Python version will run that stored procedure? 2. How to manage external dependencies? 3. How to test it via a common

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Yuming Wang
It seems the signature cannot be checked: yumwang@G9L07H60PK Downloads % gpg --keyserver hkps://keys.openpgp.org --recv-key FC3AE3A7EAA1BAC98770840E7E1ABCC53AAA2216 gpg: key 7E1ABCC53AAA2216: no user ID gpg: Total number processed: 1 yumwang@G9L07H60PK Downloads % gpg --batch --verify

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Sean Owen
It worked fine after I ran it again; I included "package test" instead of "test" (I had previously run "install"). +1 On Wed, Aug 30, 2023 at 6:06 AM yangjie01 wrote: > Hi, Sean > > > > I have performed testing with Java 17 and Scala 2.13 using maven (`mvn > clean install` and `mvn package

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Hyukjin Kwon
+1, we should have this. A lot of other projects and DBMSes have this too, and we currently don't have a way to handle them within Apache Spark. Disclaimer: I am the shepherd of this SPIP. On Thu, 31 Aug 2023 at 09:31, Allison Wang wrote: > Hi Mich, > > I've updated the permissions on the

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Mridul Muralidharan
+1 Signatures, digests, etc. check out fine. Checked out tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes Regards, Mridul On Wed, Aug 30, 2023 at 6:10 AM yangjie01 wrote: > Hi, Sean > > > > I have performed testing with Java 17 and Scala 2.13 using maven (`mvn > clean install` and

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Allison Wang
Hi Mich, I've updated the permissions on the document. Please feel free to leave comments. Thanks, Allison On Wed, Aug 30, 2023 at 3:44 PM Mich Talebzadeh wrote: > Hi, > > Great. Please allow edit access on SPIP or ability to comment. > > Thanks > > Mich Talebzadeh, > Distinguished

Re: [DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Mich Talebzadeh
Hi, Great. Please allow edit access on SPIP or ability to comment. Thanks Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom

[DISCUSS] SPIP: Python Stored Procedures

2023-08-30 Thread Allison Wang
Hi all, I would like to start a discussion on “Python Stored Procedures”. This proposal aims to extend Spark SQL by introducing support for stored procedures, starting with Python as the procedural language. This will enable users to run complex logic using Python within their SQL workflows and

Re: [DISCUSS] Incremental statistics collection

2023-08-30 Thread Mich Talebzadeh
Sorry, I missed this one. In the context of what has been changed, we ought to have an additional column, timestamp. In short, we can have datachange(object_name, partition_name, colname, timestamp), where timestamp is the point in time you want to compare against for changes. Example: SELECT * FROM WHERE

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread yangjie01
Hi, Sean I have performed testing with Java 17 and Scala 2.13 using maven (`mvn clean install` and `mvn package test`), and have not encountered the issue you mentioned. The test for the connect module depends on the `spark-protobuf` module to complete the `package`. Was it successful? Or

Re: [DISCUSS] Incremental statistics collection

2023-08-30 Thread Mich Talebzadeh
Another idea that came to my mind from the old days is the concept of having a function called *datachange*. This datachange function should measure the amount of change in the data distribution since ANALYZE STATISTICS last ran. Specifically, it should measure the number of inserts, updates and
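
To make the datachange idea concrete, here is a minimal Python sketch, not anything that exists in Spark. It assumes the engine keeps per-table (or per-partition) counters of DML operations since the last ANALYZE. The names `ChangeCounters`, `datachange`, `needs_reanalyze` and the 10% threshold are hypothetical, and counting deletes is an assumption because the message above is cut off after "updates and".

```python
from dataclasses import dataclass


@dataclass
class ChangeCounters:
    """Hypothetical DML counters accumulated since statistics were last collected."""
    inserts: int = 0
    updates: int = 0
    deletes: int = 0  # assumption: the quoted message is truncated before this term


def datachange(counters: ChangeCounters, rows_at_last_analyze: int) -> float:
    """Return changed rows as a percentage of the row count at the last ANALYZE."""
    changed = counters.inserts + counters.updates + counters.deletes
    if rows_at_last_analyze <= 0:
        # Empty or never-analyzed object: treat any change as fully stale statistics.
        return 100.0 if changed > 0 else 0.0
    return 100.0 * changed / rows_at_last_analyze


def needs_reanalyze(counters: ChangeCounters,
                    rows_at_last_analyze: int,
                    threshold_pct: float = 10.0) -> bool:
    """Decide whether statistics look stale enough to recollect (threshold is illustrative)."""
    return datachange(counters, rows_at_last_analyze) >= threshold_pct
```

With 50 inserts, 30 updates and 20 deletes against 1000 rows at the last ANALYZE, `datachange` reports 10.0, which would just meet the illustrative threshold. Note that the value can exceed 100 when the same rows change repeatedly.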

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-30 Thread Dipayan Dev
Can we fix this bug in Spark 3.5.0? https://issues.apache.org/jira/browse/SPARK-44884 On Wed, Aug 30, 2023 at 11:51 AM Sean Owen wrote: > It looks good except that I'm getting errors running the Spark Connect > tests at the end (Java 17, Scala 2.13) It looks like I missed something > necessary