Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bhuwan Sahni
+1. This is a good addition. *Bhuwan Sahni* Staff Software Engineer bhuwan.sa...@databricks.com 500 108th Ave. NE Bellevue, WA 98004 USA On Wed, Jan 10, 2024 at 9:00 AM Burak Yavuz wrote: > +1. Excited to see more stateful workloads with Structured Streaming! > >

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Burak Yavuz
+1. Excited to see more stateful workloads with Structured Streaming! Best, Burak On Wed, Jan 10, 2024 at 8:21 AM Praveen Gattu wrote: > +1. This brings Structured Streaming a good solution for customers wanting > to build stateful stream processing applications. > > On Wed, Jan 10, 2024 at

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Praveen Gattu
+1. This brings Structured Streaming a good solution for customers wanting to build stateful stream processing applications. On Wed, Jan 10, 2024 at 7:30 AM Bartosz Konieczny wrote: > +1 :) > > On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > >> +1 (binding) >> >> Best Regards, >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bartosz Konieczny
+1 :) On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > +1 (binding) > > Best Regards, > Shixiong Zhu > > > On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > >> This is a good addition! +1 >> >> On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: >> >>> +1. This is a major improvement to the state API. >>> >>>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Shixiong Zhu
+1 (binding) Best Regards, Shixiong Zhu On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > This is a good addition! +1 > > On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > >> +1. This is a major improvement to the state API. >> >> Raghu. >> >> On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh >> wrote: >> >>> +1

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Jungtaek Lim
Friendly reminder, the VOTE thread is now live! https://lists.apache.org/thread/16ryx828bwoth31hobknxnjfxjxj07mf Votes cast here are not counted, so please make sure you vote in the VOTE thread. Thanks! On Tue, Jan 9, 2024 at 9:33 AM Jungtaek Lim wrote: > Thanks everyone for the feedback! > >

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
Hi Ashok, Thanks for pointing out the databricks article Scalable Spark Structured Streaming for REST API Destinations | Databricks Blog I browsed it and it is basically similar to many of us involved

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
This is a good addition! +1 On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > +1. This is a major improvement to the state API. > > Raghu. > > On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh > wrote: > >> +1 for me as well >> >> >> Mich Talebzadeh, >> Dad | Technologist | Solutions Architect | Engineer >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Raghu Angadi
+1. This is a major improvement to the state API. Raghu. On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh wrote: > +1 for me as well > > > Mich Talebzadeh, > Dad | Technologist | Solutions Architect | Engineer > London > United Kingdom > > >view my Linkedin profile >

RE: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
+1 This is a good addition! On 2024/01/09 03:23:35 Anish Shrigondekar wrote: > Thanks Jungtaek for creating the Vote thread. > > +1 (non-binding) from my side too. > > Thanks, > Anish > > On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim > wrote: > > > Starting with my +1 (non-binding). Thanks! > > >

Re: AutoReply: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
Hi, Please stop this acknowledgement email. It is spamming the forum unnecessarily! Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
+1 for me as well Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Anish Shrigondekar
Thanks Jungtaek for creating the Vote thread. +1 (non-binding) from my side too. Thanks, Anish On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim wrote: > Starting with my +1 (non-binding). Thanks! > > On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Streaming - Arbitrary > State API v2. > > References: > >- JIRA ticket >- SPIP

[VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Hi all, I'd like to start the vote for SPIP: Structured Streaming - Arbitrary State API v2. References: - JIRA ticket - SPIP doc -

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Thanks everyone for the feedback! Given that we have received positive feedback without major concerns, I will initiate the vote thread soon. Please vote in that thread as well. Thanks again! On Tue, Jan 9, 2024 at 7:44 AM Bhuwan Sahni wrote: > +1 on the newer APIs. I believe these APIs provide

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Please also note that Flask, by default, is a single-threaded web framework. While it is suitable for development and small-scale applications, it may not handle concurrent requests efficiently in a production environment. In production, one can utilise Gunicorn (Green Unicorn) which is a WSGI (

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Bhuwan Sahni
+1 on the newer APIs. I believe these APIs provide a much more powerful mechanism for the user to perform arbitrary state management in Structured Streaming queries. Thanks Bhuwan Sahni On Mon, Jan 8, 2024 at 10:07 AM L. C. Hsieh wrote: > +1 > > I left some comments in the SPIP doc and got replies

Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Thought it might be useful to share my idea with fellow forum members. During the breaks, I worked on the *seamless integration of Spark Structured Streaming with Flask REST API for real-time data ingestion and analytics*. The use case revolves around a scenario where data is generated through

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread L. C. Hsieh
+1 I left some comments in the SPIP doc and got replies quickly. The new API looks good and more comprehensive. I think it will help Spark Structured Streaming to be more useful in more complicated streaming use cases. On Fri, Jan 5, 2024 at 8:15 PM Burak Yavuz wrote: > > I'm also a +1 on the

Re: Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-08 Thread Sean Owen
Agreed, that looks wrong. From the code, it seems that "timezone" is only used for testing, though apparently no test caught this. I'll submit a PR to patch it in any event: https://github.com/apache/spark/pull/44619 On Mon, Jan 8, 2024 at 1:33 AM Janda Martin wrote: > I think that >

Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-07 Thread Janda Martin
I think that [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter introduced a regression in UIUtils::formatBatchTime when a timezone is defined. DateTimeFormatter is thread-safe and immutable according to the JavaDoc, so method DateTimeFormatter::withZone
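
[Editor's sketch] The snippet above is cut off, so the exact cause in UIUtils::formatBatchTime is not shown here. As an illustration only, and assuming the problem is that the result of `withZone` is discarded, the following standalone Java example shows why that matters for an immutable formatter. The class name `WithZoneSketch`, the pattern string, and both helper methods are hypothetical and are not Spark code.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class WithZoneSketch {
    // Hypothetical shared formatter; the pattern is illustrative, not Spark's.
    static final DateTimeFormatter BASE =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Suspected pitfall (assumption): withZone returns a NEW formatter,
    // so discarding the return value leaves BASE in UTC.
    static String formatIgnoringResult(long epochMillis, ZoneId zone) {
        BASE.withZone(zone); // result thrown away, BASE is unchanged
        return BASE.format(Instant.ofEpochMilli(epochMillis));
    }

    // Using the returned copy applies the requested zone.
    static String formatUsingResult(long epochMillis, ZoneId zone) {
        return BASE.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        ZoneId plusNine = ZoneOffset.ofHours(9);
        System.out.println(formatIgnoringResult(0L, plusNine)); // 1970/01/01 00:00:00
        System.out.println(formatUsingResult(0L, plusNine));    // 1970/01/01 09:00:00
    }
}
```

The two calls differ only in whether the copy returned by `withZone` is used, which is the kind of mistake that a test with a non-default timezone would catch.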

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Burak Yavuz
I'm also a +1 on the newer APIs. We had a lot of learnings from using flatMapGroupsWithState and I believe that we can make the APIs a lot easier to use. On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Mich Talebzadeh
Hi Pavan, Thanks for your answers. Given these responses, it seems like you have already taken a comprehensive approach to address the challenges associated with dynamic scaling in Spark Structured Streaming. IMO, it would also be beneficial to engage with other members as well, or gather

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Shixiong Zhu
+1. Looking forward to seeing how the new API brings in new streaming use cases! Best Regards, Shixiong Zhu On Wed, Nov 29, 2023 at 6:42 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the thread > once again to see if other folks have

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Pavan Kotikalapudi
Hi Mich, As always thanks for looking keenly on the design, really appreciate your inputs on this Ticket. Would love to improve this further and cover more edge-cases if any. I can answer the concerns you have below. I believe I have covered some of them in the proposal, If at all I missed out

Re: unsubscribe

2024-01-04 Thread yxj1141

unsubscribe

2024-01-03 Thread Chenyang Tang
unsubscribe

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-02 Thread Mich Talebzadeh
Hi Pavan, Thanks for putting this request forward. I am generally supportive of it. In a nutshell, I believe this proposal holds significant promise for optimizing resource utilization and enhancing performance in Spark Structured Streaming. Having said that, there are potential

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-01 Thread Pavan Kotikalapudi
Hi PMC members, Bumping this idea for one last time to see if there are any approvals to take it forward. Here is an initial Implementation draft PR https://github.com/apache/spark/pull/42352 and design doc:

Re: When and how does Spark use metastore statistics?

2023-12-26 Thread Bjørn Jørgensen
Tell me more about spark.sql.cbo.strategy On Tue, Dec 12, 2023 at 00:25, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Where exactly are you getting this information from? > > As far as I can tell, spark.sql.cbo.enabled has defaulted to false since > it was introduced 7 years ago >

Re: Contribute to Spark Open source

2023-12-25 Thread Colin Williams
Hello, Did you see https://spark.apache.org/contributing.html ? On Mon, Dec 25, 2023 at 5:13 AM Sudharshan V wrote: > > Hi All, > > I am new to Open source and have been using spark scala in my organisation > for the past couple of years. > I would like to contribute to spark open source. > I

Contribute to Spark Open source

2023-12-25 Thread Sudharshan V
Hi All, I am new to Open source and have been using spark scala in my organisation for the past couple of years. I would like to contribute to spark open source. I am not exactly sure of how and where to start. Any help would be greatly appreciated. Is there any documentation per se on how to

the life cycle shuffle Dependency

2023-12-24 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans up the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement > We are not validating against table or column existence. is not correct. When you call spark.sql(…), Spark will look up the table references and

Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark provides a method for syntax validation without executing the query. Something like below __ / __/__ ___ _/ /__ _\ \/ _

Validate spark sql

2023-12-23 Thread ram manickam
Hello, Is there a way to validate PySpark SQL for syntax errors only? I cannot connect to the actual data set to perform this validation. Any help would be appreciated. Thanks Ram

Meet our keynote speakers and register to Community Over Code EU!

2023-12-22 Thread Ryan Skraba
[Note: You're receiving this email because you are subscribed to one or more project dev@ mailing lists at the Apache Software Foundation.] * Merge with the ASF EUniverse! The registration for

Unsubscribe

2023-12-21 Thread yxj1141
Unsubscribe

Re: ShuffleManager and Speculative Execution

2023-12-21 Thread Mich Talebzadeh
Interesting point. As I understand, the key point is the ShuffleManager ensures that only one map output file is processed by the reduce task, even when multiple attempts succeed. So it is not a random selection process. At the reduce stage, only one copy of the map output needs to be read by the

ShuffleManager and Speculative Execution

2023-12-21 Thread Enrico Minack
Hi Spark devs, I have a question around ShuffleManager: With speculative execution, one map output file is being created multiple times (by multiple task attempts). If both attempts succeed, which is to be read by the reduce task in the next stage? Is any map output as good as any other?

the life cycle shuffle Dependency

2023-12-17 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans up the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Guidance for filling out "Affects Version" on Jira

2023-12-17 Thread Nicholas Chammas
The Contributing guide only mentions what to fill in for “Affects Version” for bugs. How about for improvements? This question once caused some problems when I set “Affects Version” to the last released version, and that was interpreted as a request

[ANNOUNCE] Apache Spark 3.3.4 released

2023-12-16 Thread Dongjoon Hyun
We are happy to announce the availability of Apache Spark 3.3.4! Spark 3.3.4 is the last maintenance release based on the branch-3.3 maintenance branch of Spark. It contains many fixes including security and correctness domains. We strongly recommend all 3.3 users to upgrade to this or higher

[VOTE][RESULT] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
The vote passes with 6 +1s (3 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Yuming Wang * - Kent Yao - Liang-Chi Hsieh * - Yang Jie - Malcolm Decuire +0: None -1: None

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
Thank you all. This vote passed. Let me conclude. Dongjoon On 2023/12/11 23:58:28 Malcolm Decuire wrote: > +1 > > On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote: > > > +1 > > > > On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > > > +1 > > > > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote:

Re: Spark 3.5.0 and issue SPARK-45593 (SPARK-45201)

2023-12-14 Thread Steven B Jones
A follow-up to my note yesterday. Issue SPARK-45201 has similar externals to SPARK-45593 and is written to cover target release 3.5.0. Remarkably, the issue only affects self-created distributions, and not the one(s) provided by Spark development itself. I'll let you read

Spark 3.5.0 and issue SPARK-45593

2023-12-13 Thread Steven B Jones
Hello, I maintain a version of Apache Spark that runs on z/OS. I'm porting Spark 3.5.0 to our platform, and having the problem described by https://issues.apache.org/jira/projects/SPARK/issues/SPARK-45593 in

Re: Apache Spark 3.3.4 EOL Release?

2023-12-11 Thread Jungtaek Lim
Sorry for the late reply, I've been busy these days and haven't had time to respond. I didn't realize you were doing release preparation and discussion in parallel. I totally agree you should go if you take a step already. Also, thanks for the suggestion! Unfortunately I got to be busy after

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Malcolm Decuire
+1 On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote: > +1 > > On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > > +1 > > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > > > > > +1 (non-binding) > > > > > > Kent Yao > > > > > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > > > > > +1 > > > > > >

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
Where exactly are you getting this information from? As far as I can tell, spark.sql.cbo.enabled has defaulted to false since it was introduced 7 years ago

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Yang Jie
+1 On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > +1 > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > > > +1 (non-binding) > > > > Kent Yao > > > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > > > +1 > > > > > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > > >> > > >> +1 > > >> >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Dongjoon Hyun
Hi, Mridul. > I am currently on Python 3.11.6, java 8. For the above, I added `Python 3.11 support` at Apache Spark 3.4.0. That's exactly one of my reasons why I wanted to do the EOL release of Apache Spark 3.3.4. https://issues.apache.org/jira/browse/SPARK-41454 (Support Python 3.11) Thanks,

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
You are right. By default CBO is not enabled. Whilst the CBO was the default optimizer in earlier versions of Spark, it has been replaced by the AQE in recent releases. spark.sql.cbo.strategy As I understand, The spark.sql.cbo.strategy configuration property specifies the optimizer strategy used

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Mridul Muralidharan
I am seeing a bunch of python related (43) failures in the sql module (for example [1]) ... I am currently on Python 3.11.6, java 8. Not sure if ubuntu modified anything from under me, thoughts ? I am currently testing this against an older branch to make sure it is not an issue with my desktop.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > spark.sql.cbo.strategy: Set to AUTO to use the CBO as the default optimizer, > or NONE to disable it completely. > Hmm, I’ve also never heard of this setting before and can’t seem to find it in the Spark docs or source code.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > > By default, the CBO is enabled in Spark. Note that this is not correct. AQE is enabled

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
Some of these, like CBO and RBO, have been around outside of Spark for years, but I concur that they have a place in Spark's docs. Simply put, statistics provide insights into the characteristics of data, such as distribution, skewness, and cardinalities, which help the optimizer make informed

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread L. C. Hsieh
+1 On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > +1 (non-binding) > > Kent Yao > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > +1 > > > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > >> > >> +1 > >> > >> Dongjoon > >> > >> On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > >> > Please

Re: Algolia search on website is broken

2023-12-10 Thread Gengliang Wang
Hi Nick, Thank you for reporting the issue with our web crawler. I've found that the issue was due to a change (specifically, pull request #40269) in the website's HTML structure, where the JavaScript selector ".container-wrapper" is now ".container".

Re: When and how does Spark use metastore statistics?

2023-12-10 Thread Nicholas Chammas
I’ve done some reading and have a slightly better understanding of statistics now. Every implementation of LeafNode.computeStats

Disabling distributing local conf file during spark-submit

2023-12-10 Thread Eugene Miretsky
Hello, It looks like local conf archives always get copied to the target (HDFS) every time a job is submitted 1. Other

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Kent Yao
+1 (non-binding) Kent Yao On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > +1 > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: >> >> +1 >> >> Dongjoon >> >> On 2023/12/08 21:41:00 Dongjoon Hyun wrote: >> > Please vote on releasing the following candidate as Apache Spark version >> > 3.3.4. >> >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Yuming Wang
+1 On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > > 3.3.4. > > > > The vote is open until December 15th 1AM (PST) and passes if a majority > +1

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Dongjoon Hyun
+1 Dongjoon On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.4. > > The vote is open until December 15th 1AM (PST) and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this

Re: Spark on Yarn with Java 17

2023-12-10 Thread Jason Xu
Dongjoon and Luca, it's great to learn that there is a way to run different JVM versions for Spark and Hadoop binaries. I had concerns about Java compatibility issues without this solution. Thank you! Luca, thank you for providing a how-to guide for this. It's really helpful! On Sat, Dec 9, 2023

Re: Algolia search on website is broken

2023-12-10 Thread Nicholas Chammas
Pinging Gengliang and Xiao about this, per these docs. It looks like to fix this problem you need access to the Algolia Crawler Admin Console.

unsubscribe

2023-12-10 Thread bruce COTTMAN
- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

unsubscribe

2023-12-10 Thread Stevens, Clay
Clay

unsubscribe

2023-12-10 Thread Rajanikant V

unsubscribe

2023-12-09 Thread Ravi Chinoy
-- Regards Ravi Chinoy Phone: (415) 230 9971

RE: Spark on Yarn with Java 17

2023-12-09 Thread Luca Canali
Jason, In case you need a pointer on how to run Spark with a version of Java different than the version used by the Hadoop processes, as indicated by Dongjoon, this is an example of what we do on our Hadoop clusters:

Re: Spark on Yarn with Java 17

2023-12-09 Thread Dongjoon Hyun
Please try Apache Spark 3.3+ (SPARK-33772) with Java 17 on your cluster simply, Jason. I believe you can set up for your Spark 3.3+ jobs to run with Java 17 while your cluster(DataNode/NameNode/ResourceManager/NodeManager) is still sitting on Java 8. Dongjoon. On Fri, Dec 8, 2023 at 11:12 PM

Re: Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
Dongjoon, thank you for the fast response! Apache Spark 4.0.0 depends on only Apache Hadoop client library. To better understand your answer, does that mean a Spark application built with Java 17 can successfully run on a Hadoop cluster on version 3.3 and Java 8 runtime? On Fri, Dec 8, 2023 at

Re: Spark on Yarn with Java 17

2023-12-08 Thread Dongjoon Hyun
Hi, Jason. Apache Spark 4.0.0 depends on only Apache Hadoop client library. You can track all `Apache Spark 4` activities including Hadoop dependency here. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) According to the release history, the original suggested

Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
Hi Spark devs, According to the Spark 3.5 release notes, Spark 4 will no longer support Java 8 and 11 (link ). My company is using Spark on Yarn with Java 8 now. When considering a future upgrade to Spark 4, one issue

[VOTE] Release Spark 3.3.4 (RC1)

2023-12-08 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.3.4. The vote is open until December 15th 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.3.4 [ ] -1 Do not release this package

Re: Apache Spark 3.3.4 EOL Release?

2023-12-08 Thread Dongjoon Hyun
Thank you, Mridul, and Kent, too. Additionally, thank you for volunteering as a release manager, Jungtaek, For the 3.3.4 EOL release, I've already been testing and preparing for one week since my first email. So, why don't you proceed with the Apache Spark 3.5.1 release? It has 142 patches

Re: Apache Spark 3.3.4 EOL Release?

2023-12-07 Thread Jungtaek Lim
+1 to release 3.3.4 and consider 3.3 as EOL. Btw, it would probably be ideal if we could encourage people who haven't yet had a chance to go through the release process to take this opportunity (when there are people who are happy to take it). If you don't mind and we are not very strict on

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
This is not a question for the dev list. Moving dev to bcc. One thing I would try is to connect to this database using JDBC + SSH tunnel, but without Spark. That way you can focus on getting the JDBC connection to work without Spark complicating the picture for you. > On Dec 5, 2023, at 8:12 

SSH Tunneling issue with Apache Spark

2023-12-05 Thread Venkatesan Muniappan
Hi Team, I am facing an issue with SSH Tunneling in Apache Spark. The behavior is same as the one in this Stackoverflow question but there are no answers there. This is what I am trying:

When and how does Spark use metastore statistics?

2023-12-05 Thread Nicholas Chammas
I’m interested in improving some of the documentation relating to the table and column statistics that get stored in the metastore, and how Spark uses them. But I’m not clear on a few things, so I’m writing to you with some questions. 1. The documentation for 

Algolia search on website is broken

2023-12-05 Thread Nicholas Chammas
Should I report this instead on Jira? Apologies if the dev list is not the right place. Search on the website appears to be broken. For example, here is a search for “analyze”:  And here is the same search using DDG

unsubscribe

2023-12-05 Thread Kalpana Jalawadi

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Kent Yao
+1 Thank you for driving this EOL release, Dongjoon! Kent Yao On 2023/12/04 19:40:10 Mridul Muralidharan wrote: > +1 > > Regards, > Mridul > > On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh wrote: > > > +1 > > > > Thanks Dongjoon! > > > > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: > > > >

Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-04 Thread Sean Owen
It already does. I think that's not the same idea? On Mon, Dec 4, 2023, 8:12 PM Almog Tavor wrote: > I think Spark should start shading its problematic deps similar to how > it’s done in Flink > > On Mon, 4 Dec 2023 at 2:57 Sean Owen wrote: > >> I am not sure we can control that - the Scala

Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-04 Thread Almog Tavor
I think Spark should start shading its problematic deps similar to how it’s done in Flink On Mon, 4 Dec 2023 at 2:57 Sean Owen wrote: > I am not sure we can control that - the Scala _x.y suffix has particular > meaning in the Scala ecosystem for artifacts and thus the naming of .jar > files.

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Mridul Muralidharan
+1 Regards, Mridul On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh wrote: > +1 > > Thanks Dongjoon! > > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: > > > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > > > > Jie Yang > > > > On 2023/12/04 15:08:25 Tom Graves wrote: > > > +1 for a 3.3.4 EOL

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Dongjoon Hyun
Thank you all. Dongjoon. On Mon, Dec 4, 2023 at 9:40 AM L. C. Hsieh wrote: > +1 > > Thanks Dongjoon! > > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: > > > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > > > > Jie Yang > > > > On 2023/12/04 15:08:25 Tom Graves wrote: > > > +1 for a 3.3.4

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread L. C. Hsieh
+1 Thanks Dongjoon! On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > > Jie Yang > > On 2023/12/04 15:08:25 Tom Graves wrote: > > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > > Tom > > On Friday, December 1, 2023 at 02:48:22 PM CST,

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Yang Jie
+1 for a 3.3.4 EOL Release. Thanks Dongjoon. Jie Yang On 2023/12/04 15:08:25 Tom Graves wrote: > +1 for a 3.3.4 EOL Release. Thanks Dongjoon. > Tom > On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun > wrote: > > Hi, All. > > Since the Apache Spark 3.3.0 RC6 vote passed

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Tom Graves
+1 for a 3.3.4 EOL Release. Thanks Dongjoon. Tom On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun wrote: Hi, All. Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022, branch-3.3 has been maintained and served well until now. -

Re: [DISCUSS] SPIP: ShuffleManager short name registration via SparkPlugin

2023-12-04 Thread Alessandro Bellina
Hello devs, We are going to be tabling the SPIP proposal given that we don't see responses in the discussion thread. We still believe that making custom ShuffleManagers easier to configure is worthwhile, given interactions with our users, but we can revisit this later. If anyone in the list has

unsubscribe

2023-12-04 Thread Duy Pham

`orc-format` 1.0 (ORC-1531) for Apache ORC 2.0

2023-12-03 Thread Dongjoon Hyun
Hi, All. As one of the key parts of Apache ORC 2.0, we've been discussing a new repository and module, `orc-format`, in the following. https://github.com/apache/orc/issues/1543 Now, we are ready to create a new repo. Please take a look at the POC repo and code and let us know your thoughts.

Re: Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-03 Thread Sean Owen
I am not sure we can control that - the Scala _x.y suffix has particular meaning in the Scala ecosystem for artifacts and thus the naming of .jar files. And we need to work with the Scala ecosystem. What can't handle these files, Spring Boot? does it somehow assume the .jar file name relates to

Should Spark 4.x use Java modules (those you define with module-info.java sources)?

2023-12-03 Thread Marc Le Bihan
Hello, Last month, I attempted to upgrade my Spring-Boot 2 Java project, which relies heavily on Spark 3.4.2, to Spring-Boot 3. It hasn't succeeded yet, but it was informative. Spring-Boot 2 → 3 means especially javax.* becoming jakarta.* : javax.activation,

unsubscribe

2023-12-03 Thread Kalpana Jalawadi

Unsubscribe

2023-12-03 Thread Kalpana Jalawadi

Re: [FYI] SPARK-45981: Improve Python language test coverage

2023-12-02 Thread Hyukjin Kwon
Awesome! On Sat, Dec 2, 2023 at 2:33 PM Dongjoon Hyun wrote: > Hi, All. > > As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community > starts to have test coverage for all supported Python versions from Today. > > - https://github.com/apache/spark/actions/runs/7061665420 > >
