[jira] [Created] (SPARK-48362) Add CollectSetWIthLimit

2024-05-20 Thread Holden Karau (Jira)
Holden Karau created SPARK-48362: Summary: Add CollectSetWIthLimit Key: SPARK-48362 URL: https://issues.apache.org/jira/browse/SPARK-48362 Project: Spark Issue Type: Improvement
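The issue summary above only names the feature. As a rough illustration, assuming the intent is a `collect_set` aggregate that stops accumulating once it holds a maximum number of distinct values (an assumption, since the ticket body is not shown here), the aggregation buffer could behave like this toy Python model. The class and function names are hypothetical and are not Spark APIs.

```python
from typing import Hashable, Iterable


class CollectSetWithLimitBuffer:
    """Toy model (hypothetical, not a Spark API) of a collect_set
    buffer that keeps at most `limit` distinct values."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.values: set = set()

    def update(self, value: Hashable) -> None:
        # Skip nulls, as collect_set does, and stop growing at the limit.
        if value is not None and len(self.values) < self.limit:
            self.values.add(value)

    def merge(self, other: "CollectSetWithLimitBuffer") -> None:
        # Combine partial buffers from different partitions without
        # exceeding the limit. Which values survive depends on arrival
        # order, so the result is non-deterministic once the limit is hit.
        for value in other.values:
            if len(self.values) >= self.limit:
                break
            self.values.add(value)


def collect_set_with_limit(partitions: Iterable[Iterable[Hashable]], limit: int) -> set:
    """Aggregate each partition separately, then merge the partial buffers."""
    merged = CollectSetWithLimitBuffer(limit)
    for partition in partitions:
        partial = CollectSetWithLimitBuffer(limit)
        for value in partition:
            partial.update(value)
        merged.merge(partial)
    return merged.values
```

Bounding the buffer in both `update` and `merge` is what keeps memory flat on skewed keys, which is presumably the motivation for a limit.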

[jira] [Assigned] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-44953: Assignee: binjie yang > Log a warning (or automatically disable) when shuffle track

[jira] [Resolved] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2024-05-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-44953. -- Resolution: Fixed > Log a warning (or automatically disable) when shuffle tracking is enab

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
Is there some point of contact that can provide me needed context and >> permissions? >> I'd also love to see why the costs are high and see how we can reduce >> them... >> >> Thanks, >> Nimrod >> >> On Wed, May 8, 2024 at 8:26 AM Holden Karau >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
> will be automated and the only thing which will be manual is to sign the > release for security reasons that would be reasonable. > > Thanks, > Nimrod > > > On Wed, May 8, 2024, 00:54 Holden Karau < > holden.ka...@gmail.com> wrote: > >> Indeed. We could concei

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
ore, my pgp >> key is lost, etc.). I'll start the RC process at my tomorrow. Thanks for >> your patience! >> >> Wenchen >> >> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >> >>> +1 >>> >>> >>> >>> *From*: Jun

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
>>>> >>>> In addition, Apache Spark PMC received an official notice from ASF >>>> Infra team. >>>> >>>> https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg >>>> > [NOTICE] Apache Spark's GitHub Actions us

Re: ASF board report draft for May

2024-05-06 Thread Holden Karau
possible, we >> opened a blocker-level JIRA issue and have been working on it. >> - https://infra.apache.org/github-actions-policy.html >> >> Please include a sentence that Apache Spark PMC is working on under the >> following umbrella JIRA issue. >> >

Re: ASF board report draft for May

2024-05-05 Thread Holden Karau
Do we want to include that we’re planning on having a preview release of Spark 4 so folks can see the APIs “soon”? Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 YouTube Live Streams:

[jira] [Updated] (SPARK-48101) When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved

2024-05-02 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-48101: - Priority: Minor (was: Major) > When using INSERT OVERWRITE with Spark CTEs they

[jira] [Created] (SPARK-48101) When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved

2024-05-02 Thread Holden Karau (Jira)
Holden Karau created SPARK-48101: Summary: When using INSERT OVERWRITE with Spark CTEs they may not be fully resolved Key: SPARK-48101 URL: https://issues.apache.org/jira/browse/SPARK-48101 Project

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > +1 > > On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > >> +1 for next Monday. >> >> We can do more previews when the other features are ready for preview. >> >> Tathagata Das wrote on Wed, May 1, 2024 at 08:46: >> >>> Next week sounds

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Holden Karau
+1 On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh wrote: > +1 > > On Fri, Apr 26, 2024

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
+1 On Thu, Apr 25, 2024 at 11:18 AM Maciej wrote: > +1 > > Best regards, > Maciej

[Bug 2018504] Re: cups-browsed is using an excessive amount of CPU

2024-04-19 Thread Holden Karau
+1 also running into this If I restart cups the issue goes away for "awhile" though (interestingly printing does not seem to impact cups meaning it's probably behavior that is unrelated to the printing). -- You received this bug notification because you are a member of Ubuntu Bugs, which is

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-13 Thread Holden Karau
+1 -- even if it's not perfect now is the time to change default values On Sat, Apr 13, 2024 at 4:11 PM Hyukjin Kwon wrote: > +1 > > On Sun, Apr 14, 2024 at 7:46 AM Chao Sun wrote: > >> +1. >> >> This feature is very helpful for guarding against correctness issues, >> such as null results due
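The quoted reply notes that ANSI mode (`spark.sql.ansi.enabled`) guards against correctness issues such as silent nulls. Below is a toy Python model, not Spark code, of three well-known differences: an invalid string cast and division by zero return NULL in legacy mode but raise under ANSI mode, and 32-bit integer overflow wraps around in legacy mode but raises under ANSI mode. The error names in the messages only loosely mirror Spark's error classes.

```python
import re
from typing import Optional

INT_MIN, INT_MAX = -(2**31), 2**31 - 1
_INT_PATTERN = re.compile(r"[+-]?\d+")


def cast_to_int(text: str, ansi: bool) -> Optional[int]:
    """Toy model of CAST(text AS INT)."""
    stripped = text.strip()
    if _INT_PATTERN.fullmatch(stripped) and INT_MIN <= int(stripped) <= INT_MAX:
        return int(stripped)
    if ansi:
        # ANSI mode fails the query instead of producing a NULL.
        raise ValueError(f"CAST_INVALID_INPUT: {text!r} cannot be cast to INT")
    return None  # legacy mode: the bad value silently becomes NULL


def divide(a: float, b: float, ansi: bool) -> Optional[float]:
    """Toy model of a / b on numeric columns."""
    if b == 0:
        if ansi:
            raise ZeroDivisionError("DIVIDE_BY_ZERO")
        return None  # legacy mode: NULL instead of an error
    return a / b


def add_int(a: int, b: int, ansi: bool) -> int:
    """Toy model of 32-bit INT addition."""
    result = a + b
    if INT_MIN <= result <= INT_MAX:
        return result
    if ansi:
        raise OverflowError("ARITHMETIC_OVERFLOW")
    # Legacy mode: two's-complement wrap-around, e.g. INT_MAX + 1 == INT_MIN.
    return (result + 2**31) % 2**32 - 2**31
```

The legacy results are all "valid looking" values, which is why they can slip through pipelines unnoticed; the ANSI variants surface the problem at query time.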

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten currently already support Velox backend and Clickhouse backend. > data fusion support is also proposed but no one worked on it. > > Gluten isn't a POC. It's under actively developing but some companies > already used it. > > > On

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans, and really anything that might reduce code duplication among the ~4 or so different accelerators.

Re: Apache Spark 3.4.3 (?)

2024-04-06 Thread Holden Karau
Sounds good to me :) On Sat, Apr 6, 2024 at 2:51 PM Dongjoon Hyun wrote: > Hi, All.

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Holden Karau
+1 On Mon, Apr 1, 2024 at 5:44 PM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin

[jira] [Created] (SPARK-47672) Avoid double evaluation of non-trivial projected elements from filter pushdown

2024-04-01 Thread Holden Karau (Jira)
Holden Karau created SPARK-47672: Summary: Avoid double evaluation of non-trivial projected elements from filter pushdown Key: SPARK-47672 URL: https://issues.apache.org/jira/browse/SPARK-47672
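To illustrate the problem named in the summary: when a filter on an aliased, non-trivial expression is pushed below the projection that defines it, the alias is inlined into the filter condition, so every row that survives the filter evaluates the expression twice (once in the filter, once in the projection). This is a toy Python model of that effect, not Spark's optimizer code; the function names are made up for the example.

```python
from typing import Callable, Dict, List, Tuple


def counted(fn: Callable[[int], int]) -> Tuple[Callable[[int], int], Dict[str, int]]:
    """Wrap fn so we can count how often the 'expensive' expression runs."""
    stats = {"calls": 0}

    def wrapper(x: int) -> int:
        stats["calls"] += 1
        return fn(x)

    return wrapper, stats


def filter_above_project(rows: List[int], expensive: Callable[[int], int],
                         predicate: Callable[[int], bool]) -> List[int]:
    # Plan as written: Filter(predicate(y)) over Project(y = expensive(x)).
    projected = [expensive(x) for x in rows]
    return [y for y in projected if predicate(y)]


def filter_pushed_below_project(rows: List[int], expensive: Callable[[int], int],
                                predicate: Callable[[int], bool]) -> List[int]:
    # After pushdown the alias y is replaced by expensive(x) in the filter,
    # so expensive(x) runs in the filter and again in the projection.
    survivors = [x for x in rows if predicate(expensive(x))]
    return [expensive(x) for x in survivors]
```

With `expensive = x * x` over rows 1 to 5 and the predicate `y > 10`, both plans return `[16, 25]`, but the pushed-down plan makes 7 calls instead of 5.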

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Holden Karau
+1 On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin wrote: > +1 > > > On Mon, Mar 11 2024

[jira] [Created] (SPARK-47220) log4j race condition during shutdown

2024-02-28 Thread Holden Karau (Jira)
Holden Karau created SPARK-47220: Summary: log4j race condition during shutdown Key: SPARK-47220 URL: https://issues.apache.org/jira/browse/SPARK-47220 Project: Spark Issue Type: Improvement

Re: Generating config docs automatically

2024-02-21 Thread Holden Karau
I think this is a good idea. I like having everything in one source of truth rather than two (so option 1 sounds like a good idea); but that’s just my opinion. I'd be happy to help with reviews though. On Wed, Feb 21, 2024 at 6:37 AM Nicholas Chammas wrote: > I know config documentation is not

Re: Spark 4.0 Query Analyzer Bug Report

2024-02-20 Thread Holden Karau
Do you mean Spark 3.4? 4.0 is very much not released yet. Also it would help if you could share your query & more of the logs leading up to the error. On Tue, Feb 20, 2024 at 3:07 PM Sharma, Anup wrote: > Hi Spark team, > > > > We ran into a dataframe issue after upgrading from spark 3.1 to 4.

[jira] [Resolved] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-47077. -- Resolution: Cannot Reproduce After blowing away my maven + ivy cache it works fine – should

[jira] [Created] (SPARK-47077) sbt build is broken due to selenium change

2024-02-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-47077: Summary: sbt build is broken due to selenium change Key: SPARK-47077 URL: https://issues.apache.org/jira/browse/SPARK-47077 Project: Spark Issue Type

[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types

2024-02-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-47001: - Description: When pushing a filter down in a union the data type may not match exactly

Re: Dynamically Support Spark Native Engine in Iceberg

2024-02-13 Thread Holden Karau
This is great work! Very excited to see this. On Tue, Feb 13, 2024 at 4:38 PM huaxin gao wrote: > Hello Iceberg community, > > As you may already know, Project Comet > , a plugin to > accelerate Spark query execution via

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest, what are the differences in approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveraging DataFusion

[jira] [Created] (SPARK-47031) Union of with non-determinstic expression should be non-deterministic

2024-02-12 Thread Holden Karau (Jira)
Holden Karau created SPARK-47031: Summary: Union of with non-determinstic expression should be non-deterministic Key: SPARK-47031 URL: https://issues.apache.org/jira/browse/SPARK-47031 Project: Spark

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh, interesting solution. A co-worker was suggesting something similar using resource profiles to increase memory, but your approach avoids a lot of complexity. I like it (and we could extend it out to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue,

Re: Spark-Connect: Param `--packages` does not take effect for executors.

2023-12-04 Thread Holden Karau
So I think this sounds like a bug to me, in the help options for both regular spark-submit and ./sbin/start-connect-server.sh we say: " --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths.

Re: Classpath isolation per SparkSession without Spark Connect

2023-11-27 Thread Holden Karau
So I don’t think we make any particular guarantees around class path isolation there, so even if it does work it’s something you’d need to pay attention to on upgrades. Class path isolation is tricky to get right. On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde wrote: > Hello, > > We are using spark

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread Holden Karau
+1 On Tue, Nov 14, 2023 at 10:21 AM DB Tsai wrote: > +1 > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov < > vakaris.bashki...@gmail.com> wrote: > > +1 (non-binding) > > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun wrote: > >> +1

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1) but have some follow-up questions: Have we taken the time to learn from the other operators? Do we have a compatible CRD/API or not (and if so why?) The API seems to assume that everything is packaged in the container in advance, but I

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Holden Karau
+1 On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote: > +1 > > > > *From*: Yuming Wang > *Date*: Tuesday, November 7, 2023 07:00 > *To*: Santosh Pingale > *Cc*: Dongjoon Hyun , dev > > *Subject*: Re: Apache Spark 3.4.2 (?) > > > > +1 > > > > On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale > wrote: > >

[jira] [Created] (SPARK-45712) Provide a command line flag to override the log4j properties file

2023-10-27 Thread Holden Karau (Jira)
Holden Karau created SPARK-45712: Summary: Provide a command line flag to override the log4j properties file Key: SPARK-45712 URL: https://issues.apache.org/jira/browse/SPARK-45712 Project: Spark

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Affects Version/s: 4.0.0 (was: 3.3.0

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Issue Type: Improvement (was: Bug) > Spark history files backend currently depend on poll

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Description: The spark history server FS  currently depends on polling for loading history

[jira] [Updated] (SPARK-45563) Spark history files backend currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-45563: - Summary: Spark history files backend currently depend on polling for loading into the history

[jira] [Created] (SPARK-45563) Spark rolling history files currently depend on polling for loading into the history server

2023-10-16 Thread Holden Karau (Jira)
Holden Karau created SPARK-45563: Summary: Spark rolling history files currently depend on polling for loading into the history server Key: SPARK-45563 URL: https://issues.apache.org/jira/browse/SPARK-45563

[jira] [Resolved] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-10-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-44735. -- Fix Version/s: 4.0.0 Resolution: Fixed > Log a warning when inserting colu

[jira] [Assigned] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-10-13 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reassigned SPARK-44735: Assignee: Jia Fan > Log a warning when inserting columns with the same name by

Re: Write Spark Connection client application in Go

2023-09-12 Thread Holden Karau
That’s so cool! Great work y’all :) On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote: > Hi Spark Friends, > > Anyone interested in using Golang to write Spark application? We created a > Spark > Connect Go Client library . > Would love to hear

Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-07 Thread Holden Karau
+1 pip installing seems to function :) On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang wrote: > +1. > > On Thu, Sep 7, 2023 at 10:33 PM yangjie01 > wrote: > >> +1 >> >> >> >> *From*: Gengliang Wang >> *Date*: Thursday, September 7, 2023 12:53 >> *To*: Yuanjian Li >> *Cc*: Xiao Li ,

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-09-02 Thread Holden Karau
Can we delay the next RC cut until after Labor Day? On Sat, Sep 2, 2023 at 9:59 PM Yuanjian Li wrote: > Thank you for all the reports! > The vote has failed. I plan to cut RC4 in two days. > > @Dipayan Dev I quickly skimmed through the > corresponding ticket, and it doesn't seem to be a

[jira] [Created] (SPARK-44992) Add support for rack information from an environment variable

2023-08-28 Thread Holden Karau (Jira)
Holden Karau created SPARK-44992: Summary: Add support for rack information from an environment variable Key: SPARK-44992 URL: https://issues.apache.org/jira/browse/SPARK-44992 Project: Spark

Re: Elasticsearch support for Spark 3.x

2023-08-27 Thread Holden Karau
What’s the version of the ES connector you are using? On Sat, Aug 26, 2023 at 10:17 AM Dipayan Dev wrote: > Hi All, > > We're using Spark 2.4.x to write dataframe into the Elasticsearch index. > As we're upgrading to Spark 3.3.0, it throwing out error > Caused by:

[jira] [Created] (SPARK-44970) Spark History File Uploads Can Fail on S3

2023-08-25 Thread Holden Karau (Jira)
Holden Karau created SPARK-44970: Summary: Spark History File Uploads Can Fail on S3 Key: SPARK-44970 URL: https://issues.apache.org/jira/browse/SPARK-44970 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-44955) Add the option for dynamically marking containers for preemption based data

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44955: Summary: Add the option for dynamically marking containers for preemption based data Key: SPARK-44955 URL: https://issues.apache.org/jira/browse/SPARK-44955 Project

[jira] [Created] (SPARK-44954) Make DEA algorithms pluggable

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44954: Summary: Make DEA algorithms pluggable Key: SPARK-44954 URL: https://issues.apache.org/jira/browse/SPARK-44954 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2023-08-24 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44953: - Parent: SPARK-44951 Issue Type: Sub-task (was: Improvement) > Log a warn

[jira] [Created] (SPARK-44953) Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44953: Summary: Log a warning (or automatically disable) when shuffle tracking is enabled along side another DA supported mechanism Key: SPARK-44953 URL: https://issues.apache.org/jira

[jira] [Created] (SPARK-44951) Improve Spark Dynamic Allocation

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44951: Summary: Improve Spark Dynamic Allocation Key: SPARK-44951 URL: https://issues.apache.org/jira/browse/SPARK-44951 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-44950) Improve Spark Driver Launch Time

2023-08-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44950: Summary: Improve Spark Driver Launch Time Key: SPARK-44950 URL: https://issues.apache.org/jira/browse/SPARK-44950 Project: Spark Issue Type: Improvement

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Holden Karau

[jira] [Created] (SPARK-44769) Add SQL statement to create an empty array with a type

2023-08-10 Thread Holden Karau (Jira)
Holden Karau created SPARK-44769: Summary: Add SQL statement to create an empty array with a type Key: SPARK-44769 URL: https://issues.apache.org/jira/browse/SPARK-44769 Project: Spark Issue

[jira] [Updated] (SPARK-42035) Add a config flag to force exit on JDK major version mismatch

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42035: - Target Version/s: 4.0.0 > Add a config flag to force exit on JDK major version misma

[jira] [Updated] (SPARK-42261) K8s will not allocate more execs if there are any pending execs until next snapshot

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42261: - Target Version/s: 4.0.0 > K8s will not allocate more execs if there are any pending execs un

[jira] [Updated] (SPARK-44511) Allow insertInto to succeed with partion columns specified when they match those on the target table

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44511: - Target Version/s: 4.0.0 > Allow insertInto to succeed with partion columns specified w

[jira] [Updated] (SPARK-42361) Add an option to use external storage to distribute JAR set in cluster mode on Kube

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42361: - Target Version/s: 4.0.0 > Add an option to use external storage to distribute JAR

[jira] [Updated] (SPARK-42260) Log when the K8s Exec Pods Allocator Stalls

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42260: - Target Version/s: 4.0.0 > Log when the K8s Exec Pods Allocator Sta

[jira] [Commented] (SPARK-44727) Improve the error message for dynamic allocation conditions

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752496#comment-17752496 ] Holden Karau commented on SPARK-44727: -- Do you have more context [~chengpan] ? > Improve the er

[jira] [Updated] (SPARK-42035) Add a config flag to force exit on JDK major version mismatch

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-42035: - Description: JRE version mismatches can cause errors which are difficult to debug (potentially
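A sketch of the kind of check the description points at, assuming the driver and executors each report their Java `java.version` string and that a boolean flag decides between warning and exiting. The flag and function names here are hypothetical, not existing Spark configuration. The only subtle part is parsing: pre-JDK 9 versions use the legacy `1.x` scheme.

```python
import re
import warnings


def java_major_version(version: str) -> int:
    """Parse a java.version string: "1.8.0_392" -> 8, "17.0.9" -> 17, "21-ea" -> 21."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised Java version string: {version!r}")
    first = int(match.group(1))
    # Pre-JDK 9 releases report "1.<major>...".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def check_jdk_major_versions(driver: str, executor: str, force_exit: bool) -> bool:
    """Return True when major versions match; otherwise warn, or exit if force_exit."""
    driver_major = java_major_version(driver)
    executor_major = java_major_version(executor)
    if driver_major == executor_major:
        return True
    message = f"JDK major version mismatch: driver={driver_major}, executor={executor_major}"
    if force_exit:
        # Hypothetical "force exit" behaviour: fail fast instead of hitting
        # hard-to-debug serialization errors later.
        raise SystemExit(message)
    warnings.warn(message)
    return False
```

Comparing only major versions keeps patch-level differences (e.g. 17.0.2 vs 17.0.9) from tripping the check.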

[jira] [Updated] (SPARK-34337) Reject disk blocks when out of disk space

2023-08-09 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-34337: - Target Version/s: 4.0.0 > Reject disk blocks when out of disk sp

[jira] [Created] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-08-08 Thread Holden Karau (Jira)
Holden Karau created SPARK-44735: Summary: Log a warning when inserting columns with the same name by row that don't match up Key: SPARK-44735 URL: https://issues.apache.org/jira/browse/SPARK-44735
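Context for this issue: `DataFrameWriter.insertInto` matches columns by position, not by name, so a DataFrame with the same column names as the target table but in a different order silently writes values into the wrong columns. A minimal sketch of a detection heuristic, assuming we only warn when both sides have the same set of names (which suggests a reordering rather than a deliberate positional mapping). The helper names are hypothetical; comparison is case-insensitive to mirror Spark's default `spark.sql.caseSensitive=false`.

```python
import warnings
from typing import List, Sequence, Tuple


def positional_name_mismatches(source: Sequence[str],
                               target: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Return (position, source_name, target_name) for each by-position pair
    whose names differ, but only when both sides use the same set of names."""
    src = [c.lower() for c in source]
    tgt = [c.lower() for c in target]
    if len(src) != len(tgt) or set(src) != set(tgt):
        return []  # names genuinely differ; positional insert may be intended
    return [(i, source[i], target[i]) for i in range(len(src)) if src[i] != tgt[i]]


def warn_if_reordered(source: Sequence[str], target: Sequence[str]) -> List[Tuple[int, str, str]]:
    mismatches = positional_name_mismatches(source, target)
    if mismatches:
        pairs = ", ".join(f"{s} -> {t}" for _, s, t in mismatches)
        warnings.warn(f"insert by position pairs differently named columns: {pairs}")
    return mismatches
```

For source columns `["b", "a", "c"]` and target columns `["a", "b", "c"]` this flags positions 0 and 1, while unrelated names such as `["x", "y"]` vs `["a", "b"]` are left alone.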

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
Mich Talebzadeh >>>> wrote: >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> From what I have seen spark on a serverless cluster has hard up getting >>>> the driver going in a time

Re: ASF board report draft for August 2023

2023-08-08 Thread Holden Karau
Maybe add a link to the 4.0 JIRA where we are tracking the current plans for 4.0? On Tue, Aug 8, 2023 at 9:33 AM Dongjoon Hyun wrote: > Thank you, Matei. > > It looks good to me. > > Dongjoon > > On Mon, Aug 7, 2023 at 22:54 Matei Zaharia > wrote: > >> It’s time to send our quarterly report to

Re: Dynamic allocation does not deallocate executors

2023-08-08 Thread Holden Karau

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-08-07 Thread Holden Karau
Oooh fascinating. I’m going on call this week so it will take me a while, but I do want to review this :) On Mon, Aug 7, 2023 at 5:30 PM Pavan Kotikalapudi wrote: > Hi Spark Dev, > > I have extended traditional DRA to work for structured streaming > use-case. > > Here is an initial Implementation

Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
Oh great point On Mon, Aug 7, 2023 at 2:23 PM bo yang wrote: > Thanks Holden for bringing this up! > > Maybe another thing to think about is how to make dynamic allocation more > friendly with Kubernetes and disaggregated shuffle storage? > > > > On Mon, Aug 7, 2023

[jira] [Commented] (SPARK-44050) Unable to Mount ConfigMap in Driver Pod - ConfigMap Creation Issue

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751796#comment-17751796 ] Holden Karau commented on SPARK-44050: -- Ah interesting, it sounds like the fix would be to retry

[jira] [Commented] (SPARK-44508) Add user guide for Python UDTFs

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751795#comment-17751795 ] Holden Karau commented on SPARK-44508: -- I'm not sure this should be a blocker. > Add user gu

Improving Dynamic Allocation Logic for Spark 4+

2023-08-07 Thread Holden Karau
So I'm wondering if there is interest in revisiting some of how Spark does its dynamic allocation for Spark 4+? Some things that I've been thinking about: - Advisory user input (e.g. a way to say after X is done I know I need Y where Y might be a bunch of GPU machines) - Configurable

Re: Dynamic allocation does not deallocate executors

2023-08-07 Thread Holden Karau
I think you need to change "spark.dynamicAllocation.shuffleTracking.enabled=true" to false. On Mon, Aug 7, 2023 at 2:50 AM Mich Talebzadeh wrote: > Yes I have seen cases where the driver gone but a couple of executors > hanging on. Sounds like a code issue. > > HTH > > Mich Talebzadeh, > Solutions
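Some background on why that setting matters: with `spark.dynamicAllocation.shuffleTracking.enabled=true`, executors that hold shuffle data are only released once that shuffle is garbage collected, because `spark.dynamicAllocation.shuffleTracking.timeout` defaults to infinity. Turning shuffle tracking off generally means dynamic allocation needs another way to preserve shuffle files, typically the external shuffle service (`spark.shuffle.service.enabled=true`). The stdlib-only sketch below builds `spark-submit` `--conf` arguments for either setup; the helper names and the chosen timeout values are assumptions for illustration, not recommendations.

```python
from typing import Dict, List


def dynamic_allocation_conf(use_external_shuffle_service: bool,
                            idle_timeout: str = "60s",
                            shuffle_timeout: str = "30min") -> Dict[str, str]:
    """Return Spark settings for dynamic allocation (example values only)."""
    conf = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.executorIdleTimeout": idle_timeout,
    }
    if use_external_shuffle_service:
        # Shuffle files outlive executors, so tracking is not needed.
        conf["spark.shuffle.service.enabled"] = "true"
        conf["spark.dynamicAllocation.shuffleTracking.enabled"] = "false"
    else:
        # Keep tracking, but bound how long shuffle-holding executors linger.
        # Releasing them early can force recomputation of lost shuffle data.
        conf["spark.dynamicAllocation.shuffleTracking.enabled"] = "true"
        conf["spark.dynamicAllocation.shuffleTracking.timeout"] = shuffle_timeout
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit arguments, sorted for stable output."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args
```

For example, `to_submit_args(dynamic_allocation_conf(True))` yields the four `--conf` pairs to append to a `spark-submit` command line.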

[jira] [Commented] (SPARK-24282) Add support for PMML export for the Standard Scaler Stage

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751758#comment-17751758 ] Holden Karau commented on SPARK-24282: -- I don't think we're going to do this anymore. > Add supp

[jira] [Resolved] (SPARK-28740) Add support for building with bloop

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-28740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-28740. -- Resolution: Won't Fix > Add support for building with bl

[jira] [Commented] (SPARK-32111) Cleanup locks and docs in CoarseGrainedSchedulerBackend

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751754#comment-17751754 ] Holden Karau commented on SPARK-32111: -- I think this could be a good target for Spark 4.0

[jira] [Updated] (SPARK-32111) Cleanup locks and docs in CoarseGrainedSchedulerBackend

2023-08-07 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-32111: - Target Version/s: 4.0.0 Affects Version/s: 4.0.0 > Cleanup locks and d

[jira] [Updated] (SPARK-44578) Support pushing down BoundFunction in DSv2

2023-07-31 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau updated SPARK-44578: - Description: See [https://github.com/apache/iceberg/pull/7886#discussion_r1257537662

[jira] [Created] (SPARK-44592) Add history link to previous run on SS HA

2023-07-28 Thread Holden Karau (Jira)
Holden Karau created SPARK-44592: Summary: Add history link to previous run on SS HA Key: SPARK-44592 URL: https://issues.apache.org/jira/browse/SPARK-44592 Project: Spark Issue Type

[jira] [Created] (SPARK-44578) Support pushing down UDFs in DSv2

2023-07-27 Thread Holden Karau (Jira)
Holden Karau created SPARK-44578: Summary: Support pushing down UDFs in DSv2 Key: SPARK-44578 URL: https://issues.apache.org/jira/browse/SPARK-44578 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-37562) Add Spark History Server Links for Kubernetes & other CMs

2023-07-26 Thread Holden Karau (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747676#comment-17747676 ] Holden Karau commented on SPARK-37562: -- So (in theory) the cluster administrator has some base

[jira] [Created] (SPARK-44529) Add a flag to resolve docker tags to hashes at launch time

2023-07-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-44529: Summary: Add a flag to resolve docker tags to hashes at launch time Key: SPARK-44529 URL: https://issues.apache.org/jira/browse/SPARK-44529 Project: Spark

[jira] [Created] (SPARK-44511) Allow insertInto to succeed with partion columns specified when they match those on the target table

2023-07-21 Thread Holden Karau (Jira)
Holden Karau created SPARK-44511: Summary: Allow insertInto to succeed with partion columns specified when they match those on the target table Key: SPARK-44511 URL: https://issues.apache.org/jira/browse/SPARK

Re: [VOTE][SPIP] Python Data Source API

2023-07-07 Thread Holden Karau
+1 On Fri, Jul 7, 2023 at 9:55 AM huaxin gao wrote: > +1 > > On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh > wrote: > >> +1 for me >> >> Mich Talebzadeh, >> Solutions Architect/Engineering Lead >> Palantir Technologies Limited >> London >> United Kingdom >> >> >>view my Linkedin profile

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
? On Wed, Jun 21, 2023 at 8:30 AM Reynold Xin wrote: > +1 > > This is a great idea. > > > On Wed, Jun 21, 2023 at 8:29 AM, Holden Karau > wrote: > >> I’d like to start with a +1, better Python testing tools integrated into >> the project make sense. >> >

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Holden Karau
I’d like to start with a +1, better Python testing tools integrated into the project make sense. On Wed, Jun 21, 2023 at 8:11 AM Amanda Liu wrote: > Hi all, > > I'd like to start the vote for SPIP: PySpark Test Framework. > > The high-level summary for the SPIP is that it proposes an official

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Holden Karau
ut if it only entails changing to >>> Scala 2.13 and dropping support for JDK 8, then we could also just release >>> a month after 3.5. >>> >>> How about we do this? We get 3.5 released, and afterwards we do a couple >>> of meetings where we build this road

Re: Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Yup, I think building consensus on what goes in 4.X is something we’ll need to do. On Mon, Jun 12, 2023 at 11:56 AM Dongjoon Hyun wrote: > Thank you for sharing those. I'm also interested in taking advantage of > it. Also, I hope `spark-upgrade` can help us in line with Spark 4.0. > > However,

Gauging interest in: ScalaFix + Scala Steward for Spark 4.0

2023-06-12 Thread Holden Karau
Myself and a few folks have been working on a spark-upgrade project (focused on getting folks onto current versions of Spark). Since it looks like we're starting the discussion around Spark 4, I was thinking now could be a good time for us to consider if we want to try and integrate auto-upgrade

Re: [VOTE] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-12 Thread Holden Karau
-0 I'd like to see more of a doc around what we're planning on for a 4.0 before we pick a target release date etc. (feels like cart before the horse). But it's a weak preference. On Mon, Jun 12, 2023 at 11:24 AM Xiao Li wrote: > Thanks for starting the vote. > > I do have a concern about the

Re: JDK version support policy?

2023-06-07 Thread Holden Karau
So JDK 11 is still supported in OpenJDK until 2026. I'm not sure if we're going to see enough folks moving to JRE 17 by the Spark 4 release; unless we have a strong benefit from dropping 11 support, I'd be inclined to keep it. On Tue, Jun 6, 2023 at 9:08 PM Dongjoon Hyun wrote: > I'm also +1 on

Re: ASF policy violation and Scala version issues

2023-06-06 Thread Holden Karau
So I think if the Spark PMC wants to ask Databricks something that could be reasonable (although I'm a little fuzzy as to the ask), but that conversation might belong on private@ (I could be wrong of course). On Tue, Jun 6, 2023 at 3:29 AM Mich Talebzadeh wrote: > I concur with you Sean. > > If

Re: Is there any way to set the parallelism of operators like group by, join?

2023-04-16 Thread Holden Karau
To a (small) degree Spark's “new” AQE might be able to help, depending on what kind of operations Beam is compiling it down to. Have you tried setting spark.sql.adaptive.enabled & spark.sql.adaptive.coalescePartitions.enabled? On Mon, Apr 17, 2023 at 10:34 AM Reuven Lax via user wrote: > I see.
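Both settings named in the reply are real Spark SQL options (and both default to true since Spark 3.2). Roughly, partition coalescing merges adjacent small shuffle partitions after a shuffle stage completes, so each resulting task reads about `spark.sql.adaptive.advisoryPartitionSizeInBytes` of data; that is how AQE can lower the effective parallelism of a group-by or join below `spark.sql.shuffle.partitions`. The greedy model below is a simplification for illustration only. Spark's actual `CoalesceShufflePartitions` rule also honours minimum partition sizes and coalesces multiple shuffles consistently.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily group adjacent shuffle partition indices so that each group's
    total size stays at or below target_bytes (a single oversized partition
    still forms its own group)."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Start a new group when adding this partition would overshoot.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

With partition sizes `[10, 10, 10, 50, 5, 5]` and a target of 30, six shuffle partitions become three tasks: `[[0, 1, 2], [3], [4, 5]]`. Only adjacent partitions are merged, which keeps each task reading a contiguous range of reducer outputs.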

Re: Slack for Spark Community: Merging various threads

2023-04-07 Thread Holden Karau
I think there was some concern around how to make any sync channel show up in logs / index / search results? On Fri, Apr 7, 2023 at 9:41 AM Dongjoon Hyun wrote: > Thank you, All. > > I'm very satisfied with the focused and right questions for the real > issues by removing irrelevant claims. :)
