Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-02 Thread Tom Graves
+1 Tom On Sunday, March 31, 2024 at 10:09:28 PM CDT, Ruifeng Zheng wrote: +1 On Mon, Apr 1, 2024 at 10:06 AM Haejoon Lee wrote: +1 On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon wrote: Hi all, I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark Connect) 

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Tom Graves
Similar to others, I will be interested in working out the APIs and details, but overall I am in favor of it. +1 Tom Graves On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan wrote: I am supportive of the proposal - this is a step in the right direction! Additional metadata

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Tom Graves
It is interesting. I think there are definitely some discussion points around this. Reliability vs. performance is always a trade-off, and it's great that it doesn't fail, but if it doesn't meet someone's SLA now, that could be just as bad if it's hard to figure out why. I think if something like this

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Tom Graves
+1 for a 3.3.4 EOL Release. Thanks Dongjoon. Tom On Friday, December 1, 2023 at 02:48:22 PM CST, Dongjoon Hyun wrote: Hi, All. Since the Apache Spark 3.3.0 RC6 vote passed on Jun 14, 2022, branch-3.3 has been maintained and served well until now. -

Re: Apache Spark 3.2.4 EOL Release?

2023-04-05 Thread Tom Graves
+1 Tom On Tuesday, April 4, 2023 at 12:25:13 PM CDT, Dongjoon Hyun wrote: Hi, All. Since Apache Spark 3.2.0 passed RC7 vote on October 12, 2021, branch-3.2 has been maintained and served well until now. - https://github.com/apache/spark/releases/tag/v3.2.0 (tagged on Oct 6, 2021) -

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Tom Graves
It looks like there are still blockers open; we need to make sure they are addressed before doing a release: https://issues.apache.org/jira/browse/SPARK-41793 https://issues.apache.org/jira/browse/SPARK-42444 Tom On Tuesday, February 21, 2023 at 10:35:45 PM CST, Xinrong Meng wrote:

Re: Depolying stage-level scheduling for Spark SQL

2022-10-03 Thread Tom Graves
experienced with the AQE part, would you list more potential challenges it may lead to? Thanks in advance and I would really appreciate it if you could give us more feedback! Cheers, Chenghao On Sep 30, 2022, 4:22 PM +0200, Tom Graves wrote: See the original SPIP as to why we only

Re: Depolying stage-level scheduling for Spark SQL

2022-09-30 Thread Tom Graves
See the original SPIP as to why we only support RDD: https://issues.apache.org/jira/browse/SPARK-27495 The main problem is exactly what you are referring to. The RDD level is not exposed to the user when using the SQL or DataFrame API. This is on purpose, and the user shouldn't have to know

Re: How to set platform-level defaults for array-like configs?

2022-08-11 Thread Tom Graves
A few years ago when I was doing more deployment management I kicked around the idea of having different types of configs or different ways to specify the configs.  Though one of the problems at the time was actually with users specifying a properties file and not picking up the

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Tom Graves
+1 Tom On Thursday, June 9, 2022, 11:27:50 PM CDT, Maxim Gekk wrote: Please vote on releasing the following candidate as Apache Spark version 3.3.0. The vote is open until 11:59pm Pacific time June 14th and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-27 Thread Tom Graves
+1. Ran through internal tests. Tom Graves On Tuesday, May 24, 2022, 12:13:56 PM CDT, Maxim Gekk wrote: Please vote on releasing the following candidate as Apache Spark version 3.3.0. The vote is open until 11:59pm Pacific time May 27th and passes if a majority +1 PMC votes are cast

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Tom Graves
Is there going to be an rc2? I thought there were a couple of issue mentioned in the thread. Tom On Tuesday, May 10, 2022, 11:53:36 AM CDT, Maxim Gekk wrote: Hi All, Today is the last day for voting. Please, test the RC1 and vote. Maxim Gekk Software Engineer Databricks, Inc. On

Re: Apache Spark 3.3 Release

2022-03-21 Thread Tom Graves
s, Max Gekk On Thu, Mar 17, 2022 at 4:59 PM Tom Graves wrote: Is the feature freeze target date March 22nd then? I saw a few dates thrown around and want to confirm what we landed on. I am trying to get the following improvements through review and merged; if there are concerns with either, let me know: - [SPARK-

Re: Apache Spark 3.3 Release

2022-03-17 Thread Tom Graves
thub.com/apache/spark/pull/35262) >> >> It's already reviewed and approved. >> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves >> wrote: >> > >> > It looks like the version hasn't been updated on master and still shows >> > 3.3.0-SNAPSHOT,

Re: Apache Spark 3.3 Release

2022-03-16 Thread Tom Graves
It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT; can you please update that? Tom On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk wrote: Hi All, I have created the branch for Spark 3.3:

Re: [DISCUSSION] SPIP: Support Volcano/Alternative Schedulers Proposal

2021-11-30 Thread Tom Graves
Great to have other integrations and improved K8s support. Left some comments/questions in the design doc. Tom On Tuesday, November 30, 2021, 02:46:42 AM CST, Yikun Jiang wrote: Hey everyone, I'd like to start a discussion on "Support Volcano/Alternative Schedulers Proposal". This

Re: Update Spark 3.3 release window?

2021-10-28 Thread Tom Graves
+1 for updating, mid March sounds good. I'm also fine with EOL 2.x. Tom On Thursday, October 28, 2021, 09:37:00 AM CDT, Mridul Muralidharan wrote: +1 to EOL 2.x. Mid March sounds like a good placeholder for 3.3. Regards, Mridul On Wed, Oct 27, 2021 at 10:38 PM Sean Owen wrote: Seems

Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-17 Thread Tom Graves
Thanks, I didn't see that one. Tom On Friday, September 17, 2021, 10:45:36 AM CDT, Gengliang Wang wrote: Hi Tom, I will cut RC3 right after SPARK-36772 is resolved. Thanks, Gengliang On Fri, Sep 17, 2021 at 10:03 PM Tom Graves wrote: Hey folks, just curious what the status

Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-17 Thread Tom Graves
Hey folks, just curious what the status was on doing an rc3? I didn't see any blockers left since it looks like the parquet change got merged. Thanks, Tom On Thursday, September 9, 2021, 12:27:58 PM CDT, Mridul Muralidharan wrote: I have filed a blocker, SPARK-36705 which will need to

Re: -1s on committed but not released code?

2021-08-20 Thread Tom Graves
So personally I think it's fine to comment post-merge, but I think an issue should also be filed (that might just be me though). This change was reviewed and committed, so if someone found a problem with it, then it should be officially tracked as a bug. I would think a -1 on an already

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-18 Thread Tom Graves
+1 Ran through some internal tests. Thanks, Tom On Thursday, June 17, 2021, 05:11:21 AM CDT, Yi Wu wrote: Please vote on releasing the following candidate as Apache Spark version 3.0.3. The vote is open until Jun 21th 3AM (PST) and passes if a majority +1 PMC votes are cast, with a

Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-24 Thread Tom Graves
So repartition() would look at some other config (spark.sql.adaptive.advisoryPartitionSizeInBytes) to decide the size to partition on, then? Does it require AQE? If so, what does a repartition() call do if AQE is not enabled? This is essentially a new API, so would repartitionBySize

Re: Please take a look at the draft of the Spark 3.1.1 release notes

2021-03-01 Thread Tom Graves
Thanks Hyukjin, overall they look good to me. Tom On Saturday, February 27, 2021, 05:00:42 PM CST, Jungtaek Lim wrote: Thanks Hyukjin! I've only looked into the SS part, and added a comment. Otherwise it looks great! On Sat, Feb 27, 2021 at 7:12 PM Dongjoon Hyun wrote: Thank you for

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Tom Graves
On Wed, Feb 3 at 2:36 AM, Tom Graves wrote: Just curious if we have an update on the next rc? Is there a jira for the tpcds issue? Thanks, Tom On Wednesday, January 27, 2021, 05:46:27 PM CST, Hyukjin Kwon wrote: Just to share the current status, most of the known issues were resolved. Let me

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Tom Graves
fixed at https://github.com/apache/spark/pull/31287 On Fri, Jan 22, 2021 at 4:34 AM Tom Graves wrote: +1 built from tarball, verified sha and regular CI and tests all pass. Tom On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon wrote: Please vote on releasing the following

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Tom Graves
+1 built from tarball, verified sha and regular CI and tests all pass. Tom On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon wrote: Please vote on releasing the following candidate as Apache Spark version 3.1.1. The vote is open until January 22nd 4PM PST and passes if a

Re: Removing references to Master

2021-01-19 Thread Tom Graves
Thanks for the interest. I haven't had time to work on replacing Master; hopefully for the next release, but it is time dependent. If you follow the jira - https://issues.apache.org/jira/browse/SPARK-32333 - I will post there when I start, or if someone else picks it up you should see activity there. Tom

Re: [VOTE] Release Spark 3.1.0 (RC1)

2021-01-06 Thread Tom Graves
I think it makes sense to wait and see what they say on INFRA-21266. In the meantime, hopefully people can start testing it, and if no other problems are found and the vote passes it can stay published. It seems like the 2 issues above wouldn't be blockers in my opinion and could be handled in a 3.1.1

Re: [build system] WE'RE LIVE!

2020-12-04 Thread Tom Graves
Thanks Shane and folks for great work. Not sure if this is at all related but I noticed the spark master deploy job hasn't been running and the last one Dec 2nd failed: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/3186/ Not sure if this is

Re: Spark branch-3.1

2020-12-04 Thread Tom Graves
Can we update the version number on the master branch? It's still 3.1.0-SNAPSHOT. Thanks, Tom On Friday, December 4, 2020, 04:54:12 AM CST, Hyukjin Kwon wrote: Hi all, It’s 4th PDT and branch-3.1 is cut out now as planned. Mid Dec 2020 QA period. Focus on bug fixes, tests,

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-10 Thread Tom Graves
+1 since it's a correctness issue, I think it's ok to change the behavior to make sure the user is aware of it and let them decide. Tom On Saturday, November 7, 2020, 01:00:11 AM CST, Liang-Chi Hsieh wrote: Hi devs, In Spark structured streaming, chained stateful operators possibly

Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-14 Thread Tom Graves
+1 Tom On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan wrote: Hi, I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based shuffle to improve shuffle efficiency. Please take a look at: - SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602

Re: [VOTE] Release Spark 3.0.1 (RC3)

2020-08-31 Thread Tom Graves
+1 Tom On Friday, August 28, 2020, 09:02:31 AM CDT, 郑瑞峰 wrote: Please vote on releasing the following candidate as Apache Spark version 3.0.1. The vote is open until Sep 2nd at 9AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this

Re: Renaming blacklisting feature input

2020-08-25 Thread Tom Graves
Any other feedback here? The couple I've heard preferred in various conversations are excludeList and blockList. If not, I'll just make a proposal on jira and continue the discussion there, and anyone interested can watch this jira. Thanks, Tom On Tuesday, August 4, 2020, 09:19:01 AM CDT, Tom

Re: Removing references to Master

2020-08-25 Thread Tom Graves
backwards compatible initially is important since we missed the boat on Spark 3. I like the Controller/Leader one since I think that does a good job of reflecting the code's role. On Tue, Aug 4, 2020 at 7:01 AM Tom Graves wrote: Hey everyone, I filed jira https://issues.apache.org/jira/browse/S

Re: 回复: [DISCUSS] Apache Spark 3.0.1 Release

2020-08-25 Thread Tom Graves
Hey, I'm just curious what the status of the 3.0.1 release is? Do we have some blockers we are waiting on? Thanks, Tom On Sunday, August 16, 2020, 09:07:44 PM CDT, ruifengz wrote: Thanks for letting us know this issue. On 8/16/20 11:31 PM, Takeshi Yamamuro wrote: I've

Re: [VOTE] Release Spark 2.4.7 (RC1)

2020-08-21 Thread Tom Graves
There is a correctness issue with caching that should go into this if possible: https://github.com/apache/spark/pull/29506 Tom On Wednesday, August 19, 2020, 11:18:37 AM CDT, Wenchen Fan wrote: I think so. I don't see other bug reports for 2.4. On Thu, Aug 20, 2020 at 12:11 AM

Renaming blacklisting feature input

2020-08-04 Thread Tom Graves
Hey Folks, We have jira https://issues.apache.org/jira/browse/SPARK-32037 to rename the blacklisting feature. It would be nice to come to a consensus on what we want to call that. It doesn't look like we have any references to whitelist other than from other components. There is some

Removing references to Master

2020-08-04 Thread Tom Graves
Hey everyone, I filed jira https://issues.apache.org/jira/browse/SPARK-32333 to remove references to Master. I realize this is a bigger change than the slave jira but I wanted to get folks' input on whether they are ok with making the change, and if so we would need to pick a name to use instead. I

Re: [DISCUSS] Amend the commiter guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-24 Thread Tom Graves
+1 Tom On Tuesday, July 21, 2020, 03:35:18 PM CDT, Holden Karau wrote: Hi Spark Developers, There has been a rather active discussion regarding the specific vetoes that occurred during Spark 3. From that I believe we are now mostly in agreement that it would be best to clarify our

Re: [VOTE] Decommissioning SPIP

2020-07-06 Thread Tom Graves
+1 Tom On Wednesday, July 1, 2020, 08:05:47 PM CDT, Holden Karau wrote: Hi Spark Devs, I think discussion has settled on the SPIP doc at  https://docs.google.com/document/d/1EOei24ZpVvR7_w0BwBjOnrWRy4k-qTdIlx60FsHZSHA/edit?usp=sharing  , design doc at 

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-30 Thread Tom Graves
Stage Level Scheduling - https://issues.apache.org/jira/browse/SPARK-27495 Tom On Monday, June 29, 2020, 11:07:18 AM CDT, Dongjoon Hyun wrote: Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask you the community opinion on Apache Spark 3.1 feature expectations.

Re: [vote] Apache Spark 3.0 RC3

2020-06-17 Thread Tom Graves
(binding) Mridul Muralidharan (binding) Takeshi Yamamuro Maxim Gekk Matei Zaharia (binding) Jungtaek Lim Denny Lee Russell Spitzer Dongjoon Hyun (binding) DB Tsai (binding) Michael Armbrust (binding) Tom Graves (binding) Bryan Cutler Huaxin Gao Jiaxin Shan Xingbo Jiang Xiao Li (binding) Hyukjin Kwon

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Tom Graves
+1 Tom On Saturday, June 6, 2020, 03:09:09 PM CDT, Reynold Xin wrote: Please vote on releasing the following candidate as Apache Spark version 3.0.0. The vote is open until [DUE DAY] and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Tom Graves
 +1 Tom On Sunday, May 31, 2020, 06:47:09 PM CDT, Holden Karau wrote: Please vote on releasing the following candidate as Apache Spark version 2.4.6. The vote is open until June 5th at 9AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-18 Thread Tom Graves
+1. Tom On Monday, May 18, 2020, 08:05:24 AM CDT, Wenchen Fan wrote: +1, no known blockers. On Mon, May 18, 2020 at 12:49 AM DB Tsai wrote: +1 as well. Thanks. On Sun, May 17, 2020 at 7:39 AM Sean Owen wrote: +1 , same response as to the last RC. This looks like it includes the

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Tom Graves
a friendly because > > using Java instance is already documented in the official Scala > > documentation. > > Users still need to search if we have Java specific methods for *some* > > APIs. > > > > > > > > On Thu, 30 Apr 2020, 00:06 Tom Graves, wrote:

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Tom Graves
t of this thread is to make a call rather then defer to the > > > >>> future. > > > >>> > > > >>> On Mon, 27 Apr 2020, 23:15 Wenchen Fan, wrote: > > > >>> > > > >>>> IIUC We are moving aw

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Tom Graves
I agree general guidance is good so we stay consistent in the APIs. I don't necessarily agree that 4 is the best solution though. I agree it's nice to have one API, but it is less friendly for the Scala side. Searching for the equivalent Java API shouldn't be hard as it should be very close

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-10 Thread Tom Graves
Overall makes sense to me, but I have the same questions as others on the thread. Is this only applying to stable APIs? How are we going to apply it to 3.0? The way I read this, the proposal isn't really saying we can't break APIs on major releases, it's just saying spend more time making sure it's worth

Re: GitHub action permissions

2020-02-28 Thread Tom Graves
No, I couldn't see that button; it looks like the process of syncing in gitbox didn't finish with my accounts. I finished that and it's working now. Thanks, Tom On Friday, February 28, 2020, 09:39:12 AM CST, Dongjoon Hyun wrote: Hi, Thomas. If you log-in with a GitHub account registered

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-27 Thread Tom Graves
In general +1. I think these are good guidelines, and making it easier to upgrade is beneficial to everyone. The decision needs to happen at API/config change time; otherwise the deprecation warning has no purpose if we are never going to remove them. That said we still need to be able to remove

Re: Apache Spark Docker image repository

2020-02-06 Thread Tom Graves
When discussions of docker have occurred in the past - mostly related to k8s - there is a lot of discussion about what is the right image to publish, as well as making sure Apache is ok with it. Apache official release is the source code so we may need to make sure to have disclaimer and we

Re: `Target Version` management on correctness/data-loss Issues

2020-01-28 Thread Tom Graves
nly way to detect the community decision change. Bests, Dongjoon. On Mon, Jan 27, 2020 at 11:12 AM Tom Graves wrote: Thanks for bringing this up. A) I'm not clear on this one as to why affected and target would be different initially, other than the reasons target versions != fixed versions. Is

Re: `Target Version` management on correctness/data-loss Issues

2020-01-27 Thread Tom Graves
Thanks for bringing this up. A) I'm not clear on this one as to why affected and target would be different initially, other than the reasons target versions != fixed versions. Is the intention here just to say, if it's already been discussed and came to consensus not needed in certain release?

Re: Correctness and data loss issues

2020-01-22 Thread Tom Graves
on big endian Without the official Apache Spark 2.4.5 binaries, there is no official way to deliver the 9 correctness fixes in (2) to the users. In addition, usually, the correctness fixes are independent to each other. Bests, Dongjoon. On Wed, Jan 22, 2020 at 7:02 AM Tom Graves wrote: I agree, I

Re: Adding Maven Central mirror from Google to the build?

2020-01-22 Thread Tom Graves
+1 for proposal. Tom On Tuesday, January 21, 2020, 04:37:04 PM CST, Sean Owen wrote: See https://github.com/apache/spark/pull/27307 for some context. We've had to add, in at least one place, some settings to resolve artifacts from a mirror besides Maven Central to work around some

Re: Correctness and data loss issues

2020-01-22 Thread Tom Graves
I agree, I think we just need to go through all of them and individually assess each one. If it's really a correctness issue we should hold 3.0 for it. On the 2.4 release I didn't see an explanation on https://issues.apache.org/jira/browse/SPARK-26154 why it can't be back ported, I think in the

Re: PR lint-scala jobs failing with http error

2020-01-16 Thread Tom Graves
jobs.  thanks! shane (at a conference) On Thu, Jan 16, 2020 at 11:16 AM Tom Graves wrote: > > I'm seeing the scala-lint jobs fail on the pull request builds with: > > [error] [FATAL] Non-resolvable parent POM: Could not transfer artifact > org.apache:apache:pom:18 from/to ce

PR lint-scala jobs failing with http error

2020-01-16 Thread Tom Graves
I'm seeing the scala-lint jobs fail on the pull request builds with: [error] [FATAL] Non-resolvable parent POM: Could not transfer artifact org.apache:apache:pom:18 from/to central ( http://repo.maven.apache.org/maven2): Error transferring file: Server returned HTTP response code: 501 for URL:

Reviewers for Stage level Scheduling prs

2020-01-08 Thread Tom Graves
ResourceProfile they are created with by tgravescs · Pull Request #26682 · apache/spark Regards, Tom Graves

Re: Spark 3.0 preview release 2?

2019-12-10 Thread Tom Graves
+1 for another preview Tom On Monday, December 9, 2019, 12:32:29 AM CST, Xiao Li wrote: I got many great feedbacks from the community about the recent 3.0 preview  release. Since the last 3.0 preview release, we already have 353 commits

Re: Build customized resource manager

2019-11-08 Thread Tom Graves
I don't know if it all works but some work was done to make cluster manager pluggable, see SPARK-13904. Tom On Wednesday, November 6, 2019, 07:22:59 PM CST, Klaus Ma wrote: Any suggestions? - Klaus On Mon, Nov 4, 2019 at 5:04 PM Klaus Ma wrote: Hi team, AFAIK, we built

maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Tom Graves
It looks like maven 3.6.1 was removed from the repo - see SPARK-28960. It looks like they pushed 3.6.2, but I don't see any release notes on the maven page for 3.6.2. Seems like we had this happen before; I can't remember if it was maven or something else. Does anyone remember or know if they are

Re: DISCUSS [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-08-26 Thread Tom Graves
Bumping this up. I'm guessing people haven't had time to review; it would be great to get feedback on this. Thanks, Tom On Tuesday, August 6, 2019, 2:27:49 PM CDT, Tom Graves wrote: Hey everyone, I have been working on coming up with a proposal for supporting stage level resource

DISCUSS [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-08-06 Thread Tom Graves
Hey everyone, I have been working on coming up with a proposal for supporting stage level resource configuration and scheduling.  The basic idea is to allow the user to specify executor and task resource requirements for each stage to allow the user to control the resources required at a finer

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-21 Thread Tom Graves
+1 (binding) I haven't looked at the low level api, but like the idea and approach to get it started. Tom On Tuesday, June 18, 2019, 10:40:34 PM CDT, Guo, Chenzhao wrote:

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-29 Thread Tom Graves
of discussions we have come down to just the public API. If the community thinks a new set of public API is maintainable, I don’t see any problem with that. From: Tom Graves Sent: Sunday, May 26, 2019 8:22:59 AM To: hol...@pigscanfly.ca; Reynold Xin Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
uld really give concrete ETL cases to prove that it is important for us to do so. On Mon, Apr 22, 2019 at 8:27 AM Tom Graves wrote: Since there is still discussion and Spark Summit is this week, I'm going to extend the vote until Friday the 26th. Tom On Monday, April 22, 2019, 8:44:00 AM CDT, B

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
> > Processing Support > > > >  > > > > + (non-binding) > > > > Sent from my iPhone > > > > Pardon the dumb thumb typos :) > > > > > > On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote: > > > > +1 (non-b

[VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Tom Graves
... Thanks! Tom Graves

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Tom Graves
?  then I will vote +0. On Tue, Mar 5, 2019 at 8:25 AM Tom Graves wrote: So to me most of the questions here are implementation/design questions, I've had this issue in the past with SPIP's where I expected to have more high level design details but was basically told that belongs in the design jira

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Tom Graves
to extend the existing resource allocation mechanisms to handle domain-specific resources, but it does feel to me like we should at least be considering doing that deeper redesign. On Thu, Mar 21, 2019 at 7:33 AM Tom Graves wrote: The proposal here is that all your resources are static

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Tom Graves
orm, in some release? and (2) is it *possible* to do this in a safe way?  then I will vote +0. On Tue, Mar 5, 2019 at 8:25 AM Tom Graves wrote: So to me most of the questions here are implementation/design questions, I've had this issue in the past with SPIP's where I expected to have more high l

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Tom Graves
So to me most of the questions here are implementation/design questions, I've had this issue in the past with SPIP's where I expected to have more high level design details but was basically told that belongs in the design jira follow on. This makes me think we need to revisit what a SPIP

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Tom Graves
+1 for the SPIP. Tom On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang wrote: Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The 

Re: Jenkins commands?

2019-02-07 Thread Tom Graves
" \ -Pyarn \ -Phive \ -Phive-thriftserver \ -Pkinesis-asl \ -Pmesos \ --fail-at-end \ test There are some specific rise/amp-lab variables involved (grep -r AMPLAB spark/*) for the build system, but this should cover it. On Wed, Feb 6, 2019 at 3:55 PM Tom Graves wrote: I'm c

Jenkins commands?

2019-02-06 Thread Tom Graves
I'm curious if we have it documented anywhere or if there is a good place to look, what exact commands Spark runs in the pull request builds and the QA builds? Thanks, Tom

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Tom Graves
This makes sense to me and was going to propose something similar in order to be able to use the kafka acls more effectively as well, can you file a jira for it? Tom On Friday, November 9, 2018, 2:26:12 AM CST, Anastasios Zouzias wrote: Hi all, I run in the following situation with

Re: Test and support only LTS JDK release?

2018-11-07 Thread Tom Graves
+1 seems reasonable at this point. Tom On Tuesday, November 6, 2018, 1:24:16 PM CST, DB Tsai wrote: Given Oracle's new 6-month release model, I feel the only realistic option is to only test and support JDK such as JDK 11 LTS and future LTS release. I would like to have a discussion

Re: What's a blocker?

2018-10-25 Thread Tom Graves
aybe it's reasonable to draw the "must" vs "should" line between them. On Thu, Oct 25, 2018 at 8:51 AM Tom Graves wrote: So just to clarify a few things in case people didn't read the entire thread in the PR, the discussion is what is the criteria for a blocker and reall

Re: What's a blocker?

2018-10-25 Thread Tom Graves
So just to clarify a few things in case people didn't read the entire thread in the PR, the discussion is what is the criteria for a blocker and really my concerns are what people are using as criteria for not marking a jira as a blocker. The only thing we have documented to mark a jira as a

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-17 Thread Tom Graves
as blocker by default.  There is also a label to mark the jira as having something needing to go into the release-notes. Tom On Tuesday, August 14, 2018, 3:32:27 PM CDT, Imran Rashid wrote: +1 on what we should do. On Mon, Aug 13, 2018 at 3:06 PM, Tom Graves wrote: > I mean, w

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
t do it, if it's to an active release branch (see below). Anything that important has to outweigh most any other concern, like behavior changes. On Mon, Aug 13, 2018 at 11:08 AM Tom Graves wrote: I'm not really sure what you mean by this, this proposal is to introduce a process for this type

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
n important question as an aside, one we haven't answered: when does a branch go inactive? I am sure 2.0.x is inactive, de facto, along with all 1.x. I think 2.1.x is inactive too. Should we put any rough guidance in place? a branch is maintained for 12-18 months? On Mon, Aug 13, 2018 at 8:45

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Tom Graves
I agree with Imran, we need to fix SPARK-23243 and any correctness issues for that matter. Tom On Wednesday, August 8, 2018, 9:06:43 AM CDT, Imran Rashid wrote: On Tue, Aug 7, 2018 at 8:39 AM, Wenchen Fan wrote: SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect

[DISCUSS] Handling correctness/data loss jiras

2018-08-13 Thread Tom Graves
se Thanks, Tom Graves

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-07 Thread Tom Graves
I would like to get clarification on our avro compatibility story before the release.  anyone interested please look at -  https://issues.apache.org/jira/browse/SPARK-24924 . I probably should have filed a separate jira and can if we don't resolve via discussion there. Tom  On Tuesday,

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-30 Thread Tom Graves
Shouldn't this be a discuss thread? I'm also happy to see more release managers and agree the time is getting close, but we should see what features are in progress and see how close things are and propose a date based on that. Cutting a branch too soon just creates more work for committers

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-20 Thread Tom Graves
fyi, I merged in a couple jira that were critical (and I thought would be good to include in the next release) that if we spin another RC will get included, we should update the jira SPARK-24755 and SPARK-24677, if anyone disagrees we could back those out but I think they would be good to

[ANNOUNCE] Apache Spark 2.2.2

2018-07-10 Thread Tom Graves
We are happy to announce the availability of Spark 2.2.2! Apache Spark 2.2.2 is a maintenance release, based on the branch-2.2 maintenance branch of Spark. We strongly recommend all 2.2.x users to upgrade to this stable release. The release notes are available at 

[RESULT] [VOTE] Spark 2.2.2 (RC2)

2018-07-02 Thread Tom Graves
The vote passes. Thanks to all who helped with the release! I'll start publishing everything tomorrow, and an announcement will be sent when artifacts have propagated to the mirrors (probably early next week). +1 (* = binding): - Marcelo Vanzin * - Sean Owen * - Tom Graves * - Holden Karau

Re: [VOTE] Spark 2.2.2 (RC2)

2018-07-02 Thread Tom Graves
On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote: +1 from me too. On Wed, Jun 27, 2018 at 3:31 PM Tom Graves wrote: Please vote on releasing the following candidate as Apache Spark version 2.2.2. The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a majority +1 PMC vote

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Tom Graves
fix (in time) for 2.1.2? http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-1-2-RC2-tt22540.html#a22555 Since it isn’t a regression I’d say +1 from me. From: Tom Graves Sent: Thursday, June 28, 2018 6:56:16 AM To: Marcelo Vanzin; Felix Cheung Cc: dev Subject: Re: [VOTE] Spark

Re: [VOTE] Spark 2.1.3 (RC2)

2018-06-28 Thread Tom Graves
: Yes, this is broken with newer version of R. We check explicitly for warning for the R check which should fail the test run. From: Marcelo Vanzin Sent: Wednesday, June 27, 2018 6:55 PM To: Felix Cheung Cc: Marcelo Vanzin; Tom Graves; dev Subject: Re: [VOTE] Spark 2.1.3 (RC2) Not sure I

[VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Tom Graves
That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. -- Tom Graves

Re: Time for 2.1.3

2018-06-15 Thread Tom Graves
+1 for doing a 2.1.3 release. Tom On Wednesday, June 13, 2018, 7:28:26 AM CDT, Marco Gaido wrote: Yes, you're right Herman. Sorry, my bad. Thanks. Marco 2018-06-13 14:01 GMT+02:00 Herman van Hövell tot Westerflier : Isn't this only a problem with Spark 2.3.x? On Wed, Jun 13, 2018 at

Time for 2.2.2 release

2018-06-06 Thread Tom Graves
(by replying here or updating the bug in Jira), otherwise I'm volunteering to prepare the first RC soon-ish (by early next week since Spark Summit is this week). Thanks! Tom Graves

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-01 Thread Tom Graves
Testing with Spark 2.3, I see a difference in the SQL coalesce talking to Hive vs Spark 2.2. It seems Spark 2.3 ignores the coalesce. Query: spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-06 Thread Tom Graves
+1 for the idea and feature, but I think the design is definitely lacking detail on the internal changes needed and how the execution pieces work and the communication.  Are you planning on posting more of those details or were you just planning on discussing in PR? Tom On Wednesday,
