Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
- builds FYI, cc'ing Spark dev, which was dropped during the discussion. If you haven't subscribed to builds@a.g, you have only seen part of the discussion. Please subscribe to the bui...@apache.org mailing list to participate in the discussion further. On Thu, Apr 8, 2021 at 1:50 PM, Wenchen Fan wrote: > > for

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Wenchen Fan
> for example, having sub-groups where each group shares the resources - currently one GitHub organisation shares all resources across the projects. That's a good idea. We do need to thank GitHub for giving free resources to ASF projects, but it's better if we can make it a business: we allow

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Liang-Chi Hsieh
I'm working on the fix for master. I think the fix is the same for 2.4. Okay. So I think we are in favor of RC2, and RC1 is dropped. I will get the fix merged first and then prepare RC2. Thank you. Liang-Chi Mridul Muralidharan wrote > Do we have a fix for this in 3.x/master which can

Re: [DISCUSS] Build error message guideline

2021-04-07 Thread Hyukjin Kwon
LGTM (I took a look, and had some offline discussions w/ some corrections before it came out) On Thu, Apr 8, 2021 at 5:28 AM, Karen wrote: > Hi all, > > As discussed in SPIP: Standardize Exception Messages in Spark ( >

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Mridul Muralidharan
Do we have a fix for this in 3.x/master which can be backported without too much surrounding change ? Given we are expecting 2.4.7 to probably be the last release for 2.4, if we can fix it, that would be great. Regards, Mridul On Wed, Apr 7, 2021 at 9:31 PM Liang-Chi Hsieh wrote: > Thanks for

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Liang-Chi Hsieh
Thanks for voting. After I had been running the release script to cut RC1 for a while, I found a nested column pruning bug, SPARK-34963, and unfortunately it exists in 2.4.7 too. Since RC1 was already cut, I am continuing with this vote. The bug looks like a corner case to me, and it has not been reported yet since we support

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Takeshi Yamamuro
Thanks for driving this, Liang-Chi~ IIUC there is no critical issue in the SQL part, so it looks fine. +1 (non-binding) On Thu, Apr 8, 2021 at 11:20 AM Wenchen Fan wrote: > +1 > > On Thu, Apr 8, 2021 at 9:24 AM Sean Owen wrote: > >> Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with

Re: Big Broadcast Hash Join with Dynamic Partition Pruning gives wrong results

2021-04-07 Thread Wenchen Fan
Hi Tomas, thanks for reporting this bug! Is it possible to share your dataset so that other people can reproduce and debug it? On Thu, Apr 8, 2021 at 7:52 AM Tomas Bartalos wrote: > when I try to do a Broadcast Hash Join on a bigger table (6Mil rows) I get > an incorrect result of 0 rows. > >

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Wenchen Fan
+1 On Thu, Apr 8, 2021 at 9:24 AM Sean Owen wrote: > Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with about all > profiles enabled. > I still get an odd failure in the Hive versions suite, but I keep seeing > that in my env and think it's something odd about my setup. > +1 >

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Sean Owen
Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with about all profiles enabled. I still get an odd failure in the Hive versions suite, but I keep seeing that in my env and think it's something odd about my setup. +1

Big Broadcast Hash Join with Dynamic Partition Pruning gives wrong results

2021-04-07 Thread Tomas Bartalos
when I try to do a Broadcast Hash Join on a bigger table (6Mil rows) I get an incorrect result of 0 rows. val rightDF = spark.read.format("parquet").load("table-a") val leftDF = spark.read.format("parquet").load("table-b") //needed to activate dynamic pruning subquery .where('part_ts ===

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-07 Thread huaxin gao
Hi Chang, Thanks for working on this. Could you please explain how your proposal can be extended to the file-based data sources? Since at least half of the Spark community are using file-based data sources, I think any designs should consider the file-based data sources as well. I work on both

[DISCUSS] Build error message guideline

2021-04-07 Thread Karen
Hi all, As discussed in SPIP: Standardize Exception Messages in Spark ( https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing), improving error message quality in Apache Spark involves establishing an error message guideline for developers. Error message

please read: current state and the future of the apache spark build system

2021-04-07 Thread shane knapp ☠
this will be a relatively big update, as there are many many moving pieces with short, medium and long term goals. TLDR1: we're shutting jenkins down at the end of 2021. TLDR2: i know we're way behind on pretty much everything. most of the hardware is at or beyond EOL, and random systemic

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
Thanks Martin for your feedback. > What was your reason to migrate from Apache Jenkins to Github Actions ? I am sure there were more reasons for migrating from AMPLab Jenkins to GitHub Actions, but as far as I can remember: - To reduce the maintenance

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Martin Grigorov
On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon wrote: > Hi Greg, > > I raised this thread to figure out a way that we can work together to > resolve this issue, gather feedback, and to understand how other projects > work around. > Several projects I observed, as far as I can tell, have made enough

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Hyukjin Kwon
Hi Greg, I raised this thread to figure out a way that we can work together to resolve this issue, gather feedback, and to understand how other projects work around. Several projects I observed, as far as I can tell, have made enough efforts to save the resources in GitHub Actions but still

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Greg Stein
On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon wrote: > Hi all, > > I am an Apache Spark PMC, You are a member of the Apache Spark PMC. You are *not* a PMC. Please stop with that terminology. The Foundation has about 200 PMCs, and you are a member of one of them. You are NOT a "PMC" .. you're a

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-07 Thread Jarek Potiuk
Just a comment here - as I also commented in the ticket, the document https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status gives a complete overview of where GitHub Actions stand for the ASF projects. And we have some nice experiences in Apache Airflow that we will be able to

Re: [Discuss][SPIP] DataSource V2 SQL push down

2021-04-07 Thread Chang Chen
hi huaxin, please review https://github.com/apache/spark/pull/32061. As for adding a *trait PrunedFilteredAggregateScan* for V1 JDBC, I deleted the trait, since V1 DataSources don't need to support aggregation push down. On Mon, Apr 5, 2021 at 10:02 PM, Chang Chen wrote: > Hi huaxin > > What I am concerned about is

[VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Liang-Chi Hsieh
Please vote on releasing the following candidate as Apache Spark version 2.4.8. The vote is open until Apr 10th at 9AM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.8 [ ] -1 Do not release this package because
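The pass condition quoted in the vote announcement (a majority of +1 PMC votes, with a minimum of three +1 votes) can be sketched as a small tally. This is a hypothetical model, not part of any Apache release tooling: it assumes only PMC (binding) votes count, that "majority" means binding +1 votes strictly outnumber binding -1 votes, and that each voter casts exactly one vote. The names `Vote`, `ReleaseVote`, and `passes` are illustrative.

```scala
// Hypothetical model of a release vote; not taken from any Apache tooling.
// value is +1, 0, or -1; binding = true means the voter is on the PMC.
final case class Vote(voter: String, value: Int, binding: Boolean)

object ReleaseVote {
  /** One reading of the rule: at least three binding +1 votes, and more
    * binding +1 votes than binding -1 votes. Non-binding votes and 0 votes
    * are ignored. Assumes each voter appears at most once. */
  def passes(votes: Seq[Vote]): Boolean = {
    val binding = votes.filter(_.binding)
    val plus    = binding.count(_.value == 1)
    val minus   = binding.count(_.value == -1)
    plus >= 3 && plus > minus
  }
}
```

Under this reading, two binding +1 votes plus any number of "+1 (non-binding)" votes would not be enough, while three binding +1 votes against two binding -1 votes would pass.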