Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-11 Thread mhawes
Hi angers.zhu, reviving this thread to say that, while it's not ideal (as it recomputes the last stage), I think the `SizeBasedCoaleaser` solution seems like a good option. If you don't mind re-raising that PR, that would be great. Alternatively, I'm happy to make the PR based on your previous PR?

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Tianchen Zhang
Thanks everyone for the input. Yes, it makes sense that metadata backup/restore should be done outside Spark. We will provide customers with documentation on how it can be done and leave the implementation to them. Thanks, Tianchen On Tue, May 11, 2021 at 1:14 AM Mich Talebzadeh wrote:

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Liang-Chi Hsieh
The staging repository for this release can be accessed now too: https://repository.apache.org/content/repositories/orgapachespark-1383/ Thanks for the guidance. Liang-Chi Hsieh wrote > Seems it is closed now after clicking close button in the UI.

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Liang-Chi Hsieh
It seems to be closed now after clicking the close button in the UI. Sean Owen-2 wrote > Is there a separate process that pushes to maven central? That's what we > have to have in the end. > > On Tue, May 11, 2021, 12:31 PM Liang-Chi Hsieh > viirya@ > wrote: > >> I don't know what will happens

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Liang-Chi Hsieh
Oh, I see. We cannot release from it while it is still in open status. Okay, let me try to close it manually via the UI. Sean Owen-2 wrote > Is there a separate process that pushes to maven central? That's what we > have to have in the end. > > On Tue, May 11, 2021, 12:31 PM Liang-Chi Hsieh > viirya@

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
Is there a separate process that pushes to maven central? That's what we have to have in the end. On Tue, May 11, 2021, 12:31 PM Liang-Chi Hsieh wrote: > I don't know what will happens if I manually close it now. > > Not sure if the current status cause a problem? If not, maybe leave as it >

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Liang-Chi Hsieh
I don't know what will happen if I manually close it now. Not sure if the current status causes a problem? If not, maybe leave it as it is? Sean Owen-2 wrote > Hm, yes I see it at > http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee=on=index > but not on keyserver.ubuntu.com for

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Mridul Muralidharan
+1 Signatures, digests, etc. check out fine. Checked out the tag and built/tested. Regards, Mridul On Sun, May 9, 2021 at 4:22 PM Liang-Chi Hsieh wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until May 14th at 9AM PST and passes if

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
Hm, yes, I see it at http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee=on=index but not on keyserver.ubuntu.com for some reason. What happens if you try to close it again, perhaps even manually in the UI there? I don't want to click it in case it messes up the workflow On Tue, May

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Liang-Chi Hsieh
I did upload my public key to https://dist.apache.org/repos/dist/dev/spark/KEYS. I also uploaded it to the public keyserver before cutting RC1. I also just tried searching for the public key and could find it. cloud0fan wrote > [image: image.png] > > I checked the log in

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Mich Talebzadeh
From my experience of dealing with metadata for other applications like Hive, an external database for Spark metadata would be useful if needed. However, the maintenance and upgrade of that database should be external to Spark (left to the user) and as usual some form of reliable API or JDBC

Re: [Spark Catalog API] Support for metadata Backup/Restore

2021-05-11 Thread Wenchen Fan
That's my expectation as well. Spark needs a reliable catalog. Backup/restore is just an implementation detail of how you make your catalog reliable, which should be transparent to Spark. On Sat, May 8, 2021 at 6:54 AM ayan guha wrote: > Just a consideration: > > Is there a value in

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Wenchen Fan
I checked the log in https://repository.apache.org/#stagingRepositories; it seems the GPG key was not uploaded to the public keyserver. Liang-Chi, can you take a look? On Tue, May 11, 2021 at 3:47 PM Wenchen Fan wrote: > +1 > > On Tue, May 11, 2021 at 2:59 AM Holden Karau wrote:

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Wenchen Fan
+1 On Tue, May 11, 2021 at 2:59 AM Holden Karau wrote: > +1 - pip install with Py 2.7 works (with the understandable warnings > regarding Python 2.7 no longer being maintained). > > On Mon, May 10, 2021 at 11:18 AM sarutak wrote: > > > > +1 (non-binding) > > > > - Kousuke > > > > > It looks

Re: Bintray replacement for spark-packages.org

2021-05-11 Thread Dongjoon Hyun
Thank you, Yi and all. Then, after the 2.4.8 release, shall we start to roll 3.1.2 and 3.0.3? Bests, Dongjoon. On Mon, May 10, 2021 at 10:50 PM Yi Wu wrote: > Hi wenchen, > > I'd like to volunteer for Apache Spark 3.0.3 release. > > Thanks, > Yi > > On Fri, Apr 30, 2021 at 12:37 AM Dongjoon Hyun