DataSourceV2 sync notes - 12 June 2019

2019-06-14 Thread Ryan Blue
Here are the latest DSv2 sync notes. Please reply with updates or corrections. *Attendees*: Ryan Blue, Michael Armbrust, Gengliang Wang, Matt Cheah, John Zhuge. *Topics*: Wenchen’s reorganization proposal; Problems with TableProvider - property map isn’t sufficient; New PRs: - ReplaceTable:

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
Yeah, PyArrow is the only other PySpark dependency we check for a minimum version. We updated that not too long ago to be 0.12.1, which I think we are still good on for now. On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung wrote: > How about pyArrow? > > -- > *From:*

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Imran Rashid
+1 (binding) I think this is a really important feature for Spark. First, there is already a lot of interest in alternative shuffle storage in the community, from dynamic allocation in Kubernetes, to even just improving

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
How about pyArrow? From: Holden Karau Sent: Friday, June 14, 2019 11:06:15 AM To: Felix Cheung Cc: Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp Subject: Re: [DISCUSS] Increasing minimum supported version of Pandas Are there other Python

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Holden Karau
Are there other Python dependencies we should consider upgrading at the same time? On Fri, Jun 14, 2019 at 7:45 PM Felix Cheung wrote: > So to be clear, min version check is 0.23 > Jenkins test is 0.24 > > I’m ok with this. I hope someone will test 0.23 on releases though before > we sign off?

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Felix Cheung
So to be clear: min version check is 0.23, Jenkins test is 0.24. I’m ok with this. I hope someone will test 0.23 on releases though before we sign off? From: shane knapp Sent: Friday, June 14, 2019 10:23:56 AM To: Bryan Cutler Cc: Dongjoon Hyun; Holden Karau;

Re: Exposing JIRA issue types at GitHub PRs

2019-06-14 Thread Dongjoon Hyun
Now, you can see the exposed component labels (ordered by the number of PRs) here and click the component to search. https://github.com/apache/spark/labels?sort=count-desc Dongjoon. On Fri, Jun 14, 2019 at 1:15 AM Dongjoon Hyun wrote: > Hi, All. > > JIRA and PR is ready for reviews. > >

jQuery 3.4.1 update

2019-06-14 Thread Sean Owen
Just surfacing this change as it's probably pretty good to go, but, a) I'm not a jQuery / JS expert and b) we don't have comprehensive UI tests. https://github.com/apache/spark/pull/24843 I'd like to get us up to a modern jQuery for 3.0, to keep up with security fixes (which was the minor

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread shane knapp
excellent. i shall not touch anything. :) On Fri, Jun 14, 2019 at 10:22 AM Bryan Cutler wrote: > Shane, I think 0.24.2 is probably more common right now, so if we were to > pick one to test against, I still think it should be that one. Our Pandas > usage in PySpark is pretty conservative, so

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
Shane, I think 0.24.2 is probably more common right now, so if we were to pick one to test against, I still think it should be that one. Our Pandas usage in PySpark is pretty conservative, so it's pretty unlikely that we will add something that would break 0.23.X. On Fri, Jun 14, 2019 at 10:10 AM

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread Ilan Filonenko
+1 (non-binding). This API is versatile and flexible enough to handle Bloomberg's internal use-cases. The ability for us to vary implementation strategies is quite appealing. It is also worth noting the minimal changes to Spark core needed to make it work. This is a very much needed addition

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread shane knapp
ah, ok... should we downgrade the testing env on jenkins then? any specific version? shane, who is loath (and i mean LOATH) to touch python envs ;) On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler wrote: > I should have stated this earlier, but when the user does something that > requires

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
I should have stated this earlier, but when the user does something that requires Pandas, the minimum version is checked against what was imported and will raise an exception if it is a lower version. So I'm concerned that using 0.24.2 might be a little too new for users running older clusters. To
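The runtime check Bryan describes can be illustrated with a small sketch. This is not PySpark's actual code: `parse_version` and `require_minimum_version` are hypothetical names, raising `ImportError` is an assumption of this sketch, and the parser only understands plain dotted numeric versions (suffixes such as `rc1` are ignored). It compares the version string of an already-imported package against a required minimum and raises if the installed version is older.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading dotted-numeric part of a version string.

    '0.24.2' -> (0, 24, 2); '0.23.4rc1' -> (0, 23, 4). Any non-numeric
    suffix is ignored, which is sufficient for a minimum-version check.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def require_minimum_version(package: str, installed: str, minimum: str) -> None:
    """Raise ImportError if the installed version is older than the minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '0.23' compares equal to '0.23.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise ImportError(
            f"{package} >= {minimum} must be installed; "
            f"found version {installed}."
        )


if __name__ == "__main__":
    require_minimum_version("pandas", "0.24.2", "0.23.2")  # passes silently
    try:
        require_minimum_version("pandas", "0.22.0", "0.23.2")
    except ImportError as err:
        print(err)  # pandas >= 0.23.2 must be installed; found version 0.22.0.
```

In practice the `installed` argument would come from the imported module itself (for example `pandas.__version__`), and the same helper could enforce the PyArrow 0.12.1 floor mentioned elsewhere in the thread.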

Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-14 Thread bo yang
+1 This is great work, allowing different sort-shuffle write/read implementations to be plugged in! It is also great to see it retain the current Spark configuration (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl). On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah wrote: > Hi everyone, > >
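bo yang's message points to the existing `spark.shuffle.manager` property as the way to select an implementation. A minimal sketch of passing that property to `spark-submit` via its `--conf key=value` option follows; the helper `conf_args` and the script name `app.py` are hypothetical, and `org.apache.spark.shuffle.YourShuffleManagerImpl` is the placeholder from the message, not a real class.

```python
from typing import Dict, List


def conf_args(conf: Dict[str, str]) -> List[str]:
    """Render Spark properties as spark-submit `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# Placeholder class name taken from the mailing-list message; replace it with
# the fully qualified name of an actual ShuffleManager implementation.
SHUFFLE_CONF = {
    "spark.shuffle.manager": "org.apache.spark.shuffle.YourShuffleManagerImpl",
}

if __name__ == "__main__":
    # Build an example command line; "app.py" is a hypothetical application.
    print(" ".join(["spark-submit", *conf_args(SHUFFLE_CONF), "app.py"]))
```

The same property could instead be set with `pyspark.SparkConf().set(key, value)`; either way it has to be in place before the SparkContext starts, because the shuffle manager is instantiated at startup.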

Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-14 Thread Matt Cheah
We opened a thread for voting yesterday, so please participate! -Matt Cheah From: Yue Li Date: Thursday, June 13, 2019 at 7:22 PM To: Saisai Shao , Imran Rashid Cc: Matt Cheah , "Yifei Huang (PD)" , Mridul Muralidharan , Bo Yang , Ilan Filonenko , Imran Rashid , Justin Uang , Liang

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread shane knapp
just so everyone knows, our python 3.6 testing infra is currently on 0.24.2... On Fri, Jun 14, 2019 at 9:16 AM Dongjoon Hyun wrote: > +1 > > Thank you for this effort, Bryan! > > Bests, > Dongjoon. > > On Fri, Jun 14, 2019 at 4:24 AM Holden Karau wrote: > >> I’m +1 for upgrading, although

Re: [build system] upcoming jenkins downtime: august 3rd 2019

2019-06-14 Thread Dongjoon Hyun
Thank you for the early notice, Shane! :) Dongjoon On Fri, Jun 14, 2019 at 9:13 AM shane knapp wrote: > the campus colo will be performing some electrical maintenance, which > means that they'll be powering off the entire building. > > since the jenkins cluster is located in that colo, we are

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Dongjoon Hyun
+1 Thank you for this effort, Bryan! Bests, Dongjoon. On Fri, Jun 14, 2019 at 4:24 AM Holden Karau wrote: > I’m +1 for upgrading, although since this is probably the last easy chance > we’ll have to bump version numbers easily I’d suggest 0.24.2 > > > On Fri, Jun 14, 2019 at 4:38 AM Hyukjin

[build system] upcoming jenkins downtime: august 3rd 2019

2019-06-14 Thread shane knapp
the campus colo will be performing some electrical maintenance, which means that they'll be powering off the entire building. since the jenkins cluster is located in that colo, we are most definitely affected. :) i'll be out of town that weekend, but will have one of my sysadmins bring

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Holden Karau
I’m +1 for upgrading, although since this is probably the last easy chance we’ll have to bump version numbers, I’d suggest 0.24.2 On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon wrote: > I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and > pandas combinations. Spark 3

Re: Exposing JIRA issue types at GitHub PRs

2019-06-14 Thread Dongjoon Hyun
Hi, All. The JIRA issue and PR are ready for review. https://issues.apache.org/jira/browse/SPARK-28051 (Exposing JIRA issue component types at GitHub PRs) https://github.com/apache/spark/pull/24871 Bests, Dongjoon. On Thu, Jun 13, 2019 at 10:48 AM Dongjoon Hyun wrote: > Thank you for the feedbacks