Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Dongjoon Hyun
+1 for Matei's suggestion!

Bests,
Dongjoon.

On Sat, Sep 21, 2019 at 5:44 PM Matei Zaharia wrote:
> If the goal is to get people to try the DSv2 API and build DSv2 data
> sources, can we recommend the 3.0-preview release for this? That would get
> people shifting to 3.0 faster, which is

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Matei Zaharia
If the goal is to get people to try the DSv2 API and build DSv2 data sources, can we recommend the 3.0-preview release for this? That would get people shifting to 3.0 faster, which is probably better overall compared to maintaining two major versions. There’s not that much else changing in 3.0

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Ryan Blue
> If you insist we shouldn't change the unstable temporary API in 3.x . . .

Not what I'm saying at all. I said we should carefully consider whether a breaking change is the right decision in the 3.x line. All I'm suggesting is that we can make a 2.5 release with the feature and an API that is

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Reynold Xin
Because for example we'd need to move the location of InternalRow, breaking the package name. If you insist we shouldn't change the unstable temporary API in 3.x to maintain compatibility with 3.0, which is totally different from my understanding of the situation when you exposed it, then I'd

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Ryan Blue
Why would that require an incompatible change? We *could* make an incompatible change and remove support for InternalRow, but I think we would want to carefully consider whether that is the right decision. And in any case, we would be able to keep 2.5 and 3.0 compatible, which is the main goal.

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Reynold Xin
How would you not make incompatible changes in 3.x? As discussed the InternalRow API is not stable and needs to change.

On Sat, Sep 21, 2019 at 2:27 PM Ryan Blue wrote:
> > Making downstream to diverge their implementation heavily between minor
> versions (say, 2.4 vs 2.5) wouldn't be a good

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Ryan Blue
> Making downstream to diverge their implementation heavily between minor versions (say, 2.4 vs 2.5) wouldn't be a good experience

You're right that the API has been evolving in the 2.x line. But, it is now reasonably stable with respect to the current feature set and we should not need to break

Re: [DISCUSS] Spark 2.5 release

2019-09-21 Thread Ryan Blue
Thanks for pointing this out, Dongjoon. To clarify, I’m not suggesting that we can break compatibility. I’m suggesting that we make a 2.5 release that uses the same DSv2 API as 3.0. These APIs are marked unstable, so we could make changes to them if we needed — as we have done in the 2.x line —