Do you mean you want to have a breaking API change between 3.0 and 3.1? I believe we follow Semantic Versioning ( https://spark.apache.org/versioning-policy.html ).

> We just won’t add any breaking changes before 3.1.

Bests,
Dongjoon.

On Fri, Sep 20, 2019 at 11:48 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I don’t think we need to gate a 3.0 release on making a more stable version of InternalRow

> Sounds like we agree, then. We will use it for 3.0, but there are known problems with it.

> Thinking we’d have dsv2 working in both 3.x (which will change and progress towards more stable, but will have to break certain APIs) and 2.x seems like a false premise.

> Why do you think we will need to break certain APIs before 3.0?

> I’m only suggesting that we release the same support in a 2.5 release that we do in 3.0. Since we are nearly finished with the 3.0 goals, it seems like we can certainly do that. We just won’t add any breaking changes before 3.1.

> On Fri, Sep 20, 2019 at 11:39 AM Reynold Xin <r...@databricks.com> wrote:

>> I don't think we need to gate a 3.0 release on making a more stable version of InternalRow, but thinking we'd have dsv2 working in both 3.x (which will change and progress towards more stable, but will have to break certain APIs) and 2.x seems like a false premise.

>> To point out some problems with InternalRow that you think are already pragmatic and stable:

>> The class is in catalyst, which states:
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala

>> /**
>>  * Catalyst is a library for manipulating relational query plans. All classes in catalyst are
>>  * considered an internal API to Spark SQL and are subject to change between minor releases.
>>  */

>> There is not even an annotation on the interface.

>> The entire dependency chain was created to be private and is tightly coupled with internal implementations. For example:

>> https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java

>> /**
>>  * A UTF-8 String for internal Spark use.
>>  * <p>
>>  * A String encoded in UTF-8 as an Array[Byte], which can be used for comparison,
>>  * search, see http://en.wikipedia.org/wiki/UTF-8 for details.
>>  * <p>
>>  * Note: This is not designed for general use cases, should not be used outside SQL.
>>  */

>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala

>> (which again is in the catalyst package)

>> If you want to argue this way, you might as well argue we should make the entire catalyst package public to be pragmatic and not allow any changes.

>> On Fri, Sep 20, 2019 at 11:32 AM, Ryan Blue <rb...@netflix.com> wrote:

>>> When you created the PR to make InternalRow public

>>> This isn’t quite accurate. The change I made was to use InternalRow instead of UnsafeRow, which is a specific implementation of InternalRow. Exposing this API has always been a part of DSv2, and while both you and I did some work to avoid this, we are still in the phase of starting with that API.

>>> Note that any change to InternalRow would be very costly to implement because this interface is widely used. That is why I think we can certainly consider it stable enough to use here, and that’s probably why UnsafeRow was part of the original proposal.

>>> In any case, the goal for 3.0 was not to replace the use of InternalRow, it was to get the majority of SQL working on top of the interface added after 2.4. That’s done and stable, so I think a 2.5 release with it is also reasonable.
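>>> To make that concrete, here is a rough sketch (not code from any real connector; written against the 3.0-style read API, so package and class names would differ slightly on a 2.4 backport) of a partition reader that produces InternalRow directly, without depending on UnsafeRow:

>>>   import org.apache.spark.sql.catalyst.InternalRow
>>>   import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
>>>   import org.apache.spark.sql.connector.read.PartitionReader
>>>   import org.apache.spark.unsafe.types.UTF8String
>>>
>>>   // Sketch only: a reader for a single string column. Any InternalRow
>>>   // implementation works here; nothing ties the source to UnsafeRow.
>>>   class WordReader(words: Seq[String]) extends PartitionReader[InternalRow] {
>>>     private val iter = words.iterator
>>>     override def next(): Boolean = iter.hasNext
>>>     override def get(): InternalRow =
>>>       new GenericInternalRow(Array[Any](UTF8String.fromString(iter.next())))
>>>     override def close(): Unit = ()
>>>   }

>>> Note that even this minimal example has to touch catalyst and unsafe types (InternalRow, UTF8String), which is exactly the coupling being discussed.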
>>> On Fri, Sep 20, 2019 at 11:23 AM Reynold Xin <r...@databricks.com> wrote:

>>> To push back, while I agree we should not drastically change "InternalRow", there are a lot of changes that need to happen to make it stable. For example, none of the publicly exposed interfaces should be in the Catalyst package or the unsafe package. External implementations should be decoupled from the internal implementations, with cheap ways to convert back and forth.
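>>> As a purely hypothetical illustration of "decoupled, with cheap conversion" (none of these types exist in Spark; the names are invented for the sake of the example):

>>>   import org.apache.spark.sql.catalyst.InternalRow
>>>
>>>   // Hypothetical public row abstraction: no catalyst or unsafe types in its signatures.
>>>   trait StableRow {
>>>     def numFields: Int
>>>     def isNullAt(ordinal: Int): Boolean
>>>     def getInt(ordinal: Int): Int
>>>     def getString(ordinal: Int): String  // plain java.lang.String instead of UTF8String
>>>   }
>>>
>>>   // Thin adapter from the internal representation to the public one
>>>   // (getString still pays a copy when converting out of UTF8String).
>>>   class InternalRowAdapter(row: InternalRow) extends StableRow {
>>>     override def numFields: Int = row.numFields
>>>     override def isNullAt(ordinal: Int): Boolean = row.isNullAt(ordinal)
>>>     override def getInt(ordinal: Int): Int = row.getInt(ordinal)
>>>     override def getString(ordinal: Int): String = row.getUTF8String(ordinal).toString
>>>   }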
>>> When you created the PR to make InternalRow public, the understanding was to work towards making it stable in the future, assuming we would start with an unstable API temporarily. You can't just take a bunch of internal APIs that are tightly coupled with other internal pieces, make them public and stable, and call it a day, just because they happen to satisfy some use cases temporarily, assuming the rest of Spark doesn't change.

>>> On Fri, Sep 20, 2019 at 11:19 AM, Ryan Blue <rb...@netflix.com> wrote:

>>> > DSv2 is far from stable right?

>>> No, I think it is reasonably stable and very close to being ready for a release.

>>> > All the actual data types are unstable and you guys have completely ignored that.

>>> I think what you're referring to is the use of `InternalRow`. That's a stable API and there has been no work to avoid using it. In any case, I don't think that anyone is suggesting that we delay 3.0 until a replacement for `InternalRow` is added, right?

>>> While I understand the motivation for a better solution here, I think the pragmatic solution is to continue using `InternalRow`.

>>> > If the goal is to make DSv2 work across 3.x and 2.x, that seems too invasive of a change to backport once you consider the parts needed to make dsv2 stable.

>>> I believe that those of us working on DSv2 are confident about the current stability. We set goals for what to get into the 3.0 release months ago and have very nearly reached the point where we are ready for that release.

>>> I don't think instability would be a problem in maintaining compatibility between the 2.5 version and the 3.0 version. If we find that we need to make API changes (other than additions), we can make those in the 3.1 release. The goals we set for the 3.0 release have been reached with the current API, so if we are ready to release 3.0, we can release a 2.5 with the same API.

>>> On Fri, Sep 20, 2019 at 11:05 AM Reynold Xin <r...@databricks.com> wrote:

>>> DSv2 is far from stable right? All the actual data types are unstable and you guys have completely ignored that. We'd need to work on that, and that will be a breaking change. If the goal is to make DSv2 work across 3.x and 2.x, that seems too invasive of a change to backport once you consider the parts needed to make dsv2 stable.

>>> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:

>>> Hi everyone,

>>> In the DSv2 sync this week, we talked about a possible Spark 2.5 release based on the latest Spark 2.4, but with DSv2 and Java 11 support added.

>>> A Spark 2.5 release with these two additions will help people migrate to Spark 3.0 when it is released because they will be able to use a single implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly, upgrading to 3.0 won't require also updating to Java 11, because users could update to Java 11 with the 2.5 release and have fewer major changes at once.

>>> Another reason to consider a 2.5 release is that many people are interested in a release with the latest DSv2 API and support for DSv2 SQL. I'm already going to be backporting DSv2 support to the Spark 2.4 line, so it makes sense to share this work with the community.

>>> This release line would just consist of backports like DSv2 and Java 11 that assist compatibility, to keep the scope of the release small. The purpose is to assist people moving to 3.0 and not distract from the 3.0 release.

>>> Would a Spark 2.5 release help anyone else? Are there any concerns about this plan?

>>> rb

>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix

> --
> Ryan Blue
> Software Engineer
> Netflix