I'm +1 to separate modules for spark-2 and spark-3 after the 0.8 release. I think adopting Spark 3 would be a big change for organizations, since it brings in Scala 2.12, which is binary incompatible with previous Scala versions, so this adoption could take a lot of time. I know that in our company we have no near-term plans to move to Spark 3.
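For concreteness, the two-module split being discussed could be sketched roughly like this in Gradle (the module names, Spark coordinates, and versions here are illustrative assumptions, not the actual Iceberg build):

```groovy
// settings.gradle -- hypothetical module layout
include ':iceberg-spark2'
include ':iceberg-spark3'

// iceberg-spark2/build.gradle -- Spark 2.4.x is built against Scala 2.11
dependencies {
    compileOnly 'org.apache.spark:spark-sql_2.11:2.4.5'
}

// iceberg-spark3/build.gradle -- Spark 3.0.x moves to Scala 2.12,
// which is binary incompatible with Scala 2.11
dependencies {
    compileOnly 'org.apache.spark:spark-sql_2.12:3.0.0'
}
```

A single build could then produce both runtime jars, distinguished by artifact name. The catch raised in the thread is that a version-lock plugin like gradle-consistent-versions wants one spark-sql version across the whole build, which this layout deliberately violates.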
-Best,
R.

On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:

> I was wondering whether it is possible to limit the version lock plugin to
> only the Iceberg core related subprojects; it seems the current
> consistent-versions plugin doesn't allow that. So I'm not sure if there
> are other plugins that could provide similar functionality with more
> flexibility?
>
> Any suggestions on this?
>
> Best regards,
> Saisai
>
> Saisai Shao <sai.sai.s...@gmail.com> wrote on Thu, Mar 5, 2020 at 3:12 PM:
>
>> I think the requirement of supporting different versions should be quite
>> common. As a table format, Iceberg should be adaptable to different
>> engines like Hive, Flink, and Spark. Supporting different versions is a
>> real problem; Spark is just one case, and Hive and Flink could also be
>> affected if their interfaces change across major versions. Version
>> locking may also cause problems when several engines coexist in the same
>> build, since they transitively introduce lots of dependencies which may
>> conflict. It may be hard to find one version that satisfies all of them,
>> even though they are usually confined to a single module.
>>
>> So I think we should figure out a way to support such scenarios, not
>> just maintain branches one by one.
>>
>> Ryan Blue <rb...@netflix.com> wrote on Thu, Mar 5, 2020 at 2:53 AM:
>>
>>> I think the key is that this wouldn't be using the same published
>>> artifacts. This work would create a spark-2.4 artifact and a spark-3.0
>>> artifact (and possibly a spark-common artifact).
>>>
>>> It seems reasonable to me to have those in the same build instead of
>>> in separate branches, as long as the Spark dependencies are not leaked
>>> outside of the modules. That said, I'd rather keep the additional
>>> checks that baseline provides in general, since this is a short-term
>>> problem. It would just be nice if we could have versions that are
>>> confined to a single module.
>>> The Nebula plugin that baseline uses claims to support that, but I
>>> couldn't get it to work.
>>>
>>> On Wed, Mar 4, 2020 at 6:38 AM Saisai Shao <sai.sai.s...@gmail.com>
>>> wrote:
>>>
>>>> Just thinking a bit about this. I agree that introducing different
>>>> versions of the same dependency could generally be error-prone, but I
>>>> don't think that applies here:
>>>>
>>>> 1. The two sub-modules, spark-2 and spark-3, are isolated; neither
>>>> depends on the other.
>>>> 2. They can be differentiated by name when generating jars, and they
>>>> will not be relied on by other modules in Iceberg.
>>>>
>>>> So the dependency issue should not arise here, and in Maven this
>>>> could be achieved easily. Please correct me if I'm wrong.
>>>>
>>>> Best regards,
>>>> Saisai
>>>>
>>>> Saisai Shao <sai.sai.s...@gmail.com> wrote on Wed, Mar 4, 2020 at 10:01 AM:
>>>>
>>>>> Thanks Matt,
>>>>>
>>>>> If branching is the only choice, then we would potentially have two
>>>>> *master* branches until spark-3 is widely adopted. That would
>>>>> increase the maintenance burden and lead to inconsistency. I'm OK
>>>>> with the branching approach; I just think we should have a clear way
>>>>> to keep the two branches in sync.
>>>>>
>>>>> Best,
>>>>> Saisai
>>>>>
>>>>> Matt Cheah <mch...@palantir.com.invalid> wrote on Wed, Mar 4, 2020 at 9:50 AM:
>>>>>
>>>>>> I think it's generally dangerous and error-prone to try to support
>>>>>> two versions of the same library in the same build and in the same
>>>>>> published artifacts. This is the stance that Baseline
>>>>>> <https://github.com/palantir/gradle-baseline> + Gradle Consistent
>>>>>> Versions <https://github.com/palantir/gradle-consistent-versions>
>>>>>> take. Gradle Consistent Versions is specifically opinionated
>>>>>> towards building against one version of a library across all
>>>>>> modules in the build.
>>>>>>
>>>>>> I would think that branching would be the best way to build and
>>>>>> publish against multiple versions of a dependency.
>>>>>>
>>>>>> -Matt Cheah
>>>>>>
>>>>>> *From: *Saisai Shao <sai.sai.s...@gmail.com>
>>>>>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>>> *Date: *Tuesday, March 3, 2020 at 5:45 PM
>>>>>> *To: *Iceberg Dev List <dev@iceberg.apache.org>
>>>>>> *Cc: *Ryan Blue <rb...@netflix.com>
>>>>>> *Subject: *Re: [Discuss] Merge spark-3 branch into master
>>>>>>
>>>>>> I didn't realize that Gradle cannot support two different versions
>>>>>> in one build. I think I did something similar for Livy, building
>>>>>> Scala 2.10 and 2.11 jars simultaneously with Maven. I'm not so
>>>>>> familiar with Gradle, but I can take a shot at finding some hacky
>>>>>> way to make it work.
>>>>>>
>>>>>> Besides, are we saying that we will move to spark-3 support after
>>>>>> the 0.8 release in the master branch, replacing spark-2, or that we
>>>>>> will maintain two branches for spark-2 and spark-3 and make two
>>>>>> releases? From my understanding, the adoption of spark-3 may not be
>>>>>> fast, and there are still lots of users who stick with spark-2.
>>>>>> Ideally, it would be better to support both versions for the near
>>>>>> future.
>>>>>>
>>>>>> Thanks
>>>>>> Saisai
>>>>>>
>>>>>> Mass Dosage <massdos...@gmail.com> wrote on Wed, Mar 4, 2020 at 1:33 AM:
>>>>>>
>>>>>> +1 for a 0.8.0 release with Spark 2.4 and then moving on to Spark
>>>>>> 3.0 when it's ready.
>>>>>>
>>>>>> On Tue, 3 Mar 2020 at 16:32, Ryan Blue <rb...@netflix.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks for bringing this up, Saisai. I tried to do this a couple of
>>>>>> months ago, but ran into a problem with dependency locks.
>>>>>> I couldn't get two different versions of Spark packages in the
>>>>>> build with baseline, but maybe I was missing something. If you can
>>>>>> get it working, I think it's a great idea to get this into master.
>>>>>>
>>>>>> Otherwise, I was thinking about proposing an 0.8.0 release in the
>>>>>> next month or so based on Spark 2.4. Then we could merge the branch
>>>>>> into master and do another release for Spark 3.0 when it's ready.
>>>>>>
>>>>>> rb
>>>>>>
>>>>>> On Tue, Mar 3, 2020 at 6:07 AM Saisai Shao <sai.sai.s...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> I was thinking of merging the spark-3 branch into master; per the
>>>>>> earlier discussion, we could make spark-2 and spark-3 coexist as
>>>>>> two different sub-modules. With this, one build could generate both
>>>>>> spark-2 and spark-3 runtime jars, and users could pick whichever
>>>>>> they prefer.
>>>>>>
>>>>>> One concern is that they share lots of common code in the
>>>>>> read/write path, which will increase the maintenance overhead of
>>>>>> keeping the two copies consistent.
>>>>>>
>>>>>> So I'd like to hear your thoughts. Any suggestions?
>>>>>>
>>>>>> Thanks
>>>>>> Saisai
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
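The Maven-profile approach Saisai mentions from Livy (one source tree, two Scala/Spark targets selected at build time) could be sketched like this; the property names, versions, and coordinates are illustrative assumptions, not an actual Livy or Iceberg POM:

```xml
<!-- Hypothetical pom.xml fragment: profiles switch the Spark/Scala pair -->
<profiles>
  <profile>
    <id>spark-2</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <spark.version>2.4.5</spark.version>
      <scala.binary.version>2.11</scala.binary.version>
    </properties>
  </profile>
  <profile>
    <id>spark-3</id>
    <properties>
      <spark.version>3.0.0</spark.version>
      <scala.binary.version>2.12</scala.binary.version>
    </properties>
  </profile>
</profiles>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With a setup like this, `mvn package -Pspark-3` would produce the Spark 3 variant, though each variant still requires a separate build invocation rather than one build emitting both jars, which is what the sub-module layout in this thread aims for.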