Hi Ryan,

As mentioned in the meeting, could you point me to a way to exclude some submodules from the consistent-versions plugin?

Thanks,
Saisai
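For illustration only: the consistent-versions plugin pins one version per library for the whole build, and the thread below suggests there is no supported per-module opt-out. A minimal plain-Gradle sketch of what scoping a Spark version to a single module looks like without the plugin, where the module name and Spark version are hypothetical placeholders:

    // iceberg-spark3/build.gradle (hypothetical module), plain Gradle only
    configurations.all {
        resolutionStrategy {
            // force this module's Spark version without constraining other modules
            force 'org.apache.spark:spark-sql_2.12:3.0.0-preview2'
        }
    }

Whether anything like this can coexist with the plugin's build-wide lock file is exactly the open question in this thread.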
On Wed, Mar 18, 2020 at 4:14 AM, Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> I am +1 on having spark-2 and spark-3 modules as well.
>
> On 7 Mar 2020, at 15:03, RD <rdsr...@gmail.com> wrote:
>
> I'm +1 on separate modules for spark-2 and spark-3 after the 0.8 release. I think it would be a big change for organizations to adopt Spark 3, since it brings in Scala 2.12, which is binary-incompatible with previous Scala versions, so adoption could take a lot of time. I know that in our company we have no near-term plans to move to Spark 3.
>
> -Best,
> R.
>
> On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> I was wondering whether it is possible to limit the version-lock plugin to only the Iceberg core subprojects; it seems the current consistent-versions plugin doesn't allow that. Are there other plugins that provide similar functionality with more flexibility?
>>
>> Any suggestions on this?
>>
>> Best regards,
>> Saisai
>>
>> On Thu, Mar 5, 2020 at 3:12 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>
>>> I think the requirement to support different versions should be quite common. Iceberg is a table format that should be adapted to different engines like Hive, Flink, and Spark. Supporting multiple versions is a real problem: Spark is just one case, and Hive and Flink could be affected as well if their interfaces change across major versions. Version locking can also cause problems when several engines coexist in the same build, because they transitively pull in lots of dependencies that may conflict. It may be hard to find one version that satisfies all of them, and usually those dependencies are confined to a single module anyway.
>>>
>>> So I think we should figure out a way to support this scenario, not just maintain branches one by one.
>>>
>>> On Thu, Mar 5, 2020 at 2:53 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> I think the key is that this wouldn't be using the same published artifacts. This work would create a spark-2.4 artifact and a spark-3.0 artifact. (And possibly a spark-common artifact.)
>>>>
>>>> It seems reasonable to me to have those in the same build instead of in separate branches, as long as the Spark dependencies are not leaked outside of the modules. That said, I'd rather keep the additional checks that baseline provides in general, since this is a short-term problem. It would just be nice if we could have versions that are confined to a single module. The Nebula plugin that baseline uses claims to support that, but I couldn't get it to work.
>>>>
>>>> On Wed, Mar 4, 2020 at 6:38 AM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>>
>>>>> Just thinking a bit about this. I agree that introducing different versions of the same dependencies is generally error-prone, but I don't think the case here would lead to problems:
>>>>>
>>>>> 1. The two sub-modules, spark-2 and spark-3, are isolated; neither depends on the other.
>>>>> 2. They can be differentiated by name when generating jars, and no other Iceberg modules will depend on them.
>>>>>
>>>>> So the dependency issue should not apply here, and in Maven this could be achieved easily. Please correct me if I'm wrong.
>>>>>
>>>>> Best regards,
>>>>> Saisai
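As a minimal sketch of the layout described above, with hypothetical module names and the Spark releases available at the time, each module pins its own Spark and nothing else in the build depends on either:

    // settings.gradle
    include 'iceberg-spark2'
    include 'iceberg-spark3'

    // iceberg-spark2/build.gradle
    dependencies {
        compileOnly 'org.apache.spark:spark-sql_2.11:2.4.5'
    }

    // iceberg-spark3/build.gradle
    dependencies {
        compileOnly 'org.apache.spark:spark-sql_2.12:3.0.0-preview2'
    }

Plain Gradle resolves each module's configurations independently, so this layout is fine on its own; the conflict discussed below comes from the single build-wide lock file.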
>>>>>
>>>>> On Wed, Mar 4, 2020 at 10:01 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Matt,
>>>>>>
>>>>>> If branching is the only choice, then we would potentially have two *master* branches until Spark 3 is widely adopted. That would increase the maintenance burden and could lead to inconsistency. IMO I'm OK with the branching approach; I just think we should have a clear way to keep the two branches in sync.
>>>>>>
>>>>>> Best,
>>>>>> Saisai
>>>>>>
>>>>>> On Wed, Mar 4, 2020 at 9:50 AM, Matt Cheah <mch...@palantir.com.invalid> wrote:
>>>>>>
>>>>>>> I think it's generally dangerous and error-prone to try to support two versions of the same library in the same build, in the same published artifacts. This is the stance that Baseline <https://github.com/palantir/gradle-baseline> + Gradle Consistent Versions <https://github.com/palantir/gradle-consistent-versions> take. Gradle Consistent Versions is specifically opinionated towards building against one version of a library across all modules in the build.
>>>>>>>
>>>>>>> I would think that branching would be the best way to build and publish against multiple versions of a dependency.
>>>>>>>
>>>>>>> -Matt Cheah
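For reference, a sketch of the build-wide pinning Matt describes, assuming the plugin's documented versions.props layout; the version shown is purely illustrative:

    # versions.props, at the root of the build; one entry governs every module
    org.apache.spark:* = 2.4.5

    # the single root versions.lock is then regenerated with:
    #   ./gradlew --write-locks

Because there is one props/lock pair for the whole build, a second Spark version in another module has nowhere to live, which is the conflict this thread keeps hitting.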
>>>>>>>
>>>>>>> *From:* Saisai Shao <sai.sai.s...@gmail.com>
>>>>>>> *Reply-To:* "dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>>>> *Date:* Tuesday, March 3, 2020 at 5:45 PM
>>>>>>> *To:* Iceberg Dev List <dev@iceberg.apache.org>
>>>>>>> *Cc:* Ryan Blue <rb...@netflix.com>
>>>>>>> *Subject:* Re: [Discuss] Merge spark-3 branch into master
>>>>>>>
>>>>>>> I didn't realize that Gradle cannot support two different versions in one build. I think I did something similar for Livy, building Scala 2.10 and 2.11 jars simultaneously with Maven. I'm not so familiar with Gradle, but I can take a shot at it to see if there's some hacky way to make it work.
>>>>>>>
>>>>>>> Besides, are we saying that we will move master to Spark 3 support after the 0.8 release, replacing Spark 2, or that we will maintain two branches for Spark 2 and Spark 3 and make two releases? From my understanding, the adoption of Spark 3 may not be that fast, and there are still lots of users who stick with Spark 2. Ideally, it would be better to support both versions for the near future.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Saisai
>>>>>>>
>>>>>>> On Wed, Mar 4, 2020 at 1:33 AM, Mass Dosage <massdos...@gmail.com> wrote:
>>>>>>>
>>>>>>> +1 for a 0.8.0 release with Spark 2.4, and then moving on to Spark 3.0 when it's ready.
>>>>>>>
>>>>>>> On Tue, 3 Mar 2020 at 16:32, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>>>
>>>>>>> Thanks for bringing this up, Saisai. I tried to do this a couple of months ago, but ran into a problem with dependency locks. I couldn't get two different versions of the Spark packages into the build with baseline, but maybe I was missing something. If you can get it working, I think it's a great idea to get this into master.
>>>>>>>
>>>>>>> Otherwise, I was thinking about proposing an 0.8.0 release in the next month or so based on Spark 2.4. Then we could merge the branch into master and do another release for Spark 3.0 when it's ready.
>>>>>>>
>>>>>>> rb
>>>>>>>
>>>>>>> On Tue, Mar 3, 2020 at 6:07 AM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> I was thinking of merging the spark-3 branch into master; per the earlier discussion, we could have spark-2 and spark-3 coexist as two different sub-modules. With this, one build could generate both spark-2 and spark-3 runtime jars, and users could pick whichever they prefer.
>>>>>>>
>>>>>>> One concern is that the two share lots of common code in the read/write path, which will increase the maintenance overhead of keeping the two copies consistent.
>>>>>>>
>>>>>>> So I'd like to hear your thoughts. Any suggestions?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Saisai
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
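Finally, a minimal sketch of how one build could emit two distinctly named runtime jars, as the original proposal suggests; module and artifact names are hypothetical:

    // iceberg-spark2/build.gradle
    jar {
        archiveBaseName = 'iceberg-spark2-runtime'
    }

    // iceberg-spark3/build.gradle
    jar {
        // distinct name so users can pick the jar matching their Spark version
        archiveBaseName = 'iceberg-spark3-runtime'
    }

A single ./gradlew build would then produce both jars in one pass, leaving the build-wide version lock as the only remaining blocker discussed above.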