Hi Saisai,
Could you please share your progress on merging the spark-3 branch into master?
We are trying Iceberg with Spark SQL, which is only supported in Spark 3.

On 2020/03/27 01:53:09, Saisai Shao <s...@gmail.com> wrote: 
> Thanks Ryan, let me give it a try.
>
> Best regards,
> Saisai
> 
> On Fri, Mar 27, 2020 at 12:15 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> 
> > Here’s how it was done before:
> > https://github.com/apache/incubator-iceberg/blob/867ec79a5c2f7619cb10546b5cc7f7bbc7d61621/build.gradle#L225-L244
> >
> > That defines a set of projects called baselineProjects and applies
> > baseline like this:
> >
> > configure(baselineProjects) {
> >   apply plugin: 'com.palantir.baseline-checkstyle'
> >   ...
> > }
> >
> > The baseline config has since been moved into baseline.gradle
> > <https://github.com/apache/incubator-iceberg/blob/master/baseline.gradle>,
> > so changes should probably go into that file. Thanks for looking into
> > this!
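The `baselineProjects` approach described above can be sketched for a root `build.gradle` roughly as follows. This is a hypothetical sketch: the module names `iceberg-hive` and `iceberg-mr` are taken from the Hive discussion in this thread and are assumed, not checked, to be the Gradle project names, and the exclusion list is only an example. It controls where the baseline plugins are applied; whether that alone frees a module from the consistent-versions lock is not settled here.

```groovy
// Hypothetical: modules that should not get the baseline plugins
def nonBaselineModules = ['iceberg-hive', 'iceberg-mr']

// Every subproject except the ones listed above
def baselineProjects = subprojects.findAll { !(it.name in nonBaselineModules) }

configure(baselineProjects) {
  apply plugin: 'com.palantir.baseline-checkstyle'
  // ...other baseline plugins, as in the linked build.gradle
}
```

Keeping the exclusions in one list leaves every other module on the default baseline checks.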
> >
> > On Thu, Mar 26, 2020 at 6:23 AM Mass Dosage <ma...@gmail.com> wrote:
> >>
> >> We'd like to know how to do this too. We're working on the Hive
> >> integration, and Hive requires older versions of many of the libraries
> >> that Iceberg uses (Guava, Calcite and Avro are the most problematic).
> >> We're going to need to shade some of these in the Iceberg modules we
> >> depend on, but it would also be very useful to be able to override the
> >> versions in the iceberg-hive and iceberg-mr modules so that they aren't
> >> locked to the same versions as the rest of the projects.
> >>
> >> On Thu, 26 Mar 2020 at 01:53, Saisai Shao <sa...@gmail.com> wrote:
> >>>
> >>> Hi Ryan,
> >>>
> >>> As mentioned in the meeting, could you please point me to a way to
> >>> exclude some submodules from the consistent-versions plugin?
> >>>
> >>> Thanks,
> >>> Saisai
> >>>
> >>> On Wed, Mar 18, 2020 at 4:14 AM Anton Okolnychyi <ao...@apple.com.invalid> wrote:
> >>>>
> >>>> I am +1 on having spark-2 and spark-3 modules as well.
> >>>>
> >>>> On 7 Mar 2020, at 15:03, RD <rd...@gmail.com> wrote:
> >>>>
> >>>> I'm +1 to separate modules for spark-2 and spark-3, after the 0.8
> >>>> release.
> >>>> I think it would be a big change for organizations to adopt Spark 3,
> >>>> since it brings in Scala 2.12, which is binary incompatible with
> >>>> previous Scala versions. Hence this adoption could take a lot of time.
> >>>> I know that in our company we have no near-term plans to move to
> >>>> Spark 3.
> >>>>
> >>>> -Best,
> >>>> R.
> >>>>
> >>>> On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao <sa...@gmail.com> wrote:
> >>>>>
> >>>>> I was wondering whether it is possible to limit the version lock
> >>>>> plugin to only the Iceberg core subprojects. It seems the current
> >>>>> consistent-versions plugin doesn't allow this, so I'm not sure
> >>>>> whether there are other plugins that could provide similar
> >>>>> functionality with more flexibility.
> >>>>>
> >>>>> Any suggestions on this?
> >>>>>
> >>>>> Best regards,
> >>>>> Saisai
> >>>>>
> >>>>> On Thu, Mar 5, 2020 at 3:12 PM Saisai Shao <sa...@gmail.com> wrote:
> >>>>>>
> >>>>>> I think the requirement to support different versions should be
> >>>>>> quite common, since Iceberg is a table format that needs to be
> >>>>>> adapted to different engines like Hive, Flink, and Spark. Supporting
> >>>>>> different versions is a real problem: Spark is just one case, and
> >>>>>> Hive or Flink could be affected too if their interfaces change
> >>>>>> across major versions. Version locking may also cause problems when
> >>>>>> several engines coexist in the same build, as they transitively
> >>>>>> introduce many dependencies that may conflict. It may be hard to
> >>>>>> find one version that satisfies all of them, and those dependencies
> >>>>>> are usually confined to a single module.
> >>>>>>
> >>>>>> So I think we should figure out a way to support this scenario,
> >>>>>> rather than just maintaining separate branches one by one.
> >>>>>>
> >>>>>> On Thu, Mar 5, 2020 at 2:53 AM Ryan Blue <rb...@netflix.com> wrote:
> >>>>>>>
> >>>>>>> I think the key is that this wouldn't be using the same published
> >>>>>>> artifacts. This work would create a spark-2.4 artifact and a
> >>>>>>> spark-3.0 artifact. (And possibly a spark-common artifact.)
> >>>>>>>
> >>>>>>> It seems reasonable to me to have those in the same build instead of
> >>>>>>> in separate branches, as long as the Spark dependencies are not
> >>>>>>> leaked outside of the modules. That said, I'd rather keep the
> >>>>>>> additional checks that baseline provides in general, since this is a
> >>>>>>> short-term problem. It would just be nice if we could have versions
> >>>>>>> that are confined to a single module. The Nebula plugin that baseline
> >>>>>>> uses claims to support that, but I couldn't get it to work.
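A hypothetical sketch of separate Spark modules in one Gradle build, assuming modules named `iceberg-spark2` and `iceberg-spark3` (illustrative names) with the `java` plugin already applied to both. The versions are examples from the Spark 2.4 and 3.0-preview lines available in early 2020. Without a build-wide lock, Gradle resolves each subproject's dependencies independently, so the two modules can compile against different Spark releases. With Gradle Consistent Versions applied, libraries that both Spark lines pull in (for example `org.scala-lang:scala-library`) are expected to be forced to a single version across the build, which is the conflict described in this thread.

```groovy
// Hypothetical module names; assumes the 'java' plugin is applied to both
project(':iceberg-spark2') {
  dependencies {
    // Scala 2.11 build of Spark 2.4
    compileOnly 'org.apache.spark:spark-sql_2.11:2.4.4'
  }
}

project(':iceberg-spark3') {
  dependencies {
    // Spark 3.0 targets Scala 2.12 (preview release, early 2020)
    compileOnly 'org.apache.spark:spark-sql_2.12:3.0.0-preview2'
  }
}
```

Declaring Spark as `compileOnly` keeps it off each module's published runtime dependencies, in line with the goal of not leaking Spark dependencies outside the modules.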
> >>>>>>>
> >>>>>>> On Wed, Mar 4, 2020 at 6:38 AM Saisai Shao <sa...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> I've thought a bit more about this. I agree that introducing
> >>>>>>>> different versions of the same dependency is generally error-prone,
> >>>>>>>> but I think the case here should not lead to issues:
> >>>>>>>>
> >>>>>>>> 1. The two sub-modules, spark-2 and spark-3, are isolated; neither
> >>>>>>>> depends on the other.
> >>>>>>>> 2. They can be distinguished by name when generating jars, and no
> >>>>>>>> other module in Iceberg will depend on them.
> >>>>>>>>
> >>>>>>>> So this dependency issue should not apply here, and in Maven it
> >>>>>>>> could be achieved easily. Please correct me if I'm wrong.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Saisai
> >>>>>>>>
> >>>>>>>> On Wed, Mar 4, 2020 at 10:01 AM Saisai Shao <sa...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Matt,
> >>>>>>>>>
> >>>>>>>>> If branching is the only choice, then we would potentially have
> >>>>>>>>> two *master* branches until Spark 3 is widely adopted. That would
> >>>>>>>>> increase the maintenance burden and could lead to inconsistency.
> >>>>>>>>> I'm OK with branching, but I think we should have a clear way to
> >>>>>>>>> keep track of the two branches.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Saisai
> >>>>>>>>>
> >>>>>>>>> On Wed, Mar 4, 2020 at 9:50 AM Matt Cheah <mc...@palantir.com.invalid> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I think it’s generally dangerous and error-prone to try to
> >>>>>>>>>> support two versions of the same library in the same build, in the
> >>>>>>>>>> same published artifacts. This is the stance that Baseline
> >>>>>>>>>> <https://github.com/palantir/gradle-baseline> + Gradle
> >>>>>>>>>> Consistent Versions
> >>>>>>>>>> <https://github.com/palantir/gradle-consistent-versions> takes.
> >>>>>>>>>> Gradle Consistent Versions is specifically opinionated towards
> >>>>>>>>>> building against one version of a library across all modules in
> >>>>>>>>>> the build.
> >>>>>>>>>>
> >>>>>>>>>> I would think that branching would be the best way to build and
> >>>>>>>>>> publish against multiple versions of a dependency.
> >>>>>>>>>>
> >>>>>>>>>> -Matt Cheah
> >>>>>>>>>>
> >>>>>>>>>> *From: *Saisai Shao <sa...@gmail.com>
> >>>>>>>>>> *Reply-To: *"dev@iceberg.apache.org" <de...@iceberg.apache.org>
> >>>>>>>>>> *Date: *Tuesday, March 3, 2020 at 5:45 PM
> >>>>>>>>>> *To: *Iceberg Dev List <de...@iceberg.apache.org>
> >>>>>>>>>> *Cc: *Ryan Blue <rb...@netflix.com>
> >>>>>>>>>> *Subject: *Re: [Discuss] Merge spark-3 branch into master
> >>>>>>>>>>
> >>>>>>>>>> I didn't realize that Gradle cannot support two different
> >>>>>>>>>> versions in one build. I think I did something similar for Livy,
> >>>>>>>>>> building Scala 2.10 and 2.11 jars simultaneously with Maven. I'm
> >>>>>>>>>> not so familiar with Gradle, but I can take a shot to see if
> >>>>>>>>>> there's some hacky way to make it work.
> >>>>>>>>>>
> >>>>>>>>>> Besides, are we saying that after the 0.8 release the master
> >>>>>>>>>> branch will move to Spark 3 support and replace Spark 2, or that we
> >>>>>>>>>> will maintain two branches for Spark 2 and Spark 3 and make two
> >>>>>>>>>> releases? From my understanding, the adoption of Spark 3 may not
> >>>>>>>>>> be so fast, and there are still many users who stick with Spark 2.
> >>>>>>>>>> Ideally, it might be better to support both versions in the near
> >>>>>>>>>> future.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>>> Saisai
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Mar 4, 2020 at 1:33 AM Mass Dosage <ma...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> +1 for a 0.8.0 release with Spark 2.4, then moving on to Spark
> >>>>>>>>>> 3.0 when it's ready.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 3 Mar 2020 at 16:32, Ryan Blue <rb...@netflix.com.invalid> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Thanks for bringing this up, Saisai. I tried to do this a couple
> >>>>>>>>>> of months ago, but ran into a problem with dependency locks. I
> >>>>>>>>>> couldn't get two different versions of Spark packages into the
> >>>>>>>>>> build with baseline, but maybe I was missing something. If you can
> >>>>>>>>>> get it working, I think it's a great idea to get this into master.
> >>>>>>>>>>
> >>>>>>>>>> Otherwise, I was thinking about proposing an 0.8.0 release in the
> >>>>>>>>>> next month or so based on Spark 2.4. Then we could merge the
> >>>>>>>>>> branch into master and do another release for Spark 3.0 when it's
> >>>>>>>>>> ready.
> >>>>>>>>>>
> >>>>>>>>>> rb
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Mar 3, 2020 at 6:07 AM Saisai Shao <sai.sai.s...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi team,
> >>>>>>>>>>
> >>>>>>>>>> I was thinking of merging the spark-3 branch into master. Also, per
> >>>>>>>>>> the earlier discussion, we could make spark-2 and spark-3 coexist
> >>>>>>>>>> as two different sub-modules. With this, one build could generate
> >>>>>>>>>> both spark-2 and spark-3 runtime jars, and users could pick either
> >>>>>>>>>> one as they prefer.
> >>>>>>>>>>
> >>>>>>>>>> One concern is that they share a lot of common code in the
> >>>>>>>>>> read/write path, so keeping the two copies consistent will
> >>>>>>>>>> increase the maintenance overhead.
> >>>>>>>>>>
> >>>>>>>>>> So I'd like to hear your thoughts. Any suggestions?
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>>
> >>>>>>>>>> Saisai
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Ryan Blue
> >>>>>>>>>> Software Engineer
> >>>>>>>>>> Netflix
> >>>>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Ryan Blue
> >>>>>>> Software Engineer
> >>>>>>> Netflix
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
