+CC dev@hbase

Thanks,
Cheng Pan

On Fri, May 19, 2023 at 4:08 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>
> On Thu, 18 May 2023 at 03:45, Cheng Pan <cheng...@apache.org> wrote:
>>
>> Steve, thanks for the information. I think HADOOP-17046 should be fine for
>> the Spark case.
>>
>> Hadoop put protobuf 3 into the pre-shaded hadoop-thirdparty, and
>> hadoop-client-runtime shades protobuf 2 during packaging, which results in
>> protobuf 2 and 3 coexisting in hadoop-client-runtime in different packages:
>>
>> - protobuf 2: org.apache.hadoop.shaded.com.google.protobuf
>> - protobuf 3: org.apache.hadoop.thirdparty.protobuf
>
> Oh, so in fact that "put it back in unshaded" change doesn't do anything
> useful through the hadoop-client lib. So it is very much useless.
>
>> As HADOOP-18487 plans to mark protobuf 2 optional, will this make
>> hadoop-client-runtime stop shipping protobuf 2? If yes, things become worse
>> for downstream projects that consume the Hadoop shaded client, like Spark,
>> because it requires the user to add the vanilla protobuf 2 jar to the
>> classpath if they want to access those APIs.
>
> Well, what applications are using
> org.apache.hadoop.shaded.com.google.protobuf? Hadoop itself doesn't; it's
> only referenced in unshaded form because HBase wanted the IPC library to
> still work with the unshaded version they were still using. But if the
> protobuf 2 lib is now only available shaded, their protobuf-compiled .class
> files aren't going to link to it, are they?
>
> Does anyone know how Spark + HBase + hadoop-client-runtime work so that Spark
> can talk to an HBase server? Especially: what is needed on the classpath, and
> what gets loaded for a call?
>
>> In summary, I think the current state is fine. But for security purposes,
>> the Hadoop community may want to remove the EOL protobuf 2 classes from
>> hadoop-client-runtime.
>
> +1. The shaded one which is in use also needs upgrading.
>
>> Thanks,
>> Cheng Pan
>>
>> On May 17, 2023 at 04:10:43, Dongjoon Hyun <dongj...@apache.org> wrote:
>>>
>>> Thank you for sharing, Steve.
>>>
>>> Dongjoon
>>>
>>> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
>>> <ste...@cloudera.com.invalid> wrote:
>>>>
>>>> I have some bad news here: even though Hadoop cut protobuf 2.5
>>>> support, the HBase team put it back in (HADOOP-17046). I don't know if the
>>>> shaded Hadoop client has removed that dependency on protobuf 2.5.
>>>>
>>>> In HADOOP-18487 I want to allow Hadoop to cut that dependency, with HBase
>>>> having to add it to the classpath if they still want it:
>>>> https://github.com/apache/hadoop/pull/4996
>>>>
>>>> It's been neglected; if you can help with review/test etc. that'd be great.
>>>> I'd love to get this into the 3.3.6 release.
>>>>
>>>> On Sat, 13 May 2023 at 08:36, Cheng Pan <cheng...@apache.org> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove protobuf
>>>>> 2.5.0 from the Spark dependencies.
>>>>>
>>>>> Spark does not use protobuf 2.5.0 directly; it comes in through other
>>>>> dependencies. With the following changes, Spark no longer requires
>>>>> protobuf 2.5.0:
>>>>>
>>>>> - SPARK-40323 upgraded ORC to 1.8.0, which moved from protobuf 2.5.0 to a
>>>>> shaded protobuf 3
>>>>>
>>>>> - SPARK-33212 switched from the Hadoop vanilla client to the Hadoop shaded
>>>>> client, which also removed the protobuf 2 dependency. SPARK-42452 removed
>>>>> the support for Hadoop 2.
>>>>>
>>>>> - SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by
>>>>> the kinesis client, into the kinesis assembly jar
>>>>>
>>>>> - Spark's own core/connect/protobuf modules use protobuf 3, and all
>>>>> protobuf 3 deps are also shaded and relocated.
>>>>>
>>>>> Feel free to comment if you still have any concerns.
>>>>>
>>>>> [1] https://github.com/apache/spark/pull/41153
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
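
For anyone who wants to check the "what is needed on the classpath" question against an actual jar, here is a minimal sketch that counts .class entries under the two relocated protobuf packages named in the thread, plus the unrelocated com.google.protobuf package. The function name, the dictionary keys, and the jar path in the usage note are illustrative assumptions, not anything from Hadoop or Spark. The script only reads jar entries; it says nothing about what a JVM would actually load at runtime.

```python
import zipfile

# Package prefixes as jar entry paths. The first two are the relocated
# packages named in the thread; the third is the unrelocated protobuf package.
# The dictionary keys are hypothetical labels chosen for this sketch.
PROTOBUF_PREFIXES = {
    "shaded_protobuf2": "org/apache/hadoop/shaded/com/google/protobuf/",
    "thirdparty_protobuf3": "org/apache/hadoop/thirdparty/protobuf/",
    "unshaded_protobuf": "com/google/protobuf/",
}


def count_protobuf_classes(jar_path):
    """Return {label: number of .class entries under that package prefix}."""
    counts = {label: 0 for label in PROTOBUF_PREFIXES}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            for label, prefix in PROTOBUF_PREFIXES.items():
                # startswith keeps the relocated and unrelocated packages apart,
                # since the relocated paths do not begin with com/google/.
                if name.startswith(prefix):
                    counts[label] += 1
    return counts
```

Usage would be something like `count_protobuf_classes("hadoop-client-runtime-<version>.jar")`, with the path pointing at whichever jar you want to inspect. A non-zero count for a label means that jar ships classes in that package.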