Steve, thanks for the information. I think HADOOP-17046 should be fine for the Spark case.

Hadoop put protobuf 3 into the pre-shaded hadoop-thirdparty, and hadoop-client-runtime shades protobuf 2 during packaging, so protobuf 2 and 3 co-exist in hadoop-client-runtime under different packages:

- protobuf 2: org.apache.hadoop.shaded.com.google.protobuf
- protobuf 3: org.apache.hadoop.thirdparty.protobuf

As HADOOP-18487 plans to mark protobuf 2 as optional, will hadoop-client-runtime then stop shipping protobuf 2? If yes, things become worse for downstream projects that consume the Hadoop shaded client, like Spark, because users would have to add the vanilla protobuf 2 jar to the classpath if they want to access those APIs.

In summary, I think the current state is fine. But for security purposes, the Hadoop community may want to remove the EOL protobuf 2 classes from hadoop-client-runtime.

Thanks,
Cheng Pan

On May 17, 2023 at 04:10:43, Dongjoon Hyun <dongj...@apache.org> wrote:

> Thank you for sharing, Steve.
>
> Dongjoon
>
> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>> I have some bad news here which is even though hadoop cut protobuf 2.5
>> support, hbase team put it back in (HADOOP-17046). I don't know if the
>> shaded hadoop client has removed that dependency on protobuf 2.5.
>>
>> In HADOOP-18487 i want to allow hadoop to cut that dependency, with hbase
>> having to add it to the classpath if they still want it:
>> https://github.com/apache/hadoop/pull/4996
>>
>> It's been neglected -if you can help with review/test etc that'd be
>> great. I'd love to get this into the 3.3.6 release.
>>
>> On Sat, 13 May 2023 at 08:36, Cheng Pan <cheng...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove protobuf
>>> 2.5.0 from the Spark dependencies.
>>>
>>> Spark does not use protobuf 2.5.0 directly, instead, it comes from other
>>> dependencies, with the following changes, now, Spark does not require
>>> protobuf 2.5.0.
>>>
>>> - SPARK-40323 upgraded ORC 1.8.0, which moved from protobuf 2.5.0 to a
>>> shaded protobuf 3
>>>
>>> - SPARK-33212 switched from Hadoop vanilla client to Hadoop shaded
>>> client, also removed the protobuf 2 dependency. SPARK-42452 removed the
>>> support for Hadoop 2.
>>>
>>> - SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by
>>> the kinesis client, into the kinesis assembly jar
>>>
>>> - Spark itself's core/connect/protobuf modules use protobuf 3, also
>>> shaded and relocated all protobuf 3 deps.
>>>
>>> Feel free to comment if you still have any concerns.
>>>
>>> [1] https://github.com/apache/spark/pull/41153
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>
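
For anyone who wants to check which protobuf copies a particular jar actually ships, here is a minimal sketch in Python. It only counts .class entries under the two relocated package prefixes named at the top of this message, plus the unrelocated com.google.protobuf package. The function name and the labels are made up for illustration, and I have not run this against any specific Hadoop release, so treat it as a starting point rather than a verified tool.

```python
import zipfile

# Package prefixes as paths inside a jar. The first two are the relocated
# packages named in this thread; the third is the unrelocated protobuf package.
PROTOBUF_PREFIXES = {
    "protobuf 2 (shaded)": "org/apache/hadoop/shaded/com/google/protobuf/",
    "protobuf 3 (thirdparty)": "org/apache/hadoop/thirdparty/protobuf/",
    "protobuf (unrelocated)": "com/google/protobuf/",
}


def count_protobuf_classes(jar_path):
    """Count .class entries under each protobuf package prefix in a jar.

    Hypothetical helper for illustration; a jar is just a zip archive,
    so the standard zipfile module is enough to list its entries.
    """
    counts = {label: 0 for label in PROTOBUF_PREFIXES}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            for label, prefix in PROTOBUF_PREFIXES.items():
                # startswith keeps relocated and unrelocated copies apart:
                # a shaded entry begins with "org/", not "com/".
                if name.startswith(prefix):
                    counts[label] += 1
    return counts
```

Calling count_protobuf_classes() with the path to a local hadoop-client-runtime jar returns a dict of counts per label. A zero for "protobuf 2 (shaded)" would suggest the jar no longer bundles the relocated protobuf 2 classes.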