+CC dev@hbase

 

From: Steve Loughran <ste...@cloudera.com.INVALID>
Date: Friday, May 19, 2023 at 04:08
To:
Cc: dev <dev@spark.apache.org>
Subject: Re: Remove protobuf 2.5.0 from Spark dependencies

 

 

On Thu, 18 May 2023 at 03:45, Cheng Pan <cheng...@apache.org> wrote:

Steve, thanks for the information. I think HADOOP-17046 should be fine for the Spark case.

 

Hadoop put protobuf 3 into the pre-shaded hadoop-thirdparty, and hadoop-client-runtime shades protobuf 2 during packaging, so protobuf 2 and 3 coexist in hadoop-client-runtime under different packages:

 

- protobuf 2: org.apache.hadoop.shaded.com.google.protobuf

- protobuf 3: org.apache.hadoop.thirdparty.protobuf
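
One way to check which of these variants is visible on a given classpath is to probe for a representative class under each package. This is a minimal sketch, not something from the Hadoop or Spark code base: the class name `ProtobufProbe` is hypothetical, and using `Message` as the representative class is an assumption (it exists in unshaded protobuf, and relocation would normally carry it over under the new prefix).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which protobuf package variants are loadable.
class ProtobufProbe {
    // Package prefixes as listed above, plus the unshaded one.
    static final String[] PREFIXES = {
        "com.google.protobuf",                          // unshaded protobuf
        "org.apache.hadoop.shaded.com.google.protobuf", // protobuf 2 relocated in hadoop-client-runtime
        "org.apache.hadoop.thirdparty.protobuf"         // protobuf 3 from hadoop-thirdparty
    };

    // Returns true if the class can be found, without running its static initializers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ProtobufProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Probes "<prefix>.Message" for each prefix (assumed representative class).
    static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String prefix : PREFIXES) {
            result.put(prefix, isPresent(prefix + ".Message"));
        }
        return result;
    }

    public static void main(String[] args) {
        probe().forEach((pkg, found) ->
            System.out.println(pkg + " -> " + (found ? "present" : "absent")));
    }
}
```

Run it with the same classpath as the application under test; the result depends entirely on which jars are present.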


Oh, so in fact that "put it back in unshaded" change doesn't do anything useful through the hadoop-client lib, so it is very much useless.

 

As HADOOP-18487 plans to mark protobuf 2 as optional, will this mean hadoop-client-runtime no longer ships protobuf 2? If yes, things become worse for downstream projects that consume the Hadoop shaded client, like Spark, because users would have to add the vanilla protobuf 2 jar to the classpath if they want to access those APIs.

 

Well, what applications are using org.apache.hadoop.shaded.com.google.protobuf? Hadoop itself doesn't; it's only referenced in unshaded form because hbase wanted the IPC library to still work with the unshaded version they were still using. But if the protobuf 2 lib is now only available shaded, their protobuf-compiled .class files aren't going to link to it, are they?

 

Does anyone know how spark + hbase + hadoop-client-runtime work so that spark can talk to an hbase server? Especially: what is needed on the classpath, and what gets loaded for a call?
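
For the "what gets loaded" part, one generic JVM technique is to ask a class for its code source, which shows the jar it was loaded from. A rough sketch, assuming it runs with the same classpath as the Spark/HBase job; `WhereLoaded` is a hypothetical name, and the class names passed in `main` are only examples built from the package prefixes mentioned in this thread.

```java
import java.security.CodeSource;

// Hypothetical helper: prints the jar (code source) a class is loaded from.
class WhereLoaded {
    static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, WhereLoaded.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself typically have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap/unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Example class names; "Message" is an assumed representative class.
        String[] names = {
            "com.google.protobuf.Message",
            "org.apache.hadoop.shaded.com.google.protobuf.Message",
            "org.apache.hadoop.thirdparty.protobuf.Message"
        };
        for (String name : names) {
            System.out.println(name + " <- " + locate(name));
        }
    }
}
```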

 

In summary, I think the current state is fine. But for security purposes, the Hadoop community may want to remove the EOL protobuf 2 classes from hadoop-client-runtime.

 

 +1. The shaded one which is in use also needs upgrading.

 

 

Thanks,

Cheng Pan

 

 

On May 17, 2023 at 04:10:43, Dongjoon Hyun <dongj...@apache.org> wrote:

Thank you for sharing, Steve.

 

Dongjoon

 

On Tue, May 16, 2023 at 11:44 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:

I have some bad news here: even though hadoop cut protobuf 2.5 support, the hbase team put it back in (HADOOP-17046). I don't know if the shaded hadoop client has removed that dependency on protobuf 2.5.

In HADOOP-18487 I want to allow hadoop to cut that dependency, with hbase having to add it to the classpath if they still want it:
https://github.com/apache/hadoop/pull/4996

It's been neglected - if you can help with review/test etc., that'd be great. I'd love to get this into the 3.3.6 release.

 

On Sat, 13 May 2023 at 08:36, Cheng Pan <cheng...@apache.org> wrote:

Hi all,

 

In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove protobuf 2.5.0 from the Spark dependencies.

 

Spark does not use protobuf 2.5.0 directly; instead, it comes in through other dependencies. With the following changes, Spark no longer requires protobuf 2.5.0.



- SPARK-40323 upgraded ORC to 1.8.0, which moved from protobuf 2.5.0 to a shaded protobuf 3

 

- SPARK-33212 switched from the Hadoop vanilla client to the Hadoop shaded client, which also removed the protobuf 2 dependency. SPARK-42452 removed support for Hadoop 2.

 

- SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by the kinesis client, into the kinesis assembly jar

 

- Spark's own core/connect/protobuf modules use protobuf 3, with all protobuf 3 deps shaded and relocated.

 

Feel free to comment if you still have any concerns.

 

 

Thanks,

Cheng Pan

