Steve, thanks for the information. I think HADOOP-17046 should be fine for
the Spark case.

Hadoop puts protobuf 3 into the pre-shaded hadoop-thirdparty artifact, and
hadoop-client-runtime shades protobuf 2 during packaging. As a result,
protobuf 2 and 3 co-exist in hadoop-client-runtime under different packages:

- protobuf 2: org.apache.hadoop.shaded.com.google.protobuf
- protobuf 3: org.apache.hadoop.thirdparty.protobuf
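
For anyone who wants to confirm which of the two relocated copies a given
hadoop-client-runtime jar actually contains, here is a minimal sketch. It
only assumes that a jar is a zip file and that the two package names above
map to entry paths in the usual way. The function name and the jar path in
the usage note are hypothetical; this is not part of Hadoop or Spark.

```python
import zipfile

# Relocated package prefixes named in this thread, written as jar entry paths.
RELOCATED_PROTOBUF = {
    "protobuf 2": "org/apache/hadoop/shaded/com/google/protobuf/",
    "protobuf 3": "org/apache/hadoop/thirdparty/protobuf/",
}


def find_relocated_protobuf(jar_path):
    """Return {label: number of .class entries found under that prefix}."""
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()
    return {
        label: sum(
            1 for name in names
            if name.startswith(prefix) and name.endswith(".class")
        )
        for label, prefix in RELOCATED_PROTOBUF.items()
    }
```

Calling find_relocated_protobuf("/path/to/hadoop-client-runtime.jar") (a
placeholder path) returns a count per label. A count of zero for
"protobuf 2" would suggest that the jar no longer ships the relocated
protobuf 2 classes.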

As HADOOP-18487 plans to mark protobuf 2 as optional, will
hadoop-client-runtime stop shipping protobuf 2? If so, things get worse for
downstream projects that consume the Hadoop shaded client, such as Spark,
because users would have to add the vanilla protobuf 2 jar to the classpath
if they want to access those APIs.

In summary, I think the current state is fine. But for security purposes,
the Hadoop community may want to remove the EOL protobuf 2 classes from
hadoop-client-runtime.

Thanks,
Cheng Pan


On May 17, 2023 at 04:10:43, Dongjoon Hyun <dongj...@apache.org> wrote:

> Thank you for sharing, Steve.
>
> Dongjoon
>
> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
> <ste...@cloudera.com.invalid> wrote:
>
>> I have some bad news here: even though hadoop cut protobuf 2.5
>> support, the hbase team put it back in (HADOOP-17046). I don't know if the
>> shaded hadoop client has removed that dependency on protobuf 2.5.
>>
>> In HADOOP-18487 I want to allow hadoop to cut that dependency, with hbase
>> having to add it to the classpath if they still want it:
>> https://github.com/apache/hadoop/pull/4996
>>
>> It's been neglected. If you can help with review/test etc. that'd be
>> great. I'd love to get this into the 3.3.6 release.
>>
>> On Sat, 13 May 2023 at 08:36, Cheng Pan <cheng...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove protobuf
>>> 2.5.0 from the Spark dependencies.
>>>
>>> Spark does not use protobuf 2.5.0 directly; it comes in through other
>>> dependencies. With the following changes, Spark no longer requires
>>> protobuf 2.5.0.
>>>
>>> - SPARK-40323 upgraded ORC to 1.8.0, which moved from protobuf 2.5.0 to a
>>> shaded protobuf 3
>>>
>>> - SPARK-33212 switched from the vanilla Hadoop client to the shaded Hadoop
>>> client, which also removed the protobuf 2 dependency. SPARK-42452 removed
>>> the support for Hadoop 2.
>>>
>>> - SPARK-14421 shaded and relocated protobuf 2.6.1, which is required by
>>> the kinesis client, into the kinesis assembly jar
>>>
>>> - Spark's own core/connect/protobuf modules use protobuf 3, and all
>>> protobuf 3 deps are shaded and relocated.
>>>
>>> Feel free to comment if you still have any concerns.
>>>
>>> [1] https://github.com/apache/spark/pull/41153
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>
