Hi Wei-Chiu and Steve,

Thanks for sharing your insights.
I have also tried to compile and run Ozone against trunk (3.4.0-SNAPSHOT), which has the shaded, upgraded protobuf. Beyond the use of internal protobuf APIs, which breaks compilation, I found another major problem: the Hadoop RPC implementations in downstream projects are based on the non-shaded protobuf classes. ProtobufRpcEngine takes arguments and tries to typecast them to the protobuf 'Message' class, which it expects to be the 3.7, shaded version (i.e. o.a.h.thirdparty.*). So unless downstreams move their protobuf classes to 'hadoop-thirdparty', this issue will persist even after the compilation failures caused by internal use of private APIs with protobuf signatures are solved.

I found a possible workaround for this problem. Please check
https://issues.apache.org/jira/browse/HADOOP-17046

This Jira proposes to keep the existing ProtobufRpcEngine as-is (unshaded, with the protobuf-2.5.0 implementation) to support downstream implementations, and to add a new ProtobufRpcEngine2 that uses the shaded protobuf classes within Hadoop and in projects that later choose to upgrade their protobufs to 3.x.

For Ozone compilation: I have submitted two PRs to prepare for the Hadoop 3.3+ upgrade. These PRs remove the dependency on Hadoop for those internal APIs and carry Ozone's own copy of them with non-shaded protobuf.
HDDS-3603: https://github.com/apache/hadoop-ozone/pull/932
HDDS-3604: https://github.com/apache/hadoop-ozone/pull/933

Also, I ran some tests on Ozone with these PRs and HADOOP-17046 applied against 3.4.0, and the tests seem to pass.

Please help review these PRs.

Thanks,
-Vinay

On Wed, Apr 29, 2020 at 5:02 PM Steve Loughran <ste...@cloudera.com.invalid> wrote:

> Okay.
>
> I am not going to be a purist and say "what were they doing - using our
> private APIs?"
> because, as we all know, with things like UGI tagged @Private, there's
> been no way to get things done without getting into the private stuff.
>
> But why did we do the protobuf changes? So that we could update our private
> copy of protobuf without breaking every single downstream application. The
> great protobuf upgrade to 2.5 is not something we wanted to repeat. When
> was that? Before hadoop-2.2 shipped? I certainly remember a couple of weeks
> where absolutely nothing would build whatsoever, not until every downstream
> project had upgraded to the same version of the library.
>
> If you ever want to see an upgrade which makes a guava update seem a minor
> detail, protobuf upgrades are it. Hence the shading.
>
> HBase
> =====
>
> It looks like HBase has been using deep internal stuff. That is,
> "unfortunate". I think in that world we have to look and say: is there
> something specific we can do here to help HBase, in a way we could also
> backport? They shouldn't need those IPC internals.
>
> Tez & Tokens
> ============
>
> I didn't know Tez was using those protobuf APIs internally. That is,
> "unfortunate".
>
> What is key is this: without us moving those methods, things like Spark
> wouldn't work. And they weren't even using the methods, just trying to
> work with Token for job submission.
>
> All Tez should need is a byte-array serialization of a token. Given Token
> is also Writable, that could be done via WritableUtils in a way which will
> also work with older releases.
>
> Ozone
> =====
>
> When these were part of/in sync with the Hadoop build, there wouldn't have
> been problems. Now there are. Again, they're going in deep, but here
> clearly to simulate some behaviour. Any way to do that differently?
>
> Ratis
> =====
>
> No idea.
>
> On Wed, 29 Apr 2020 at 07:12, Wei-Chiu Chuang <weic...@cloudera.com.invalid>
> wrote:
>
> > Most of the problems are downstream applications using Hadoop's private
> > APIs.
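The "incompatible types" errors in this thread all come down to one mechanic: after shading, the relocated class is a completely different type from the original, even though the simple name matches. A minimal, self-contained sketch of this (with hypothetical stand-in interfaces, not the real protobuf or Hadoop classes):

```java
// Hypothetical stand-ins: the real pair is com.google.protobuf.Message vs.
// org.apache.hadoop.thirdparty.protobuf.Message after relocation.
public class ShadingDemo {
    // What downstream code was compiled against (unshaded protobuf).
    interface OriginalMessage {}

    // What a shaded RPC engine expects after relocation.
    interface RelocatedMessage {}

    // A message class generated against the unshaded protobuf.
    static class DownstreamProto implements OriginalMessage {}

    // Loosely mirrors the engine's runtime type check: only the relocated
    // type passes, so a message built against unshaded protobuf is rejected
    // even though both types are "Message" by simple name.
    public static boolean acceptedByShadedEngine(Object param) {
        return param instanceof RelocatedMessage;
    }

    public static void main(String[] args) {
        System.out.println(acceptedByShadedEngine(new DownstreamProto())); // prints "false"
    }
}
```

The same mismatch shows up at compile time as `com.google.protobuf.X cannot be converted to org.apache.hadoop.thirdparty.protobuf.X` in the logs below.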
> > Tez:
> >
> > 17:08:38 2020/04/16 00:08:38 INFO : [ERROR] COMPILATION ERROR :
> > 17:08:38 2020/04/16 00:08:38 INFO : [INFO]
> > -------------------------------------------------------------
> > 17:08:38 2020/04/16 00:08:38 INFO : [ERROR]
> > /grid/0/jenkins/workspace/workspace/CDH-CANARY-parallel-centos7/SOURCES/tez/tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java:[757,45]
> > incompatible types: com.google.protobuf.ByteString cannot be converted
> > to org.apache.hadoop.thirdparty.protobuf.ByteString
> > 17:08:38 2020/04/16 00:08:38 INFO : [INFO] 1 error
> >
> > Tez keeps track of job tokens internally.
> > The change would look like this:
> >
> > private void recordJobShuffleInfo(JobID jobId, String user,
> >     Token<JobTokenIdentifier> jobToken) throws IOException {
> >   if (stateDb != null) {
> >     TokenProto tokenProto = ProtobufHelper.protoFromToken(jobToken);
> >     /*TokenProto tokenProto = TokenProto.newBuilder()
> >         .setIdentifier(ByteString.copyFrom(jobToken.getIdentifier()))
> >         .setPassword(ByteString.copyFrom(jobToken.getPassword()))
> >         .setKind(jobToken.getKind().toString())
> >         .setService(jobToken.getService().toString())
> >         .build();*/
> >     JobShuffleInfoProto proto = JobShuffleInfoProto.newBuilder()
> >         .setUser(user).setJobToken(tokenProto).build();
> >     try {
> >       stateDb.put(bytes(jobId.toString()), proto.toByteArray());
> >     } catch (DBException e) {
> >       throw new IOException("Error storing " + jobId, e);
> >     }
> >   }
> >   addJobToken(jobId, user, jobToken);
> > }
> >
> > HBase:
> >
> > 1. HBASE-23833 <https://issues.apache.org/jira/browse/HBASE-23833>
> >    (this is recently fixed in the master branch)
> > 2.
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile
> > (default-compile) on project hbase-server: Compilation failure:
> > Compilation failure:
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputSaslHelper.java:[361,44]
> > cannot access org.apache.hadoop.thirdparty.protobuf.MessageOrBuilder
> > [ERROR] class file for
> > org.apache.hadoop.thirdparty.protobuf.MessageOrBuilder not found
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputSaslHelper.java:[362,14]
> > cannot access org.apache.hadoop.thirdparty.protobuf.GeneratedMessageV3
> > [ERROR] class file for
> > org.apache.hadoop.thirdparty.protobuf.GeneratedMessageV3 not found
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputSaslHelper.java:[366,16]
> > cannot access org.apache.hadoop.thirdparty.protobuf.ByteString
> > [ERROR] class file for
> > org.apache.hadoop.thirdparty.protobuf.ByteString not found
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputSaslHelper.java:[375,12]
> > cannot find symbol
> > [ERROR] symbol: method
> > writeDelimitedTo(org.apache.hbase.thirdparty.io.netty.buffer.ByteBufOutputStream)
> > [ERROR] location: variable proto of type
> > org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.DataTransferEncryptorMessageProto
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputSaslHelper.java:[702,81]
> > incompatible types:
> > org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.DataTransferEncryptorMessageProto
> > cannot be converted to com.google.protobuf.MessageLite
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputHelper.java:[314,66]
> > incompatible types:
> > org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.BlockOpResponseProto
> > cannot be converted to com.google.protobuf.MessageLite
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputHelper.java:[330,81]
> > cannot access org.apache.hadoop.thirdparty.protobuf.ProtocolMessageEnum
> > [ERROR] class file for
> > org.apache.hadoop.thirdparty.protobuf.ProtocolMessageEnum not found
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputHelper.java:[380,10]
> > cannot find symbol
> > [ERROR] symbol: method
> > writeDelimitedTo(org.apache.hbase.thirdparty.io.netty.buffer.ByteBufOutputStream)
> > [ERROR] location: variable proto of type
> > org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutputHelper.java:[422,77]
> > cannot access org.apache.hadoop.thirdparty.protobuf.Descriptors
> > [ERROR] class file for
> > org.apache.hadoop.thirdparty.protobuf.Descriptors not found
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/io/asyncfs/FanOutOneBlockAsyncDFSOutput.java:[323,64]
> > incompatible types:
> > org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.PipelineAckProto
> > cannot be converted to com.google.protobuf.MessageLite
> > [ERROR]
> > /Users/weichiu/sandbox/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/SyncReplicationWALProvider.java:[209,68]
> > invalid method reference
> > [ERROR] non-static method get() cannot be referenced from a
> > static context
> >
> > Ozone:
> >
> > 17:01:19 2020/04/16 00:01:19 INFO : [ERROR] COMPILATION ERROR :
> > 17:01:19 2020/04/16 00:01:19 INFO : [INFO]
> > -------------------------------------------------------------
> > 17:01:19 2020/04/16 00:01:19 INFO : [ERROR]
> > /grid/0/jenkins/workspace/workspace/CDH-CANARY-parallel-centos7/SOURCES/ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ScmBlockLocationProtocolClientSideTranslatorPB.java:[110,47]
> > incompatible types: com.google.protobuf.ServiceException cannot be
> > converted to org.apache.hadoop.thirdparty.protobuf.ServiceException
> > 17:01:19 2020/04/16 00:01:19 INFO : [ERROR]
> > /grid/0/jenkins/workspace/workspace/CDH-CANARY-parallel-centos7/SOURCES/ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:[116,47]
> > incompatible types: com.google.protobuf.ServiceException cannot be
> > converted to org.apache.hadoop.thirdparty.protobuf.ServiceException
> > 17:01:19 2020/04/16 00:01:19 INFO : [INFO] 2 errors
> >
> > There's another error where Ozone uses the Hadoop RPC framework, which
> > uses the hadoop.thirdparty protobuf:
> >
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile
> > (default-testCompile) on project hadoop-hdds-container-service:
> > Compilation failure
> > [ERROR]
> > /Users/weichiu/sandbox/ozone/hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/SCMTestUtils.java:[103,41]
> > incompatible types: com.google.protobuf.BlockingService cannot be
> > converted to org.apache.hadoop.thirdparty.protobuf.BlockingService
> >
> > BlockingService scmDatanodeService =
> >     StorageContainerDatanodeProtocolService.newReflectiveBlockingService(
> >         new StorageContainerDatanodeProtocolServerSideTranslatorPB(
> >             server, Mockito.mock(ProtocolMessageMetrics.class)));
> >
> > Ratis probably breaks as well, since it depends on the Hadoop RPC
> > framework too.
> >
> > On Tue, Apr 28, 2020 at 10:58 PM Vinayakumar B <vinayakum...@apache.org>
> > wrote:
> >
> > > Hi Wei-Chiu,
> > >
> > > Can you elaborate on the failures you are facing related to the
> > > relocated protobuf classes?
> > >
> > > AFAIK, if the issue is with the location of the protobuf classes, the
> > > old protobuf-2.5.0.jar will still be available on the classpath, so
> > > downstreams depending on the 2.5.0 version of protobuf will still be
> > > able to access them.
> > >
> > > -Vinay
> > >
> > > On Wed, 29 Apr 2020, 11:17 am Wei-Chiu Chuang, <weic...@cloudera.com>
> > > wrote:
> > >
> > >> I'm sorry for coming to this late. I missed this message. It should
> > >> have been a DISCUSS thread rather than a NOTICE.
> > >>
> > >> Looks like this is inevitable. But we should make downstream
> > >> developers aware and make the update easier. As long as it is stated
> > >> clearly how to update the code to support Hadoop 3.3, I am okay with
> > >> that.
> > >>
> > >> Here's what I suggest:
> > >> (1) Label the Jira incompatible (just updated the Jira) and update
> > >> the release note to tell app developers how to update.
> > >> (2) Declare ProtobufHelper a public API: HADOOP-17019
> > >> <https://issues.apache.org/jira/browse/HADOOP-17019>
> > >>
> > >> Tez doesn't use the removed Token API, but there's code that breaks
> > >> with the relocated protobuf class. The ProtobufHelper API will make
> > >> this transition much easier.
> > >>
> > >> Other downstreamers that break with the relocated protobuf include
> > >> Ozone and HBase, but neither of them uses the removed Token API.
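Steve's suggestion above, serializing a token through its Writable interface rather than through protobuf, keeps protobuf types out of every public signature and works the same against old and new Hadoop releases. A self-contained sketch of the idea, using toy stand-ins (not the real org.apache.hadoop.io.Writable or Token classes):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class TokenBytesDemo {
    // Toy stand-in for org.apache.hadoop.io.Writable (the real Token implements it).
    interface Writable {
        void write(DataOutputStream out) throws IOException;
        void readFields(DataInputStream in) throws IOException;
    }

    // Toy token carrying only identifier and password bytes.
    static class ToyToken implements Writable {
        byte[] identifier = new byte[0];
        byte[] password = new byte[0];

        @Override
        public void write(DataOutputStream out) throws IOException {
            out.writeInt(identifier.length);
            out.write(identifier);
            out.writeInt(password.length);
            out.write(password);
        }

        @Override
        public void readFields(DataInputStream in) throws IOException {
            identifier = new byte[in.readInt()];
            in.readFully(identifier);
            password = new byte[in.readInt()];
            in.readFully(password);
        }
    }

    // byte[] in, byte[] out -- no protobuf class appears in either signature,
    // so relocation of the protobuf package cannot break callers.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        w.write(out);
        out.flush();
        return bos.toByteArray();
    }

    static void fromBytes(Writable w, byte[] bytes) throws IOException {
        w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        ToyToken original = new ToyToken();
        original.identifier = "job_1".getBytes("UTF-8");
        original.password = new byte[] {1, 2, 3};

        ToyToken copy = new ToyToken();
        fromBytes(copy, toBytes(original));
        System.out.println(Arrays.equals(original.identifier, copy.identifier)
            && Arrays.equals(original.password, copy.password)); // prints "true"
    }
}
```

Downstream code that stores tokens (like the Tez ShuffleHandler snippet above) could persist this byte array directly instead of a protobuf-typed TokenProto.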
> > >>
> > >> On Wed, Jan 8, 2020 at 4:40 AM Vinayakumar B <vinayakum...@apache.org>
> > >> wrote:
> > >>
> > >>> Hi All,
> > >>>
> > >>> This mail is to notify you about the removal of the following
> > >>> public APIs from Hadoop Common.
> > >>>
> > >>> ClassName: org.apache.hadoop.security.token.Token
> > >>> APIs:
> > >>>   public Token(TokenProto tokenPB);
> > >>>   public TokenProto toTokenProto();
> > >>>
> > >>> Reason: these APIs have generated protobuf classes in their
> > >>> signatures. Right now, due to the protobuf upgrade in trunk (soon
> > >>> to be the 3.3.0 release), these APIs break downstream builds even
> > >>> though downstreams don't use them (just loading the Token class
> > >>> breaks). Downstreams still reference the older version (2.5.0) of
> > >>> protobuf, hence the builds break.
> > >>>
> > >>> These APIs were added for an internal purpose (HADOOP-12563), to
> > >>> support serializing tokens using protobuf in UGI Credentials.
> > >>> The same purpose can be achieved using the helper classes, without
> > >>> introducing protobuf classes in API signatures.
> > >>>
> > >>> Token.java is marked as Evolving, so I believe the APIs can be
> > >>> changed whenever absolutely necessary.
> > >>>
> > >>> Jira https://issues.apache.org/jira/browse/HADOOP-16621 has been
> > >>> reported to solve the downstream build failure.
> > >>>
> > >>> So, since this API was added for an internal purpose, the easy way
> > >>> to solve this is to remove the APIs and use the helper classes.
> > >>> Otherwise, as mentioned in HADOOP-16621, the workaround will add
> > >>> unnecessary code to be maintained.
> > >>>
> > >>> If anyone is using these APIs outside the Hadoop project,
> > >>> accidentally or otherwise, please reply to this mail immediately.
> > >>>
> > >>> If there is no objection by next week, I will go ahead with the
> > >>> removal of the above-said APIs in HADOOP-16621.
> > >>>
> > >>> -Vinay
> > >>>
> > >>
> > >
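For reference, the coexistence that HADOOP-17046 proposes (the old unshaded engine kept for downstreams, a new shaded engine for Hadoop internals) amounts to a per-protocol engine choice. A rough sketch of that arrangement, using hypothetical names; in Hadoop the binding is actually done through RPC.setProtocolEngine:

```java
import java.util.HashMap;
import java.util.Map;

public class EngineChoiceDemo {
    // Hypothetical labels for the two engines discussed in HADOOP-17046.
    enum Engine {
        PROTOBUF_RPC_ENGINE,   // stays on unshaded protobuf-2.5.0 for downstreams
        PROTOBUF_RPC_ENGINE_2  // uses the shaded o.a.h.thirdparty protobuf 3.x
    }

    private final Map<Class<?>, Engine> engines = new HashMap<>();

    // Loosely mirrors RPC.setProtocolEngine(conf, protocol, engineClass):
    // each protocol interface is bound to exactly one RPC engine.
    void setProtocolEngine(Class<?> protocol, Engine engine) {
        engines.put(protocol, engine);
    }

    Engine engineFor(Class<?> protocol) {
        return engines.get(protocol);
    }

    // Stand-in protocols: one still generated with protobuf 2.5.0,
    // one regenerated against the shaded 3.x classes.
    interface LegacyDownstreamProtocol {}
    interface ShadedInternalProtocol {}

    public static void main(String[] args) {
        EngineChoiceDemo rpc = new EngineChoiceDemo();
        rpc.setProtocolEngine(LegacyDownstreamProtocol.class, Engine.PROTOBUF_RPC_ENGINE);
        rpc.setProtocolEngine(ShadedInternalProtocol.class, Engine.PROTOBUF_RPC_ENGINE_2);
        System.out.println(rpc.engineFor(LegacyDownstreamProtocol.class)); // prints "PROTOBUF_RPC_ENGINE"
    }
}
```

The point of the split is that a downstream protocol never has to see the relocated Message type unless it opts in by switching engines.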