[ 
https://issues.apache.org/jira/browse/HDDS-1636?focusedWorklogId=255982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-255982
 ]

ASF GitHub Bot logged work on HDDS-1636:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/19 15:23
            Start Date: 07/Jun/19 15:23
    Worklog Time Spent: 10m 
      Work Description: hadoop-yetus commented on issue #895: HDDS-1636. Tracing id is not propagated via async datanode grpc call
URL: https://github.com/apache/hadoop/pull/895#issuecomment-499928283
 
 
   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 46 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 7 new or modified test files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 28 | Maven dependency ordering for branch |
   | +1 | mvninstall | 588 | trunk passed |
   | +1 | compile | 318 | trunk passed |
   | +1 | checkstyle | 88 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 966 | branch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 171 | trunk passed |
   | 0 | spotbugs | 363 | Used deprecated FindBugs config; considering switching to SpotBugs. |
   | +1 | findbugs | 555 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 30 | Maven dependency ordering for patch |
   | +1 | mvninstall | 503 | the patch passed |
   | +1 | compile | 309 | the patch passed |
   | +1 | javac | 309 | the patch passed |
   | +1 | checkstyle | 83 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | -1 | whitespace | 0 | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
   | +1 | shadedclient | 723 | patch has no errors when building and testing our client artifacts. |
   | +1 | javadoc | 169 | the patch passed |
   | +1 | findbugs | 616 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 175 | hadoop-hdds in the patch failed. |
   | -1 | unit | 1653 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 51 | The patch does not generate ASF License warnings. |
   | | | 7252 | |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.ozone.container.common.impl.TestHddsDispatcher |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.TestStorageContainerManager |
   |   | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestBCSID |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/895 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 68a40c51a9aa 4.4.0-144-generic #170~14.04.1-Ubuntu SMP Mon Mar 18 15:02:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 14552d1 |
   | Default Java | 1.8.0_212 |
   | whitespace | https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/artifact/out/whitespace-eol.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/artifact/out/patch-unit-hadoop-hdds.txt |
   | unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/artifact/out/patch-unit-hadoop-ozone.txt |
   | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/testReport/ |
   | Max. process+thread count | 5143 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/client hadoop-hdds/common hadoop-ozone/client hadoop-ozone/integration-test hadoop-ozone/objectstore-service hadoop-ozone/ozone-manager U: . |
   | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-895/6/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 255982)
    Time Spent: 2h  (was: 1h 50m)

> Tracing id is not propagated via async datanode grpc call
> ---------------------------------------------------------
>
>                 Key: HDDS-1636
>                 URL: https://issues.apache.org/jira/browse/HDDS-1636
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Recently a new exception became visible in the datanode logs when running 
> standard freon (STANDALONE):
> {code}
> datanode_2  | 2019-06-03 12:18:21 WARN  PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully.
> datanode_2  | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
> datanode_2  |         at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
> datanode_2  |         at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
> datanode_2  |         at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
> datanode_2  |         at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
> datanode_2  |         at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> datanode_2  |         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> datanode_2  |         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> {code}
> It turned out that the tracingId propagation between XceiverClient and Server 
> does not work correctly (in the case of Standalone and async commands):
>  1. There are many places (on the client side) where the traceId is filled 
> with UUID.randomUUID().toString().
>  2. This random id is propagated between the Output/InputStream and different 
> parts of the clients.
>  3. It is unnecessary, because in XceiverClientGrpc the traceId field is 
> overridden with the real opentracing id anyway 
> (sendCommand/sendCommandAsync).
>  4. Except in XceiverClientGrpc.sendCommandAsync, where this override is 
> accidentally missing.
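
The four points above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and only stands in for the real classes: `Request`, `currentSpanAsString()`, and the three `send...` methods are invented for illustration and are not the actual XceiverClientGrpc code. It shows how a synchronous path that overwrites the traceId hides the random UUID, while an async path that skips the overwrite lets the UUID reach the server.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

class TraceIdOverrideSketch {

  /** Hypothetical stand-in for a container command request. */
  static final class Request {
    final String traceId;
    final String command;

    Request(String traceId, String command) {
      this.traceId = traceId;
      this.command = command;
    }

    /** Returns a copy of this request carrying a different traceId. */
    Request withTraceId(String newTraceId) {
      return new Request(newTraceId, command);
    }
  }

  /** Hypothetical stand-in for exporting the active span as a string. */
  static String currentSpanAsString() {
    return "5b8aa5a2d2c872e8:5b8aa5a2d2c872e8:0:1";
  }

  /** Sync path: the placeholder traceId is replaced before sending. */
  static Request sendCommand(Request request) {
    return request.withTraceId(currentSpanAsString());
  }

  /** Async path as described in the issue: the override is missing. */
  static CompletableFuture<Request> sendCommandAsyncWithoutOverride(Request request) {
    return CompletableFuture.completedFuture(request);
  }

  /** Async path with the override restored. */
  static CompletableFuture<Request> sendCommandAsyncWithOverride(Request request) {
    return CompletableFuture.completedFuture(
        request.withTraceId(currentSpanAsString()));
  }

  public static void main(String[] args) {
    // The client fills the field with a random UUID, as in point 1.
    Request request = new Request(UUID.randomUUID().toString(), "WriteChunk");

    System.out.println("sync:            " + sendCommand(request).traceId);
    System.out.println("async (missing): "
        + sendCommandAsyncWithoutOverride(request).join().traceId);
    System.out.println("async (fixed):   "
        + sendCommandAsyncWithOverride(request).join().traceId);
  }
}
```

In this sketch only the middle call forwards the UUID unchanged, which is the kind of value the server then fails to parse as a tracer state string.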
> Things to fix:
>  1. Fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with 
> the good one).
>  2. Remove the usage of the UUID based traceId (it's not used).
>  3. Improve the error logging in case of an invalid traceId on the server 
> side.
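
For the third item, one possible shape of a more informative server-side check is sketched below. It assumes the Jaeger-style tracer state layout of four colon-separated fields (`traceId:spanId:parentId:flags`) and, as a simplification, treats every field as hexadecimal. The class and method names are hypothetical and are not taken from the actual StringCodec.

```java
import java.util.Optional;

class TraceStateCheckSketch {

  /**
   * Returns a human-readable description of what is wrong with the given
   * tracer state string, or an empty Optional if it looks well formed.
   * Assumed layout: traceId:spanId:parentId:flags, all hexadecimal.
   */
  static Optional<String> describeProblem(String value) {
    if (value == null || value.isEmpty()) {
      return Optional.of("trace id is empty");
    }
    // Limit -1 keeps trailing empty fields, so "a:b:c:" yields 4 parts.
    String[] parts = value.split(":", -1);
    if (parts.length != 4) {
      return Optional.of("expected 4 ':'-separated fields but found "
          + parts.length + " in '" + value + "'");
    }
    for (String part : parts) {
      if (!part.matches("[0-9a-fA-F]+")) {
        return Optional.of("field '" + part
            + "' is not hexadecimal in '" + value + "'");
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String fromLog = "7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk1";
    String wellFormed = "5b8aa5a2d2c872e8:5b8aa5a2d2c872e8:0:1";

    System.out.println(describeProblem(fromLog).orElse("ok"));
    System.out.println(describeProblem(wellFormed).orElse("ok"));
  }
}
```

A message that names the offending field or the field count makes it easier to see that a UUID-based placeholder, rather than a real span context, reached the datanode.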



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
