[ https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949373#comment-15949373 ]
Adam J Shook commented on ACCUMULO-4506:
----------------------------------------

This is popping up again and again. I'm unsure of the cause, but I think a reasonable workaround here is to add a configurable timeout to the physical replication work and run it with a {{Future}}. If it doesn't complete after 5 minutes (for example), kill the replication work and release the lock so it can be tried again.

[~elserj] thoughts?

> Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>
> We're seeing an issue with replication where two files have been in-progress
> for a long time and based on the logs are not going to be replicated. The
> metadata from the {{accumulo.replication}} table looks a little funky, with a
> very large {{begin}} value.
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG:
> Not queueing work for
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> to Remote Name: peer_instance Remote identifier: 5h Source Table ID: k
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG:
> Not queueing work for
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> to Remote Name: peer_instance Remote identifier: 5i Source Table ID: l
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> repl:j [] [begin: 0 end: 0 infiniteEnd: true closed: true createdTime:
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> repl:k [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed:
> true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> repl:l [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed:
> true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
> [] [begin: 0 end: 0 infiniteEnd: true closed: true createdTime:
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
> [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
> [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477314365707]
> scan -r
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> repl:j [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed:
> true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> repl:k [] [begin: 0 end: 0 infiniteEnd: true closed: true createdTime:
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> repl:l [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed:
> true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
> [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
> [] [begin: 0 end: 0 infiniteEnd: true closed: true createdTime:
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
> [] [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true
> createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x 3 ubuntu supergroup 1117650900 2016-10-24 13:09
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x 3 ubuntu supergroup 1171968390 2016-10-21 12:31
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
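
For reference, the workaround proposed in the comment (run the physical replication work behind a {{Future}} with a configurable timeout, then release the lock so the work can be retried) might look roughly like the sketch below. This is a standalone illustration and not Accumulo code: the class {{TimedReplicationWork}}, the method {{runWithTimeout}}, and the {{releaseLock}} callback are hypothetical names, and the real replication task and its lock handling would need to be wired in where the placeholders are.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Accumulo.
public final class TimedReplicationWork {

    /**
     * Runs {@code work} on a separate thread and waits at most {@code timeout}.
     * Returns true if the work finished in time, false if it timed out or threw.
     * {@code releaseLock} (an assumed callback) always runs, so the work can be retried.
     */
    public static boolean runWithTimeout(Runnable work, Runnable releaseLock,
                                         long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(work);
        try {
            future.get(timeout, unit);   // block until done or until the timeout elapses
            return true;
        } catch (TimeoutException e) {
            future.cancel(true);         // request cancellation by interrupting the worker
            return false;
        } catch (ExecutionException e) {
            return false;                // the work itself failed
        } finally {
            executor.shutdownNow();
            releaseLock.run();           // release the lock whether or not the work succeeded
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean lockReleased = new AtomicBoolean(false);

        // Stand-in for replication work that hangs; Thread.sleep responds to interrupts.
        Runnable stuckWork = () -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        boolean completed = runWithTimeout(stuckWork, () -> lockReleased.set(true),
                                           200, TimeUnit.MILLISECONDS);
        System.out.println("completed=" + completed + " lockReleased=" + lockReleased.get());
    }
}
```

One caveat with this approach: {{Future.cancel(true)}} only interrupts the worker thread. If the replication work is blocked in a call that does not respond to interruption, the thread may keep running after the lock has been released, so the retry could overlap with the stuck attempt.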