[ 
https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949373#comment-15949373
 ] 

Adam J Shook commented on ACCUMULO-4506:
----------------------------------------

This keeps popping up.  I'm unsure of the cause, but I think a reasonable 
workaround is to add a configurable timeout to the physical replication work 
and run it as a {{Future}}.  If it doesn't complete within the timeout (5 
minutes, for example), cancel the replication work and release the lock so it 
can be retried.
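
A minimal sketch of the timeout idea, assuming only the JDK's 
{{java.util.concurrent}} classes.  The class, the {{Outcome}} enum, 
{{runWithTimeout}} and the {{releaseLock}} callback are all hypothetical names 
for illustration and are not existing Accumulo APIs; the real lock handling in 
the replication work queue would need to be wired in where the callback is.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Accumulo: runs a unit of work with a deadline.
class TimedReplicationSketch {

  enum Outcome { COMPLETED, TIMED_OUT, FAILED }

  /**
   * Submits the work and waits up to the given timeout for it to finish.
   * On timeout or failure the work is cancelled (if still running) and
   * releaseLock is invoked so the work can be picked up again later.
   * On success the lock is left to the caller's normal completion path
   * (an assumption of this sketch).
   */
  static Outcome runWithTimeout(ExecutorService executor, Callable<Void> work,
      Runnable releaseLock, long timeout, TimeUnit unit) {
    Future<Void> future = executor.submit(work);
    try {
      future.get(timeout, unit);
      return Outcome.COMPLETED;
    } catch (TimeoutException e) {
      // cancel(true) only interrupts the worker thread; the work itself
      // must respond to interruption for it to actually stop.
      future.cancel(true);
      releaseLock.run();
      return Outcome.TIMED_OUT;
    } catch (ExecutionException e) {
      // The work threw an exception before the deadline.
      releaseLock.run();
      return Outcome.FAILED;
    } catch (InterruptedException e) {
      // The waiting thread was interrupted: restore the flag and give up.
      Thread.currentThread().interrupt();
      future.cancel(true);
      releaseLock.run();
      return Outcome.FAILED;
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Stand-in for replication work that hangs longer than the timeout.
      Callable<Void> stuckWork = () -> {
        Thread.sleep(5_000);
        return null;
      };
      Outcome outcome = runWithTimeout(executor, stuckWork,
          () -> System.out.println("lock released, work can be retried"),
          200, TimeUnit.MILLISECONDS);
      System.out.println("outcome: " + outcome);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

In practice the timeout would come from a configuration property rather than 
a literal.  Note that cancelling a {{Future}} only helps if the replication 
work is blocked somewhere that honors thread interruption; otherwise the 
thread keeps running even though the lock has been released.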

[~elserj] thoughts?

>  Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>
> We're seeing an issue with replication where two files have been in-progress 
> for a long time and, based on the logs, are not going to be replicated.  The 
> metadata from the {{accumulo.replication}} table looks a little funky, with a 
> very large {{begin}} value (9223372036854775807, i.e. {{Long.MAX_VALUE}}).
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: 
> Not queueing work for 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> to Remote Name: peer_instance Remote identifier: 5h Source Table ID: k 
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: 
> Not queueing work for 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> to Remote Name: peer_instance Remote identifier: 5i Source Table ID: l 
> because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:j []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:k []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
>  []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477314365707]
> scan -r 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:j []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:k []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: 
> true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
>  []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 
> 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 
> work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
>  []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true 
> createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x   3 ubuntu supergroup 1117650900 2016-10-24 13:09 
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x   3 ubuntu supergroup 1171968390 2016-10-21 12:31 
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}



