[ 
https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708229#comment-14708229
 ] 

Josh Elser commented on ACCUMULO-3967:
--------------------------------------

Summary of what happened across the two tservers. 3 of the import requests 
which had failed were never retried. These match up with the tablets that never 
had a file in the metadata table.

It seems like BulkImporter is not correctly handling re-imports when a Tablet 
has moved. I'll have to dig some more into this -- this is a really bad bug 
(thanks so much for reporting, Edward!).

{noformat}
# Job create a single rfile to import

2015-08-22 19:14:24,453 [client.ClientServiceHandler] DEBUG: Got request to 
bulk import files to table(4): 
[hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf]
2015-08-22 19:14:24,516 [client.BulkImporter] DEBUG: Map file 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf found to overlap 
24 tablets
2015-08-22 19:14:24,526 [client.BulkImporter] DEBUG: Estimated map files sizes 
in   0.01 secs
2015-08-22 19:14:24,527 [client.BulkImporter] DEBUG: Assigning 1 map files to 
12 tablets at hw10447.local:49817

# Tabletserver asks the tabletserver to bulk load
2015-08-22 19:14:24,528 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 
to bulk load
{
  
4;02;01={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;05;04={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;06;05={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;14;13={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;15;14={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;17;16={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;18;17={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;20;19={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;21;20={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
  
4;23;22={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)},
}

# Successful imports

2015-08-22 19:14:24,544 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,548 [tablet.DatafileManager] TABLET_HIST: 4;20;19 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,552 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,554 [tablet.DatafileManager] TABLET_HIST: 4;03;02 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,557 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,560 [tablet.DatafileManager] TABLET_HIST: 4;14;13 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,562 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,565 [tablet.DatafileManager] TABLET_HIST: 4<;23 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 828182 0
2015-08-22 19:14:24,568 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,570 [tablet.DatafileManager] TABLET_HIST: 4;15;14 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,573 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,575 [tablet.DatafileManager] TABLET_HIST: 4;05;04 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,578 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,581 [tablet.DatafileManager] TABLET_HIST: 4;06;05 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,583 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,586 [tablet.DatafileManager] TABLET_HIST: 4;18;17 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,588 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,591 [tablet.DatafileManager] TABLET_HIST: 4;17;16 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,593 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,596 [tablet.DatafileManager] TABLET_HIST: 4;21;20 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0

# These are the two tablets which failed to get loaded
2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map 
files to tablet 4;02;01 because : Not Serving Tablet .  Will retry ...
2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map 
files to tablet 4;23;22 because : Not Serving Tablet .  Will retry ...

# Tries again, original two failures are not included

2015-08-22 19:14:24,598 [client.BulkImporter] DEBUG: Assigning 1 map files to 
12 tablets at localhost:64680
2015-08-22 19:14:24,599 [client.BulkImporter] DEBUG: Asking localhost:64680 to 
bulk load
{
  
4;01<={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:724659)},
  
4;04;03={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;07;06={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;08;07={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;09;08={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;10;09={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;11;10={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;16;15={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4;19;18={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
  
4;22;21={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
}

# Again, most succeed, but not all

2015-08-22 19:14:24,602 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,605 [tablet.DatafileManager] TABLET_HIST: 4;13;12 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,607 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,610 [tablet.DatafileManager] TABLET_HIST: 4;01< import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 724659 0
2015-08-22 19:14:24,612 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,616 [tablet.DatafileManager] TABLET_HIST: 4;10;09 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,618 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,621 [tablet.DatafileManager] TABLET_HIST: 4;16;15 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,624 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,627 [tablet.DatafileManager] TABLET_HIST: 4;04;03 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,629 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,633 [tablet.DatafileManager] TABLET_HIST: 4;22;21 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,635 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 
2015-08-22 19:14:24,638 [tablet.DatafileManager] TABLET_HIST: 4;07;06 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,640 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,644 [tablet.DatafileManager] TABLET_HIST: 4;08;07 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,646 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,649 [tablet.DatafileManager] TABLET_HIST: 4;09;08 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0
2015-08-22 19:14:24,652 [tserver.TabletServer] DEBUG: ScanSess tid 
192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 
2015-08-22 19:14:24,655 [tablet.DatafileManager] TABLET_HIST: 4;19;18 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0

# Another two tablets failed to import. Total of 4 failures need to be retried
2015-08-22 19:14:24,655 [client.BulkImporter] INFO : Could not assign 1 map 
files to tablet 4;11;10 because : Not Serving Tablet .  Will retry ...
2015-08-22 19:14:24,656 [client.BulkImporter] INFO : Could not assign 1 map 
files to tablet 4;12;11 because : Not Serving Tablet .  Will retry ...

# Retrying the failure

2015-08-22 19:14:28,660 [client.BulkImporter] DEBUG: Trying to assign 1 map 
files that previously failed on some key extents
2015-08-22 19:14:28,728 [client.BulkImporter] DEBUG: Estimated map files sizes 
in   0.01 secs
2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Assigning 1 map files to 3 
tablets at hw10447.local:49817

# Retrying 3 of the import requests. 3 of the previous 4 failures are not 
included here, 1 of the 4 is. The other 2 were
# successfully imported the first time.
2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 
to bulk load
{
  
4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)},
  
4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)},
  
4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
}

# Confirming to us that two of the files we tried to reimport were already done

2015-08-22 19:14:28,733 [tablet.DatafileManager] DEBUG: Ignoring request to 
re-import a file already imported: 4;03;02: 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf
2015-08-22 19:14:28,736 [tablet.DatafileManager] DEBUG: Ignoring request to 
re-import a file already imported: 4<;23: 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf

# We did get one more imported which had previously failed 

2015-08-22 19:14:28,741 [tablet.DatafileManager] TABLET_HIST: 4;12;11 import 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0

# Another cycle of failure retrying, again, trying to reimport one that didn't 
fail the first time

2015-08-22 19:14:28,741 [client.BulkImporter] DEBUG: Assigning 1 map files to 1 
tablets at localhost:64680
2015-08-22 19:14:28,742 [client.BulkImporter] DEBUG: Asking localhost:64680 to 
bulk load
{
  
4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}
}

# Again, telling us that.

2015-08-22 19:14:28,745 [tablet.DatafileManager] DEBUG: Ignoring request to 
re-import a file already imported: 4;13;12: 
hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf

# Final summary

2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: BULK IMPORT TIMING 
STATISTICS
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Files: [I000005i.rf]
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Examine map files    :     
  0.08 secs   1.84%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Query accumulo.metadata :  
     0.06 secs   1.40%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Import Map Files     :     
  0.15 secs   3.42%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Sleep                :     
  4.00 secs  93.29%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Misc                 :     
  0.00 secs   0.05%
2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Total                :     
  4.29 secs
{noformat}

> bulk import loses records when loading pre-split table
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3967
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.7.0
>         Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
> 7 node cluster
>            Reporter: Edward Seidl
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>
> I just noticed that some records I'm loading via importDirectory go missing.  
> After a lot of digging around trying to reproduce the problem, I discovered 
> that it occurs most frequently when loading a table that I have just recently 
> added splits to.  In the tserver logs I'll see messages like 
> 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to 
> tablet 1xw;18;17 because : Not Serving Tablet .  Will retry ...
>  
> or
> 20 16:25:44,826 [tserver.TabletServer] INFO : files 
> [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf] not imported to 
> 1xw;03;02: tablet 1xw;03;02 is closed
> these appear after messages about unloading tablets...it seems that tablets 
> are being redistributed at the same time as the bulk import is occuring.
> Steps to reproduce
> 1) I run a mapreduce job that produces random data in rfiles
> 2) copy the rfiles to an import directory
> 3) create table or deleterows -f
> 4) addsplits
> 5) importdirectory
> I have also performed the above completely within the mapreduce job, with 
> similar results.  The difference with the mapreduce job is that the time 
> between adding splits and the import directory is minutes rather than seconds.
> my current test creates 1000000 records, and after the importdirectory 
> returns a count of rows will be anywhere from ~800000 to 1000000.
> With my original workflow, I found that re-importing the same set of rfiles 
> three times would eventually get all rows loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to