[ https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708229#comment-14708229 ]
Josh Elser commented on ACCUMULO-3967: -------------------------------------- Summary of what happened across the two tservers. 3 of the import requests which had failed were never retried. These match up with the tablets that never had a file in the metadata table. It seems like BulkImporter is not correctly handling re-imports when a Tablet has moved. I'll have to dig some more into this -- this is a really bad bug (thanks so much for reporting, Edward!). {noformat} # Job create a single rfile to import 2015-08-22 19:14:24,453 [client.ClientServiceHandler] DEBUG: Got request to bulk import files to table(4): [hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf] 2015-08-22 19:14:24,516 [client.BulkImporter] DEBUG: Map file hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf found to overlap 24 tablets 2015-08-22 19:14:24,526 [client.BulkImporter] DEBUG: Estimated map files sizes in 0.01 secs 2015-08-22 19:14:24,527 [client.BulkImporter] DEBUG: Assigning 1 map files to 12 tablets at hw10447.local:49817 # Tabletserver asks the tabletserver to bulk load 2015-08-22 19:14:24,528 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 to bulk load { 4;02;01={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;05;04={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;06;05={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;14;13={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;15;14={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;17;16={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;18;17={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;20;19={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;21;20={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)} 4;23;22={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)}, } # Successful imports 2015-08-22 19:14:24,544 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,548 [tablet.DatafileManager] TABLET_HIST: 4;20;19 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,552 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,554 [tablet.DatafileManager] TABLET_HIST: 4;03;02 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,557 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,560 [tablet.DatafileManager] TABLET_HIST: 4;14;13 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,562 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,565 [tablet.DatafileManager] TABLET_HIST: 4<;23 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 828182 0 2015-08-22 19:14:24,568 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,570 [tablet.DatafileManager] TABLET_HIST: 4;15;14 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,573 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,575 [tablet.DatafileManager] TABLET_HIST: 4;05;04 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,578 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,581 [tablet.DatafileManager] TABLET_HIST: 4;06;05 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,583 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,586 [tablet.DatafileManager] TABLET_HIST: 4;18;17 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,588 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,591 [tablet.DatafileManager] TABLET_HIST: 4;17;16 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,593 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51083 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,596 [tablet.DatafileManager] TABLET_HIST: 4;21;20 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 # These are the two tablets which failed to get loaded 2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 4;02;01 because : Not Serving Tablet . Will retry ... 2015-08-22 19:14:24,598 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 4;23;22 because : Not Serving Tablet . Will retry ... # Tries again, original two failures are not included 2015-08-22 19:14:24,598 [client.BulkImporter] DEBUG: Assigning 1 map files to 12 tablets at localhost:64680 2015-08-22 19:14:24,599 [client.BulkImporter] DEBUG: Asking localhost:64680 to bulk load { 4;01<={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:724659)}, 4;04;03={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;07;06={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;08;07={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;09;08={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;10;09={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;11;10={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;16;15={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4;19;18={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)} 4;22;21={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, } # Again, most succeed, but not all 2015-08-22 19:14:24,602 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,605 [tablet.DatafileManager] TABLET_HIST: 4;13;12 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,607 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,610 [tablet.DatafileManager] TABLET_HIST: 4;01< import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 724659 0 2015-08-22 19:14:24,612 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,616 [tablet.DatafileManager] TABLET_HIST: 4;10;09 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,618 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,621 [tablet.DatafileManager] TABLET_HIST: 4;16;15 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,624 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,627 [tablet.DatafileManager] TABLET_HIST: 4;04;03 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,629 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,633 [tablet.DatafileManager] TABLET_HIST: 4;22;21 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,635 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] 2015-08-22 19:14:24,638 [tablet.DatafileManager] TABLET_HIST: 4;07;06 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,640 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,644 [tablet.DatafileManager] TABLET_HIST: 4;08;07 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,646 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,649 [tablet.DatafileManager] TABLET_HIST: 4;09;08 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 2015-08-22 19:14:24,652 [tserver.TabletServer] DEBUG: ScanSess tid 192.168.2.40:51085 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] 2015-08-22 19:14:24,655 [tablet.DatafileManager] TABLET_HIST: 4;19;18 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 # Another two tablets failed to import. Total of 4 failures need to be retried 2015-08-22 19:14:24,655 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 4;11;10 because : Not Serving Tablet . Will retry ... 2015-08-22 19:14:24,656 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 4;12;11 because : Not Serving Tablet . Will retry ... # Retrying the failure 2015-08-22 19:14:28,660 [client.BulkImporter] DEBUG: Trying to assign 1 map files that previously failed on some key extents 2015-08-22 19:14:28,728 [client.BulkImporter] DEBUG: Estimated map files sizes in 0.01 secs 2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Assigning 1 map files to 3 tablets at hw10447.local:49817 # Retrying 3 of the import requests. 3 of the previous 4 failures are not included here, 1 of the 4 is. The other 2 were # successfully imported the first time. 2015-08-22 19:14:28,729 [client.BulkImporter] DEBUG: Asking hw10447.local:49817 to bulk load { 4;03;02={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)}, 4<;23={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:828182)}, 4;12;11={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)} } # Confirming to us that two of the files we tried to reimport were already done 2015-08-22 19:14:28,733 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file already imported: 4;03;02: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 2015-08-22 19:14:28,736 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file already imported: 4<;23: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf # We did get one more imported which had previously failed 2015-08-22 19:14:28,741 [tablet.DatafileManager] TABLET_HIST: 4;12;11 import hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf 776421 0 # Another cycle of failure retrying, again, trying to reimport one that didn't fail the first time 2015-08-22 19:14:28,741 [client.BulkImporter] DEBUG: Assigning 1 map files to 1 tablets at localhost:64680 2015-08-22 19:14:28,742 [client.BulkImporter] DEBUG: Asking localhost:64680 to bulk load { 4;13;12={hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf=MapFileInfo(estimatedSize:776421)} } # Again, telling us that. 2015-08-22 19:14:28,745 [tablet.DatafileManager] DEBUG: Ignoring request to re-import a file already imported: 4;13;12: hdfs://localhost:8020/accumulo/tables/4/b-000005h/I000005i.rf # Final summary 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: BULK IMPORT TIMING STATISTICS 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Files: [I000005i.rf] 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Examine map files : 0.08 secs 1.84% 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Query accumulo.metadata : 0.06 secs 1.40% 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Import Map Files : 0.15 secs 3.42% 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Sleep : 4.00 secs 93.29% 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Misc : 0.00 secs 0.05% 2015-08-22 19:14:28,746 [client.BulkImporter] DEBUG: Total : 4.29 secs {noformat} > bulk import loses records when loading pre-split table > ------------------------------------------------------ > > Key: ACCUMULO-3967 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3967 > Project: Accumulo > Issue Type: Bug > Components: client, tserver > Affects Versions: 1.7.0 > Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7 > 7 node cluster > Reporter: Edward Seidl > Priority: Blocker > Fix For: 1.7.1, 1.8.0 > > > I just noticed that some records I'm loading via importDirectory go missing. > After a lot of digging around trying to reproduce the problem, I discovered > that it occurs most frequently when loading a table that I have just recently > added splits to. In the tserver logs I'll see messages like > 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to > tablet 1xw;18;17 because : Not Serving Tablet . Will retry ... > > or > 20 16:25:44,826 [tserver.TabletServer] INFO : files > [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf] not imported to > 1xw;03;02: tablet 1xw;03;02 is closed > these appear after messages about unloading tablets...it seems that tablets > are being redistributed at the same time as the bulk import is occuring. > Steps to reproduce > 1) I run a mapreduce job that produces random data in rfiles > 2) copy the rfiles to an import directory > 3) create table or deleterows -f > 4) addsplits > 5) importdirectory > I have also performed the above completely within the mapreduce job, with > similar results. The difference with the mapreduce job is that the time > between adding splits and the import directory is minutes rather than seconds. > my current test creates 1000000 records, and after the importdirectory > returns a count of rows will be anywhere from ~800000 to 1000000. > With my original workflow, I found that re-importing the same set of rfiles > three times would eventually get all rows loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)