[ https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708220#comment-14708220 ]
Josh Elser commented on ACCUMULO-3967:
--------------------------------------

Looking at the metadata, it appears we didn't get the file reference for three tablets: {{4;02}}, {{4;11}}, and {{4;23}}:

{noformat}
4;01 file:hdfs://localhost:8020/accumulo/tables/4/t-0000030/A000003n.rf [] 776344,41660
4;03 file:hdfs://localhost:8020/accumulo/tables/4/t-000002v/A00000at.rf [] 773215,41469
4;04 file:hdfs://localhost:8020/accumulo/tables/4/t-000002z/A000003m.rf [] 775453,41598
4;05 file:hdfs://localhost:8020/accumulo/tables/4/t-0000038/A00000as.rf [] 772770,41451
4;06 file:hdfs://localhost:8020/accumulo/tables/4/t-000002s/A00000ar.rf [] 776667,41680
4;07 file:hdfs://localhost:8020/accumulo/tables/4/t-000002x/A000003o.rf [] 779467,41810
4;08 file:hdfs://localhost:8020/accumulo/tables/4/t-0000035/A000003p.rf [] 776883,41688
4;09 file:hdfs://localhost:8020/accumulo/tables/4/t-000002w/A000003q.rf [] 775616,41611
4;10 file:hdfs://localhost:8020/accumulo/tables/4/t-000002y/A000003r.rf [] 782617,41975
4;12 file:hdfs://localhost:8020/accumulo/tables/4/t-000002q/A00000au.rf [] 772907,41461
4;13 file:hdfs://localhost:8020/accumulo/tables/4/t-0000034/A000003s.rf [] 773722,41509
4;14 file:hdfs://localhost:8020/accumulo/tables/4/t-000003b/A00000av.rf [] 773786,41518
4;15 file:hdfs://localhost:8020/accumulo/tables/4/t-000002t/A00000aw.rf [] 778756,41789
4;16 file:hdfs://localhost:8020/accumulo/tables/4/t-0000033/A000003t.rf [] 772805,41459
4;17 file:hdfs://localhost:8020/accumulo/tables/4/t-000003c/A00000ax.rf [] 776262,41643
4;18 file:hdfs://localhost:8020/accumulo/tables/4/t-000002r/A00000ay.rf [] 778681,41774
4;19 file:hdfs://localhost:8020/accumulo/tables/4/t-0000031/A000003u.rf [] 774599,41555
4;20 file:hdfs://localhost:8020/accumulo/tables/4/t-0000036/A00000az.rf [] 781791,41936
4;21 file:hdfs://localhost:8020/accumulo/tables/4/t-000002u/A00000b0.rf [] 785018,42116
4;22 file:hdfs://localhost:8020/accumulo/tables/4/t-0000032/A000003v.rf [] 780546,41877
4< file:hdfs://localhost:8020/accumulo/tables/4/default_tablet/A00000b1.rf [] 778454,41757
{noformat}

...given the 24 tablets in the table:

{noformat}
root@accumulo17 accumulo.metadata> getsplits -t loadtest.T_Test
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
{noformat}
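As a quick way to re-check this, a minimal sketch of the same metadata scan via the 1.7 Java client; the instance name, zookeepers, and credentials below are hypothetical placeholders:

{code:java}
import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class CheckFileRefs {
  public static void main(String[] args) throws Exception {
    // Hypothetical instance name, zookeepers, and credentials
    Connector conn = new ZooKeeperInstance("accumulo17", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    Scanner scan = conn.createScanner("accumulo.metadata", Authorizations.EMPTY);
    // Metadata rows for table id 4 run from "4;<split>" through "4<" (the default tablet)
    scan.setRange(new Range(new Text("4;"), true, new Text("4<"), true));
    scan.fetchColumnFamily(new Text("file")); // one entry per rfile reference

    for (Map.Entry<Key,Value> e : scan) {
      System.out.println(e.getKey().getRow() + " "
          + e.getKey().getColumnQualifier() + " " + e.getValue());
    }
  }
}
{code}

Tablets whose rows show no {{file}} entry in that scan (here {{4;02}}, {{4;11}}, and {{4;23}}) never received their bulk-loaded rfile.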
> bulk import loses records when loading pre-split table
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3967
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>    Affects Versions: 1.7.0
>        Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
>                     7 node cluster
>            Reporter: Edward Seidl
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>
> I just noticed that some records I'm loading via importDirectory go missing. After a lot of digging around trying to reproduce the problem, I discovered that it occurs most frequently when loading a table that I have just recently added splits to. In the tserver logs I'll see messages like
>
> 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 1xw;18;17 because : Not Serving Tablet . Will retry ...
>
> or
>
> 20 16:25:44,826 [tserver.TabletServer] INFO : files [hdfs://xxxx:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf] not imported to 1xw;03;02: tablet 1xw;03;02 is closed
>
> These appear after messages about unloading tablets... it seems that tablets are being redistributed at the same time as the bulk import is occurring.
>
> Steps to reproduce:
> 1) run a mapreduce job that produces random data in rfiles
> 2) copy the rfiles to an import directory
> 3) create table (or deleterows -f)
> 4) addsplits
> 5) importdirectory
>
> I have also performed the above completely within the mapreduce job, with similar results. The difference with the mapreduce job is that the time between adding the splits and the import directory call is minutes rather than seconds.
> My current test creates 1000000 records; after importDirectory returns, the row count will be anywhere from ~800000 to 1000000.
> With my original workflow, I found that re-importing the same set of rfiles three times would eventually get all rows loaded.
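For reference, a minimal sketch of steps 3-5 of the reported workflow via the 1.7 client API; the instance details and HDFS paths are hypothetical, and the rfiles are assumed to be staged in the import directory already:

{code:java}
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.io.Text;

public class BulkLoadRepro {
  public static void main(String[] args) throws Exception {
    // Hypothetical instance name, zookeepers, credentials, and paths
    Connector conn = new ZooKeeperInstance("accumulo17", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));
    TableOperations ops = conn.tableOperations();

    // step 3: create the table
    if (!ops.exists("loadtest.T_Test"))
      ops.create("loadtest.T_Test");

    // step 4: add splits 01..23, matching the getsplits output above
    SortedSet<Text> splits = new TreeSet<Text>();
    for (int i = 1; i <= 23; i++)
      splits.add(new Text(String.format("%02d", i)));
    ops.addSplits("loadtest.T_Test", splits);

    // step 5: bulk import immediately after splitting; rfiles that fail
    // to load are moved to the (pre-created, empty) failure directory
    ops.importDirectory("loadtest.T_Test", "/tmp/bulk", "/tmp/bulk-fail", false);
  }
}
{code}

Scanning {{loadtest.T_Test}} and counting entries once {{importDirectory}} returns should be enough to observe the loss the reporter describes.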