I started investigating this today and quickly noticed two things. First, when I scanned the table for some of the data the MR job reported as missing, it was there. Second, the range of the supposedly missing data covered the entire table. Then I realized we never stopped ingest, so ingest was still running during verification. That would cause data to be reported as missing, because mappers read different parts of the table at different times: a mapper that runs later can see an entry referencing data that was written after an earlier mapper had already scanned that part of the table.
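
To illustrate the race described above, here is a toy sketch in Python. It is not the real ContinuousVerify job; the dict-based table and the names `ingest`, `scan`, `count_undefined` and `verify` are made up for illustration. The only assumption carried over from continuous ingest is that each new entry references a previously written entry, and that a referenced entry no mapper saw gets counted as undefined.

```python
import random

def ingest(table, rng, count, key_space):
    """Write `count` entries; each new key references an earlier key."""
    keys = list(table)
    for _ in range(count):
        key = rng.randrange(key_space)
        while key in table:                 # keep keys unique
            key = rng.randrange(key_space)
        table[key] = rng.choice(keys) if keys else None
        keys.append(key)

def scan(table, lo, hi):
    """Snapshot of the entries in [lo, hi), as one mapper would read them."""
    return {k: v for k, v in table.items() if lo <= k < hi}

def count_undefined(entries):
    """Number of referenced keys that have no definition in `entries`."""
    referenced = {v for v in entries.values() if v is not None}
    return len(referenced.difference(entries))

def verify(ingest_between_scans, seed=42, key_space=1_000_000):
    """Return (undefined reported by the scans, undefined in the full table)."""
    rng = random.Random(seed)
    table = {}
    ingest(table, rng, 5_000, key_space)
    mid = key_space // 2
    seen = scan(table, 0, mid)                   # first mapper: lower half
    if ingest_between_scans:
        ingest(table, rng, 5_000, key_space)     # ingest was never stopped
    seen.update(scan(table, mid, key_space))     # second mapper: upper half, later
    return count_undefined(seen), count_undefined(table)
```

Calling `verify(False)` models verification with ingest stopped, and `verify(True)` models what we actually ran. In the second case the scans report undefined entries even though nothing is missing from the table itself, which matches what I saw when scanning for the supposedly missing data by hand.
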
I was advising Karthick on how to run the test and gave some really bad advice: I forgot to mention stopping ingest. Sorry everyone, please forgive my blunder. I stopped ingest and started the verification job again. It has more data to verify than last time; there are 71B entries in the table now. I also enabled snappy compression of map output as a precaution to ensure there is enough space. I was planning on less data when allocating the cluster, but I think there is enough space to run the job. The cluster has 9TB of 25TB used. I'll report back when the job finishes.

On Sun, Aug 30, 2020 at 12:07 PM karthick rn <[email protected]> wrote:
>
> -1
>
> Keith and I tested the continuous ingest with agitation on 11 nodes (9
> workers) for 24 hrs and noticed the following after running the MapReduce
> verify job. The "undefined" counter is greater than 0 may be indicating
> data loss.
>
> org.apache.accumulo.test.continuous.ContinuousVerify$Counts
> REFERENCED=35470816664
> * UNDEFINED=707335949*
> UNREFERENCED=715995424
>
> The cluster was setup using Muchos (https://github.com/apache/fluo-muchos)
> & following are the details
>
> * 11 Azure VMs (Standard D8s)
> * Managed Disk – 3 x 1024g per VM
> * Hadoop 2.10.0
> * Accumulo 1.10.0-rc2
> * Java 8
> * CentOS7.5
>
> Created the table 'ci' and split using the below commands
> $ accumulo org.apache.accumulo.test.continuous.GenSplits 90 > splits.txt
> $ accumulo shell -u root -p secret -e 'createtable ci -sf splits.txt'
>
> Below is the agitator options used for the test
> nohup ./tserver-agitator.pl 1:10 1:10 1 3 > logs/tserver-ag.out 2> logs/tserver-ag.err &
>
> We will investigate the data loss and share our findings.
>
> Thanks,
> Karthick
>
> On Sat, 29 Aug 2020 at 23:32, Keith Turner <[email protected]> wrote:
>
> > Karthick and I are working together to run random walk and continuous
> > ingest on two clusters using this RC.
> > After continuous ingest ran for
> > 24 hrs we tried to start verification and ran into the following
> > issue. I plan to vote after the test completes, but wanted to let
> > anyone else running continuous ingest know about this.
> >
> > https://github.com/apache/accumulo/issues/1695
> >
> > We also ran into another issue w/ the verification script related to
> > ZK 3.5 that we worked around w/ a hack (Karthick copied the ZK jar
> > from $ZK_HOME/lib to $ZK_HOME so the script could find the jars :).
> > Need to open an issue about this too.
> >
> > On Thu, Aug 27, 2020 at 12:36 PM Mike Miller <[email protected]> wrote:
> >
> > > Accumulo Developers,
> > >
> > > Please consider the following candidate for Apache Accumulo 1.10.0.
> > >
> > > Git Commit:
> > > 4d261254c3ac43a3bd13ce974e91ce4303a83998
> > > Branch:
> > > 1.10.0-rc2
> > >
> > > If this vote passes, a gpg-signed tag will be created using:
> > > git tag -f -m 'Apache Accumulo 1.10.0' -s rel/1.10.0 \
> > > 4d261254c3ac43a3bd13ce974e91ce4303a83998
> > >
> > > Staging repo:
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1086
> > > Source (official release artifact):
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-src.tar.gz
> > > Binary:
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/org/apache/accumulo/accumulo/1.10.0/accumulo-1.10.0-bin.tar.gz
> > >
> > > Append ".asc" to download the cryptographic signature for a given artifact.
> > > (You can also append ".sha1" or ".md5" instead in order to verify the checksums
> > > generated by Maven to verify the integrity of the Nexus repository staging
> > > area.)
> > >
> > > Signing keys are available at https://www.apache.org/dist/accumulo/KEYS
> > > (Expected fingerprint: 1914AF6FE2C53672C87CE1DADC8FFDC342894E89)
> > >
> > > In addition to the tarballs and their signatures, the following checksum
> > > files will be added to the dist/release SVN area after release:
> > > accumulo-1.10.0-src.tar.gz.sha512 will contain:
> > > SHA512 (accumulo-1.10.0-src.tar.gz) =
> > > 81f2a8f8273e2bdfe46d6a807dc38276ee2937ced648829648b7750bfc22816c13d43461d1b08c50a6957d78a999ae3109c93d2f31c7d8be116e91e0ea25f5c2
> > > accumulo-1.10.0-bin.tar.gz.sha512 will contain:
> > > SHA512 (accumulo-1.10.0-bin.tar.gz) =
> > > 9d3023c8724069282035ed6dcb047f737c1c53dc05f7b15da2cfd941f51d1d7720892496475430eb639f3a36c83f4eecc1942c0317c67d38dcf2061d06beb648
> > >
> > > Release notes (in progress) can be found at:
> > > https://accumulo.apache.org/release/accumulo-1.10.0/
> > >
> > > Release testing instructions:
> > > https://accumulo.apache.org/contributor/verifying-release
> > >
> > > Please vote one of:
> > > [ ] +1 - I have verified and accept...
> > > [ ] +0 - I have reservations, but not strong enough to vote against...
> > > [ ] -1 - Because..., I do not accept...
> > > ... these artifacts as the 1.10.0 release of Apache Accumulo.
> > >
> > > This vote will remain open until at least Sun Aug 30 16:30:00 UTC 2020.
> > > (Sun Aug 30 12:30:00 EDT 2020 / Sun Aug 30 09:30:00 PDT 2020)
> > > Voting can continue after this deadline until the release manager
> > > sends an email ending the vote.
> > >
> > > Thanks!
> > >
> > > P.S. Hint: download the whole staging repo with
> > > wget -erobots=off -r -l inf -np -nH \
> > > https://repository.apache.org/content/repositories/orgapacheaccumulo-1086/
> > > # note the trailing slash is needed
> >
