I just noticed that swappiness is set to 60 at the slow site and 1 at the other site. I am going to work with the system administrators to change this as soon as possible.
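To compare the two sites host by host, the current value can be read from `/proc/sys/vm/swappiness` on each Linux machine. This is a minimal sketch, assuming Python 3 on Linux. The target of 1 simply mirrors the fast site and is an assumption, not an Accumulo or Hadoop recommendation.

```python
from pathlib import Path

# Standard Linux location of the vm.swappiness kernel parameter.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the file contents, which hold a single integer."""
    return int(text.strip())


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Read the current vm.swappiness value on this host (Linux only)."""
    return parse_swappiness(path.read_text())


def swappiness_is_high(value: int, threshold: int = 1) -> bool:
    """Return True if value exceeds the threshold.

    The default of 1 is an assumption that mirrors the fast site in this thread.
    """
    return value > threshold


if __name__ == "__main__":
    try:
        current = read_swappiness()
    except (OSError, ValueError) as exc:
        print(f"could not read {SWAPPINESS_PATH}: {exc}")
    else:
        status = "higher than target" if swappiness_is_high(current) else "ok"
        print(f"vm.swappiness = {current} ({status})")
```

Run it on each tserver/datanode host. On Linux, root can change the value at runtime with `sysctl -w vm.swappiness=1`. To keep it across reboots, the setting also needs to go into the host's sysctl configuration.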
Thanks,

On Thu, Mar 16, 2023 at 9:34 AM Vincent Russell <vincent.russ...@gmail.com> wrote:

> Thank you, Dave. I didn't look at the slow sync cost messages when I shut those nodes down. I just monitored the ingest speed. I can try that again.
>
> I also shut down the tserver on one of those slow sync cost nodes. Ingest stopped for about 30 seconds and then continued at the same slow speed.
>
> Also, according to the Accumulo monitor, the tablets are pretty evenly distributed.
>
> I am going to try moving the process that's doing the ingesting to another host and see what happens.
>
> Thanks,
>
> On Wed, Mar 15, 2023 at 7:26 PM Dave Marion <dmario...@gmail.com> wrote:
>
>> When you shut down the two datanodes, were the "slow sync cost" messages still concentrated on two nodes? If so, is it possible that a majority of the writes are going to a small set of tablet servers? You might be able to see this on the Monitor. Is it possible that the tablets you are ingesting into are collocated instead of spread out?
>>
>> On Wed, Mar 15, 2023 at 7:01 PM Vincent Russell <vincent.russ...@gmail.com> wrote:
>>
>>> I stopped the two datanodes and it had no effect.
>>>
>>> Thanks,
>>>
>>> On Wed, Mar 15, 2023 at 6:53 PM Vincent Russell <vincent.russ...@gmail.com> wrote:
>>>
>>>> Yes. We have HDFS rack awareness set up to divide the blocks equally. And according to the NameNode HTTP page, those nodes don't appear to have many more blocks than the other nodes.
>>>>
>>>> I can try temporarily shutting down one of the datanodes to see what that does.
>>>>
>>>> We already lost a node on the cluster a few days ago. I'm currently waiting for the system administrators to replace a disk.
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Mar 15, 2023 at 5:59 PM Dave Marion <dmario...@gmail.com> wrote:
>>>>
>>>>> Sounds like you have a hot spot on those two datanode hosts. Either the blocks it's writing to are all (or mostly) located there, or there is some kind of issue with those hosts. Stopping the DN processes on those two hosts should confirm this, unless the hot spot moves. Do you have the HDFS rack script set up appropriately to distribute the blocks for files across the hosts?
>>>>>
>>>>> On Wed, Mar 15, 2023 at 5:52 PM Vincent Russell <vincent.russ...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am using Accumulo 2.0.1 with Hadoop 3.3.1.
>>>>>>
>>>>>> I have two identical clusters with 28 tservers each.
>>>>>>
>>>>>> I have writers on both clusters, configured with 10 batch writers with a max memory of 50m.
>>>>>>
>>>>>> However, one cluster is ingesting 10x faster than the other.
>>>>>>
>>>>>> Is there anything I should check?
>>>>>>
>>>>>> I don't see any errors, but one thing I noticed is that the slow site has a lot of "Slow sync cost" INFO log messages from the tservers.
>>>>>>
>>>>>> I see these messages on the fast cluster as well, but far fewer of them. It also appears that on the slow cluster these messages occur on only two of the nodes, whereas on the fast cluster they are more spread out.
>>>>>>
>>>>>> Thank you in advance for your help,
>>>>>> Vincent
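The original question hinges on whether the "Slow sync cost" messages are concentrated on a few hosts. One rough way to measure that is to count the messages per host and compute what share the busiest hosts account for. This is a minimal sketch, assuming the tserver log lines have already been collected and grouped by host. It matches only the "Slow sync cost" substring quoted in the thread and assumes nothing else about the log layout. The host names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# Only the substring quoted in this thread is matched; no particular log layout is assumed.
MARKER = "Slow sync cost"


def count_slow_syncs(logs_by_host: Mapping[str, Iterable[str]]) -> Counter:
    """Count the log lines containing MARKER for each host (hosts with zero are kept)."""
    counts: Counter = Counter()
    for host, lines in logs_by_host.items():
        counts[host] = sum(1 for line in lines if MARKER in line)
    return counts


def concentration(counts: Counter, top_n: int = 2) -> float:
    """Fraction of all matching messages that come from the top_n busiest hosts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total


if __name__ == "__main__":
    # Hypothetical host names and placeholder lines; real input would be each
    # tserver's log lines, grouped by host.
    sample = {
        "host-a": ["... Slow sync cost ...", "... Slow sync cost ...", "... other ..."],
        "host-b": ["... Slow sync cost ..."],
        "host-c": ["... other ..."],
    }
    counts = count_slow_syncs(sample)
    print(counts.most_common())
    print(f"share from top 2 hosts: {concentration(counts):.0%}")
```

Suppose two hosts account for nearly all of the messages at the slow site while the fast site shows a flatter spread. That pattern would be consistent with the hot spot Dave describes. It would not show by itself whether the cause is block placement, a failing disk, or a host setting such as swappiness.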