When you shut down the two datanodes, were the "Slow sync cost" messages
still concentrated on two nodes? If so, is it possible that a majority of
the writes are going to a small set of tablet servers? You might be able to
see this on the Monitor. Is it possible that the tablets you are ingesting
into are colocated on a few tablet servers instead of spread out?
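
One rough way to answer the first question is to count the "Slow sync cost"
lines per tserver and see how much of the total the top two hosts account
for. This is only a minimal sketch: it assumes you have already collected
the log lines per host into a dict, the function names are made up for
illustration, and it matches on the message substring only rather than on
any particular tserver log layout.

```python
from collections import Counter

# Substring to look for; matched case-insensitively.
MARKER = "slow sync cost"


def count_slow_syncs(lines_by_host):
    """Map each host to the number of its log lines containing MARKER.

    lines_by_host is a hypothetical dict of host name -> iterable of log lines.
    """
    counts = Counter()
    for host, lines in lines_by_host.items():
        counts[host] = sum(1 for line in lines if MARKER in line.lower())
    return counts


def share_of_top(counts, n=2):
    """Fraction of all matching lines that come from the n busiest hosts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total
```

A share close to 1.0 for n=2 on the slow cluster, compared with a much lower
share on the fast cluster, would be consistent with a hot spot on two hosts.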

On Wed, Mar 15, 2023 at 7:01 PM Vincent Russell <vincent.russ...@gmail.com>
wrote:

> I stopped the two data nodes and it had no effect.
>
> Thanks,
>
> On Wed, Mar 15, 2023 at 6:53 PM Vincent Russell <vincent.russ...@gmail.com>
> wrote:
>
> > Yes.  We have HDFS rack awareness set up to divide the blocks equally.
> > And according to the name node http page it doesn't look like those nodes
> > have a much higher number of blocks than the other nodes.
> >
> > I can try temporarily shutting down one of the data nodes to see what that
> > does.
> >
> > We did already lose a node on the cluster a few days ago.  I'm currently
> > waiting for the system administrators to replace a disk.
> >
> > Thanks,
> >
> > On Wed, Mar 15, 2023 at 5:59 PM Dave Marion <dmario...@gmail.com> wrote:
> >
> >> Sounds like you have a hot spot on those two datanode hosts, either because
> >> the blocks that it's writing to are all (or a majority) located there, or
> >> because there is some type of issue with the hosts. Stopping the DN
> >> processes on those two hosts should confirm this, unless the hot spot
> >> moves. Do you have the HDFS rack script set up appropriately to distribute
> >> the blocks for files across the hosts?
> >>
> >> On Wed, Mar 15, 2023 at 5:52 PM Vincent Russell <
> >> vincent.russ...@gmail.com>
> >> wrote:
> >>
> >> > Hello,
> >> >
> >> > I am using Accumulo 2.0.1 with Hadoop 3.3.1.
> >> >
> >> > I have two identical clusters with 28 tservers.
> >> >
> >> > I have writers on both clusters which are set up with 10 batch writers
> >> > with a max memory of 50m.
> >> >
> >> > However, one cluster is ingesting 10x faster than the other.
> >> >
> >> > Is there anything I should check for?
> >> >
> >> > I don't see any errors, but one thing I noticed is that the slow site
> >> > has a lot of "Slow sync cost" info log messages from the tservers.
> >> >
> >> > I see these messages on the fast cluster as well, but far fewer of them.
> >> > Also, on the slow cluster these messages appear to occur on only two of
> >> > the nodes, whereas on the fast cluster they appear to be more spread out.
> >> >
> >> > Thank you in advance for your help,
> >> > Vincent
> >> >
> >>
> >
>
