Yes.  We have HDFS rack awareness set up to distribute the blocks evenly.
And according to the NameNode web page, those nodes don't appear to hold
many more blocks than the other nodes.

I can try temporarily shutting down one of the data nodes to see what that
does.
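
A less disruptive check before stopping a data node is to tally which
datanodes show up in the "Slow sync cost" messages. The sketch below is
only an illustration: it assumes each message names its write pipeline in
the form shown in the comment, which is not confirmed anywhere in this
thread, and the function name tally_slow_syncs is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (an assumption, not taken from this thread):
#   ... Slow sync cost: 1500 ms, current pipeline:
#   [DatanodeInfoWithStorage[10.0.0.5:9866,DS-1,DISK], ...]
SLOW_SYNC = re.compile(r"Slow sync cost: (\d+) ms")
DATANODE = re.compile(r"DatanodeInfoWithStorage\[([^,\]]+)")


def tally_slow_syncs(lines):
    """Count slow-sync messages and sum their cost per datanode address."""
    counts = Counter()    # datanode -> number of slow syncs it took part in
    total_ms = Counter()  # datanode -> summed sync cost in milliseconds
    for line in lines:
        match = SLOW_SYNC.search(line)
        if match is None:
            continue  # not a slow-sync message
        cost = int(match.group(1))
        for datanode in DATANODE.findall(line):
            counts[datanode] += 1
            total_ms[datanode] += cost
    return counts, total_ms


if __name__ == "__main__":
    import sys

    # Usage: python tally_slow_syncs.py tserver_log_1 tserver_log_2 ...
    counts = Counter()
    total_ms = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            file_counts, file_ms = tally_slow_syncs(handle)
        counts.update(file_counts)
        total_ms.update(file_ms)
    for datanode, n in counts.most_common():
        print(f"{datanode}\t{n} slow syncs\t{total_ms[datanode]} ms total")
```

If the same two datanode addresses dominate the tally across the tserver
logs, that points at those hosts rather than at block placement.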

We did already lose a node on the cluster a few days ago.  I'm currently
waiting for the system administrators to replace a disk.

Thanks,

On Wed, Mar 15, 2023 at 5:59 PM Dave Marion <dmario...@gmail.com> wrote:

> Sounds like you have a hot-spot on those two datanode hosts. Either because
> the blocks that it's writing to are all (or a majority) located there, or
> there is some type of issue with the host. Stopping the DN processes on
> those two hosts should confirm this, unless the hot spot moves. Do you have
> the HDFS rack script set up appropriately to distribute the blocks for
> files across the hosts?
>
> On Wed, Mar 15, 2023 at 5:52 PM Vincent Russell <vincent.russ...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I am using accumulo 2.0.1 with hadoop 3.3.1.
> >
> > I have two identical clusters with 28 tservers.
> >
> > I have writers on both clusters which are set with 10 batch writers with a
> > max memory of 50m.
> >
> > However, one cluster is ingesting 10x faster than the other.
> >
> > Is there anything I should check for?
> >
> > I don't see any errors, but one thing that I noticed is that the slow site
> > has a lot of "Slow sync cost" info log messages from the tservers.
> >
> > I see these messages on the fast cluster as well, but there are far fewer.
> > It also appears that on the slow cluster these messages are occurring on
> > only two of the nodes in the cluster, where these messages appear to be
> > more spread out on the fast cluster.
> >
> > Thank you in advance for your help,
> > Vincent
> >
>
