I just noticed that swappiness is set to 60 at the slow site and 1 at the
other site.   I am going to work with the system administrators to change
this as soon as possible.
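
A minimal sketch for checking this setting on each host, assuming a Linux
host that exposes it at /proc/sys/vm/swappiness. The helper names and the
target of 1 (the value at the other site) are illustrative assumptions, not
part of any Accumulo or Hadoop tooling:

```python
from pathlib import Path

# Standard Linux procfs location for the vm.swappiness kernel setting.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the swappiness value stored at `path` as an integer."""
    return int(path.read_text().strip())


def exceeds_target(value: int, target: int = 1) -> bool:
    """True when `value` is above `target` (1 is assumed, matching the other site)."""
    return value > target


if __name__ == "__main__":
    if SWAPPINESS_PATH.exists():
        current = read_swappiness()
        status = "above target" if exceeds_target(current) else "ok"
        print(f"vm.swappiness = {current} ({status})")
    else:
        print("vm.swappiness is not exposed on this host")
```

Running this on every node at both sites would show whether the setting is
uniform within each cluster before and after the change.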

Thanks,

On Thu, Mar 16, 2023 at 9:34 AM Vincent Russell <vincent.russ...@gmail.com>
wrote:

> Thank you, Dave. I didn't take a look at the slow sync cost messages when I
> shut those nodes down. I just monitored the ingest speed. I can try that
> again.
>
> I also shut down the tserver on one of those slow sync cost nodes, and ingest
> stopped for about 30 seconds and then continued at the same slow speed.
>
> Also, according to the Accumulo monitor, the tablets are pretty evenly
> distributed.
>
> I am going to try to move the node that's doing the ingesting to another
> host and see what happens.
>
> Thanks,
>
> On Wed, Mar 15, 2023 at 7:26 PM Dave Marion <dmario...@gmail.com> wrote:
>
>> When you shut down the two datanodes, did you have the same "slow sync
>> cost" messages concentrated on two nodes? If so, is it possible that a
>> majority of the writes are going to a small set of tablet servers? You
>> might be able to see this on the Monitor. Is it possible that tablets you
>> are ingesting are collocated instead of spread out?
>>
>> On Wed, Mar 15, 2023 at 7:01 PM Vincent Russell <vincent.russ...@gmail.com>
>> wrote:
>>
>> > I stopped the two data nodes and it had no effect.
>> >
>> > Thanks,
>> >
>> > On Wed, Mar 15, 2023 at 6:53 PM Vincent Russell <vincent.russ...@gmail.com>
>> > wrote:
>> >
>> > > Yes. We have HDFS rack awareness set up to divide the blocks equally,
>> > > and according to the NameNode HTTP page it doesn't look like those nodes
>> > > have a much higher number of blocks than the other nodes.
>> > >
>> > > I can try temporarily shutting down one of the data nodes to see what
>> > > that does.
>> > >
>> > > We did already lose a node on the cluster a few days ago. I'm currently
>> > > waiting for the system administrators to replace a disk.
>> > >
>> > > Thanks,
>> > >
>> > > On Wed, Mar 15, 2023 at 5:59 PM Dave Marion <dmario...@gmail.com> wrote:
>> > >
>> > >> Sounds like you have a hot spot on those two datanode hosts, either
>> > >> because the blocks that it's writing to are all (or a majority) located
>> > >> there, or because there is some type of issue with the hosts. Stopping
>> > >> the DN processes on those two hosts should confirm this, unless the hot
>> > >> spot moves. Do you have the HDFS rack script set up appropriately to
>> > >> distribute the blocks for files across the hosts?
>> > >>
>> > >> On Wed, Mar 15, 2023 at 5:52 PM Vincent Russell <vincent.russ...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Hello,
>> > >> >
>> > >> > I am using Accumulo 2.0.1 with Hadoop 3.3.1.
>> > >> >
>> > >> > I have two identical clusters with 28 tservers.
>> > >> >
>> > >> > I have writers on both clusters which are set with 10 batch writers
>> > >> > with a max memory of 50m.
>> > >> >
>> > >> > However, one cluster is ingesting 10x faster than the other.
>> > >> >
>> > >> > Is there anything I should check for?
>> > >> >
>> > >> > I don't see any errors, but one thing that I noticed is that the slow
>> > >> > site has a lot of "Slow sync cost" info log messages from the tservers.
>> > >> >
>> > >> > I see these messages on the fast cluster as well, but far fewer of them.
>> > >> > It also appears that on the slow cluster these messages are occurring
>> > >> > on only two of the nodes, whereas on the fast cluster they appear to be
>> > >> > more spread out.
>> > >> >
>> > >> > Thank you in advance for your help,
>> > >> > Vincent
>> > >> >
>> > >>
>> > >
>> >
>>
>
