Re: [Gluster-devel] [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-04 Thread Raghavendra Gowdappa
+Gluster Devel , +Gluster-users


I would like to point out another issue. Even if what I suggested prevents
disconnects, it would only be a symptomatic treatment and wouldn't address
the root cause of the problem. In most of the ping-timer-expiry issues, the
root cause is increased load on bricks and the inability of bricks to stay
responsive under that load. So, the actual solution would be doing either or
both of the following (a sketch of the relevant tunables follows this list):
* Identify the source of the increased load and, if possible, throttle it.
Internal heal processes like self-heal, rebalance, and quota heal are known
to pump traffic into bricks without much throttling (io-threads _might_ do
some throttling, but my understanding is that it's not sufficient).
* Identify why bricks become unresponsive under load. This may be a fixable
issue, like too few event-threads to read from the network; a
difficult-to-fix issue, like fsync on the backend filesystem freezing the
process; or a semi-fixable issue (in code), like lock contention.
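As an illustration, the following is a hedged sketch of the kinds of knobs
involved. The option names are standard gluster volume options, but
<VOLNAME> and the values are placeholders - the right settings depend
entirely on your workload:

    # Throttle rebalance traffic (accepted values: lazy, normal, aggressive).
    gluster volume set <VOLNAME> cluster.rebal-throttle lazy

    # Cap the number of background self-heals that run in parallel.
    gluster volume set <VOLNAME> cluster.background-self-heal-count 8

    # Give bricks more io-threads to service file operations under load.
    gluster volume set <VOLNAME> performance.io-thread-count 32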

So any genuine effort to fix ping-timer issues (to be honest, most of the
time they are not rpc/network issues) would involve performance
characterization of the various subsystems on bricks and clients. These
subsystems include (but are not limited to) the underlying OS/filesystem,
the glusterfs processes, CPU consumption, etc.

regards,
Raghavendra

On Mon, Mar 4, 2019 at 9:31 PM Mauro Tridici  wrote:

> Thank you, let’s try!
> I will inform you about the effects of the change.
>
> Regards,
> Mauro
>
> On 4 Mar 2019, at 16:55, Raghavendra Gowdappa  wrote:
>
>
>
> On Mon, Mar 4, 2019 at 8:54 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> thank you for your reply.
>> Yes, you are right. It is a problem that seems to happen randomly.
>> At this moment, the server.event-threads value is 4. I will try to
>> increase this value to 8. Do you think that would be a reasonable value?
>>
>
> Yes, we can try that. You should at least see the frequency of
> ping-timer-related disconnects reduce with this value (even if it doesn't
> eliminate the problem completely).
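> For reference, a sketch of how that change would be applied (<VOLNAME> is
> a placeholder; both options are standard gluster volume options):
>
>     # Check the current poller-thread counts on both sides.
>     gluster volume get <VOLNAME> server.event-threads
>     gluster volume get <VOLNAME> client.event-threads
>
>     # Raise the brick-side event threads from 4 to 8.
>     gluster volume set <VOLNAME> server.event-threads 8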
>
>
>> Regards,
>> Mauro
>>
>>
>> On 4 Mar 2019, at 15:36, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Mon, Mar 4, 2019 at 8:01 PM Nithya Balachandran 
>> wrote:
>>
>>> Hi Mauro,
>>>
>>> It looks like there is some problem on s06. Are all your other nodes OK?
>>> Can you send us the gluster logs from this node?
>>>
>>> @Raghavendra G, do you have any idea as to how this can be debugged?
>>> Maybe running top? Or debug brick logs?
>>>
>>
>> If we can reproduce the problem, collecting a tcpdump on both ends of the
>> connection will help. But one common problem is that these bugs are
>> inconsistently reproducible, and hence we may not be able to capture the
>> tcpdump at the correct intervals. Other than that, we can try to collect
>> evidence that poller threads were busy (waiting on locks), but I am not
>> sure what debug data provides that information.
>>
>> From what I know, it's difficult to collect evidence for this issue, so we
>> can only reason about it.
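>> If someone does catch a reproduction, a hedged sketch of such a capture
>> (run one instance on the client and one on the brick node; 24007 for
>> management and 49152 upwards for bricks are just the usual defaults, so
>> adjust the ports to your deployment):
>>
>>     # Capture GlusterFS traffic; -s 0 keeps full packets so the RPC
>>     # payloads can be inspected later.
>>     tcpdump -i any -s 0 -w /tmp/gluster-ping.pcap \
>>         'port 24007 or portrange 49152-49251'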
>>
>> We can try a workaround, though: increase server.event-threads and see
>> whether the ping-timer expiry issues go away at some optimal value. If
>> that's the case, it provides some evidence for our hypothesis.
>>
>>
>>>
>>> Regards,
>>> Nithya
>>>
>>> On Mon, 4 Mar 2019 at 15:25, Mauro Tridici 
>>> wrote:
>>>
 Hi All,

 A few minutes ago I received this message from the NAGIOS server:

 ***** Nagios *****
 Notification Type: PROBLEM
 Service: Brick - /gluster/mnt2/brick
 Host: s06
 Address: s06-stg
 State: CRITICAL
 Date/Time: Mon Mar 4 10:25:33 CET 2019
 Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.

 I checked the network, RAM and CPU usage on the s06 node and everything
 seems to be OK.
 No bricks are in an error state. In /var/log/messages, I again detected a
 crash of “check_vol_utili”, which I think is a module used by the NRPE
 executable (that is, the NAGIOS client).

 Mar  4 10:15:29 s06 kernel: traps: check_vol_utili[161224] general
 protection ip:7facffa0a66d sp:7ffe9f4e6fc0 error:0 in
 libglusterfs.so.0.0.1[7facff9b7000+f7000]
 Mar  4 10:15:29 s06 abrt-hook-ccpp: Process 161224 (python2.7) of user
 0 killed by SIGSEGV - dumping core
 Mar  4 10:15:29 s06 abrt-server: Generating core_backtrace
 Mar  4 10:15:29 s06 abrt-server: Error: Unable to open './coredump': No
 such file or directory
 Mar  4 10:16:01 s06 systemd: Created slice User Slice of root.
 Mar  4 10:16:01 s06 systemd: Starting User Slice of root.
 Mar  4 10:16:01 s06 systemd: Started Session 201010 of user root.
 Mar  4 10:16:01 s06 systemd: Starting Session 201010 of user root.
 Mar  4 10:16:01 s06 systemd: Removed slice User Slice of root.
 Mar  4 10:16:01 s06 systemd: Stopping User Slice of root.
 Mar  4 10:16:24 s06 abrt-server: Duplicate: UUID
 
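 As an aside, the general-protection trap above can usually be mapped back
 to a function name. A hedged sketch, assuming the library path from the log
 and that the matching debuginfo package is installed (without debuginfo,
 addr2line just prints "??"):

     # ip:7facffa0a66d falls inside libglusterfs.so.0.0.1, which is mapped
     # at 7facff9b7000, so the offset into the library is
     # 0x7facffa0a66d - 0x7facff9b7000 = 0x5366d.
     addr2line -f -e /usr/lib64/libglusterfs.so.0.0.1 0x5366d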

Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-04 Thread Shyam Ranganathan
On 3/4/19 10:08 AM, Atin Mukherjee wrote:
> 
> 
> On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan
> <atumb...@redhat.com> wrote:
> 
> Thanks to those who participated.
> 
> Update at present:
> 
> We found 3 blocker bugs in upgrade scenarios, and hence have marked the
> release as pending upon them. We will keep these lists updated about
> progress.
> 
> 
> I’d like to clarify that upgrade testing is blocked. So just fixing
> these test blocker(s) isn’t enough to call release-6 green. We need to
> continue and finish the rest of the upgrade tests once the respective
> bugs are fixed.

Based on the fixes for the upgrade blockers expected by tomorrow, we will
build an RC1 candidate on Wednesday (6-Mar), tagging early Wednesday in the
Eastern TZ. This RC can be used for further testing.

> 
> 
> 
> -Amar
> 
> On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan
> <atumb...@redhat.com> wrote:
> 
> > Hi all,
> >
> > We are calling on our users and developers to contribute to validating
> > the ‘glusterfs-6.0rc’ build in their use case, especially for the cases
> > of upgrade, stability, and performance.
> >
> > Some of the key highlights of the release are listed in the release-notes
> > draft
> > <https://github.com/gluster/glusterfs/blob/release-6/doc/release-notes/6.0.md>.
> > Please note that some features are being dropped from this release, so
> > making sure your setup is not going to have an issue is critical. Also,
> > the default lru-limit option for inodes in the fuse mount should help
> > control the memory usage of client processes. All the more reason to give
> > it a shot in your test setup.
> >
> > If you are a developer using the gfapi interface to integrate with other
> > projects, note that there are some signature changes, so please make sure
> > your project works with the latest release. Or, if you are using a
> > project which depends on gfapi, report any errors seen with the new RPMs.
> > We will help fix them.
> >
> > As part of the test days, we want to focus on testing the latest upcoming
> > release, i.e. GlusterFS-6, and one or another of the Gluster volunteers
> > will be on the #gluster channel on freenode to assist. Some of the key
> > things we are looking for as bug reports are:
> >
> >    - See if upgrade from your current version to 6.0rc is smooth, and
> >      works as documented.
> >      - Report bugs in the process, or in the documentation if you find a
> >        mismatch.
> >    - Functionality is all as expected for your use case.
> >      - No issues with the actual applications you would run in
> >        production, etc.
> >    - Performance has not degraded in your use case.
> >      - While we have added some performance options to the code, not all
> >        of them are turned on, as that has to be done based on use cases.
> >      - Make sure the default setup performs at least the same as your
> >        current version.
> >      - Try out a few options mentioned in the release notes (especially
> >        --auto-invalidation=no) and see if they help performance.
> >    - While doing all the above, check the following:
> >      - See if the log files make sense and are not flooded with
> >        “for developer only” type messages.
> >      - Get ‘profile info’ output from the old and new versions, and see
> >        if anything is out of normal expectation. Check with us on the
> >        numbers.
> >      - Get a ‘statedump’ when there are issues. Try to make sense of it,
> >        and raise a bug if you don’t understand it completely.
> >
> >
> >
> 
> > <https://hackmd.io/YB60uRCMQRC90xhNt4r6gA?both#Process-expected-on-test-days>
> > Process expected on test days.
> >
> >    - We have a tracker bug [0].
> >      - We will attach all the ‘blocker’ bugs to this bug.
> >    - Use this link to report bugs, so that we have more metadata around
> >      the given bugzilla:
> >      - Click Here
> >        <https://bugzilla.redhat.com/enter_bug.cgi?blocked=1672818&bug_severity=high&component=core&priority=high&product=GlusterFS&status_whiteboard=gluster-test-day&version=6>
> >        [1]
> >    - The test cases which are to be tested are listed in this sheet
> >      <https://docs.google.com/spreadsheets/d/1AS-tDiJmAr9skK535MbLJGe_RfqDQ3j1abX1wtjwpL4/edit?usp=sharing>
> >      [2]; please add, update, and keep it up-to-date to reduce duplicate
> >      efforts.
> 
> -- 
> - Atin (atinm)
> 
> ___
> Gluster-devel mailing list

Re: [Gluster-devel] [Gluster-Maintainers] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-04 Thread Atin Mukherjee
On Mon, 4 Mar 2019 at 20:33, Amar Tumballi Suryanarayan 
wrote:

> Thanks to those who participated.
>
> Update at present:
>
> We found 3 blocker bugs in upgrade scenarios, and hence have marked the
> release as pending upon them. We will keep these lists updated about
> progress.


I’d like to clarify that upgrade testing is blocked. So just fixing these
test blocker(s) isn’t enough to call release-6 green. We need to continue
and finish the rest of the upgrade tests once the respective bugs are fixed.


>
> -Amar
>
> On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
>
> > Hi all,
> >
> > We are calling on our users and developers to contribute to validating
> > the ‘glusterfs-6.0rc’ build in their use case, especially for the cases
> > of upgrade, stability, and performance.
> >
> > Some of the key highlights of the release are listed in the release-notes
> > draft
> > <https://github.com/gluster/glusterfs/blob/release-6/doc/release-notes/6.0.md>.
> > Please note that some features are being dropped from this release, so
> > making sure your setup is not going to have an issue is critical. Also,
> > the default lru-limit option for inodes in the fuse mount should help
> > control the memory usage of client processes. All the more reason to give
> > it a shot in your test setup.
> >
> > If you are a developer using the gfapi interface to integrate with other
> > projects, note that there are some signature changes, so please make sure
> > your project works with the latest release. Or, if you are using a
> > project which depends on gfapi, report any errors seen with the new RPMs.
> > We will help fix them.
> >
> > As part of the test days, we want to focus on testing the latest upcoming
> > release, i.e. GlusterFS-6, and one or another of the Gluster volunteers
> > will be on the #gluster channel on freenode to assist. Some of the key
> > things we are looking for as bug reports are:
> >
> > - See if upgrade from your current version to 6.0rc is smooth, and works
> >   as documented.
> >   - Report bugs in the process, or in the documentation if you find a
> >     mismatch.
> > - Functionality is all as expected for your use case.
> >   - No issues with the actual applications you would run in production,
> >     etc.
> > - Performance has not degraded in your use case.
> >   - While we have added some performance options to the code, not all of
> >     them are turned on, as that has to be done based on use cases.
> >   - Make sure the default setup performs at least the same as your
> >     current version.
> >   - Try out a few options mentioned in the release notes (especially
> >     --auto-invalidation=no) and see if they help performance.
> > - While doing all the above, check the following:
> >   - See if the log files make sense and are not flooded with
> >     “for developer only” type messages.
> >   - Get ‘profile info’ output from the old and new versions, and see if
> >     anything is out of normal expectation. Check with us on the numbers.
> >   - Get a ‘statedump’ when there are issues. Try to make sense of it,
> >     and raise a bug if you don’t understand it completely.
> >
> >
> > <https://hackmd.io/YB60uRCMQRC90xhNt4r6gA?both#Process-expected-on-test-days>
> > Process expected on test days.
> >
> > - We have a tracker bug [0].
> >   - We will attach all the ‘blocker’ bugs to this bug.
> > - Use this link to report bugs, so that we have more metadata around the
> >   given bugzilla:
> >   - Click Here
> >     <https://bugzilla.redhat.com/enter_bug.cgi?blocked=1672818&bug_severity=high&component=core&priority=high&product=GlusterFS&status_whiteboard=gluster-test-day&version=6>
> >     [1]
> > - The test cases which are to be tested are listed in this sheet
> >   <https://docs.google.com/spreadsheets/d/1AS-tDiJmAr9skK535MbLJGe_RfqDQ3j1abX1wtjwpL4/edit?usp=sharing>
> >   [2]; please add, update, and keep it up-to-date to reduce duplicate
> >   efforts.

-- 
- Atin (atinm)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] GlusterFS - 6.0RC - Test days (27th, 28th Feb)

2019-03-04 Thread Amar Tumballi Suryanarayan
Thanks to those who participated.

Update at present:

We found 3 blocker bugs in upgrade scenarios, and hence have marked the
release as pending upon them. We will keep these lists updated about progress.

-Amar

On Mon, Feb 25, 2019 at 11:41 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Hi all,
>
> We are calling on our users and developers to contribute to validating the
> ‘glusterfs-6.0rc’ build in their use case, especially for the cases of
> upgrade, stability, and performance.
>
> Some of the key highlights of the release are listed in the release-notes
> draft
> <https://github.com/gluster/glusterfs/blob/release-6/doc/release-notes/6.0.md>.
> Please note that some features are being dropped from this release, so
> making sure your setup is not going to have an issue is critical. Also, the
> default lru-limit option for inodes in the fuse mount should help control
> the memory usage of client processes. All the more reason to give it a shot
> in your test setup.
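>
> To illustrate, a hedged sketch of a fuse client started with those options.
> The server, volume, mount point, and lru-limit value are placeholders;
> --auto-invalidation is the flag mentioned above:
>
>     # Mount with a capped inode LRU and auto-invalidation disabled.
>     glusterfs --volfile-server=server1 --volfile-id=testvol \
>         --lru-limit=65536 --auto-invalidation=no /mnt/glusterfs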
>
> If you are a developer using the gfapi interface to integrate with other
> projects, note that there are some signature changes, so please make sure
> your project works with the latest release. Or, if you are using a project
> which depends on gfapi, report any errors seen with the new RPMs. We will
> help fix them.
>
> As part of the test days, we want to focus on testing the latest upcoming
> release, i.e. GlusterFS-6, and one or another of the Gluster volunteers will
> be on the #gluster channel on freenode to assist. Some of the key things we
> are looking for as bug reports are:
>
> - See if upgrade from your current version to 6.0rc is smooth, and works
>   as documented.
>   - Report bugs in the process, or in the documentation if you find a
>     mismatch.
> - Functionality is all as expected for your use case.
>   - No issues with the actual applications you would run in production,
>     etc.
> - Performance has not degraded in your use case.
>   - While we have added some performance options to the code, not all of
>     them are turned on, as that has to be done based on use cases.
>   - Make sure the default setup performs at least the same as your current
>     version.
>   - Try out a few options mentioned in the release notes (especially
>     --auto-invalidation=no) and see if they help performance.
> - While doing all the above, check the following (a command sketch follows
>   this list):
>   - See if the log files make sense and are not flooded with
>     “for developer only” type messages.
>   - Get ‘profile info’ output from the old and new versions, and see if
>     anything is out of normal expectation. Check with us on the numbers.
>   - Get a ‘statedump’ when there are issues. Try to make sense of it, and
>     raise a bug if you don’t understand it completely.
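>
> To illustrate the ‘profile info’ and ‘statedump’ steps above, a hedged
> sketch (<VOLNAME> is a placeholder; statedumps land under /var/run/gluster
> by default):
>
>     # Start profiling, run the workload, then collect the numbers.
>     gluster volume profile <VOLNAME> start
>     gluster volume profile <VOLNAME> info
>
>     # Take a statedump of the brick processes when an issue shows up.
>     gluster volume statedump <VOLNAME>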
>
>
> <https://hackmd.io/YB60uRCMQRC90xhNt4r6gA?both#Process-expected-on-test-days>
> Process expected on test days.
>
> - We have a tracker bug [0].
>   - We will attach all the ‘blocker’ bugs to this bug.
> - Use this link to report bugs, so that we have more metadata around the
>   given bugzilla:
>   - Click Here
>     <https://bugzilla.redhat.com/enter_bug.cgi?blocked=1672818&bug_severity=high&component=core&priority=high&product=GlusterFS&status_whiteboard=gluster-test-day&version=6>
>     [1]
> - The test cases which are to be tested are listed in this sheet
>   <https://docs.google.com/spreadsheets/d/1AS-tDiJmAr9skK535MbLJGe_RfqDQ3j1abX1wtjwpL4/edit?usp=sharing>
>   [2]; please add, update, and keep it up-to-date to reduce duplicate
>   efforts.
>
> Let's make this release a success together.
>
> Also check whether we covered some of the open issues from the Weekly
> untriaged bugs [3].
>
> For details on the build and RPMs, check this email [4].
>
> Finally, the dates :-)
>
>- Wednesday - Feb 27th, and
>- Thursday - Feb 28th
>
> Note that our goal is to identify as many issues as possible in upgrade
> and stability scenarios, and if any blockers are found, we want to make
> sure we release with fixes for them, so that each of you, Gluster users,
> can feel comfortable upgrading to the 6.0 version.
>
> Regards,
> Gluster Ants.
>
> --
> Amar Tumballi (amarts)
>


-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] What happened to 'gluster-ansible' RPM?

2019-03-04 Thread Sachidananda URS
Hi Yaniv,

On Mon, Mar 4, 2019 at 1:21 PM Yaniv Kaul  wrote:

> I've used it [1] to deploy Gluster (on CentOS 7) and now it seems to be
> missing.
> I'm not seeing this meta-package at [2] - am I supposed to install the
> specific packages?
>
>
We still have the meta-package; it is now called gluster-ansible-roles
[1][2]. I'm sorry about the renaming.
The reason behind this change is a discrepancy I had created between the
upstream and downstream naming: the downstream package was called
gluster-ansible-roles (we followed the same convention as the oVirt roles),
but by then I had already done a few builds upstream. So we recently synced
the naming, and future Fedora packages will follow this convention.
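
For completeness, a hedged sketch of installing the renamed meta-package
from the copr repository on CentOS 7 (assumes the yum copr plugin; the repo
name comes from the links below):

    # Enable the copr repository and install the meta-package.
    yum install -y yum-plugin-copr
    yum copr enable sac/gluster-ansible
    yum install -y gluster-ansible-roles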

-sac

[1]
https://copr.fedorainfracloud.org/coprs/sac/gluster-ansible/build/863675/
[2] https://copr.fedorainfracloud.org/coprs/sac/gluster-ansible/builds/


> TIA,
> Y.
> [1]
> https://github.com/mykaul/vg/blob/d36ad9948a1be49be5b7f7d95a2007aa8c540d95/ansible/machine_config.yml#L75
> [2] https://copr.fedorainfracloud.org/coprs/sac/gluster-ansible/packages/
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel