Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Ravishankar N

On 02/16/2017 04:09 AM, Jeff Darcy wrote:

One of the issues that has come up with multiplexing is that all of the bricks 
in a process end up sharing a single log file.  The reaction from both of the 
people who have mentioned this is that we should find a way to give each brick 
its own log even when they're in the same process, and make sure gf_log etc. 
are able to direct messages to the correct one.  I can think of ways to do 
this, but it doesn't seem optimal to me.  It will certainly use up a lot of 
file descriptors.  I think it will use more memory.  And then there's the issue 
of whether this would really be better for debugging.  Often it's necessary to 
look at multiple brick logs while trying to diagnose a problem, so it's 
actually kind of handy to have them all in one file.  Which would you rather do?

(a) Weave together entries in multiple logs, either via a script or in your 
head?

(b) Split or filter entries in a single log, according to which brick they're 
from?

To me, (b) seems like a much more tractable problem.  I'd say that what we need 
is not multiple logs, but *marking of entries* so that everything pertaining to 
one brick can easily be found.  One way to do this would be to modify volgen so 
that a brick ID (not name because that's a path and hence too long) is 
appended/prepended to the name of every translator in the brick.  Grep for that 
brick ID, and voila!  You now have all log messages for that brick and no 
other.  A variant of this would be to leave the names alone and modify gf_log 
so that it adds the brick ID automagically (based on a thread-local variable 
similar to THIS).  Same effect, other than making translator names longer, so 
I'd kind of prefer this approach.  Before I start writing the code, does 
anybody else have any opinions, preferences, or alternatives I haven't 
mentioned yet?

My vote is for having separate log files per brick. Even with the separate log
files we have today, I find it difficult to mentally ignore irrelevant messages
in a single log file as I sift through it looking for errors related to the
problem at hand. Having entries from multiple bricks and then grepping would
only make things harder. I cannot think of a case where having entries from
all bricks in one file would be particularly beneficial for debugging, since
what happens in one brick is independent of the other bricks (at least until
we move client xlators to the server side and run them in the brick process).

As for file descriptor count/memory usage, I think we should be okay, as it is
not any worse than in the non-multiplexed approach we have today.

On a side note, I think the problem is not having too many log files but
having them on multiple nodes. Having a log-aggregation solution where all
messages are logged to a single machine (but still in separate files) would
make it easier to monitor/debug issues.
-Ravi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Nithya Balachandran
On 16 February 2017 at 07:30, Shyam wrote:

> On 02/15/2017 08:51 PM, Atin Mukherjee wrote:
>
>>
>> On Thu, 16 Feb 2017 at 04:09, Jeff Darcy wrote:
>>
>> One of the issues that has come up with multiplexing is that all of
>> the bricks in a process end up sharing a single log file.  The
>> reaction from both of the people who have mentioned this is that we
>> should find a way to give each brick its own log even when they're
>> in the same process, and make sure gf_log etc. are able to direct
>> messages to the correct one.  I can think of ways to do this, but it
>> doesn't seem optimal to me.  It will certainly use up a lot of file
>> descriptors.  I think it will use more memory.  And then there's the
>> issue of whether this would really be better for debugging.  Often
>> it's necessary to look at multiple brick logs while trying to
>> diagnose a problem, so it's actually kind of handy to have them
>> all in one file.  Which would you rather do?
>>
>> (a) Weave together entries in multiple logs, either via a script or
>> in your head?
>>
>> (b) Split or filter entries in a single log, according to which
>> brick they're from?
>>
>> To me, (b) seems like a much more tractable problem.  I'd say that
>> what we need is not multiple logs, but *marking of entries* so that
>> everything pertaining to one brick can easily be found.  One way to
>> do this would be to modify volgen so that a brick ID (not name
>> because that's a path and hence too long) is appended/prepended to
>> the name of every translator in the brick.  Grep for that brick ID,
>> and voila!  You now have all log messages for that brick and no
>> other.  A variant of this would be to leave the names alone and
>> modify gf_log so that it adds the brick ID automagically (based on a
>> thread-local variable similar to THIS).  Same effect, other than
>> making translator names longer, so I'd kind of prefer this
>> approach.  Before I start writing the code, does anybody else have
>> any opinions, preferences, or alternatives I haven't mentioned yet?
>>
>


A few questions/thoughts here:

Debugging will involve getting far more/bigger files from customers unless
we have a script (?) to grep out only those messages pertaining to the
volume in question. IIUC, this would just be grepping for the volname and
then determining which brick each message pertains to based on the brick
id, correct?

Would brick IDs remain constant across add/remove brick operations? An easy
way would probably be to just use the client xlator number as the brick ID,
which would make it easy to map the brick to the client connection.

With several bricks all writing to the same log file, can there be
problems with interleaved messages?
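
(As a rough illustration of the interleaving concern, here is a minimal
sketch assuming each message is formatted into a single buffer and emitted
with one write() on an O_APPEND descriptor. log_open/log_line and the
[brick:...] tag are hypothetical names for illustration, not a claim about
how gf_log actually behaves.)

/* Illustrative sketch only: the whole line, including the brick tag and
 * the trailing newline, is built first and then emitted with exactly one
 * write(), so concurrent writers appending to the same file should not,
 * in practice on local filesystems, interleave bytes within a line. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int log_fd = -1;

static int
log_open (const char *path)
{
        log_fd = open (path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        return log_fd;
}

static void
log_line (const char *brick_id, const char *msg)
{
        char buf[512];
        int  len;

        len = snprintf (buf, sizeof (buf), "[brick:%s] %s\n", brick_id, msg);
        if (len > (int) sizeof (buf) - 1)
                len = sizeof (buf) - 1;      /* message was truncated */

        (void) write (log_fd, buf, len);     /* one write() per line */
}

int
main (void)
{
        if (log_open ("/tmp/bricks.log") < 0)
                return 1;
        log_line ("patchy-client-0", "sample message from brick 0");
        log_line ("patchy-client-1", "sample message from brick 1");
        return 0;
}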

Logrotate might also kick in faster, causing us to lose debugging data if
only a limited number of files are saved, as each rotated file would now hold
less log data per volume. The logrotate config options would need to be
changed to keep more files.

Having all messages for the bricks of the same volume in a single file
would definitely be helpful. Still thinking through logging all messages
for all bricks in a single file. :)



>
> (b) is better. Considering centralized logging, log file redirection,
> etc., (a) becomes unnatural and unwieldy.
>
>
>>
>> I like this idea. +1
>>
>>
>>
>>
>> --
>> - Atin (atinm)
>>
>>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Shyam

On 02/15/2017 08:51 PM, Atin Mukherjee wrote:


On Thu, 16 Feb 2017 at 04:09, Jeff Darcy wrote:

One of the issues that has come up with multiplexing is that all of
the bricks in a process end up sharing a single log file.  The
reaction from both of the people who have mentioned this is that we
should find a way to give each brick its own log even when they're
in the same process, and make sure gf_log etc. are able to direct
messages to the correct one.  I can think of ways to do this, but it
doesn't seem optimal to me.  It will certainly use up a lot of file
descriptors.  I think it will use more memory.  And then there's the
issue of whether this would really be better for debugging.  Often
it's necessary to look at multiple brick logs while trying to
diagnose a problem, so it's actually kind of handy to have them
all in one file.  Which would you rather do?

(a) Weave together entries in multiple logs, either via a script or
in your head?

(b) Split or filter entries in a single log, according to which
brick they're from?

To me, (b) seems like a much more tractable problem.  I'd say that
what we need is not multiple logs, but *marking of entries* so that
everything pertaining to one brick can easily be found.  One way to
do this would be to modify volgen so that a brick ID (not name
because that's a path and hence too long) is appended/prepended to
the name of every translator in the brick.  Grep for that brick ID,
and voila!  You now have all log messages for that brick and no
other.  A variant of this would be to leave the names alone and
modify gf_log so that it adds the brick ID automagically (based on a
thread-local variable similar to THIS).  Same effect, other than
making translator names longer, so I'd kind of prefer this
approach.  Before I start writing the code, does anybody else have
any opinions, preferences, or alternatives I haven't mentioned yet?


(b) is better. Considering centralized logging, log file redirection,
etc., (a) becomes unnatural and unwieldy.





I like this idea. +1




--
- Atin (atinm)




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Atin Mukherjee
On Thu, 16 Feb 2017 at 04:09, Jeff Darcy wrote:

> One of the issues that has come up with multiplexing is that all of the
> bricks in a process end up sharing a single log file.  The reaction from
> both of the people who have mentioned this is that we should find a way to
> give each brick its own log even when they're in the same process, and make
> sure gf_log etc. are able to direct messages to the correct one.  I can
> think of ways to do this, but it doesn't seem optimal to me.  It will
> certainly use up a lot of file descriptors.  I think it will use more
> memory.  And then there's the issue of whether this would really be better
> for debugging.  Often it's necessary to look at multiple brick logs while
> trying to diagnose a problem, so it's actually kind of handy to have
> them all in one file.  Which would you rather do?
>
> (a) Weave together entries in multiple logs, either via a script or in
> your head?
>
> (b) Split or filter entries in a single log, according to which brick
> they're from?
>
> To me, (b) seems like a much more tractable problem.  I'd say that what we
> need is not multiple logs, but *marking of entries* so that everything
> pertaining to one brick can easily be found.  One way to do this would be
> to modify volgen so that a brick ID (not name because that's a path and
> hence too long) is appended/prepended to the name of every translator in
> the brick.  Grep for that brick ID, and voila!  You now have all log
> messages for that brick and no other.  A variant of this would be to leave
> the names alone and modify gf_log so that it adds the brick ID
> automagically (based on a thread-local variable similar to THIS).  Same
> effect, other than making translator names longer, so I'd kind of prefer
> this approach.  Before I start writing the code, does anybody else have any
> opinions, preferences, or alternatives I haven't mentioned yet?


I like this idea. +1


>
>
-- 
- Atin (atinm)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Release 3.10: Request fix status for RC1 tagging

2017-02-15 Thread Shyam

Hi,

The 3.10 release tracker [1] shows 6 bugs needing a fix in 3.10. We 
need to get RC1 out so that we can start tracking the same for a 
potential release.


Request folks on these bugs to provide a date by when we can expect a 
fix for these issues.


Request others to add any other bug to the tracker as appropriate.

Current bug list [2]:
  - 1415226: Kaleb/Niels, do we need to do more for the python 
dependency, or is the last fix in?


  - 1417915: Vitaly Lipatov/Niels, I assume one of you would do the 
backport for this one into 3.10


  - 1421590: Jeff, does this need a fix? Also, Samikshan, can you provide 
Jeff with a .t that can reproduce this (if possible)?


  - 1421649: Ashish/Niels, when can we expect a fix to land for this?

  - 1421956: Xavi, I guess you would backport the fix to 3.10 once it is 
merged on mainline, right?


  - 1422363: Poornima, I am awaiting a merge of the same into mainline 
and also an update of the commit message for the backport to 3.10 before 
merging this into 3.10; request you to take care of the same.


Pranith, is a bug filed and added to the tracker for the mail below?
  - 
http://lists.gluster.org/pipermail/maintainers/2017-February/002221.html


Thanks,
Shyam

[1] Tracker bug: https://bugzilla.redhat.com/show_bug.cgi?id=1416031

[2] Open bugs against the tracker: 
https://bugzilla.redhat.com/buglist.cgi?quicksearch=1415226%201417915%201421590%201421649%201421956%201422363_id=7089913

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Logging in a multi-brick daemon

2017-02-15 Thread Jeff Darcy
One of the issues that has come up with multiplexing is that all of the bricks 
in a process end up sharing a single log file.  The reaction from both of the 
people who have mentioned this is that we should find a way to give each brick 
its own log even when they're in the same process, and make sure gf_log etc. 
are able to direct messages to the correct one.  I can think of ways to do 
this, but it doesn't seem optimal to me.  It will certainly use up a lot of 
file descriptors.  I think it will use more memory.  And then there's the issue 
of whether this would really be better for debugging.  Often it's necessary to 
look at multiple brick logs while trying to diagnose a problem, so it's 
actually kind of handy to have them all in one file.  Which would you rather do?

(a) Weave together entries in multiple logs, either via a script or in your 
head?

(b) Split or filter entries in a single log, according to which brick they're 
from?

To me, (b) seems like a much more tractable problem.  I'd say that what we need 
is not multiple logs, but *marking of entries* so that everything pertaining to 
one brick can easily be found.  One way to do this would be to modify volgen so 
that a brick ID (not name because that's a path and hence too long) is 
appended/prepended to the name of every translator in the brick.  Grep for that 
brick ID, and voila!  You now have all log messages for that brick and no 
other.  A variant of this would be to leave the names alone and modify gf_log 
so that it adds the brick ID automagically (based on a thread-local variable 
similar to THIS).  Same effect, other than making translator names longer, so 
I'd kind of prefer this approach.  Before I start writing the code, does 
anybody else have any opinions, preferences, or alternatives I haven't 
mentioned yet?
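
To make the gf_log variant concrete, here is a minimal sketch assuming a
hypothetical per-thread brick-ID slot (in the spirit of THIS) and an
illustrative wrapper; brick_log_set_id, brick_log and the [brick:...] tag
format are made-up names for this sketch, not the actual gluster logging API:

/* Hypothetical sketch only, not the real gf_log implementation. */
#include <stdarg.h>
#include <stdio.h>

/* One slot per thread, set when control enters a given brick's graph,
 * similar in spirit to how THIS tracks the active translator. */
static __thread const char *brick_id_tls = "-";

static void
brick_log_set_id (const char *id)
{
        brick_id_tls = id;
}

/* Tag every message with the active brick's ID so a single shared log
 * file can later be filtered with a plain grep on the tag. */
static void
brick_log (const char *level, const char *fmt, ...)
{
        char    msg[512];
        va_list ap;

        va_start (ap, fmt);
        vsnprintf (msg, sizeof (msg), fmt, ap);
        va_end (ap);

        fprintf (stderr, "[%s] [brick:%s] %s\n", level, brick_id_tls, msg);
}

int
main (void)
{
        brick_log_set_id ("patchy-client-0");
        brick_log ("I", "connected to %s", "10.0.0.1:49152");

        brick_log_set_id ("patchy-client-1");
        brick_log ("E", "lookup failed: %s", "No such file or directory");
        return 0;
}

Pulling one brick's messages out of the shared file would then just be a
grep for its [brick:...] tag.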

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 3.10: Backports (reminder and action needed)

2017-02-15 Thread Shyam

Hi,

Checking back on status this week.

Everything looks good; the 2 missing backports are also in (thanks, Ashish).

We probably will have one last check pre-release.

Thanks,
Shyam

On 02/06/2017 01:12 PM, Shyam wrote:

Hi,

A recheck after one more week, and our status is healthy: backports that
appear in 3.8 or 3.9 are appearing against 3.10 as well. Thank you.

However, the exception raised last week is not handled yet.

Pranith/Xavi/Ashish,

The following commits were merged into master post 3.10 branching and
also backported to 3.9. Essentially this would mean they need to be
backported to 3.10 as well, but are missing against that release.

Request one of you to take a look and let us know if these backports are
needed, and if so backport the same.

1) https://review.gluster.org/#/c/16439/
Commit message: cluster/disperse: Do not log fop failed for lockless fops
Commit in release 3.9: 0f63bda0df91ee7ff42e3262c74220178ef87a21

2) https://review.gluster.org/#/c/16444/
Commit message: cluster/ec: Do not start heal on good file while IO is
going on
Commit in release 3.9: bc1ce55a7c83c7f5394b26a65bd4a3a669b5962a

Thanks,
Shyam



On 01/26/2017 03:09 PM, Shyam wrote:

Hi,

As we have branched release 3.10 (some time ago), patches that are being
backported to 3.8/9 and are *applicable* to 3.10 need to be backported
to 3.10 as well.

An analysis of this state since branching shows the following commits in
3.9 not backported to 3.10. Ashish, request you to address this at your
convenience.

bc1ce55a7c83c7f5394b26a65bd4a3a669b5962a (Ashish Pandey) Wed, 11 Jan
2017 17:19:30 +0530 cluster/ec: Do not start heal on good file while IO
is going on

0f63bda0df91ee7ff42e3262c74220178ef87a21 (Ashish Pandey) Thu, 19 Jan
2017 18:20:44 +0530 cluster/disperse: Do not log fop failed for lockless
fops

Thanks,
Shyam

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 3.10: Testing feedback requested

2017-02-15 Thread Shyam

Hi,

We have some feedback on the glusterd and bitrot issues.

I am testing out upgrades on CentOS and also doing some performance 
regression testing across AFR and EC volumes; results are expected by the 
end of this week.


Request others to post updates to tests done on the issues.

If people are blocked, then please let us know as well.

Just a friendly reminder: we always slip releases due to lack of testing 
or testing feedback; let's try not to repeat that.


The current release date is still 21st Feb, 2017.

Shyam

On 02/05/2017 10:00 PM, Shyam wrote:

Hi,

For the 3.10 release, we are tracking testing feedback per component
using github issues. The list of issues to report testing feedback
against is in [1].

We have assigned each component-level task to the noted maintainer to
provide feedback; in case others are taking up the task, please assign
the same to yourself.

Currently GitHub only allows issue assignment to members of the gluster
organization, and that membership is not as complete or filled up as
expected. So, request maintainers to @-mention the folks who would be doing
the testing in the issue, or to have those users assign the task to
themselves.

Feedback is expected at the earliest to meet the current release date of
21st Feb, 2017.

Once we have the packages built, we would request the users list to help
with the testing as well.

Thanks,
Release team

[1] Link to testing issues: http://bit.ly/2kDCR8M

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-Maintainers] Release 3.10 spurious(?) regression failures in the past week

2017-02-15 Thread Shyam

Update from week of: (2017-02-06 to 2017-02-13)

No major failures to report this week; things look fine from a 
regression suite failure stats perspective.


Do we have any updates on the older cores? Specifically,
  - https://build.gluster.org/job/centos6-regression/3046/consoleText 
(./tests/basic/tier/tier.t -- tier rebalance)
  - https://build.gluster.org/job/centos6-regression/2963/consoleFull 
(./tests/basic/volume-snapshot.t -- glusterd)


Shyam

On 02/06/2017 02:21 PM, Shyam wrote:

Update from week of: (2017-01-30 to 2017-02-06)

Failure stats and actions:

1) ./tests/basic/tier/tier.t
Core dump needs attention
https://build.gluster.org/job/centos6-regression/3046/consoleText

Looks like the tier rebalance process has crashed (see below for the
stack details)

2) ./tests/basic/ec/ec-background-heals.t
Marked as bad in master, but not in release-3.10. It may cause unwanted
failures in 3.10, so it has been marked as bad in 3.10 as well.

Commit: https://review.gluster.org/16549

3) ./tests/bitrot/bug-1373520.t
Marked as bad in master, but not in release-3.10. It may cause unwanted
failures in 3.10, so it has been marked as bad in 3.10 as well.

Commit: https://review.gluster.org/16549

Thanks,
Shyam

On 01/30/2017 03:00 PM, Shyam wrote:

Hi,

The following is a list of spurious(?) regression failures in the 3.10
branch last week (from fstat.gluster.org).

Request component owners or other devs to take a look at the failures
and weed out real issues.

Regression failures 3.10:

Summary:
1) https://build.gluster.org/job/centos6-regression/2960/consoleFull
  ./tests/basic/ec/ec-background-heals.t

2) https://build.gluster.org/job/centos6-regression/2963/consoleFull
  
  ./tests/basic/volume-snapshot.t

3) https://build.gluster.org/job/netbsd7-regression/2694/consoleFull
  ./tests/basic/afr/self-heald.t

4) https://build.gluster.org/job/centos6-regression/2954/consoleFull
  ./tests/basic/tier/legacy-many.t

5) https://build.gluster.org/job/centos6-regression/2858/consoleFull
  ./tests/bugs/bitrot/bug-1245981.t

6) https://build.gluster.org/job/netbsd7-regression/2637/consoleFull
  ./tests/basic/afr/self-heal.t

7) https://build.gluster.org/job/netbsd7-regression/2624/consoleFull
  ./tests/encryption/crypt.t

Thanks,
Shyam


Core details from
https://build.gluster.org/job/centos6-regression/3046/consoleText

Core was generated by `/build/install/sbin/glusterfs -s localhost
--volfile-id tierd/patchy -p /var/li'.
Program terminated with signal 11, Segmentation fault.
#0  0x7ffb62c2c4c4 in __strchr_sse42 () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffb5a169700 (LWP 467)):
#0  0x7ffb62c2c4c4 in __strchr_sse42 () from /lib64/libc.so.6
No symbol table info available.
#1  0x7ffb56b7789f in dht_filter_loc_subvol_key
(this=0x7ffb50015930, loc=0x7ffb2c002de4, new_loc=0x7ffb2c413f80,
subvol=0x7ffb2c413fc0) at
/home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-helper.c:307

new_name = 0x0
new_path = 0x0
trav = 0x0
key = '\000' 
ret = 0
#2  0x7ffb56bb2ce4 in dht_lookup (frame=0x7ffb4c00623c,
this=0x7ffb50015930, loc=0x7ffb2c002de4, xattr_req=0x7ffb4c00949c) at
/home/jenkins/root/workspace/centos6-regression/xlators/cluster/dht/src/dht-common.c:2494

subvol = 0x0
hashed_subvol = 0x0
local = 0x7ffb4c00636c
conf = 0x7ffb5003f380
ret = -1
op_errno = -1
layout = 0x0
i = 0
call_cnt = 0
new_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0,
gfid = '\000' , pargfid = '\000' }
__FUNCTION__ = "dht_lookup"
#3  0x7ffb63ff6f5c in syncop_lookup (subvol=0x7ffb50015930,
loc=0x7ffb2c002de4, iatt=0x7ffb2c415af0, parent=0x0,
xdata_in=0x7ffb4c00949c, xdata_out=0x7ffb2c415a50) at
/home/jenkins/root/workspace/centos6-regression/libglusterfs/src/syncop.c:1223

_new = 0x7ffb4c00623c
old_THIS = 0x7ffb50019490
tmp_cbk = 0x7ffb63ff69b3 
task = 0x7ffb2c009790
frame = 0x7ffb2c001b3c
args = {op_ret = 0, op_errno = 0, iatt1 = {ia_ino = 0, ia_gfid =
'\000' , ia_dev = 0, ia_type = IA_INVAL, ia_prot =
{suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0
'\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000',
write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0
'\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev
= 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0,
ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0,
ia_ctime_nsec = 0}, iatt2 = {ia_ino = 0, ia_gfid = '\000' , ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid
= 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0
'\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000',
exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0
'\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size =
0, ia_blksize = 0,