Re: [Gluster-devel] NetBSD regressions, memory corruption

2015-03-25 Thread Venky Shankar
looks like the iobref (and the iobuf) was allocated in protocol/server..

(gdb) x/16x (ie->ie_iobref->iobrefs - 8)
0xbb11a438: 0xbb18ba80  0x0001  0x0068  0x0040
0xbb11a448: 0xbb1e2018  0xcafebabe  0x  0x
0xbb11a458: 0x0003  0x0003  0x0008  0x0003
0xbb11a468: 0x000c  0x0003  0x000e  0x0003

8 bytes before the magic header (0xcafebabe) lives the xlator (this)
that invoked GF_MALLOC. Here it is:

(gdb) p *(xlator_t *)0xbb1e2018
$9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
"protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
  children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
  dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
  init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
  notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
{{min = 0, max = 0, total = 0, std = 0, mean = 0,
  count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
graph = 0xbb1c30f8, itable = 0x0,
  init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
{num_types = 144, rec = 0xbb1c6000}, winds = 0,
  switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}

looking into it more. If the above rings a bell for someone, let us know.

-venky

On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos nde...@redhat.com wrote:
 On Tue, Mar 24, 2015 at 05:18:44PM +, Emmanuel Dreyfus wrote:
 Hi

 The merge of http://review.gluster.org/9953/ removed a few crashes from
 NetBSD regression tests, but the thing remains utterly broken since the
 merge of http://review.gluster.org/9708/, though I cannot tell if I have
 bugs left over from this commit or if I face new problems.

 Here are the known problems so far:

 ...snip! I'll only give some info to your 2nd point.

 2) I still experience memory corruption, which usually crashes glusterfsd
 because some pointer was replaced by the value 0x3. This strikes on iobref
 most of the time, but it can happen elsewhere.

 I would be glad if someone could help here. On nbslave70:/autobuild I
 added code to check iobref/iobuf sanity at random places (by calling
 iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
 but I have not been able to spot the source of the problem yet.

 The weird thing is that memory seems to always be overwritten by the
 same values, and the magic 0xcafebabe number before the buffer is preserved.
 Here is an example, where iobref->iobrefs = 0xbb11a458:
 0xbb11a44c: 0xcafebabe  0x  0x  0x0003
 0xbb11a45c: 0x0003  0x0008  0x0003  0x000c
 0xbb11a46c: 0x0003  0x000e  0x0003  0x0010
 0xbb11a47c: 0x0003  0x0009  0x0003  0x000d
 0xbb11a48c: 0x0003  0x0015  0x0003  0x0016
 0xbb11a49c: 0x0003  0x0032  0x0034  0xbb1e2018
 0xbb11a4ac: 0xcafebabe  0x  0x  0xbb11a5d8

 Recently I was looking into something that involved some more
 understanding of GF_MALLOC(). I did not really continue with it because
 other things got a higher priority. But, maybe this layout helps you a
 little:

  :  :
  :  :
  +--+
  | GF_MEM_TRAILER_MAGIC |
  +--+
  |  |
  | ...  |
  |  |
  +--+
  |   8 bytes|
  +--+
  | GF_MEM_HEADER_MAGIC  |
  +--+
  |  *xlator_t   |
  +--+
  |size  |
  +--+
  |type  |
  +--+
  :  :
  :  :

  #define GF_MEM_HEADER_MAGIC  0xCAFEBABE
  #define GF_MEM_TRAILER_MAGIC 0xBAADF00D


 Because there is no 0xbaadf00d in your memory dump, I would assume that
 the memory has just been allocated, and the 0xcafebabe at 0xbb11a4ac is
 a left over from a previous allocation.

 You could try to run a test with stricter memory enforcement. All the
 GF_ASSERT() calls will actually call abort() in that case, which may
 make things a little easier to debug. You would pass --enable-debug to
 the configure commandline:

 $ ./configure --enable-debug

 I hope that we will be able to setup scheduled automated regression
 tests with --enable-debug build binaries. It may be helpful to catch
 unintended NULL usage a little earlier.

 HTH,
 Niels
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Responsibilities and expectations of our maintainers

2015-03-25 Thread Niels de Vos
Hi all,

With many new features getting merged for glusterfs-3.7, I would like to
draw your attention to some of the more 'boring' bits that come with
proposing and implementing a feature.


1. Who is going to maintain the new features?

Features that are pretty self-contained, like adding a new xlator,
daemon or the like, should get added to our MAINTAINERS file. Only
a few features have provided patches for this; the others still need
to do so, or we can collect a bunch of them and add them all at once
(which might make conflicts easier to avoid).

Some features only add functionality to existing components. If the
current component maintainer asks for your support in maintaining
your added changes, please be very responsive.


2. Maintainers should be active in responding to users

As a maintainer of a component, you (or the group of maintainers)
have the (end) responsibility to respond to questions and problems
reported by users. This does not mean that you are required to
respond to all questions and problems yourself, but try to track
responses by others and answer outstanding questions.

There are several ways our community users can ask questions and
report problems:
  - gluster-us...@gluster.org, gluster-devel@gluster.org lists
  - #gluster and #gluster-dev on Freenode IRC
  - https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

Maintainers are expected to keep an eye on relevant topics,
questions and bugs through these channels.


3. What about reported bugs? Isn't Bug Triaging in place?

Indeed, at the moment we have a weekly Community Bug Triage meeting.
This meeting is intended as a fall-back for bugs that have not been
triaged by community members (users, developers, managers, ...). It
seems that most new bugs get triaged during the meeting, but this is
an activity that can happen completely independently from the
meeting. Maintainers and developers are strongly encouraged to
triage bugs that are reported against the components that they work
on. The following links contain the workflow for triaging, and
Bugzilla queries that show the untriaged bugs:
 - http://www.gluster.org/community/documentation/index.php/Bug_triage
 - https://public.pad.fsfe.org/p/gluster-bug-triage
 - http://www.gluster.org/pipermail/gluster-devel/2015-March/044114.html

Reminder: anyone is welcome to join the Bug Triage meeting.


4. Maintainers should keep an eye on open bugs affecting their component

When a bug has been triaged, someone would need to work on getting
the problem fixed. Bugs move their status like this:

NEW -> NEW+Triaged -> ASSIGNED -> POST -> MODIFIED -> ...

What happens after MODIFIED is for the release maintainers and QA
(also community) teams. Maintainers would mostly focus on the first
steps of the process. To assist with this, I have created a Bugzilla
report where you can click on the component, or the component/status
and get a list of all bugs (without FutureFeature keyword):
 - http://red.ht/1BKWsRq

There is still an ongoing action item to find someone who has a
good overview of how busy developers (mostly at Red Hat) are and which
community reported bugs should get fixed with priority. Maintainers
are not expected to be managers that can force other developers to
work on certain issues, but in most cases a friendly request does
the trick too ;-)


5. Maintainers are expected to be responsive on patch reviews

When a developer posts a patch to Gerrit, they are eager to hear
about any changes they would need to make. Responding fast with a
review also helps in getting the posted change updated quicker.
Developers tend to switch between many tasks and having the change
fresh in their memory helps with their responsiveness too. Our
Guidelines for Maintainers list a few ways to receive email
notifications and to display a list of changes in Gerrit:
 - 
http://www.gluster.org/community/documentation/index.php/Guidelines_For_Maintainers


6. Maintainers should try to attend IRC meetings

There is the weekly Gluster Community meeting on IRC. This is
scheduled for every Wednesday. Maintainers and active developers are
expected to attend these meetings whenever they can. More
information about these meetings can be found here:
 - https://public.pad.fsfe.org/p/gluster-community-meetings


Note that these are mostly the expectations I have of maintainers, and I
try hard to fulfill them myself too. Let me know if you have any
questions, objections, additions or ideas about this topic. When you
reply, do so by inlining or bottom-posting your comments and feel free
to trim unrelated parts of this email in your response.

Thanks,
Niels



Re: [Gluster-devel] Responsibilities and expectations of our maintainers

2015-03-25 Thread Emmanuel Dreyfus
On Wed, Mar 25, 2015 at 02:04:10PM +0100, Niels de Vos wrote:
 1. Who is going to maintain the new features?
 2. Maintainers should be active in responding to users
 3. What about reported bugs? Isn't Bug Triaging in place?
 4. Maintainers should keep an eye on open bugs affecting their component
 5. Maintainers are expected to be responsive on patch reviews
 6. Maintainers should try to attend IRC meetings

May I suggest a personal item:
 7. Check that your feature does not break the NetBSD regression

NetBSD regression does not vote, but it is reported in Gerrit. Please
seek help resolving breakage before merging.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] [Gluster-users] REMINDER: Weekly Gluster Community meeting today at 12:00 UTC

2015-03-25 Thread Vijay Bellur

On 03/25/2015 04:23 PM, Vijay Bellur wrote:

Hi all,

In about 70 minutes from now we will have the regular weekly Gluster
Community meeting.




Meeting minutes from today can be found at [1].

Thanks,
Vijay

[1] 
http://meetbot.fedoraproject.org/gluster-meeting/2015-03-25/gluster-meeting.2015-03-25-12.00.html




Re: [Gluster-devel] Update on 3.7 feature freeze branching

2015-03-25 Thread Vijay Bellur

On 03/19/2015 11:52 PM, Vijay Bellur wrote:

Hi All,

The last few days have been busy for us with feature patches getting
reviewed, rebased and merged. As of now we have been able to review and
merge most features that we wanted to have in 3.7. Parts of the bitrot
detection & tiering features have already been merged on mainline and we
are awaiting a few more patches from these areas to declare completion
with respect to features. We seem to be well set to feature freeze 3.7
tomorrow.



The missing pieces of bitrot detection and tiering are in now, so we are 
good to declare feature freeze. Kudos to all of us for getting here :).



I will delay branching of 3.7 by a week or so after tomorrow to
primarily include several bug fixes and minor improvements in areas like
logging that have not received enough attention from us in the recent
past. I will send out an update as we have a release-3.7 branch.



Branching for 3.7 will happen after we merge coverity fixes, logging 
improvements and critical bug fixes.


Maintainers - can you please provide an ACK on the quality and sanity of 
your component(s) in 3.7? Basically we should aim to avoid functional 
regressions and critical issues before we branch.


Once we have both of these in place, I will branch release-3.7.

Thanks,
Vijay



[Gluster-devel] New Defects reported by Coverity Scan for gluster/glusterfs

2015-03-25 Thread scan-admin

Hi,

Please find the latest report on new defect(s) introduced to gluster/glusterfs 
found with Coverity Scan.

33 new defect(s) introduced to gluster/glusterfs found with Coverity Scan.
9 defect(s), reported by Coverity Scan earlier, were marked fixed in the recent 
build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 20 of 33 defect(s)


** CID 1291734:  Error handling issues  (CHECKED_RETURN)
/xlators/cluster/dht/src/tier.c: 451 in tier_build_migration_qfile()



*** CID 1291734:  Error handling issues  (CHECKED_RETURN)
/xlators/cluster/dht/src/tier.c: 451 in tier_build_migration_qfile()
445 {
446 gfdb_time_t current_time;
447 _gfdb_brick_dict_info_t gfdb_brick_dict_info;
448 gfdb_time_t time_in_past;
449 int ret = -1;
450 
 CID 1291734:  Error handling issues  (CHECKED_RETURN)
 Calling remove((is_promotion ? "/var/run/gluster/promotequeryfile" : 
 "/var/run/gluster/demotequeryfile")) without checking return value. This 
 library function may fail and return an error code.
451 remove (GET_QFILE_PATH (is_promotion));
452 time_in_past.tv_sec = args->freq_time;
453 time_in_past.tv_usec = 0;
454 if (gettimeofday (&current_time, NULL) == -1) {
455 gf_log (args->this->name, GF_LOG_ERROR,
456 "Failed to get current time\n");



To view the defects in Coverity Scan visit, 
https://scan.coverity.com/projects/987?tab=overview

To manage Coverity Scan email notifications for gluster-devel@gluster.org, 
click 
https://scan.coverity.com/subscriptions/edit?email=gluster-devel%40gluster.org&token=7dffab14bc5a7180e75b0d047539f148
 .



Re: [Gluster-devel] NetBSD regressions, memory corruption

2015-03-25 Thread Emmanuel Dreyfus
On Wed, Mar 25, 2015 at 10:32:08PM +0530, Venky Shankar wrote:
 Could I run some tests on nbslave70 

Sure, I stopped doing tests since I assumed you were already doing some.

 (I plan to disable some
 translators). Just running AFR test cases should trigger the segfault,
 correct?

Yes, if you are lucky you can pass one or two tests, but it crashes 
quite reliably.

I already tried disabling some translators: the recently introduced upcall,
for instance, and of course changelog, which is the component that was
modified when things started to break. The problem is that the absence
of the translator seems to break the tests because of the lack of
functionality.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] Update on 3.7 feature freeze branching

2015-03-25 Thread Vijay Bellur

On 03/25/2015 10:43 PM, Vijay Bellur wrote:

On 03/19/2015 11:52 PM, Vijay Bellur wrote:

Hi All,

The last few days have been busy for us with feature patches getting
reviewed, rebased and merged. As of now we have been able to review and
merge most features that we wanted to have in 3.7. Parts of the bitrot
detection & tiering features have already been merged on mainline and we
are awaiting a few more patches from these areas to declare completion
with respect to features. We seem to be well set to feature freeze 3.7
tomorrow.



The missing pieces of bitrot detection and tiering are in now, so we are
good to declare feature freeze. Kudos to all of us for getting here :).


I will delay branching of 3.7 by a week or so after tomorrow to
primarily include several bug fixes and minor improvements in areas like
logging that have not received enough attention from us in the recent
past. I will send out an update as we have a release-3.7 branch.



Branching for 3.7 will happen after we merge coverity fixes, logging
improvements and critical bug fixes.

Maintainers - can you please provide an ACK on the quality and sanity of
your component(s) in 3.7? Basically we should aim to avoid functional
regressions and critical issues before we branch.

Once we have both of these in place, I will branch release-3.7.



Forgot to add: one more criterion for branching, as discussed in 
today's community meeting, is to have no known spurious regression test 
failures.


Thanks,
Vijay




Re: [Gluster-devel] NetBSD regressions, memory corruption

2015-03-25 Thread Venky Shankar
On Wed, Mar 25, 2015 at 4:28 PM, Niels de Vos nde...@redhat.com wrote:
 Ai, top posting, this makes it really difficult to follow the email if
 you have not read the first parts :-/ Please remember to inline or
 bottom post when replying.

 On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
 looks like the iobref (and the iobuf) was allocated in protocol/server..

 (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
 0xbb11a438: 0xbb18ba80  0x0001  0x0068  0x0040
 0xbb11a448: 0xbb1e2018  0xcafebabe  0x  0x
 0xbb11a458: 0x0003  0x0003  0x0008  0x0003
 0xbb11a468: 0x000c  0x0003  0x000e  0x0003

 8 bytes before the magic header (0xcafebabe) lives the xlator (this)
 that invoked GF_MALLOC. Here it is:

 (gdb) p *(xlator_t *)0xbb1e2018
 $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
 "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
   children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
 fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
   dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
 prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
   init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
 mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
   notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
 {{min = 0, max = 0, total = 0, std = 0, mean = 0,
   count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
 graph = 0xbb1c30f8, itable = 0x0,
   init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
 {num_types = 144, rec = 0xbb1c6000}, winds = 0,
   switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}

 looking into it more. If the above rings a bell for someone, let us know.

 Going by the output from gdb above and the below layout:

 $ printf 'type=%d\nsize=%d\n' 0x0068 0x0040
 type=104
 size=64

 This means that the protocol/server did a GF_?ALLOC(64, 104). The 104 is
 an enum for the mem-type and libglusterfs/src/mem-types.h points to
 gf_common_mt_iobrefs. There is only one function that uses
 gf_common_mt_iobrefs, which is iobref_new().

 protocol/server calls iobref_new() only once directly (there could be
 some other indirect calls too) in server_submit_reply().

yes, that's the only place in protocol/server that calls iobref_new().


 I do not quickly see how the issue can happen with the analyzed data in
 this email. Possibly an allocation before (memory address wise) this
 went awry and caused the wreckage. We may need to follow these
 diagnostic steps back upwards and try to find the first occurrence where
 0xcafebabe is followed by 0xcafebabe instead of 0xbaadf00d.

What's interesting is that the number of used iobufs is zero, but ->iobrefs
points to a memory address (iobref_unref() iterates ->alloced times and frees
anything which isn't NULL). Something put it there.

(gdb) p *ie-ie_iobref
$1 = {lock = {pts_magic = 2004287495, pts_spin = 0 '\000', pts_flags =
0}, ref = 1, iobrefs = 0xbb11a458, alloced = 16, used = 0}

Emmanuel,

Could I run some tests on nbslave70 (I plan to disable some
translators). Just running AFR test cases should trigger the segfault,
correct?


 That's the only idea I have for now, but I'll keep thinking of something
 that could make this easier.

 Note: the iobref structure is used really a lot, which makes it a likely
 structure to get blown away when something else frees some memory but
 still wants to use it afterwards. I think a use-after-free could be one
 cause for this.

 Niels


 -venky

 On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos nde...@redhat.com wrote:
  On Tue, Mar 24, 2015 at 05:18:44PM +, Emmanuel Dreyfus wrote:
  Hi
 
  The merge of http://review.gluster.org/9953/ removed a few crashes from
  NetBSD regression tests, but the thing remains utterly broken since the
  merge of http://review.gluster.org/9708/, though I cannot tell if I have
  bugs left over from this commit or if I face new problems.
 
  Here are the known problems so far:
 
  ...snip! I'll only give some info to your 2nd point.
 
  2) I still experience memory corruption, which usually crashes glusterfsd
  because some pointer was replaced by the value 0x3. This strikes on iobref
  most of the time, but it can happen elsewhere.
 
  I would be glad if someone could help here. On nbslave70:/autobuild I
  added code to check iobref/iobuf sanity at random places (by calling
  iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
  but I have not been able to spot the source of the problem yet.
 
  The weird thing is that memory seems to always be overwritten by the
  same values, and the magic 0xcafebabe number before the buffer is preserved.
  Here is an example, where iobref->iobrefs = 0xbb11a458:
  0xbb11a44c: 0xcafebabe  0x  0x  0x0003
  0xbb11a45c: 0x0003  0x0008  0x0003  0x000c

Re: [Gluster-devel] New Defects reported by Coverity Scan for gluster/glusterfs

2015-03-25 Thread Niels de Vos
On Wed, Mar 25, 2015 at 10:59:24AM -0700, scan-ad...@coverity.com wrote:
 
 Hi,
 
 Please find the latest report on new defect(s) introduced to 
 gluster/glusterfs found with Coverity Scan.
 
 33 new defect(s) introduced to gluster/glusterfs found with Coverity Scan.
 9 defect(s), reported by Coverity Scan earlier, were marked fixed in the 
 recent build analyzed by Coverity Scan.
 
 New defect(s) Reported-by: Coverity Scan
 Showing 20 of 33 defect(s)
 
 
 ** CID 1291734:  Error handling issues  (CHECKED_RETURN)
 /xlators/cluster/dht/src/tier.c: 451 in tier_build_migration_qfile()

Dan already posted a fix for this:

http://review.gluster.org/1

Niels

 
 
 
 *** CID 1291734:  Error handling issues  (CHECKED_RETURN)
 /xlators/cluster/dht/src/tier.c: 451 in tier_build_migration_qfile()
 445 {
 446 gfdb_time_t current_time;
 447 _gfdb_brick_dict_info_t gfdb_brick_dict_info;
 448 gfdb_time_t time_in_past;
 449 int ret = -1;
 450 
  CID 1291734:  Error handling issues  (CHECKED_RETURN)
  Calling remove((is_promotion ? "/var/run/gluster/promotequeryfile" : 
  "/var/run/gluster/demotequeryfile")) without checking return value. This 
  library function may fail and return an error code.
 451 remove (GET_QFILE_PATH (is_promotion));
 452 time_in_past.tv_sec = args->freq_time;
 453 time_in_past.tv_usec = 0;
 454 if (gettimeofday (&current_time, NULL) == -1) {
 455 gf_log (args->this->name, GF_LOG_ERROR,
 456 "Failed to get current time\n");
 
 
 
 To view the defects in Coverity Scan visit, 
 https://scan.coverity.com/projects/987?tab=overview
 
 To manage Coverity Scan email notifications for gluster-devel@gluster.org, 
 click 
 https://scan.coverity.com/subscriptions/edit?email=gluster-devel%40gluster.org&token=7dffab14bc5a7180e75b0d047539f148
  .
 


[Gluster-devel] New features and their location in packages

2015-03-25 Thread Niels de Vos
With all the new features merged, we need to know on what side of the
system the new xlators, libraries and scripts are used. There always are
questions on reducing the installation size on client systems, so
anything that is not strictly needed client-side, should not be in the
client packages.

Therefore, I would like to hear from all feature owners which files,
libraries, scripts, docs or other bits are required for clients to
operate. For example, here is what tiering does:

  - client-side: cluster/tiering xlator
  - server-side: libgfdb (with sqlite dependency)

By default, any library is included in the glusterfs-libs RPM. If the
library is only useful on a system with glusterd installed, it should
move to the glusterfs-server RPM.

Clients include fuse and libgfapi, the common package for client-side
bits (and shared client/server bits) is 'glusterfs'. There is no need
for you to post patches for moving files around in the RPMs, that is
something I can do in one go. Just let me know which files are needed
where.

Thanks!
Niels




[Gluster-devel] REMINDER: Weekly Gluster Community meeting today at 12:00 UTC

2015-03-25 Thread Vijay Bellur

Hi all,

In about 70 minutes from now we will have the regular weekly Gluster 
Community meeting.


Meeting details:
- location: #gluster-meeting on Freenode IRC
- date: every Wednesday
- time: 8:00 EDT, 12:00 UTC, 13:00 CET, 17:30 IST (in your terminal, 
run: date -d "12:00 UTC")

- agenda: available at [1]

Currently the following items are listed:
* Roll Call
* Status of last week's action items
* GlusterFS 3.6
* GlusterFS 3.5
* GlusterFS 3.4
* GlusterFS Next
* Open Floor
   - Fix regression tests with spurious failures
   - docs
   - Awesum Web Presence
   - Gluster Summit Barcelona, second week in May
   - Gluster Summer of Code


The last topic has space for additions. If you have a suitable topic to
discuss, please add it to the agenda.

Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-community-meetings


Re: [Gluster-devel] NetBSD regressions, memory corruption

2015-03-25 Thread Niels de Vos
Ai, top posting, this makes it really difficult to follow the email if
you have not read the first parts :-/ Please remember to inline or
bottom post when replying.

On Wed, Mar 25, 2015 at 03:21:28PM +0530, Venky Shankar wrote:
 looks like the iobref (and the iobuf) was allocated in protocol/server..
 
 (gdb) x/16x (ie->ie_iobref->iobrefs - 8)
 0xbb11a438: 0xbb18ba80  0x0001  0x0068  0x0040
 0xbb11a448: 0xbb1e2018  0xcafebabe  0x  0x
 0xbb11a458: 0x0003  0x0003  0x0008  0x0003
 0xbb11a468: 0x000c  0x0003  0x000e  0x0003
 
 8 bytes before the magic header (0xcafebabe) lives the xlator (this)
 that invoked GF_MALLOC. Here it is:
 
 (gdb) p *(xlator_t *)0xbb1e2018
 $9 = {name = 0xbb1dbb08 "patchy-server", type = 0xbb1dbb38
 "protocol/server", next = 0xbb1e1018, prev = 0x0, parents = 0x0,
   children = 0xbb1dbbc8, options = 0xbb18a028, dlhandle = 0xb9b7d000,
 fops = 0xb9adf0e0 <fops>, cbks = 0xb9adc8cc <cbks>,
   dumpops = 0xb9ade460 <dumpops>, volume_options = {next = 0xbb1dbb68,
 prev = 0xbb1dbbf8}, fini = 0xb9ab539d <fini>,
   init = 0xb9ab48a5 <init>, reconfigure = 0xb9ab418c <reconfigure>,
 mem_acct_init = 0xb9ab3cb1 <mem_acct_init>,
   notify = 0xb9ab53a3 <notify>, loglevel = GF_LOG_NONE, latencies =
 {{min = 0, max = 0, total = 0, std = 0, mean = 0,
   count = 0} <repeats 50 times>}, history = 0x0, ctx = 0xbb109000,
 graph = 0xbb1c30f8, itable = 0x0,
   init_succeeded = 1 '\001', private = 0xbb1e3018, mem_acct =
 {num_types = 144, rec = 0xbb1c6000}, winds = 0,
   switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false}
 
 looking into it more. If the above rings a bell for someone, let us know.

Going by the output from gdb above and the below layout:

$ printf 'type=%d\nsize=%d\n' 0x0068 0x0040
type=104
size=64

This means that the protocol/server did a GF_?ALLOC(64, 104). The 104 is
an enum for the mem-type and libglusterfs/src/mem-types.h points to
gf_common_mt_iobrefs. There is only one function that uses
gf_common_mt_iobrefs, which is iobref_new().

protocol/server calls iobref_new() only once directly (there could be
some other indirect calls too) in server_submit_reply().

I do not quickly see how the issue can happen with the analyzed data in
this email. Possibly an allocation before (memory address wise) this
went awry and caused the wreckage. We may need to follow these
diagnostic steps back upwards and try to find the first occurrence where
0xcafebabe is followed by 0xcafebabe instead of 0xbaadf00d.

That's the only idea I have for now, but I'll keep thinking of something
that could make this easier.

Note: the iobref structure is used really a lot, which makes it a likely
structure to get blown away when something else frees some memory but
still wants to use it afterwards. I think a use-after-free could be one
cause for this.

Niels

 
 -venky
 
 On Tue, Mar 24, 2015 at 11:28 PM, Niels de Vos nde...@redhat.com wrote:
  On Tue, Mar 24, 2015 at 05:18:44PM +, Emmanuel Dreyfus wrote:
  Hi
 
  The merge of http://review.gluster.org/9953/ removed a few crashes from
  NetBSD regression tests, but the thing remains utterly broken since the
  merge of http://review.gluster.org/9708/, though I cannot tell if I have
  bugs left over from this commit or if I face new problems.
 
  Here are the known problems so far:
 
  ...snip! I'll only give some info to your 2nd point.
 
  2) I still experience memory corruption, which usually crashes glusterfsd
  because some pointer was replaced by the value 0x3. This strikes on iobref
  most of the time, but it can happen elsewhere.
 
  I would be glad if someone could help here. On nbslave70:/autobuild I
  added code to check iobref/iobuf sanity at random places (by calling
  iobref_sanity()). I do this in synctask_wrap and in STACK_WIND/UNWIND,
  but I have not been able to spot the source of the problem yet.
 
  The weird thing is that memory seems to always be overwritten by the
  same values, and the magic 0xcafebabe number before the buffer is preserved.
  Here is an example, where iobref->iobrefs = 0xbb11a458:
  0xbb11a44c: 0xcafebabe  0x  0x  0x0003
  0xbb11a45c: 0x0003  0x0008  0x0003  0x000c
  0xbb11a46c: 0x0003  0x000e  0x0003  0x0010
  0xbb11a47c: 0x0003  0x0009  0x0003  0x000d
  0xbb11a48c: 0x0003  0x0015  0x0003  0x0016
  0xbb11a49c: 0x0003  0x0032  0x0034  0xbb1e2018
  0xbb11a4ac: 0xcafebabe  0x  0x  0xbb11a5d8
 
  Recently I was looking into something that involved some more
  understanding of GF_MALLOC(). I did not really continue with it because
  other things got a higher priority. But, maybe this layout helps you a
  little:
 
   :  :
   :  :