Re: [Gluster-devel] Spurious Failures in regression runs

2015-03-31 Thread Nithya Balachandran
I'll take a look at the hangs.


Regards,
Nithya

- Original Message -
From: Justin Clift jus...@gluster.org
To: Vijay Bellur vbel...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org, Nithya Balachandran 
nbala...@redhat.com
Sent: Tuesday, 31 March, 2015 5:40:29 AM
Subject: Re: [Gluster-devel] Spurious Failures in regression runs

On 30 Mar 2015, at 18:54, Vijay Bellur vbel...@redhat.com wrote:
 Hi All,
 
 We are attempting to capture all known spurious regression failures from the 
 jenkins instance in build.gluster.org at [1].
 The issues listed in the etherpad impede our patch merging workflow and need 
 to be sorted out before we branch
 release-3.7. If you happen to be the owner of one or more issues in the 
 etherpad, can you please look into the failures and
 have them addressed soon?

To help surface more regression failures, we ran 20 new VMs
in Rackspace, each running a full regression test of the master
branch head:

 * Two hung regression tests on tests/bugs/posix/bug-1113960.t
   * Still hung in case anyone wants to check them out
 * 162.242.167.96
 * 162.242.167.132
 * Both allowing remote root login, and using our jenkins
   slave password as their root pw

* 2 x failures on ./tests/basic/afr/sparse-file-self-heal.t
  Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

  Added to etherpad

* 1 x failure on ./tests/bugs/disperse/bug-1187474.t
  Failed tests:  11-12

  Added to etherpad

* 1 x failure on ./tests/basic/uss.t
  Failed test:  153

  Already on etherpad

Looks like our general failure rate is improving. :)  The hangs
are a bit worrying though. :(

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Responsibilities and expectations of our maintainers

2015-03-31 Thread Venky Shankar
On Tue, Mar 31, 2015 at 12:14 PM, Vijay Bellur vbel...@redhat.com wrote:
 On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:

 Pranith Kumar Karampuri pkara...@redhat.com wrote:

 Emmanuel,
   What can we do to make it vote -2 when it fails? Things will
 automatically fall in place if it gives -2.


 I will do this once I have recovered. The changelog change broke
 regression for weeks, and now that we have a fix for it I discover many
 other problems have cropped up.

 While there, to anyone:
 - dd bs=1M is not portable. Use
dd bs=1024k
 - echo 3 > /proc/sys/vm/drop_caches is not portable. Use instead this
 command that fails but flushes inodes first:
( cd $M0 && umount $M0 )
 - umount $N0 brings many problems, use instead
EXPECT_WITHIN $UMOUNT_TIMEOUT Y umount_nfs $N0



 I wonder if we can add these as checks to flag errors in checkpatch.pl so
 that we nip these problems in the bud even before they appear for review?

Makes sense. Over time we would build up a good list of portability checks.
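
As a strawman, such checks could start life as a simple grep pass over
the test scripts before being folded into checkpatch.pl proper. The
script below is only an illustrative sketch, not an existing tool:

  #!/bin/sh
  # Sketch: flag the non-portable constructs listed above in tests/*.t.
  status=0
  for t in $(find tests -name '*.t'); do
      if grep -n 'bs=1M' "$t"; then
          echo "$t: dd bs=1M is not portable, use bs=1024k" >&2
          status=1
      fi
      if grep -n 'drop_caches' "$t"; then
          echo "$t: /proc/sys/vm/drop_caches is Linux-only" >&2
          status=1
      fi
  done
  exit $status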


 Thanks,
 Vijay


 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] New components in bugzilla added

2015-03-31 Thread Vijay Bellur

The following new components have been added to bugzilla:

- tiering
- sharding
- bitrot

Please use these components while filing bugs or triaging.

Thanks,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Security hardening RELRO PIE flags

2015-03-31 Thread Kaushal M
IMHO, hardening and security should be left to the individual
distributions and the package maintainers. Generally, each distribution
has its own policies with regard to hardening and security. We as an
upstream project cannot decide what a distribution should do, but we
should be ready to fix bugs that arise when distributions do hardened
builds.

So, I vote against adding these hardening flags to the base GlusterFS
build. But we could add the flags to the Fedora spec files which we
carry with our source.

~kaushal

On Tue, Mar 31, 2015 at 11:49 AM, Atin Mukherjee amukh...@redhat.com
wrote:

 Folks,

 There are some projects which use compiler/glibc features to strengthen
 their security claims. Popular distros suggest hardening daemons with
 RELRO/PIE flags; see [1] [2] [3]

 Partial relro is when you have -Wl,-z,relro in the LDFLAGS for building
 libraries. Partial relro means that some ELF sections are reordered so
 that overflows in some likely sections don't affect others, and the
 non-PLT part of the Global Offset Table is read-only. To get full relro,
 you also need -Wl,-z,bind_now added to LDFLAGS. What this does is make
 the Global Offset Table and Procedure Linkage Table read-only. This
 takes some time, so it's only worth it for apps that have a real
 possibility of being attacked, i.e. setuid/setgid/setcap programs and
 daemons. There are some security-critical apps that can have this too.
 If an app likely parses files from an untrusted source (the internet),
 then it might also want full relro.

 To enable PIE, you would pass -fPIE -DPIE in the CFLAGS and -pie in the
 LDFLAGS. What PIE does is randomize the locations of important items
 such as the base address of an executable and position of libraries,
 heap, and stack, in a process's address space. Sometimes this is called
 ASLR. It's designed to make buffer/heap overflow and return-into-libc
 attacks much harder. Part of the way it does this is to make a new
 section in the ELF image that is writable to redirect function calls to
 the correct address (offsets). This has to be writable because each
 invocation will have different layouts and needs to be fixed up. So,
 when you have an application with PIE, you want full relro so that
 these sections become readonly and not part of an attacker's target areas.
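
 Concretely, a hardened daemon build along these lines would set
 something like the following (a sketch only, combining the flags
 described above; see the distro references below for the details):

   CFLAGS="$CFLAGS -fPIE -DPIE"                      # PIE, as described above
   LDFLAGS="$LDFLAGS -pie -Wl,-z,relro -Wl,-z,now"   # full relro (-z now = bind-now)
   ./configure CFLAGS="$CFLAGS" LDFLAGS="$LDFLAGS"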

 I would like to hear from the community whether we should introduce
 these hardening flags in glusterfs as well.

 [1] https://fedorahosted.org/fesco/ticket/563
 [2] https://wiki.debian.org/Hardening
 [3] https://wiki.ubuntu.com/Security/Features#relro
 --
 ~Atin
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Additional pre-post checks(WAS: Responsibilities and expectations of our maintainers)

2015-03-31 Thread Kaushal M
On Tue, Mar 31, 2015 at 12:49 PM, Niels de Vos nde...@redhat.com wrote:

 On Tue, Mar 31, 2015 at 12:14:29PM +0530, Vijay Bellur wrote:
  On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:
  Pranith Kumar Karampuri pkara...@redhat.com wrote:
  
  Emmanuel,
What can we do to make it vote -2 when it fails? Things will
  automatically fall in place if it gives -2.
  
  I will do this once I have recovered. The changelog change broke
  regression for weeks, and now that we have a fix for it I discover
  many other problems have cropped up.
  
  While there, to anyone:
  - dd bs=1M is not portable. Use
 dd bs=1024k
  - echo 3 > /proc/sys/vm/drop_caches is not portable. Use instead this
  command that fails but flushes inodes first:
 ( cd $M0 && umount $M0 )
  - umount $N0 brings many problems, use instead
 EXPECT_WITHIN $UMOUNT_TIMEOUT Y umount_nfs $N0
  
 
 
  I wonder if we can add these as checks to flag errors in checkpatch.pl so
  that we nip these problems in the bud even before they appear for review?

 That would surely be good. I heard that Kaushal understands and can
 write Perl ;-)


This is not true. I can understand the hieroglyphics with some
difficulty, but I sure cannot write it.
But if needed, I could try.


 While on the topic of checkpatch.pl, having a check for empty commit
 messages and multi-line subjects would be nice too.

 Niels
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Security hardening RELRO PIE flags

2015-03-31 Thread Niels de Vos
On Tue, Mar 31, 2015 at 12:20:19PM +0530, Kaushal M wrote:
 IMHO, hardening and security should be left to the individual
 distributions and the package maintainers. Generally, each distribution
 has its own policies with regard to hardening and security. We as an
 upstream project cannot decide what a distribution should do, but we
 should be ready to fix bugs that arise when distributions do hardened
 builds.
 
 So, I vote against adding these hardening flags to the base GlusterFS
 build. But we could add the flags to the Fedora spec files which we
 carry with our source.

Indeed, I agree that the compiler flags should be specified by the
distributions. At least Fedora and Debian already include (probably
different) options within their packaging scripts. We should set the
flags we need, but not more. It would be annoying to set default flags
that can conflict with others, or which are not (yet) available on
architectures that we normally do not test.

Niels

 
 ~kaushal
 
 On Tue, Mar 31, 2015 at 11:49 AM, Atin Mukherjee amukh...@redhat.com
 wrote:
 
  Folks,
 
  There are some projects which use compiler/glibc features to strengthen
  their security claims. Popular distros suggest hardening daemons with
  RELRO/PIE flags; see [1] [2] [3]
 
  Partial relro is when you have -Wl,-z,relro in the LDFLAGS for building
  libraries. Partial relro means that some ELF sections are reordered so
  that overflows in some likely sections don't affect others, and the
  non-PLT part of the Global Offset Table is read-only. To get full relro,
  you also need -Wl,-z,bind_now added to LDFLAGS. What this does is make
  the Global Offset Table and Procedure Linkage Table read-only. This
  takes some time, so it's only worth it for apps that have a real
  possibility of being attacked, i.e. setuid/setgid/setcap programs and
  daemons. There are some security-critical apps that can have this too.
  If an app likely parses files from an untrusted source (the internet),
  then it might also want full relro.
 
  To enable PIE, you would pass -fPIE -DPIE in the CFLAGS and -pie in the
  LDFLAGS. What PIE does is randomize the locations of important items
  such as the base address of an executable and position of libraries,
  heap, and stack, in a process's address space. Sometimes this is called
  ASLR. It's designed to make buffer/heap overflow and return-into-libc
  attacks much harder. Part of the way it does this is to make a new
  section in the ELF image that is writable to redirect function calls to
  the correct address (offsets). This has to be writable because each
  invocation will have different layouts and needs to be fixed up. So,
  when you have an application with PIE, you want full relro so that
  these sections become readonly and not part of an attacker's target areas.
 
  I would like to hear from the community whether we should introduce
  these hardening flags in glusterfs as well.
 
  [1] https://fedorahosted.org/fesco/ticket/563
  [2] https://wiki.debian.org/Hardening
  [3] https://wiki.ubuntu.com/Security/Features#relro
  --
  ~Atin
  ___
  Gluster-devel mailing list
  Gluster-devel@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-devel
 

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Additional pre-post checks(WAS: Responsibilities and expectations of our maintainers)

2015-03-31 Thread Niels de Vos
On Tue, Mar 31, 2015 at 12:14:29PM +0530, Vijay Bellur wrote:
 On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:
 Pranith Kumar Karampuri pkara...@redhat.com wrote:
 
 Emmanuel,
   What can we do to make it vote -2 when it fails? Things will
 automatically fall in place if it gives -2.
 
 I will do this once I have recovered. The changelog change broke
 regression for weeks, and now that we have a fix for it I discover many
 other problems have cropped up.
 
 While there, to anyone:
 - dd bs=1M is not portable. Use
dd bs=1024k
 - echo 3 > /proc/sys/vm/drop_caches is not portable. Use instead this
 command that fails but flushes inodes first:
( cd $M0 && umount $M0 )
 - umount $N0 brings many problems, use instead
EXPECT_WITHIN $UMOUNT_TIMEOUT Y umount_nfs $N0
 
 
 
 I wonder if we can add these as checks to flag errors in checkpatch.pl so
 that we nip these problems in the bud even before they appear for review?

That would surely be good. I heard that Kaushal understands and can
write Perl ;-)

While on the topic of checkpatch.pl, having a check for empty commit
messages and multi-line subjects would be nice too.

Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Rebalance improvement design

2015-03-31 Thread Susant Palai
Hi,
   Posted patch for rebalance improvement here: 
http://review.gluster.org/#/c/9657/ .
You can find the feature page here: 
http://www.gluster.org/community/documentation/index.php/Features/improve_rebalance_performance

The current patch addresses two parts of the proposed design.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points.

1. Rebalance multiple files in parallel:
   -

The existing rebalance engine is single-threaded. Hence, we introduced
multiple threads which run in parallel with the crawler.
The current rebalance migration is converted to a producer-consumer
framework,
where the producer is   : the crawler
  and the consumers are : the migrating threads

Crawler: The crawler is the main thread. The job of the crawler is now
limited to the fix-layout of each directory and to adding the files
which are eligible for migration to a global queue. Hence,
the crawler is not blocked by the migration process.

   Consumers: The migrating threads monitor the global queue. If any
file is added to this queue, a thread will dequeue that entry and
migrate the file. Currently 15 migration threads are spawned at the
beginning of the rebalance process. Hence, multiple file migrations
happen in parallel.
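
A minimal shell analogue of this producer-consumer flow, for
illustration only (the real implementation is multi-threaded C inside
the rebalance process; crawl_local_bricks and migrate_file below are
hypothetical stand-ins):

  #!/bin/bash
  # Producer: emit one eligible file per line (stands in for the crawler).
  crawl_local_bricks() {
      find /bricks/brick1 -type f
  }
  # Consumer body: migrate a single file (stands in for a migration thread).
  migrate_file() {
      echo "migrating $1"
  }
  export -f migrate_file
  # Run 15 consumers in parallel, draining the producer's queue.
  crawl_local_bricks | xargs -n1 -P15 bash -c 'migrate_file "$0"'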


2. Crawl only bricks that belong to the current node:
   --

   As the rebalance process is spawned per node, it migrates only the
files that belong to its own node for the sake of load balancing. But it
also reads entries from the whole cluster, which is not necessary, as
readdir then hits other nodes.

 New Design:
   As part of the new design, the rebalancer decides which subvols are
local to the rebalancer node by checking the node-uuid of the root
directory before the crawler starts. Hence, readdir won't hit the whole
cluster, as the rebalancer already has the context of its local subvols,
and the node-uuid request for each file can also be avoided. This makes
the rebalance process more scalable.


Requesting reviews asap.

Regards,
Susant

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] feature/trash and NetBSD

2015-03-31 Thread Anoop C S


On 03/31/2015 02:49 PM, Emmanuel Dreyfus wrote:
 On Tue, Mar 31, 2015 at 10:57:12AM +0530, Anoop C S wrote:
 The above mentioned patch for skipping extended truncate
 [http://review.gluster.org/#/c/9984/] got merged yesterday. And some
 portability fixes for trash.t was included in your recently merged patch
 [http://review.gluster.org/#/c/10033/]. Now we expect trash.t to run
 more smoothly than before on NetBSD. Feel free to reply with outstanding
 failures.
 
 There are other problems, many timing issues that can be addressed
 using the appropriate wrappers (see patch below). However, it still fails
 on test 56 which is about restarting the volume:
 

Thanks for the patch.

 TEST 56 (line 207): gluster --mode=script --wignore volume start patchy1 force
 [09:12:53] ./tests/features/trash.t .. 56/65 
 not ok 56 
 
 Could you have a look? You will find the test ready to run with my
 latest patches on nbslave76.cloud.gluster.org:/autobuild/glusterfs
 

Thanks for spending your valuable time on trash.t. I will login and
check now. By the way, what is the password for root login?

--Anoop C S.

 diff --git a/tests/features/trash.t b/tests/features/trash.t
 index cbcff23..4546b57 100755
 --- a/tests/features/trash.t
 +++ b/tests/features/trash.t
 @@ -7,7 +7,11 @@ cleanup
  
  test_mount() {
  glusterfs -s $H0 --volfile-id $V0 $M0 --attribute-timeout=0
 -test -d $M0/.trashcan
 +timeout=0
 +while [ $timeout -lt $PROCESS_UP_TIMEOUT ] ; do
 + timeout=$(( $timeout + 1 ))
 +test -d $M0/.trashcan && break
 +done
  }
  
  start_vol() {
 @@ -15,19 +19,23 @@ start_vol() {
  test_mount
  }
  
 -stop_vol() {
 -umount $M0
 -$CLI volume stop $V0
 -}
 -
  create_files() {
  echo 'Hi' > $1
  echo 'Hai' > $2
  }
  
 -file_exists() {
 -test -e $B0/${V0}1/$1 -o -e $B0/${V0}2/$1
 -test -e $B0/${V0}1/$2 -o -e $B0/${V0}2/$2
 +file_exists () {
 +vol=$1
 + shift
 + for file in `ls $B0/${vol}1/$@ 2>/dev/null` ; do
 +test -e ${file} && { echo Y; return 0; }
 +done
 + for file in `ls $B0/${vol}2/$@ 2>/dev/null` ; do
 +test -e ${file} && { echo Y; return 0; }
 +done
 +
 + echo N
 + return 1;
  }
  
  unlink_op() {
 @@ -85,7 +93,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash'
  
  # files directly under mount point [13]
  create_files $M0/file1 $M0/file2
 -TEST file_exists file1 file2
 +TEST file_exists $V0 file1 file2
  
  # perform unlink [14]
  TEST unlink_op file1
 @@ -96,7 +104,7 @@ TEST truncate_op file2 4
  # create files directory hierarchy and check [16]
  mkdir -p $M0/1/2/3
  create_files $M0/1/2/3/foo1 $M0/1/2/3/foo2
 -TEST file_exists 1/2/3/foo1 1/2/3/foo2
 +TEST file_exists $V0 1/2/3/foo1 1/2/3/foo2
  
  # perform unlink [17]
  TEST unlink_op 1/2/3/foo1
 @@ -113,7 +121,7 @@ EXPECT '/a' volinfo_field $V0 
 'features.trash-eliminate-path'
  
  # create two files and check [21]
  create_files $M0/a/test1 $M0/a/test2
 -TEST file_exists a/test1 a/test2
 +TEST file_exists $V0 a/test1 a/test2
  
  # remove from eliminate pattern [22]
  rm -f $M0/a/test1
 @@ -131,7 +139,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash-internal-op'
  
  # again create two files and check [28]
  create_files $M0/inop1 $M0/inop2
 -TEST file_exists inop1 inop2
 +TEST file_exists $V0 inop1 inop2
  
  # perform unlink [29]
  TEST unlink_op inop1
 @@ -141,11 +149,12 @@ TEST truncate_op inop2 4
  
  # remove one brick and restart the volume [31-33]
  TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}2 force
 -TEST stop_vol
 +EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0
 +$CLI volume stop $V0
  TEST start_vol
  # again create two files and check [34]
  create_files $M0/rebal1 $M0/rebal2
 -TEST file_exists rebal1 rebal2
 +TEST file_exists $V0 rebal1 rebal2
  
  # add one brick [35-36]
  TEST $CLI volume add-brick $V0 $H0:$B0/${V0}3
 @@ -158,7 +167,8 @@ sleep 3
  # check whether rebalance was successful [38-40]
  TEST [ -e $B0/${V0}3/rebal2 ]
  TEST [ -e $B0/${V0}1/.trashcan/internal_op/rebal2* ]
 -TEST stop_vol
 +EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0
 +$CLI volume stop $V0
  
  # create a replicated volume [41]
  TEST $CLI volume create $V1 replica 2 $H0:$B0/${V1}{1,2}
 @@ -187,9 +197,10 @@ touch $M1/self
  TEST [ -e $B0/${V1}1/self -a -e $B0/${V1}2/self ]
  
  # kill one brick and delete the file from mount point [55]
 -kill `ps aux| grep glusterfsd | awk '{print $2}' | head -1`
 +kill `ps auxww| grep glusterfsd | awk '{print $2}' | head -1`
  sleep 2
  rm -f $M1/self
 +sleep 1
  TEST [ -e $M1/.trashcan/self* ]
  
  # force start the volume and trigger the self-heal manually [56]
 @@ -197,7 +208,7 @@ TEST $CLI volume start $V1 force
  sleep 3
  
  # check for the removed file in trashcan [57]
 -TEST [ -e $B0/${V1}1/.trashcan/internal_op/self* -o -e 
 $B0/${V1}2/.trashcan/internal_op/self* ]
 +EXPECT_WITHIN $HEAL_TIMEOUT Y file_exists $V1 .trashcan/internal_op/self*

Re: [Gluster-devel] feature/trash and NetBSD

2015-03-31 Thread Emmanuel Dreyfus
On Tue, Mar 31, 2015 at 10:57:12AM +0530, Anoop C S wrote:
 The above mentioned patch for skipping extended truncate
 [http://review.gluster.org/#/c/9984/] got merged yesterday. And some
 portability fixes for trash.t was included in your recently merged patch
 [http://review.gluster.org/#/c/10033/]. Now we expect trash.t to run
 more smoothly than before on NetBSD. Feel free to reply with outstanding
 failures.

There are other problems, many timing issues that can be addressed
using the appropriate wrappers (see patch below). However, it still fails
on test 56 which is about restarting the volume:

TEST 56 (line 207): gluster --mode=script --wignore volume start patchy1 force
[09:12:53] ./tests/features/trash.t .. 56/65 
not ok 56 

Could you have a look? You will find the test ready to run with my
latest patches on nbslave76.cloud.gluster.org:/autobuild/glusterfs

diff --git a/tests/features/trash.t b/tests/features/trash.t
index cbcff23..4546b57 100755
--- a/tests/features/trash.t
+++ b/tests/features/trash.t
@@ -7,7 +7,11 @@ cleanup
 
 test_mount() {
 glusterfs -s $H0 --volfile-id $V0 $M0 --attribute-timeout=0
-test -d $M0/.trashcan
+timeout=0
+while [ $timeout -lt $PROCESS_UP_TIMEOUT ] ; do
+   timeout=$(( $timeout + 1 ))
+test -d $M0/.trashcan && break
+done
 }
 
 start_vol() {
@@ -15,19 +19,23 @@ start_vol() {
 test_mount
 }
 
-stop_vol() {
-umount $M0
-$CLI volume stop $V0
-}
-
 create_files() {
 echo 'Hi' > $1
 echo 'Hai' > $2
 }
 
-file_exists() {
-test -e $B0/${V0}1/$1 -o -e $B0/${V0}2/$1
-test -e $B0/${V0}1/$2 -o -e $B0/${V0}2/$2
+file_exists () {
+vol=$1
+   shift
+   for file in `ls $B0/${vol}1/$@ 2>/dev/null` ; do
+test -e ${file} && { echo Y; return 0; }
+done
+   for file in `ls $B0/${vol}2/$@ 2>/dev/null` ; do
+test -e ${file} && { echo Y; return 0; }
+done
+
+   echo N
+   return 1;
 }
 
 unlink_op() {
@@ -85,7 +93,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash'
 
 # files directly under mount point [13]
 create_files $M0/file1 $M0/file2
-TEST file_exists file1 file2
+TEST file_exists $V0 file1 file2
 
 # perform unlink [14]
 TEST unlink_op file1
@@ -96,7 +104,7 @@ TEST truncate_op file2 4
 # create files directory hierarchy and check [16]
 mkdir -p $M0/1/2/3
 create_files $M0/1/2/3/foo1 $M0/1/2/3/foo2
-TEST file_exists 1/2/3/foo1 1/2/3/foo2
+TEST file_exists $V0 1/2/3/foo1 1/2/3/foo2
 
 # perform unlink [17]
 TEST unlink_op 1/2/3/foo1
@@ -113,7 +121,7 @@ EXPECT '/a' volinfo_field $V0 
'features.trash-eliminate-path'
 
 # create two files and check [21]
 create_files $M0/a/test1 $M0/a/test2
-TEST file_exists a/test1 a/test2
+TEST file_exists $V0 a/test1 a/test2
 
 # remove from eliminate pattern [22]
 rm -f $M0/a/test1
@@ -131,7 +139,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash-internal-op'
 
 # again create two files and check [28]
 create_files $M0/inop1 $M0/inop2
-TEST file_exists inop1 inop2
+TEST file_exists $V0 inop1 inop2
 
 # perform unlink [29]
 TEST unlink_op inop1
@@ -141,11 +149,12 @@ TEST truncate_op inop2 4
 
 # remove one brick and restart the volume [31-33]
 TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}2 force
-TEST stop_vol
+EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0
+$CLI volume stop $V0
 TEST start_vol
 # again create two files and check [34]
 create_files $M0/rebal1 $M0/rebal2
-TEST file_exists rebal1 rebal2
+TEST file_exists $V0 rebal1 rebal2
 
 # add one brick [35-36]
 TEST $CLI volume add-brick $V0 $H0:$B0/${V0}3
@@ -158,7 +167,8 @@ sleep 3
 # check whether rebalance was successful [38-40]
 TEST [ -e $B0/${V0}3/rebal2 ]
 TEST [ -e $B0/${V0}1/.trashcan/internal_op/rebal2* ]
-TEST stop_vol
+EXPECT_WITHIN $UMOUNT_TIMEOUT Y force_umount $M0
+$CLI volume stop $V0
 
 # create a replicated volume [41]
 TEST $CLI volume create $V1 replica 2 $H0:$B0/${V1}{1,2}
@@ -187,9 +197,10 @@ touch $M1/self
 TEST [ -e $B0/${V1}1/self -a -e $B0/${V1}2/self ]
 
 # kill one brick and delete the file from mount point [55]
-kill `ps aux| grep glusterfsd | awk '{print $2}' | head -1`
+kill `ps auxww| grep glusterfsd | awk '{print $2}' | head -1`
 sleep 2
 rm -f $M1/self
+sleep 1
 TEST [ -e $M1/.trashcan/self* ]
 
 # force start the volume and trigger the self-heal manually [56]
@@ -197,7 +208,7 @@ TEST $CLI volume start $V1 force
 sleep 3
 
 # check for the removed file in trashcan [57]
-TEST [ -e $B0/${V1}1/.trashcan/internal_op/self* -o -e 
$B0/${V1}2/.trashcan/internal_op/self* ]
+EXPECT_WITHIN $HEAL_TIMEOUT Y file_exists $V1 .trashcan/internal_op/self*
 
 # check renaming of trash directory through cli [58-62]
 TEST $CLI volume set $V0 trash-dir abc


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Justin Clift
Hi all,

Ran 20 regression test jobs on (severely resource
constrained) 1GB Rackspace VMs last night (in addition to the
20 normal VM runs).

The 1GB VMs have much, much slower disks, only one virtual CPU,
and half the RAM of our standard 2GB testing VMs.

These are the failure results:

  * 20 x tests/basic/mount-nfs-auth.t
Failed test:  40

100% fail rate. ;)

  * 20 x tests/basic/uss.t
Failed tests:  149, 151-153, 157-159

100% fail rate

  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15

55% fail rate

  * 2 x tests/performance/open-behind.t
Failed test:  17

10% fail rate

  * 1 x tests/basic/afr/self-heald.t
Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
   67-75, 77, 79-81

5% fail rate

  * 1 x tests/basic/afr/entry-self-heal.t
Failed tests:  127-128

5% fail rate

  * 1 x tests/features/trash.t
Failed test:  57

5% fail rate

Wouldn't surprise me if some/many of the failures are due to
timeouts of various sorts in the tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

  http://ded.ninja/gluster/blk0/
  http://ded.ninja/gluster/blk1/

Hoping someone has some time to check those quickly and see
if there's anything useful in them or not.

(the hosts are all still online atm, shortly to be nuked)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Does anyone care if GlusterFS 3.7 does not work on older distributions?

2015-03-31 Thread Dan Mons
On 27 March 2015 at 04:48, Niels de Vos nde...@redhat.com wrote:
 If you have a strong desire for GlusterFS 3.7 clients on older
 distributions, contact us as soon as possible (definitely within the
 next two/three weeks) so that we can look into the matter.

You mention RHEL5 and Ubuntu 12.04LTS Precise as two targets that are
potentially causing problems.

I'm assuming for newer targets you're referring to RHEL7, Ubuntu
14.04LTS Trusty, and similar current long-term releases?

My only objection is when software requires me to start running
non-LTS releases (Ubuntu short term releases, Fedora, etc).  That's a
recipe for pain and heartache in a busy production world.

-Dan


Dan Mons - R&D Sysadmin
Cutting Edge
http://cuttingedge.com.au
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Justin Clift
On 1 Apr 2015, at 03:03, Emmanuel Dreyfus m...@netbsd.org wrote:
 Jeff Darcy jda...@redhat.com wrote:
 
 That's fine.  I left a note for you in the script, regarding what I
 think it needs to do at that point.
 
 Here is the comment:
 
 # We shouldn't be touching CR at all.  For V, we should set V+1 iff this
 # test succeeded *and* the value was already 0 or 1, V-1 otherwise. I
 # don't know how to do that, but the various smoke tests must be doing
 # something similar/equivalent.  It's also possible that this part should
 # be done as a post-build action instead.
 
 The problem is indeed that we do not know how to retrieve the previous
 V value. I guess Gerrit is the place where V combinations should be
 handled correctly.
 
 What is the plan for NetBSD regression now? It will fail anything which
 has not been rebased after recent fixes were merged, but apart from that
 the thing is in rather good shape right now.

It sounds like we need a solution to have both the NetBSD and CentOS
regressions run, and only give the +1 when both of them have successfully
finished.  If either of them fail, then it gets a -1.

Research time. ;)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Justin Clift
On 31 Mar 2015, at 14:18, Shyam srang...@redhat.com wrote:
snip
 Also, most of the regression runs produced cores.  Here are
 the first two:
 
   http://ded.ninja/gluster/blk0/
 
 There are 4 cores here, 3 pointing to the (by now hopefully) famous bug
 #1195415. One of the cores exhibits a different stack, etc. More analysis
 is needed to see what the issue could be here; core file: core.16937
 
   http://ded.ninja/gluster/blk1/
 
 There is a single core here, pointing to the above bug again.

Both the blk0 and blk1 VM's are still online and available,
if that's helpful?

If not, please let me know and I'll nuke them. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Emmanuel Dreyfus
Jeff Darcy jda...@redhat.com wrote:

 That's fine.  I left a note for you in the script, regarding what I
 think it needs to do at that point.

Here is the comment:

 # We shouldn't be touching CR at all.  For V, we should set V+1 iff this
 # test succeeded *and* the value was already 0 or 1, V-1 otherwise. I
 # don't know how to do that, but the various smoke tests must be doing
 # something similar/equivalent.  It's also possible that this part should
 # be done as a post-build action instead.

The problem is indeed that we do not know how to retrieve the previous
V value. I guess Gerrit is the place where V combinations should be
handled correctly.
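
For what it's worth, previously cast votes can in principle be read
back with a gerrit query before voting; a sketch (the change number is
illustrative):

  ssh bu...@review.gluster.org gerrit query --format=JSON \
      --current-patch-set --all-approvals change:10075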

What is the plan for NetBSD regression now? It will fail anything which
has not been rebased after recent fixes were merged, but apart from that
the thing is in rather good shape right now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Justin Clift
On 1 Apr 2015, at 04:07, Emmanuel Dreyfus m...@netbsd.org wrote:
 Justin Clift jus...@gluster.org wrote:
 
 It sounds like we need a solution to have both the NetBSD and CentOS
 regressions run, and only give the +1 when both of them have successfully
 finished.  If either of them fail, then it gets a -1.
 
 That, or perhaps we could have two verified fields?

Sure.  Whichever works. :)

Personally, I'm not sure how to do either yet.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Justin Clift
On 1 Apr 2015, at 05:04, Emmanuel Dreyfus m...@netbsd.org wrote:
 Justin Clift jus...@gluster.org wrote:
 
 That, or perhaps we could have two verified fields?
 
 Sure.  Whichever works. :)
 
 Personally, I'm not sure how to do either yet.
 
 In http://build.gluster.org/gerrit-trigger/ you have Verdict
 categories with CRVW (code review) and VRIF (verified), and there is an
 add verdict category option, which suggests this is something that can
 be done.
 
 Of course the Gerrit side will need some configuration too, but if
 Jenkins can deal with more Gerrit fields, there must be a way to add
 fields in Gerrit.

Interesting.  Marcelo, this sounds like something you'd know
about.  Any ideas? :)

We're trying to add an extra Verified column to our Gerrit +
Jenkins setup.  We have an existing one for Gluster Build System
(which is our CentOS Regression testing).  Now we want to add one for
our NetBSD Regression testing.

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Justin Clift
On 31 Mar 2015, at 17:43, Nithya Balachandran nbala...@redhat.com wrote:
snip
  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15
 
55% fail rate
 
 Is the test output for the bug-1117851.t failure available anywhere? 

Not at the moment.  It would be really easy to setup a new VM with
a failure of this, and give you access to it, if that would help?

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Emmanuel Dreyfus
Justin Clift jus...@gluster.org wrote:

 It sounds like we need a solution to have both the NetBSD and CentOS
 regressions run, and only give the +1 when both of them have successfully
 finished.  If either of them fail, then it gets a -1.

That, or perhaps we could have two verified fields?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Hangout: BitRot detection in GlusterFS

2015-03-31 Thread Venky Shankar

Hello folks,

I've scheduled a Hangout[1] session tomorrow regarding the upcoming
BitRot Detection feature in GlusterFS. The session will include a
preview of the feature, implementation details, and a quick demo.


Please plan to join the Hangout session and spread the word around.

[1]: http://goo.gl/ZvvWNC

Thanks,
Venky
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Nithya Balachandran
Yes, that would be great. 

Regards,
Nithya

- Original Message -
From: Justin Clift jus...@gluster.org
To: Nithya Balachandran nbala...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, 1 April, 2015 8:20:21 AM
Subject: Re: [Gluster-devel] Extra overnight regression test run results

On 31 Mar 2015, at 17:43, Nithya Balachandran nbala...@redhat.com wrote:
snip
  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15
 
55% fail rate
 
 Is the test output for the bug-1117851.t failure available anywhere? 

Not at the moment.  It would be really easy to setup a new VM with
a failure of this, and give you access to it, if that would help?

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Emmanuel Dreyfus
Justin Clift jus...@gluster.org wrote:

  That, or perhaps we could have two verified fields?
 
 Sure.  Whichever works. :)
 
 Personally, I'm not sure how to do either yet.

In http://build.gluster.org/gerrit-trigger/ you have Verdict
categories with CRVW (code review) and VRIF (verified), and there is an
add verdict category option, which suggests this is something that can
be done.

Of course the Gerrit side will need some configuration too, but if
Jenkins can deal with more Gerrit fields, there must be a way to add
fields in Gerrit.
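
For example, Gerrit labels are defined in project.config on the
refs/meta/config branch of a project; a second verified-style field
could look roughly like this (label name and values are illustrative):

  [label "NetBSD-Verified"]
      function = MaxWithBlock
      value = -1 Fails
      value =  0 No score
      value = +1 Verified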

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Nithya Balachandran


- Original Message -
From: Justin Clift jus...@gluster.org
To: Gluster Devel gluster-devel@gluster.org
Sent: Tuesday, 31 March, 2015 6:03:49 PM
Subject: [Gluster-devel] Extra overnight regression test run results

Hi all,

Ran 20 regression test jobs on (severely resource
constrained) 1GB Rackspace VMs last night (in addition to the
20 normal VM runs).

The 1GB VMs have much, much slower disks, only one virtual CPU,
and half the RAM of our standard 2GB testing VMs.

These are the failure results:

  * 20 x tests/basic/mount-nfs-auth.t
Failed test:  40

100% fail rate. ;)

  * 20 x tests/basic/uss.t
Failed tests:  149, 151-153, 157-159

100% fail rate

  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15

55% fail rate


Is the test output for the bug-1117851.t failure available anywhere? 

Nithya

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] About split-brain-resolution.t

2015-03-31 Thread Emmanuel Dreyfus
Anuradha Talur ata...@redhat.com wrote:

 1) I can send a patch today to revert the .t and resubmit it along
 with the fix. Or...
 2) Can this failure be ignored till the fix is merged in?

We can ignore it: the NetBSD regression skips the test for now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Major function rename needed for uuid_*() API

2015-03-31 Thread Niels de Vos
Manu noticed a very ugly issue related to different implementations/API
for uuid_*() functions. It seems that not all OS implementations of uuid
functions have the same API. By itself this is not a major issue;
Gluster carries contrib/uuid/ for this.

However, applications that trigger loading of libglusterfs.so through a
dlopen() call, might have uuid_* symbols loaded already. On Linux this
problem is likely not noticeable, because the symbols from libuuid and
libglusterfs do not conflict. Unfortunately, on NetBSD the libc library
provides the same uuid_* symbols, but these expect different parameters.

The plan to clean this up, and fix the dlopen() loading on NetBSD is
like this:

1. replace/rename all uuid_*() functions with gf_uuid_*()
   NetBSD can use the contrib/uuid (with gf_ prefix) symbols

2. glue the OS implementations of uuid_*() functions into libglusterfs,
   replacing the gf_uuid_*() functions from contrib/uuid
   - this can be done gradually, contrib/uuid will become unneeded when
 a glue layer is available

3. once all OS glue layers are in place, remove contrib/uuid completely


Please keep an eye out for patch #8 from Manu:

http://review.gluster.org/10017

For tracking this particular issue, Bug 1206587 was opened. The patch
above should make it for the 3.7 release, but points 2 and 3 do not have
the same high priority.

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Niels de Vos
On Tue, Mar 31, 2015 at 01:33:49PM +0100, Justin Clift wrote:
 Hi all,
 
 Ran 20 regression test jobs on (severely resource
 constrained) 1GB Rackspace VMs last night (in addition to the
 20 normal VM runs).
 
 The 1GB VMs have much, much slower disks, only one virtual CPU,
 and half the RAM of our standard 2GB testing VMs.
 
 These are the failure results:
 
   * 20 x tests/basic/mount-nfs-auth.t
 Failed test:  40
 
 100% fail rate. ;)

Jiffin is working on improving this, should be ready soon:

http://review.gluster.org/10047

Cheers,
Niels

 
   * 20 x tests/basic/uss.t
 Failed tests:  149, 151-153, 157-159
 
 100% fail rate
 
   * 11 x tests/bugs/distribute/bug-1117851.t
 Failed test:  15
 
 55% fail rate
 
   * 2 x tests/performance/open-behind.t
 Failed test:  17
 
 10% fail rate
 
   * 1 x tests/basic/afr/self-heald.t
 Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
67-75, 77, 79-81
 
 5% fail rate
 
   * 1 x tests/basic/afr/entry-self-heal.t
 Failed tests:  127-128
 
 5% fail rate
 
   * 1 x tests/features/trash.t
 Failed test:  57
 
 5% fail rate
 
 Wouldn't surprise me if some/many of the failures are due to
 timeouts of various sorts in the tests.  Very slow VMs. ;)
 
 Also, most of the regression runs produced cores.  Here are
 the first two:
 
   http://ded.ninja/gluster/blk0/
   http://ded.ninja/gluster/blk1/
 
 Hoping someone has some time to check those quickly and see
 if there's anything useful in them or not.
 
 (the hosts are all still online atm, shortly to be nuked)
 
 Regards and best wishes,
 
 Justin Clift
 
 --
 GlusterFS - http://www.gluster.org
 
 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.
 
 My personal twitter: twitter.com/realjustinclift
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [HEADS UP] NetBSD regression voting enabled

2015-03-31 Thread Emmanuel Dreyfus
On Tue, Mar 31, 2015 at 06:21:00AM +0200, Emmanuel Dreyfus wrote:
 On success: verified=0, code-review=0
 On failure: verified=0, code-review=-2

But unfortunately this approach is broken, as the Linux regression
overrides NetBSD's result, even if it does not cast a vote for
code review.

It seems we need two different users in Gerrit to report the NetBSD
and Linux regressions. Opinions, anyone?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Shyam

On 03/31/2015 08:33 AM, Justin Clift wrote:

Hi all,

Ran 20 regression test jobs on (severely resource
constrained) 1GB Rackspace VMs last night (in addition to the
20 normal VM runs).

The 1GB VMs have much, much slower disks, only one virtual CPU,
and half the RAM of our standard 2GB testing VMs.

These are the failure results:

   * 20 x tests/basic/mount-nfs-auth.t
 Failed test:  40

 100% fail rate. ;)

   * 20 x tests/basic/uss.t
 Failed tests:  149, 151-153, 157-159

 100% fail rate

   * 11 x tests/bugs/distribute/bug-1117851.t
 Failed test:  15

 55% fail rate

   * 2 x tests/performance/open-behind.t
 Failed test:  17

 10% fail rate

   * 1 x tests/basic/afr/self-heald.t
 Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
67-75, 77, 79-81

 5% fail rate

   * 1 x tests/basic/afr/entry-self-heal.t
 Failed tests:  127-128

 5% fail rate

   * 1 x tests/features/trash.t
 Failed test:  57

 5% fail rate

Wouldn't surprise me if some/many of the failures are due to
timeouts of various sorts in the tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

   http://ded.ninja/gluster/blk0/


There are 4 cores here, 3 pointing to the (by now hopefully) famous bug
#1195415. One of the cores exhibits a different stack, etc. More
analysis is needed to see what the issue could be here; core file: core.16937



   http://ded.ninja/gluster/blk1/


There is a single core here, pointing to the above bug again.



Hoping someone has some time to check those quickly and see
if there's anything useful in them or not.

(the hosts are all still online atm, shortly to be nuked)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Vijay Bellur

On 03/31/2015 06:48 PM, Shyam wrote:

On 03/31/2015 08:33 AM, Justin Clift wrote:

Hi all,

Ran 20 regression test jobs on (severely resource
constrained) 1GB Rackspace VMs last night (in addition to the
20 normal VM runs).

The 1GB VMs have much, much slower disks, only one virtual CPU,
and half the RAM of our standard 2GB testing VMs.

These are the failure results:

   * 20 x tests/basic/mount-nfs-auth.t
 Failed test:  40

 100% fail rate. ;)

   * 20 x tests/basic/uss.t
 Failed tests:  149, 151-153, 157-159

 100% fail rate

   * 11 x tests/bugs/distribute/bug-1117851.t
 Failed test:  15

 55% fail rate

   * 2 x tests/performance/open-behind.t
 Failed test:  17

 10% fail rate

   * 1 x tests/basic/afr/self-heald.t
 Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
67-75, 77, 79-81

 5% fail rate

   * 1 x tests/basic/afr/entry-self-heal.t
 Failed tests:  127-128

 5% fail rate

   * 1 x tests/features/trash.t
 Failed test:  57

 5% fail rate

Wouldn't surprise me if some/many of the failures are due to
timeouts of various sorts in the tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

   http://ded.ninja/gluster/blk0/


There are 4 cores here, 3 pointing to the (by now hopefully) famous bug
#1195415. One of the cores exhibits a different stack, etc. More
analysis is needed to see what the issue could be here; core file: core.16937



Adding Pranith as he mentioned a possible root cause for this now famous 
bug :).


-Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Jeff Darcy
The following Gerrit patchsets were affected:

http://review.gluster.org/#/c/9557/ (Nandaja Varma)
changelog: Fixing buffer overrun coverity issues

http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
cluster/ec: Refactor inode-writev

http://review.gluster.org/#/c/9970/ (Kotresh HR)
extras: Fix stop-all-gluster-processes.sh script

http://review.gluster.org/#/c/10075/ (Jeff Darcy)
socket: use OpenSSL multi-threading interfaces
this one nuked a CR+1 (from Kaleb) as well as V+1

In the absence of any other obvious way to fix this up, I'll
start new jobs for these momentarily.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Jeff Darcy
 The following Gerrit patchsets were affected:
 
 http://review.gluster.org/#/c/9557/ (Nandaja Varma)
 changelog: Fixing buffer overrun coverity issues
 
 http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
 cluster/ec: Refactor inode-writev
 
 http://review.gluster.org/#/c/9970/ (Kotresh HR)
 extras: Fix stop-all-gluster-processes.sh script
 
 http://review.gluster.org/#/c/10075/ (Jeff Darcy)
 socket: use OpenSSL multi-threading interfaces
 this one nuked a CR+1 (from Kaleb) as well as V+1
 
 In the absence of any other obvious way to fix this up, I'll
 start new jobs for these momentarily.

Found another one:

http://review.gluster.org/#/c/9859/ (Raghavendra Talur)
libglusterfs/syncop: Add xdata to all syncop calls

Started a new job for that one too.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Justin Clift
On 1 Apr 2015, at 00:48, Jeff Darcy jda...@redhat.com wrote:
 The following Gerrit patchsets were affected:
 
http://review.gluster.org/#/c/9557/ (Nandaja Varma)
changelog: Fixing buffer overrun coverity issues
 
http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
cluster/ec: Refactor inode-writev
 
http://review.gluster.org/#/c/9970/ (Kotresh HR)
extras: Fix stop-all-gluster-processes.sh script
 
http://review.gluster.org/#/c/10075/ (Jeff Darcy)
socket: use OpenSSL multi-threading interfaces
this one nuked a CR+1 (from Kaleb) as well as V+1
 
 In the absence of any other obvious way to fix this up, I'll
 start new jobs for these momentarily.
 
 Found another one:
 
http://review.gluster.org/#/c/9859/ (Raghavendra Talur)
libglusterfs/syncop: Add xdata to all syncop calls
 
 Started a new job for that one too.

If you have a build.gluster.org login, fixing this is pretty
simple.  Doesn't need the job to be re-run. ;)

All you need to do is change to the jenkins user (on
build.gluster.org) then run the command that's at the bottom
of the regression test run.

For example, looking at the regression run for the first
issue you have in the list:

  
http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull

At the very end of the regression run, it shows this:

  ssh bu...@review.gluster.org gerrit review --message 
''\''http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull
 : SUCCESS'\''' --project=glusterfs --verified=+1 --code-review=0 
ab9bdb54f89a6f8080f8b338b32b23698e9de515

Running that command from the jenkins user on build.gluster.org
resends the SUCCESS message to Gerrit:

 [homepc]$ ssh build.gluster.org

 [justin@build]$ sudo su - jenkins

 [jenkins@build]$ ssh bu...@review.gluster.org gerrit review --message 
''\''http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull
 : SUCCESS'\''' --project=glusterfs --verified=+1 --code-review=0 
ab9bdb54f89a6f8080f8b338b32b23698e9de515
 [jenkins@build]$

And it's done. ;)

I've done the first one.  I'll leave the others for you, so you
embed the skill :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Jeff Darcy
It was improperly clearing previously-set V+1 flags, even on success.  That is 
counterproductive in the most literal sense of the word.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Jeff Darcy
 I've done the first one.  I'll leave the others for you, so you
 embed the skill :)

Done.  Thanks!  I also canceled the now-superfluous jobs.  Maybe
in my Copious Spare Time(tm) I'll write a script to do this more
easily for other obviously-spurious regression results.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Emmanuel Dreyfus
Jeff Darcy jda...@redhat.com wrote:

  http://review.gluster.org/#/c/9970/ (Kotresh HR)
  extras: Fix stop-all-gluster-processes.sh script

These are the NetBSD regression failures for which we got fixes merged
recently. Doesn't it just need to be rebased?

I re-enabled NetBSD regression, with voting disabled until the mess is
fixed.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

2015-03-31 Thread Jeff Darcy
   http://review.gluster.org/#/c/9970/ (Kotresh HR)
   extras: Fix stop-all-gluster-processes.sh script
 
 Theses are the NetBSD regression failures for which we got fixes merged
 recently. Doesn't it just need to be rebased?

Quite possibly.  I wasn't looking at patch contents all that closely.

 I re-enabled NetBSD regression, with voting disabled until the mess is
 fixed.

That's fine.  I left a note for you in the script, regarding what I
think it needs to do at that point.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel