Re: [Pvfs2-developers] parallel state machine code

2006-09-05 Thread Walter B. Ligon III

OK, I think I fixed the small-io problem and the mkdir problem.
That only leaves the mounting problem.  I've never attempted to build 
the kernel interface or mount the file system (being the old goat that I 
am) so that might take a bit.


I'll commit the changes I made and you can run them in the next nightly 
to see if anything new pops up.


Walt

Robert Latham wrote:

On Tue, Aug 29, 2006 at 04:55:06PM -0400, Walter B. Ligon III wrote:

So, I would appreciate some help running some tests on the branch, while 
I start documenting, and let me know when you think I should start 
merging it back with the trunk.  Or I'm open to whatever other 
suggestions ...



OK, Walt, we're getting close.  I committed a couple small fixes to
get pvfs2-client-core building.  Here's what's not working so well
right now:

- mounting pvfs2 fails with a timeout

- many MPI-IO workloads pass, but the noncontig test triggered a
  segfault in small_io_cleanup, where it cleans up various fields in
  the sm_p structure.  In particular, 'sm_p->msgarray = NULL' caused a
  core dump, and when I look at that core file in gdb,
  sm_p->msgarray_count is really high (135950228).  Looks like maybe
  the sm_p wasn't properly allocated? I dunno, I'm just the messenger.

- pvfs2-cp dies with a segfault when using a very small blocksize (-b
  128).  Here's where gdb says the fault lies:

---
  #0  0x0806d3d8 in small_io_completion_fn (user_args=0x80f0da8, resp_p=0xbfffb42c, index=0) at sys-small-io.sm:242

242 fdata.server_nr = sm_p->u.io.datafile_index_array[index];
(gdb) p sm_p->u.io
$8 = {io_type = 135162104, file_req = 0x2, file_req_offset = 0, buffer = 0x0, 
  mem_req = 0x0, io_resp_p = 0x50, flowproto_type = 17, encoding = 135206232, 
  datafile_index_array = 0x0, datafile_count = 0, 
  msgpair_completion_count = 81, flow_completion_count = 0, 
  write_ack_completion_count = 0, contexts = 0x80f13d4, 
  context_count = 135205832, total_cancellations_remaining = 0, 
  retry_count = 135206064, stored_error_code = 3396, total_size = 9, 
  dfile_size_array = 0x0, small_io = 0}

---

- test-zero-fill fails with a segfault in the same function as pvfs2-cp
  (though at a different line):

---
#0  0x08065149 in small_io_completion_fn (user_args=0x80e9940, resp_p=0xbfffb86c, index=0) at sys-small-io.sm:317

317 sm_p->u.io.dfile_size_array[index] = resp_p->u.small_io.bstream_size;
---

- pvfs2-mkdir (a test contributed by acxiom) fails with a segfault:

---
#0  0x080b134e in PINT_smcb_op (smcb=0x0) at /sandbox/robl/pvfs2-nightly/pvfs2-WALT3/src/common/misc/state-machine-fns.c:348
348 return smcb->op;
---


So I think if you can take care of the small-io cases, that would be a
good start, as it would knock out 3 of the 5 failures.  Once WALT3
passes our nightlies, we can think about merging into HEAD.

==rob



--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
___
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


[Pvfs2-developers] MPI-io tests

2006-09-05 Thread Murali Vilayannur
Hey guys,
It turns out that the mpi-io programs that were stalling on Chiba were not
stalling because of any I/O-related issue, but because of a simultaneous
file create issue that was causing all sorts of weirdness. The workaround
was to create the file in rank 0 and have all other ranks open the file
after a barrier. So please ignore my previous message.

Sam and I will get to the bottom of the simultaneous create/delete bug
after our paper deadline later this week.
Thanks,
Murali


Re: [Pvfs2-developers] MPI-io tests

2006-09-05 Thread Robert Latham
On Tue, Sep 05, 2006 at 11:17:31AM -0500, Murali Vilayannur wrote:
> Hey guys,
> It turns out that the mpi-io programs that were stalling on Chiba were not
> stalling because of any I/O-related issue, but because of a simultaneous
> file create issue that was causing all sorts of weirdness. The workaround
> was to create the file in rank 0 and have all other ranks open the file
> after a barrier. So please ignore my previous message.

If you opened the file with MPI_COMM_WORLD, that should have already
happened in ROMIO.  If you need to do independent I/O for some reason,
then that wouldn't have helped you.  I haven't seen simultaneous
independent file opens cause problems on jazz, but in those cases I'm
not using the VFS either.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B


Re: [Pvfs2-developers] MPI-io tests

2006-09-05 Thread Murali Vilayannur
RobL,

> If you opened the file with MPI_COMM_WORLD, that should have already
> happened in ROMIO.  If you need to do independent I/O for some reason,
> then that wouldn't have helped you.  I haven't seen simultaneous
> independent file opens cause problems on jazz, but in those cases I'm
> not using the VFS either.

We just used the MPI part for starting up a lot of processes. Sorry for the
incorrect phrasing in my emails. We don't use MPI I/O; we used the POSIX
interface directly.

The simultaneous create problems with the VFS are possibly due to the
request scheduler on the server not serializing crdirents of the same
component name. I doubt you will even see that kind of pattern at the
server with the MPI I/O interface.
Thanks,
Murali


[Pvfs2-developers] patches: acl test cleanups

2006-09-05 Thread Phil Carns
These patches clean up the remaining issues from the tacl_xattr.sh test 
script (this is the easy stuff; Murali did all of the hard work).


tacl-xattr-homedir.patch:
---
This makes tacl_xattr.sh slightly more portable.  Some Linux 
distributions have adduser utilities that do not create the home 
directory for you.  This patch explicitly does a mkdir -p and chown 
after adduser to make sure that the required home directories exist.


tacl-xattr-symlink.patch:
---
This is an important fix to the test script.  It was using the -L 
argument to getfacl to traverse symbolic links when dumping the ACLs 
from the test directory.  This led to unpredictable results because 
there was no way to tell whether getfacl would traverse a real 
directory or its symbolic link first (it depends on the dirent order), 
and the other would always be left out.  ACLs are not supported on 
symbolic links anyway, so it was just adding noise to the test script.  
The script now uses -P so that symbolic links are not followed.


xattr-symlink.patch:
---
This is the only change to PVFS2 itself.  Like most Linux file systems, 
PVFS2 does not support xattrs on symbolic links (despite what is implied 
by man pages).  This is because symbolic links have 777 permissions by 
default, which would allow anyone on the system to store xattrs in any 
symbolic link.  This patch updates the PVFS2 semantics slightly, 
however, to be more in line with how other file systems implement 
this.  In particular, listxattr() is now allowed (it just reports zero 
entries), and setxattr() is implemented to return EPERM rather than 
EOPNOTSUPP.
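
As a rough illustration of the semantics described above (this is not 
part of the patch), the following C sketch tries to set a user xattr on 
a symbolic link itself with lsetxattr() from <sys/xattr.h> and reports 
the resulting errno.  The helper name sketch_symlink_setxattr_errno, the 
/tmp location, and the attribute name user.sketch are all made up for 
the example.  On a typical Linux file system the call is expected to 
fail, usually with EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): create a symlink in a
 * scratch directory, try to set a user xattr on the link itself, and
 * return the errno from lsetxattr() (0 if it unexpectedly succeeded).
 * Returns -1 if the scratch directory or symlink could not be created. */
int sketch_symlink_setxattr_errno(void)
{
    char dir[] = "/tmp/xattr-sketch-XXXXXX";
    char link_path[64];
    int n, err;

    if (mkdtemp(dir) == NULL)
        return -1;

    n = snprintf(link_path, sizeof(link_path), "%s/link", dir);
    assert(n > 0 && (size_t)n < sizeof(link_path));

    /* the link target does not need to exist for this experiment */
    if (symlink("target-need-not-exist", link_path) < 0)
    {
        rmdir(dir);
        return -1;
    }

    /* the l* variants operate on the link itself, not on its target */
    errno = 0;
    err = (lsetxattr(link_path, "user.sketch", "v", 1, 0) < 0) ? errno : 0;

    unlink(link_path);
    rmdir(dir);
    return err;
}
```

Comparing the returned value against EPERM and EOPNOTSUPP on different 
file systems is one way to check how closely they agree on this case.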


-Phil
Index: pvfs2_src/test/automated/tacl_xattr.sh
===
--- pvfs2_src/test/automated/tacl_xattr.sh	(revision 2425)
+++ pvfs2_src/test/automated/tacl_xattr.sh	(revision 2426)
@@ -91,9 +91,17 @@
 CUR_PATH=`pwd`
 
 /usr/sbin/adduser -d $CUR_PATH/tacluser1 tacluser1
+mkdir -p $CUR_PATH/tacluser1
+chown tacluser1 $CUR_PATH/tacluser1
 /usr/sbin/adduser -d $CUR_PATH/tacluser2 tacluser2
+mkdir -p $CUR_PATH/tacluser2
+chown tacluser2 $CUR_PATH/tacluser2
 /usr/sbin/adduser -d $CUR_PATH/tacluser3 tacluser3
+mkdir -p $CUR_PATH/tacluser3
+chown tacluser3 $CUR_PATH/tacluser3
 /usr/sbin/adduser -d $CUR_PATH/tacluser4 tacluser4
+mkdir -p $CUR_PATH/tacluser4
+chown tacluser4 $CUR_PATH/tacluser4
 
 if [ ! -e shared ]
 then
Index: pvfs2_src/test/automated/tacl_xattr.sh
===
--- pvfs2_src/test/automated/tacl_xattr.sh	(revision 2426)
+++ pvfs2_src/test/automated/tacl_xattr.sh	(revision 2427)
@@ -702,10 +702,10 @@
 #
 #
 
-getfacl -RL shared > tmp1
+getfacl -RP shared > tmp1
 setfacl -m u::--- -m g::--- -m o::--- shared/team1
 setfacl --restore tmp1
-getfacl -RL shared > tmp2
+getfacl -RP shared > tmp2
 
 if [ `diff tmp1 tmp2` ]
 then 
Index: pvfs2_src/src/kernel/linux-2.6/symlink.c
===
--- pvfs2_src/src/kernel/linux-2.6/symlink.c	(revision 2441)
+++ pvfs2_src/src/kernel/linux-2.6/symlink.c	(revision 2442)
@@ -52,11 +52,15 @@
 follow_link : pvfs2_follow_link,
 setattr : pvfs2_setattr,
 revalidate : pvfs2_revalidate,
+#ifdef HAVE_XATTR
+listxattr: pvfs2_listxattr,
+#endif
 #else
 .readlink = pvfs2_readlink,
 .follow_link = pvfs2_follow_link,
 .setattr = pvfs2_setattr,
 .getattr = pvfs2_getattr,
+.listxattr = pvfs2_listxattr,
 #if defined(HAVE_GENERIC_GETXATTR) && defined(CONFIG_FS_POSIX_ACL)
 .permission = pvfs2_permission,
 #endif
Index: pvfs2_src/src/kernel/linux-2.6/xattr-default.c
===
--- pvfs2_src/src/kernel/linux-2.6/xattr-default.c	(revision 2442)
+++ pvfs2_src/src/kernel/linux-2.6/xattr-default.c	(revision 2443)
@@ -26,6 +26,11 @@
 
 if (strcmp(name, "") == 0)
 return -EINVAL;
+if ( !S_ISREG(inode->i_mode) &&
+   (!S_ISDIR(inode->i_mode) || inode->i_mode & S_ISVTX))
+{
+   return -EPERM;
+}
 gossip_debug(GOSSIP_XATTR_DEBUG, "pvfs2_setxattr_default %s\n", name);
 internal_flag = convert_to_internal_xattr_flags(flags);
 return pvfs2_inode_setxattr(inode, PVFS2_XATTR_NAME_DEFAULT_PREFIX,
Index: pvfs2_src/src/kernel/linux-2.6/symlink.c
===
--- pvfs2_src/src/kernel/linux-2.6/symlink.c	(revision 2442)
+++ pvfs2_src/src/kernel/linux-2.6/symlink.c	(revision 2443)
@@ -53,6 +53,7 @@
 setattr : pvfs2_setattr,
 revalidate : pvfs2_revalidate,
 #ifdef HAVE_XATTR
+setxattr: pvfs2_setxattr,
 listxattr: pvfs2_listxattr,
 #endif
 #else
@@ -62,6 +63,11 @@
 .getattr = pvfs2_getattr,
 .listxattr = pvfs2_listxattr,
 #if defined(HAVE_GENERIC_GETXATTR) && defined(CONFIG_FS_POSIX_ACL)
+.setxattr = generic_setxattr,
+#else
+.setxattr = pvfs2_setxattr,
+#endif
+#i

[Pvfs2-developers] patches: bug fixes

2006-09-05 Thread Phil Carns

pread-pwrite.patch:
---
This fixes a bug in a patch that I submitted earlier to provide a simple 
alternate AIO implementation.  It defines _GNU_SOURCE for the dbpf 
module only (through its MODCFLAGS) so that we get proper definitions of 
pread() and pwrite() on Linux.  I tried using _XOPEN_SOURCE=500, but 
that breaks any .c file that includes dbpf.h due to incompatibilities 
with Berkeley DB.
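
For anyone who wants to see the feature-test macro at work outside of 
the build system, here is a minimal standalone sketch.  It defines 
_GNU_SOURCE at the top of the file (the patch does the equivalent with 
-D_GNU_SOURCE on the compile line) and then does a positional write and 
read on a scratch file.  The function name positional_roundtrip and the 
/tmp template are invented for the example and do not appear in PVFS2.

```c
/* Must be defined before the first system header is included so that
 * <unistd.h> declares pread() and pwrite(). */
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at the given offset of a scratch
 * file, read them back from the same offset, and compare.
 * Returns 0 if the data round-trips, -1 otherwise. */
int positional_roundtrip(const char *data, size_t len, off_t offset)
{
    char path[] = "/tmp/pread-sketch-XXXXXX";
    char buf[64];
    int fd;
    int ok = -1;

    assert(data != NULL && len <= sizeof(buf));

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file goes away once fd is closed */

    /* neither call moves the file position, which is why they suit
     * threads that share one descriptor */
    if (pwrite(fd, data, len, offset) == (ssize_t)len &&
        pread(fd, buf, len, offset) == (ssize_t)len &&
        memcmp(buf, data, len) == 0)
    {
        ok = 0;
    }

    close(fd);
    return ok;
}
```

A call such as positional_roundtrip("pvfs2", 5, 4096) exercises both 
functions at a nonzero offset.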


zero-dfile.patch:
---
This fixes a bug in the getattr handling on pvfs2-server if it happens 
to find an attribute structure with the dfile array zeroed out.  In this 
case, it needs to set the attr flag appropriately to prevent the 
response encoder from segfaulting while processing the array in the 
response structure.  This condition is very hard to trigger, but the 
server should be able to gracefully report the error rather than crashing.
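
The idea of clearing the attribute bits can be shown in isolation.  The 
sketch below is only an illustration: the struct, the function name, and 
the bit values are placeholders invented here and are not the real 
PVFS2 definitions.  It shows how dropping the two flags from the mask 
signals to later code (such as an encoder) that the dfile and 
distribution fields must not be touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for the sketch; NOT the real PVFS2 constants. */
#define SKETCH_ATTR_COMMON      (1u << 0)
#define SKETCH_ATTR_META_DIST   (1u << 1)
#define SKETCH_ATTR_META_DFILES (1u << 2)

/* Invented stand-in for an attribute structure. */
struct sketch_attr
{
    uint32_t mask;               /* which fields below are valid */
    int dfile_count;
    const uint64_t *dfile_array; /* may be NULL when the data is bad */
};

/* On error, mark the dist and dfile fields as invalid so that nothing
 * downstream tries to walk dfile_array.  Returns the updated mask. */
uint32_t sketch_invalidate_meta(struct sketch_attr *attr)
{
    assert(attr != NULL);
    attr->mask &= ~(SKETCH_ATTR_META_DIST | SKETCH_ATTR_META_DFILES);
    return attr->mask;
}

/* An encoder-like consumer that only touches the array if the mask
 * says it is valid.  Returns the number of entries it would encode. */
int sketch_encodable_dfiles(const struct sketch_attr *attr)
{
    assert(attr != NULL);
    if (!(attr->mask & SKETCH_ATTR_META_DFILES) || attr->dfile_array == NULL)
        return 0;
    return attr->dfile_count;
}
```

With the mask cleared, sketch_encodable_dfiles() reports zero entries 
instead of dereferencing a zeroed array.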


-Phil
Index: pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c
===
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c	(revision 2444)
+++ pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c	(revision 2445)
@@ -1424,8 +1424,8 @@
 }
 
 /* prototypes for pread and pwrite; _XOPEN_SOURCE causes db.h problems */
-ssize_t pread(int fd, void *buf, size_t count, off_t offset);
-ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
+ssize_t pread64(int fd, void *buf, size_t count, off64_t offset);
+ssize_t pwrite64(int fd, const void *buf, size_t count, off64_t offset);
 static void* alt_lio_thread(void* foo)
 {
 struct alt_aio_item* tmp_item = (struct alt_aio_item*)foo;
@@ -1433,14 +1433,14 @@
 
 if(tmp_item->cb_p->aio_lio_opcode == LIO_READ)
 {
-ret = pread(tmp_item->cb_p->aio_fildes,
+ret = pread64(tmp_item->cb_p->aio_fildes,
 (void*)tmp_item->cb_p->aio_buf,
 tmp_item->cb_p->aio_nbytes,
 tmp_item->cb_p->aio_offset);
 }
 else if(tmp_item->cb_p->aio_lio_opcode == LIO_WRITE)
 {
-ret = pwrite(tmp_item->cb_p->aio_fildes,
+ret = pwrite64(tmp_item->cb_p->aio_fildes,
 (const void*)tmp_item->cb_p->aio_buf,
 tmp_item->cb_p->aio_nbytes,
 tmp_item->cb_p->aio_offset);
Index: pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c
===
--- pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c	(revision 2460)
+++ pvfs2_src/src/io/trove/trove-dbpf/dbpf-bstream.c	(revision 2461)
@@ -1423,9 +1423,6 @@
 return(0);
 }
 
-/* prototypes for pread and pwrite; _XOPEN_SOURCE causes db.h problems */
-ssize_t pread64(int fd, void *buf, size_t count, off64_t offset);
-ssize_t pwrite64(int fd, const void *buf, size_t count, off64_t offset);
 static void* alt_lio_thread(void* foo)
 {
 struct alt_aio_item* tmp_item = (struct alt_aio_item*)foo;
@@ -1433,14 +1430,14 @@
 
 if(tmp_item->cb_p->aio_lio_opcode == LIO_READ)
 {
-ret = pread64(tmp_item->cb_p->aio_fildes,
+ret = pread(tmp_item->cb_p->aio_fildes,
 (void*)tmp_item->cb_p->aio_buf,
 tmp_item->cb_p->aio_nbytes,
 tmp_item->cb_p->aio_offset);
 }
 else if(tmp_item->cb_p->aio_lio_opcode == LIO_WRITE)
 {
-ret = pwrite64(tmp_item->cb_p->aio_fildes,
+ret = pwrite(tmp_item->cb_p->aio_fildes,
 (const void*)tmp_item->cb_p->aio_buf,
 tmp_item->cb_p->aio_nbytes,
 tmp_item->cb_p->aio_offset);
Index: pvfs2_src/src/io/trove/trove-dbpf/module.mk.in
===
--- pvfs2_src/src/io/trove/trove-dbpf/module.mk.in	(revision 2460)
+++ pvfs2_src/src/io/trove/trove-dbpf/module.mk.in	(revision 2461)
@@ -16,5 +16,7 @@
 	$(DIR)/dbpf-keyval-pcache.c \
 	$(DIR)/dbpf-sync.c
 
-# grab trove-ledger.h from handle-mgmt.
-MODCFLAGS_$(DIR) = -I$(srcdir)/src/io/trove/trove-handle-mgmt
+# Grab trove-ledger.h from handle-mgmt.  Also make _GNU_SOURCE definition 
+# required for access to pread/pwrite on Linux.  _XOPEN_SOURCE seems to be
+# incompatible with Berkeley DB.
+MODCFLAGS_$(DIR) = -I$(srcdir)/src/io/trove/trove-handle-mgmt -D_GNU_SOURCE
Index: pvfs2_src/src/server/get-attr.sm
===
--- pvfs2_src/src/server/get-attr.sm	(revision 2366)
+++ pvfs2_src/src/server/get-attr.sm	(revision 2367)
@@ -512,6 +512,10 @@
 	llu(s_op->u.getattr.handle), llu(s_op->u.getattr.handle),
 	(int)s_op->u.getattr.fs_id);
 
+/*If we hit an error the DIST & DFILES are no longer valid*/
+s_op->resp.u.getattr.attr.mask &= ~PVFS_ATTR_META_DIST;
+s_op->resp.u.getattr.attr.mask &= ~PVFS_ATTR_META_DFILES;
+
 	js_p->error_code = -PVFS_EOVERFLOW;
 	return 1;
 }
@@ -624,6 +628,14 @@
  */
 js_p->error_code = 0;
 }
+if(js_p->error_code < 0)
+{
+if(s_op->val.buffer)
+{
+free(s_op->val.buffer);
+}
+ret

[Pvfs2-developers] patch: binding to specific addresses

2006-09-05 Thread Phil Carns

bind-specific.patch:

This patch adds a new config file option (TCPBindSpecific) that, if 
enabled, tells the server to bind only to its specific IP address rather 
than using INADDR_ANY.  This is particularly helpful in failover 
scenarios where you would like one physical machine to assume two IP 
addresses (and run two servers simultaneously) when another server 
crashes.  Without this patch you would need to select a different port 
on each server to prevent collisions.
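
To make the difference between the two bind modes concrete, here is a 
small sketch using plain BSD sockets rather than the BMI code.  The 
helper names sketch_bind_tcp and sketch_bound_ip are hypothetical and 
exist only in this example.  Passing NULL binds to INADDR_ANY, while 
passing a dotted-quad string binds to that one address only.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket and bind it either to one
 * IPv4 address (ip != NULL) or to every local address (ip == NULL).
 * Returns the bound socket, or -1 on failure. */
int sketch_bind_tcp(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (ip == NULL)
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
    else if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the local IPv4 address a socket is bound to, as text.
 * Returns 0 on success, -1 on failure. */
int sketch_bound_ip(int fd, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    assert(out != NULL);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return inet_ntop(AF_INET, &addr.sin_addr, out, (socklen_t)outlen) ? 0 : -1;
}
```

Two sockets bound with specific, distinct addresses can share a port 
number, which is what lets two server instances coexist on one machine 
in the failover case described above.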


-Phil
Index: pvfs2_src/src/server/pvfs2-server.c
===
--- pvfs2_src/src/server/pvfs2-server.c	(revision 2419)
+++ pvfs2_src/src/server/pvfs2-server.c	(revision 2420)
@@ -976,6 +976,7 @@
 PVFS_fs_id orig_fsid;
 PVFS_ds_flags init_flags = 0;
 int port_num = 0;
+int bmi_flags = BMI_INIT_SERVER;
 
 /* Initialize distributions */
 ret = PINT_dist_initialize(0);
@@ -999,8 +1000,15 @@
  "Passing %s as BMI listen address.\n",
  server_config.host_id);
 
+/* does the configuration dictate that we bind to a specific address? */
+if(server_config.tcp_bind_specific)
+{
+bmi_flags |= BMI_TCP_BIND_SPECIFIC;
+}
+
 ret = BMI_initialize(server_config.bmi_modules, 
- server_config.host_id, BMI_INIT_SERVER);
+ server_config.host_id,
+ bmi_flags);
 if (ret < 0)
 {
 PVFS_perror_gossip("Error: BMI_initialize", ret);
Index: pvfs2_src/src/io/bmi/bmi-types.h
===
--- pvfs2_src/src/io/bmi/bmi-types.h	(revision 2419)
+++ pvfs2_src/src/io/bmi/bmi-types.h	(revision 2420)
@@ -39,7 +39,8 @@
 /** BMI method initialization flags */
 enum
 {
-BMI_INIT_SERVER = 1 /**< set up to listen for unexpected messages */
+BMI_INIT_SERVER = 1, /**< set up to listen for unexpected messages */
+BMI_TCP_BIND_SPECIFIC = 2 /**< bind to a specific IP address if INIT_SERVER */
 };
 
 enum bmi_op_type
Index: pvfs2_src/src/io/bmi/bmi_tcp/sockio.c
===
--- pvfs2_src/src/io/bmi/bmi_tcp/sockio.c	(revision 2419)
+++ pvfs2_src/src/io/bmi/bmi_tcp/sockio.c	(revision 2420)
@@ -60,6 +60,27 @@
 return (sockd);
 }
 
+int BMI_sockio_bind_sock_specific(int sockd,
+  const char *name,
+	  int service)
+{
+struct sockaddr saddr;
+int ret;
+
+if ((ret = BMI_sockio_init_sock(&saddr, name, service)) != 0)
+	return (ret); /* converted to PVFS error code below */
+
+  bind_sock_restart:
+if (bind(sockd, &saddr, sizeof(saddr)) < 0)
+{
+	if (errno == EINTR)
+	goto bind_sock_restart;
+	return (-1);
+}
+return (sockd);
+}
+
+
 int BMI_sockio_connect_sock(int sockd,
 		 const char *name,
 		 int service)
Index: pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c
===
--- pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2419)
+++ pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 2420)
@@ -1716,6 +1716,7 @@
 int oldfl = 0;		/* old socket flags */
 struct tcp_addr *tcp_addr_data = NULL;
 int tmp_errno = bmi_tcp_errno_to_pvfs(-EINVAL);
+int ret = 0;
 
 /* create a socket */
 tcp_addr_data = tcp_method_params.listen_addr->method_data;
@@ -1737,8 +1738,20 @@
 BMI_sockio_set_sockopt(tcp_addr_data->socket, SO_REUSEADDR, 1);
 
 /* bind it to the appropriate port */
-if (BMI_sockio_bind_sock(tcp_addr_data->socket, tcp_addr_data->port) < 0)
+if(tcp_method_params.method_flags & BMI_TCP_BIND_SPECIFIC)
 {
+ret = BMI_sockio_bind_sock_specific(tcp_addr_data->socket,
+tcp_addr_data->hostname,
+tcp_addr_data->port);
+}
+else
+{
+ret = BMI_sockio_bind_sock(tcp_addr_data->socket,
+tcp_addr_data->port);
+}
+
+if (ret < 0)
+{
 	tmp_errno = errno;
 	gossip_err("Error: BMI_sockio_bind_sock: %s\n", strerror(tmp_errno));
 	return (bmi_tcp_errno_to_pvfs(-tmp_errno));
Index: pvfs2_src/src/io/bmi/bmi_tcp/sockio.h
===
--- pvfs2_src/src/io/bmi/bmi_tcp/sockio.h	(revision 2419)
+++ pvfs2_src/src/io/bmi/bmi_tcp/sockio.h	(revision 2420)
@@ -32,6 +32,9 @@
 int BMI_sockio_new_sock(void);
 int BMI_sockio_bind_sock(int,
 			 int);
+int BMI_sockio_bind_sock_specific(int sockd,
+  const char *name,
+	  int service);
 int BMI_sockio_connect_sock(int,
 			const char *,
 			int);
Index: pvfs2_src/src/common/misc/server-config.c
===
--- pvfs2_src/src/common/misc/server-config.c	(revision 2419)
+++ pvfs2_src/src/common/misc/server-config.c	(revision 2420)
@@ -52,6 +52,7 @@
 static DOTCONF_CB(get_unexp_req);
 static DOTCONF_CB(get_tcp_buffer_send);
 static DOTCONF_CB(get_tcp_buffer_receive);
+static DO