Re: [OMPI devel] [OMPI users] [Open MPI] #3351: JAVA scatter error
On Dec 21, 2012, at 2:32 PM, Siegmar Gross wrote:

> Today I found something about the memory layout of 2D matrices in
> Java (I'm not sure if the information is valid). Java has one 1D array
> with pointers to every 1D row. All elements of a row are stored in
> contiguous memory. Different rows can be stored in "arbitrary" places
> so that a 2D matrix is normally not stored in a contiguous memory area.

This makes it sound just like C -- in that if you want a contiguous chunk of memory for an N-dimensional array, you need to write a wrapper method that allocates a contiguous chunk of memory and then sets all the pointers properly so that successive rows/columns/etc. point to the Right places in memory. This wrapper will likely need to be written in C.

> In fact it would be better in that case if the extent of a new column
> type is equal to the extent of the base type of the array.

Yes, via a "resized" type. The basic MPI Java bindings should do pretty much exactly what the MPI C bindings do.

> It would also
> be necessary that a new column type is something like an array itself,
> pointing for example to the first element of each row (perhaps it is
> even possible to use the Java pointer array of the 2D matrix). To make
> things worse, Java allows non-rectangular matrices (but they could be
> prohibited for MPI). Perhaps this is no news to you, but I wanted to
> mention it in case you also didn't know (as I said, I'm not sure if the
> information about 2D Java matrices is true).

Nope, I don't know very much about Java at all. :-)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
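For reference, a minimal sketch of the kind of C wrapper described above -- one contiguous block plus a row-pointer array -- might look like the following. The helper name and error handling are illustrative only, not Open MPI code:

#include <stdlib.h>

/* Allocate an n x m matrix of doubles in one contiguous block, then point
 * each row pointer into that block.  Free with free(a[0]); free(a);
 * Illustrative helper, not part of any Open MPI API. */
static double **alloc_contiguous_2d(int n, int m)
{
    double **a     = malloc((size_t)n * sizeof(double *));
    double  *block = malloc((size_t)n * (size_t)m * sizeof(double));
    if (a == NULL || block == NULL) {
        free(a);
        free(block);
        return NULL;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = block + (size_t)i * m;   /* row i starts i*m elements into the block */
    }
    return a;
}

With that layout, a[0] (equivalently &a[0][0]) can be handed to MPI as a single contiguous buffer of n*m elements, which is what derived row/column datatypes assume.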
[OMPI devel] Fwd: [OMPI users] [Open MPI] #3351: JAVA scatter error
Oops -- really should have sent this to the devel list.

Begin forwarded message:

From: "Jeff Squyres (jsquyres)"
Subject: Re: [OMPI users] [Open MPI] #3351: JAVA scatter error
Date: December 24, 2012 8:41:56 AM EST
To: Siegmar Gross , Open MPI Users
Reply-To: Open MPI Users

On Dec 19, 2012, at 9:22 AM, Siegmar Gross wrote:

>> I think the real shortcoming is that there is no Datatype.Resized
>> function. That can be fixed.
>
> Are you sure? That would at least solve one problem.

Here's a first cut at a patch. I don't know if this is fully correct; I don't quite understand yet how baseSize is used in the .java files, but it seems incorrect to me.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

resized.patch
Description: Binary data

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
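For context, the C-level idiom that a Datatype.Resized binding would map onto -- building a column type whose extent is one base element so that consecutive columns can be scattered -- is sketched below. The helper function and its name are illustrative assumptions, not part of the attached patch or the Open MPI API:

#include <mpi.h>

/* Build a datatype for one column of an n x m row-major matrix of doubles,
 * then resize its extent to sizeof(double) so consecutive columns start one
 * element apart (e.g. as the send type in MPI_Scatter).  Illustrative only. */
static MPI_Datatype make_column_type(int n, int m)
{
    MPI_Datatype column, column_resized;
    MPI_Type_vector(n, 1, m, MPI_DOUBLE, &column);     /* n elements, stride m */
    MPI_Type_create_resized(column, 0, (MPI_Aint)sizeof(double), &column_resized);
    MPI_Type_commit(&column_resized);
    MPI_Type_free(&column);                            /* only the resized type is kept */
    return column_resized;
}

Without the resize step, the vector type's extent spans the whole matrix row stride, so scattering one column per rank places the pieces in the wrong spots; resizing the extent to one element is exactly what the missing Datatype.Resized would expose to Java.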
[OMPI devel] Trunk is broken
Hi folks

This is a heads-up to all: It appears a recent commit has broken the trunk - I think it relates to something done to the MCA parameter system. When running across multiple nodes, the daemons segfault on finalize with a stacktrace of:

(gdb) where
#0  0x003dc4477e92 in _int_free () from /lib64/libc.so.6
#1  0x7f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
#2  0x7f18a163ab41 in opal_obj_run_destructors (object=0x118d940) at ../../../opal/class/opal_object.h:448
#3  0x7f18a163cb94 in mca_base_param_finalize () at mca_base_param.c:853
#4  0x7f18a1609c06 in opal_finalize_util () at runtime/opal_finalize.c:69
#5  0x7f18a1609cbc in opal_finalize () at runtime/opal_finalize.c:155
#6  0x7f18a18e366b in orte_finalize () at runtime/orte_finalize.c:107
#7  0x7f18a1911313 in orte_daemon (argc=35, argv=0x7d7ea8b8) at orted/orted_main.c:834
#8  0x0040091a in main (argc=35, argv=0x7d7ea8b8) at orted.c:62
(gdb) up
#1  0x7f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
1982        free(p->mbp_env_var_name);

(gdb) print array[i]
$2 = {mbp_super = {obj_magic_id = 0, obj_class = 0x7f18a18c6460, obj_reference_count = 1,
    cls_init_file_name = 0x7f18a169d04e "mca_base_param.c", cls_init_lineno = 1154},
  mbp_type = MCA_BASE_PARAM_TYPE_STRING, mbp_type_name = 0x1185110 "\300O\030\001",
  mbp_component_name = 0x0, mbp_param_name = 0x1185130 "",
  mbp_full_name = 0x1185150 "orte_debugger_test_daemon", mbp_synonyms = 0x0,
  mbp_internal = false, mbp_read_only = false, mbp_deprecated = false,
  mbp_deprecated_warning_shown = true,
  mbp_help_msg = 0x11850a0 "Name of the executable to be used to simulate a debugger colaunch (relative or absolute path)",
  mbp_env_var_name = 0x1185180 "\020P\030\001", mbp_default_value = {intval = 0, stringval = 0x0},
  mbp_file_value_set = false, mbp_file_value = {intval = 0, stringval = 0x0},
  mbp_source_file = 0x0, mbp_override_value_set = false,
  mbp_override_value = {intval = 0, stringval = 0x0}}

As you can see, the problem is that the mbp_env_var_name field is trash, so the destructor's attempt to free that field crashes.

I believe it was Nathan that last touched this area, so perhaps he could take a gander and see what happened? Meantime, I'm afraid the trunk is down.

Thanks
Ralph
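One common way a field ends up holding garbage like that is a constructor that never initializes the pointer (or an overwrite elsewhere), so the destructor frees whatever bytes happen to be there. A generic illustration of the constructor/destructor discipline that avoids the free-of-garbage failure -- not the actual mca_base_param code -- follows:

#include <stdlib.h>

/* Generic illustration only; not Open MPI's mca_base_param structures. */
struct param {
    char *env_var_name;
};

/* If the constructor skips this assignment, the field holds stale memory
 * contents and the destructor's free() corrupts the heap -- the same
 * failure mode as the _int_free() crash in the stack trace above. */
static void param_construct(struct param *p)
{
    p->env_var_name = NULL;     /* free(NULL) is a harmless no-op */
}

static void param_destruct(struct param *p)
{
    free(p->env_var_name);
    p->env_var_name = NULL;
}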
Re: [OMPI devel] Trunk is broken
FWIW: I have installed a temporary patch that allows the trunk to run by no longer finalizing OPAL. Once the param system has been repaired, this will be removed. Meantime, at least you can run the trunk.

On Dec 24, 2012, at 10:39 AM, Ralph Castain wrote:

> Hi folks
>
> This is a heads-up to all: It appears a recent commit has broken the trunk -
> I think it relates to something done to the MCA parameter system. When
> running across multiple nodes, the daemons segfault on finalize with a
> stacktrace of:
>
> (gdb) where
> #0  0x003dc4477e92 in _int_free () from /lib64/libc.so.6
> #1  0x7f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
> #2  0x7f18a163ab41 in opal_obj_run_destructors (object=0x118d940) at ../../../opal/class/opal_object.h:448
> #3  0x7f18a163cb94 in mca_base_param_finalize () at mca_base_param.c:853
> #4  0x7f18a1609c06 in opal_finalize_util () at runtime/opal_finalize.c:69
> #5  0x7f18a1609cbc in opal_finalize () at runtime/opal_finalize.c:155
> #6  0x7f18a18e366b in orte_finalize () at runtime/orte_finalize.c:107
> #7  0x7f18a1911313 in orte_daemon (argc=35, argv=0x7d7ea8b8) at orted/orted_main.c:834
> #8  0x0040091a in main (argc=35, argv=0x7d7ea8b8) at orted.c:62
> (gdb) up
> #1  0x7f18a163f756 in param_destructor (p=0x118d940) at mca_base_param.c:1982
> 1982        free(p->mbp_env_var_name);
>
> (gdb) print array[i]
> $2 = {mbp_super = {obj_magic_id = 0, obj_class = 0x7f18a18c6460, obj_reference_count = 1,
>     cls_init_file_name = 0x7f18a169d04e "mca_base_param.c", cls_init_lineno = 1154},
>   mbp_type = MCA_BASE_PARAM_TYPE_STRING, mbp_type_name = 0x1185110 "\300O\030\001",
>   mbp_component_name = 0x0, mbp_param_name = 0x1185130 "",
>   mbp_full_name = 0x1185150 "orte_debugger_test_daemon", mbp_synonyms = 0x0,
>   mbp_internal = false, mbp_read_only = false, mbp_deprecated = false,
>   mbp_deprecated_warning_shown = true,
>   mbp_help_msg = 0x11850a0 "Name of the executable to be used to simulate a debugger colaunch (relative or absolute path)",
>   mbp_env_var_name = 0x1185180 "\020P\030\001", mbp_default_value = {intval = 0, stringval = 0x0},
>   mbp_file_value_set = false, mbp_file_value = {intval = 0, stringval = 0x0},
>   mbp_source_file = 0x0, mbp_override_value_set = false,
>   mbp_override_value = {intval = 0, stringval = 0x0}}
>
> As you can see, the problem is that the mbp_env_var_name field is trash, so
> the destructor's attempt to free that field crashes.
>
> I believe it was Nathan that last touched this area, so perhaps he could take
> a gander and see what happened? Meantime, I'm afraid the trunk is down.
>
> Thanks
> Ralph