Hi Kai,

 

Thank you for your reply. I have run the Norne model with 1, 2, 4, 8, 16, 36, 
72, and 108 processes successfully. The run gets stuck at 144 processes. As I 
mentioned, I did try a larger model, but even there I was not able to scale on 
more than 1 node for a certain configuration. Finally, I have an 
11-million-cell model with just 1 producer and 1 injector, which runs on 144 
processes.

 

I would still like to figure out why domain decomposition fails for other 
models. I understand that the performance benefit may be minimal when there 
are few cells per process, but the run should still complete.
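
To put the cells-per-process point in numbers, here is a rough sketch for the 
11-million-cell model at the process counts tried in this thread. It assumes 
the approximate 11 million figure is the total cell count and ignores any 
overlap cells shared between neighboring processes; cells_per_process is a 
hypothetical helper name, not part of OPM Flow.

```shell
#!/bin/sh
# Approximate total from this thread, not an exact cell count.
total_cells=11000000

# Hypothetical helper: average cells per MPI process (integer division).
cells_per_process() {
    echo $(( $1 / $2 ))
}

# Process counts mentioned in this thread.
for np in 1 2 4 8 16 36 72 108 144; do
    printf '%4d processes: ~%d cells each\n' \
        "$np" "$(cells_per_process "$total_cells" "$np")"
done
```

Even at 144 processes this model averages roughly 76,000 cells per process, 
so a low cell count alone would not obviously explain a hang there.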

 

Thank you,

Yogi

 

From: Kai Bao [mailto:kai....@sintef.no] 
Sent: Wednesday, March 11, 2020 2:54 AM
To: Yogi Pandey <yogi.pan...@oracle.com>; opm@opm-project.org
Subject: Re: [Opm] OPM Flow multi-node simulations stuck at domain 
decomposition step

 

Hi, Yogi,

 

I see you are trying to run Norne with 144 processes.  Did you see the problem 
with far fewer processes, for example 4 or 8?

 

In my opinion, with the current approach to domain decomposition, it can be 
challenging to run Norne with so many processes, given the relatively small 
size of the Norne model and the long wells in it.  I am not totally sure 
though. 

 

Best Regards, 

Kai Bao

  _____  

From: Opm <opm-boun...@opm-project.org> on behalf of 
Yogi Pandey <yogi.pan...@oracle.com>
Sent: Tuesday, March 10, 2020 10:15 PM
To: opm@opm-project.org <opm@opm-project.org>
Subject: [Opm] OPM Flow multi-node simulations stuck at domain decomposition 
step 

 

All,

 

I am trying to run OPM Flow simulations on multiple nodes. I have built OPM 
Flow from source on Oracle Linux 7 OS (binary compatible with RHEL) with:

- GCC-8.3.1
- openmpi-4.0.2 (built from source)
- boost-1.72.0 (built from source)
- cmake-3.16.4 (built from source)
- parmetis-4.0.3 (built from source)
- dune-2.6.0: dune-common, dune-geometry, dune-grid, dune-istl (built from 
  source)
- Zoltan-3.83 (built from source)
- OPM Flow modules are built using the following commands:
    - cmake -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=ON -DUSE_OPENMP=ON -DBLAS_LIBRARIES=/usr/lib64 -DCMAKE_INSTALL_PREFIX=/usr/local ..
    - sudo make

 

For the Norne data set, the parameter file (params) contains the following:

ecl-deck-file-name=NORNE_ATW2013.DATA

output-dir=out_parallel

output-mode=none

output-interval=1000000

enable-opm-rst-file=false

threads-per-process=1

 

The simulation is run on 4 nodes with 32 processors each, using the following 
command:

mpirun --display-map -mca btl self \
    -x UCX_TLS=rc,self,sm -x HCOLL_ENABLE_MCAST_ALL=0 \
    -mca coll_hcoll_enable 0 -x UCX_IB_TRAFFIC_CLASS=105 \
    -x UCX_IB_GID_INDEX=3 \
    --cpu-set 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35 \
    -np 144 --hostfile /etc/opt/rdma/hostfile \
    /mnt/nfs-share/etc/opm-flow/opm-simulators/build/bin/flow \
    --parameter-file=/mnt/nfs-share/data/norne/params
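
The long core list passed to --cpu-set is easy to mistype, so one option is 
to generate it. This is a minimal sketch, assuming a POSIX shell with seq and 
paste available; cpu_list is a hypothetical helper name and CPUS a 
hypothetical variable, neither is part of Open MPI or OPM Flow.

```shell
#!/bin/sh
# Hypothetical helper: print the integers FIRST..LAST joined by commas,
# e.g. "0,1,2,3" for "cpu_list 0 3".
cpu_list() {
    seq "$1" "$2" | paste -sd, -
}

# Cores 0 through 35, matching the list used in the command above.
CPUS=$(cpu_list 0 35)
echo "$CPUS"
```

The result could then be passed as --cpu-set "$CPUS" in place of the 
hand-typed list.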

 

The simulation gets stuck indefinitely at the domain decomposition step. I am 
able to finish a parallel run on up to 3 nodes, but it always gets stuck on 4 
nodes.
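
When testing several process counts, a run that never returns can be cut off 
automatically rather than watched by hand. The sketch below assumes GNU 
coreutils timeout is installed (it exits with status 124 when the limit is 
reached); run_with_limit is a hypothetical helper name, and any time limit 
used with it is an arbitrary choice, not a value taken from OPM Flow.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report whether
# it finished on its own or was stopped by `timeout` (exit status 124).
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "no result within $limit: $*"
    else
        echo "finished with exit status $status: $*"
    fi
    return "$status"
}
```

For example, prefixing an mpirun command with run_with_limit 600 would stop 
it after 600 seconds and print which command was cut off.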

 

I have also created some customized simulation decks with about 11 million 
cells to rule out the small number of cells in the Norne model as the cause, 
but the simulation gets stuck as soon as I scale from 1 node to 2 nodes. Can 
someone please help me understand what might be causing this?

 

Thank you,

Yogi

 
_______________________________________________
Opm mailing list
Opm@opm-project.org
https://opm-project.org/cgi-bin/mailman/listinfo/opm
