Re: [Opm] ParMETIS error on HPC

2020-10-21 Thread Markus Blatt
Hi,

On Mon, Oct 19, 2020 at 12:22:13PM +0000, Antoine B Jacquey wrote:
> 
> But during the first time step calculation, I get the following errors:
> 
> Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
> Switching control mode for well INJ from RATE to BHP on rank 20
> Switching control mode for well INJ from BHP to RATE on rank 20
> PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 6 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 8 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 12 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 14 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 16 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 18 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 20 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 0 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 10 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 22 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 26 has no 
> vertices assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 24 has no 
> vertices assigned to it!
> 
> Does anyone know what this error means? Is it caused by a bad mesh 
> partitioning or by something else?
>

in the AMG of dune-istl we try to agglomerate the linear system to successively 
fewer processors (N -> n, with n < N). During that agglomeration some of the 
ranks necessarily end up with no vertices assigned to them, which is what 
ParMETIS is complaining about here.

Markus

https://opm-op.com | +4916097590858
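
For reference, the agglomeration described above is steered by the AMG
parameters in dune-istl. A minimal sketch, assuming the Dune::Amg::Parameters
interface from dune/istl/paamg/parameters.hh (the coarsen-target value below
is made up), not OPM Flow's actual setup:

    // amg_params.cc -- the dune-istl AMG knobs controlling the N -> n
    // agglomeration of coarse systems.
    #include <dune/istl/paamg/parameters.hh>

    int main()
    {
      Dune::Amg::Parameters params;

      // stop coarsening once the system has at most this many unknowns
      // (hypothetical value)
      params.setCoarsenTarget(2000);

      // how coarse levels are gathered onto fewer ranks:
      //   Dune::Amg::noAccu         -- keep the current distribution and never
      //                                repartition (the repartitioner, e.g.
      //                                ParMETIS, is then not called),
      //   Dune::Amg::atOnceAccu     -- gather onto a single rank in one step,
      //   Dune::Amg::successiveAccu -- agglomerate onto successively fewer
      //                                ranks, the behaviour described above.
      params.setAccumulate(Dune::Amg::successiveAccu);

      return 0;
    }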


Re: [Opm] ParMETIS error on HPC

2020-10-20 Thread Antoine B Jacquey
Hi Atgeirr,

I use AMG as a preconditioner (--use-amg=true). When the ParMETIS errors occur, 
the simulation crashes.
I indeed linked to the ParMETIS library when configuring DUNE.
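
For completeness, the flag goes straight on the flow command line; a run would
look along these lines, with CASE.DATA standing in for the actual deck and
mpirun for the scheduler's launcher:

    mpirun -np 27 flow CASE.DATA --use-amg=true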

Do you actually advise using PTScotch instead of ParMETIS? I could try 
recompiling DUNE + OPM with PTScotch to see whether the simulation runs with 
this configuration.

Thanks for your answer.

Antoine

> On Oct 20, 2020, at 05:43, Atgeirr Rasmussen wrote:
> 
> Hi Antoine!
> 
> Our partitioning scheme starts with the whole graph on a single process, so 
> indeed this would be a "bad" starting partition. The partitioning we end up 
> with does not seem any worse for it, although for very large process counts 
> this initial step could become a bottleneck.
> 
> I am a little confused though, as OPM Flow uses Zoltan for partitioning, not 
> ParMETIS. This is because ParMETIS is not open source. However, if you do 
> have access to ParMETIS I believe you can configure the dune-istl parallel 
> linear solvers (that in turn are used by OPM Flow) to use ParMETIS (or the 
> PTScotch workalike library) for redistribution of coarse systems within the 
> algebraic multigrid (AMG) solver. However, that is not the default linear 
> solver for OPM Flow. So I am a bit at a loss as to where those ParMETIS 
> messages come from! Did you run with the default linear solver or not? I 
> assume that the simulation actually runs?
> 
> Atgeirr
> 
> From: Opm on behalf of Antoine B Jacquey
> Sent: 19 October 2020 14:22
> To: opm@opm-project.org 
> Subject: [Opm] ParMETIS error on HPC
> 
> Hi OPM community,
> 
> I recently compiled OPM Flow on a local cluster at my institute. I linked to 
> the ParMETIS library during configuration to make use of mesh partitioning 
> when using a large number of MPI processes.
> When I run a flow simulation, it seems that the mesh is partitioned 
> automatically. Here is part of the output I get for a simulation with 8 MPI 
> processes:
> 
> Load balancing distributes 300000 active cells on 8 processes as follows:
>    rank   owned cells   overlap cells   total cells
> ---------------------------------------------------
>       0         36960            2760         39720
>       1         40110            3720         43830
>       2         38100            4110         42210
>       3         38940            2250         41190
>       4         36600            2280         38880
>       5         33660            3690         37350
>       6         37800            3690         41490
>       7         37830            2730         40560
> ---------------------------------------------------
>     sum        300000           25230        325230
> 
> The problem occurs when I use a larger number of MPI processes (here for 27 
> MPI processes). The mesh is also partitioned:
> 
> Load balancing distributes 1012500 active cells on 27 processes as follows:
>    rank   owned cells   overlap cells   total cells
> ---------------------------------------------------
>       0         40230            6390         46620
>       1         40185            5175         45360
>       2         40635            4050         44685
>       3         40230            6255         46485
>       4         40905            5850         46755
>       5         39825            6030         45855
>       6         37035            2610         39645
>       7         36945            5625         42570
>       8         40680            4185         44865
>       9         35835            5460         41295
>      10         41250            6765         48015
>      11         39825            5310         45135
>      12         36855            2655         39510
>      13         32850            3690         36540
>      14         38790            5400         44190
>      15         36540            5625         42165
>      16         30105            3105         33210
>      17         40320            5400         45720
>      18         35685            4185         39870
>      19         39465            5490         44955
>      20         20160            1800         21960
>      21         39915            4860         44775
>      22         40050            6165         46215
>      23         34020            2475         36495
>      24         39645            6345         45990
>      25         36990            6570         43560
>      26         37530            4005         41535
> ---------------------------------------------------
>     sum       1012500          131475       1143975
> 
> But during the first time step calculation, I get the following errors:
> 
> Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
>Switching control mode for well INJ from RATE to BHP on rank 20
>Switching control mode for well INJ from BHP to RATE on rank 20
> PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices 
> assigned to it!
> PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices 
> assigned to it!
> 

Re: [Opm] ParMETIS error on HPC

2020-10-20 Thread Atgeirr Rasmussen
Hi Antoine!

Our partitioning scheme starts with the whole graph on a single process, so 
indeed this would be a "bad" starting partition. The partitioning we end up 
with does not seem any worse for it, although for very large process counts 
this initial step could become a bottleneck.
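
To make the "bad" starting partition concrete: in ParMETIS' sense a
distribution is poor as soon as some rank owns no vertices at all, which is
trivially the case when the whole graph sits on one process. A small
self-contained check, plain MPI with toy vertex counts (not OPM code):

    // check_distribution.cc -- verify that every rank owns at least one graph
    // vertex; violating this triggers the "Poor initial vertex distribution"
    // message quoted in this thread.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv)
    {
      MPI_Init(&argc, &argv);
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      // toy distribution: the whole graph lives on rank 0, as in the
      // starting partition described above
      int myVertices = (rank == 0) ? 1012500 : 0;

      // smallest local vertex count over all ranks
      int minVertices = 0;
      MPI_Allreduce(&myVertices, &minVertices, 1, MPI_INT, MPI_MIN,
                    MPI_COMM_WORLD);

      if (rank == 0 && minVertices == 0)
        std::printf("some rank owns no vertices -> ParMETIS would complain\n");

      MPI_Finalize();
      return 0;
    }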

I am a little confused though, as OPM Flow uses Zoltan for partitioning, not 
ParMETIS. This is because ParMETIS is not open source. However, if you do have 
access to ParMETIS I believe you can configure the dune-istl parallel linear 
solvers (that in turn are used by OPM Flow) to use ParMETIS (or the PTScotch 
workalike library) for redistribution of coarse systems within the algebraic 
multigrid (AMG) solver. However, that is not the default linear solver for OPM 
Flow. So I am a bit at a loss as to where those ParMETIS messages come from! 
Did you run with the default linear solver or not? I assume that the simulation 
actually runs?
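
At the ParMETIS API level, the "initial vertex distribution" in the message is
the vtxdist array handed to routines such as ParMETIS_V3_PartKway: rank p owns
the global vertices [vtxdist[p], vtxdist[p+1]). A minimal sketch of spotting
empty ranks in such an array (the numbers are hypothetical):

    // vtxdist_check.cc -- what "no vertices assigned" means in terms of the
    // distribution array that ParMETIS routines take.
    #include <cstdio>
    #include <vector>

    int main()
    {
      // hypothetical 4-rank distribution of 12 vertices; rank 2 owns none
      std::vector<int> vtxdist = {0, 6, 9, 9, 12};

      for (std::size_t p = 0; p + 1 < vtxdist.size(); ++p)
        if (vtxdist[p] == vtxdist[p + 1])
          std::printf("rank %zu owns no vertices -> "
                      "\"Poor initial vertex distribution\"\n", p);

      return 0;
    }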

Atgeirr

From: Opm on behalf of Antoine B Jacquey
Sent: 19 October 2020 14:22
To: opm@opm-project.org 
Subject: [Opm] ParMETIS error on HPC

Hi OPM community,

I recently compiled OPM Flow on a local cluster at my institute. I linked to 
the ParMETIS library during configuration to make use of mesh partitioning 
when using a large number of MPI processes.
When I run a flow simulation, it seems that the mesh is partitioned 
automatically. Here is part of the output I get for a simulation with 8 MPI 
processes:

Load balancing distributes 300000 active cells on 8 processes as follows:
   rank   owned cells   overlap cells   total cells
---------------------------------------------------
      0         36960            2760         39720
      1         40110            3720         43830
      2         38100            4110         42210
      3         38940            2250         41190
      4         36600            2280         38880
      5         33660            3690         37350
      6         37800            3690         41490
      7         37830            2730         40560
---------------------------------------------------
    sum        300000           25230        325230

The problem occurs when I use a larger number of MPI processes (here for 27 MPI 
processes). The mesh is also partitioned:

Load balancing distributes 1012500 active cells on 27 processes as follows:
   rank   owned cells   overlap cells   total cells
---------------------------------------------------
      0         40230            6390         46620
      1         40185            5175         45360
      2         40635            4050         44685
      3         40230            6255         46485
      4         40905            5850         46755
      5         39825            6030         45855
      6         37035            2610         39645
      7         36945            5625         42570
      8         40680            4185         44865
      9         35835            5460         41295
     10         41250            6765         48015
     11         39825            5310         45135
     12         36855            2655         39510
     13         32850            3690         36540
     14         38790            5400         44190
     15         36540            5625         42165
     16         30105            3105         33210
     17         40320            5400         45720
     18         35685            4185         39870
     19         39465            5490         44955
     20         20160            1800         21960
     21         39915            4860         44775
     22         40050            6165         46215
     23         34020            2475         36495
     24         39645            6345         45990
     25         36990            6570         43560
     26         37530            4005         41535
---------------------------------------------------
    sum       1012500          131475       1143975

But during the first time step calculation, I get the following errors:

Time step 0, stepsize 1 days, at day 0/7, date = 01-Jan-2015
Switching control mode for well INJ from RATE to BHP on rank 20
Switching control mode for well INJ from BHP to RATE on rank 20
PARMETIS ERROR: Poor initial vertex distribution. Processor 2 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 4 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 6 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 8 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 12 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 14 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 16 has no vertices 
assigned to it!
PARMETIS ERROR: Poor initial vertex distribution. Processor 18 has no vertices 
assigned to it!
PARMETIS ERROR: