I have been working for some time now on improving the performance of loading 
summary data for an ensemble of models. I have been testing with some real 
full-field models, and these are typical dimensions of the problems I have looked at:

• Number of summary vectors: 30000
• Number of timesteps: 3000
• Size of ensemble: 100

On a typical machine available to engineers in Equinor, the maximum number of 
CPUs available is 12.

My conclusion is that the UNSMRY format is not suitable for load on demand 
with respect to performance, not even when utilizing multithreading.

The LODSMRY file format currently implemented in opm-common is a first approach 
to solving this. It uses existing utilities in EclUtil, and the only 
difference is that the data is stored per vector instead of per timestep. This 
helped a lot in terms of performance.
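The effect of the per-vector layout is easy to see with a back-of-the-envelope calculation (plain Python, using the ensemble dimensions above; the 4-byte float size is my assumption about the stored representation):

```python
# Dimensions from the test models above.
n_vectors = 30000    # number of summary vectors
n_steps = 3000       # number of timesteps
float_bytes = 4      # assumed size of one stored value

# UNSMRY layout: values are grouped per timestep, so extracting the full
# time series of one vector touches every timestep record.
unsmry_read = n_steps * n_vectors * float_bytes    # 360 MB per realization

# Transposed (LODSMRY-style) layout: values for one vector are contiguous,
# so its full series is a single small read.
lodsmry_read = n_steps * float_bytes               # 12 kB

print(unsmry_read // lodsmry_read)  # 30000x less data touched per vector
```

Multiply by 100 realizations and the difference between the two layouts dominates everything else when loading a handful of vectors on demand.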

This implementation has limitations, as mentioned by Alf. The most severe, as I 
see it, is that we have to wait until all simulations are finished before 
converting. Monitoring simulation results while the runs are in progress is 
important.

The vendor of Eclipse has taken an approach similar to LODSMRY in OPM. 
When running Eclipse with the eclrun wrapper script from Schlumberger, an h5 file 
(HDF5 format) is created from SMSPEC + UNSMRY at the end of the simulation. 
Note that the h5 file will not be created if the run is stopped before the end 
of the simulation. Hence, the same limitation as our current implementation. 
The h5 file from Eclipse is not supported by

I’ve had a look at the h5 file from eclrun (with HDF Compass), and I see that this 
file has one dataset per vector. This means that a large number of datasets 
(# vectors) would need to be updated at each timestep if we were to use this 
exact format for writing an h5 file that is continuously updated as the 
simulation progresses.

I have been testing HDF5 on my own, and what I have looked into is saving all of 
the summary vectors in one two-dimensional dataset (number of timesteps x number 
of vectors). The dataspace is defined as unlimited in one of the dimensions (time 
steps). It is then easy to append summary data in chunks along that dimension 
(H5Sselect_hyperslab) during the simulation. It is also easy to extract data 
along “the opposite” dimension. So far I have only tested this on a small test 
case (5 x 4), so I don’t know how this will scale or whether the performance on 
real models is good. I’m planning to find out now.
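A minimal sketch of this pattern using the h5py Python bindings (which wrap the same C calls; the dataset name, chunk shape, and file path are my own choices, and the toy sizes match the 5 x 4 test case above):

```python
import os
import tempfile

import h5py
import numpy as np

n_vectors = 4
path = os.path.join(tempfile.mkdtemp(), "summary.h5")

with h5py.File(path, "w") as f:
    # One 2D dataset, unlimited (maxshape None) along the timestep axis.
    ds = f.create_dataset("summary", shape=(0, n_vectors),
                          maxshape=(None, n_vectors), dtype="f4",
                          chunks=(64, n_vectors))
    for step in range(5):  # simulate 5 report steps arriving one by one
        row = np.full(n_vectors, float(step), dtype="f4")
        ds.resize(ds.shape[0] + 1, axis=0)  # grow along the time dimension
        ds[-1, :] = row  # h5py issues the hyperslab selection internally

with h5py.File(path, "r") as f:
    # Extracting along "the opposite" dimension: full series of vector 2.
    series = f["summary"][:, 2]
    print(series.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Whether a single wide dataset like this stays fast at 30000 columns will depend heavily on the chunk shape, which is exactly what the planned scaling test should reveal.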

I also believe that the "new" summary file should be self-contained, that is, 
not dependent on the SMSPEC file (as is the case with the current LODSMRY file 
in OPM). From inspection of the h5 file from eclrun with HDF Compass, I believe 
this file is. The h5 file from eclrun is supported in S3GRAF when used together 
with a SMSPEC file, but it is not supported alone.
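To make such a file self-contained, the SMSPEC-style metadata could simply live in the same file as the data. A sketch with h5py; the dataset names and the example keywords/units here are hypothetical illustrations, not the actual eclrun layout:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "summary.h5")

# Hypothetical SMSPEC-style metadata for three vectors.
keywords = np.array([b"TIME", b"FOPR", b"WWCT:OP_1"], dtype="S16")
units = np.array([b"DAYS", b"SM3/DAY", b""], dtype="S16")

with h5py.File(path, "w") as f:
    f.create_dataset("keys", data=keywords)   # column i of "summary" is keys[i]
    f.create_dataset("units", data=units)
    f.create_dataset("summary", shape=(0, len(keywords)),
                     maxshape=(None, len(keywords)), dtype="f4")

with h5py.File(path, "r") as f:
    # A reader needs only this one file to map columns to summary vectors.
    names = [k.decode() for k in f["keys"][:]]
    print(names)  # ['TIME', 'FOPR', 'WWCT:OP_1']
```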


Regards

Torbjørn Skille

-----Original Message-----
From: Opm <opm-boun...@opm-project.org> On Behalf Of Alf Birger Rustad
Sent: tirsdag 19. mai 2020 09:50
To: Joakim Hove <joakim.h...@opm-op.com>; opm@opm-project.org
Subject: Re: [Opm] File formats

> The feasibility of implementing/using said format in post-processing
    tools should therefore be an important criterion.

I would even say a prerequisite. We already have it in opm-common in a shape 
that can be used without post-processing tools, but if we are to support it 
within Flow, I believe we must have support in at least ResInsight.

> I *think* Petrel / eclrun / Eclipse has some functionality in this
    regard - if this is a file we can be compatible with, that would make
    very much sense.

Thanks for pointing it out. Yes, there is such a format. There are still a 
number of unknowns related to that format. What I believe is already clear is 
that it is not supported by Eclipse directly, so it is also of the type that is 
created after the simulation is done. If anybody knows more about this format, 
please share.

> In addition to HDF5, I would consider looking into Parquet, which at
    least is a much newer format than HDF5.

Thanks for the suggestion! Yes, we should read up on alternatives before 
deciding. If anybody has experience with or knowledge of any of the containers, 
please share. I am in deep water here 😊

-----Original Message-----
From: Opm <opm-boun...@opm-project.org> On Behalf Of Joakim Hove
Sent: tirsdag 19. mai 2020 07:23
To: opm@opm-project.org
Subject: Re: [Opm] File formats

My take on this is:

 1. Yes, I see the value of a transposed file format - however, the value
    is quite limited before it is implemented in post-processing tools.
    The feasibility of implementing/using said format in post-processing
    tools should therefore be an important criterion.
 2. I *think* Petrel / eclrun / Eclipse has some functionality in this
    regard - if this is a file we can be compatible with, that would make
    very much sense.
 3. In addition to HDF5, I would consider looking into Parquet, which at
    least is a much newer format than HDF5.

Here is an extensive file-format comparison:
https://indico.cern.ch/event/613842/contributions/2585787/attachments/1463230/2260889/pivarski-data-formats.pdf



On 5/18/20 5:51 PM, Alf Birger Rustad wrote:
> Dear community,
>
> We are at a crossroads with respect to file formats, and I hope you are 
> motivated to help us arrive at the best solution. We need better 
> load-on-demand performance for summary files than what is currently possible 
> with the default Eclipse format for summary files. Currently you will find an 
> implementation in opm-common that simply transposes the summary vectors, 
> while still using the same Fortran77 binary format. That approach has three 
> main drawbacks. One is that it is not supported by any post-processing 
> application (yet).
> The second is that it can only be created from a finished simulation, so you 
> need to wait for simulations to finish before you get the performant result 
> file.

For a traditional column-oriented file format in any sense, I think you will 
need to write out the file in full, i.e. I think this will apply anyway. Use 
of a database format might resolve this, or at least handle the appending 
transparently, but that is maybe a bit overkill?


> The third being that it is not suited for parallel processing, so forget 
> about each process writing out its part.

For the summary files that is not so relevant, because the final calculation of 
summary properties like WWCT = WWPR / (WWPR + WOPR) is only done on the IO rank 
anyway.


Joakim


_______________________________________________
Opm mailing list
Opm@opm-project.org
https://opm-project.org/cgi-bin/mailman/listinfo/opm


-------------------------------------------------------------------
The information contained in this message may be CONFIDENTIAL and is intended 
for the addressee only. Any unauthorized use, dissemination of the information 
or copying of this message is prohibited. If you are not the addressee, please 
notify the sender immediately by return e-mail and delete this message.
Thank you