Re: [Paraview] Parallel Data Redistribution

burlen Wed, 16 Dec 2009 10:26:36 -0800

Hey John,

Also: for dynamic load balancing, I'd like to instruct several readers to read 
the same piece - since the algorithm controls (for example) the particles, the 
algorithm can internally communicate information about what to do amongst its 
processes, but it can't talk upstream to the readers and fudge them.


I am wondering if there is any way of supporting this kind of thing using the 
current information keys and my instinct says no.

I guess you can kind of do this with the current "request update" stuff, but thanks to the flexibility of the pipeline information keys and values you can also roll your own very easily.

I recently implemented dynamic load balancing in a new stream line tracer. To get the work load balanced it's crucial that each process has on-demand access to the entire data set. I accomplished it with information keys and by using a "meta-reader" in place of the traditional ParaView reader. The meta-reader does two things: it populates the new keys, and it gives PV a dummy dataset that is one cell per process, such that the bounds, shape, and array names are the same as the real dataset, which is not read during the meta-reader execution.

When the stream tracer executes downstream of the meta-reader it picks the keys out of the pipeline information. The important key/value pair is an out-of-core (ooc) reader. The ooc reader is a vtkDataObject so that it can be passed through the information. Once the stream tracer has it, it can make repeated IO requests as needed while particles move through the dataset. My interface accepts a point and returns a chunk of data. The ooc reader internally handles caching and memory management. In this way you can keep all processes busy all the time when tracing stream lines.

The approach worked out well and was very simple to implement, with no modification to the executive. Also, the filter has control of caching and can free all the memory at the end of its execution, which significantly reduces the memory footprint compared to the traditional PV reader. And I need not worry if PV or some upstream filter uses MPI communications in between my IO requests. There is a little more to our scheduling algorithm which I won't discuss now, but so far, for making Poincaré maps, we have scaled well up to 2E7 stream lines per frame and 96 processes, and we minimize the memory footprint, which is important to us.
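A minimal sketch of the "point in, chunk out" interface with internal caching described above, written in plain Python rather than VTK C++. Everything in it is an assumption made for illustration: the class name OutOfCoreReader, the 1-D domain, the equal-width chunk layout, the least-recently-used eviction policy, and the simulated disk read are all hypothetical and are not taken from the actual tracer or from any VTK API.

```python
from collections import OrderedDict


class OutOfCoreReader:
    """Toy stand-in for an out-of-core reader (hypothetical, not VTK API)."""

    def __init__(self, domain_min, domain_max, n_chunks, max_cached):
        self.domain_min = domain_min
        self.domain_max = domain_max
        self.n_chunks = n_chunks
        self.max_cached = max_cached
        self._cache = OrderedDict()  # chunk id -> data, least recently used first
        self.reads = 0               # number of simulated disk reads

    def _chunk_id(self, x):
        """Map a 1-D point to the index of the chunk that contains it."""
        if not (self.domain_min <= x <= self.domain_max):
            raise ValueError("point outside dataset bounds")
        width = (self.domain_max - self.domain_min) / self.n_chunks
        # Clamp so that the upper bound falls into the last chunk.
        return min(int((x - self.domain_min) / width), self.n_chunks - 1)

    def _read_from_disk(self, chunk_id):
        """Simulated IO; a real reader would load the chunk's cells here."""
        self.reads += 1
        return {"chunk": chunk_id}

    def request(self, x):
        """Return the chunk containing point x, reading it only on a cache miss."""
        cid = self._chunk_id(x)
        if cid in self._cache:
            self._cache.move_to_end(cid)  # mark as most recently used
            return self._cache[cid]
        data = self._read_from_disk(cid)
        self._cache[cid] = data
        if len(self._cache) > self.max_cached:
            self._cache.popitem(last=False)  # evict the least recently used chunk
        return data

    def clear(self):
        """Free all cached chunks, e.g. at the end of the filter's execution."""
        self._cache.clear()


if __name__ == "__main__":
    reader = OutOfCoreReader(0.0, 10.0, n_chunks=5, max_cached=2)
    for point in (0.5, 1.5, 10.0, 5.0):
        reader.request(point)
    print("disk reads:", reader.reads)
    reader.clear()
```

The point of the design is that the consumer, not the upstream pipeline, decides when data is read and when it is released, so a process can fetch whichever region its particles wander into.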

Berk and Ken already basically gave you all the options you need, but I add this because it shows how flexible and powerful the pipeline information really is.


Burlen

Biddiscombe, John A. wrote:

Berk,

We had a discussion back in 2008, which resides here 
http://www.cmake.org/pipermail/paraview/2008-May/008170.html

Continuing from this, my question of the other day, touches on the same problem.

I'd like to manipulate the piece number read by each reader. As mentioned 
before, UPDATE_PIECE is not passed into RequestInformation at first (since 
nobody knows how many pieces there are yet!), so I can't (directly) generate 
information in the reader which is 'piece dependent'. And I can't be sure that 
someone doing streaming won't interfere with piece numbers when using the code 
differently.

For the particle tracer (for example), I'd like to tell the upstream pipeline to read no 
pieces when certain processes are empty of particles (currently they update and 
generate {= read} data when they don't need to). I may be able to suppress the forward 
upstream somehow, but I don't know of an easy way for the algorithm to say 
"Stop" to the executive to prevent it updating if the timestep changes but the 
algorithm has determined that no processing is required (ForwardUpstream of requests 
continues unabated). I'd like to set the update piece to -1 to tell the executive to stop 
operating.

Also: for dynamic load balancing, I'd like to instruct several readers to read 
the same piece - since the algorithm controls (for example) the particles, the 
algorithm can internally communicate information about what to do amongst its 
processes, but it can't talk upstream to the readers and fudge them.

I am wondering if there is any way of supporting this kind of thing using the 
current information keys and my instinct says no. It seems like the update piece 
and numpieces were really intended for streaming and we need two kinds of 
'pieces', one for streaming, another for splitting in _parallel_ because they 
aren't quite the same. (Please note that I haven't actually tried changing 
piece requests in the algorithms yet, so I'm only guessing that it won't work 
properly)

<cough>
UPDATE_STREAM_PIECE
UPDATE_PARALLEL_PIECE
</cough>


Comments?

JB

I would have the reader (most parallel readers do this) generate empty
data on all processes of id >= N. Then your filter can redistribute
from those N processes to all M processes. I am pretty sure
RedistributePolyData can do this for polydata as long as you set the
weight to 1 on all processes. Ditto for D3.

-berk

On Fri, Dec 11, 2009 at 4:13 PM, Biddiscombe, John A. <biddi...@cscs.ch>
wrote:

Berk

It sounds like M is equal to the number of processors (pipelines) and
M >> N. Is that correct?

Yes, that's the idea. N blocks, broken (in place) into M new blocks, then
fanned out to the M processes downstream where they can be processed
separately. If it were on a single node, then each block could be a
separate 'connection' to a downstream filter, but distributed, an explicit
send is needed.
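As a rough illustration of the fan-out discussed here, the following plain-Python sketch computes which of the M ranks each of the N source ranks would send its sub-blocks to, and which ranks read nothing. The function names fan_out_plan and reader_piece_for_rank, and the contiguous assignment rule, are assumptions made for this example; they do not describe how vtkRedistributePolyData or D3 actually assign data, and no MPI sends are performed.

```python
def reader_piece_for_rank(rank, n_sources):
    """Piece read by a rank: ranks below N read their own block,
    ranks >= N read nothing (None) and produce empty data."""
    return rank if rank < n_sources else None


def fan_out_plan(n_sources, m_procs):
    """Return {source_rank: [destination ranks]} so that every one of the
    M ranks receives a sub-block from exactly one of the N sources."""
    if not (0 < n_sources <= m_procs):
        raise ValueError("need 0 < N <= M")
    plan = {}
    for src in range(n_sources):
        # Contiguous ranges; integer arithmetic spreads any remainder
        # when M is not a multiple of N.
        first = src * m_procs // n_sources
        last = (src + 1) * m_procs // n_sources
        plan[src] = list(range(first, last))
    return plan


if __name__ == "__main__":
    n, m = 2, 5
    print([reader_piece_for_rank(r, n) for r in range(m)])
    for src, dests in fan_out_plan(n, m).items():
        print("source", src, "sends sub-blocks to ranks", dests)
```

Each source block is read once, by one rank, and the explicit sends then replace the M/N duplicate reads that a pure piece-remapping approach would cause.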

JB

-berk

On Fri, Dec 11, 2009 at 10:40 AM, Biddiscombe, John A. <biddi...@cscs.ch>
wrote:

Berk

The data will be UnstructuredGrid for now. Multiblock, but actually, I
don't really care what each block is, only that I accept one block on each
of N processes, split it into more pieces, and the next filter accepts one
(or more if the numbers don't match up nicely) blocks and process them. The
redistribution shouldn't care what data types, only how many blocks in and
out.

Looking at RedistributePolyData makes me realize my initial idea is no
good. In my mind I had a pipeline where multiblock datasets are passed down
the pipeline and simply the number of pieces is manipulated to achieve what
I wanted - but I see now that if I have M pieces downstream mapped upstream
to N pieces, what will happen is the readers will be effectively duplicated
and M/N readers will read the same pieces. I don't want this to happen as IO
will be a big problem if readers read the same blocks M/N times.

I was hoping there was a way of simply instructing the pipeline to manage
the pieces, but I see now that this won't work, as there needs to be a
specific Send from each N to their M/N receivers (because the data is
physically in another process, so the pipeline can't see it). This is very
annoying as there must be a class which already does this (block
redistribution, rather than polygon level redistribution), and I would like
it to be more 'pipeline integrated' so that the user doesn't have to
explicitly send each time an algorithm needs it.

I'll go through RedistributePolyData in depth and see what I can pull out
of it - please feel free to steer me towards another possibility :)

JB

-----Original Message-----
From: Berk Geveci [mailto:berk.gev...@kitware.com]
Sent: 11 December 2009 16:09
To: Biddiscombe, John A.
Cc: paraview@paraview.org
Subject: Re: [Paraview] Parallel Data Redistribution

What is the data type? vtkRedistributePolyData and its subclasses do
this for polydata. It can do load balancing (where you can specify a
weight for each processor) as well.

-berk

On Fri, Dec 11, 2009 at 9:59 AM, Biddiscombe, John A. <biddi...@cscs.ch>
wrote:

I have a filter pipeline which reads N blocks from disk, this works fine
on N processors.

I now wish to subdivide those N blocks (using a custom filter) to produce
new data which will consist of M blocks - where M >> N.

I wish to run the algorithm on M processors and have the piece information
transformed between the two filters (reader -> splitter), so that blocks are
distributed correctly. The reader will Read N blocks (leaving M-N processes
unoccupied), but the filter which splits them up needs to output a different
number of pieces and have the full M processes receiving data.

I have a reasonably good idea of how to implement this, but I'm wondering
if any filters already do something similar. I will of course take apart the
D3 filter for ideas, but I don't need to do a parallel spatial decomposition
since my blocks are already discrete - I just want to redistribute the
blocks around and more importantly change the numbers of them between
filters.

If anyone can suggest examples which do this already, please do

Thanks

JB

--
John Biddiscombe,                            email:biddisco @ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland      | Fax:  +41 (91) 610.82.82

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at

http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at:

http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview

