Hi,

I guess I forgot about the data duplication issue. For most of the statistics 
that I calculate in parallel (e.g. mean, standard deviation), the results are 
not affected much. But for the Pearson correlation coefficient, which is a bit 
more complicated, there is a noticeable difference (0.833 in parallel versus 
0.99 in serial).
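
To illustrate why the correlation can shift, here is a toy sketch with made-up 
numbers (not my actual data). Duplicated boundary points act like extra weight 
on those samples, and dropping repeats by a per-point ID recovers the serial 
value. The global_ids array and the two "pieces" are hypothetical; they only 
stand in for whatever each process would hold.

```python
import numpy as np

# Toy arrays, made up for illustration only (not the data from this thread).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 9.0, 6.1])

# Hypothetical unique ID per point; assumed to be available for this sketch.
global_ids = np.arange(len(x))

# Pretend two processes each hold a piece, and both hold points 2, 3 and 4.
piece_a = np.array([0, 1, 2, 3, 4])
piece_b = np.array([2, 3, 4, 5])
gathered = np.concatenate([piece_a, piece_b])

# Pearson r on the full data (serial) and on the gathered data with repeats.
r_serial = np.corrcoef(x, y)[0, 1]
r_dup = np.corrcoef(x[gathered], y[gathered])[0, 1]

# Keep only the first occurrence of each global ID, then recompute.
_, first = np.unique(global_ids[gathered], return_index=True)
keep = gathered[first]
r_dedup = np.corrcoef(x[keep], y[keep])[0, 1]

print(r_serial, r_dup, r_dedup)
```

In this toy case r_dup differs from r_serial, while r_dedup matches it, which 
is the behavior I would hope for if duplicates could be identified.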

Will this point-sharing information become available in later versions of 
ParaView? That is, will it ever be easy to identify and count duplicates?


Thanks,

Sohail



________________________________
 From: David Thompson <david.thomp...@kitware.com>
To: Sohail Shafii <sohailsha...@yahoo.com> 
Cc: "paraview@paraview.org" <paraview@paraview.org> 
Sent: Friday, August 17, 2012 3:40 PM
Subject: Re: [Paraview] Numpy masking (via programm filter) not quite working 
in parallel
 
Hi Sohail,

This is likely caused by points shared by several processes. While ParaView 
splits the cells of a mesh across processes, cells on the boundary between 
processes share vertices. If a vertex bounds cells split across 3 processes, 
that vertex will appear in 3 different lists of local vertices and so be 
counted 3 times instead of once. There is nothing in ParaView for determining 
which vertices are shared (and by how many processes).

This also affects the statistics filters when run in parallel on point-centered 
data. For most large data, the number of vertices on inter-process boundaries 
is small compared to the total so we documented the behavior for the statistics 
filters but did not implement a solution (because the expected skew is small).
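
As a rough check of the counts reported below, here is a sketch of the 
arithmetic. It assumes the default wavelet has 21 points per axis and that the 
two processes each get a slab along one axis with one shared plane of points; 
that decomposition is an assumption for illustration, not something verified 
against ParaView.

```python
import numpy as np

# Assumption: the default wavelet has 21 points along each axis.
n = 21
serial_points = n ** 3

# Assumption: two slabs along one axis, both keeping the shared boundary plane.
slab_a = np.arange(0, 11)    # plane indices 0..10
slab_b = np.arange(10, 21)   # plane indices 10..20

points_a = len(slab_a) * n * n
points_b = len(slab_b) * n * n

# Points on the plane(s) held by both slabs are counted twice when summed.
shared_points = len(np.intersect1d(slab_a, slab_b)) * n * n
naive_total = points_a + points_b
unique_total = naive_total - shared_points

print(serial_points)   # 9261
print(points_a)        # 4851
print(naive_total)     # 9702
print(unique_total)    # 9261
```

Under these assumptions the 441 points on the shared plane account for the 
whole difference between 9702 and 9261.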

    David

On Aug 17, 2012, at 5:05 PM, Sohail Shafii <sohailsha...@yahoo.com> wrote:

> Hi,
> 
> I need to use Numpy in a lot of the programmable filters that I write, and 
> I've run into differences in how its masking feature works in serial and 
> parallel. Masking allows one to filter out portions of an array that do not 
> pass some condition.
> 
> As an example, I've created a stock ParaView wavelet and saved it as a .pvd 
> file. I then load it and run this inside a programmable filter:
> ---
> import numpy
> 
> data = inputs[0].PointData['RTData']
> # create a mask that tells us which points are equal to one
> mask = numpy.ma.masked_equal(data, 1)
> # filter data array by the mask conditions (so that other points are excluded)
> maskedPnts = numpy.extract(mask, data)
> 
> print(len(maskedPnts))
> 
> ---
> In serial mode, I get 9261 points. With two processes, I get 2 x 4851, or 
> 9702, so masking in parallel always produces more points.
> 
> Any ideas as to why that is? Is there anything I can do or print out to see 
> why masking doesn't quite work in parallel?
> 
> Thanks, Sohail
> _______________________________________________
> Powered by www.kitware.com
> 
> Visit other Kitware open-source projects at 
> http://www.kitware.com/opensource/opensource.html
> 
> Please keep messages on-topic and check the ParaView Wiki at: 
> http://paraview.org/Wiki/ParaView
> 
> Follow this link to subscribe/unsubscribe:
> http://www.paraview.org/mailman/listinfo/paraview