top-posting further elaboration --

I really have two different beasts to serve -- the users can only
understand images, and images, by their nature, are 2D structures at a
point in time, spread over a landscape (I am talking geography here...
other sciences may have different paradigms). The user says, "Show me
the soil conditions in the State of Wisconsin for Dec 30, 1989" and
bam! there is a map of that. Or, the user says, "Show me an animation
of the changes in the daily precipitation in the last month of 1980
for the northeastern quarter of Iowa" and bam! there is a set of 31
images created that show daily precip for that area. So, that beast
wants 2D images where X and Y are the more typical geographic X and Y.
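The image beast's access pattern is just a 2D slice at a fixed time index. A minimal sketch in NumPy terms (the stack layout and sizes are my assumptions; PDL's slice() does the equivalent):

```python
import numpy as np

# Hypothetical stack: (time, y, x) -- e.g. 31 daily layers over a grid.
stack = np.arange(31 * 4 * 5).reshape(31, 4, 5)

# "Show me day 30": one 2D image is a slice at a fixed time index.
day30 = stack[30, :, :]          # shape (4, 5)

# A sub-region (e.g. the NE quarter) is just a further 2D window.
ne_quarter = stack[30, :2, 2:]   # rows 0-1, cols 2-4
```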

The other beast is the model -- the model works on a unit of area at a
time. For each unit of area, the model wants *all* the data for that
specific unit. So, it is more like a drill core through a stack of
images that plucks out a column of data. This column is made up of all
the data layers for all the time periods, for that particular unit of
area. The model works on one unit of area, then moves on to the next
unit of area and then the next, and so on.
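The drill-core view, by contrast, fixes (x, y) and takes everything along the other axes. A sketch of the same idea (the layer/time axis layout is my assumption):

```python
import numpy as np

# Hypothetical stack: (layer, time, y, x) -- all data layers, all times.
stack = np.arange(3 * 31 * 4 * 5).reshape(3, 31, 4, 5)

# The model's "drill core" for one unit of area at (y=1, x=2):
core = stack[:, :, 1, 2]   # shape (3, 31): every layer, every time

# The model then walks the grid, one column of data at a time.
for y in range(stack.shape[2]):
    for x in range(stack.shape[3]):
        core = stack[:, :, y, x]
        # ... run the model on this column ...
```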

The canonical data source is logically the same, but it is viewed
differently by the two beasts. Sure, if I could create a single 3D
stack of images approximately a few terabytes in size and latch PDL onto its
end like a little, hungry monster, that would be great. I could slice
and dice and range through it at will. But, that is really not a
feasible scenario.
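One way to latch a hungry monster onto a stack bigger than memory is a memory-mapped file, so only the touched slices are paged in from disk. A sketch with NumPy's memmap (the filename, dtype, and shape are made up; PDL has analogous mapped-file support):

```python
import numpy as np
import os, tempfile

# A tiny stand-in for the multi-terabyte stack on disk.
path = os.path.join(tempfile.mkdtemp(), "stack.dat")
shape = (10, 100, 100)  # (time, y, x) -- small here, huge in practice
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
mm[3, 50, 50] = 42.0
mm.flush()

# Later (or in another process): open read-only and slice at will;
# only the pages actually touched are read from disk.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
window = ro[3, 40:60, 40:60]   # a 20x20 spatial window at time 3
```

Read-only mapping also fits the no-locking situation: many web processes can map the same file safely.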

On top of that, I am going to have a lot of users coming in through a
web interface, requesting either an image or wanting to run the model.
The data are read-only, so they don't have to be locked. I have
already tried a db, and brought it to its knees. I know, rather,
I am convinced, albeit unscientifically, that PDL can conquer this.
Now, I have to figure out how.


On Wed, Sep 1, 2010 at 11:03 AM, P Kishor <[email protected]> wrote:
> On Wed, Sep 1, 2010 at 10:43 AM, Ingo Schmid <[email protected]> wrote:
>>  May I ask how many is a lot?
>
> There are two scenarios --
>
> In one, I already have the data, and it is a lot of data. While I
> don't know the exact numbers, my estimate is that they would weigh in
> somewhere around 800 GB. I would like to organize them in separate
> piddles (logically, each piddle would be one image) that would weigh
> in at around 90 MB.
>
> Since, at any given time, I would be looking at only a small range in
> each piddle, it just seems a waste to be slogging through 90 MB of
data. Not a problem in a one-off process, but in a web-based process
> with multiple users, that could easily become a hog.
>
>
> In the second scenario, the piddle size is actually tiny, but piddles
> are created on an ad hoc basis, for different parts of the country.
> Once created, they are stored, so they are separate by nature of
> creation, not lumped together.
>
> So, either way, being able to identify the specific piddle required
> based on a meta-index would be a useful capability.
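At its simplest, that meta-index need be no more than a list of bounding boxes with names, scanned for containment. A sketch (the box layout and tile names are made up):

```python
# Hypothetical meta-index: piddle name -> bounding box (minx, miny, maxx, maxy).
index = {
    "tile_A": (0, 0, 10, 10),
    "tile_B": (10, 0, 20, 10),
}

def find_piddle(x, y):
    """Return the name of the piddle whose bounding box contains (x, y)."""
    for name, (minx, miny, maxx, maxy) in index.items():
        if minx <= x < maxx and miny <= y < maxy:
            return name
    return None
```

For a point query, `find_piddle(15, 5)` would pick out "tile_B"; a linear scan like this holds up surprisingly far before a real spatial index is needed.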
>
>
>
>> I process imaging data, i.e. stacks of 3D
>> images all concatenated into a 4D piddle. If you look for my previous
>> message(s) on this list, there is a limit to 2GB for a single piddle and
>> some suggestions to patch Core.pm to push it further. If your memory is
>> large enough to hold the piddle stick to it and enjoy slicing, dicing and
>> threading! If it's much bigger than memory, that's a different story,
>>
>> Ingo
>>
>> On 09/01/2010 05:27 PM, P Kishor wrote:
>>>
>>> (near future scenario) I have a lot of piddles covering contiguous
>>> rectangular areas. I could stitch them up together, but then, I would
>>> have one very large piddle. So, I leave them the way they are. The
>>> user supplies a pair of coordinate pairs which lets me identify the
>>> piddle we want, open it up, use range() to extract the
>>> area-of-interest (AOI), and analyze it. I can do the identification of
>>> the piddle either based on some naming scheme I can develop, or by
>>> storing some kind of area->to->name index. (sidenote: Of course, I can
>>> do this identification task with PostGIS/Postgres, or with
>>> SQLite+R*Tree, but I am hoping for an all PDL solution, or rather, a
>>> NoSQL solution). So, that is the first problem... a name->to->area
>>> index is easy, but an arbitrary_area->to->name index is difficult.
>>>
>>> Second problem -- what if the arbitrary_area->to->name index returns
>>> multiple piddles, as in, an AOI that overlaps several piddles? So,
>>> first, the arbitrary_area->to->name index should be able to return
>>> multiple piddles. Then, my program should be able to extract the
>>> various smaller regions from the identified piddles and glue (or
>>> append) them together into a piddle of the AOI, cache it temporarily,
>>> and do analysis on it.
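The gluing step is a paste of each tile's clipped overlap into a preallocated AOI array. A NumPy sketch (tile origins, sizes, and the shared grid are my assumptions; PDL's glue/append or a set-slice does the same job):

```python
import numpy as np

# Hypothetical tiles: name -> (origin_x, origin_y, 2D array), shared grid.
tiles = {
    "tile_A": (0, 0, np.full((10, 10), 1.0)),
    "tile_B": (10, 0, np.full((10, 10), 2.0)),
}

def extract_aoi(x0, y0, x1, y1):
    """Assemble the AOI [x0:x1) x [y0:y1) from whichever tiles overlap it."""
    aoi = np.full((y1 - y0, x1 - x0), np.nan)
    for ox, oy, data in tiles.values():
        h, w = data.shape
        # Overlap between this tile and the requested AOI, in grid coords.
        ix0, ix1 = max(x0, ox), min(x1, ox + w)
        iy0, iy1 = max(y0, oy), min(y1, oy + h)
        if ix0 < ix1 and iy0 < iy1:
            aoi[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = \
                data[iy0 - oy:iy1 - oy, ix0 - ox:ix1 - ox]
    return aoi
```

An AOI straddling both tiles, e.g. `extract_aoi(5, 2, 15, 8)`, comes back half 1.0 and half 2.0, with NaN wherever no tile covered the request.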
>>>
>>> I am thinking... this could be done with some kind of quad-tree
>>> indexing scheme. Has this been done already? If not, suggestions on
>>> how to proceed with this would be much welcome.
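A quad-tree for this need not be fancy: nodes cover quadrants, each box is stored at the deepest node that wholly contains it, and a query walks only the quadrants intersecting the AOI -- naturally returning multiple names when the AOI straddles tiles. A toy sketch (names, depth limit, and box convention are all made up; SQLite's R*Tree does this better, as noted above):

```python
class QuadTree:
    """Toy quad-tree over axis-aligned boxes (minx, miny, maxx, maxy)."""

    def __init__(self, bounds, depth=0, max_depth=4):
        self.bounds = bounds   # the region this node covers
        self.depth, self.max_depth = depth, max_depth
        self.items = []        # (name, box) pairs stored at this node
        self.children = None   # four sub-quadrants, created lazily

    def _quadrants(self):
        minx, miny, maxx, maxy = self.bounds
        cx, cy = (minx + maxx) / 2.0, (miny + maxy) / 2.0
        return [(minx, miny, cx, cy), (cx, miny, maxx, cy),
                (minx, cy, cx, maxy), (cx, cy, maxx, maxy)]

    def insert(self, name, box):
        # Push the box down to the deepest quadrant that wholly contains it.
        if self.children is None and self.depth < self.max_depth:
            self.children = [QuadTree(q, self.depth + 1, self.max_depth)
                             for q in self._quadrants()]
        if self.children is not None:
            for child in self.children:
                if _contains(child.bounds, box):
                    child.insert(name, box)
                    return
        self.items.append((name, box))

    def query(self, box):
        """Names of all stored boxes intersecting box -- possibly several."""
        hits = [n for n, b in self.items if _intersects(b, box)]
        if self.children is not None:
            for child in self.children:
                if _intersects(child.bounds, box):
                    hits.extend(child.query(box))
        return hits

def _contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            inner[2] <= outer[2] and inner[3] <= outer[3])

def _intersects(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
```

An AOI overlapping two tiles returns both names from query(), which is exactly the multi-piddle case above; the caller then clips and glues.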
>>>


-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
