top-posting further elaboration -- I really have two different beasts to serve. The users can only understand images, and images, by their nature, are 2D structures at a point in time, spread over a landscape (I am talking geography here... other sciences may have different paradigms). The user says, "Show me the soil conditions in the State of Wisconsin for Dec 30, 1989" and bam! there is a map of that. Or, the user says, "Show me an animation of the changes in daily precipitation for the last month of 1980 for the northeastern quarter of Iowa" and bam! there is a set of 31 images showing daily precip for that area. So, that beast wants 2D images where X and Y are the typical geographic X and Y.
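To make that first beast's access pattern concrete, here is a minimal numpy sketch; the array shape, axis order (time, row, col), and sizes are illustrative assumptions, not from the thread. In PDL the equivalent would be slice()/range() on a 3D piddle:

```python
import numpy as np

# Hypothetical stack: 365 daily layers of a small 6 x 8 cell grid
# (axis order time, row, col -- an assumption for illustration)
stack = np.arange(365 * 6 * 8, dtype=np.float32).reshape(365, 6, 8)

# "Show me conditions for day 363" -> one 2D map
day_map = stack[363]              # shape (6, 8)

# "Show me an animation of the last month for a quarter of the grid"
# -> 31 daily 2D frames over a spatial window
frames = stack[-31:, 0:3, 4:8]    # shape (31, 3, 4)
```

Either request is just a cheap view into the stack; no data is copied until the image is actually rendered.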
The other beast is the model -- the model works on one unit of area at a time. For each unit of area, the model wants *all* the data for that specific unit. So, it is more like a drill core through a stack of images that plucks out a column of data. This column is made up of all the data layers for all the time periods for that particular unit of area. The model works on one unit of area, then moves on to the next, and the next, and so on.

The canonical data source is logically the same, but it is viewed differently by the two beasts. Sure, if I could create a single 3D stack of images, a few terabytes in size, and latch PDL onto its end like a little, hungry monster, that would be great. I could slice, dice, and range through it at will. But that is really not a feasible scenario. On top of that, I am going to have a lot of users coming in through a web interface, requesting either an image or wanting to run the model. The data are read-only, so they don't have to be locked. I have tried a db already, and brought it to its knees. I know, rather, I am convinced, albeit unscientifically, that PDL can conquer this. Now, I have to figure out how.

On Wed, Sep 1, 2010 at 11:03 AM, P Kishor <[email protected]> wrote:
> On Wed, Sep 1, 2010 at 10:43 AM, Ingo Schmid <[email protected]> wrote:
>> May I ask how many is a lot?
>
> There are two scenarios --
>
> In one, I already have the data, and it is a lot of data. While I
> don't know the exact numbers, my estimate is that they would weigh in
> somewhere around 800 GB. I would like to organize them in separate
> piddles (logically, each piddle would be one image) that would each weigh
> in at around 90 MB.
>
> Since, at any given time, I would be looking at only a small range in
> each piddle, it just seems a waste to be slogging through 90 MB of
> data. Not a problem in a one-off process, but in a web-based process
> with multiple users, that could easily become a hog.
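The drill-core view is the same hypothetical stack sliced the other way; again a numpy sketch with assumed shapes and a hypothetical model hook, standing in for slicing a PDL piddle along its time axis:

```python
import numpy as np

# Same hypothetical (time, row, col) stack; shapes are illustrative
stack = np.arange(365 * 6 * 8, dtype=np.float32).reshape(365, 6, 8)

# The model's view: a "drill core" that plucks out the column of data
# for one unit of area -- every layer, every time period
core = stack[:, 2, 5]             # shape (365,)

# The model then walks the landscape one unit of area at a time
for row in range(stack.shape[1]):
    for col in range(stack.shape[2]):
        column = stack[:, row, col]
        # run_model(column)       # hypothetical model call
```

The two beasts disagree only about which axis varies fastest, which is why a chunked or tiled layout that serves one pattern well tends to punish the other.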
>
>
> In the second scenario, the piddle size is actually tiny, but piddles
> are created on an ad hoc basis for different parts of the country.
> Once created, they are stored, so they are separate by nature of
> creation, not lumped together.
>
> So, either way, being able to identify the specific piddle required
> based on a meta-index would be a useful capability.
>
>> I process imaging data, i.e. stacks of 3D
>> images all concatenated into a 4D piddle. If you look for my previous
>> message(s) on this list, there is a limit of 2 GB for a single piddle and
>> some suggestions to patch Core.pm to push it further. If your memory is
>> large enough to hold the piddle, stick to it and enjoy slicing, dicing and
>> threading! If it's much bigger than memory, that's a different story.
>>
>> Ingo
>>
>> On 09/01/2010 05:27 PM, P Kishor wrote:
>>>
>>> (near future scenario) I have a lot of piddles covering contiguous
>>> rectangular areas. I could stitch them together, but then I would
>>> have one very large piddle. So, I leave them the way they are. The
>>> user supplies a pair of coordinate pairs which lets me identify the
>>> piddle we want, open it up, use range() to extract the
>>> area-of-interest (AOI), and analyze it. I can do the identification of
>>> the piddle either based on some naming scheme I can develop, or by
>>> storing some kind of area->to->name index. (Sidenote: of course, I can
>>> do this identification task with PostGIS/Postgres, or with
>>> SQLite+R*Tree, but I am hoping for an all-PDL solution, or rather, a
>>> NoSQL solution.) So, that is the first problem... a name->to->area
>>> index is easy, but an arbitrary_area->to->name index is difficult.
>>>
>>> Second problem -- what if the arbitrary_area->to->name index returns
>>> multiple piddles, as in, an AOI that overlaps several piddles? So,
>>> first, the arbitrary_area->to->name index should be able to return
>>> multiple piddles.
>>> Then, my program should be able to extract the
>>> various smaller regions from the identified piddles and glue (or
>>> append) them together into a piddle of the AOI, cache it temporarily,
>>> and do analysis on it.
>>>
>>> I am thinking... this could be done with some kind of quad-tree
>>> indexing scheme. Has this been done already? If not, suggestions on
>>> how to proceed with this would be most welcome.

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
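For the arbitrary_area->to->name problem discussed in the quoted message: if the piddles happen to tile the landscape on a regular grid, the index collapses to arithmetic on the AOI bounding box; a quad-tree or R*Tree generalizes the same idea to irregular tiles. A minimal sketch in Python (the tile size, the (i, j) naming scheme, and the integer map coordinates are all assumptions for illustration):

```python
# Hypothetical regular tiling: each stored piddle covers a TILE x TILE
# square of map units and is named by its grid position (i, j), so
# name->to->area is trivial and area->to->name is just division.
TILE = 100

def tiles_for_aoi(x0, y0, x1, y1):
    """Return the (i, j) names of every tile an AOI bounding box overlaps."""
    return [(i, j)
            for i in range(x0 // TILE, x1 // TILE + 1)
            for j in range(y0 // TILE, y1 // TILE + 1)]

# An AOI straddling the corner where four tiles meet:
print(tiles_for_aoi(90, 90, 110, 110))
# -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each returned name maps to a stored piddle; the AOI sub-windows would then be extracted from each (range() in PDL) and glued together into the cached AOI piddle described above.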
