The recent questions around nested (or sub-) views come at an opportune time for me, since I am about to generate a Metakit database for the "FactFinder" portion of my text mining platform. Each "fact" (a simplistic concept at this point) will be represented as a set of (generally 3) "components". Each component will have the same structure, containing such information as its position in the source, the ontology topic it immediately falls under, etc. The "facts view" will then consist of rows of facts.
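To make the shape of the data concrete, here is a minimal sketch in plain Python. It does not use the Metakit API; the class names, the `flatten` helper, and the sample values are all assumptions for illustration. It models a fact as a source id plus a list of identically structured components, and shows one way such a record could be spread across a single row of integer columns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    start: int  # start position in the source
    end: int    # end position in the source
    topic: int  # id of the ontology topic the component falls under


@dataclass
class Fact:
    source_id: int
    components: List[Component]  # generally three per fact


def flatten(fact: Fact) -> Dict[str, int]:
    """Spread one fact across a single row of integer columns."""
    row = {"sourceId": fact.source_id}
    for i, comp in enumerate(fact.components, start=1):
        row[f"comp_{i}_start"] = comp.start
        row[f"comp_{i}_end"] = comp.end
        row[f"comp_{i}_topic"] = comp.topic
    return row


# Made-up values, for illustration only.
facts = [
    Fact(1, [Component(0, 4, 10), Component(5, 9, 11), Component(10, 17, 12)]),
]
flat_rows = [flatten(f) for f in facts]
```

In this sketch, flattening costs one pass over the rows with a fixed amount of copying per fact. Whether Metakit's own handling of subviews behaves comparably is exactly the open question here, so this should not be read as a statement about Metakit's performance.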
A flattened representation would look something like this:

    [sourceId:I,comp_1_start:I,comp_1_end:I,comp_1_topic:I,comp_2_start:I,comp_2_end:I,comp_2_topic:I,comp_3_start:I,comp_3_end:I,comp_3_topic:I]

whereas a fairly natural and elegant nested approach would look like:

    [sourceId:I,comp_1:[start:I,end:I,topic:I],comp_2:[start:I,end:I,topic:I],comp_3:[start:I,end:I,topic:I]]

But what are the tradeoffs here? What does the nested approach get me other than a more appealing-looking structure (which no one will ever see)? The nested structure is certainly a bit easier and more natural to deal with, but functions and methods could be written to make the flat approach work about as well. Obviously, if I want or need to pass "components" around, then the nested approach is cleaner and easier.

I'm very drawn to the nested approach, but I'm concerned about performance if, for example, I need to loop through the rows and do checks and comparisons on values within the nested views. (For example, this view itself will be generated by running an FSA (an FST, actually) on a larger view of "topic occurrences", and a similar approach may need to be taken in order to extract further information out of the facts view.)

Or should I keep the fundamental facts view in the nested form and then flatten it for various kinds of processing? What is the overhead involved in flattening?

Any thoughts or insights will be appreciated.

--------------------------------------
Gary H. Merrill
Director and Principal Scientist, New Applications
Data Exploration Sciences
GlaxoSmithKline Inc.
(919) 483-8456
