Here are just a few observations.
What you have described is a classic taxonomy problem. The simplest approach
is to create a table to contain the basic attributes of the obtjects in the
database. Using your example, such attributes may be anatomical (two eyes, two
ears, four limbs, one backbone, &c.), ecological (categories of food items,
such as plant, animal, fungal, &c., temperature tolerances, &c.), conventional
taxonomic data (such as genus, family, class, phylum, &c.). You might even get
creative and choose to use additional novel or unconventional kinds of
attributes. And you could go so far as to add a dynamic classification scheme
based on conventional or novel numeric taxonomic algorithms, so the data itself
provides a "natural" classification hierarchy. Additional tables may provide
more useful predictive capability, such as a table representing empiric
observations (based either on field observation or analysis of gut contents) of
what eats what (e.g. on date x animal y was observed
eating z by Fred at location Q). Analysis of such a table could provide bases
for inferences such as "animal x eats mammals of the family" or "animal y eats
only insects, and all insects that will fit in its mouth". This avoids a need
for predefined relationships, and a suite of probable relationships can be
constructed directly from the basic data. It also provides support for a
measure of predictive capability. For example, if animal x has been observed
eating mule deer, you might predict that it may also eat white tail deer where
they coexist even if the animal has not been observed eating white tail deer.
I'd suggest avoiding the route involving defining very high dimensional
spaces. That gets unweldy very quickly, and you'll find that most of such high
dimensional spaces is empty.
The probability of observing grass eating a cow is identically 0. But the
probability of observing pitcher plants, sun dews, or venus fly traps eating
insects is identically 1, on the condition you make an effort to study
carnivorous plants. :-) The number of possible relationships between objects
increases as n! where n is the number of objects, but the number of kinds of
relationships will be small.
No one can tell you what the right approach is or what a better way to do
what you want to do might be, at least from what you have said so far. You
haven't said precisely what you want to accomplish. Is it simple information
management, to tell your users what they already know? Or perhaps it is a
research tool. Or maybe it is simple record keeping, and may rarely if ever be
used except for audit purposes.
The only way to provide the kind of answers you want, regarding a comparison
of options for managing and using your data, is to do a formal analysis of your
problem and how your users may want to use your product. You might want to
begin with a problem definition, from the perspective of a client or end user,
and then work up a few use case scenarios. And then, from that, you can work
up a prototype data model with an entity-relationship diagram (NB: these
relationships may or may not be related to the relationships you want to
capture from your data - there are different kinds of relationships just as
there are different kinds of models - don't you just love the fact natural
language is multivocal?). Until this homework is done, the best you'll likely
get are a few general observations of the sort I have offered (and I do so from
the perspective of an ecologist who happens to develop software). You will
want to ensure that any example you offer to illustrate what you
want to do is directly related to the actual problem you are trying to solve.
Otherwise the advice you get may be inappropriate. The data model I'd use for
a decision support tool for government officials responsible for environmental
protection (such as at the EPA in the US) will be quite different from what I'd
use for a university researcher, and both will be different from what I'd use
for a financial consultant or a human resources office. What you need to do
will be as dependant on what real world problem you're trying to solve as it is
on what data is available. A tool to support research in the life sciences
will probably not want to store relationships among living things explicitly,
preferring to infer them from data analysis, but a tool for a government
environmental regulator will instead likely want to store them explicitly (e.g.
as input to simulation models used to assess dose and risk issues).
Why wait for quantum computers? Don't sell the computers available today
short. You can run mesoscale weather models on a modern desktop (the ones that
used to run on mainframes only a few years ago, not the highly detailed ones
used on mainframes now). I have done ecosystem scale modeling using desktop
machines, even those available ten years ago. Yes, I'd love to have more
computing power at my disposal, but that doesn't stop me from doing useful
stuff now! Modern ecological models are, at present, much more limited by the
availability of good data, both for parameterizing models and for validating
models, than they are by available computing resources.
HTH
Ted
Lincoln Yeoh <[EMAIL PROTECTED]> wrote:
Hi,
Sorry, this really isn't postgresql specific, but I figure there are
lots of smarter people around here.
Say I have lots of different objects (thousands or even millions?).
Example: cow, grass, tiger, goat, fish, penguin.
BUT I'm not so interested in defining things by linking them to
categories or giving them names, I'm trying to figure out a way to
define things by their relationships with other things, and more
importantly do searches and other processing by those relationships.
So, what would be the best way to store them so that a search for the
relationship like "grass is to cow", will also turn up cow is to
tiger, and goat is to tiger, and fish is to penguin (and penguin is
to bigger fish ;) ), and electricity is to computer. And a search for
cow is to goat, could turn up tiger is to lion, and goat is to cow.
Is the only way to store all the links explicitly? e.g. have a huge
link table storing stuff like obj => cow, subj => grass, type =>
consumes, probability=90% ( => means points/links to). Or even just
have one table (links are objects too).
Or is it possible to somehow put the objects in a multidimensional
space (1000 dimensions?) and keep trying to arrange the objects so
that their relationships/vectors with/from each other are fairly
consistent/reasonable based on "current knowledge"? Trouble is in
some cases the grass eventually eats the cow, so maybe that doesn't
work at all ;).
Or even do both? Maybe use the first as a cache, and the second for
"deeper" stuff ("flash of insight" or "got the punchline" = figure
out better arrangement/ joining of disparate maps).
My worry about the first approach is that the number of links might
go up very much faster as you add more objects. But perhaps this
won't be true in practice. The worry about the second approach is
that it might get "stuck", or run out of dimensions.
Is there a better way to do this? There must be right?
Wait for huge quantum computers and use qubits for each
multidimensional coordinate? ;).
Regards,
Link.
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings