Dear Boost friends, > was unwittingly working with apples and oranges. Correct me if I'm wrong, > but it really feels like the two containers are a scalar data type (ignoring > the attached associative data structures) and a metric (or "distance" > function). "Metric" is already a much more generic term than "distance" or > even "spatial," although related. I don't have a problem with the word > "metric," (although I suspect it's a term more familiar to physicists or > topologists than to programmers), but I'm damned if I can think of a good, > simple term for the first data type. "Distance" is certainly one > possibility, although it carries a lot of denotative freight that most > programmers wouldn't be aware of; it has common meaning, which wouldn't > cover all the uses of this data object, and yet is really a pretty abstract > topological concept (again, as is "metric"). Anything mensurable can have a > metric associated with it; what then, do you call the "distance" the metric > is measuring? Especially if it isn't a "distance" in the conventional > sense? >
The discussion about terminology is really useful, because we already thought a lot about it and we know there is room to improve it. When we talk about spatial containers we think of containers whose elements have keys which belong to the Cartesian product D = D1 x D2 x D3 x ... x DK for some fixed value K (the "dimension") and for totally ordered domains D1, D2, ..., DK. The typical example of course is D1 = D2 = ... = DK = R (the real numbers). Obviously you can easily endow this space with a metric (= distance), but it is not clear how to do it for other domains D. For instance, D = Employee with D1 = name, D2 = salary, and D3 = age. For spatial containers, the typical queries are orthogonal range queries and partial matches (example: list all employees that are 23 years old, how many employees earn more than 50,000 USD whose name starts with "A"). Indeed the name "spatial" is not very fortunate (as Reid points out, in the mathematical sense a space has a norm (=metric) by definition). The term "spatial containers" is perhaps confusing (and definitively not rigorous!), but we decided on that name because of Samet's well known textbook "The Design and Analysis of Spatial Data Structures". We use the term "metric container" when the elements belong to a type T for which we have a distance function (well, you pass it via a template parameter which is function object). The function must satisfy the usual properties of distances, in particular, the triangular inequality. This inequality is systematically exploited to speed up nearest neighbor searches: given an object x, find the object closest to x in the container according to the given distance. The terms "metric" and "(dis)similarity measure" have been often used in the literature about algorithms and data structures for the nearest neighbor and proximity search, and we finally decided on that name. Conclusion: Spatial containers support range queries and its elements are K-tuples. Metric containers support nearest neighbor queries and there is a measure of similarity or distance between elements. Of course there are case for which both types of queries make sense (so spatial containers can support nearest neighbor searches if a metric is supplied; but the metric is not required at construction time, whereas the metric is a must when you create a metric container). Last but not least I must confess that "Spatial and Metric" made for a nice acronym: SMTL ;-) Conrado _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost