On Dec 26, 2008, at 7:24 PM, Philip Hunt wrote:
> 2008/12/27 J. Andrew Rogers <and...@ceruleansystems.com>:
>> I think many people greatly underestimate how many gaping algorithm
>> holes there are in computer science for even the most important and
>> mundane tasks. The algorithm coverage of computer science is
>> woefully incomplete,
> Is it? In all my time as a programmer, it's never occurred to me to
> think "I wish there was an algorithm to do X". Maybe that's just me.
> And there are vast numbers of useful algorithms that people use every
> day.
Computers are general, so there always exists an obvious algorithm for
doing any particular task. Whether or not that obvious algorithm is
efficient is quite another thing, since the real costs of various
algorithms are far from equivalent even if their functionality is.
The Sieve of Eratosthenes will let you factor any integer in theory,
but for non-trivial integers you will want to use a number field
sieve. The limitations of many types of software are fundamentally
rooted in the complexity class of the algorithms they use. We
frequently conflate "theoretically impossible" with "no tractable
algorithm currently exists".
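To make the tractability point concrete, here is a minimal sketch of my own (not from the thread): the sieve enumerates primes, and trial division against them factors any integer correctly, yet the cost grows exponentially in the bit length of the input.

```python
def sieve(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

def factor(n):
    """Factor n by trial division against sieved primes.
    Correct for any n, but the work scales with sqrt(n), i.e.
    exponentially in the bit length -- 'obvious' is not 'tractable'."""
    factors = []
    for p in sieve(int(n ** 0.5) + 1):
        while n % p == 0:
            factors.append(p)
            n //= p
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors
```

A number field sieve replaces this brute force with sub-exponential running time, which is exactly the kind of gap between the obvious algorithm and the tractable one the paragraph above describes.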
> I wonder (thinking out loud here) are there any statistics for this?
> For example if you plot the number of such algorithms that've been
> found over time, what sort of curve would you get? (Of course, you'd
> have to define "general, elegant algorithm for basic problem", which
> might be tricky)
I am still surprised often enough that it is obvious a considerable
amount of innovation is still being done. It both amuses and annoys
me no end that some common algorithms have design characteristics
reflecting long-forgotten assumptions that no longer make sense in
the contexts where they are used, e.g. the compulsive tree-balancing
behavior of intrinsically unbalanced data structures.
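The balancing habit does come from a real pathology, which makes the point about workload assumptions easy to illustrate. A toy example of mine (not from the thread): a plain unbalanced binary search tree degenerates into a linked list under sorted input, yet stays shallow under random input with no rebalancing at all -- whether balancing machinery earns its keep depends entirely on the insertion pattern.

```python
import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Plain unbalanced BST insert, iterative to avoid recursion limits.
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root

def depth(root):
    # Iterative max-depth computation.
    best, stack = 0, ([(root, 1)] if root else [])
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        if node.left:
            stack.append((node.left, d + 1))
        if node.right:
            stack.append((node.right, d + 1))
    return best

# Sorted (e.g. append-only) input turns the tree into a list of depth n;
# random input keeps it around O(log n) deep with no balancing at all.
sorted_tree = None
for k in range(500):
    sorted_tree = insert(sorted_tree, k)

random.seed(1)
keys = list(range(500))
random.shuffle(keys)
random_tree = None
for k in keys:
    random_tree = insert(random_tree, k)
```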
>> In short, we have no idea what important and fundamental algorithms
>> will be discovered from one year to the next that change the
>> boundaries of what is practically possible with computer science.
> Is this true? It doesn't seem right to me. AIUI the current state of
> the art in operating systems, compilers, garbage collectors, etc is
> only slightly more efficient than it was 10 or 20 years ago. (In fact,
> most practical programs are a good deal less efficient, because faster
> processors mean they don't have to be).
It is easy to forget how many basic algorithms we use ubiquitously are
relatively recent. The concurrent B-tree algorithm that is
pervasively used in databases, file systems, and just about everything
else was published in the 1980s. In fact, most of the algorithms that
make up a modern SQL database as we understand them were developed in
the 1980s, even though the relational model goes back to the 1960s.
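The concurrent B-tree algorithm mentioned here is, as far as I can tell, the Lehman–Yao B-link tree (1981) -- that attribution is mine, not the author's. Its central trick is small: every node carries a high key and a right-sibling link, so a reader that lands on a node mid-split simply chases the link instead of locking the parent or restarting from the root. A leaf-only sketch of the search rule (mine, hypothetical names):

```python
class BLinkNode:
    """Leaf-only sketch: sorted keys, a high key, and a right-sibling link."""
    def __init__(self, keys, high_key, right=None):
        self.keys = keys          # sorted keys stored in this node
        self.high_key = high_key  # upper bound of this node's key range
        self.right = right        # right sibling (the 'link')

def blink_search(node, key):
    # The B-link rule: if the key exceeds the node's high key, a
    # concurrent split must have moved it to the right -- chase the
    # sibling link instead of holding locks up the tree.
    while node.high_key is not None and key > node.high_key:
        node = node.right
    return key in node.keys
```

The reason this matters for databases is that it lets readers proceed with essentially no lock coupling, which is why variants of it are still the default B-tree design in production systems.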
> I don't think I understand you. To me "indexing" means what the Google
> search engine or an SQL database does -- but you're using the word
> with a different meaning aren't you?
I mean it exactly like you understand it. Indexed access methods and
representations.
> Sorry, you've lost me again -- I've never heard of the term
> "hyper-rectangles" in relation to relational databases.
Most people haven't, because there are no hyper-rectangles in
relational database *implementations* seeing as how there are no
useful algorithms for representing them. Nonetheless, the underlying
model describes operations using hyper-rectangles in high-dimensional
spaces.
In an ideal relational implementation there are never external
indexes, only data organized in its native high-dimensionality logical
space, since external indexes are a de-normalization.
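The hyper-rectangle view can be made concrete with a small illustration of my own (not from the thread): a conjunction of range predicates over k columns is an axis-aligned box in k dimensions, evaluating the predicate against a row is point-in-box, and two such predicates can both match some row iff their boxes intersect.

```python
# A box is a list of per-dimension (lo, hi) intervals.

def contains(box, point):
    """A conjunctive range predicate, e.g.
    WHERE 10 <= age AND age <= 30 AND 50 <= salary AND salary <= 90,
    is membership in an axis-aligned hyper-rectangle."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))

def intersects(a, b):
    """Two range predicates overlap iff their boxes overlap
    in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))
```

An index that answered `intersects` queries directly over such boxes, in any number of dimensions, is the "general algorithm for indexing hyper-rectangles" the thread is discussing.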
>> It is not because it is theoretically impossible, but because it is
>> only possible if someone discovers a general algorithm for indexing
>> hyper-rectangles -- faking it is not distributable.
> How do we know that there is such an algorithm?
We don't, unless someone publishes one, but there is a lot of evidence
suggesting that one exists, and that evidence also shows that much of
the research done so far was misdirected. Aesthetically, the current
algorithms for doing this are nasty ugly hacks, and that lack of
elegance is often an indicator that a better way exists.
In the specific case of indexing hyper-rectangles, the first basic
algorithm was published in 1971 (IIRC), but it was supplanted by a
completely different family of algorithms in 1981. Virtually all
research since has been based on derivatives of the 1981 algorithm,
since it appeared to have better properties. Unfortunately, we can now
prove that this algorithm class can never yield a general solution,
and that a solution must look like a variant of the original 1971
algorithm family that has been ignored for a quarter century.
Interestingly, the proof of this comes by way of the recent explosion
of research on massively concurrent data structures driven by the
proliferation of multi-core CPUs. Applied to this particular problem,
those results show both that we were doing it wrong and give a lot of
hints as to how to do it right, suggesting a completely novel type of
algorithm that has never been tried.
You have to understand that until a couple years ago, no one was doing
useful research on this particular problem despite the fact that many,
many people had done endless quantities of research trying to solve it
down an apparent dead-end path. Computer science is full of this kind
of thing, and the elegant solutions only look obvious in hindsight.
Cheers,
J. Andrew Rogers
-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/