On 03/11/11 13:19, Paolo Castagna wrote:
nat lu wrote:
> Impressive work, was wondering how hadoop might fit into the hadoop
> picture.
Hi,
can you clarify what you mean by "how hadoop might fit into the
hadoop picture"?
tdbloader3 is just one very specific (and probably not trivial)
MapReduce job/use case. So, it's a practical use case for Hadoop.
Building dictionaries or B+Trees is one of the good examples where
MapReduce does not fit perfectly (although there can be
significant advantages in making your algorithm fit the
MapReduce paradigm). PageRank (I know from experience) is
another good example where MapReduce isn't ideal. :-)
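To make the PageRank point concrete, here is a toy sketch (not from
the original thread; graph, damping factor and iteration count are all
invented for illustration) of one PageRank round expressed as a map
phase and a reduce phase. The awkwardness is visible in the final loop:
every iteration is a complete MapReduce job, and a real Hadoop cluster
would re-read and re-write the whole graph from disk each time round.

```python
# Sketch only: one PageRank iteration as map/reduce phases.
from collections import defaultdict

def map_phase(graph, ranks):
    """Emit (neighbour, rank share) pairs for every outgoing link."""
    for node, neighbours in graph.items():
        share = ranks[node] / len(neighbours) if neighbours else 0.0
        for n in neighbours:
            yield n, share

def reduce_phase(pairs, nodes, damping=0.85):
    """Sum the incoming shares per node and apply the damping factor."""
    incoming = defaultdict(float)
    for node, share in pairs:
        incoming[node] += share
    return {n: (1 - damping) / len(nodes) + damping * incoming[n]
            for n in nodes}

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # toy 3-node graph
ranks = {n: 1 / len(graph) for n in graph}
for _ in range(20):            # each loop = one full MapReduce job
    ranks = reduce_phase(map_phase(graph, ranks), graph.keys())
```

Iterative algorithms like this are exactly where the "one job per
iteration" cost of MapReduce shows up.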
As far as I am aware, there are no public examples of people using
MapReduce to build B+Tree indexes. It's not rocket science, though,
since mainly you need to sort stuff, and Hadoop/MapReduce is great
at sorting stuff.
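A small sketch of why sorting is the hard part: once the records arrive
in key order (which Hadoop's shuffle/sort phase gives you for free), the
B+Tree leaf pages can be written out sequentially and the inner levels
built bottom-up in a single pass. The page size, fanout and the toy
s/p/o keys below are invented for the example and have nothing to do
with TDB's actual on-disk format.

```python
# Illustrative bottom-up bulk build of a B+Tree-like structure
# from already-sorted keys.
def build_btree_levels(sorted_keys, fanout=4):
    """Pack sorted keys into leaf pages, then build index levels up."""
    level = [sorted_keys[i:i + fanout]
             for i in range(0, len(sorted_keys), fanout)]
    levels = [level]                       # levels[0] = leaf pages
    while len(level) > 1:
        # each inner node stores the first key of each child page
        level = [[page[0] for page in level[i:i + fanout]]
                 for i in range(0, len(level), fanout)]
        levels.append(level)
    return levels                          # levels[-1] = the root

keys = sorted(["s3", "p1", "o7", "s1", "p2", "o2", "s2", "p9", "o5"])
levels = build_btree_levels(keys, fanout=2)
```

The sort is the expensive, distributable step; the build itself is a
cheap sequential pass, which is why MapReduce's shuffle does most of
the work for you.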
> Any thoughts on doing this with SDB ?
One reason why something such as tdbloader3 is possible is that,
with the source code available, one can see the binary format of
the TDB indexes.
While in theory it would be possible to do something similar with some
of the open source DBMSs supported by SDB, I do not see how
a similar approach could be employed with closed source DBMSs.
Many databases have a "dump" format which can be loaded faster than
blocks of INSERT statements.
e.g.
http://dev.mysql.com/doc/refman/5.5/en/load-data.html
http://dev.mysql.com/doc/refman/5.5/en/insert-speed.html
"""
When loading a table from a text file, use LOAD DATA INFILE. This is
usually 20 times faster than using INSERT statements.
"""
Andy