On Sun, 2005-08-21 at 20:33 +0200, Dirk Meyer wrote:
> You both do different stuff.

That's what I thought too.  But I wasn't sure if I misunderstood his
test. :)

> Question is: what do we want? In 90% of all cases we want a directory
> listing, not just one file. This means Tacks wins. On the other hand,
> my idea with one select to get basic file attributes and after that
> one select for each file could also work. But maybe not. 

One select for each file would be absolutely painful.  To do what you
suggest, it'd be better to do this:

   SELECT * FROM simple_file_list WHERE dir_id=X;

   ... display simple dir list in UI ...

   SELECT * FROM complex_file_list WHERE file_id IN
      (SELECT file_id FROM simple_file_list WHERE dir_id=X);
  
   ... update UI with detailed metadata ...

I really don't think this approach is going to win big.
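
For reference, here is a minimal sketch of that two-step listing in Python
with sqlite3.  Only the two table names come from the queries above; the
columns (name, artist), the sample rows and the list_dir() helper are made
up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical schema for the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE simple_file_list (
        file_id INTEGER PRIMARY KEY, dir_id INTEGER, name TEXT);
    CREATE TABLE complex_file_list (
        file_id INTEGER PRIMARY KEY, artist TEXT);
    INSERT INTO simple_file_list VALUES (1, 7, 'a.mp3');
    INSERT INTO simple_file_list VALUES (2, 7, 'b.mp3');
    INSERT INTO simple_file_list VALUES (3, 8, 'c.mp3');
    INSERT INTO complex_file_list VALUES (1, 'X');
    INSERT INTO complex_file_list VALUES (2, 'Y');
    INSERT INTO complex_file_list VALUES (3, 'Z');
""")


def list_dir(conn, dir_id):
    """Run the cheap listing query, then one query for all the details."""
    # Step 1: basic attributes, enough to draw a simple list in the UI.
    simple = conn.execute(
        "SELECT file_id, name FROM simple_file_list "
        "WHERE dir_id=? ORDER BY file_id",
        (dir_id,),
    ).fetchall()
    # Step 2: detailed metadata for the same files, fetched in a single
    # query through a subselect rather than one select per file.
    detailed = conn.execute(
        "SELECT file_id, artist FROM complex_file_list "
        "WHERE file_id IN "
        "(SELECT file_id FROM simple_file_list WHERE dir_id=?) "
        "ORDER BY file_id",
        (dir_id,),
    ).fetchall()
    return simple, detailed


simple, detailed = list_dir(conn, 7)
```

Either way this is two round trips per directory, not one per file.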

My approach gets all the data into python space up front.  Obviously
this will be slower than selecting just a few columns from
simple_file_list, but I don't think the practical difference makes it
worth caring.  What we _can_ do is delay processing the row data into a
more manageable form (i.e. unpickling the pickle column and moving the
row data into a dict that's easier to work with) until after a simple
file list is displayed in the UI.  I think most of the big wins will be
these kinds of tricks, because working with python's data structures is
slow.
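
A rough sketch of what I mean by delaying the processing.  The LazyRow
class and the (name, blob) row layout are assumptions for illustration,
not the actual DBOverlord code.

```python
import pickle


class LazyRow:
    """Hold a raw row and unpickle the extra attributes only when asked.

    Hypothetical helper: assumes each row is (name, pickled dict).
    """

    def __init__(self, name, blob):
        self.name = name      # plain column, usable right away
        self._blob = blob     # pickled dict, left untouched for now
        self._attrs = None    # filled in on first access to .attrs

    @property
    def attrs(self):
        # Pay the unpickling cost once, and only for rows that need it.
        if self._attrs is None:
            self._attrs = pickle.loads(self._blob)
        return self._attrs


# Stand-in for rows returned by the database.
rows = [
    ("a.mp3", pickle.dumps({"artist": "X", "length": 200})),
    ("b.jpg", pickle.dumps({"width": 640, "height": 480})),
]

listing = [LazyRow(name, blob) for name, blob in rows]

# The simple file list needs only the names, so nothing is unpickled yet.
names = [row.name for row in listing]
```

The UI can show `names` immediately and touch `row.attrs` later, when the
detailed view for a file is actually needed.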

> that can handle everything. Maybe we make it very static. The table
> file contains everything we may need to do a query on and a pickled
> field for the rest. So we will always have album and artist, even for
> images. While this still makes sense, length for images and width for
> audio is stupid. But do we care? One table to hold everything sounds
> not so bad to me.

It's true that using a separate table for each file type adds complexity
to the design.  It's also true that using a single table with a separate
row for each metadata attribute is the slowest possible approach.  The
fastest possible approach would be to use a single table for all files,
one row per file, and store all file-type-specific data in a pickle
column.  But then we lose the ability to search on this stuff.  The
whole point of using a database is so that we can query.

I'm fairly confident that the way I did DBOverlord gives us the best
compromise.  We get the flexibility of adding new attributes at run-time.
We get the option to be able to perform queries on these attributes if
we want.  If the attributes aren't very interesting, then we can store
them as ATTR_SIMPLE, which means they're just an entry in a pickled
dictionary.
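
To make the split concrete, here is a small sketch of the idea using
sqlite3.  Only the name ATTR_SIMPLE is taken from the above; the
ATTR_SEARCHABLE flag, the ATTRS registry, the audio table and both helper
functions are hypothetical and do not reflect the real DBOverlord
interface.

```python
import pickle
import sqlite3

# Hypothetical flags; only the name ATTR_SIMPLE appears in the discussion.
ATTR_SIMPLE = "simple"
ATTR_SEARCHABLE = "searchable"

# Hypothetical registry saying how each attribute is stored.
ATTRS = {
    "artist": ATTR_SEARCHABLE,
    "album": ATTR_SEARCHABLE,
    "bitrate": ATTR_SIMPLE,
}

conn = sqlite3.connect(":memory:")
# Searchable attributes get real columns; everything else shares one blob.
conn.execute(
    "CREATE TABLE audio (name TEXT, artist TEXT, album TEXT, pickle BLOB)"
)


def add_file(conn, name, **attrs):
    """Store searchable attributes as columns and the rest in the pickle."""
    simple = {k: v for k, v in attrs.items() if ATTRS[k] == ATTR_SIMPLE}
    conn.execute(
        "INSERT INTO audio (name, artist, album, pickle) VALUES (?, ?, ?, ?)",
        (name, attrs.get("artist"), attrs.get("album"), pickle.dumps(simple)),
    )


def query_by_artist(conn, artist):
    """Query on a searchable column and merge the pickled attributes back."""
    results = []
    cursor = conn.execute(
        "SELECT name, artist, album, pickle FROM audio WHERE artist=?",
        (artist,),
    )
    for name, row_artist, album, blob in cursor:
        row = {"name": name, "artist": row_artist, "album": album}
        row.update(pickle.loads(blob))
        results.append(row)
    return results


add_file(conn, "a.mp3", artist="X", album="First", bitrate=192)
add_file(conn, "b.mp3", artist="Y", album="Second", bitrate=128)
found = query_by_artist(conn, "X")
```

In this sketch bitrate comes back with every row but cannot appear in a
WHERE clause, which is the trade-off an ATTR_SIMPLE attribute accepts.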

I think Martijn has very grandiose ideas for what mediadb should be.  I
think he wants the user to be able to add his own attributes to files,
with the ability to construct arbitrarily complex queries on these
attributes.  So he is advocating the single table, one-row-per-attribute
approach which makes this easy to implement.  But it is slow.  And
anyway, DBOverlord already lets the user add custom attributes and
construct queries on these attributes.  The main difference, though, is
that a decision has to be made about what kind of attribute it will be.
Martijn's approach means all attributes are searchable.  Mine means that
they're only searchable if you want them to be.  This is the key
compromise in my approach.  But this is a solvable problem in the UI.

Jason.
