Jason Tackaberry wrote:
>> Correct, still thinking about it. But sending data from process to
>> process also takes time. You think it is fast, but think of 10k of
>> files. 
>
> Well, fast is relative.  10k files will take about 0.1-0.2 seconds using
> my IPC class on my system.  That's not exactly _fast_ but we also need
> to be realistic: who is going to have a single directory of 10k files?

That's ok. I guess my 2k photos dir is one of the biggest we will ever
have. And I also need to sort it into subdirs to make it easier for me
to search for something. It is only one folder because it is a good
speed test.

>> don't need to add that to every file without one). Inside a dict of
>> files with the metadata. To keep unpickle fast, some stuff like the
>> above cover is only stored when it is different from the directory
>
> I find images are much faster to load directly from disk as PNGs.  I
> remember doing a test and found loading a pickled image slower.

Yes, it is. I only remember the cover filename. E.g. you enter /foo and
there is a cover.png in it, I store that in the pickle. That way, a
file can find its cover by checking 'cover' in its dict and, if not
found, using the directory cover. The image itself isn't pickled
anymore, it is only in the thumbnail dir.
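
A minimal sketch of that cover lookup, assuming a per-directory dict
with a 'cover' entry and a 'files' dict. The layout and the name
find_cover are made up for illustration, not taken from the actual
mediadb code:

```python
# Hypothetical layout of one directory entry as it might be pickled.
# Only filenames are stored; the images themselves stay on disk.
directory = {
    "cover": "/foo/cover.png",  # directory-wide cover filename
    "files": {
        "a.mp3": {"title": "A"},                         # no cover of its own
        "b.mp3": {"title": "B", "cover": "/foo/b.png"},  # differs from the dir
    },
}


def find_cover(directory, filename):
    """Return the file's own cover if stored, else the directory cover."""
    metadata = directory["files"][filename]
    return metadata.get("cover", directory.get("cover"))
```

With this layout the per-file dict stays small because the common case
(same cover as the directory) stores nothing at all.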

> A pickled dir of 6MB is very small.  My media collection is also small,
> but I want to design MeBox so that it scales well and performs properly
> on a media collection of around 100k files and directory sizes up to 10k
> files.  I've shown in my last email that approach #1 is feasible
> time-wise, but I did completely ignore memory requirements (which I
> realized after I sent it).

6 MB pickled is for about 20k files here.

> In the tests I attached in my last email, the pickle approach has a
> memory requirement of 63MB and the sqlite approach uses 2MB.  That's a
> huge difference.  More importantly, the amount of data stored was by no
> means complete.  We can expect much more to be held in the dictionaries.
> So for large collections, the in-memory query used in #1 just doesn't
> scale memory-wise.
>
> So perhaps sqlite is the way to go after all.  The design is
> complicated, but it does scale better.

Yes. But I still prefer the pickle way of handling things. I only
have to find a good way to create an index.

>> That also was a big problem for me as you can see in my WIP mediadb
>> test. Even worse: you don't know what entries a table has for a
>> type. A plugin may want to store something to the video table and you
>> don't know the variable while you write the vfs.
>
> This is easily solved in design.  A plugin that provides support for a
> new file type will register with the vfs: name of type, name of database
> table, tuple of fields in database table, tuple of supported extensions,
> function to index a file, etc.  As an implementation detail, you can
> require that all fields have unique names across all tables (say, each
> field is prefixed uniquely), and then put each field into a dict,
> mapping to a table name.

I'm not sure I understand. Freevo will register video with the
db. After that, the resume plugin wants to store the current
position. How do I add that to the table? One idea would be that it is
a field I can't search for, and each table has a field 'extra data'
which is a pickled string.
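
A rough sketch of the 'extra data' idea using sqlite3 and pickle from
the standard library. The table layout, the column name extra_data and
the helpers set_extra/get_extra are assumptions for illustration only:

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# Searchable fields are real columns; everything else goes into one
# pickled blob that the database cannot search on.
conn.execute(
    "CREATE TABLE video (filename TEXT PRIMARY KEY, title TEXT, extra_data BLOB)"
)


def set_extra(conn, filename, key, value):
    """Store a plugin-specific value in the pickled extra_data column."""
    row = conn.execute(
        "SELECT extra_data FROM video WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        raise KeyError(filename)
    extra = pickle.loads(row[0]) if row[0] else {}
    extra[key] = value
    conn.execute(
        "UPDATE video SET extra_data = ? WHERE filename = ?",
        (pickle.dumps(extra), filename),
    )


def get_extra(conn, filename, key, default=None):
    """Read a plugin-specific value back, or return default if unset."""
    row = conn.execute(
        "SELECT extra_data FROM video WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None or row[0] is None:
        return default
    return pickle.loads(row[0]).get(key, default)


# The vfs registers the file; later a (hypothetical) resume plugin adds
# its own value without the table needing a new column.
conn.execute(
    "INSERT INTO video (filename, title) VALUES (?, ?)",
    ("/video/clip.avi", "Clip"),
)
set_extra(conn, "/video/clip.avi", "resume_position", 1234)
```

The trade-off is the one named above: anything inside the blob cannot
be used in a WHERE clause.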

> So when you get a query, you parse it to see what fields are referenced.
> Then you look up in the field-to-table dictionary and get a list of what
> tables you need to query on.  Then you run the query on each of these
> tables, making sure to remove references to field names that don't exist
> on the current table.  You'd need some logic to construct the
> appropriate intersection or union (depending if it's an AND or OR) on
> the result sets.
>
> It's workable.

And slow. Each query needs time.
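
To make the cost visible, here is a small sketch of the field-to-table
lookup described in the quoted text. The names register_type and
tables_for_query and the prefixed field names are hypothetical. The
point is only that a query touching fields from N tables turns into N
separate SELECTs:

```python
FIELD_TO_TABLE = {}


def register_type(table, fields):
    """Record which table owns each (uniquely named) field."""
    for field in fields:
        if field in FIELD_TO_TABLE:
            raise ValueError("field %r already registered" % field)
        FIELD_TO_TABLE[field] = table


def tables_for_query(conditions):
    """Group query conditions (field -> value) by the table owning them."""
    per_table = {}
    for field, value in conditions.items():
        per_table.setdefault(FIELD_TO_TABLE[field], {})[field] = value
    return per_table


register_type("video", ("video_title", "video_length"))
register_type("audio", ("audio_artist", "audio_album"))

# One entry per table means one SELECT per table, plus the merge of the
# result sets afterwards.
plan = tables_for_query({"video_title": "Cuba", "audio_artist": "Buena Vista"})
```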

> It may make sense to not have a general purpose files table, and
> duplicate the fields (like filename, modification time, file size, etc.)
> in each type-specific table.  Or perhaps not, because then in the common
> case where the user enters a directory, we need to search all the media
> tables for that directory.  On the other hand, if we don't allow mixed
> media types when browsing, you only need to select on one type.  This
> may make most sense both in interface and database design.  Did that
> make any sense? :)

No. We allow more than one type in Freevo. E.g. you can activate
'audio' and 'video' in the audio menu. There is no difference for the
user between an mp3 file and a video clip of the music. I would also
like to add video to images because I have some video clips from the
camera which should be shown beside the photos. Maybe I want a
vacation mediamenu. A vacation contains all files belonging together:
images, videos and mp3 files (I'm thinking of my Cuba vacation right
now, the music belongs to the images).

>> Aha, so a dict isn't even slower. But 100k files may be a small
>> system, I guess some people have more files. But I get the point. 
>
> I wonder who would have more than 100k files?  I suppose it's possible.
> We want to make sure things scale properly.  I mean, if a user has a
> collection of millions of files, he should own a Cray :)

OK, so let's say 100k is a good idea.

>> I did similar tests some weeks ago and came to the same conclusion. 
>
> Conclusions change. :)

Yes, and I'm switching from pickle to sqlite and back and to sqlite
again. Right now I prefer pickle and I'm coding that way. But I make
sure the db stuff can be changed very easily. Maybe I will support both
in some weeks to compare them.
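
One way to keep the storage swappable is to hide both behind the same
small interface. This is only a sketch under that assumption. The
class names PickleBackend and SqliteBackend and the set/get/commit
methods are invented here and are not the real Freevo code:

```python
import os
import pickle
import sqlite3


class PickleBackend:
    """Keeps everything in one dict and pickles it to a file on commit."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.data = pickle.load(f)

    def set(self, filename, metadata):
        self.data[filename] = metadata

    def get(self, filename):
        return self.data.get(filename)

    def commit(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.data, f, pickle.HIGHEST_PROTOCOL)


class SqliteBackend:
    """Stores one row per file; metadata is kept as a pickled blob."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(filename TEXT PRIMARY KEY, metadata BLOB)"
        )

    def set(self, filename, metadata):
        self.conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?)",
            (filename, pickle.dumps(metadata)),
        )

    def get(self, filename):
        row = self.conn.execute(
            "SELECT metadata FROM files WHERE filename = ?", (filename,)
        ).fetchone()
        return pickle.loads(row[0]) if row else None

    def commit(self):
        self.conn.commit()
```

Code that only calls set, get and commit does not care which of the
two it is given, which would make a side-by-side comparison easy.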

>> I hope so. As I wrote on IRC I have a test version here with creating
>> metadata in the background. Very nice and very fast.
>
> Yeah, it makes all the difference, eh? :)

Yes. Without metadata for the 2k files the old idea takes a long time
until I see the menu. Now it's like the data is already stored. OK,
for my trailer dir, the titles are wrong at first because the nice
title is stored inside the file.


Dischi

-- 
The only problem with mornings is that they happen too early in the
day.
