[Freevo-devel] Re: improve thumbnailing

Dirk Meyer Thu, 21 Apr 2005 00:31:37 -0700

Jason Tackaberry wrote:
>  But, as you say, mmpython/epeg/mplayer/whatever is slow, so
> relative to the overall indexing time, the difference in performance
> between pickle and sqlite is epsilon.


True. Maybe it is possible to merge both ideas. Keep two
databases. One pickle for fast directory listings and one sqlite for
vfs listings on Artist/Album, etc. Maybe not, I'm not sure how to do
things. 

>> No, I had a field dirname in each record. So I selected by this. Take
>> a look at WIP/Dischi/mediadb for the code.
>
> Our approach for table design is somewhat different.  I have a files
> table which holds common data (basically stuff you get from stat) for
> all files that are interesting (i.e. videos, images, audio) where each
> file has a unique, integer id.  Then for each of the media-specific
> tables, the file id is a foreign key reference to the files table.
> (Well, sqlite doesn't enforce referential integrity of course, but the
> design is still useful.)

Can you send me the table design? My knowledge of sql is very limited,
I wrote the stuff by reading the w3 sql howto. 
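
A minimal sketch of the layout described in the quote above, using Python's sqlite3 module: one files table with stat-like data and an integer id, plus a media-specific table that references it. All table names, column names, and helper functions here are assumptions for illustration, not the actual MeBox or Freevo schema.

```python
import sqlite3


def create_schema(conn):
    """Create a shared files table plus one media-specific table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id      INTEGER PRIMARY KEY,
            dirname TEXT NOT NULL,
            name    TEXT NOT NULL,
            size    INTEGER,
            mtime   INTEGER,
            UNIQUE (dirname, name)
        );
        CREATE TABLE IF NOT EXISTS audio (
            file_id INTEGER PRIMARY KEY REFERENCES files(id),
            artist  TEXT,
            album   TEXT,
            title   TEXT
        );
    """)


def add_audio(conn, dirname, name, size, mtime, artist, album, title):
    """Insert the common row first, then the audio row keyed by the new file id."""
    cur = conn.execute(
        "INSERT INTO files (dirname, name, size, mtime) VALUES (?, ?, ?, ?)",
        (dirname, name, size, mtime))
    file_id = cur.lastrowid
    conn.execute(
        "INSERT INTO audio (file_id, artist, album, title) VALUES (?, ?, ?, ?)",
        (file_id, artist, album, title))
    return file_id


def list_directory(conn, dirname):
    """Directory listing: select by dirname and join in the audio metadata."""
    rows = conn.execute(
        "SELECT f.name, a.artist, a.title FROM files f "
        "LEFT JOIN audio a ON a.file_id = f.id "
        "WHERE f.dirname = ? ORDER BY f.name", (dirname,))
    return rows.fetchall()
```

The LEFT JOIN keeps files that have no audio row yet, so a listing still works before the metadata has been scanned.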

> Well yes, but how meaningful is the above?  Firstly you're comparing
> my results dealing with 10000 files with your results dealing with
> 2000 files.  And secondly our systems are likely rather different.

I know. But I can compare my sqlite results with my pickle results. My
mediadb design (not the one in WIP) has one db file using pickle. It
should be easy to replace it again with a sqlite db. So I'm interested
in your code.

> I upgraded to pysqlite2 from svn which has some performance
> improvements and, on my system, doing a select on 10k records and
> putting those records into a list of tuples takes 0.146 seconds (3
> samples averaged).  Loading a pickle of the same data set takes
> 0.121 seconds (3 samples averaged).  This is not a huge difference.

That sounds good.
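
For anyone who wants to repeat this kind of comparison on their own system, here is a rough sketch of the measurement. It assumes an in-memory sqlite database and made-up records, so it is not the setup used for the numbers quoted above and the results will differ; the function names are hypothetical.

```python
import pickle
import sqlite3
import time


def make_records(n):
    """Generate n fake file records (id, dirname, name, size)."""
    return [(i, "/media/dir%d" % (i % 100), "file%05d.jpg" % i, i * 10)
            for i in range(n)]


def timed(func, repeat=3):
    """Return (average seconds over `repeat` runs, result of the last run)."""
    total = 0.0
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func()
        total += time.perf_counter() - start
    return total / repeat, result


def compare(n=10000):
    """Time loading n records from a pickle and from a sqlite SELECT."""
    records = make_records(n)
    blob = pickle.dumps(records, pickle.HIGHEST_PROTOCOL)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files "
                 "(id INTEGER PRIMARY KEY, dirname TEXT, name TEXT, size INTEGER)")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", records)
    t_pickle, from_pickle = timed(lambda: pickle.loads(blob))
    t_sqlite, from_sqlite = timed(lambda: conn.execute(
        "SELECT id, dirname, name, size FROM files ORDER BY id").fetchall())
    # The last value confirms both backends returned the same list of tuples.
    return t_pickle, t_sqlite, from_pickle == from_sqlite
```

Calling compare() returns the two averaged timings and a flag that the data sets match; a database on disk would also include file I/O, which this sketch leaves out.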

> I can't argue that pickling is faster.  The question up in the air is:
> is using an sqlite backend prohibitively slower -- i.e. does the
> performance loss outweigh the benefits in flexibility?  

That's what I don't like about pickle. I want to do stuff like
mediadb.Listing('artist like "Enya"'). I can't do that with
pickle. I'm thinking about how it could be possible, but none of the
ideas I have sounds good enough to even start coding.
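
With a sqlite backend, a query like the one above maps almost directly onto a LIKE clause. The following is only a sketch: the listing() function, the audio table, and its columns are assumptions for illustration and not the mediadb API. It takes the field and pattern separately so that the pattern can be bound as a parameter instead of being pasted into the SQL string.

```python
import sqlite3

ALLOWED_FIELDS = {"artist", "album", "title"}  # hypothetical column names


def listing(conn, field, pattern):
    """Return (artist, album, title) rows where `field` matches the LIKE pattern."""
    if field not in ALLOWED_FIELDS:
        raise ValueError("unknown field: %r" % field)
    # The column name comes from the whitelist; the pattern is bound as a parameter.
    sql = ("SELECT artist, album, title FROM audio "
           "WHERE %s LIKE ? ORDER BY album, title" % field)
    return conn.execute(sql, (pattern,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audio (artist TEXT, album TEXT, title TEXT)")
conn.executemany("INSERT INTO audio VALUES (?, ?, ?)", [
    ("Enya", "Watermark", "Orinoco Flow"),
    ("Enya", "Shepherd Moons", "Caribbean Blue"),
    ("Clannad", "Magical Ring", "Newgrange"),
])
enya = listing(conn, "artist", "Enya")
```

Here enya holds the two Enya rows, ordered by album. SQLite's LIKE is case-insensitive for ASCII characters by default, so a pattern such as "en%" matches them as well.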

>> You don't need os.listdir. If the directory mtime is still the same
>> (you need an extra SELECT to get that information, I don't), you can
>> skip checking for new/deleted files.
>
> Yes, you're right of course.

I'm also trying to predict what needs to be scanned and what does not.
These things would speed up the caching for both interfaces (sqlite and
pickle).
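
The directory mtime check mentioned above can be sketched as follows. The cache is a plain dict here, which is an assumption for illustration; in practice the stored mtime would live in the pickle or sqlite database. Function names are hypothetical.

```python
import os


def directory_changed(path, cached_mtime):
    """Return (changed, current_mtime) for a directory."""
    current = os.stat(path).st_mtime
    return current != cached_mtime, current


def list_if_changed(path, cache):
    """Return (names, changed); call os.listdir only when the mtime differs.

    `cache` maps a directory path to {"mtime": float, "names": list}.
    """
    entry = cache.get(path)
    changed, mtime = directory_changed(path, entry["mtime"] if entry else None)
    if changed:
        entry = {"mtime": mtime, "names": sorted(os.listdir(path))}
        cache[path] = entry
    return entry["names"], changed
```

Note that a directory's mtime changes when entries are added, removed, or renamed, not when the content of an existing file is modified, so this check only covers new and deleted files.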

>> Yes, I add a progress box when there are more than x changed/new
>> items. 
>
> I don't want any popups for this sort of thing. 

I also don't like it. But the current freevo design needs the metadata
for creating the menu items. If the data is not available, I need to
create it before creating the menu.

> The only exception is if the number of files in the directory is
> unusually large and it will take very long to do the initial
> listdir/stat.  

IMHO the listdir/stat is always fast. I also need it before I show a
popup box. 

> Better to get a list of filenames and other data that can be gotten
> from stat(), display the files to the user with some "loading"
> icons, and fill them in as they're loaded by the vfs asynchronously.

That's an idea, but it would look strange. Freevo shows the title for
an item, and that item may be a video file with 'title' as metadata.
After creating the data, the items may look different. But I guess
the idea is good. Create a quick listing only using listdir/stat (I
already support that), create the metadata in the background and when
done rebuild the menu. On the fly is also a nice idea, I have to think
about it. Yes, it should work. The item gets its metadata in
self.info. Give it an object with mmpython and other data and fill in the
missing data later. Since python uses references, the menu doesn't
need to be rebuilt (only for stuff like title changes). I have to play
with it.
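
A small sketch of the reference idea: each item keeps a reference to an info dict that starts empty and is filled in place later. The Item class and the helper functions are hypothetical stand-ins, not the Freevo classes.

```python
class Item:
    """Menu item that shows its filename until metadata supplies a title."""

    def __init__(self, filename, info):
        self.filename = filename
        self.info = info  # shared reference, not a copy

    @property
    def title(self):
        return self.info.get("title") or self.filename


def quick_listing(filenames):
    """Build items from listdir-style names with empty info dicts."""
    infos = {name: {} for name in filenames}
    items = [Item(name, infos[name]) for name in filenames]
    return items, infos


def fill_metadata(infos, parsed):
    """Update the info dicts in place so existing items see the new data."""
    for name, data in parsed.items():
        infos[name].update(data)
```

Because fill_metadata() mutates the dicts instead of replacing them, the item list does not have to be recreated. A menu that has already drawn the old title string would still need a redraw to show the new one.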

>> OK, you won't see all thumbnails at first, but they come later and the
>> gui still works.
>
> The way I've done my vfs is that directory loads take a synchronous
> timeout value, where after X seconds (0.2 by default but I may tweak
> that based on how it feels) it will return, and load the remaining file
> metadata asynchronously.  So if the time it takes to get the
> filelist/stats is 0.05 seconds, it has 0.15 seconds to load metadata and
> thumbnails.  This means that you will see thumbnails at first, but only
> as many as you can load in 0.2 seconds.  The rest will get loaded in the
> background and the UI updated as it goes.

OK, sounds good.
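
One way to picture the timeout approach from the quote above: load metadata synchronously until the time budget is used up, then hand the rest to a background thread. This is only a sketch under my own assumptions (a plain thread, a shared results dict, and a budget that starts when the function is called), not the actual MeBox vfs code.

```python
import threading
import time


def load_directory(names, load_one, budget=0.2, clock=time.monotonic):
    """Load metadata for `names` until `budget` seconds have passed.

    Returns (results, worker). `results` is a dict that is filled synchronously
    first and then by `worker`, a started thread, or None if everything fit
    into the budget.
    """
    deadline = clock() + budget
    results = {}
    pending = list(names)
    while pending and clock() < deadline:
        name = pending.pop(0)
        results[name] = load_one(name)

    if not pending:
        return results, None

    def finish():
        # Load whatever did not fit into the synchronous budget.
        for name in pending:
            results[name] = load_one(name)

    worker = threading.Thread(target=finish, daemon=True)
    worker.start()
    return results, worker
```

In a real UI the background part would rather post each finished item to the main loop than write into a shared dict, so the menu can update as the data arrives.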

>>  And if you pre-cache the thumbnails with a helper, you don't have
>>  that problem.
>
> Well, if the helper is in another process, pre-caching isn't going to
> help much.  (Unless by pre-cache you mean a read-ahead at the OS level.)

By pre-cache I mean creating the thumbnails.

>> browse it. I have a DVD with the 2k photos. Indexing creates too much
>> time, also creating thumbnails and you have to wait will be too much
>> for the user. But you have not much time between inserting the disc
>> and showing it. You can't force the user to wait until an extra app
>> checked the disc.
>
> I'm not sure I follow here.  With a fully asynchronous design, there is
> no such wait.  Of course the user will have to wait for the directory to
> be fully indexed, but the UI won't block while this is being done.

I didn't understand that you add the metadata on the fly. With that,
it is no problem. In fact, it is a very nice idea.

>> The helper sounds nice, but it should be optional.
>
> Why?  The user doesn't need to know about such design details.

I have to think about it. Maybe even a thread with good access points
to the main loop. That way, the 'helper' could also preload
thumbnails. 

> I wonder if we're talking about different things.  I mean, I'm not
> suggesting how Freevo should be designed, but rather how I am doing the
> vfs for MeBox.  Maybe we can use ideas from each other. :)

That's what I'm talking about. I don't want you to use my ideas for
mevas. But sharing ideas about how things can be done could result in better
code for both. Or maybe we come up with a vfs design that both
projects could use. One simple pyvfs module. You are a good
programmer, it would be great if we could share as much code as
possible. 

>> > Also if there are other processes, say a web server, that wants
>> > to do directory monitoring, it can just talk to the monitor process via
>> > IPC, so we don't have multiple processes polling the same directory.
>> 
>> Replace mbus with IPC and I like your idea :)
>
> Well, I meant IPC as a general term.  mbus can be used for IPC. :)

Ah, InterProcessCommunication. OK, yes, mbus is that. I thought you
meant a special IPC with pickle you are creating.



Dischi

-- 
+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++
        -- (Terry Pratchett, Hogfather)
