Hi,

With this new system, can we still continue using the operating system's filesystem directly? If all my ownCloud users' folders are also shared with Samba, can I still use it? If I want to share all my /home user folders with ownCloud, can I still use it?
Well, the real question is: is it a file cache, or the only entry point for file access?

Second question: I closed the browser before the end of the scan (after 2 hours...) and I can't see all my files. Is there any way to resume the scan?

Thanks,
Christophe

2012/2/9 Klaas Freitag <[email protected]>
> On 08.02.2012 17:46, Robin Appelman wrote:
>
> Hi,
>
> I browsed a bit through the filecache code today and have some questions.
> I am nitpicking here a bit because I think that if we want to scale up to
> many files, this might become a performance bottleneck if we don't take
> care from the beginning. OTOH, we can get real benefit from this cache.
> Thanks for starting that :-)
>
>> Earlier today I merged the filesystem branch into master. The
>> filesystem branch held multiple improvements to the entire filesystem
>> infrastructure of ownCloud, including the option to access files
>> outside the user's home folder and caching of file info in the
>> database for quick access.
>
> - Table layout of fscache:
>   * path and name columns: I think we should get rid of the name column to
>     keep the table small and avoid redundancy. AFAICS the name column is
>     only used in search(); there and everywhere else, the name can easily
>     be computed from the path.
>   * user string: I would strongly stay away from a string-based user
>     column, for two reasons: a string is more costly than an int, and the
>     user name might not always be unique. Imagine we authenticate against
>     two independent sources of user data (LDAP and local, for example);
>     then there can be users with the same name. That's problematic anyway,
>     but much easier to handle if you have an id referring to an ownCloud
>     user object that covers that kind of problem.
>
>     BTW, wouldn't it make sense to drop the user dependency completely and
>     create the fscache db within each user's space, meaning one per user,
>     maybe even in memory? Not sure, I have never tried.
>
>   * mimetype normalisation: I think the mimetypes should be normalized.
>     The mimetypes table can be cached in a variable, and the fscache table
>     becomes smaller.
>
> - Indexes:
>   Currently existing indexes, AFAICS:
>     index | parent_index      | oc_fscache | (parent ASC)
>     index | parent_name_index | oc_fscache | (parent ASC, name ASC)
>
>   Some are missing, IMO:
>   * on path -> used in get()
>   * on (name, user) -> used in search. This is a LIKE SELECT, which is
>     difficult anyway, see http://www.sqlite.org/optoverview.html#like_opt
>     As said, I would try to get rid of name and possibly also of user.
>   * on (mimepart, user) -> used in searchByMime
>
> - TRANSACTION:
>   For mass INSERTs, we should explicitly call BEGIN and END TRANSACTION.
>
> - while loops calling functions:
>   Code running in while loops (here often a readdir over all files in a
>   directory) often calls subfunctions, which in turn call others... Each
>   of them can issue SQL statements independently.
>
>   Database interaction becomes faster if prepare is not called for each
>   and every individual execution of a statement, but once, with the
>   prepared statement then executed for a list of values. So it might make
>   sense to call prepare in an outer function and hand the $query object
>   to the called subs.
>
>   One example of such a loop is
>     updateFolder() -> fileSystemWatcherWrite() -> scanFile() -> put()
>
>   The readdir loop is in updateFolder(), and put() finally does the UPDATE
>   or INSERT statements. In between, there are SELECTs here and there.
>
>   Often this can be solved by first collecting all object data in code.
>   Take the isUpdated check, for example: currently there is a loop over
>   readdir calling isUpdated(), which does a prepare("SELECT mtime ...")
>   for each path. It might be better to first collect the paths, like
>     while (readdir) pathlist.append(path)
>   and then call something like
>     SELECT mtime FROM fscache WHERE path IN (implode pathlist)
>
> - paths: Paths can be complicated anyway, because many of them start
>   with the same string...
>   I have seen systems that store a hash, such as an MD5, in this kind of
>   cache to get more powerful search support. Maybe that would be worth a
>   try :-)
>
> Some of the points I made are arguable and a bit fishy, and they depend
> on a lot of parameters such as the database, the kind of data, etc. I
> think it would be good to have a testing and performance-measuring
> framework for this, to really pin down the measurements.
>
> Again, sorry if that sounds like wise-guying; that's not intended. Thanks
> for picking up this difficult but important task. I am very happy to
> discuss and help wherever needed :-)
>
> regards,
>
> Klaas
>
> _______________________________________________
> Owncloud mailing list
> [email protected]
> https://mail.kde.org/mailman/listinfo/owncloud
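Several of Klaas's suggestions can be illustrated together: integer ids for user and mimetype instead of strings, a mimetypes lookup cached in a variable, an index on path (for get()) and on (mimetype, user) (for searchByMime), one transaction for a mass INSERT, one statement executed for a whole list of values instead of one prepare per file, and a single `IN (...)` query instead of one mtime SELECT per path (the isUpdated() case). The sketch below uses Python's built-in sqlite3 module rather than ownCloud's PHP code, and every table, column, and function name in it (`fscache`, `mimetypes`, `user_id`, `mimetype_id`, `bulk_insert`, `mtimes_for`) is hypothetical, only loosely modelled on the names mentioned in the thread.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: user and mimetype are integer ids, not strings,
    # and there is no name column (the name can be derived from the path).
    conn.executescript("""
        CREATE TABLE mimetypes (
            id       INTEGER PRIMARY KEY,
            mimetype TEXT UNIQUE NOT NULL
        );
        CREATE TABLE fscache (
            id          INTEGER PRIMARY KEY,
            path        TEXT NOT NULL,
            user_id     INTEGER NOT NULL,
            mimetype_id INTEGER NOT NULL REFERENCES mimetypes(id),
            mtime       INTEGER NOT NULL
        );
        -- lookups by path, as in get()
        CREATE INDEX fscache_path_index ON fscache(path);
        -- lookups by mimetype and user, as in searchByMime
        CREATE INDEX fscache_mime_user_index ON fscache(mimetype_id, user_id);
    """)


def mimetype_id(conn: sqlite3.Connection, cache: dict, mimetype: str) -> int:
    # Normalised mimetypes: each distinct string is stored once, and its id
    # is cached in a dict so later rows do not query the table again.
    if mimetype not in cache:
        conn.execute("INSERT OR IGNORE INTO mimetypes(mimetype) VALUES (?)",
                     (mimetype,))
        row = conn.execute("SELECT id FROM mimetypes WHERE mimetype = ?",
                           (mimetype,)).fetchone()
        cache[mimetype] = row[0]
    return cache[mimetype]


def bulk_insert(conn: sqlite3.Connection, user_id: int, entries) -> None:
    # entries: iterable of (path, mimetype, mtime) tuples, e.g. collected
    # from a readdir loop before touching the database.
    cache: dict = {}
    # sqlite3 opens a transaction implicitly before the first INSERT;
    # "with conn" commits it once at the end (or rolls back on error),
    # so the whole batch is a single transaction.
    with conn:
        rows = [(path, user_id, mimetype_id(conn, cache, mime), mtime)
                for path, mime, mtime in entries]
        # executemany compiles the INSERT once and runs it for every row.
        conn.executemany(
            "INSERT INTO fscache(path, user_id, mimetype_id, mtime) "
            "VALUES (?, ?, ?, ?)",
            rows)


def mtimes_for(conn: sqlite3.Connection, paths) -> dict:
    # One SELECT ... WHERE path IN (?, ?, ...) for all paths, instead of
    # one query per path. Bound parameters avoid quoting problems.
    paths = list(paths)
    if not paths:
        return {}
    placeholders = ", ".join("?" for _ in paths)
    query = f"SELECT path, mtime FROM fscache WHERE path IN ({placeholders})"
    return dict(conn.execute(query, paths).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    bulk_insert(conn, 1, [
        ("/alice/files/a.txt", "text/plain", 100),
        ("/alice/files/b.png", "image/png", 200),
        ("/alice/files/c.txt", "text/plain", 300),
    ])
    print(mtimes_for(conn, ["/alice/files/a.txt", "/alice/files/c.txt"]))
```

For very large directories, the path list passed to the `IN (...)` query would need to be split into chunks, since SQLite limits the number of bound parameters per statement. Whether any of this is actually faster for ownCloud's data is exactly the kind of question Klaas suggests settling with a measuring framework.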
