On Friday, 18.05.2012 at 15:38, Klaas Freitag wrote:
> On 17.05.2012 22:26, Brad McEvoy wrote:
> Hi Brad,
>
> Thanks for your interesting feedback! I think your post did not make it
> to the mailing list, but I'll forward it with this answer.
>
> >
> > I'm not a developer on ownCloud, but I did a dotcom startup a while back
> > trying to be a file sync service like Dropbox, but was a bit late to the
> > party.
> >
> > I'm now converting that to an open source project (see
> > https://github.com/Spliffy/spliffy - similar goals, but much less
> > advanced than ownCloud, Java based). I posted to this list a few months
> > back suggesting that we share experience and work towards a standards
> > based and interoperable toolset. I think standards and interoperability
> > would generally strengthen the open source offerings as opposed to the
> > closed source services currently proliferating.
> Yes, standards are good. I have tried to stay as close to WebDAV as
> possible so far, to keep the door open for interoperability.
> >
> > Regarding your question below I'd like to share my experience. I first
> > implemented path based sync, as you have done. I have since come to
> > believe this is far from optimal. And others from mature and established
> > sync product companies share that view.
> >
> > What git does, and I think this is a good model for any sync tool, is
> > calculate hashes (i.e. checksums) for files and for directories, where the
> > hash for a directory is the checksum of a formatted list of its members'
> > names and hashes. This means that the root folder has a hash which
> > uniquely identifies the current state of everything inside it. The
> > client can calculate the same hash for its contents. So, to check if
> > files are in sync you simply compare the hash of the root directories on
> > client and server.
> > If they are different you walk down the directory
> > tree, ignoring directories that have the same hash on client and server,
> > and locating changed items based on their relative checksums. This is
> > very fast, very efficient, and very very robust. It's easy to integrate
> > into a WebDAV server as it's just an extra property in PROPFIND or header
> > in a HEAD response. It requires server support so that any change to any
> > resource results in updated hashes right up to the synchronisation root.
>
> I understand the concept and indeed it's good. It's very near to what I
> want to implement, with the only difference that instead of the hash
> sums, I'd like to use the mtimes, as csync does. Why do we think that's a
> benefit? Well, based on the mtimes it is decidable which version is
> newer. Moreover, the mtime is already a natural piece of metadata in every
> file system, so we do not have to add something new. Given that, csync
> currently runs without server support.
>
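
For reference, the directory-hash scheme Brad describes above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the thread: SHA-1 as the checksum, a tab-separated "name, hash" listing as the directory format, and the hypothetical helper names build() and diff(). Neither git nor Spliffy is claimed to use exactly this layout.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Checksum of a file's content (SHA-1 is an assumption here)."""
    return hashlib.sha1(data).hexdigest()


def dir_hash(child_hashes: dict) -> str:
    """Checksum of a formatted, sorted list of member names and hashes."""
    listing = "".join(f"{name}\t{h}\n" for name, h in sorted(child_hashes.items()))
    return hashlib.sha1(listing.encode("utf-8")).hexdigest()


def build(tree):
    """Turn a nested dict (bytes = file, dict = directory) into hashed nodes."""
    if isinstance(tree, bytes):
        return {"hash": file_hash(tree), "children": None}
    children = {name: build(sub) for name, sub in tree.items()}
    hashes = {name: node["hash"] for name, node in children.items()}
    return {"hash": dir_hash(hashes), "children": children}


def diff(local, remote, path=""):
    """List changed paths, skipping every subtree whose hashes already match."""
    if local is not None and remote is not None:
        if local["hash"] == remote["hash"]:
            return []  # identical subtree: no need to descend
        if local["children"] is not None and remote["children"] is not None:
            changed = []
            names = set(local["children"]) | set(remote["children"])
            for name in sorted(names):
                changed += diff(local["children"].get(name),
                                remote["children"].get(name),
                                f"{path}/{name}")
            return changed
    # Added, removed, changed file, or a file replaced by a directory.
    return [path or "/"]


client = build({"docs": {"a.txt": b"one", "b.txt": b"two"}, "pics": {"c.png": b"img"}})
server = build({"docs": {"a.txt": b"one", "b.txt": b"TWO"}, "pics": {"c.png": b"img"}})
print(diff(client, server))
```

In this toy tree only "docs" is descended into; "pics" has the same hash on both sides and is skipped without looking at its members.
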
But using mtimes requires synchronised clocks, which may not be possible
to set up properly on every system (e.g. on webspace where you have no
influence). A checksum does not have that issue. Just my two cents on that.

Tom

Using checksums would also be the starting point for implementing
de-duplication. Maybe we will need it sooner or later anyway.

> What is missing is the propagation of the mtimes from individual files
> and directories to their parent directory. If we do that with the
> ownCloud server support, I think we will have the same benefits that you
> described above. As we have the data in a database server side we will
> be able to retrieve the data fast.
>
> >
> > Note that there is a related RFC - http://tools.ietf.org/html/rfc6578 -
> > however I'm not confident that the approach outlined there is quite right.
> Do you know if it's implemented in a WebDAV server already?
>
> > Of course finding what files are new or updated is one thing,
> > communicating those changes efficiently is another. Spliffy uses a
> > similar approach to Bup (https://github.com/apenwarr/bup) to split files
> > into blobs which are stable with respect to file changes. Only changed
> > blobs are transmitted.
> >
> > The hashsplitting algorithm is **very** simple, and if you're not doing
> > something like this yet I suggest you take a peek -
> > https://github.com/HashSplit4J/hashsplit-lib
> That's cool and is a problem we also still have on our list to tackle.
> I stumbled over this already and wonder if there is a C or C++ lib for that.
>
> > Sorry for the long post, and I hope this is of some assistance.
> Great, I really appreciate your input.
>
> Best,
>
> Klaas
>
> >
> > On 17/05/2012 9:12 p.m., Klaas Freitag wrote:
> >> Hi,
> >>
> >> one of the biggest shortcomings of the sync client currently is that
> >> it does a full scan of the ownCloud directories via WebDAV to
> >> query the last modified times. That causes load and other trouble.
> >> It would be great to find out if something has changed server side more
> >> cheaply.
> >>
> >> We have the file system cache which also has the mod times in the
> >> database. My idea is now, instead of querying every single file, I
> >> just issue a HEAD request on the top sync directory and get the latest
> >> modtime of all files in that dir back. If that is younger than the one
> >> I know, I have to do a sync.
> >>
> >> I know that it could be even cooler, i.e. delivering the list of
> >> files back etc., but let's take small steps. Doing just one HEAD instead
> >> of querying the whole tree will already be great.
> >>
> >> The implementation seems easy: Just get all database IDs of the
> >> fscache table entries below the top directory of the sync dir and do
> >> something like
> >> SELECT MAX(mtime) FROM fscache WHERE id IN ( list-of-all-ids-in-dir );
> >> That should be fast enough.
> >>
> >> My question now is: How do we do that? Should we have another app
> >> called /files/sync? Or do we want to enhance the WebDAV server to be
> >> able to do the described logic if a HEAD request on a dir comes in?
> >>
> >> I think the latter is more "within the concept" of doing the sync via
> >> WebDAV, OTOH a sync app could be useful anyway for other sync related
> >> server support.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >> Klaas
> >> _______________________________________________
> >> Owncloud mailing list
> >> Owncloud@kde.org
> >> https://mail.kde.org/mailman/listinfo/owncloud
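
The MAX(mtime) check from the original post can be tried out with a small Python and SQLite sketch. Everything beyond the table name fscache and the column mtime is an assumption made for illustration: the path column, the prefix match that stands in for "list-of-all-ids-in-dir", and the helper names make_demo_db(), latest_mtime() and needs_sync(). The real ownCloud schema may look different.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the fscache table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fscache (id INTEGER PRIMARY KEY, path TEXT, mtime INTEGER)")
    conn.executemany(
        "INSERT INTO fscache (path, mtime) VALUES (?, ?)",
        [
            ("/clientsync", 1337000000),
            ("/clientsync/a.txt", 1337100000),
            ("/clientsync/sub/b.txt", 1337263200),
            ("/clientsync2/x.txt", 1390000000),  # sibling dir, must not match
            ("/other/c.txt", 1400000000),
        ],
    )
    return conn


def latest_mtime(conn, sync_root):
    """Newest mtime of the sync root and everything below it, or None."""
    root = sync_root.rstrip("/")
    prefix = root + "/"
    row = conn.execute(
        # substr() is used for the prefix match so that '%' or '_' in a
        # path cannot act as a LIKE wildcard.
        "SELECT MAX(mtime) FROM fscache WHERE path = ? OR substr(path, 1, ?) = ?",
        (root, len(prefix), prefix),
    ).fetchone()
    return row[0]


def needs_sync(conn, sync_root, last_known_mtime):
    """True if something below sync_root is younger than what the client knows."""
    latest = latest_mtime(conn, sync_root)
    return latest is not None and latest > last_known_mtime


conn = make_demo_db()
print(latest_mtime(conn, "/clientsync"))
print(needs_sync(conn, "/clientsync", 1337000000))
```

A server could return the value of latest_mtime() in the response to a HEAD request on the sync directory, and the client would then run the comparison from needs_sync() on its side.
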