David E Jones wrote:
>
> This is an interesting overview and while I'm not sure why I hadn't
> thought along these lines before, at least it's through my thick skull
> now...
>
> I asked Adam about how this would deploy on multiple servers with the
> stuff in the filesystem versus the database, and I think what you've
> written Ean is the answer.
>
> Why not treat a source repo (either plain SVN or something more exotic
> like GIT) like the database? Each app server would read from and write
> to the source repo just like it would a database record. If SVN or GIT
> support 2-phase commits we could probably even do write operations in
> a transaction that includes connections to both data stores.
>
> For performance reasons you'd want to cache content from the source repo
> just like you would content from a relational database. If it's really
> too terribly slow even doing that (ie reading directly from the repo and
> caching) you could cache it locally in the app server's file system,
> though it would probably be best to never write directly to the local
> filesystem and you'd want some sort of timeout or other logic to
> invalidate the file system cache just like you'd do with the in memory
> cache (actually UtilCache supports this sort of thing, though now with
> straight files in the filesystem, just a sort of mini-database for local
> filesystem caching of data).
>
> Anyway, is this something you guys have considered for WebSlinger?
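
For reference, the timeout-based invalidation described in the quoted text could be sketched roughly as below. This is only an illustration: `ExpiringRepoCache`, its `repoReader` function, and the injected clock are hypothetical names made up for this sketch, and it is not how UtilCache or WebSlinger actually work.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper: caches content read from a source repo and
// re-reads it once an entry is older than ttlMillis.
class ExpiringRepoCache {
    private static final class Entry {
        final byte[] content;
        final long loadedAtMillis;

        Entry(byte[] content, long loadedAtMillis) {
            this.content = content;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    // Assumption: stands in for whatever actually fetches from SVN/git.
    private final Function<String, byte[]> repoReader;
    private final long ttlMillis;
    // Injected so tests can control time; use System::currentTimeMillis normally.
    private final LongSupplier clock;

    ExpiringRepoCache(Function<String, byte[]> repoReader, long ttlMillis, LongSupplier clock) {
        this.repoReader = repoReader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns cached content, reloading from the repo if missing or expired.
    byte[] get(String location) {
        long now = clock.getAsLong();
        Entry entry = entries.get(location);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(repoReader.apply(location), now);
            entries.put(location, entry);
        }
        return entry.content;
    }
}
```

Note that the cache is read-only from the app server's point of view: nothing is ever written to the repo through it, which matches the "never write directly to the local filesystem" suggestion.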

I've got a commons-vfs filesystem implementation that uses git plumbing
to store content. Every single mutation causes a new 'tree' hash to be
created in git. It uses jgit to do this. However, we don't currently
use it; it was more of a quick test. One major problem with jgit is
that it reads the entire file into memory, which will not work with
large files. I have not tested whether this interoperates with other
git porcelain.

However, all that is moot. Git is not a shared-write system. Each
instance is completely local: you have your own copy of the repo, per
install. You mutate it however you like. Then either you push to
another machine/repo, or the other machine pulls from you. This could
be made to work, doing some kind of anonymous ssh pulse thing, but it'd
be a heavy system integration, which OFBiz tends not to do.

> For the OFBiz Content side of things you could pretty easily have a
> DataResourceType for data in a source repo (ie instead of LOCAL_FILE
> something like REPOSITORY_FILE). On the DataResource entity the
> objectInfo field would have the URL/location of the resource (ie like
> the SVN/HTTP URL), and we could add a field like "revisionNumber" to
> specify which revision we want or null to get the head revision (I was
> thinking we could use the existing ContentRevision/Item entities for
> this, but looking at them it seems they wouldn't work so well and are
> really meant for a revision control built on top of the Content and
> DataResource entities, and not one that would describe revision
> information pointed to by them). The "revisionNumber" could also go on
> the Content entity so that we could have multiple Content records with
> different revision numbers pointing to the same DataResource records and
> reduce how many DataResource records we would require. That would also
> better fit how Content and DataResource are meant to work together, but
> on the other hand might be somewhat confusing.

No, no, you can't use a revisionNumber.
They don't exist in git; a revision is identified by a hash, not by a
sequential number. Distributed systems change that completely.

> Thoughts anyone?
>
> Oh, one more thing... I know there are some Java libraries for SVN, and
> there probably are some for GIT... has anyone played with these?

I've looked at the documentation for svn/java; I've actually used jgit
(however, it's been a few years).
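
To make the revisionNumber point concrete, here is a small sketch of how git names a stored file (a "blob") by hashing its content rather than by counting revisions. It assumes git's classic SHA-1 object format (newer repositories can be configured to use SHA-256) and uses only the JDK, not jgit; the `GitBlobId` class name is made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: computes the id git would give a blob under the
// SHA-1 object format, i.e. SHA-1 over "blob <size>\0" + content.
class GitBlobId {
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: object type, content length in bytes, then a NUL byte.
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
        System.out.println(blobId(new byte[0]));
        System.out.println(blobId("some content\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Any field on DataResource or Content that pins a revision would therefore have to hold a hash string like these, and two installs that mutate their repos independently have no shared "next number" to hand out.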
