On 10/12/2010 11:26 AM, Adrian Crum wrote:
On 10/12/2010 9:23 AM, Adam Heath wrote:
On 10/12/2010 11:06 AM, Adrian Crum wrote:
On 10/12/2010 8:55 AM, Adam Heath wrote:
On 10/12/2010 10:25 AM, Adrian Crum wrote:
Actually, a discussion of database versus filesystem storage of
content
would be worthwhile. So far there has been some hyperbole, but few
facts.

How do you edit database content? What is the procedure? Can a simple
editor be used? By simple, I mean low-level, like vi.

How do you find all items in your content store that contain a certain
text word? Can grep and find be used?

How do you handle moving changes between a production server, that is
being directly managed by the client, and multiple developer
workstations, which then all have to go first to a staging server? Each
system in this case has its own set of code changes, and config+data
changes, that then have to be selectively picked for staging, before
finally being merged with production.

What about revision control? Can you go back in time to see what the
code+data looked like? Are there separate revision systems, one for the
database, and another for the content? And what about the code?

For users/systems that aren't capable of using revision control, is
there a way for them to mount/browse the content store? Think nfs/samba
here.

Storing everything directly into the filesystem lets you reuse existing
tools, that have been perfected over countless generations of
man-years.

I believe Jackrabbit has WebDAV and VFS front ends that will accommodate
file system tools. Watch the movie:

http://www.day.com/day/en/products/crx.html

A front end is the wrong approach. The store itself still lives in some other
system (a database). The raw store needs to be managed by the filesystem,
so standard tools can move it between locations, do backups, etc.
Adding yet another layer just to emulate file access is the wrong way around.

<brainstorming>
Let's make a content management system. Yeah! Let's do it! So, we need
to be able to search for content, and maintain relationships between
items. Let's write brand-new code to do that, and put it in the
database.

Ok, next, we need to pull the information out of the database, and serve
it through an http server. Oh, damn, it's not running fast. Let's have a
cache that resides someplace faster than the database. Oh, I know,
memory! Shit, it's using too much memory. Let's put the cache in the
filesystem. Updates now invalidate the cache, and have it get rebuilt. That
means read-only access is faster, but updates then have to rebuild tons
of stuff.

Hmm. We have a designer request to be able to use photoshop to edit
images. The server in question is a preview server, is hosted, and not
on his immediate network. Let's create a new webdav access method, to
make the content look like a filesystem.

Our system is getting heavily loaded. Let's have a separate database
server, with multiple web frontends. Cool, that works.

The system is still heavily loaded, we need a super-huge database server.

Crap, still falling over. Time to have multiple read-only databases.
</brainstorming>

or...

<brainstorming>
Let's store all our content in the filesystem. That way, things like
ExpanDrive (remote ssh filesystem access for Windows) will work for remote
hosted machines. Caching isn't a problem anymore, as the raw store is in files.
Servers have been doing file sharing for decades, it's a well known
problem. Let's have someone else maintain the file sharing code, we'll
just use it to support multiple frontends. And, ooh, our designers will
be able to use the tools they are familiar with to manipulate things.
And, we won't have the extra code running to maintain all the stuff in
the multiple databases. Cool, we can even use git, with rebase and
merge, to do all sorts of fancy branching and push/pulling between
multiple development scenarios.
</brainstorming>

If the raw store were in the filesystem in the first place, then none of
this additional layering would be needed to make the final output look
like a filesystem, which is what was being replaced all along.

Okay. Will webslinger provide a JCR interface?

It could. Or maybe jackrabbit should have its filesystem backends improved (or created).

However, the major problem we have with all those other systems is still performance. Synchronization kills throughput under load. Webslinger doesn't synchronize; it makes *very* heavy use of concurrent programming techniques. The problem arises when certain api definitions require you to call multiple methods to fetch, then update. Such methods are broken when doing non-blocking algorithms, so the fix in those situations is to use a synchronized block. But then you have a choke point.

*Any* time you have two separate methods, a get (or contains) followed by a put (or remove), you must deal with multiple threads doing the exact same thing. You can either synchronize, or alter the later methods to put(key, old, new) and remove(key, old). The crux of the concurrent model is to move mutator-type calls into a single method that can eventually do something close to compare-and-swap (CAS).
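A minimal sketch of the idea in Java, using java.util.concurrent's ConcurrentMap as a stand-in for the content store's map (webslinger's actual internals may differ — the class and method names here are illustrative, not from its API). ConcurrentMap.replace(key, old, new) and remove(key, value) are exactly the conditional-mutator shape described above:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CasUpdate {
    // Broken under concurrency: the check-then-act spans two calls, so
    // another thread can change the map between the get and the put,
    // and one of the two updates is silently lost.
    static void racyIncrement(ConcurrentMap<String, Integer> map, String key) {
        Integer old = map.get(key);
        map.put(key, old + 1);
    }

    // CAS-style: replace(key, expected, updated) succeeds only if the
    // value is still the one we read; otherwise loop and retry with the
    // fresh value. One atomic mutator call, no synchronized choke point.
    // (Assumes the key is already present in the map.)
    static void casIncrement(ConcurrentMap<String, Integer> map, String key) {
        while (true) {
            Integer old = map.get(key);
            if (map.replace(key, old, old + 1)) {
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("hits", 0);

        // Hammer the CAS version from several threads; the retry loop
        // guarantees no update is lost, so the total is exact.
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    casIncrement(map, "hits");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(map.get("hits")); // prints 4000
    }
}
```

Run the same workload through racyIncrement and the count comes up short under contention, which is the lost-update problem the single-method mutators are designed to close.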
