Viktor Klang wrote:
My not so extensive experience has told me that it depends on the kind of schema you're building. For something like a Twitter-clone you probably won't run into this unless you've done some bad planning,
but I definitely would agree with you that it could be(come) a big issue.
I'd love to hear if someone's had problems with this and what their domain/use was.

Regarding schema evolution...

The good old (err, ancient!) Z39.50 protocol used by libraries for distributed search is actually quite good for this. There are lots of silly things in the protocol specification itself, but the semantic model is quite good. There are abstraction layers between the physical representation, what you query on, and what you retrieve (and more). The abstraction of the query model, for example, allows you to send the same query to different collections even if their schemas are not identical. Each collection only has to support the query fields that the query actually uses. Actually, it's even more general than that: you can set it up to return zero matches for unknown query fields rather than aborting with an error. This allows newer versions of a database schema to be introduced while staying backwards compatible with old applications. (I am simplifying a bit here!)
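To make the "zero matches for unknown query fields" idea concrete, here is a rough Python sketch. It is not the Z39.50 API or wire protocol; the names Collection, UnsupportedField, search_all and the strict flag are all made up for illustration, and the "indexes" are plain in-memory dicts.

```python
from dataclasses import dataclass


class UnsupportedField(Exception):
    """Raised in strict mode when a collection has no index for a query field."""


@dataclass
class Collection:
    # Hypothetical stand-in for one searchable database.
    name: str
    indexes: dict          # abstract query field -> {term -> set of record ids}
    strict: bool = False   # True: abort on unknown fields; False: zero matches

    def search(self, query_field, term):
        index = self.indexes.get(query_field)
        if index is None:
            if self.strict:
                raise UnsupportedField(f"{self.name}: {query_field}")
            return set()   # unknown field simply matches nothing
        return set(index.get(term, ()))


def search_all(collections, query_field, term):
    """Send the same abstract query to every collection, whatever its schema."""
    return {c.name: c.search(query_field, term) for c in collections}


# Two schema versions: only the newer one indexes "author".
old = Collection("v1", {"title": {"java": {1, 2}}})
new = Collection("v2", {"title": {"java": {7}}, "author": {"klang": {7}}})

results = search_all([old, new], "author", "klang")
```

With the lenient default, the old collection just contributes an empty result set for the "author" query, so an application written against either schema keeps working. Setting strict=True gives the "abort with an error" behaviour instead.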

We still use Z39.50 today in the non-SQL database system we develop at work (TeraText.com). We have customers who want to log a continuous flow of arriving information (e.g. syslog messages), retiring off old content. For example, they create a new database each week and keep the last 26 databases around for 6 months of historical data, then query across the appropriate subset of databases to find results. Z39.50 makes it easy to introduce schema changes into next week's database while still being able to search across all the older databases as well. (Obviously only the new database would find matches on searches specifying newer query fields.)
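The weekly rollover could be sketched roughly like this in Python. This is only an illustration of the pattern, not how TeraText does it: WeeklyStore, start_week, log and search are hypothetical names, each "database" is just a dict in memory, and the since filter works on whole weeks only.

```python
from collections import deque
from datetime import date, timedelta


class WeeklyStore:
    """Hypothetical rolling set of weekly databases, oldest first."""

    def __init__(self, keep_weeks=26):
        # deque(maxlen=...) drops the oldest database when a new one is added.
        self.databases = deque(maxlen=keep_weeks)

    def start_week(self, week_start, fields):
        # Each week's database carries its own schema (set of query fields).
        db = {"week_start": week_start, "fields": set(fields), "records": []}
        self.databases.append(db)
        return db

    def log(self, record):
        # New content always goes into the current (newest) database.
        self.databases[-1]["records"].append(record)

    def search(self, query_field, value, since=None):
        hits = []
        for db in self.databases:
            if since is not None and db["week_start"] < since:
                continue   # outside the requested subset of weeks
            if query_field not in db["fields"]:
                continue   # older schema: zero matches, not an error
            hits.extend(r for r in db["records"] if r.get(query_field) == value)
        return hits


store = WeeklyStore(keep_weeks=3)   # 3 instead of 26 to keep the demo short
first_monday = date(2010, 1, 4)
for week in range(4):
    # The "severity" field is only introduced in the last week's schema.
    fields = {"host", "severity"} if week == 3 else {"host"}
    store.start_week(first_monday + timedelta(weeks=week), fields)
    record = {"host": "web1", "week": week}
    if week == 3:
        record["severity"] = "warn"
    store.log(record)

host_hits = store.search("host", "web1")
severity_hits = store.search("severity", "warn")
recent_hits = store.search("host", "web1", since=first_monday + timedelta(weeks=2))
```

Here the first week has already been retired, a search on "host" finds records in all three remaining databases, and a search on the newer "severity" field only finds matches in the newest database. None of the older databases had to be rebuilt.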

Schema changes are typically not frequent, but when the customer wants to run some new kind of query, the ability to introduce new fields is very useful, especially if it's a high volume of content. Rebuilding all the old databases to retrospectively add new indexes can take a long time and would potentially take the service offline, which makes it undesirable.

Alan
