just my opinion, but for what it's worth:

improved online backup - This seems like a good addition to Derby.  The
current state is that you can take a backup while the system is
running, but updating transactions will block until the backup is
finished.  The recently implemented rollforward recovery makes a fully
non-blocking online backup the next logical step.
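
For context, here is roughly what today's blocking backup looks like
from JDBC.  SYSCS_UTIL.SYSCS_BACKUP_DATABASE is the existing system
procedure; the database name and backup path below are placeholders:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class BackupExample {
        public static void main(String[] args) throws Exception {
            // Embedded connection; "myDB" is a placeholder name.
            Connection conn =
                DriverManager.getConnection("jdbc:derby:myDB");

            // Today's online backup: updaters block until this returns.
            CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE(?)");
            cs.setString(1, "/backups/myDB");
            cs.execute();
            cs.close();
            conn.close();
        }
    }

A non-blocking version would let that call proceed while updaters keep
writing, relying on the log to bring the copied pages to a consistent
point.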


table partitioning -
      The question here is why you want to partition the table.  If it
      is just to spread I/O randomly across disks, I don't think it is
      a very useful feature.  The same thing can easily be accomplished
      on most modern hardware/OS's at a lower level while presenting
      the disk farm as one disk to the JVM/Derby.

      Now if you are talking about key partitioning then that may be
      useful, but only if accompanying work is done to partition query
      execution in parallel against those partitions (a small sketch of
      key-based routing follows).  Below I will describe one approach
      that I think is the easiest and most maintainable first step
      towards this.
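
To make "key partitioning" concrete, a minimal sketch of key-based
routing, with all names hypothetical (a real scheme might use range
partitioning instead, to keep range scans on a single partition):

    // Hypothetical: route a row to one of N partitions by its key.
    public final class KeyPartitioner {
        private final int partitionCount;

        public KeyPartitioner(int partitionCount) {
            this.partitionCount = partitionCount;
        }

        /** Hash partitioning: partition = hash(key) mod N. */
        public int partitionFor(Object key) {
            // floorMod stays non-negative even for negative hash codes.
            return Math.floorMod(key.hashCode(), partitionCount);
        }
    }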

replication -
      For this I can see a few directions:

1) master/offline slave(s), hot standby - Again, with the recently
completed work on rollforward recovery it would not be too hard to set
up a secondary system which is ready to take over when the primary
fails.  Basically copy the db, then stream the logs across and apply
them using the existing recovery algorithms when you want to bring the
system online.  Once the first slave-initiated update is applied, no
further updates from the master can be applied using this algorithm.
A rough sketch of the log-shipping half follows.
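
A minimal sketch of shipping archived log files to the standby,
assuming the master archives logs into a directory the standby host
can poll.  The directory layout and polling scheme here are
assumptions, not Derby APIs:

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    // Hypothetical log shipper: copy newly archived log files from the
    // master's archive directory to the standby, which applies them via
    // ordinary rollforward recovery when it is brought online.
    public class LogShipper {
        public static void ship(Path masterArchiveDir, Path standbyLogDir)
                throws Exception {
            File[] logs = masterArchiveDir.toFile().listFiles();
            if (logs == null) {
                return; // archive directory not there yet
            }
            for (File f : logs) {
                Path target = standbyLogDir.resolve(f.getName());
                if (Files.exists(target)) {
                    continue; // already shipped
                }
                // Copy to a temporary name, then rename, so the standby
                // never recovers from a half-copied log file.
                Path tmp = standbyLogDir.resolve(f.getName() + ".tmp");
                Files.copy(f.toPath(), tmp,
                        StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            }
        }
    }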


2) master/read-only slave(s), hot standby, with read-only access -
Building on 1, this would again not be too hard.  Some work is needed
to guarantee read access while applying the recovery logic online
rather than during boot.  Same caveats as above.

3) master/(read/write slave(s)), very hard - the usual problems:
what do you do with conflicts?  Such a system may be better handled
by doing higher-level update/conflict tracking than the log, maybe
something like MySQL does.


Load Balancing - I don't know what you are looking for here.  Would
    be interested in more detail.


An approach to a more scalable Derby database (again just an opinion,
and note I have neither the plans nor the expertise to actually build
the following, but it seems like a good project for someone interested
in building distributed optimizer technology):


Taking a shared-nothing approach to scalability, the following seems
like a good first step toward a more scalable Derby database.
Rather than partitioning tables within a single Derby database, use
the existing Derby database software as a single node in a multi-node
distributed database.  To do this, build a new piece of software that
glues a network of Derby databases together; each piece of the
database could be on the same machine or on different machines.


The new software would handle the following:
1) Some new set of ddl which could partition a single distributed
table across multiple local Derby databases.
2) Handle dml, sending it to the appropriate local database.
3) optimizer/execution - this is the interesting part.  It needs to
partition queries and execute them in parallel, sending/receiving data
to/from the local dbs (a sketch follows this list).
4) For extra credit, one could build a fault-tolerant system by
applying RAID-like algorithms to the local dbs.  Lose one local db and
it could be rebuilt from the other replicated dbs.
5) Probably a lot else I haven't mentioned.
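
A sketch of the scatter-gather execution in (3), assuming one JDBC
connection per local partition database and a query that has already
been rewritten per-partition (the connection list and query are
placeholders):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical scatter-gather: run the same query against every
    // local Derby partition in parallel and concatenate the rows.
    public class ScatterGather {
        public static List<String> query(List<Connection> partitions,
                                         String sql) throws Exception {
            ExecutorService pool =
                Executors.newFixedThreadPool(partitions.size());
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Connection conn : partitions) {
                futures.add(pool.submit(() -> {
                    List<String> rows = new ArrayList<>();
                    try (Statement st = conn.createStatement();
                         ResultSet rs = st.executeQuery(sql)) {
                        while (rs.next()) {
                            rows.add(rs.getString(1));
                        }
                    }
                    return rows;
                }));
            }
            // A real optimizer would merge-sort or re-aggregate here;
            // this sketch just concatenates partition results.
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            pool.shutdown();
            return all;
        }
    }
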
Some benefits of doing this in Derby:
1) If multiple partitioned dbs are local to the distributed server,
then all communication can use the embedded Derby interfaces - making
them go fast.  In a first implementation I would suggest just using
standard JDBC between the distributed Derby server and the local nodes
as the easiest way to get it all working.
2) If using JDBC, the same exact code will work to access local vs.
networked local dbs (see the snippet after this list).
3) It seems that using the same kind of "driver" trick as the network
server, applications could use this new distributed db with no code
changes (apart from the ddl to set the system up).
4) Using Derby modules, one can probably reuse Derby code for some
pieces (like the SQL parser) while not slowing down the core
non-networked Derby version.  If done right, a local system can be
configured that includes no networking overhead, while from the same
codeline a distributed version can also be built.
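
On point 2, the only thing that changes between a local and a
networked node is the connection URL (the database names, host, and
port below are examples; 1527 is Derby's default port):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class NodeConnections {
        public static void main(String[] args) throws Exception {
            // Embedded (in-process) local partition.
            Connection local =
                DriverManager.getConnection("jdbc:derby:partition0");

            // Same JDBC code against a networked partition, via the
            // Derby client driver.
            Connection remote = DriverManager.getConnection(
                "jdbc:derby://dbhost:1527/partition1");

            local.close();
            remote.close();
        }
    }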


I like this approach to a distributed Derby database rather than
trying to make one set of code handle both local and network paths.
An optimizer is hard enough without making a single optimizer handle
both local and distributed decisions.  It also means that local
performance does not suffer code-path overhead from unused distributed
code.


[EMAIL PROTECTED] wrote:
Hi all,

These days there are some fine databases out there, but none has the
features of Java and its scalability.  For example, I cannot mix a
MySQL 32-bit and 64-bit database on my given hardware; I can only use
32-bit systems or 64-bit systems, not a mix.


MySQL runs fast on Linux and poorly on FreeBSD and other UNIX systems
without some modifications (for example the thread-library issue) or
dirty tricks.


Sometimes a feature will be available on Windows only and not on
Linux, etc.  With a Java SQL database like Derby there is a real
chance to have all the cool features of the database on any system at
any time, as long as a J2SE JVM is present.


Derby is the right way, but are there any plans to make it enterprise
ready (replication, load balancing of connections, online backup,
table partitioning)?


Josh Carpenter