[jira] Commented: (DERBY-2872) Add Replication functionality to Derby

JIRA Mon, 09 Jul 2007 02:57:25 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511066
 ]


Jørgen Løland commented on DERBY-2872:
--------------------------------------

Thank you for the extensive comments from both Rick and Narayanan. I have a few 
supplementary comments to those from Narayanan.

>>Looks like you have addressed issue (1). I see in your comments above, that 
>>you are in agreement about how to address issue (2), but I don't see this 
>>reflected in the new spec itself. I'm getting the impression that the answer 
>>to (3) and (4) is that the first rev of replication won't handle these 
>>issues; instead, they will be addressed in a later rev. Is that right?
>I interpret it that a manual startup is planned for now. Is an auto startup on 
>the cards? 

Re 2: It says so below the table of NetworkServerControl commands, but I will 
make it clearer in the next version of the spec.
Re 3 and 4: That's correct; in the first rev, there will be no automatic 
restart of replication when one of the instances has failed. The DB owner will 
have to manually restart replication. A later improvement may automate this 
step; this is a good candidate for extending the functionality later.

>>5) A heads-up about the user/password options on the new server commands. 
>>There has been some discussion about authenticating server shutdown 
>>operations and general agreement that the current situation is confusing. 
>>DERBY-2109 intends to add credentials to the server shutdown command. I think 
>>that the same api should be used to specify username and password for all of 
>>our server commands--whatever that api turns out to be.
>Thank you for this pointer. I guess taking the same lines as 2109 is the thing 
>to do here. 

I agree. There is no reason why authentication for replication should differ 
from other commands. The NetworkServerControl commands I wrote in the func spec 
show what information is needed. I will modify the next version of the func 
spec to state that authentication is needed, and should be performed in the 
same manner as other NetworkServerControl commands.

>>6) I think it would be clearer if the url option were called slaveurl. Do we 
>>need a symmetric masterurl option for the startslave command? How does the 
>>slave know that it is receiving records from the correct master? What  
>>happens if two masters try to replicate to the same slave?

>This would be an issue I guess because the slave would assume both to be 
>legitimate unless we send the database name each time.
>But what would happen if both use the same database also.
>Can this be eliminated by having a handshake phase before the actual log 
>transfer occurs. So if the same url is being used for a second handshake we 
>would reject this unless this is a reconnect attempt after the master has
>crashed. 

We should only allow one connection to a slave database. A handshake sounds 
like a good idea. 
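
As a rough illustration of the single-connection rule, a handshake check on 
the slave could look something like the sketch below. This is not taken from 
the func spec or the proof-of-concept patches: the class SlaveHandshake, the 
accept method, and the notion of a master id string are all assumptions made 
for the example.

```java
// Hypothetical sketch; none of these names exist in Derby.
final class SlaveHandshake {
    private final String dbName;     // database this slave replicates
    private String connectedMaster;  // null until a master has been accepted

    SlaveHandshake(String dbName) {
        this.dbName = dbName;
    }

    /** Returns true if the master is accepted, false if it is rejected. */
    synchronized boolean accept(String masterId, String requestedDb) {
        if (!dbName.equals(requestedDb)) {
            return false; // handshake names a different database
        }
        if (connectedMaster == null) {
            connectedMaster = masterId; // first master wins
            return true;
        }
        // Only the same master may come back, e.g. when it reconnects
        // after a crash; any other master is rejected.
        return connectedMaster.equals(masterId);
    }
}
```

How a master would prove its identity during the handshake is left open 
here; the sketch only shows the accept/reject decision.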

>>7) Is the startmaster command restricted to a server running on the same 
>>machine as the master database? Similarly, is the startslave command 
>>restricted to a server on the slave database machine? What about failover and 
>>stop?

I think the start and failover commands need to be restricted to the 
machine where the database resides, but this depends on the NetworkServerControl 
security. Again, this should be equal to the policy for other 
NetworkServerControl commands. See 12) for how to stop replication.

>>8) I am confused about the startslave command. Does this create a new 
>>database? If so, how are the credentials enforced in the case that 
>>credentials are stored in the database? If not, what happens if there is 
>>already a database by that name? Is the database destroyed and replaced after 
>>authentication?

Since this has not been implemented yet, the solution may have to change later. 
However, the current intention is that the first thing that happens on the 
slave is that it receives the database 'x' from the master. When 'x' has been 
received, the slave starts the boot process of 'x'. So, the slave does not 
create 'x', even though it did not exist on the slave when the startslave 
command was issued. 

We will have to check that a database with the same name does not exist on the 
slave. Furthermore, we should probably ensure that the owner of 'x' is allowed 
to create a database on the slave. Did you think of any other permissions we 
should check for? Maybe an allowedToReplicate credential would be needed?

>>9) If you have stopped replication, can you resume it later on?
>If stopping replication means that we will not archive logs anymore I guess 
>this will not be possible. If the logs are still archived we can transmit from 
>the log after replication has been stopped and the slave can still redo from 
>there and replication can continue. That is, we should not call 
>SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE system procedure after stopping 
>replication. Guess the user should be able to decide this.

I am not sure about this. If a failover was performed, the answer is definitely 
'no' because the replication method assumes that the physical layouts of the 
databases are equal. A failover will not preserve this exactly equal physical 
layout since the failover process will undo uncommitted transactions. If the 
replication was simply turned off, Narayanan's suggestion of starting log 
shipment from some defined log record will probably work. 

However, I think we have to be restrictive in the first version of the 
functionality. For now, I think the answer will be 'no', i.e., you have to 
restart replication by first deleting the database (on the slave), and then 
sending the entire database to the slave. Resuming replication makes a good 
candidate for extending the functionality.

>>10)  What is the sequence of these commands? Do you first issue a startmaster 
>>and then issue a startslave? What happens if the commands occur out of 
>>sequence? Similarly for 
>Since the startslave starts a listener this should be done first before 
>startmaster. 

It is correct that the slave will be listening for the master and therefore 
must be started before replication can start. However, I see no reason why the 
connection attempts should not be retried every now and then until the slave is 
ready to accept the connection.

Hence, I don't think we need a defined sequence of commands. When the slave 
starts, it does nothing until a master connects to it (except write some 
messages to derby.log). When the master is started, it continues as normal 
(also writes some messages to derby.log) until it is able to get a connection 
to the slave. 
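
To make the retry idea concrete, here is a minimal sketch of a master-side 
connect loop. The names SlaveConnector and MasterConnectLoop, the attempt 
limit, and the fixed delay are assumptions for illustration only; nothing 
like this is defined in the func spec.

```java
// Hypothetical sketch: SlaveConnector and MasterConnectLoop are
// illustrative names, not Derby classes.
interface SlaveConnector {
    /** Returns true once the slave has accepted the connection. */
    boolean tryConnect();
}

final class MasterConnectLoop {
    /**
     * Retries until the slave accepts or maxAttempts is used up.
     * Returns the number of attempts made, or -1 if none succeeded.
     */
    static int connectWithRetry(SlaveConnector slave, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (slave.tryConnect()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // wait before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

In a real master this loop would presumably run in a background thread so 
that the database keeps serving transactions while the slave is unreachable.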

>>11) It would be nice to understand how we insulate replication from 
>>man-in-the-middle attacks--even if we don't implement these protections in 
>>this first version.

That is a good point. It would, e.g., be possible to use a signature. The slave 
could send a hashed username to the master, and the master could respond by 
sending the hashed password. It should not be possible to "unhash" the 
username/password. But I am no security expert, hence input on this issue is 
appreciated. And you are right; this will not be handled in the first version.

>>12) What happens if someone tries to connect to an active slave? What happens 
>>if someone tries to shutdown an active slave without first stopping 
>>replication at the master's end?

If someone tries to connect to a db 'x' that has the slave role in derby 
instance 'i', the connection is refused. Note that the derby instance 'i' may 
manage other databases at the same time. Making a connection to these other 
databases is unaffected by replication.
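
One way to picture this per-database check is the small sketch below. 
SlaveRegistry and its methods are hypothetical names invented for the 
example; how Derby would actually track the slave role is not decided here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; SlaveRegistry is not a Derby class.
final class SlaveRegistry {
    // Names of databases that currently have the slave role in this instance.
    private final Set<String> slaveDbs = ConcurrentHashMap.newKeySet();

    void markSlave(String dbName) {
        slaveDbs.add(dbName);
    }

    void clearSlave(String dbName) {
        slaveDbs.remove(dbName);
    }

    /** Client connections are refused only for databases in slave mode. */
    boolean mayConnect(String dbName) {
        return !slaveDbs.contains(dbName);
    }
}
```

With 'x' marked as a slave, mayConnect("x") is false while connections to 
any other database in the same instance are still allowed.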

>A connect attempt from the master would fail and the master would report that 
>the connection has been terminated due to the slave not being able to be 
>reached or that a slave could not be found. Would this case be different from 
>trying to connect to a Derby NetworkServer when it has been shutdown? 

The initial plan was to allow shutdown at both ends. Now that you mention it, 
however, stopping replication from the master seems to be cleaner. Hence, I 
think the revised plan should be as follows: Stopping replication will be 
performed by issuing the stopreplication command at the master. The master then 
sends a stop replication message over the network connection to the slave.

>>13) What happens if the slave is shut down and then, later on, someone tries 
>>to boot the slave as an embedded database?

That will be allowed. In this case, the database will boot to a 
transaction-consistent state that includes all transactions that were committed 
(and sent, of course) before the shutdown.

> Add Replication functionality to Derby
> --------------------------------------
>
>                 Key: DERBY-2872
>                 URL: https://issues.apache.org/jira/browse/DERBY-2872
>             Project: Derby
>          Issue Type: New Feature
>          Components: Miscellaneous
>    Affects Versions: 10.4.0.0
>            Reporter: Jørgen Løland
>            Assignee: Jørgen Løland
>         Attachments: proof_of_concept_master.diff, 
> proof_of_concept_master.stat, proof_of_concept_slave.diff, 
> proof_of_concept_slave.stat, replication_funcspec.html, 
> replication_funcspec_v2.html, replication_script.txt
>
>
> It would be nice to have replication functionality to Derby; many potential 
> Derby users seem to want this. The attached functional specification lists 
> some initial thoughts for how this feature may work.
> Dag Wanvik had a look at this functionality some months ago. He wrote a proof 
> of concept patch that enables replication by copying (using file system copy) 
> and redoing the existing Derby transaction log to the slave (unfortunately, I 
> can not find the mail thread now).
> DERBY-2852 contains a patch that enables replication by sending dedicated 
> logical log records to the slave through a network connection and redoing 
> these.
> Replication has been requested and discussed previously in multiple threads, 
> including these:
> http://mail-archives.apache.org/mod_mbox/db-derby-user/200504.mbox/[EMAIL 
> PROTECTED]
> http://www.nabble.com/Does-Derby-support-Transaction-Logging---t2626667.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
