NFS and Derby

2010-11-11 Thread Kathey Marsden
I have always told users they have to have their databases on a local 
disk to ensure data integrity, and that a system crash with an NFS 
mounted database could cause fatal corruption. A user took me to task 
on this this morning and asked me to explain exactly why.  I gave my 
general response about not being able to guarantee a sync to disk over 
the network, but I want a more authoritative reference for why you 
cannot count on an NFS mounted disk.  I did find several places saying 
that the sync option "favors data integrity", which certainly doesn't 
sound like a guarantee.  Does anyone know a good general reference I 
can use on this topic to support my "you gotta use a local disk" 
mantra?



Also, I think our documentation on this topic should be a bit stronger.  
Currently we just say it may not work; we should probably be clearer 
that data corruption could occur.  I will file an issue to beef up the 
language based on the conversation in this thread.


http://db.apache.org/derby/docs/10.5/devguide/cdevdvlp40350.html

Thanks

Kathey



Re: NFS and Derby

2010-11-11 Thread Donald McLean
A local database on an NFS mounted disk? I would never consider such a thing.

My experience with NFS mounted resources is that network congestion
can cause all sorts of nasty side effects. Even something as simple as
an unexpectedly slow read or write can cause unanticipated cascading
failure conditions. And no matter what value is used for a timeout,
you can pretty much guarantee that it will be exceeded eventually.

I realize that this doesn't address Derby specific concerns such as
database corruption. Fortunately, I have no experience with that.

Donald



Re: NFS and Derby

2010-11-11 Thread Lily Wei
I agree: to be safe and able to sleep at night, ensuring data integrity by 
not putting the database on an NFS mount is definitely the way to go.  However, 
I remember there are SAP customers who do that with Oracle. Oracle pushes the 
idea of running databases on NFS mounts. I am referring to articles like: 
http://www.sun.com/bigadmin/features/articles/7000_oracle_deploy.jsp

Some discussion on SAP community network: 
https://forums.sdn.sap.com/message.jspa?messageID=7964399

DB2 does not recommend such a setup: 
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.uprun.doc/doc/c0025100.htm



I will not say I would never put a database (the data files) on an NFS 
filesystem. However, I will think three times before doing that. :)

Lily







  

Re: NFS and Derby

2010-11-11 Thread Peter Ondruška
You could use NFS-mounted read-only databases, just as you can with
CD/DVD-based media.

The risk with read-write databases on NFS devices is (was) that in
the old days of UDP-based NFS clients and servers your connection
could easily break. That is no longer the case with decent operating
systems (Solaris, for example), good NFS servers (again mostly
Solaris-based, or those from a famous vendor) and a good, highly
available network infrastructure. Nowadays your server's disks are
likely network connected anyway (FC SAN, iSCSI).





-- 
Peter


Re: NFS and Derby

2010-11-11 Thread Daniel John Debrunner

On 11/11/2010 07:56, Kathey Marsden wrote:

I have always told users they have to have their databases on a local
disk to ensure data integrity and that a system crash for an NFS mounted
database could cause fatal corruption, but had a user this morning take
me to task on this and ask me to explain exactly why. I gave my general
response about not being able to guarantee a sync to disk over the
network, but want to have a more authoritative reference for why you
cannot count on an NFS mounted disk although I did find several places
where the sync option favors data integrity which certainly doesn't
sound like a guarantee. Does anyone know a good general reference I can
use on this topic to support my you gotta use a local disk mantra.


Part of the issue is that that documentation is really old and file 
systems have moved on since it was written. There are other shared file 
systems that maybe do support integrity across the network with Derby, 
e.g. IBM's GPFS. Thus it's more complicated than local disk versus NFS.



Also I think our documentation on this topic should be a bit stronger.
Currently we just say it may not work and probably should be clearer
that data corruption could occur.


The documentation may need to state what Derby requires (a sync through 
the Java APIs ensures the data is recoverable) and then have per-file-system 
sections, filled out on a scratch-your-own-itch basis. E.g. even a 
local disk is not recoverable if the OS is performing disk caching.


Dan.





Re: NFS and Derby

2010-11-11 Thread Mike Matrigali

Kathey Marsden wrote:
I have always told users they have to have their databases on a local 
disk to ensure data integrity and that  a system crash for an NFS 
mounted database could cause fatal corruption, but had a user this 
morning take me to task on this and ask me to explain exactly why.  I 
gave my general response about not being able to guarantee a sync to 
disk over the network, but want to have a more authoritative reference 
for why  you cannot count on an NFS mounted disk although I did find 
several places where the sync option favors data integrity which 
certainly doesn't sound like a guarantee.  Does anyone know a good 
general reference I can use on this topic to support my you gotta use a 
local disk mantra.



The problem is one of documentation and implementation of NFS.  I don't
think there is just one NFS out there.  And there are definitely all 
sorts of other remote mounting options.


Some of the problems that can arise on remote file systems but are avoided
on local disk, and that are why, to be safe, we have documented that we
can't guarantee support, include:

1) We may not be able to prevent dual booting, and thus the db may get
corrupted.  All of our algorithms for preventing dual booting rely on the
JVMs that are accessing the database being on the same machine.  Once 2
machines can access the same file we have no way to prevent corruption.


2) Derby depends on synchronous write behavior when requested.  Basically 
at certain times Derby asks the JVM to guarantee that data for a table or 
recovery log file has been written and forced to disk before returning.

If this syncing is not done correctly, a number of database problems can
happen, such as:
a) We tell the user a transaction was committed because we believe the log
   was forced, but the NFS layer was caching the write and then crashes.
   Now the committed xact is not there.
b) We want to remove some recovery log, so we force data to disk, wait 
   for it to hit disk and then delete the log file for those disk updates.
   But the data was actually cached and lost, and now we have old data in
   the db and no log files to recover it from.
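
To make the dual-booting problem in the first point above concrete, here is a minimal sketch of a lock-file guard built on `java.nio.channels.FileChannel.tryLock()`. It is not Derby's actual dual-boot check; the class name `BootGuard` and the lock file name `demo.lck` are made up for illustration. The in-JVM set and the OS-level file lock both work when every JVM runs on the same machine; whether the OS lock is honored between two machines sharing a remote file system is platform-dependent, which is exactly where the guard can silently stop protecting the database.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-boot guard; NOT Derby's actual implementation. */
public final class BootGuard implements AutoCloseable {
    // Lock files already held inside this JVM. Checked before opening a
    // channel, because on some platforms closing a second channel on the
    // same file releases the OS lock held through the first one.
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    private final Path lockFile;
    private final FileChannel channel;
    private final FileLock lock;

    private BootGuard(Path lockFile, FileChannel channel, FileLock lock) {
        this.lockFile = lockFile;
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a guard if the caller now owns the lock, or null if it is already held. */
    public static BootGuard tryAcquire(Path dbDir) throws IOException {
        Path lockFile = dbDir.toRealPath().resolve("demo.lck"); // assumed file name
        if (!HELD.add(lockFile)) {
            return null; // already booted inside this JVM
        }
        FileChannel ch = null;
        BootGuard guard = null;
        try {
            ch = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock l = ch.tryLock(); // null if another process holds the lock
            if (l != null) {
                guard = new BootGuard(lockFile, ch, l);
            }
        } catch (OverlappingFileLockException e) {
            // Other code in this JVM locked the file directly; treat as held.
        } finally {
            if (guard == null) {
                if (ch != null) {
                    ch.close();
                }
                HELD.remove(lockFile);
            }
        }
        return guard;
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
            channel.close();
        } finally {
            HELD.remove(lockFile);
        }
    }
}
```

A caller would treat a `null` return as "database already booted" and refuse to open it.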

When this was first documented I don't believe any JVM implementation 
on top of NFS could guarantee a completed synchronous write.
It may be the case that certain remote file system implementations now 
can guarantee this, and it may be the case that the JVM implementations 
make the right calls to the NFS file system to do this, but I believe 
it would be a support nightmare to try to support this.
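
For readers who want to see what a "completed synchronous write" request looks like from Java, here is a minimal sketch of a commit-style log append. It is not Derby's code; `SyncedLog` and `appendAndForce` are hypothetical names. The standard call used is `FileChannel.force(true)`, whose Javadoc states that the guarantee of data reaching the device applies when the file resides on a local storage device and that no such guarantee is made otherwise. `FileDescriptor.sync()` and the `"rws"`/`"rwd"` modes of `RandomAccessFile` are the other standard ways to ask for the same thing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only log that forces every record; not Derby code. */
public final class SyncedLog implements AutoCloseable {
    private final FileChannel channel;

    public SyncedLog(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Appends one record and returns only after force() has returned. */
    public void appendAndForce(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(
                (record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        // Ask the OS to push file content and metadata to the device.
        // Whether the bytes are on stable storage afterwards depends on
        // the OS, the file system and any caching below it.
        channel.force(true);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

A transaction commit would only be reported to the user after `appendAndForce` returns; if a layer below caches the write anyway, that report can be wrong after a crash.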


A quick Google search on NFS topics seems to indicate that there may be some 
versions of NFS that do support write sync.  I believe this because most
of the hits that I got were descriptions of how to disable the syncing 
to get better performance, indicating that many of the NFS setups that might 
support write sync actually have it disabled.  I did not see any way that 
a Java program could find out if the required syncing was being enforced.
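
Later Java versions offer a partial answer: since Java 7, `java.nio.file.Files.getFileStore(path).type()` reports the type of the file store a path lives on. The sketch below uses it to flag a database directory that looks remote. It cannot tell whether syncing is actually enforced, the returned type string is implementation-specific, and the set of "remote" names is an assumption for illustration rather than an authoritative list. `StorageCheck`, `isRemoteType` and `looksRemote` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that guesses whether a directory is on a remote file system. */
public final class StorageCheck {
    // Assumed, non-exhaustive names; FileStore.type() values vary by platform.
    private static final Set<String> REMOTE_TYPES = new HashSet<>(
            Arrays.asList("nfs", "nfs4", "cifs", "smb", "smbfs", "afs"));

    private StorageCheck() {
    }

    /** Case-insensitive match against the assumed list of remote types. */
    public static boolean isRemoteType(String type) {
        return type != null
                && REMOTE_TYPES.contains(type.toLowerCase(Locale.ROOT));
    }

    /** Looks up the file store holding dbDir and checks its reported type. */
    public static boolean looksRemote(Path dbDir) throws IOException {
        FileStore store = Files.getFileStore(dbDir);
        return isRemoteType(store.type());
    }
}
```

A `true` result is only a hint to warn the user; a `false` result says nothing about write caching on a local disk.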


Note that we also can not guarantee recovery on disks with write cache
enabled, which I believe many users have set.  Many may not even know it
as I believe it is the default for some disk installations.










Re: NFS and Derby

2010-11-11 Thread Mike Matrigali
And for some really ancient history (at least 10 years ago), I believe 
this bit of documentation actually resulted from one of the developers
accidentally running the set of tests in their home directory on NFS and
getting errors.  So at least at that time it didn't even take a crash to
make something fail across NFS vs. local disk.  I don't think we have
done any testing on remote file systems on purpose since then.







Re: NFS and Derby

2010-11-11 Thread Kathey Marsden

On 11/11/2010 11:27 AM, Mike Matrigali wrote:
[snip good summary of support  limitations]
  I did not see anyway that a java program could find out if the 
required syncing was being enforced.


Would it be reasonable to request such an API call in some future Java 
version, or would it simply be impossible to implement?  End users 
are often victims of what seems to work until it doesn't, especially as 
half the time folks don't know they are using Derby, much less know the 
settings and limitations of the file system they are using.  It would be 
great to be able to throw a clear message if a reliable sync is not 
possible.







Re: NFS and Derby

2010-11-11 Thread Daniel John Debrunner

On 11/11/2010 16:59, Kathey Marsden wrote:

On 11/11/2010 11:27 AM, Mike Matrigali wrote:
[snip good summary of support limitations]

I did not see anyway that a java program could find out if the
required syncing was being enforced.


Would it be reasonable to request such an API call in some future java
version or would it just simply be impossible to implement?


Actually there already is, the FileDescriptor.sync() call has this in 
its defined contract:


SyncFailedException - Thrown when the buffers cannot be flushed, or 
because the system cannot guarantee that all the buffers have been 
synchronized with physical media.


The problem may be that the JVMs just do not implement this according to 
the spec. It may also be true that a VM has no way of knowing whether it 
can guarantee a sync, in which case it would in reality throw 
SyncFailedException all the time ...
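
A minimal sketch of relying on that contract: write, call `FileDescriptor.sync()`, and treat `SyncFailedException` as "durability not guaranteed". `DurableWrite` and `writeDurably` are hypothetical names, not Derby code. As noted above, a JVM may return normally from `sync()` even when something below it is caching, so a `true` result only means the JVM did not report a failure.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;
import java.nio.file.Path;

/** Hypothetical helper showing the FileDescriptor.sync() contract in use. */
public final class DurableWrite {
    private DurableWrite() {
    }

    /**
     * Writes data to file and asks the JVM to sync it.
     * Returns true if sync() returned normally, false if the JVM reported
     * via SyncFailedException that it could not guarantee the sync.
     */
    public static boolean writeDurably(Path file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(data);
            try {
                out.getFD().sync();
                return true;
            } catch (SyncFailedException e) {
                return false;
            }
        }
    }
}
```

A caller could turn a `false` result into the clear error message asked for earlier in the thread, but only on JVMs and file systems that actually raise the exception.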


Dan.




Re: NFS and Derby

2010-11-11 Thread Lily Wei
On Nov 11, 2010, at 5:21 PM, Daniel John Debrunner d...@apache.org wrote:

 On 11/11/2010 16:59, Kathey Marsden wrote:
 On 11/11/2010 11:27 AM, Mike Matrigali wrote:
 [snip good summary of support limitations]
 I did not see anyway that a java program could find out if the
 required syncing was being enforced.
 
 Would it be reasonable to request such an API call in some future java
 version or would it just simply be impossible to implement?
 
 Actually there already is, the FileDescriptor.sync() call has this in its 
 defined contract:
 
 SyncFailedException - Thrown when the buffers cannot be flushed, or because 
 the system cannot guarantee that all the buffers have been synchronized with 
 physical media.
 
 The problem maybe that the JVMs just do not implement this according to the 
 spec. It also maybe true that a VM has no way of knowing if it could 
 guarantee a sync, thus in reality it would throw SyncFailedException all the 
 time ...
 
Has any issue been filed against the JVMs for this behavior not matching the spec?