Re: [AOLSERVER] Data corruption with fastpath caching

2008-08-22 Thread Jim Davidson
Ah -- I (finally) understand... I must have missed the detail re:  
serialization in message #30 out of #60 or so


So, this clarifies to me:

-- cache by filename key is correct and good for most cases and should  
be on by default
-- the grace period is a clever solution for the rapid-changing,  
same filename case you described and deserves to be on by default
-- ns_returnfile shouldn't use the cache but already does -- some  
config and/or command flags can be added to toggle the behavior


I'll update the code with the options above.


-Jim





On Aug 21, 2008, at 11:27 PM, John Caruso wrote:


On Thursday 02:34 PM 8/21/2008, Jim Davidson wrote:
To clarify one point:  There is no technical solution to creating  
temp

files with the same name and avoiding the race condition without
additional synchronization.


To clarify as well: the original code didn't involve a race  
condition--it was effectively serialized, as though it were like  
this snippet:


   foreach object $objects {
   eval exec /some/external/program --output-file $tempfile -- 
object $object

   ns_returnfile 200 text/plain $tempfile
   }

(As I mentioned to you, this was basically a batch process driven by  
a client-side Java applet making sequential HTTP requests to an  
AOLserver-driven API web server, one transaction at a time, with the  
results being returned by ns_returnfile on the server.  Also, the  
temp file in question was in a secure directory.)


So the bug can (and did) manifest itself with serialized access.


So, here's what I'd suggest:

-- Cache by filename key should be the default.  This is technically
the correct fix to enable temporary, uniquely named files, to be
returned via ns_returnfile.
-- John's grace period code is a clever optimization if fastpath is
being used in this way and could also be an option, default off.


Again, this wouldn't have resolved Arena's initial problem; the  
original code would still have hit the bug, and it would have been  
just as difficult to detect that that was happening (though slightly  
easier to debug).  That's why I'd recommend having the mtime  
workaround code active with a default of 1--otherwise people running  
a default config of AOLserver will still be open to the same issue.


(That's my only stake in this, BTW; Arena is already using the mtime  
fix and will continue to do so, but I'd really rather not have  
someone else run into this issue in the future.)


In thinking about it today I realized that it's useful to think  
about the four scenarios in which the bug can currently occur (which  
I believe partition the bug space):


1) Monotonically increasing time with a different filename
2) Monotonically increasing time with the same filename
3) Time travelling with a different filename
4) Time travelling with the same filename

(Time travelling here means mucking with the mtime artifically, a  
la rsync, and filename means fully-qualified filename.)


The mtime workaround resolves scenarios 1 and 2, and using the  
filename as the cache key resolves scenarios 1 and 3.  Nothing  
suggested so far resolves scenario 4--and in fact I don't think it's  
possible to resolve scenario 4 short of a major rewrite of the code  
(like Juan's suggestion of using inotify or similar functionality).   
So combining both fixes resolves all of the resolvable issues.


For reference: the bug occurred in scenario 2, and subsequently in  
scenario 1.  And the security implications apply to all four  
scenarios, though they're arguably worst in scenarios 1 and 3.


- John


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] 
 with the
body of SIGNOFF AOLSERVER in the email message. You can leave the  
Subject: field of your email blank.



--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to [EMAIL PROTECTED] 
with the
body of SIGNOFF AOLSERVER in the email message. You can leave the Subject: 
field of your email blank.


Re: [AOLSERVER] Data corruption with fastpath caching

2008-08-22 Thread Tom Jackson
Jim,

Can I ask why the filename is important for the cache key? With the
cache delay, the inode/dev + *time + size should do it all.

In fact, I finally understood the difference between mtime and ctime, if
any change is made, it should be the change to ctime. 

Why ctime? ctime is unique in that it isn't something that can be set by
user level programs. It changes whenever the content of the file changes
or permissions, owners, or any of the metadata of the file. 

So, for instance, if someone replaces a file with an identical file, the
ctime would still change. If you check the ctime, you can also skip
checking the size. 

But none of this has to do with the filename. On Unix, filenames are
especially squishy. both stat and open follow symbolic links, and you
could therefore use a symlink to point to different files over time, but
the files could have identical mtime, atime, size, owner, etc. using
stat/open and comparing with what is in cache based upon the filename
would never detect this slight of hand. This is why the inode is most
important on unix. On windows, you can't do this, so filenames are safe.

My recommendation for changes are these:

1. use John's ingenious 1-2 second delay, optionally allow config of the
cache delay.

2. base the age for the above on the ctime, not the mtime. ctime is
always younger or the same age as mtime, and covers changes to metadata,
and is immune from easy modification.

Maybe: remove the fastpath config options from the basic config file, if
it is even there, other example configs could be set to cache = off.

Happy Weekend everyone.

tom jackson

On Fri, 2008-08-22 at 13:18 -0400, Jim Davidson wrote:
 Ah -- I (finally) understand... I must have missed the detail re:  
 serialization in message #30 out of #60 or so
 
 So, this clarifies to me:
 
 -- cache by filename key is correct and good for most cases and should  
 be on by default
 -- the grace period is a clever solution for the rapid-changing,  
 same filename case you described and deserves to be on by default
 -- ns_returnfile shouldn't use the cache but already does -- some  
 config and/or command flags can be added to toggle the behavior
 
 I'll update the code with the options above.
 
 
 -Jim
 
 
 
 
 
 On Aug 21, 2008, at 11:27 PM, John Caruso wrote:
 
  On Thursday 02:34 PM 8/21/2008, Jim Davidson wrote:
  To clarify one point:  There is no technical solution to creating  
  temp
  files with the same name and avoiding the race condition without
  additional synchronization.
 
  To clarify as well: the original code didn't involve a race  
  condition--it was effectively serialized, as though it were like  
  this snippet:
 
 foreach object $objects {
 eval exec /some/external/program --output-file $tempfile -- 
  object $object
 ns_returnfile 200 text/plain $tempfile
 }
 
  (As I mentioned to you, this was basically a batch process driven by  
  a client-side Java applet making sequential HTTP requests to an  
  AOLserver-driven API web server, one transaction at a time, with the  
  results being returned by ns_returnfile on the server.  Also, the  
  temp file in question was in a secure directory.)
 
  So the bug can (and did) manifest itself with serialized access.
 
  So, here's what I'd suggest:
 
  -- Cache by filename key should be the default.  This is technically
  the correct fix to enable temporary, uniquely named files, to be
  returned via ns_returnfile.
  -- John's grace period code is a clever optimization if fastpath is
  being used in this way and could also be an option, default off.
 
  Again, this wouldn't have resolved Arena's initial problem; the  
  original code would still have hit the bug, and it would have been  
  just as difficult to detect that that was happening (though slightly  
  easier to debug).  That's why I'd recommend having the mtime  
  workaround code active with a default of 1--otherwise people running  
  a default config of AOLserver will still be open to the same issue.
 
  (That's my only stake in this, BTW; Arena is already using the mtime  
  fix and will continue to do so, but I'd really rather not have  
  someone else run into this issue in the future.)
 
  In thinking about it today I realized that it's useful to think  
  about the four scenarios in which the bug can currently occur (which  
  I believe partition the bug space):
 
  1) Monotonically increasing time with a different filename
  2) Monotonically increasing time with the same filename
  3) Time travelling with a different filename
  4) Time travelling with the same filename
 
  (Time travelling here means mucking with the mtime artifically, a  
  la rsync, and filename means fully-qualified filename.)
 
  The mtime workaround resolves scenarios 1 and 2, and using the  
  filename as the cache key resolves scenarios 1 and 3.  Nothing  
  suggested so far resolves scenario 4--and in fact I don't think it's  
  possible to resolve scenario 4 short of a