On Fri, Mar 26, 2010 at 08:55:09PM -0500, Peter Karman wrote:
> > After the change, it would look like this:
> >
> > {
> > "entries" : [
> > "schema_3.json",
> > ],
> > "segments" : [
> > "seg_3"
> > ],
> > "format" : "1"
> > }
> >
>
> sounds good.
Thanks, and you're right, that would have been a fine solution. However, I
discovered while trying to implement it that things were a little more
complicated than I'd thought, so I revisited the simplest of the alternates I'd
originally discarded -- and I believe I've arrived at an improvement.
(It's always worthwhile to spend extra time and effort exploring simple
solutions when it comes to file formats, since file formats impose such a heavy
backwards compatibility burden.)
The new approach is to have directories listed in a snapshot file serve as
stand-ins for their contents -- so when the snapshot is in use, they protect
their contents recursively, and when the snapshot becomes obsolete, their
contents becomes subject to recursive deletion.
I originally thought to implement this by calling Folder_Delete_Tree() on each
entry in the snapshot and ruled it out because it seemed too draconian. But
Delete_Tree() turns out not to have been the right way to go about things
anyway, and the final implementation is much more sane.
The advantage of this approach is simplicity: Snapshot files continue to be
only a single flat list, as there's no need to add a new "segments" key. Even
better, snapshot files contain vastly fewer entries after this change, making
them easier to read.
{
"entries" : [
"schema_3.json",
"seg_3"
],
"format" : "1"
}
Yet we still achieve our desired result of avoiding the need to implement
Delete_Segment() for all DataWriters. Writing plugins just got a little
easier. :)
To make this approach safe security-wise, I had to take a couple steps.
First, I had to make sure that we don't follow symlinks when recursing so that
e.g. a maliciously placed symlink to "~" can't wreak havok. Second, I had to
ensure that we don't follow filepaths upwards out of the index directory, so
that a maliciously crafted snapshot file containing e.g. "../../../" can't do
any harm.
I actually uncovered an existing minor security vulnerability during this
work: under the current release, a maliciously crafted snapshot file
containing variants on e.g. "../../../../etc/passwd" could cause problems if
all the stars align. An attacker would first need to supply a malicious
index, which that most KinoSearch users aren't vulnerable since few if any use
untrusted indexes. Then there would need to be a permissions mistake for an
attacker to do something like bring down a box. However, if the attacker
guesses correctly on filepaths belonging to a user ("../../mail",
"../../../mail"), they could do some damage. There's no limit on the number
of entries in a snapshot file, so an attacker would be free to make a very
large number of guesses.
We should probably make a 0.30_091 bugfix release to get that vulnerability
taken care of, as well as the FreeBSD alloca() problem that's showing up in
CPAN testers (solution is to #include <stdlib.h> to get alloca() on that
platform).
The mods I've made since 0.30_09 dropped are all backwards compatible -- even
with the changes to FilePurger implementing the snapshot file semantics
change, we won't break compat until we empty out the Delete_Segment() routines
for all of the DataWriter subcomponents.
I'll probably get the alloca() problem sorted with a Charmonizer patch later
today.
Marvin Humphrey