zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread John Plocher
On Tue, Mar 30, 2010 at 1:36 PM, Nicolas Williams wrote:
> On Tue, Mar 30, 2010 at 02:04:39PM -0600, Tim Haley wrote:
>> It would be easy enough for me to print a 'time' column as the first


This is getting pretty close to "design by ARC" rather than "review by
ARC";  it might be a better use of ARC bandwidth to take this
discussion offline and place the case in "waiting need spec" mode...

-John (who has ratholed his share of these, and so recognizes the
symptoms easily :-)


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Glenn Brunette


On 3/30/10 4:36 PM, Nicolas Williams wrote:
> On Tue, Mar 30, 2010 at 02:04:39PM -0600, Tim Haley wrote:
>> It would be easy enough for me to print a 'time' column as the first
>> column, and the output could then be sent to 'sort -n'.  I'm not sure
>> how people feel about that.  Is that cheating?  :-)  The alternative
>> is to AVL sort by that time, which as you note will increase the
>> footprint, perhaps dramatically for a really big diff.
>
> I'd be happy with that.  Someone suggested a -o field1,field2,..,fieldN

That would be me.

> option, and that's starting to look desirable.  There's at least these
> fields that you could include in output:
>
>   - object number
>   - object type
>   - timestamp*
>   - generation number
>   - type of change (create*, unlink*, rename*, other*, other meta-data, data)
>   - old path*
>   - new path*
>   - link count*
>
> The starred ones are the ones included in your proposals so far.  I'd be
> happy with just those; the others would be icing :)

+1

g


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Nicolas Williams
On Tue, Mar 30, 2010 at 02:22:43PM -0600, Tim Haley wrote:
> You are correct that the command should work with clones, too, as
> those are descendants.
> For a clone we'd present its paths relative to where it is mounted.

I'd say make all paths relative to the root of the dataset, even when
the newer snapshot is of a clone.  Let the consumer worry about how to
find absolute paths to the named objects (by qualifying with the mount
point of the relevant dataset and .zfs/snapshot/...).
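
A minimal sketch of the qualification step left to the consumer here, assuming zfs diff prints paths relative to the dataset root with a leading slash. The function names and the example mount point and snapshot name are hypothetical; only the .zfs/snapshot/ directory convention comes from the thread.

```python
import posixpath


def live_path(mountpoint, relpath):
    """Path of the object in the live (currently mounted) dataset."""
    return posixpath.join(mountpoint, relpath.lstrip("/"))


def snapshot_path(mountpoint, snapshot, relpath):
    """Path of the same object as seen through .zfs/snapshot/<snapshot>."""
    return posixpath.join(
        mountpoint, ".zfs", "snapshot", snapshot, relpath.lstrip("/")
    )


# Hypothetical dataset mounted at /export/home with a snapshot named "monday".
print(live_path("/export/home", "/docs/a.txt"))
print(snapshot_path("/export/home", "monday", "/docs/a.txt"))
```

Keeping the output root-relative means the same diff line can be resolved against either snapshot, or against a clone, by changing only the prefix.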

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Nicolas Williams
On Tue, Mar 30, 2010 at 02:04:39PM -0600, Tim Haley wrote:
> It would be easy enough for me to print a 'time' column as the first
> column, and the output could then be sent to 'sort -n'.  I'm not sure
> how people feel about that.  Is that cheating?  :-)  The alternative
> is to AVL sort by that time, which as you note will increase the
> footprint, perhaps dramatically for a really big diff.

I'd be happy with that.  Someone suggested a -o field1,field2,..,fieldN
option, and that's starting to look desirable.  There's at least these
fields that you could include in output:

 - object number
 - object type
 - timestamp*
 - generation number
 - type of change (create*, unlink*, rename*, other*, other meta-data, data)
 - old path*
 - new path*
 - link count*

The starred ones are the ones included in your proposals so far.  I'd be
happy with just those; the others would be icing :)

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Nicolas Williams
On Tue, Mar 30, 2010 at 12:42:00PM -0600, Tim Haley wrote:
> On 03/30/10 12:29 PM, Dan Price wrote:
> >The example was slightly messed up, sorry; that caused misunderstanding.
> >I'm worried about this situation:
> >
> > snapshot at 1
> > mv /myfiles/name1 /myfiles/name2
> > mkdir /myfiles/name1
> > snapshot at 2
> >
> >So, I'm fairly sure that between the two snapshots both events are
> >relevant.  So the above might yield:
> >
> > +   /myfiles/name1
> > R   /myfiles/name1 ->  /myfiles/name2

Ah, sure.

> Apologies for not responding sooner, I've been playing a bit with
> the code.  Based on what I've been doing this morning, I believe it
> will be possible to present the list in roughly chronological order.
> The order originally was just by object number, so you could get
> either of the outputs you show above.

If you include the object number in the output then you can let the
consumer figure it out.  Increasing zfs diff's footprint to get it to
sort its output correctly will decrease its performance, and the
consumer might not care.  (OTOH, a light-weight algorithm for properly
sorting these output lines is fairly obvious, so maybe there's no reason
to be concerned about performance.)

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Tim Haley
On 3/29/10 10:54 PM, Dan Price wrote:
> On Mon 29 Mar 2010 at 11:14AM, Bart Smaalders wrote:
>
>> On 03/29/10 11:01, Matthew Ahrens wrote:
>>
>>  
>>> How do commands like ls and find handle printing of filenames with
>>> arbitrary characters (newlines and such)?
>>>
>> In general, badly.
>>  
> Tim,
>
> My concern, which others have hinted at, is that there are a legion of
> people who are going to want to consume this information and there is
> great value in making said information be machine parseable.  Automated
> build systems, tripwires, fancy backup/recovery tools, et cetera.
>
> In summary, the current output seems mostly OK if it's for humans, but
> the case is ambiguous about who the expected consumer is.  It would
> be a tragedy if there wasn't a machine consumable way to get at this
> information.
>
>

I'm adding a -H option for scripting, with parseable output.

> I also have questions about how intelligent a consuming piece of
> software must be in order to make sense of this information.  Has anyone
> written a proof of concept tool using this?  For example, if a directory
> /foo/a is renamed to /foo/b, then an analyzer would need to stat /foo/b
> in order to discover that /foo/b is a directory, then traverse into it as
> needed.  It would be a shame if everyone who wanted to consume this had
> to write the same thousand lines of code (I'm happy to be convinced that
> this isn't the case).
>
> Some specific questions...
>
> 1) In what order are the changes printed?  If I saw:
>
>   +   /myfiles/rename_dir
>   R   /myfiles/rename_dir ->  /myfiles/rename_dir
>
> My analyzer would need to be smart enough to realize that the second
> must have happened before the first, and that both paths need
> evaluation.  Right?
>
>
This got clarified, I believe.

> 2) The meaning of "file/directory" (Don's concern aside) seems ambiguous in
> the proposal.  Are we tracking the filesystem *namespace* entry?  Or the
> actual object?  I found that not being sure of this made the proposal
> hard to evaluate.  Simple thought experiment which confused me:
>
>   snapshot at 1
>   rm a/b
>   rm a/c
>   rmdir a
>   echo "foo">  a
>   snapshot at 2
>
> Does that yield this? Or this?
>
>   -   a/b   | -   a/b
>   -   a/c   | -   a/c
>   -   a     | M   a
>   +   a     |
>
>

We are tracking the actual file objects.  Running your test with the
current code:

M   /files/
-   /files/a
-   /files/a/b
-   /files/a/c
+   /files/a

Having slept on this, I think an extra field in the output will help.  A
type character could be added to another column, using the same sorts of
symbols that ls -F shows, @ for symbolic link, / for directory, | for
pipe, etc.  Plus an 'F' to indicate a regular file.  So the above would
become:

M   /files/     /
-   /files/a    F
-   /files/a/b  F
-   /files/a/c  F
+   /files/a    F

> 3) Output is shown with leading slashes.  Is output shown relative to the
> mount point?  Or something else?  (If the former, what if between @a and
> @b the mountpoint changed?)
>
>
The output shown is relative to where the dataset is mounted at the time
of the diff.  We don't necessarily know where it was mounted at the time
of any particular snapshot.

You are correct that the command should work with clones, too, as those
are descendants.  For a clone we'd present its paths relative to where it
is mounted.

-tim



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Tim Haley
On 03/30/10 01:25 PM, Nicolas Williams wrote:
> On Tue, Mar 30, 2010 at 12:42:00PM -0600, Tim Haley wrote:
>> On 03/30/10 12:29 PM, Dan Price wrote:
>>> The example was slightly messed up, sorry; that caused misunderstanding.
>>> I'm worried about this situation:
>>>
>>>  snapshot at 1
>>>  mv /myfiles/name1 /myfiles/name2
>>>  mkdir /myfiles/name1
>>>  snapshot at 2
>>>
>>> So, I'm fairly sure that between the two snapshots both events are
>>> relevant.  So the above might yield:
>>>
>>>  +   /myfiles/name1
>>>  R   /myfiles/name1 ->   /myfiles/name2
>
> Ah, sure.
>
>> Apologies for not responding sooner, I've been playing a bit with
>> the code.  Based on what I've been doing this morning, I believe it
>> will be possible to present the list in roughly chronological order.
>> The order originally was just by object number, so you could get
>> either of the outputs you show above.
>
> If you include the object number in the output then you can let the
> consumer figure it out.  Increasing zfs diff's footprint to get it to
> sort its output correctly will decrease its performance, and the
> consumer might not care.  (OTOH, a light-weight algorithm for properly
> sorting these output lines is fairly obvious, so maybe there's no reason
> to be concerned about performance.)
>
> Nico

It would be easy enough for me to print a 'time' column as the first 
column, and the output could then be sent to 'sort -n'.  I'm not sure
how people feel about that.  Is that cheating?  :-)  The alternative is 
to AVL sort by that time, which as you note will increase the footprint, 
perhaps dramatically for a really big diff.
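
A rough sketch of what the consumer side of that idea could look like, assuming (hypothetically) that each record is a tab-separated line whose first field is a numeric timestamp. The record layout and the timestamps below are invented for illustration and are not the proposed output format.

```python
from decimal import Decimal


def sort_by_time(lines):
    """Order records by the numeric first column, in the spirit of `sort -n`.

    sorted() is stable, so records with equal timestamps keep the order
    in which they were emitted.
    """
    return sorted(lines, key=lambda line: Decimal(line.split("\t", 1)[0]))


# Hypothetical records: <time> TAB <change> TAB <path> [TAB <new path>]
records = [
    "1269980002\t+\t/myfiles/name1",
    "1269980001\tR\t/myfiles/name1\t/myfiles/name2",
]
for record in sort_by_time(records):
    print(record)
```

Sorting outside zfs diff keeps the command's own footprint small, at the cost of the consumer holding the whole diff in memory (or on disk, if an external sort is used) instead.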

-tim



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Vladimir Marek
Hi,

I'm wondering, will zfs diff work between two zfs pools?

I want to know if the current snapshot of my data differs from the
snapshot I created with "zfs send | zfs receive" a while ago (on the same
machine, just a different pool), so that I know whether to refresh my
backup.

Does 'zfs diff' read all the data in the snapshot, or does it compare
just some checksums and thus run fast?

Thank you
-- 
Vlad


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Glenn Brunette

Dan,

I have to +1 all of your well-thought-out comments.  As a potential
consumer of this functionality for the Immutable Service Container
project, answers to these questions are critical.

I am also interested in whether additional fields can be added to
the output similar to a "-o field1,field2" scenario?  It would be
nice to have data such as file type, modification time (where
applicable).

Also, will this functionality be able to tell how files were modified?
Things like changes in file ownership, group membership, permissions
and ACLs, size, times, etc.?  Even if this processing is not directly
implemented as part of the zfs diff command, perhaps the fields could
be made available (per -o comment above) to be consumed by layered
tools?

g


On 3/30/10 12:54 AM, Dan Price wrote:
> On Mon 29 Mar 2010 at 11:14AM, Bart Smaalders wrote:
>> On 03/29/10 11:01, Matthew Ahrens wrote:
>>
>>> How do commands like ls and find handle printing of filenames with
>>> arbitrary characters (newlines and such)?
>>
>> In general, badly.
>
> Tim,
>
> My concern, which others have hinted at, is that there are a legion of
> people who are going to want to consume this information and there is
> great value in making said information be machine parseable.  Automated
> build systems, tripwires, fancy backup/recovery tools, et cetera.
>
> In summary, the current output seems mostly OK if it's for humans, but
> the case is ambiguous about who the expected consumer is.  It would
> be a tragedy if there wasn't a machine consumable way to get at this
> information.
>
> I also have questions about how intelligent a consuming piece of
> software must be in order to make sense of this information.  Has anyone
> written a proof of concept tool using this?  For example, if a directory
> /foo/a is renamed to /foo/b, then an analyzer would need to stat /foo/b
> in order to discover that /foo/b is a directory, then traverse into it as
> needed.  It would be a shame if everyone who wanted to consume this had
> to write the same thousand lines of code (I'm happy to be convinced that
> this isn't the case).
>
> Some specific questions...
>
> 1) In what order are the changes printed?  If I saw:
>
>   +   /myfiles/rename_dir
>   R   /myfiles/rename_dir ->  /myfiles/rename_dir
>
> My analyzer would need to be smart enough to realize that the second
> must have happened before the first, and that both paths need
> evaluation.  Right?
>
> 2) The meaning of "file/directory" (Don's concern aside) seems ambiguous in
> the proposal.  Are we tracking the filesystem *namespace* entry?  Or the
> actual object?  I found that not being sure of this made the proposal
> hard to evaluate.  Simple thought experiment which confused me:
>
>   snapshot at 1
>   rm a/b
>   rm a/c
>   rmdir a
>   echo "foo">  a
>   snapshot at 2
>
> Does that yield this? Or this?
>
>   -   a/b   | -   a/b
>   -   a/c   | -   a/c
>   -   a     | M   a
>   +   a     |
>
> 3) Output is shown with leading slashes.  Is output shown relative to the
> mount point?  Or something else?  (If the former, what if between @a and
> @b the mountpoint changed?)
>
> 4) I would also vote for a mode which simply outputs a list of
> pathnames to investigate for differences.  This would enable:
>
>   zfs diff -someflag a@1 a@2 | xargs do_some_analysis_on_these
>
>
> Thanks for tackling this,
>
>  -dp
>


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Tim Haley
On 03/30/10 12:29 PM, Dan Price wrote:
> On Tue 30 Mar 2010 at 01:20AM, Nicolas Williams wrote:
>>> 1) In what order are the changes printed?  If I saw:
>>>
>>> +   /myfiles/rename_dir
>>> R   /myfiles/rename_dir ->  /myfiles/rename_dir
>>>
>>> My analyzer would need to be smart enough to realize that the second
>>> must have happened before the first, and that both paths need
>>> evaluation.  Right?
>>
>> Between two snapshots only the later event would be found, surely.  (But
>> I'm not the i-team.)
>
> The example was slightly messed up, sorry; that caused misunderstanding.
> I'm worried about this situation:
>
>  snapshot at 1
>  mv /myfiles/name1 /myfiles/name2
>  mkdir /myfiles/name1
>  snapshot at 2
>
> So, I'm fairly sure that between the two snapshots both events are
> relevant.  So the above might yield:
>
>  +   /myfiles/name1
>  R   /myfiles/name1 ->  /myfiles/name2
>
> It's tempting to read this as "name1 was created, then renamed to
> name2." But that isn't what it means.  For a human, the following
> version portrays the events much more clearly:
>
>  R   /myfiles/name1 ->  /myfiles/name2
>  +   /myfiles/name1
>
> I'm not sufficiently expert about all the pieces here to understand
> whether it's possible to efficiently or even sensibly sort things into
> a "human friendly" order.
>
Apologies for not responding sooner, I've been playing a bit with the 
code.  Based on what I've been doing this morning, I believe it will be 
possible to present the list in roughly chronological order.  The order 
originally was just by object number, so you could get either of the 
outputs you show above.

-tim


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Dan Price
On Tue 30 Mar 2010 at 07:21AM, Tim Haley wrote:
> On 3/30/10 5:35 AM, Vladimir Marek wrote:
> >Hi,
> >
> >I'm wondering, will zfs diff work between two zfs pools?
> >   
>
> No.  Only between snapshots within the same dataset within the same pool.

Only within the same dataset?  What about a snapshot of a clone,
compared with a snapshot from the source of the clone?

In other words, given this scenario:

# zfs create rpool/testdiff
# zfs snapshot rpool/testdiff@1
# zfs clone rpool/testdiff@1 rpool/testdiffchild
# zfs snapshot rpool/testdiffchild@2

Then I can do this:

# zfs send -i rpool/testdiff@1 rpool/testdiffchild@2

This seems to me to be a useful application; perhaps what you meant
was "it will only work when 'zfs send' would work"?

(As an aside, similar to the way send can send full snapshots, it'd be
nice to have a way to have "null" as the lefthand side of a diff.  That
is, since birth of the fs, what's changed?  Sort of like 'diff /dev/null
/myfile')

-dp

-- 
Daniel Price, Solaris Kernel Engineering    http://blogs.sun.com/dp


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread johan...@opensolaris.org
On Tue, Mar 30, 2010 at 10:31:04AM -0700, Bart Smaalders wrote:
> On 03/30/10 09:58, Glenn Brunette wrote:
> >
> >Dan,
> >
> >I have to +1 all of your well-thought-out comments. As a potential
> >consumer of this functionality for the Immutable Service Container
> >project, answers to these questions are critical.
> >
> >I am also interested in whether additional fields can be added to
> >the output similar to a "-o field1,field2" scenario? It would be
> >nice to have data such as file type, modification time (where
> >applicable).
> >
> >Also, will this functionality be able to tell how files were modified?
> >Things like changes in file ownership, group membership, permissions
> >and ACLs, size, times, etc.? Even if this processing is not directly
> >implemented as part of the zfs diff command, perhaps the fields could
> >be made available (per -o comment above) to be consumed by layered
> >tools?
> 
> A very detailed human readable diff of file attributes seems a lot of
> additional work to place on zfs diff.  Isn't a list of what is different
> sufficient to give to another program?

This portion of the discussion makes me wonder if there would be some
value in developing a prototype of 'zfs patch'.  Designing patch might
help illuminate some of the attributes that should be machine-readable
by default.  If certain portions of the diff/patch implementation are
kept in shared libraries, it would prevent different consumers of the
diff output from having to re-implement the same basic functions over
and over again. Since it sounds like a lot of teams are interested in
using this functionality, it may be worth investigating.  Having some of
the diff/patch shared would also allow the underlying implementation to
change over time, as long as consumers linked with the proper shared
libraries.

This is orthogonal to the current discussion, but it seemed worth
mentioning.

-j


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Dan Price
On Tue 30 Mar 2010 at 01:20AM, Nicolas Williams wrote:
> > 1) In what order are the changes printed?  If I saw:
> > 
> > +   /myfiles/rename_dir
> > R   /myfiles/rename_dir -> /myfiles/rename_dir
> > 
> > My analyzer would need to be smart enough to realize that the second
> > must have happened before the first, and that both paths need
> > evaluation.  Right?
> 
> Between two snapshots only the later event would be found, surely.  (But
> I'm not the i-team.)

The example was slightly messed up, sorry; that caused misunderstanding.
I'm worried about this situation:

snapshot at 1
mv /myfiles/name1 /myfiles/name2
mkdir /myfiles/name1
snapshot at 2

So, I'm fairly sure that between the two snapshots both events are
relevant.  So the above might yield:

+   /myfiles/name1
R   /myfiles/name1 -> /myfiles/name2

It's tempting to read this as "name1 was created, then renamed to
name2." But that isn't what it means.  For a human, the following
version portrays the events much more clearly:

R   /myfiles/name1 -> /myfiles/name2
+   /myfiles/name1

I'm not sufficiently expert about all the pieces here to understand
whether it's possible to efficiently or even sensibly sort things into
a "human friendly" order.
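
To make the ordering concern concrete, here is a small sketch of a naive consumer that replays diff events against a set of path names. The event tuples and the replay function are hypothetical, not part of the proposal; the point is only that the rename-then-create order replays cleanly while the create-then-rename order does not.

```python
def replay(names, events):
    """Apply (op, old, new) events, in order, to a set of path names."""
    names = set(names)
    for op, old, new in events:
        if op == "R":
            names.remove(old)  # raises KeyError if the old name is missing
            names.add(new)
        elif op == "+":
            if old in names:
                raise ValueError(old + " already exists")
            names.add(old)
        elif op == "-":
            names.remove(old)
    return names


before = {"/myfiles/name1"}
rename = ("R", "/myfiles/name1", "/myfiles/name2")
create = ("+", "/myfiles/name1", None)

# Chronological order: the rename frees name1, then name1 is created anew.
print(sorted(replay(before, [rename, create])))

# Object-number order might list the create first; a naive replay then fails.
try:
    replay(before, [create, rename])
except ValueError as err:
    print("cannot replay:", err)
```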

-dp

-- 
Daniel Price, Solaris Kernel Engineering    http://blogs.sun.com/dp


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Bart Smaalders
On 03/30/10 09:58, Glenn Brunette wrote:
>
> Dan,
>
> I have to +1 all of your well-thought-out comments. As a potential
> consumer of this functionality for the Immutable Service Container
> project, answers to these questions are critical.
>
> I am also interested in whether additional fields can be added to
> the output similar to a "-o field1,field2" scenario? It would be
> nice to have data such as file type, modification time (where
> applicable).
>
> Also, will this functionality be able to tell how files were modified?
> Things like changes in file ownership, group membership, permissions
> and ACLs, size, times, etc.? Even if this processing is not directly
> implemented as part of the zfs diff command, perhaps the fields could
> be made available (per -o comment above) to be consumed by layered
> tools?

A very detailed human readable diff of file attributes seems a lot of
additional work to place on zfs diff.  Isn't a list of what is different
sufficient to give to another program?

- Bart

-- 
Bart Smaalders                  Solaris Kernel Performance
bart.smaalders at oracle.com    http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Michael Schuster
On 30.03.10 06:54, Dan Price wrote:
> On Mon 29 Mar 2010 at 11:14AM, Bart Smaalders wrote:
>> On 03/29/10 11:01, Matthew Ahrens wrote:
>>
>>> How do commands like ls and find handle printing of filenames with
>>> arbitrary characters (newlines and such)?
>>
>> In general, badly.
>
> Tim,
>
> My concern, which others have hinted at, is that there are a legion of
> people who are going to want to consume this information and there is
> great value in making said information be machine parseable.  Automated
> build systems, tripwires, fancy backup/recovery tools, et cetera.
>
> In summary, the current output seems mostly OK if it's for humans, but
> the case is ambiguous about who the expected consumer is.  It would
> be a tragedy if there wasn't a machine consumable way to get at this
> information.

May I humbly point at dladm here, specifically at the -p option that 
various show-* subcommands utilise? Here's the relevant snippet from the 
man-page:

   Parseable Output Format
     Many dladm subcommands have an option that displays output in a
     machine-parseable format. The output format is one or more lines of
     colon (:) delimited fields. The fields displayed are specific to the
     subcommand used and are listed under the entry for the -o option for
     a given subcommand. Output includes only those fields requested by
     means of the -o option, in the order requested.

     When you request multiple fields, any literal colon characters are
     escaped by a backslash (\) before being output. Similarly, literal
     backslash characters will also be escaped (\\). This escape format is
     parseable by using shell read(1) functions with the environment
     variable IFS=: (see EXAMPLES, below). Note that escaping is not done
     when you request only a single field.
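
For consumers that are not shell scripts, the escaping rule in that snippet can be undone with a few lines of code. This is only a sketch of the rule as the man page text states it (backslash protects a literal colon or backslash); the function name and sample line are made up, and since the man page says single-field output is not escaped, a caller would have to skip this step in that case.

```python
def split_parseable(line):
    """Split a colon-delimited line, honouring backslash escapes."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            # The character after a backslash is taken literally; a lone
            # trailing backslash is kept as-is.
            current.append(next(chars, "\\"))
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Hypothetical three-field line whose middle field contains two colons.
print(split_parseable(r"net0:fe80\:\:1:up"))
```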


regards
Michael
-- 
michael.schuster at oracle.com
Recursion, n.: see 'Recursion'


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Tim Haley
On 3/30/10 5:35 AM, Vladimir Marek wrote:
> Hi,
>
> I'm wondering, will zfs diff work between two zfs pools?
>
>

No.  Only between snapshots within the same dataset within the same pool.

> I want to know if the current snapshot of my data differs from the
> snapshot I created by "zfs send | zfs receive" a while ago (on the same
> machine, just different pool), so that I should refresh my backup.
>
> Does 'zfs diff' read all the data in snapshot, or compares just some
> checksums and thus is fast?
>
>
We do our best to only read meta-data in finding the differences.

-tim
> Thank you
>



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Nicolas Williams
On Tue, Mar 30, 2010 at 01:20:55AM -0500, Nicolas Williams wrote:
> On Mon, Mar 29, 2010 at 09:54:01PM -0700, Dan Price wrote:
> > Some specific questions...
> > 
> > 1) In what order are the changes printed?  If I saw:
> > 
> > +   /myfiles/rename_dir
> > R   /myfiles/rename_dir -> /myfiles/rename_dir
> > 
> > My analyzer would need to be smart enough to realize that the second
> > must have happened before the first, and that both paths need
> > evaluation.  Right?
> 
> Between two snapshots only the later event would be found, surely.  (But
> I'm not the i-team.)

Oh, but for a modify and a rename of the same file the order matters, and
may not be what one expects (rename first, then update, so that one could
inspect the file with the new name).  But I think one could script around
this problem, if it is a problem.

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-30 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 09:54:01PM -0700, Dan Price wrote:
> I also have questions about how intelligent a consuming piece of
> software must be in order to make sense of this information.  Has anyone
> written a proof of concept tool using this?  For example, if a directory
> /foo/a is renamed to /foo/b, then an analyzer would need to stat /foo/b
> in order to discover that /foo/b is a directory, then traverse into it as
> needed.  It would be a shame if everyone who wanted to consume this had
> to write the same thousand lines of code (I'm happy to be convinced that
> this isn't the case).

Using simple backslash escaping works well with the shell's read
built-in.  Backslash octal escaping also works well with the shell's
read built-in using the -r option (but not with /usr/has/bin/sh, which
has no -r option).

I'm happy with octal escapes for shell scripting.

For other purposes one might really like HTML/XML entities (e.g.,
'&amp;'), but I don't expect the i-team will want to cater to every
developer :/  Instead, an API would be nice, since an API could
completely avoid any ambiguity directly.  That could come later.

> Some specific questions...
> 
> 1) In what order are the changes printed?  If I saw:
> 
>   +   /myfiles/rename_dir
>   R   /myfiles/rename_dir -> /myfiles/rename_dir
> 
> My analyzer would need to be smart enough to realize that the second
> must have happened before the first, and that both paths need
> evaluation.  Right?

Between two snapshots only the later event would be found, surely.  (But
I'm not the i-team.)

> 2) The meaning of "file/directory" (Don's concern aside) seems ambiguous in
> the proposal.  Are we tracking the filesystem *namespace* entry?  Or the
> actual object?  I found that not being sure of this made the proposal
> hard to evaluate.  Simple thought experiment which confused me:

A related question: what about zero-link files that were open at the
time that a snapshot was taken?  Presumably there's no need to even
mention their existence since they can't be accessed by the consumer.

> 4) I would also vote for a mode which simply outputs a list of
> pathnames to investigate for differences.  This would enable:
> 
>    zfs diff -someflag a@1 a@2 | xargs do_some_analysis_on_these

If you're going to target xargs then definitely, definitely include a
GNU find-like -print0 option to match xargs' -0 option.  That completely
avoids ambiguity.  (And, of course, for renames output only the new
name.)
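
A minimal sketch of why NUL termination removes the ambiguity: NUL is the one byte that cannot occur inside a pathname, so a consumer can split on it without any unescaping. The function name is hypothetical, and a -print0-style option for zfs diff is only a suggestion in this thread, not an existing flag.

```python
def split_nul(data):
    """Split NUL-terminated pathnames, much as `xargs -0` would."""
    return [path for path in data.split(b"\0") if path]


# Two hypothetical pathnames, one with a space and one with a newline.
stream = b"/myfiles/a b\0/myfiles/c\nd\0"
for path in split_nul(stream):
    print(repr(path))
```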

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Tim Haley
On 3/29/10 11:45 AM, Sebastien Roy wrote:
> On 03/29/10 01:24 PM, Nicolas Williams wrote:
>> On Mon, Mar 29, 2010 at 01:17:30PM -0400, Sebastien Roy wrote:
>>> Understood, and I see that now.  It would indeed make sense to be
>>> consistent and have -H for this subcommand as well.
>>
>> Agreed.  Do you agree re: backslash-escaping?
>
> Yes, but presumably the existing output syntax for -H has already 
> taken that into account, no?  The existing syntax appears to use a 
> single tab as a separator.  I would hope that a tab within a field 
> would be escaped, and if it's not that should be a bug (otherwise it's 
> not really parsable).
>
>>>> I could have filenames with ' ->   ' in them that would render the
>>>> rename output ambiguous to the human eye.  I could have filenames
>>>> with '\n' in the name that would render the output ambiguous to the
>>>> human eye.
>>>
>>> My intention wasn't to rathole into some absurd discussion over how
>>> to handle ridiculous filenames.
>>
>> My intention is to avoid security problems in consumers of zfs diff.  I
>> assert that zfs diff is mostly useful only in connection with scripting.
>>
>> I don't think it's reasonable to expect that zfs diff be useful only "by
>> eye".   It has got to be scriptable because for any sufficiently large
>> and active dataset the output of zfs diff will generally be too large to
>> handle "by eye" (sure, you could grep for specific things, but not much
>> beyond that you're scripting).
>>
>> (If zfs diff is only useful for scripting then the -H option is actually
>> unnecessary and zfs diff should always disambiguate pathnames in its
>> output.)
>
> I don't think it's only useful for scripting.  It makes perfect sense 
> to me to incorporate -H as with other zfs subcommands.  Tim, what is 
> your plan regarding this suggestion?
>
> -Seb

My proposal would be to add a -H option.  I'm going to say the output
without this option remains as described, and

++With the -H option, parseable output is produced. Fields are
++separated by a single tab, and no '->' is placed between
++the old and new names of a rename. Whitespace characters, the
++backslash character, and other characters not in the print
++class for the locale are represented in the output as a
++backslash character followed by the three-digit octal
++representation of the byte value.
++

-tim
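
A minimal sketch of a consumer for the -H format as worded above, assuming one record per line, tab-separated fields, and every escaped byte written as a backslash followed by exactly three octal digits. The function names and the sample record are hypothetical. Because the backslash itself is escaped under this rule, a single left-to-right pass is enough.

```python
import re

# Backslash followed by three octal digits naming one byte (\000 to \377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_field(field):
    """Turn every \\ooo escape back into the byte it stands for."""
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), field)


def parse_record(line):
    """Split one line on tabs, then decode each field separately."""
    return [decode_field(f) for f in line.rstrip(b"\n").split(b"\t")]


# Hypothetical rename record: old name contains a space, new name a newline.
record = b"R\t/files/old\\040name\t/files/new\\012name\n"
print(parse_record(record))
```

Splitting on tabs before decoding matters: a tab inside a name arrives as \011, so it can never be mistaken for a field separator.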



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Dan Price
On Mon 29 Mar 2010 at 11:14AM, Bart Smaalders wrote:
> On 03/29/10 11:01, Matthew Ahrens wrote:
> 
> >How do commands like ls and find handle printing of filenames with
> >arbitrary characters (newlines and such)?
> 
> In general, badly.

Tim,

My concern, which others have hinted at, is that there are a legion of
people who are going to want to consume this information and there is
great value in making said information be machine parseable.  Automated
build systems, tripwires, fancy backup/recovery tools, et cetera.

In summary, the current output seems mostly OK if it's for humans, but
the case is ambiguous about who the expected consumer is.  It would
be a tragedy if there wasn't a machine consumable way to get at this
information.

I also have questions about how intelligent a consuming piece of
software must be in order to make sense of this information.  Has anyone
written a proof of concept tool using this?  For example, if a directory
/foo/a is renamed to /foo/b, then an analyzer would need to stat /foo/b
in order to discover that /foo/b is a directory, then traverse into it as
needed.  It would be a shame if everyone who wanted to consume this had
to write the same thousand lines of code (I'm happy to be convinced that
this isn't the case).

Some specific questions...

1) In what order are the changes printed?  If I saw:

+   /myfiles/rename_dir
R   /myfiles/rename_dir -> /myfiles/rename_dir

My analyzer would need to be smart enough to realize that the second
must have happened before the first, and that both paths need
evaluation.  Right?

2) The meaning of "file/directory" (Don's concern aside) seems ambiguous in
the proposal.  Are we tracking the filesystem *namespace* entry?  Or the
actual object?  I found that not being sure of this made the proposal
hard to evaluate.  Simple thought experiment which confused me:

snapshot at 1
rm a/b
rm a/c
rmdir a
echo "foo" > a
snapshot at 2

Does that yield this?   Or this?

-   a/b   | -   a/b
-   a/c   | -   a/c
-   a     | M   a
+   a     |

3) Output is shown with leading slashes.  Is output shown relative to the
mount point?  Or something else?  (If the former, what if between @a and
@b the mountpoint changed?)

4) I would also vote for a mode which simply outputs a list of
pathnames to investigate for differences.  This would enable:

   zfs diff -someflag a@1 a@2 | xargs do_some_analysis_on_these
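
Absent such a flag, a consumer would have to pull the pathnames out of the
human-readable output itself.  A minimal Python sketch of that, assuming the
line format shown in the case's example output (change character, spaces,
path, optional link-count delta, ' -> ' between rename paths) and assuming
no pathname contains ' -> ', a newline, or a trailing "(+N)".  The function
name is hypothetical and nothing here is part of the proposal.

```python
import re

# Optional link-count delta such as "(+1)" at the end of a line.
_DELTA = re.compile(r"\s+\([+-]\d+\)$")


def paths_to_investigate(lines):
    """Return each pathname mentioned in the diff output, once, in order."""
    paths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        change, rest = line[0], line[1:].strip()
        rest = _DELTA.sub("", rest)
        if change == "R":
            # Renames mention two paths; both need evaluation.
            old, _, new = rest.partition(" -> ")
            candidates = [old.strip(), new.strip()]
        else:
            candidates = [rest]
        for path in candidates:
            if path not in paths:
                paths.append(path)
    return paths


sample = [
    "M   /myfiles/",
    "M   /myfiles/link_to_me   (+1)",
    "R   /myfiles/rename_me -> /myfiles/renamed",
    "-   /myfiles/delete_me",
    "+   /myfiles/new_file",
]
for path in paths_to_investigate(sample):
    print(path)
```

The assumptions listed above are exactly the ones that fail for unusual
filenames, which is why a dedicated output mode seems preferable to every
consumer writing its own version of this.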


Thanks for tackling this,

-dp

-- 
Daniel Price, Solaris Kernel Engineering    http://blogs.sun.com/dp


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Darren J Moffat


On 29/03/2010 19:21, Don Cragun wrote:
> On Mar 29, 2010, at 10:42 AM, Darren J Moffat wrote:
>
>> On 29/03/2010 18:30, Don Cragun wrote:
>>>> + +   Indicates the file/directory was added in the later dataset
>>>> + -   Indicates the file/directory was removed in the later dataset
>>>> + M   Indicates the file/directory was modified in the later dataset
>>>> + R   Indicates the file/directory was renamed in the later dataset
>>>
>>> Again, "file/directory" should just be "file" in all four lines above.
>>
>> While that may be technically true I personally found it very useful that 
>> it said file/directory.  Particularly since this isn't a POSIX C API man 
>> page.
>>
>> I'd rather it made it clear that both files and directories, and all other 
>> types of filesystem objects are supported here, and that it do so by 
>> explicitly saying file and directory.
>
> I'm not an ARC member, so you are free to ignore my comments.  But,
> explicitly saying "file of any type and file of type directory" makes
> absolutely no sense to me.  I don't see that it is clearer; it just
> raises the question of what does "file" mean on this man page if
> "directory" is not a type of file.

Think like a user that doesn't know C programming and how these things 
are implemented and doesn't know what POSIX/SUS is.

> I know you don't like POSIX/SUS based man pages, but  is
> pretty basic.  It clearly shows that the S_IFMT portion of the st_mode
> field (which has type mode_t) specifies the file type.

Which is fine for a developer but not for an end admin or user.

Remember ZFS commands can be delegated to users.  Users think in terms 
of files and directories (and depending on where they came from they 
might still be calling them folders not directories).


-- 
Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Darren J Moffat
On 29/03/2010 18:30, Don Cragun wrote:
>> + +   Indicates the file/directory was added in the later dataset
>> + -   Indicates the file/directory was removed in the later dataset
>> + M   Indicates the file/directory was modified in the later dataset
>> + R   Indicates the file/directory was renamed in the later dataset
>
> Again, "file/directory" should just be "file" in all four lines above.

While that may be technically true I personally found it very useful 
that it said file/directory.  Particularly since this isn't a POSIX C 
API man page.

I'd rather it made it clear that both files and directories, and all 
other types of filesystem objects are supported here, and that it do so 
by explicitly saying file and directory.

-- 
Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Steve Mckinty


Sebastien Roy wrote:
> On 03/29/10 12:14 PM, Nicolas Williams wrote:
>> On Sun, Mar 28, 2010 at 01:12:16PM -0600, Tim Haley wrote:
>>> +If the modification involved a change in the link count of a
>>> +file, the change will be expressed as a delta within
>>> +parentheses on the modification line.  Example outputs are
>>> +below:
>>> +
>>> + M   /myfiles/
>>> + M   /myfiles/link_to_me   (+1)
>>> + R   /myfiles/rename_me ->  /myfiles/renamed
>>> + -   /myfiles/delete_me
>>> + +   /myfiles/new_file
>>
>> Is there any escaping of whitespace and non-printable characters in the
>> pathnames?  If not then the above format is ambiguous and cannot be
>> safely scripted.
>>
>> There are several ways that you could address that problem, such as
>> escaping (HTML/XML entities? backslash escapes? pick your poison),
>> adding a length field preceding either each path or each path that
>> contains whitespace, or use a multi-line (for renames) format with paths
>> being the last field such that you need only [backslash?-]escape
>> newlines.
>>
>> Probably the simplest answer is backslash-escaping whitespace, since
>> shells can handle that trivially.  That is what I recommend.
> 
> Taking a step back here, this subcommand is not the only zfs subcommand 
> whose output could be subject to parsing by scripts.  Adding parsable 
> output should be something that is thought-through for the entire suite 
> of subcommands (and zfs-related commands) so that there is a uniformly 
> applicable solution.  IMO, that's not this case (although that's 
> ultimately the project team's decision).  If you think there is 
> something that is ambiguous to the human eye, then I think that's in scope.

It's not uncommon to have an additional option to a command (stty -g for
example) which produces easily-parsable (and not necessarily easily
readable :) ) output for this situation.

Steve


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Joep Vesseur
On 03/29/10 16:57, Tim Haley wrote:
> Not sure exactly what sort of properties you have in mind - many I can
> think of would
> result in us detecting a modification to the file itself, for example:
>
> # echo 'tim was here' > file
> # zfs snapshot toad/timh@before
> # touch file
> # zfs snapshot toad/timh@after
> # zfs diff toad/timh@before toad/timh@after
> M /toad/timh/file

I meant something like

   chmod u+s /usr/bin/sh

Would this be reported as a change to /usr/bin? And would someone have to go
and look at the diff of the directory listing?

Or would it be reported as a change to /usr/bin/sh?

Joep


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Tim Haley
On 03/29/10 12:43 AM, Darren Reed wrote:
> On 03/28/10 12:12, Tim Haley wrote:
>> I am sponsoring the following fast-track on behalf of myself. This
>> case introduces a new zfs sub-command for describing differences
>> between snapshots in a zfs hierarchy. A delegated permission and
>> read-only system attribute are also introduced to support the
>> sub-command. The case requests micro/patch binding.
>>
>> Template Version: @(#)sac_nextcase 1.69 02/15/10 SMI
>> This information is Copyright 2010 Sun Microsystems
>> 1. Introduction
>> 1.1. Project/Component Working Name:
>> zfs diff
>> 1.2. Name of Document Author/Supplier:
>> Author: Tim Haley
>> 1.3 Date of This Document:
>> 28 March, 2010
>>
>> 4. Technical Description
>>
>> There is a long-standing RFE for zfs to be able to describe
>> what has changed between the snapshots of a dataset.
>> To provide this capability, we propose a new 'zfs diff'
>> sub-command. When run with appropriate privilege the
>> sub-command describes what file system level changes have
>> occurred between the requested snapshots. A diff between the
>> current version of the file system and one of its snapshots is
>> also supported.
>>
>> Five types of change are described:
>>
>> o File/Directory modified
>> o File/Directory present in older snapshot but not newer
>> o File/Directory present in newer snapshot but not older
>> o File/Directory renamed
>> o File link count changed
>>
>> Diffs can be performed if the user is delegated the "diff"
>> permission. The "diff" permission is being introduced by this
>> case. Diffs can also be performed without the "diff"
>> permission, if the user has appropriate privilege. For diffs
>> between existing snapshots, the necessary privilege is
>> {PRIV_SYS_CONFIG}. For diff between the current file system
>> and a snapshot {PRIV_SYS_MOUNT} is also necessary.
>>
>> Also introduced by this case is a system attribute on zfs files
>> called 'generation'. This attribute is part of the
>> XATTR_VIEW_READONLY described in PSARC 2007/315. It is
>> generated automatically by the ZFS module.
>
> Does "zfs diff" work between different versions of zfs filesystems?
> If so, what are the restrictions?
>

Yes.  No restrictions.

-tim


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 11:14:24AM -0700, Bart Smaalders wrote:
> On 03/29/10 11:01, Matthew Ahrens wrote:
> 
> >How do commands like ls and find handle printing of filenames with
> >arbitrary characters (newlines and such)?
> 
> In general, badly.
> 
> % touch `echo '\07'`
> % ls
> 
> %

Use ls -b:

 -b
 --escape

 Forces printing of non-printable characters to be in the
 octal \ddd notation.

Octal escapes for non-printable characters seem like the way to go.

There's also GNU find's -print0 and GNU xargs' --null (-0).  (GNU ls
also has the -b option).
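
To make the octal idea concrete, here is a minimal Python sketch of such an
escaper.  It is an illustration only: the function name is made up, and the
choices to escape the space character and to double backslashes are
assumptions of this sketch, not a description of what ls -b or zfs diff do.

```python
def octal_escape(name: bytes) -> str:
    """Render a file name with non-printable bytes as octal \\ddd escapes."""
    out = []
    for byte in name:
        if byte == 0x5C:
            # Double the backslash so the escaping stays reversible.
            out.append("\\\\")
        elif 0x21 <= byte <= 0x7E:
            # Printable ASCII other than space passes through unchanged.
            out.append(chr(byte))
        else:
            # Space, control characters, and bytes >= 0x80 become \ddd.
            out.append("\\%03o" % byte)
    return "".join(out)


print(octal_escape(b"\x07"))        # the bell-character name from Bart's example
print(octal_escape(b"a\nb -> c"))   # newline and embedded ' -> '
```

With this scheme every escaped name is a single run of printable,
non-whitespace ASCII, so a newline or ' -> ' inside a name can no longer be
mistaken for output structure.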

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Sebastien Roy
On 03/29/10 01:24 PM, Nicolas Williams wrote:
> On Mon, Mar 29, 2010 at 01:17:30PM -0400, Sebastien Roy wrote:
>> Understood, and I see that now.  It would indeed make sense to be
>> consistent and have -H for this subcommand as well.
>
> Agreed.  Do you agree re: backslash-escaping?

Yes, but presumably the existing output syntax for -H has already taken 
that into account, no?  The existing syntax appears to use a single tab 
as a separator.  I would hope that a tab within a field would be 
escaped, and if it's not that should be a bug (otherwise it's not really 
parsable).
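
As a sketch of what a consumer would want from that, assuming (hypothetically)
tab-separated fields in which a literal tab is written as the octal escape
\011 and a literal backslash is doubled.  Neither the field layout nor the
escape scheme is confirmed by the existing -H output or by this case.

```python
import re

# Matches a backslash followed by either another backslash or a
# three-digit octal value in the range 000-377.
_ESCAPE = re.compile(rb"\\(\\|[0-3][0-7]{2})")


def unescape(field: bytes) -> bytes:
    """Undo doubled backslashes and \\ddd octal escapes in one field."""
    def decode(match):
        token = match.group(1)
        return b"\\" if token == b"\\" else bytes([int(token, 8)])
    return _ESCAPE.sub(decode, field)


def parse_record(line: bytes) -> list:
    """Split one tab-separated record, then decode each field."""
    return [unescape(f) for f in line.rstrip(b"\n").split(b"\t")]


record = b"R\t/dir/a\\011b\t/dir/c\n"   # \011 is an escaped tab inside the old name
print(parse_record(record))
```

Because the split happens before decoding, a tab inside a name can never be
confused with a field separator.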

>>> I could have filenames with ' ->   ' in them that would render the rename
>>> output ambiguous to the human eye.  I could have filenames with
>>> '\n' in the name that would render the output ambiguous
>>> to the human eye.
>>
>> My intention wasn't to rathole into some absurd discussion over how
>> to handle ridiculous filenames.
>
> My intention is to avoid security problems in consumers of zfs diff.  I
> assert that zfs diff is mostly useful only in connection with scripting.
>
> I don't think it's reasonable to expect that zfs diff be useful only "by
> eye".   It has got to be scriptable because for any sufficiently large
> and active dataset the output of zfs diff will generally be too large to
> handle "by eye" (sure, you could grep for specific things, but not much
> beyond that you're scripting).
>
> (If zfs diff is only useful for scripting then the -H option is actually
> unnecessary and zfs diff should always disambiguate pathnames in its
> output.)

I don't think it's only useful for scripting.  It makes perfect sense to 
me to incorporate -H as with other zfs subcommands.  Tim, what is your 
plan regarding this suggestion?

-Seb


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Sebastien Roy
On 03/29/10 12:33 PM, Nicolas Williams wrote:
> On Mon, Mar 29, 2010 at 12:23:56PM -0400, Sebastien Roy wrote:
>>> Is there any escaping of whitespace and non-printable characters in the
>>> pathnames?  If not then the above format is ambiguous and cannot be
>>> safely scripted.
>>
>> Taking a step back here, this subcommand is not the only zfs
>> subcommand whose output could be subject to parsing by scripts.
>
> Many zfs sub-commands already have a -H option ("Display output in a
> form more easily parsed by scripts").

Ah, indeed.

>
>> Adding parsable output should be something that is thought-through
>> for the entire suite of subcommands (and zfs-related commands) so
>> that there is a uniformly applicable solution.  IMO, that's not this
>> case (although that's ultimately the project team's decision).
>
> Therefore I don't think your argument carries water.  My request is not
> generalizable because ZFS already has parseable output support.

Understood, and I see that now.  It would indeed make sense to be 
consistent and have -H for this subcommand as well.

>>  If
>> you think there is something that is ambiguous to the human eye,
>> then I think that's in scope.
>
> I could have filenames with ' ->  ' in them that would render the rename
> output ambiguous to the human eye.  I could have filenames with
> '\n' in the name that would render the output ambiguous
> to the human eye.

My intention wasn't to rathole into some absurd discussion over how to 
handle ridiculous filenames.

-Seb


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Sebastien Roy
On 03/29/10 12:34 PM, Steve Mckinty wrote:
> Sebastien Roy wrote:
>> Taking a step back here, this subcommand is not the only zfs
>> subcommand whose output could be subject to parsing by scripts. Adding
>> parsable output should be something that is thought-through for the
>> entire suite of subcommands (and zfs-related commands) so that there
>> is a uniformly applicable solution. IMO, that's not this case
>> (although that's ultimately the project team's decision). If you think
>> there is something that is ambiguous to the human eye, then I think
>> that's in scope.
>
> It's not uncommon to have an additional option to a command (stty -g for
> example) which produces easily-parsable (and not necessarily easily
> readable :) ) output for this situation.

That's right, but to reiterate, I think that such an option should 
theoretically apply to more than just the "diff" subcommand.

-Seb


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 01:17:30PM -0400, Sebastien Roy wrote:
> Understood, and I see that now.  It would indeed make sense to be
> consistent and have -H for this subcommand as well.

Agreed.  Do you agree re: backslash-escaping?

> >I could have filenames with ' ->  ' in them that would render the rename
> >output ambiguous to the human eye.  I could have filenames with
> >'\n' in the name that would render the output ambiguous
> >to the human eye.
> 
> My intention wasn't to rathole into some absurd discussion over how
> to handle ridiculous filenames.

My intention is to avoid security problems in consumers of zfs diff.  I
assert that zfs diff is mostly useful only in connection with scripting.

I don't think it's reasonable to expect that zfs diff be useful only "by
eye".  It has got to be scriptable because for any sufficiently large
and active dataset the output of zfs diff will generally be too large to
handle "by eye" (sure, you could grep for specific things, but not much
beyond that you're scripting).

(If zfs diff is only useful for scripting then the -H option is actually
unnecessary and zfs diff should always disambiguate pathnames in its
output.)

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Sebastien Roy
On 03/29/10 12:14 PM, Nicolas Williams wrote:
> On Sun, Mar 28, 2010 at 01:12:16PM -0600, Tim Haley wrote:
>> +If the modification involved a change in the link count of a
>> +file, the change will be expressed as a delta within
>> +parentheses on the modification line.  Example outputs are
>> +below:
>> +
>> + M   /myfiles/
>> + M   /myfiles/link_to_me   (+1)
>> + R   /myfiles/rename_me ->  /myfiles/renamed
>> + -   /myfiles/delete_me
>> + +   /myfiles/new_file
>
> Is there any escaping of whitespace and non-printable characters in the
> pathnames?  If not then the above format is ambiguous and cannot be
> safely scripted.
>
> There are several ways that you could address that problem, such as
> escaping (HTML/XML entities? backslash escapes? pick your poison),
> adding a length field preceding either each path or each path that
> contains whitespace, or use a multi-line (for renames) format with paths
> being the last field such that you need only [backslash?-]escape
> newlines.
>
> Probably the simplest answer is backslash-escaping whitespace, since
> shells can handle that trivially.  That is what I recommend.

Taking a step back here, this subcommand is not the only zfs subcommand 
whose output could be subject to parsing by scripts.  Adding parsable 
output should be something that is thought-through for the entire suite 
of subcommands (and zfs-related commands) so that there is a uniformly 
applicable solution.  IMO, that's not this case (although that's 
ultimately the project team's decision).  If you think there is 
something that is ambiguous to the human eye, then I think that's in scope.

-Seb


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Don Cragun
On Mar 29, 2010, at 11:37 AM, Darren J Moffat wrote:

> 
> 
> On 29/03/2010 19:21, Don Cragun wrote:
>> On Mar 29, 2010, at 10:42 AM, Darren J Moffat wrote:
>> 
>>> On 29/03/2010 18:30, Don Cragun wrote:
>>>>> + +   Indicates the file/directory was added in the later dataset
>>>>> + -   Indicates the file/directory was removed in the later dataset
>>>>> + M   Indicates the file/directory was modified in the later dataset
>>>>> + R   Indicates the file/directory was renamed in the later dataset
>>>> 
>>>> Again, "file/directory" should just be "file" in all four lines above.
>>> 
>>> While that may be technically true I personally found it very useful that 
>>> it said file/directory.  Particularly since this isn't a POSIX C API man 
>>> page.
>>> 
>>> I'd rather it made it clear that both files and directories, and all other 
>>> types of filesystem objects are supported here, and that it do so by 
>>> explicitly saying file and directory.
>> 
>> I'm not an ARC member, so you are free to ignore my comments.  But,
>> explicitly saying "file of any type and file of type directory" makes
>> absolutely no sense to me.  I don't see that it is clearer; it just
>> raises the question of what does "file" mean on this man page if
>> "directory" is not a type of file.
> 
> Think like a user that doesn't know C programming and how these things are 
> implemented and doesn't know what POSIX/SUS is.
> 
>> I know you don't like POSIX/SUS based man pages, but  is
>> pretty basic.  It clearly shows that the S_IFMT portion of the st_mode
>> field (which has type mode_t) specifies the file type.
> 
> Which is fine for a developer but not for an end admin or user.
> 
> Remember ZFS commands can be delegated to users.  Users think in terms of 
> files and directories (and depending on where they came from they might still 
> be calling them folders not directories).
> 

OK.  I see what you're trying to do now.  Please change "the
file/directory" on all four lines to "something".  Naive users won't
get lost in the details and savvy programmers won't be confused by
the real, overlapping definitions.

 - Don

> 
> -- 
> Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 12:23:56PM -0400, Sebastien Roy wrote:
> >Is there any escaping of whitespace and non-printable characters in the
> >pathnames?  If not then the above format is ambiguous and cannot be
> >safely scripted.
> 
> Taking a step back here, this subcommand is not the only zfs
> subcommand whose output could be subject to parsing by scripts.

Many zfs sub-commands already have a -H option ("Display output in a
form more easily parsed by scripts").

> Adding parsable output should be something that is thought-through
> for the entire suite of subcommands (and zfs-related commands) so
> that there is a uniformly applicable solution.  IMO, that's not this
> case (although that's ultimately the project team's decision).

Therefore I don't think your argument carries water.  My request is not
generalizable because ZFS already has parseable output support.

> If
> you think there is something that is ambiguous to the human eye,
> then I think that's in scope.

I could have filenames with ' -> ' in them that would render the rename
output ambiguous to the human eye.  I could have filenames with
'\n' in the name that would render the output ambiguous
to the human eye.
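
To illustrate the first point, a small Python sketch that enumerates every
way an unescaped rename line could be read.  The function name and the
sample paths are made up for the illustration; the line layout is taken
from the example output in the case.

```python
from typing import List, Tuple


def rename_splits(line: str) -> List[Tuple[str, str]]:
    """Return every (old, new) pair a rename line could represent."""
    body = line[1:].lstrip()        # drop the leading 'R' change-type column
    sep = " -> "
    splits = []
    start = 0
    while True:
        idx = body.find(sep, start)
        if idx == -1:
            break
        splits.append((body[:idx], body[idx + len(sep):]))
        start = idx + 1
    return splits


# One candidate: unambiguous.
print(rename_splits("R   /myfiles/rename_me -> /myfiles/renamed"))
# Two candidates: was '/d/a' renamed to 'b -> /d/c', or '/d/a -> b' to '/d/c'?
print(rename_splits("R   /d/a -> b -> /d/c"))
```

Any line for which this returns more than one pair cannot be interpreted
safely by a script or by a reader.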

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Don Cragun
On Mar 29, 2010, at 10:42 AM, Darren J Moffat wrote:

> On 29/03/2010 18:30, Don Cragun wrote:
>>> + +   Indicates the file/directory was added in the later dataset
>>> + -   Indicates the file/directory was removed in the later dataset
>>> + M   Indicates the file/directory was modified in the later dataset
>>> + R   Indicates the file/directory was renamed in the later dataset
>> 
>> Again, "file/directory" should just be "file" in all four lines above.
> 
> While that may be technically true I personally found it very useful that 
> it said file/directory.  Particularly since this isn't a POSIX C API man page.
> 
> I'd rather it made it clear that both files and directories, and all other 
> types of filesystem objects are supported here, and that it do so by 
> explicitly saying file and directory.

I'm not an ARC member, so you are free to ignore my comments.  But,
explicitly saying "file of any type and file of type directory" makes
absolutely no sense to me.  I don't see that it is clearer; it just
raises the question of what does "file" mean on this man page if
"directory" is not a type of file.

I know you don't like POSIX/SUS based man pages, but  is
pretty basic.  It clearly shows that the S_IFMT portion of the st_mode
field (which has type mode_t) specifies the file type.
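
For illustration, the same classification can be sketched with Python's
standard stat module, which exposes the S_IFMT mask and the S_IF* type
constants.  The function name and the label strings are made up for this
example; doors and other Solaris-specific types simply fall through to
"other" here.

```python
import stat


def file_type(mode: int) -> str:
    """Name the file type encoded in the S_IFMT portion of st_mode."""
    names = {
        stat.S_IFREG: "regular file",
        stat.S_IFDIR: "directory",
        stat.S_IFLNK: "symbolic link",
        stat.S_IFCHR: "character special file",
        stat.S_IFBLK: "block special file",
        stat.S_IFIFO: "FIFO",
        stat.S_IFSOCK: "socket",
    }
    return names.get(stat.S_IFMT(mode), "other")


print(file_type(stat.S_IFDIR | 0o755))   # a directory is one value of the type field
print(file_type(stat.S_IFREG | 0o644))   # a regular file is another
```

In this model "directory" is one of several values the type field of a file
can take, alongside regular files, symbolic links, and the rest.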

 - Don

> 
> -- 
> Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 09:05:52AM -0600, Tim Haley wrote:
> On 03/29/10 09:01 AM, Joep Vesseur wrote:
> >On 03/29/10 16:57, Tim Haley wrote:
> >>Not sure exactly what sort of properties you have in mind - many I can
> >>think of would
> >>result in us detecting a modification to the file itself, for example:
> >
> >I meant something like
> >
> >  chmod u+s /usr/bin/sh
> >
> >Would this be reported as a change to /usr/bin? And would someone
> >have to go and look at the diff of the directory listing?
> >
> >Or would it be reported as a change to /usr/bin/sh?
> 
> It's reported as a modification to /usr/bin/sh.

I'm guessing that ZFS would need a second generation number per-dnode to
track data and meta-data changes separately (or a data checksum that
excludes parts of blkptr_t -- the parts that refer to disk locations).

Still, this feature is absolutely fantastic.

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Bart Smaalders
On 03/29/10 11:01, Matthew Ahrens wrote:

> How do commands like ls and find handle printing of filenames with
> arbitrary characters (newlines and such)?

In general, badly.

% touch `echo '\07'`
% ls

%

- Bart


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Nicolas Williams
On Sun, Mar 28, 2010 at 01:12:16PM -0600, Tim Haley wrote:
> +If the modification involved a change in the link count of a
> +file, the change will be expressed as a delta within
> +parentheses on the modification line.  Example outputs are
> +below:
> +
> + M   /myfiles/
> + M   /myfiles/link_to_me   (+1)
> + R   /myfiles/rename_me -> /myfiles/renamed
> + -   /myfiles/delete_me
> + +   /myfiles/new_file

Is there any escaping of whitespace and non-printable characters in the
pathnames?  If not then the above format is ambiguous and cannot be
safely scripted.

There are several ways that you could address that problem, such as
escaping (HTML/XML entities? backslash escapes? pick your poison),
adding a length field preceding either each path or each path that
contains whitespace, or use a multi-line (for renames) format with paths
being the last field such that you need only [backslash?-]escape
newlines.

Probably the simplest answer is backslash-escaping whitespace, since
shells can handle that trivially.  That is what I recommend.
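
A minimal Python sketch of that recommendation, using the standard shlex
module as a stand-in for shell word splitting.  The helper name is made up,
and escaping quote characters in addition to whitespace and backslashes is
an assumption of this sketch so that shlex can reverse it.

```python
import shlex


def backslash_escape(path: str) -> str:
    """Prefix whitespace, backslashes, and quote characters with a backslash."""
    out = []
    for ch in path:
        if ch.isspace() or ch in "\\'\"":
            out.append("\\")
        out.append(ch)
    return "".join(out)


old = "/my files/a -> b"
new = "/my files/c"
line = "R " + backslash_escape(old) + " -> " + backslash_escape(new)
print(line)

# Shell-style splitting recovers the two original paths unambiguously.
fields = shlex.split(line)
print(fields)
```

After escaping, the only unescaped ' -> ' left on the line is the real
separator, so the rename can be split without guessing.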

Nico
-- 


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Matthew Ahrens
Nicolas Williams wrote:
> On Mon, Mar 29, 2010 at 12:23:56PM -0400, Sebastien Roy wrote:
>   
>>> Is there any escaping of whitespace and non-printable characters in the
>>> pathnames?  If not then the above format is ambiguous and cannot be
>>> safely scripted.
>>>   
>> Taking a step back here, this subcommand is not the only zfs
>> subcommand whose output could be subject to parsing by scripts.
>> 
>
> Many zfs sub-commands already have a -H option ("Display output in a
> form more easily parsed by scripts").
>   

However, those subcommands don't deal with filenames.  They deal with 
things like filesystem names and property names/values, which are all 
much more constrained than filenames.

How do commands like ls and find handle printing of filenames with 
arbitrary characters (newlines and such)?

--matt


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Joep Vesseur
On 03/28/10 21:12, Tim Haley wrote:

> Five types of change are described:
>
> o File/Directory modified
> o File/Directory present in older snapshot but not newer
> o File/Directory present in newer snapshot but not older
> o File/Directory renamed
> o File link count changed

Is there any provision made to detect/display file property changes? Or will
those be covered by a change to the parent-directory?


 >  [...] For diffs
 >  between existing snapshots, the necessary privilege is
 >  {PRIV_SYS_CONFIG}.

That seems an odd privilege to require...?

Joep


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Don Cragun
On Mar 28, 2010, at 13:12:16 -0600, Tim Haley wrote:

Please find documentation nits in-line below...

 - Don

 ... ... ...
> 
> 4. Technical Description
> 
> There is a long-standing RFE for zfs to be able to describe
> what has changed between the snapshots of a dataset.
> To provide this capability, we propose a new 'zfs diff'
> sub-command.  When run with appropriate privilege the
> sub-command describes what file system level changes have
> occurred between the requested snapshots.  A diff between the
> current version of the file system and one of its snapshots is
> also supported.
> 
> Five types of change are described:
> 
> o File/Directory modified
> o File/Directory present in older snapshot but not newer
> o File/Directory present in newer snapshot but not older
> o File/Directory renamed

A directory is a type of file.  Saying "File/Directory" is a long way
of saying "File".  The other possibility is that you meant "Regular
File/Directory", but that would only be appropriate if you mean that
zfs diff does not report changes to symbolic links, character special
file, block special files, doors, FIFOs, sockets, and any other types
of files zfs supports.

> o File link count changed
> 
> Diffs can be performed if the user is delegated the "diff"
> permission.  The "diff" permission is being introduced by this
> case.  Diffs can also be performed without the "diff"
> permission, if the user has appropriate privilege.  For diffs
> between existing snapshots, the necessary privilege is
> {PRIV_SYS_CONFIG}.  For diff between the current file system
> and a snapshot {PRIV_SYS_MOUNT} is also necessary.
> 
> Also introduced by this case is a system attribute on zfs files
> called 'generation'.  This attribute is part of the
> XATTR_VIEW_READONLY described in PSARC 2007/315.  It is
> generated automatically by the ZFS module.
> 
> Man page changes:
 ... ... ...
> --- zfs.1m.rogi Sun Mar 21 17:01:04 2010
> +++ zfs.1m  Sun Mar 28 12:20:04 2010
> @@ -165,6 +165,9 @@
>   zfs release [-r] tag snapshot...
> 
> 
> + zfs diff snapshot snapshot|filesystem
> +
> +
>  DESCRIPTION
>   The zfs command configures ZFS datasets within a ZFS storage
>   pool,  as described in zpool(1M). A dataset is identified by
> @@ -1638,7 +1641,41 @@
>   size, the resulting behavior is undefined.
> 
> 
> + zfs diff snapshot  snapshot | filesystem
> 
> + Gives a high level description of the differences between a
> + snapshot and a descendant dataset.  The descendant may either
> + be a later snapshot of the dataset or the current dataset.
> + For each file system object that has undergone a change
> + between the original snapshot and the descendant, the type of
> + change is described along with the name of the file or
> + directory.  In the case of a rename, both the old and new
> + names are shown.
> +
> + The type of change is described with a single character:
> +
> + +   Indicates the file/directory was added in the later dataset
> + -   Indicates the file/directory was removed in the later dataset
> + M   Indicates the file/directory was modified in the later dataset
> + R   Indicates the file/directory was renamed in the later dataset

Again, "file/directory" should just be "file" in all four lines above.

> +
> +If the modification involved a change in the link count of a
> +file, the change will be expressed as a delta within
> +parentheses on the modification line.  Example outputs are
> +below:
> +
> + M   /myfiles/
> + M   /myfiles/link_to_me   (+1)
> + R   /myfiles/rename_me -> /myfiles/renamed
> + -   /myfiles/delete_me
> + +   /myfiles/new_file
 ... ... ...


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Darren J Moffat
On 29/03/2010 09:44, Joep Vesseur wrote:
> On 03/28/10 21:12, Tim Haley wrote:
>
>> Five types of change are described:
>>
>> o File/Directory modified
>> o File/Directory present in older snapshot but not newer
>> o File/Directory present in newer snapshot but not older
>> o File/Directory renamed
>> o File link count changed
>
> Is there any provision made to detect/display file property changes? Or
> will
> those be covered by a change to the parent-directory?
>
>
>  > [...] For diffs
>  > between existing snapshots, the necessary privilege is
>  > {PRIV_SYS_CONFIG}.
>
> That seems an odd privilege to require...?

While odd it is consistent with the rest of ZFS.  You need SYS_CONFIG to 
do "disk" or pool level stuff if you don't have a ZFS delegation.

-- 
Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Darren J Moffat
On 28/03/2010 20:12, Tim Haley wrote:
> I am sponsoring the following fast-track on behalf of myself.  This
> case introduces a new zfs sub-command for describing differences
> between snapshots in a zfs hierarchy. A delegated permission and
> read-only system attribute are also introduced to support the
> sub-command. The case requests micro/patch binding.

I'm happy with the case as specified so it gets my +1.

-- 
Darren J Moffat


zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Tim Haley
On 03/29/10 09:01 AM, Joep Vesseur wrote:
> On 03/29/10 16:57, Tim Haley wrote:
>> Not sure exactly what sort of properties you have in mind - many I can
>> think of would
>> result in us detecting a modification to the file itself, for example:
>>
>> # echo 'tim was here' > file
>> # zfs snapshot toad/timh@before
>> # touch file
>> # zfs snapshot toad/timh@after
>> # zfs diff toad/timh@before toad/timh@after
>> M /toad/timh/file
>
> I meant something like
>
>   chmod u+s /usr/bin/sh
>
> Would this be reported as a change to /usr/bin? And would someone have 
> to go
> and look at the diff of the directory listing?
>
> Or would it be reported as a change to /usr/bin/sh?
>
> Joep
It's reported as a modification to /usr/bin/sh.

-tim



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-29 Thread Tim Haley
On 03/29/10 02:44 AM, Joep Vesseur wrote:
> On 03/28/10 21:12, Tim Haley wrote:
>
>> Five types of change are described:
>>
>> o File/Directory modified
>> o File/Directory present in older snapshot but not newer
>> o File/Directory present in newer snapshot but not older
>> o File/Directory renamed
>> o File link count changed
>
> Is there any provision made to detect/display file property changes? 
> Or will
> those be covered by a change to the parent-directory?
>
> Joep
Not sure exactly what sort of properties you have in mind - many I can
think of would result in us detecting a modification to the file itself,
for example:

# echo 'tim was here' > file
# zfs snapshot toad/timh@before
# touch file
# zfs snapshot toad/timh@after
# zfs diff toad/timh@before toad/timh@after
M   /toad/timh/file

-tim



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-28 Thread Darren Reed
On 03/28/10 12:12, Tim Haley wrote:
> I am sponsoring the following fast-track on behalf of myself.  This
> case introduces a new zfs sub-command for describing differences
> between snapshots in a zfs hierarchy.  A delegated permission and
> read-only system attribute are also introduced to support the
> sub-command.  The case requests micro/patch binding.
>
> Template Version: @(#)sac_nextcase 1.69 02/15/10 SMI
> This information is Copyright 2010 Sun Microsystems
> 1. Introduction
> 1.1. Project/Component Working Name:
>  zfs diff
> 1.2. Name of Document Author/Supplier:
>  Author:  Tim Haley
> 1.3  Date of This Document:
> 28 March, 2010
>
> 4. Technical Description
>
> There is a long-standing RFE for zfs to be able to describe
> what has changed between the snapshots of a dataset.
> To provide this capability, we propose a new 'zfs diff'
> sub-command.  When run with appropriate privilege the
> sub-command describes what file system level changes have
> occurred between the requested snapshots.  A diff between the
> current version of the file system and one of its snapshots is
> also supported.
>
> Five types of change are described:
>
> o File/Directory modified
> o File/Directory present in older snapshot but not newer
> o File/Directory present in newer snapshot but not older
> o File/Directory renamed
> o File link count changed
>
> Diffs can be performed if the user is delegated the "diff"
> permission.  The "diff" permission is being introduced by this
> case.  Diffs can also be performed without the "diff"
> permission, if the user has appropriate privilege.  For diffs
> between existing snapshots, the necessary privilege is
> {PRIV_SYS_CONFIG}.  For a diff between the current file system
> and a snapshot, {PRIV_SYS_MOUNT} is also necessary.
>
> Also introduced by this case is a system attribute on zfs files
> called 'generation'.  This attribute is part of the
> XATTR_VIEW_READONLY described in PSARC 2007/315.  It is
> generated automatically by the ZFS module.

Does "zfs diff" work between different versions of zfs filesystems?
If so, what are the restrictions?

The documentation for "zfs upgrade" isn't clear on whether you can
upgrade a filesystem without upgrading its snapshots, or vice
versa.

What commitment level are you seeking for the subcommand?
And its output?

Darren



zfs diff [PSARC/2010/105 FastTrack timeout 04/05/2010]

2010-03-28 Thread Tim Haley
I am sponsoring the following fast-track on behalf of myself.  This
case introduces a new zfs sub-command for describing differences
between snapshots in a zfs hierarchy.  A delegated permission and
read-only system attribute are also introduced to support the
sub-command.  The case requests micro/patch binding.

Template Version: @(#)sac_nextcase 1.69 02/15/10 SMI
This information is Copyright 2010 Sun Microsystems
1. Introduction
 1.1. Project/Component Working Name:
  zfs diff
 1.2. Name of Document Author/Supplier:
  Author:  Tim Haley
 1.3  Date of This Document:
 28 March, 2010

4. Technical Description

 There is a long-standing RFE for zfs to be able to describe
 what has changed between the snapshots of a dataset.
 To provide this capability, we propose a new 'zfs diff'
 sub-command.  When run with appropriate privilege the
 sub-command describes what file system level changes have
 occurred between the requested snapshots.  A diff between the
 current version of the file system and one of its snapshots is
 also supported.

 Five types of change are described:

 o File/Directory modified
 o File/Directory present in older snapshot but not newer
 o File/Directory present in newer snapshot but not older
 o File/Directory renamed
 o File link count changed

 Diffs can be performed if the user is delegated the "diff"
 permission.  The "diff" permission is being introduced by this
 case.  Diffs can also be performed without the "diff"
 permission, if the user has appropriate privilege.  For diffs
 between existing snapshots, the necessary privilege is
 {PRIV_SYS_CONFIG}.  For a diff between the current file system
 and a snapshot, {PRIV_SYS_MOUNT} is also necessary.
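
 The access rule in the paragraph above can be read as a small
 decision function.  The Python sketch below is only one reading of
 that text, not the ZFS implementation: the function name, its
 parameters, and the representation of privileges as strings are all
 assumptions made for illustration.

```python
def may_run_diff(has_diff_permission, privileges, against_live_fs):
    """Return True if the caller may run the diff, per the rule above.

    has_diff_permission: True if the "diff" permission was delegated.
    privileges: iterable of privilege names held by the caller.
    against_live_fs: True when diffing a snapshot against the current
        file system rather than against another snapshot.
    All three parameters are hypothetical stand-ins for real checks.
    """
    # The delegated "diff" permission is sufficient by itself.
    if has_diff_permission:
        return True
    # Otherwise PRIV_SYS_CONFIG is required ...
    required = {"PRIV_SYS_CONFIG"}
    # ... plus PRIV_SYS_MOUNT when the current file system is involved.
    if against_live_fs:
        required.add("PRIV_SYS_MOUNT")
    return required <= set(privileges)
```

 For example, may_run_diff(False, ["PRIV_SYS_CONFIG"], True) is False
 under this reading, because the live file system case would also
 need PRIV_SYS_MOUNT.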

 Also introduced by this case is a system attribute on zfs files
 called 'generation'.  This attribute is part of the
 XATTR_VIEW_READONLY described in PSARC 2007/315.  It is
 generated automatically by the ZFS module.

Man page changes:

--- fgetattr.3c.rogi Sun Mar 21 18:47:29 2010
+++ fgetattr.3c Sun Mar 21 18:50:50 2010
@@ -97,6 +97,7 @@
   XATTR_VIEW_READONLY    A_FSID           uint64_value
                          A_OPAQUE         boolean_value
                          A_AV_SCANSTAMP   uint8_array[]
+                         A_GEN            uint64_value
   XATTR_VIEW_READWRITE   A_READONLY       boolean_value
                          A_HIDDEN         boolean_value
                          A_SYSTEM         boolean_value

--- zfs.1m.rogi Sun Mar 21 17:01:04 2010
+++ zfs.1m  Sun Mar 28 12:20:04 2010
@@ -165,6 +165,9 @@
   zfs release [-r] tag snapshot...


+ zfs diff snapshot snapshot|filesystem
+
+
  DESCRIPTION
   The zfs command configures ZFS datasets within a ZFS storage
   pool,  as described in zpool(1M). A dataset is identified by
@@ -1638,7 +1641,41 @@
   size, the resulting behavior is undefined.


+ zfs diff snapshot snapshot|filesystem

+ Gives a high level description of the differences between a
+ snapshot and a descendant dataset.  The descendant may either
+ be a later snapshot of the dataset or the current dataset.
+ For each file system object that has undergone a change
+ between the original snapshot and the descendant, the type of
+ change is described along with the name of the file or
+ directory.  In the case of a rename, both the old and new
+ names are shown.
+
+ The type of change is described with a single character:
+
+ +   Indicates the file/directory was added in the later dataset
+ -   Indicates the file/directory was removed in the later dataset
+ M   Indicates the file/directory was modified in the later dataset
+ R   Indicates the file/directory was renamed in the later dataset
+
+ If the modification involved a change in the link count of a
+ file, the change will be expressed as a delta within
+ parentheses on the modification line.  Example outputs are
+ below:
+
+ M   /myfiles/
+ M   /myfiles/link_to_me   (+1)
+ R   /myfiles/rename_me -> /myfiles/renamed
+ -   /myfiles/delete_me
+ +   /myfiles/new_file
+
+ Users must be granted the diff permission with zfs allow in
+ order to use this sub-command, unless they already have the
+ {PRIV_SYS_CONFIG} privilege and, in the current file system
+ versus snapshot case, the {PRIV_SYS_MOUNT} privilege.
+
+
   zfs destroy [-rRf] filesystem|volume


@@ -2733,6 +2770,7 @@
 'mount'
 ability in the origin file system