If I have understood correctly, the objective is to check a dir-tree
to ensure that specific directory/file-permissions are in-effect/have
not been changed. The specifications come from a .JSON file and may
be over-ridden by command-line arguments. Correct?

Yes.


What is the best way to test this?

The best way to prepare for unit-testing is to have 'units' of code
(apologies!).

An effective guide is to break the code into functions and methods,
so that each performs exactly one piece/unit of work - and only that
one. A better guide: if you cannot name the procedure with a single
description, and find yourself wanting to add an "and", an "or", or
some other conjunction, then perhaps there should be more than one
procedure!

For example:

  >    for cat in cats:
  >          ...
  >          for d in scantree(cat.dir):
  >              # if `keep_fs` was specified then we must
  >              # make sure the file is on the same device
  >              if cat.keep_fs and devid != get_devid(d.path):
  >                  continue
  >
  >              cat.check(d)


Above is part of the main() function. But I could move some of the
main() function into its own function, and the call to check a
directory could then be passed in as a function argument. For testing
I could reuse that function by providing a different function for the
check (currently it is the cat.check(d) call shown above).
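
Something like this, roughly - check_category is just a name I made
up, while scantree(), get_devid(), and the cat objects are assumed to
be those from the snippet above:

        def check_category(cat, devid, check=None):
                '''Scan one category's dir-tree, checking each entry.'''
                if check is None:
                        check = cat.check       # the real check, by default
                for d in scantree(cat.dir):
                        # keep_fs: skip entries on another device
                        if cat.keep_fs and devid != get_devid(d.path):
                                continue
                        check(d)

Then main() would just loop over the categories calling
check_category(cat, devid), and a test could pass a different check
function without ever touching the real checks.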

There is no $charge for the number of functions used. Don't hesitate!

Whilst I haven't heard the phrase used recently, we used to talk about systems- and program-design being a process of "step-wise decomposition". It was likened to peeling layers off an onion, or Russian nesting dolls - as one takes 'off' the outer layer, there is more inside.

In programming then, the main() calls a bunch of other functions. Each of those, calls its own set of sub-functions, and so-on. At first, we are talking 'higher levels' of control - and a good choice of function names summarises what's going-on/provides the 'plan of attack' and thus good "documentation"!

As control passes to the 'lower level' of functions, the level of detail increases. You might think of it like a microscope - the more powerful the magnification, the more detail is seen; but the field of view has narrowed/become more focussed.

The idea of having one function perform one (and only one) task is often called the "single responsibility principle"; keeping the details of how that task is carried out hidden inside the function is "encapsulation". Another term which was a new 'buzz-word', decades ago, is "modular programming" - instead of having one long program, splitting it up into functional components.

Yet another traditional phrase describing computer programs is: "input, process, output". This can also be applied, at the successive levels of detail, to functions - they 'take in' parameter-values, 'process' them in some way, and then return, or 'output' the result.

In a simplistic world, one function's output would become input to the next function. However, back in the real-world, it is more likely that several 'outputs' from earlier functions are used as 'input' by some successive procedure. Which is why we are paid 'the big bucks' (maybe).

Earlier functions should only influence or affect the behavior of later functions by passing data (output from one, becoming input to another). The degree to which functions 'feed' or 'affect' each other is called "coupling". If each function is dependent only upon its input data, coupling is low - and that is held up as the ideal. A function which is not independent in that way is likely to be and/or become a source of friction and bugs. Again, in an ideal world...
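
A tiny, made-up illustration of the difference - the first function
depends only upon its inputs, whereas the second is coupled to
module-level state which some other part of the program must have set
correctly beforehand:

        def apply_mask( mode, mask ):
                # depends only on its arguments
                return mode & ~mask

        MASK = 0o022                    # module-level state

        def apply_current_mask( mode ):
                # coupled to whoever last set MASK
                return mode & ~MASK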

There's a lot of follow-through, if you'd care to do some reading using such key-words...


If this were a function, how would you name it? First of all we are
processing every category, then we are scanning a dir-tree, and
finally we are doing something to each directory found.

If we split these into separate routines, eg (sub-setting the above):

  >          for d in scantree(cat.dir):
  >              do_something_with_directory( d )

and you devise a (much) more meaningful name than mine, it will
actually help readers (others - but also you!) to understand the
steps within the logic.
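
For instance (sketch only - these names are mine too, so substitute
ones which describe your domain better):

        def check_category_tree( cat ):
                '''Scan a category's dir-tree, checking each entry found.'''
                for d in scantree( cat.dir ):
                        check_directory_entry( cat, d )

        def check_directory_entry( cat, d ):
                '''Check one directory entry against its category.'''
                cat.check( d )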

Now if we have a function which checks a single fileNM for
'whatever', the code will start something like:

        def stat_file( fileNM ):
                '''Gather stat information for nominated file.'''
                etc
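
Purely as a sketch of what might follow that 'etc' (assuming
os.stat(), and that only the owner, group, and permission bits
matter - adjust to whatever the policies actually compare):

        import os
        import stat

        def stat_file( fileNM ):
                '''Gather stat information for nominated file.'''
                st = os.stat( fileNM, follow_symlinks=False )
                # return just the pieces the checks care about
                return stat.S_IMODE( st.st_mode ), st.st_uid, st.st_gid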

So, we can now write a test-function because we don't need any
"categories", we don't need all the dir-tree, and we don't need to
have performed a scan - all we need is a (valid and previously
inspected) file-path. Thus (using Pytest):

        def test_stat_file( ... ):
                '''Test file stat.'''
                assert stat_file( "...file-path" ) == ...its known-stat

        def test_stat_dir( ... ):
                '''Test dir stat.'''
                assert stat_file( "...dir-path" ) == ...its known-stat

There is no need to test for a non-existent file, if you are the only
user!

In case you hadn't thought about it, make the test path part of your
test directory - not part of 'the real world'!
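
Pytest's built-in tmp_path fixture gives each test its own scratch
directory, which makes that easy. A sketch (assuming the stat_file()
guessed at above, and a Unix-like system for os.getuid()):

        import os

        def test_stat_file_mode( tmp_path ):
                '''Check the mode reported for a scratch file.'''
                f = tmp_path / "sample.txt"     # lives under tmp_path
                f.write_text( "hello" )
                os.chmod( f, 0o640 )

                mode, uid, gid = stat_file( str( f ) )

                assert mode == 0o640
                assert uid == os.getuid()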
...

The comments/advice I got were pretty helpful.  I have already
started making improvements to my code so that I can test things
better.

For example, I have changed the check() method in the Policy class
so that it is called like this:

       def check(self, fpath, user, group, mode):

This means I can test the checks and their results without requiring
any external test data.
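
A hypothetical test against that signature - I am guessing at the
Policy constructor and at what check() returns, so this is only the
shape of such a test:

        def test_check_flags_wrong_mode():
                '''A mode differing from the policy should fail the check.'''
                policy = Policy( user="root", group="root", mode=0o644 )
                result = policy.check( "/some/where/file.conf",
                        user="root", group="root", mode=0o666 )
                # assuming check() returns something falsy on a mismatch
                assert not result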

Thanks a lot to all for your help.

Given your replies, 'now' might be a good time to take a look at Pytest, and see how you could use it to help build better code - by building tested units/functions which are assembled into ever-larger tested units... (there is a range of other testing aids to choose from, if Pytest doesn't take your fancy)
--
Regards =dn