On 25/04/20 7:53 PM, Manfred Lotz wrote:
On Sat, 25 Apr 2020 18:41:37 +1200
DL Neil <pythonl...@danceswithmice.info> wrote:

On 25/04/20 5:16 PM, Manfred Lotz wrote:
On Fri, 24 Apr 2020 19:12:39 -0300
Cholo Lennon <chololen...@hotmail.com> wrote:
On 24/4/20 15:40, Manfred Lotz wrote:
I have a command like application which checks a directory tree
for certain things. If there are errors then messages will be
written to stdout.

> What I do here specifically is to check directory trees' file objects
> for user and group ownerships as well as permissions according to a
> given policy. There is a general policy and there could be exceptions
> which specify a specific policy for a certain file.

If I have understood correctly, the objective is to check a dir-tree to ensure that specific directory/file-permissions are in-effect/have not been changed. The specifications come from a .JSON file and may be over-ridden by command-line arguments. Correct?

There must be a whole 'genre' of programs which inspect a directory-tree and do 'something' to the files contained. I had a few, and needing another, decided to write a generic 'scanner' which would then call a tailored-function to perform the particular 'something' - come to think of it, I'm not sure that was ever quite finished. Sigh!


>>>>> One idea was for the error situations to write messages to files
>>>>> and then later when running the tests to compare the error
>>>>> messages output to the previously saved output.
>>>>>
>>>>> Is there anything better?

The next specification appears to be that you want a list of files and their stats, perhaps reporting only the exceptions which weren't there 'last time'.

In which case, an exception could be raised, or it might be simpler to call a reporting-function when a file should be 'reported'.
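
A sketch of that second idea - both function-names are mine, purely illustrative, and I've assumed the policy is a dict shaped like the "default" entry in your .JSON:

    import os, stat

    def report(path, problem):
        '''Print the complaint - or swap-in logging, or collect into a list.'''
        print(f"{path}: {problem}")

    def check_file(path, policy, report=report):
        '''Check one file's permission-bits against policy; report mismatches.'''
        mode = format(stat.S_IMODE(os.stat(path).st_mode), "03o")
        if mode != policy["match_perms"]:
            report(path, f"mode is {mode}, policy wants {policy['match_perms']}")

The advantage over raising an exception: the scan keeps going, and the caller decides what 'reporting' means.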

I'm still a little confused about the difference between printing/logging/reporting some sort of exception, and the need to compare with 'history'.


The problem with the "previously saved output" is the process of linking 'today's data' with that from the last run. I would use a database (but then, that's my background/bias) to store each file's stat.
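
A minimal sqlite3 sketch of that idea (table- and column-names only illustrative):

    import os, sqlite3

    db = sqlite3.connect("filestats.db")
    db.execute("""CREATE TABLE IF NOT EXISTS filestat
                  (path TEXT PRIMARY KEY, mode INTEGER,
                   uid INTEGER, gid INTEGER)""")

    def record(path):
        '''Store this run's stat; return last run's (None on the first run).'''
        st = os.stat(path)
        previous = db.execute("SELECT mode, uid, gid FROM filestat WHERE path=?",
                              (path,)).fetchone()
        db.execute("INSERT OR REPLACE INTO filestat VALUES (?,?,?,?)",
                   (path, st.st_mode, st.st_uid, st.st_gid))
        db.commit()
        return previous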

Alternately, if you only need to store exceptions, and that number is likely to be small, perhaps output only the exceptions from 'this run' to a .JSON/.yaml file - which would become input (and a dict for easy look-ups) next time?
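
Something like this (the file-name is my invention):

    import json

    def load_last_run(path="exceptions.json"):
        '''Last run's exceptions as a dict keyed by file-path.'''
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}              # first run: nothing to compare against

    def save_this_run(exceptions, path="exceptions.json"):
        '''exceptions: {file-path: error-message}.'''
        with open(path, "w") as f:
            json.dump(exceptions, f, indent=4)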


>>>> Maybe I am wrong because I don't understand your scenario: If your
>>>> application is like a command, it has to return an error code to
>>>> the system, a distinct number for each error condition. The error
>>>> code is easier to test than the stdout/stderr.

Either way, you might decrease the amount of data to be stored by reducing the file's stat to a code.
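
eg pack it into one short string - a sketch (pwd/grp are Unix-only, but that appears to be your territory):

    import grp, os, pwd, stat

    def stat_code(path):
        '''Compact "644:manfred:manfred"-style summary of a file's stat.'''
        st = os.stat(path)
        return ":".join((format(stat.S_IMODE(st.st_mode), "03o"),
                         pwd.getpwuid(st.st_uid).pw_name,
                         grp.getgrgid(st.st_gid).gr_name))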


> How to test this in the best way?

The best way to prepare for unit-testing is to have 'units' of code (apologies!).

An effective guide is to break the code into functions and methods, so that each performs exactly one piece/unit of work - and only that one. A better guide: if you cannot name the procedure with a single description, and want to add an "and", an "or", or some other conjunction, perhaps there should be more than one procedure!

For example:

>    for cat in cats:
>          ...
>          for d in scantree(cat.dir):
>              # if `keep_fs` was specified then we must
>              # make sure the file is on the same device
>              if cat.keep_fs and devid != get_devid(d.path):
>                  continue
>
>              cat.check(d)

If this were a function, how would you name it? First of all we are processing every category, then we are scanning a dir-tree, and finally we are doing something to each directory found.

If we split these into separate routines, eg (sub-setting the above):

>          for d in scantree(cat.dir):
               do_something_with_directory( d )

and you devise a (much) more meaningful name than mine, it will actually help readers (others - but also you!) to understand the steps within the logic.
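
For example (names mine, and deliberately clumsy - yours will be better; I've also assumed devid is the device of the tree's own root, so adjust to suit):

    def check_category(cat):
        '''Scan one category's dir-tree; check each entry against its policy.'''
        devid = get_devid(cat.dir)
        for d in scantree(cat.dir):
            # `keep_fs`: do not cross onto another device/file-system
            if cat.keep_fs and devid != get_devid(d.path):
                continue
            cat.check(d)

    def check_all_categories(cats):
        for cat in cats:
            check_category(cat)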

Now if we have a function which checks a single fileNM for 'whatever', the code will start something like:

        def stat_file( fileNM ):
                '''Gather stat information for nominated file.'''
                etc
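
One minimal way the body might continue - returning only the permission-bits as an octal string; your 'whatever' will differ:

    import os, stat

    def stat_file(fileNM):
        '''Gather stat information for nominated file.'''
        return format(stat.S_IMODE(os.stat(fileNM).st_mode), "03o")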

So, we can now write a test-function because we don't need any "categories", we don't need all the dir-tree, and we don't need to have performed a scan - all we need is a (valid and previously inspected) file-path. Thus (using Pytest):

        def test_stat_file( ... ):
                '''Test file stat.'''
                assert stat_file( "...file-path" ) == ...its known-stat

        def test_stat_dir( ... ):
                '''Test dir stat.'''
                assert stat_file( "...dir-path" ) == ...its known-stat

There is no need to test for a non-existent file, if you are the only user!

In case you hadn't thought about it: make the test path part of your test directory - not part of 'the real world'!
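
Pytest's tmp_path fixture hands each test its own scratch directory, which is made for exactly that - a sketch, assuming the stat_file() above returns the octal permission-string:

    def test_stat_file(tmp_path):
        '''Stat a file created inside pytest's own scratch directory.'''
        victim = tmp_path / "victim.txt"
        victim.write_text("anything")
        victim.chmod(0o644)
        assert stat_file(victim) == "644"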


This is why TDD ("Test-Driven Development") theory says that one should devise tests 'first' (and the actual code, 'second'). You must design the logic first, and then start bottom-up, by writing tests to 'prove' the simplest functionality (and then the code to 'pass' those tests), gradually expanding the scope of the tests as single functions are combined in calls from larger/wider/controlling functions...

This takes the previous time-honored practice of writing the code first, and then testing it (or even, 'throwing it over the wall' to a test department), and turns it on its head. (no wall necessary)

I find it a very good way of making sure that, in a system comprising many 'moving parts', when I make "a little change" to some component 'over here', the test-regime will quickly reveal any unintended consequences (effects, artifacts) 'over there' - even though my (very) small brain hadn't realised the impact! ("it's just one, little, change!")


> The policy file is a JSON file and could have different categories.
> Each category defines a policy for a certain directory tree. Command
> line args could override directory names, as well as user and group.
> Or if, for example, a directory is not specified in the JSON file, I
> am required to specify it via the command line; otherwise no check
> can take place.
>
> "default":
>    {
>        "match_perms":  "644",
>        "match_permsd":  "755",
>        "match_permsx":  "755",
>        "owner":  "manfred",
>        "group":  "manfred"
>    }

Won't a directory-path be required, to tie 'policy' to 'tree'?
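
ie something like (the "dir" key, and its value, are my invention):

    "default":
    {
        "dir":          "/some/directory/tree",
        "match_perms":  "644",
        "match_permsd": "755",
        "match_permsx": "755",
        "owner":        "manfred",
        "group":        "manfred"
    }

...with the command-line able to over-ride it, per your description.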


Is that enough feedback to help you take a few more steps?
--
Regards =dn
