On 25/04/20 7:53 PM, Manfred Lotz wrote:
On Sat, 25 Apr 2020 18:41:37 +1200
DL Neil <pythonl...@danceswithmice.info> wrote:
On 25/04/20 5:16 PM, Manfred Lotz wrote:
On Fri, 24 Apr 2020 19:12:39 -0300
Cholo Lennon <chololen...@hotmail.com> wrote:
On 24/4/20 15:40, Manfred Lotz wrote:
I have a command like application which checks a directory tree
for certain things. If there are errors then messages will be
written to stdout.
> What I do here specifically is to check directory trees' file objects
> for user and group ownerships as well as permissions according to a
> given policy. There is a general policy and there could be exceptions
> which specify a specific policy for a certain file.
If I have understood correctly, the objective is to check a dir-tree to
ensure that specific directory/file-permissions are in-effect/have not
been changed. The specifications come from a .JSON file and may be
over-ridden by command-line arguments. Correct?
There must be a whole 'genre' of programs which inspect a directory-tree
and doing 'something' to the files-contained. I had a few, and needing
another, decided to write a generic 'scanner' which would then call a
tailored-function to perform the particular 'something' - come to think
of it, am not sure if that was ever quite finished. Sigh!
>>>>> One idea was for the error situations to write messages to files
>>>>> and then later when running the tests to compare the error
>>>>> messages output to the previously saved output.
>>>>>
>>>>> Is there anything better?
The next specification appears to be that you want a list of files and
their stats, perhaps reporting only any exceptions which weren't there
'last time'.
In which case, an exception could be raised, or it might be simpler to
call a reporting-function when a file should be 'reported'.
I'm still a little confused about the difference between
printing/logging/reporting some sort of exception, and the need to
compare with 'history'.
The problem with the "previously saved output" is the process of linking
'today's data' with that from the last run. I would use a database (but
then, that's my background/bias) to store each file's stat.
Alternately, if you only need to store exceptions, and that number is
likely to be small, perhaps output only the exceptions from 'this run'
to a .JSON/.yaml file - which would become input (and a dict for easy
look-ups) next time?
>>>> Maybe I am wrong because I don't understand your scenario: If your
>>>> application is like a command, it has to return an error code to
>>>> the system, a distinct number for each error condition. The error
>>>> code is easier to test than the stdout/stderr.
Either way, you might decrease the amount of data to be stored by
reducing the file's stat to a code.
How to test this in the best way?
The best way to prepare for unit-testing is to have 'units' of code
(apologies!).
An effective guide is to break the code into functions and methods, so
that each performs exactly one piece/unit of work - and only the one. A
better guide is that if you cannot name the procedure using one
description and want to add an "and" an "or" or some other conjunction,
perhaps there should be more than one procedure!
For example:
> for cat in cats:
> ...
> for d in scantree(cat.dir):
> # if `keep_fs` was specified then we must
> # make sure the file is on the same device
> if cat.keep_fs and devid != get_devid(d.path):
> continue
>
> cat.check(d)
If this were a function, how would you name it? First of all we are
processing every category, then we are scanning a dir-tree, and finally
we are doing something to each directory found.
If we split these into separate routines, eg (sub-setting the above):
> for d in scantree(cat.dir):
do_something_with_directory( d )
and you devise a (much) more meaningful name than mine, it will actually
help readers (others - but also you!) to understand the steps within the
logic.
Now if we have a function which checks a single fileNM for 'whatever',
the code will start something like:
def stat_file( fileNM ):
'''Gather stat information for nominated file.'''
etc
So, we can now write a test-function because we don't need any
"categories", we don't need all the dir-tree, and we don't need to have
performed a scan - all we need is a (valid and previously inspected)
file-path. Thus (using Pytest):
def test_stat_file( ... ):
'''Test file stat.'''
assert stat_file( "...file-path" ) == ...its known-stat
def test_stat_dir( ... ):
'''Test file stat.'''
assert stat_file( "...dir-path" ) == ...its known-stat
There is no need to test for a non-existent file, if you are the only user!
In case you hadn't thought about it, make the test path, part of your
test directory - not part of 'the real world'!
This is why TDD ("Test-Driven Development) theory says that one should
devise tests 'first' (and the actual code, 'second'). You must design
the logic first, and then start bottom-up, by writing tests to 'prove'
the most simple functionality (and then the code to 'pass' those tests),
gradually expanding the scope of the tests as single functions are
combined in calls from larger/wider/controlling functions...
This takes the previous time-honored practice of writing the code first,
and then testing it (or even, 'throwing it over the wall' to a test
department), and turns it on its head. (no wall necessary)
I find it a very good way of making sure that when coding a system
comprising many 'moving parts', when I make "a little change" to some
component 'over here', the test-regime will quickly reveal any
unintended consequences (effects, artifacts), etc, 'over there' - even
though my (very) small brain hadn't realised the impact! ("it's just
one, little, change!")
The policy file is a JSON file and could have different categories.
Each category defines a policy for a certain directory tree. Comand
line args could overwrite directory names, as well as user and group.
Or if for example a directory is not specified in the JSON file I am
required to specify it via command line. Otherwise no check can take
place.
default":
{
"match_perms": "644",
"match_permsd": "755",
"match_permsx": "755",
"owner": "manfred",
"group": "manfred"
}
Won't a directory-path be required, to tie 'policy' to 'tree'?
Is that enough feedback to help you take a few more steps?
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list