On Sun, 26 Apr 2020 15:26:58 +1200 DL Neil <pythonl...@danceswithmice.info> wrote:
> On 25/04/20 7:53 PM, Manfred Lotz wrote:
> > On Sat, 25 Apr 2020 18:41:37 +1200
> > DL Neil <pythonl...@danceswithmice.info> wrote:
> >
> >> On 25/04/20 5:16 PM, Manfred Lotz wrote:
> >>> On Fri, 24 Apr 2020 19:12:39 -0300
> >>> Cholo Lennon <chololen...@hotmail.com> wrote:
> >>>
> >>>> On 24/4/20 15:40, Manfred Lotz wrote:
> >>>>> I have a command-like application which checks a directory
> >>>>> tree for certain things. If there are errors then messages
> >>>>> will be written to stdout.
>
> > What I do here specifically is to check directory trees' file
> > objects for user and group ownerships as well as permissions
> > according to a given policy. There is a general policy, and there
> > could be exceptions which specify a specific policy for a certain
> > file.
>
> If I have understood correctly, the objective is to check a dir-tree
> to ensure that specific directory/file-permissions are
> in-effect/have not been changed. The specifications come from a
> .JSON file and may be over-ridden by command-line arguments.
> Correct?

Yes.

> There must be a whole 'genre' of programs which inspect a
> directory-tree and do 'something' to the files contained. I had a
> few, and needing another, decided to write a generic 'scanner'
> which would then call a tailored function to perform the particular
> 'something' - come to think of it, am not sure if that was ever
> quite finished. Sigh!
>
> >>>>> One idea was, for the error situations, to write the messages
> >>>>> to files and then later, when running the tests, to compare
> >>>>> the error-message output to the previously saved output.
> >>>>>
> >>>>> Is there anything better?
>
> The next specification appears to be that you want a list of files
> and their stats, perhaps reporting only any exceptions which
> weren't there 'last time'.

I just want to report violations against the policy(ies) given in the
JSON file.

> In which case, an exception could be raised, or it might be simpler
> to call a reporting-function when a file should be 'reported'.
>
> I'm still a little confused about the difference between
> printing/logging/reporting some sort of exception, and the need to
> compare with 'history'.

There is no compare with history. I just want to see current
violations. (I think I phrased things badly in my previous post, so
that it looked like I want to see history.)

> The problem with the "previously saved output" is the process of
> linking 'today's data' with that from the last run. I would use a
> database (but then, that's my background/bias) to store each file's
> stat.

As said above, I phrased things badly in my previous post.

> Alternately, if you only need to store exceptions, and that number
> is likely to be small, perhaps output only the exceptions from
> 'this run' to a .JSON/.yaml file - which would become input (and a
> dict for easy look-ups) next time?
>
> >>>> Maybe I am wrong because I don't understand your scenario: If
> >>>> your application is like a command, it has to return an error
> >>>> code to the system, a distinct number for each error condition.
> >>>> The error code is easier to test than the stdout/stderr.
>
> Either way, you might decrease the amount of data to be stored by
> reducing the file's stat to a code.
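Just to make that concrete for myself - a minimal sketch of reducing
a file's stat to a small, comparable code. The function name and the
choice of fields are mine, not anything prescribed in this thread:

    import grp
    import os
    import pwd

    def stat_code(path):
        '''Reduce a file's stat to a (mode, owner, group) triple.'''
        # Unix-only: pwd/grp map the numeric ids to names.
        st = os.stat(path, follow_symlinks=False)
        return (format(st.st_mode & 0o7777, "03o"),  # e.g. '644'
                pwd.getpwuid(st.st_uid).pw_name,     # e.g. 'manfred'
                grp.getgrgid(st.st_gid).gr_name)     # e.g. 'manfred'

Such a triple compares directly against the policy values, and would
also be cheap to dump into the .JSON 'exceptions' file suggested
above.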
> >>>>> How to test this in the best way?
>
> The best way to prepare for unit-testing is to have 'units' of code
> (apologies!).
>
> An effective guide is to break the code into functions and methods,
> so that each performs exactly one piece/unit of work - and only the
> one. A better guide is that if you cannot name the procedure using
> one description, and want to add an "and", an "or", or some other
> conjunction, perhaps there should be more than one procedure!
>
> For example:
>
> > for cat in cats:
> >     ...
> >     for d in scantree(cat.dir):
> >         # if `keep_fs` was specified then we must
> >         # make sure the file is on the same device
> >         if cat.keep_fs and devid != get_devid(d.path):
> >             continue
> >
> >         cat.check(d)

Above is part of the main() function, but I could make that part of
main() into its own function. Then the call to check a directory
could be a function argument. For testing, I could reuse that
function by providing a different function for the check (now it is
cat.check(d)).

> If this were a function, how would you name it? First of all we are
> processing every category, then we are scanning a dir-tree, and
> finally we are doing something to each directory found.
>
> If we split these into separate routines, eg (sub-setting the
> above):
>
>     for d in scantree(cat.dir):
>         do_something_with_directory( d )
>
> and you devise a (much) more meaningful name than mine, it will
> actually help readers (others - but also you!) to understand the
> steps within the logic.
>
> Now if we have a function which checks a single fileNM for
> 'whatever', the code will start something like:
>
>     def stat_file( fileNM ):
>         '''Gather stat information for nominated file.'''
>         etc
>
> So, we can now write a test-function, because we don't need any
> "categories", we don't need all the dir-tree, and we don't need to
> have performed a scan - all we need is a (valid and previously
> inspected) file-path. Thus (using Pytest):
>
>     def test_stat_file( ... ):
>         '''Test file stat.'''
>         assert stat_file( "...file-path" ) == ...its known-stat
>
>     def test_stat_dir( ... ):
>         '''Test directory stat.'''
>         assert stat_file( "...dir-path" ) == ...its known-stat
>
> There is no need to test for a non-existent file, if you are the
> only user!
>
> In case you hadn't thought about it, make the test path part of
> your test directory - not part of 'the real world'!
>
> This is why TDD ("Test-Driven Development") theory says that one
> should devise tests 'first' (and the actual code, 'second'). You
> must design the logic first, and then start bottom-up, by writing
> tests to 'prove' the most simple functionality (and then the code
> to 'pass' those tests), gradually expanding the scope of the tests
> as single functions are combined in calls from
> larger/wider/controlling functions...
>
> This takes the previous time-honored practice of writing the code
> first and then testing it (or even 'throwing it over the wall' to a
> test department), and turns it on its head (no wall necessary).
>
> I find it a very good way of making sure that, when coding a system
> comprising many 'moving parts', if I make "a little change" to some
> component 'over here', the test-regime will quickly reveal any
> unintended consequences (effects, artifacts), etc, 'over there' -
> even though my (very) small brain hadn't realised the impact!
> ("it's just one, little, change!")
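I tried turning that stat_file/pytest sketch into something runnable
for myself. The return value of stat_file and the use of pytest's
tmp_path fixture are my own choices, not anything prescribed above
(tmp_path also takes care of keeping the test path out of 'the real
world'):

    import os

    def stat_file(file_path):
        '''Gather (mode, uid, gid) stat information for the file.'''
        st = os.stat(file_path)
        return (format(st.st_mode & 0o7777, "03o"),
                st.st_uid, st.st_gid)

    def test_stat_file(tmp_path):
        '''A file created with known permissions reports them.'''
        probe = tmp_path / "probe.txt"
        probe.write_text("x")
        probe.chmod(0o644)
        assert stat_file(probe) == ("644", os.getuid(), os.getgid())

No fixture data needs to exist on disk beyond what the test itself
creates.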
> > The policy file is a JSON file and could have different
> > categories. Each category defines a policy for a certain
> > directory tree. Command-line args could override directory names,
> > as well as user and group. And if, for example, a directory is
> > not specified in the JSON file, I am required to specify it via
> > the command line; otherwise no check can take place.
> >
> >     "default":
> >     {
> >         "match_perms":  "644",
> >         "match_permsd": "755",
> >         "match_permsx": "755",
> >         "owner": "manfred",
> >         "group": "manfred"
> >     }
>
> Won't a directory-path be required, to tie 'policy' to 'tree'?

Yes, a directory path is required. If it is not present here, then it
must be supplied by command-line arguments.

> Is that enough feedback to help you take a few more steps?

Yes, I think so. Thanks very much - the comments/advice I got were
pretty helpful.

I have already started to make improvements in my code to be able to
test things better. For example, I have changed the check() method in
the Policy class so that it is called like this:

    def check(self, fpath, user, group, mode):

This means I can test the checks and their results without requiring
any external test data.

Thanks a lot to all for your help.

-- 
Manfred
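P.S. To convince myself that the new signature really is testable
without touching the filesystem, I sketched it out. The class body
below is purely my own illustration - the real Policy class has more
to it:

    class Policy:
        '''Illustration only - not the real implementation.'''
        def __init__(self, owner, group, match_perms):
            self.owner = owner
            self.group = group
            self.match_perms = match_perms

        def check(self, fpath, user, group, mode):
            '''Return a list of policy violations for one file.'''
            violations = []
            if user != self.owner:
                violations.append(
                    f"{fpath}: owner {user}, want {self.owner}")
            if group != self.group:
                violations.append(
                    f"{fpath}: group {group}, want {self.group}")
            if mode != self.match_perms:
                violations.append(
                    f"{fpath}: mode {mode}, want {self.match_perms}")
            return violations

    def test_check_reports_wrong_mode():
        policy = Policy("manfred", "manfred", "644")
        assert policy.check("f", "manfred", "manfred", "600") != []
        assert policy.check("f", "manfred", "manfred", "644") == []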