On Friday, February 3, 2023 at 5:31:56 PM UTC-5, Thomas Passin wrote:
> On 2/3/2023 4:18 PM, transreductionist wrote:
> > Here is the situation. There is a top-level module (see designs below)
> > containing code, that as the name suggests, manages an ETL pipeline. A
> > directory is created called etl_helpers that organizes several modules
> > responsible for making up the pipeline. The discussion concerns the Python
> > language, which supports OOP as well as Structural/Functional approaches to
> > programming.
> >
> > I am interested in opinions on which design adheres best to standard
> > architectural practices and the SOLID principles. I understand that this is
> > one of those topics where people may have strong opinions one way or the
> > other. I am interested in those opinions.
> Well, you have pretty well stacked the deck to make DESIGN 1 the
> obviously preferred choice. I don't think it has much to do with Python
> per se, or even with OO vs imperative style.
>
> As a practical matter, once you got into working with
> extract_transform_load.py (for the other designs), I would expect that
> you would start wanting to refactor it and eventually end up more like
> DESIGN 1. So you might as well start out that way.
>
> The reasons are 1) what you said about separation of concerns, 2) a
> desire to keep each module or file relatively coherent and easy to read,
> and 3, as you also suggested, making each of them easier to test.
> Decoupling is important too (one of the SOLID prescriptions), but you
> can violate that with any architecture if you don't think carefully
> about what you are doing.
>
> On the subject of OO, I think it is a very good approach to think about
> architecture and design in object terms - meaning conceptual objects
> from the users' point of view. For example, here you have a pipeline (a
> metaphorical or userland object).
> It will need functionality to load,
> transform, and output data so logically it can be composed of a loader,
> one or more transformers, and one or more output formatters (more
> objects). You may also need a scheduler and a configuration manager
> (more objects).
>
> (*Please* let's not have any quibbling about "class" vs "object". We
> are at a conceptual level here!)
>
> When it comes to implementation, you can choose to implement those
> userland objects with either imperative, OO, or functional techniques,
> or a mixture.
> > Allow me to give my thoughts. First, I don't think there would be much
> > difference if I was using OOP for the functionality, or using a structural
> > paradigm. A structural paradigm in my opinion, along the lines of Rich
> > Hickey's comments on simple versus complex, would be a simpler
> > implementation. In this case there is no reason to create a construct with
> > state. So let's assume the code is structural and not OOP.
> >
> > I would go with Design I. Succinctly stated, Design I supports readability
> > and maintainability at least as well, if not better than the other designs.
> > The goal of the SOLID principles are the creation of mid-level software
> > structures that (Software Architecture: SA Martin). I think Design I best
> > adheres to these principles of:
> > ---- Tolerate change,
> > ---- Are easy to understand, and
> > ---- Are the basis of components that can be used in many software systems.
> >
> > I could point to the Single Responsibility Principle which is defined as
> > (SA Martin): a module should be responsible to one, and only one, actor. It
> > should satisfy the Liskov Substitution Principle as well. Further, each
> > module in the etl_helpers directory is at the same level of abstraction.
> >
> > I could also mention that as Dijkstra stressed, at every level, from the
> > smallest function to the largest component, software is like a science and,
> > therefore, is driven by falsifiability.
> > Software architects strive to
> > define modules, components, and services that are easily falsifiable
> > (testable). To do so, they employ restrictive disciplines similar to
> > structured programming, albeit at a much higher level (SA Martin).
> >
> > One can point to multiple reasons why Design I might be preferred, but what
> > are the compelling reasons, if there are any, that would suggest another
> > design was superior.
> >
> > Finally, let me reference an interesting research paper I read recently
> > that seems to support the other designs as anti-patterns:
> > Architecture_Anti-patterns_Automatically.pdf
> >
> > ---- (https://www.cs.drexel.edu/~yfcai/papers/2019/tse2019.pdf)
> >
> > SEVERAL DESIGNS FOR COMPARISON
> >
> > DESIGN I:
> >
> > ---- manage_the_etl_pipeline.py
> > ---- etl_helpers
> > -------- extract.py
> > -------- transform.py
> > -------- load.py
> >
> > Of course one could also
> >
> > DESIGN II:
> >
> > ---- manage_the_etl_pipeline.py
> > ---- etl_helpers
> > -------- extract_transform_load.py
> >
> > or probably even:
> >
> > DESIGN III:
> >
> > ---- manage_the_etl_pipeline.py
> > ---- extract_transform_load.py

One point that I think is worth making, and that I forgot to make, is that
namespaces are ubiquitous in Python: Built-in, Global, Function, and
Enclosing namespaces, as well as user namespaces, e.g. dictionaries,
types.SimpleNamespace, and dataclasses, to list just a few. Modules ARE
namespaces. Namespaces organize programming constructs like classes,
functions, variables, etc. into coherent groups of "things". To have a
namespace that complects extract constructs with transform constructs and
load constructs in one module seems unpythonic.
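
To make the namespace point a little more concrete, here is a minimal sketch of how Design I might look from the top-level module's side. It is only an illustration under stated assumptions: the function names (read_rows, uppercase_names, write_rows, run_pipeline) and the sample data are hypothetical, and types.SimpleNamespace stands in for the three modules in etl_helpers so that the example fits in a single file. In a real project each namespace would simply be its own module.

```python
from types import SimpleNamespace


# Hypothetical helpers; in Design I each would live in its own module
# (etl_helpers/extract.py, transform.py, load.py).
def _read_rows(source):
    """Extract step: copy the rows out of an in-memory source."""
    return list(source)


def _uppercase_names(rows):
    """Transform step: return new rows with the 'name' field upper-cased."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def _write_rows(rows, sink):
    """Load step: append the rows to a list-like sink and return the count."""
    sink.extend(rows)
    return len(rows)


# Assumption: SimpleNamespace objects stand in for the three modules.
# Each namespace groups only the constructs of one pipeline stage.
extract = SimpleNamespace(read_rows=_read_rows)
transform = SimpleNamespace(uppercase_names=_uppercase_names)
load = SimpleNamespace(write_rows=_write_rows)


def run_pipeline(source, sink):
    """Top-level manager: wires the three stages together."""
    rows = extract.read_rows(source)
    rows = transform.uppercase_names(rows)
    return load.write_rows(rows, sink)


sink = []
count = run_pipeline([{"name": "ada"}, {"name": "grace"}], sink)
```

Each stage can be exercised in isolation through its own namespace (for example, transform.uppercase_names on a hand-built list), which is the testability argument made earlier in the thread. With a single extract_transform_load namespace, the call sites would still work, but the grouping would no longer say which construct belongs to which stage.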