On Friday, February 3, 2023 at 5:31:56 PM UTC-5, Thomas Passin wrote:
> On 2/3/2023 4:18 PM, transreductionist wrote: 
> > Here is the situation. There is a top-level module (see designs below) 
> > containing code that, as the name suggests, manages an ETL pipeline. A 
> > directory called etl_helpers organizes the several modules that make up 
> > the pipeline. The discussion concerns the Python 
> > language, which supports OOP as well as Structural/Functional approaches to 
> > programming. 
> > 
> > I am interested in opinions on which design adheres best to standard 
> > architectural practices and the SOLID principles. I understand that this is 
> > one of those topics where people may have strong opinions one way or the 
> > other. I am interested in those opinions.
> Well, you have pretty well stacked the deck to make DESIGN 1 the 
> obviously preferred choice. I don't think it has much to do with Python 
> per se, or even with OO vs imperative style. 
> 
> As a practical matter, once you got into working with 
> extract_transform_load.py (for the other designs), I would expect that 
> you would start wanting to refactor it and eventually end up more like 
> DESIGN 1. So you might as well start out that way. 
> 
> The reasons are 1) what you said about separation of concerns, 2) a 
> desire to keep each module or file relatively coherent and easy to read, 
> and 3), as you also suggested, making each of them easier to test. 
> Decoupling is important too (one of the SOLID prescriptions), but you 
> can violate that with any architecture if you don't think carefully 
> about what you are doing. 
> 
> On the subject of OO, I think it is a very good approach to think about 
> architecture and design in object terms - meaning conceptual objects 
> from the users' point of view. For example, here you have a pipeline (a 
> metaphorical or userland object). It will need functionality to load, 
> transform, and output data so logically it can be composed of a loader, 
> one or more transformers, and one or more output formatters (more 
> objects). You may also need a scheduler and a configuration manager 
> (more objects). 
> 
> (*Please* let's not have any quibbling about "class" vs "object". We 
> are at a conceptual level here!) 
> 
> When it comes to implementation, you can choose to implement those 
> userland objects with either imperative, OO, or functional techniques, 
> or a mixture.
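To make the "userland objects" idea concrete, here is a minimal sketch that mixes styles: the parts are plain functions, and a small frozen dataclass only wires them together. It assumes Python 3.9+ and in-memory dict records; the names Pipeline, loader, transformers, formatters and all the sample functions are hypothetical, not from any existing library.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable

# Assumption: records are plain dicts held in memory (Python 3.9+ for list[...]).
Record = dict
Loader = Callable[[], list[Record]]
Transformer = Callable[[list[Record]], list[Record]]
Formatter = Callable[[list[Record]], str]


@dataclass(frozen=True)
class Pipeline:
    """Userland 'pipeline' object composed of smaller userland objects."""

    loader: Loader                         # reads the input (extract, in ETL terms)
    transformers: tuple[Transformer, ...]  # applied in order
    formatters: tuple[Formatter, ...]      # each renders the final records

    def run(self) -> list[str]:
        records = self.loader()
        # Feed each transformer the output of the previous one.
        records = reduce(lambda acc, step: step(acc), self.transformers, records)
        return [fmt(records) for fmt in self.formatters]


# Hypothetical parts, written as plain functions.
def load_rows() -> list[Record]:
    return [{"name": "ada", "qty": 2}, {"name": "bob", "qty": 0}]


def drop_empty(rows: list[Record]) -> list[Record]:
    return [r for r in rows if r["qty"] > 0]


def upper_names(rows: list[Record]) -> list[Record]:
    return [{**r, "name": r["name"].upper()} for r in rows]


def as_csv(rows: list[Record]) -> str:
    return "\n".join(f"{r['name']},{r['qty']}" for r in rows)


def as_count(rows: list[Record]) -> str:
    return f"{len(rows)} record(s)"


pipeline = Pipeline(load_rows, (drop_empty, upper_names), (as_csv, as_count))
print(pipeline.run())  # ['ADA,2', '1 record(s)']
```

Swapping a transformer or adding a formatter changes only the tuples passed in, not Pipeline itself.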
> > Allow me to give my thoughts. First, I don't think there would be much 
> > difference whether I used OOP for the functionality or a structural 
> > paradigm. In my opinion, and along the lines of Rich Hickey's comments on 
> > simple versus complex, a structural paradigm would give a simpler 
> > implementation. In this case there is no reason to create a construct with 
> > state. So let's assume the code is structural and not OOP. 
> > 
> > I would go with Design I. Succinctly stated, Design I supports readability 
> > and maintainability at least as well as, if not better than, the other 
> > designs. The goal of the SOLID principles is the creation of mid-level 
> > software structures that (Software Architecture: SA Martin): 
> > ---- tolerate change, 
> > ---- are easy to understand, and 
> > ---- are the basis of components that can be used in many software systems. 
> > 
> > I think Design I adheres best to these goals. 
> > 
> > I could point to the Single Responsibility Principle, which is defined as 
> > (SA Martin): a module should be responsible to one, and only one, actor. It 
> > should satisfy the Liskov Substitution Principle as well. Further, each 
> > module in the etl_helpers directory is at the same level of abstraction. 
> > 
> > I could also mention that, as Dijkstra stressed, at every level, from the 
> > smallest function to the largest component, software is like a science and, 
> > therefore, is driven by falsifiability. Software architects strive to 
> > define modules, components, and services that are easily falsifiable 
> > (testable). To do so, they employ restrictive disciplines similar to 
> > structured programming, albeit at a much higher level (SA Martin). 
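As a small illustration of "easily falsifiable (testable)", here is a sketch assuming a pure function that would live in etl_helpers/transform.py; normalize_amounts, its record shape, and the test names are all hypothetical. Because the function does no I/O, it can be tested without importing any extract or load code.

```python
import unittest


# Would live in etl_helpers/transform.py (hypothetical function).
def normalize_amounts(rows: list[dict]) -> list[dict]:
    """Turn amount strings such as '1,234.50' into floats; pure, no I/O."""
    return [{**row, "amount": float(row["amount"].replace(",", ""))} for row in rows]


# Would live in a test module such as tests/test_transform.py (hypothetical).
class NormalizeAmountsTest(unittest.TestCase):
    def test_strips_thousands_separator(self):
        rows = [{"id": 1, "amount": "1,234.50"}]
        self.assertEqual(normalize_amounts(rows), [{"id": 1, "amount": 1234.5}])

    def test_leaves_input_untouched(self):
        rows = [{"id": 1, "amount": "10"}]
        normalize_amounts(rows)
        self.assertEqual(rows, [{"id": 1, "amount": "10"}])
```

In a real project, `python -m unittest discover` would collect such a class from a `test_*.py` file.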
> > 
> > One can point to multiple reasons why Design I might be preferred, but what 
> > are the compelling reasons, if there are any, that would suggest another 
> > design is superior? 
> > 
> > Finally, let me reference an interesting research paper I read recently 
> > that seems to support viewing the other designs as anti-patterns: 
> > Architecture_Anti-patterns_Automatically.pdf 
> > 
> > ---- (https://www.cs.drexel.edu/~yfcai/papers/2019/tse2019.pdf) 
> > 
> > SEVERAL DESIGNS FOR COMPARISON 
> > 
> > DESIGN I: 
> > 
> > ---- manage_the_etl_pipeline.py 
> > ---- etl_helpers/ 
> > -------- extract.py 
> > -------- transform.py 
> > -------- load.py 
> > 
> > Of course, one could also use: 
> > 
> > DESIGN II: 
> > 
> > ---- manage_the_etl_pipeline.py 
> > ---- etl_helpers/ 
> > -------- extract_transform_load.py 
> > 
> > or probably even: 
> > 
> > DESIGN III: 
> > 
> > ---- manage_the_etl_pipeline.py 
> > ---- extract_transform_load.py
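DESIGN I can be exercised end to end with the throwaway sketch below, which writes the layout to a temporary directory and imports it. Only the file layout comes from the designs above; every function body (extract, transform, load, run) is hypothetical, and the "sink" is just a list standing in for a real load target.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# DESIGN I layout; every module body below is hypothetical.
FILES = {
    "etl_helpers/__init__.py": "",
    "etl_helpers/extract.py": """
        def extract():
            # Extract concern only: produce raw records.
            return [' alpha ', 'beta', '']
    """,
    "etl_helpers/transform.py": """
        def transform(rows):
            # Transform concern only: trim records and drop empty ones.
            return [r.strip() for r in rows if r.strip()]
    """,
    "etl_helpers/load.py": """
        def load(rows, sink):
            # Load concern only: write records to a sink (here, a list).
            sink.extend(rows)
            return len(rows)
    """,
    "manage_the_etl_pipeline.py": """
        from etl_helpers import extract, load, transform


        def run(sink):
            # Orchestration only: wire the three helpers together.
            return load.load(transform.transform(extract.extract()), sink)
    """,
}

root = Path(tempfile.mkdtemp())
for rel_path, body in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(textwrap.dedent(body))

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # the files were created after start-up
import manage_the_etl_pipeline  # noqa: E402

sink = []
print(manage_the_etl_pipeline.run(sink), sink)  # 2 ['alpha', 'beta']
```

Each helper module can also be imported and tested on its own, e.g. `from etl_helpers import transform`.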


One point that I think is worth making, and that I forgot to make, is that 
namespaces are ubiquitous in Python: the built-in, global, function (local), 
and enclosing namespaces, as well as user namespaces such as dictionaries, 
types.SimpleNamespace, and dataclasses, to name just a few. Modules ARE 
namespaces. Namespaces organize programming constructs like classes, 
functions, and variables into coherent groups of "things". Having one module 
namespace that complects extract constructs with transform and load 
constructs seems unpythonic.
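A minimal sketch of the "modules are namespaces" point, using types.ModuleType so nothing touches the disk. The module and function names (read_csv, dedupe, write_table, and so on) are hypothetical stand-ins for DESIGN I's helpers; merging them into one extract_transform_load namespace puts every concern's names side by side.

```python
import types


def public_names(namespace) -> list[str]:
    """Names a reader sees when exploring a namespace."""
    return sorted(name for name in vars(namespace) if not name.startswith("_"))


def make_module(name: str, **members) -> types.ModuleType:
    """Build an in-memory module object (no file needed) holding `members`."""
    module = types.ModuleType(name)
    for attr, value in members.items():
        setattr(module, attr, value)
    return module


# DESIGN I: one namespace per concern (hypothetical function names).
extract = make_module("extract", read_csv=lambda path: [], read_api=lambda url: [])
transform = make_module("transform", dedupe=lambda rows: list(dict.fromkeys(rows)))
load = make_module("load", write_table=lambda rows, table: len(rows))

# DESIGN II/III: one extract_transform_load namespace holding all of them.
etl = make_module(
    "extract_transform_load",
    **{n: getattr(m, n) for m in (extract, transform, load) for n in public_names(m)},
)

print(public_names(transform))  # ['dedupe']
print(public_names(etl))  # ['dedupe', 'read_api', 'read_csv', 'write_table']

# User namespaces behave the same way, e.g. types.SimpleNamespace.
settings = types.SimpleNamespace(source="input.csv", batch_size=500)
print(public_names(settings))  # ['batch_size', 'source']
```

Under DESIGN I, a reader who opens transform sees only transform names; under DESIGN II or III, every name sits in the same flat namespace.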
-- 
https://mail.python.org/mailman/listinfo/python-list
