Organizing modules and their code

transreductionist Fri, 03 Feb 2023 13:49:45 -0800

Here is the situation. There is a top-level module (see the designs below) 
containing code that, as its name suggests, manages an ETL pipeline. A 
directory called etl_helpers organizes the several modules that make up the 
pipeline. The discussion concerns the Python language, which supports OOP as 
well as structural/functional approaches to programming.


I am interested in opinions on which design best adheres to standard 
architectural practices and the SOLID principles. I understand that this is one 
of those topics where people may have strong opinions one way or the other, and 
I would like to hear them.
 
Allow me to give my thoughts. First, I don't think there would be much 
difference whether I used OOP or a structural paradigm for the functionality. 
In my opinion, along the lines of Rich Hickey's comments on simple versus 
complex, a structural paradigm would be the simpler implementation: in this 
case there is no reason to create a construct with state. So let's assume the 
code is structural and not OOP.
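To make "structural and not OOP" concrete, here is a minimal sketch of what it 
could look like, assuming a made-up input format of "name,amount" lines. Each 
stage is a plain function and the manager only composes them, so no object 
carries state between calls. All names and bodies are hypothetical stand-ins; 
in Design I each stage would live in its own module, but they share one block 
here so it runs as-is.

```python
"""A minimal, stateless ETL pipeline sketch (hypothetical data and names).

Each stage is a plain function: data goes in, data comes out, and no
object holds state between calls.
"""
from typing import Dict, Iterable, List

Record = Dict[str, str]
Row = Dict[str, object]


def extract(lines: Iterable[str]) -> List[Record]:
    """Parse 'name,amount' lines into dictionaries (assumed input format)."""
    records = []
    for line in lines:
        name, amount = line.strip().split(",")
        records.append({"name": name, "amount": amount})
    return records


def transform(records: List[Record]) -> List[Row]:
    """Title-case names and convert amounts to integers."""
    return [
        {"name": r["name"].strip().title(), "amount": int(r["amount"])}
        for r in records
    ]


def load(rows: List[Row], sink: List[Row]) -> int:
    """Append rows to a caller-supplied sink and report how many were loaded."""
    sink.extend(rows)
    return len(rows)


def manage_the_etl_pipeline(lines: Iterable[str], sink: List[Row]) -> int:
    """Compose the three stages; only the data itself flows between them."""
    return load(transform(extract(lines)), sink)


if __name__ == "__main__":
    target: List[Row] = []
    count = manage_the_etl_pipeline(["alice,10", "bob,32"], target)
    print(count, target)  # 2 [{'name': 'Alice', 'amount': 10}, {'name': 'Bob', 'amount': 32}]
```

Swapping these functions for methods on a class would add nothing here, which 
is the sense in which the structural version is the simpler one.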

I would go with Design I. Succinctly stated, Design I supports readability and 
maintainability at least as well as, if not better than, the other designs. The 
goal of the SOLID principles is the creation of mid-level software structures 
that (Software Architecture: SA Martin):
---- Tolerate change,
---- Are easy to understand, and
---- Are the basis of components that can be used in many software systems.
I think Design I best meets these goals.

I could point to the Single Responsibility Principle, which is defined as (SA 
Martin): a module should be responsible to one, and only one, actor. Design I 
should satisfy the Liskov Substitution Principle as well. Further, each module 
in the etl_helpers directory is at the same level of abstraction.
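Without classes, one way to read single responsibility and Liskov substitution 
is through a shared callable contract. In this sketch (hypothetical names; JSON 
and CSV strings stand in for real destinations) each loader has one reason to 
change, its output format, and the manager accepts any function matching the 
assumed `Loader` signature, so swapping formats never touches the manager.

```python
"""Sketch: substitutable load stages behind one callable contract.

Hypothetical names throughout. The manager depends only on the Loader
signature, so any loader honoring it can be swapped in without editing
the manager (a functional reading of Liskov substitution).
"""
import csv
import io
import json
from typing import Callable, Dict, List

Row = Dict[str, object]
# A loader takes rows and returns the payload it would write (a stand-in
# for writing to a real destination).
Loader = Callable[[List[Row]], str]


def load_as_json(rows: List[Row]) -> str:
    """Serialize rows as a JSON array."""
    return json.dumps(rows)


def load_as_csv(rows: List[Row]) -> str:
    """Serialize rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "amount"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def manage_the_etl_pipeline(rows: List[Row], loader: Loader) -> str:
    """The manager knows nothing about output formats; it only calls the loader."""
    return loader(rows)


if __name__ == "__main__":
    data = [{"name": "Alice", "amount": 10}]
    print(manage_the_etl_pipeline(data, load_as_json))
    print(manage_the_etl_pipeline(data, load_as_csv))
```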

I could also mention that, as Dijkstra stressed, at every level, from the 
smallest function to the largest component, software is like a science and, 
therefore, is driven by falsifiability. Software architects strive to define 
modules, components, and services that are easily falsifiable (testable). To do 
so, they employ restrictive disciplines similar to structured programming,
albeit at a much higher level (SA Martin).
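As a small illustration of "easily falsifiable (testable)", here is a 
hypothetical transform function kept pure, so that a few standard-library 
unittest cases can try to break it. The function name and its rule of dropping 
rows with a blank amount are assumptions made for this example.

```python
"""Sketch: a transform stage small enough to be falsified by a few tests.

`normalize_amounts` is a hypothetical transform.py function; the tests use
only the standard-library unittest module.
"""
import unittest
from typing import Dict, List


def normalize_amounts(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Convert the 'amount' field to int and drop rows where it is blank."""
    result = []
    for row in rows:
        raw = row.get("amount", "").strip()
        if not raw:
            continue  # skip rows with no amount rather than guessing a value
        result.append({**row, "amount": int(raw)})
    return result


class NormalizeAmountsTest(unittest.TestCase):
    def test_converts_strings_to_int(self):
        self.assertEqual(
            normalize_amounts([{"name": "a", "amount": " 7 "}]),
            [{"name": "a", "amount": 7}],
        )

    def test_drops_blank_amounts(self):
        self.assertEqual(normalize_amounts([{"name": "b", "amount": ""}]), [])

    def test_rejects_non_numeric(self):
        # A falsifiable claim: bad input fails loudly instead of passing silently.
        with self.assertRaises(ValueError):
            normalize_amounts([{"name": "c", "amount": "seven"}])
```

Saved as, say, test_transform.py, the tests run with 
`python -m unittest test_transform`.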

One can point to multiple reasons why Design I might be preferred, but what 
compelling reasons, if any, would suggest that another design is superior?

Finally, let me reference an interesting research paper I read recently that 
seems to support viewing the other designs as anti-patterns: 
Architecture_Anti-patterns_Automatically.pdf 
---- (https://www.cs.drexel.edu/~yfcai/papers/2019/tse2019.pdf)

SEVERAL DESIGNS FOR COMPARISON

DESIGN I:

---- manage_the_etl_pipeline.py
---- etl_helpers
      ---- extract.py
      ---- transform.py
      ---- load.py

Of course, one could also use:

DESIGN II:

---- manage_the_etl_pipeline.py
---- etl_helpers
      ---- extract_transform_load.py

or probably even:

DESIGN III: 

---- manage_the_etl_pipeline.py
---- extract_transform_load.py
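To make Design I concrete, the sketch below writes that tree into a temporary 
directory and imports the manager, which pulls each stage from its own module. 
The one-line module bodies are hypothetical, and the empty 
etl_helpers/__init__.py is an assumption (the tree above does not list one) 
added so the directory is a regular package.

```python
"""Sketch: materialize the Design I layout and import it from the manager.

Module contents are hypothetical one-liners; an empty etl_helpers/__init__.py
is assumed so the directory is a regular package.
"""
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from typing import Dict

DESIGN_I = {
    "etl_helpers/__init__.py": "",
    "etl_helpers/extract.py": "def extract():\n    return ['alice,10', 'bob,32']\n",
    "etl_helpers/transform.py": textwrap.dedent("""\
        def transform(lines):
            return [(n.title(), int(a)) for n, a in (line.split(',') for line in lines)]
        """),
    "etl_helpers/load.py": "def load(rows):\n    return dict(rows)\n",
    "manage_the_etl_pipeline.py": textwrap.dedent("""\
        # Each stage is imported from its own module (Design I).
        from etl_helpers.extract import extract
        from etl_helpers.transform import transform
        from etl_helpers.load import load

        def run():
            return load(transform(extract()))
        """),
}


def build_and_run(files: Dict[str, str]) -> dict:
    """Write the files to a temp dir, import the manager, and run the pipeline."""
    with tempfile.TemporaryDirectory() as root:
        for relative_path, source in files.items():
            path = Path(root, relative_path)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(source)
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # make sure the new files are discovered
        try:
            manager = importlib.import_module("manage_the_etl_pipeline")
            return manager.run()
        finally:
            sys.path.remove(root)
            # Forget the temporary modules so a later call imports fresh copies.
            for name in list(sys.modules):
                if name == "manage_the_etl_pipeline" or name.startswith("etl_helpers"):
                    del sys.modules[name]


if __name__ == "__main__":
    print(build_and_run(DESIGN_I))  # {'Alice': 10, 'Bob': 32}
```

Under Design II the manager would instead import all three functions from 
etl_helpers.extract_transform_load, and under Design III from a sibling 
extract_transform_load module.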
-- 
https://mail.python.org/mailman/listinfo/python-list
