Re: [PR] Modernized front page scenario [airflow]

via GitHub Mon, 26 Feb 2024 17:47:51 -0800


philiprbrenan commented on PR #37609:
URL: https://github.com/apache/airflow/pull/37609#issuecomment-1965642832


   I have had the benefit of being forced to use ``Airflow`` on an important 
project with genuine time pressure and monetary consequences.
   
   I say _forced_ because the first time I saw ``Airflow`` I was underwhelmed. 
My initial take was: "_So what? This does not seem to advance the state of the 
art beyond anything that I can already do, so why would I want to take the time 
to learn how to use it?_" Indeed, using it seemed a retrograde step because 
of the need to define all the tasks in the ``DAG`` before the ``DAG`` could be 
run.
   
   Thus, initially ``Airflow`` appeared to me not as something essential but 
as a confused heap of building blocks left littered around a corner of the 
Internet for people to stumble into in the dark and waste their time falling 
over. It was rather like the initial Soviet view of the intelligence flowing 
from Los Alamos. Why, for example, did we need a Bash operator when there was 
plenty of capability to run command-line programs from Python? Why did we need 
a special Python operator for determining the next task to run? I was already 
perfectly capable of sending an email or executing a bash command from Python 
and did not want to know that there was yet another way of doing these mundane 
tasks.
   
   In particular, this amorphous mass of blocks seemed to lack the conceptual 
integrity that Frederick Brooks, in Chapter 4 of _The Mythical Man-Month_, 
considers so essential to system design:
   
   _I will contend that conceptual integrity is the most important 
consideration in system design. It is better to have a system omit certain 
anomalous features and improvements, but to reflect one set of design ideas, 
than to have one that contains many good but independent and uncoordinated 
ideas._
   
   The thing that changed my mindset completely, a Damascene conversion that 
transformed me from scathingly ``anti`` to completely ``pro``, was that 
eventually, after some considerable time and effort, I discovered, deep in the 
documentation, the pure Python solution using just the ``@dag`` and ``@task`` 
decorators. These decorators allowed me to mark up my code to get it to run in 
parallel under ``Airflow`` with the smallest amount of change and disruption 
possible to the existing code. Suddenly, ``Airflow`` went from being extremely 
annoying to powerfully enabling, from almost ``useless`` to absolutely 
``essential``.
   
   Fired by my experience I wrote a proposed new intro to ``Airflow`` that 
would spare future users the tedious and expensive learning process that I had 
been forced to go through.
   
   My first version of the improved intro used only the ``@dag`` and ``@task`` 
decorators to demonstrate a minimal pure Python ``DAG`` that illustrated the 
essential power of ``Airflow`` in a compelling and easily understood way.
   
   The idea was that a skilled Python programmer could then easily fill in all 
the other details for themselves if they wanted to, or if they preferred, dive 
deeper into the other offerings available in ``Airflow``.  Either way they 
would be in through the door and we would be in business.
   
   I proposed my improved intro via a pull request.  Helpful comments were made 
and the initial proposal improved further.
   
   One comment that was made was that there should also be a demonstration of 
the ``>>`` operator, as it was seen as an important user-facing component.
   
   Initially I was in favor of this and I modified my example to include usage 
of the ``>>`` operator. To my surprise, I could not find a way to use ``>>`` 
with ``@task``. To overcome this problem, I produced a small separate ``DAG`` 
that used ``>>`` simply to illustrate this feature and placed it first, before 
getting on with the main business of the ``@dag`` and ``@task`` decorators 
without further reference to ``>>``.
   
   I have to say that I do not find it easy to explain the ``>>`` operator to 
other users. Programmers, in particular, want to see a flow of data through 
their code: you call this routine to get a value, which you then hand to that 
routine to process it further. Stuff that is earlier in the code usually gets 
executed before stuff later in the code. But ``>>`` seems to require one to 
redundantly repeat something that is obvious anyway while ignoring the flow of 
data. The power of the decorators is that all of these issues suddenly 
disappear. ``Airflow`` code starts to look like normal code again, with 
``Airflow`` working invisibly behind the scenes to move tasks to different 
machines and to work out which tasks can be run in parallel because they lack 
dependencies, rather than requiring the user to state what they think the 
dependencies might be. ``>>`` feels redundant and alien while ``@task`` feels 
comfortable and familiar.
   
   Having invested a good deal of time in creating the proposed intro, I 
produced a preceding ``>>`` example in the high hope of getting my main 
proposal through to completion and acceptance. But really, all that having two 
examples in the intro does is confuse the new ``Airflow`` user by providing two 
wildly different ways of doing two different things. And such users will be 
confused even more by being asked to make an important decision, which approach 
they prefer, when they are least capable of making it.
   
   We are now proposing to dig this hole deeper by having a tab where the new 
user has to make that confusing initial decision with the help of a GUI that 
must, by its nature, give precedence to one example over the other and thus 
exert an unwanted bias on the unsuspecting new user, who is hoping to be 
enlightened as to the essence of ``Airflow`` with as few clicks and as little 
bias as possible.
   
   We should spare new users this conundrum. We should put our best foot 
forward from the get-go. The decorators approach is compelling to new users in 
a way that the ``>>`` operator is not. ``>>`` is important infrastructure, but 
it is not architecture. The decorators approach will compel new users to enter 
the cathedral of ``Airflow`` to admire the beauty of its architectural 
simplicity and integrity, as recommended by Brooks. And if the new intro does 
not work as expected, we can easily revert to the existing intro and try again 
later in the light of the new experience gained.
   
   May I therefore recommend that I be encouraged to resubmit my proposal minus 
the leading ``>>`` operator example in the sure and certain expectation of 
imminent publication?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
