philiprbrenan commented on PR #37609: URL: https://github.com/apache/airflow/pull/37609#issuecomment-1965642832
I have had the benefit of being forced to use ``Airflow`` on an important project with real time pressure and monetary consequences. I say _forced_ because the first time I saw ``Airflow`` I was underwhelmed. My initial take was: "_So what? This does not seem to advance the state of the art beyond anything that I can already do, so why would I want to take the time to learn how to use it?_" Indeed, using it seemed a retrograde step because of the need to define all the tasks in the ``DAG`` before the ``DAG`` could be run.

Thus, initially ``Airflow`` appeared to me not as something essential but as a confused heap of building blocks left lying in a corner of the Internet for people to stumble over in the dark and waste their time. Rather like the initial Soviet view of the intelligence flowing from Los Alamos. Why, for example, did we need a bash operator when there was plenty of capability to execute stuff from the command line in Python? Why did we need a special Python operator for determining the next task to run? I was already perfectly capable of sending an email or executing a bash command from Python and did not want to know that there was yet another way of doing these mundane tasks.

In particular, this amorphous mass of blocks seemed to lack the conceptual integrity considered so essential in system design, as detailed in Chapter 4 of "The Mythical Man-Month" by Brooks, thus: _I will contend that conceptual integrity is the most important consideration in system design.
It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas._

The thing that changed my mindset completely, a Damascene conversion that transformed me from scathingly ``anti`` to completely ``pro``, was that eventually, after considerable time and effort, I discovered, deep in the documentation, the pure Python solution using just the ``@dag`` and ``@task`` decorators. These decorators allowed me to mark up my code to get it run in parallel under ``Airflow`` with the smallest possible change and disruption to the existing code. Suddenly, ``Airflow`` went from being extremely annoying to powerfully enabling, from almost ``useless`` to absolutely ``essential``.

Fired by my experience, I wrote a proposed new intro to ``Airflow`` that would spare future users the tedious and expensive learning process that I had been forced to go through. My first version of the improved intro used only the ``@dag`` and ``@task`` decorators to demonstrate a minimal pure Python ``DAG`` that illustrated the essential power of ``Airflow`` in a compelling and easily understood way. The idea was that a skilled Python programmer could then easily fill in all the other details for themselves or, if they preferred, dive deeper into the other offerings available in ``Airflow``. Either way they would be in through the door and we would be in business.

I proposed my improved intro via a pull request. Helpful comments were made and the initial proposal improved further. One comment was that there should also be a demonstration of the ``>>`` operator, as it was seen as an important user-facing component. Initially I was in favor of this and I modified my example to include usage of the ``>>`` operator. To my surprise, I found out you cannot use ``>>`` with ``@task``.
To overcome this problem, I produced a small separate ``DAG`` that used ``>>`` simply to illustrate this feature and placed it first, before getting on with the main business of the ``@dag`` and ``@task`` decorators without further reference to ``>>``.

I have to say that I do not find it easy to explain the ``>>`` operator to other users. Programmers, in particular, want to see a flow of data through their code: you call this routine to get this value, which you then hand to that routine to process it further. Stuff that is earlier in the code usually gets executed before stuff that is later in the code. But ``>>`` seems to require one to redundantly repeat something that is obvious anyway while ignoring the flow of data.

The power of the decorators is that all of these issues suddenly disappear. ``Airflow`` code starts to look like normal code again, with ``Airflow`` working invisibly behind the scenes to move tasks to different machines and figure out which tasks can be run in parallel because they lack dependencies, rather than requiring the user to state what they think the dependencies might be. ``>>`` feels redundant and alien while ``@task`` feels comfortable and familiar.

Having invested a good deal of time in creating the proposed intro, I produced a preceding ``>>`` example in high hopes of getting through to completion and acceptance of my main proposal. But really, all that having two examples in the intro does is confuse the new ``Airflow`` user by providing two wildly different ways of doing two different things. And such users will be confused even more by being asked to make an important decision, namely which approach they prefer, when they are least capable of making it.
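
The contrast drawn above, between stating dependencies with ``>>`` and letting them fall out of the flow of data, can be sketched in plain Python. This is a toy model only and is not ``Airflow``'s implementation or API: ``Node``, ``task`` and ``chain`` are hypothetical names invented for the illustration, and the decorated functions are only recorded, never executed.

```python
"""Toy model only: NOT Airflow's implementation or API."""
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A recorded task call; `upstream` holds the nodes it depends on."""
    name: str
    upstream: set = field(default_factory=set)

    def __rshift__(self, other):
        # Explicit style: `a >> b` declares that b runs after a.
        other.upstream.add(self)
        return other


def task(fn):
    """Hypothetical decorator: calling the function records a Node instead of running it."""
    def call(*args):
        node = Node(fn.__name__)
        # Implicit style: any Node passed as an argument becomes a dependency.
        node.upstream.update(a for a in args if isinstance(a, Node))
        return node
    return call


@task
def extract():
    return [1, 2, 3]


@task
def double(values):
    return [v * 2 for v in values]


@task
def report(values):
    print(values)


# Data-flow style: dependencies follow from how values are handed along.
implicit = report(double(extract()))

# Operator style: the same ordering, stated by hand.
a, b, c = Node("extract"), Node("double"), Node("report")
a >> b >> c


def chain(node):
    """Walk a linear graph back to its root and return task names in run order."""
    names = []
    while node is not None:
        names.append(node.name)
        node = next(iter(node.upstream), None)
    return names[::-1]
```

In this sketch both styles record the same three-step graph. The difference is that the data-flow version never mentions ordering at all, while the operator version has to restate it separately from the code that moves the data.
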
We are now proposing to dig this hole deeper by having a tab where the new user has to make that confusing initial decision with the help of a GUI that must, by its nature, give precedence to one example over the other and thus exert an unwanted bias on the unsuspecting new user, who is hoping to be enlightened as to the essence of ``Airflow`` with as few clicks and as little bias as possible.

We should spare new users this conundrum. We should put our best foot forward from the get-go. The decorators approach is compelling to new users in a way that the ``>>`` operator is not. ``>>`` is important infrastructure, but it is not architecture. The decorators approach will compel new users to enter the cathedral of ``Airflow`` to admire the beauty of its architectural simplicity and integrity, as recommended by Brooks. And if the new intro does not work as expected, we can easily revert to the existing intro and try again later in the light of the new experience gained.

May I therefore recommend that I be encouraged to resubmit my proposal minus the leading ``>>`` operator example in the sure and certain expectation of imminent publication?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org