jedcunningham commented on a change in pull request #17757:
URL: https://github.com/apache/airflow/pull/17757#discussion_r694946565



##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,190 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+Also make sure to :ref:`Add init file to your folders <add_init_py_to_your_folders>`.
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)

Review comment:
       ```suggestion
   This is an example structure that you might have in your ``dags`` folder:
   ```
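
For reference, a minimal sketch of how this structure resolves at import time. It assumes a throwaway temporary directory standing in for the `dags` folder; `SomeClass` and `BaseDag` are the placeholder names from the docs example, and the file contents are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a directory on PYTHONPATH (e.g. the dags folder).
root = Path(tempfile.mkdtemp())

# File path (relative to root) -> illustrative file content.
files = {
    "my_company/__init__.py": "",
    "my_company/common_package/__init__.py": "",
    "my_company/common_package/common_module.py": "class SomeClass:\n    pass\n",
    "my_company/my_custom_dags/__init__.py": "",
    "my_company/my_custom_dags/base_dag.py": "class BaseDag:\n    pass\n",
}
for relative_path, content in files.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)

# Airflow adds the dags folder to sys.path itself; here we do it by hand.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Imports use the full path, starting from the directory on sys.path.
from my_company.common_package.common_module import SomeClass
from my_company.my_custom_dags.base_dag import BaseDag

print(SomeClass.__module__)  # my_company.common_package.common_module
print(BaseDag.__module__)    # my_company.my_custom_dags.base_dag
```

The module names printed at the end match the dotted paths a DAG file would use, which is the point of keeping everything under one top-level package.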

##########
File path: docs/apache-airflow/modules_management.rst
##########
@@ -68,99 +81,192 @@ In the next section, you will learn how to create your own simple
 installable package and how to specify additional directories to be added
 to ``sys.path`` using the environment variable :envvar:`PYTHONPATH`.
 
+If you want to import packages from a directory that is added to ``PYTHONPATH``, you should import
+them using the full Python path of the files. Every directory where you put your files must also
+contain an empty ``__init__.py`` file, which turns it into a Python package. Take the structure
+described below as an example (the root directory on the ``PYTHONPATH`` might be any of the directories
+listed in the next chapter or one that you added manually to the path).
 
-Creating a package in Python
-----------------------------
+Typical structure of packages
+-----------------------------
 
-1. Before starting, install the following packages:
+This is an example structure that you might have in your ``dags`` folder (see below)
 
-``setuptools``: setuptools is a package development process library designed
-for creating and distributing Python packages.
+.. code-block:: none
 
-``wheel``: The wheel package provides a bdist_wheel command for setuptools. It
-creates .whl file which is directly installable through the ``pip install``
-command. We can then upload the same file to `PyPI <pypi.org>`_.
+   <DIRECTORY ON PYTHONPATH>
+   | .airflowignore  -- only needed in the ``dags`` folder, see below
+   | -- my_company
+                 | __init__.py
+                 | common_package
+                 |              |  __init__.py
+                 |              | common_module.py
+                 |              | subpackage
+                 |                         | __init__.py
+                 |                         | subpackaged_util_module.py
+                 |
+                 | my_custom_dags
+                                 | __init__.py
+                                 | my_dag_1.py
+                                 | my_dag_2.py
+                                 | base_dag.py
+
+In the case above, this is how you should import the Python files:
 
-.. code-block:: bash
+.. code-block:: python
 
-    pip install --upgrade pip setuptools wheel
+   from my_company.common_package.common_module import SomeClass
+   from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
+   from my_company.my_custom_dags.base_dag import BaseDag
 
-2. Create the package directory - in our case, we will call it ``airflow_operators``.
+You can see the ``.airflowignore`` file at the root of your folder. This is a file that you can put in your
+``dags`` folder to tell Airflow which files from the ``dags`` folder should be ignored when the Airflow
+scheduler looks for DAGs. It should contain regular expressions for the paths that should be ignored. You
+do not need to have that file in any other folder in ``PYTHONPATH`` (the other folders can hold only
+shared code, not the actual DAGs).
 
-.. code-block:: bash
+In the example above the DAGs are only in the ``my_custom_dags`` folder. The ``common_package`` should not be
+scanned by the scheduler when searching for DAGs, so we should ignore the ``common_package`` folder. You also
+want to ignore ``base_dag.py`` if you keep a base DAG there that ``my_dag_1.py`` and ``my_dag_2.py`` derive
+from. Your ``.airflowignore`` should then look like this:
 
-    mkdir airflow_operators
+.. code-block:: none
 
-3. Create the file ``__init__.py`` inside the package and add following code:
+   my_company/common_package/.*
+   my_company/my_custom_dags/base_dag\.py
 
-.. code-block:: python
+Built-in ``PYTHONPATH`` entries in Airflow
+------------------------------------------
 
-    print("Hello from airflow_operators")
+Airflow, when running, dynamically adds three directories to ``sys.path``:
 
-When we import this package, it should print the above message.
+- The ``dags`` folder: It is configured with option ``dags_folder`` in section ``[core]``.
+- The ``config`` folder: It is configured by setting ``AIRFLOW_HOME`` variable (``{AIRFLOW_HOME}/config``) by default.
+- The ``plugins`` folder: It is configured with option ``plugins_folder`` in section ``[core]``.
 
-4. Create ``setup.py``:
+.. note::
+   The DAGs folder in Airflow 2 should not be shared with the webserver. While you can do it, unlike in Airflow 1.10,
+   Airflow has no expectation that the DAGs folder is present for the webserver. In fact it's a bit of a
+   security risk to share the ``dags`` folder with the webserver, because it means that people who write DAGs
+   can write code that the webserver will be able to execute (and the Airflow 2 approach is that the webserver should
+   never run code which can be modified by users who write DAGs). Therefore if you need to share some code
+   with the webserver, it is highly recommended that you share it via the ``config`` or ``plugins`` folder or
+   via installed Airflow packages (see below). Those folders are usually managed and accessible by different
+   users (Admins/DevOps) than DAG folders (those are usually data scientists), so they are considered
+   safe because they are part of the configuration of the Airflow installation that can be controlled by the
+   people managing the installation.
+
+Best practices for module loading
+---------------------------------
+
+There are a few pitfalls you should watch out for when you import your code.
+
+Use unique top package name
+...........................
+
+It is recommended that you always put your DAGs/common files in a subpackage which is unique to your
+deployment (``my_company`` in the example above). It is far too easy to use generic names for the
+folders that will clash with other packages already present in the system. For example, if you
+create an ``airflow/operators`` subfolder, it will not be accessible because Airflow already has a package
+named ``airflow.operators`` and it will look there when importing ``from airflow.operators``.
+
+Don't use relative imports
+..........................
+
+Never use explicit relative imports (starting with ``.``).
+
+It is tempting to do something like this in ``my_dag_1.py``:
 
 .. code-block:: python
 
-    import setuptools
+   from .base_dag import BaseDag  # NEVER DO THAT!!!!
 
-    setuptools.setup(
-        name="airflow_operators",
-    )
+You should import such a shared DAG using the full path (starting from the directory which is added to
+``PYTHONPATH``):
 
-5. Build the wheel:
+.. code-block:: python
 
-.. code-block:: bash
+   from my_company.my_custom_dags.base_dag import BaseDag  # This is cool
 
-    python setup.py bdist_wheel
+Relative imports are counter-intuitive: depending on how you start your Python code, they behave
+differently. In Airflow the same DAG file might be parsed in different contexts (by the scheduler, by a worker,
+or during tests), and in those cases relative imports might behave differently. Always use the full
+Python package path when you import anything in Airflow DAGs; this will save you a lot of trouble.
+You can read more about relative import caveats in
+`this Stack Overflow thread <https://stackoverflow.com/q/16981921/516701>`_.
 
-This will create a few directories in the project and the overall structure will
-look like following:
+Add ``__init__.py`` in package folders
+......................................
 
-.. code-block:: bash
+When you create folders, you should add an empty ``__init__.py`` file to each of them. While Python 3
+has a concept of implicit namespace packages, where you do not have to add those files to a folder, Airflow
+expects that the files are added to all packages you create.

Review comment:
       Going to resolve this too (for context see the other thread). 
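
For context on the implicit-namespace remark, a small sketch of the difference in plain Python, independent of Airflow. The directory names `pkg_with_init` and `pkg_without_init` are made up for illustration, and how Airflow itself treats the second kind is not shown here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical temporary directory standing in for a folder on PYTHONPATH.
root = Path(tempfile.mkdtemp())

# A regular package: the folder contains an (empty) __init__.py.
(root / "pkg_with_init").mkdir()
(root / "pkg_with_init" / "__init__.py").write_text("")

# An implicit namespace package: same kind of folder, no __init__.py.
(root / "pkg_without_init").mkdir()
(root / "pkg_without_init" / "helper.py").write_text("VALUE = 1\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

regular = importlib.import_module("pkg_with_init")
namespace = importlib.import_module("pkg_without_init")

# A regular package is backed by its __init__.py file ...
print(getattr(regular, "__file__", None) is not None)  # True
# ... while a namespace package has no file of its own.
print(getattr(namespace, "__file__", None) is None)    # True
```

Both folders import without error in Python 3, which is why a missing `__init__.py` is easy to overlook.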



