potiuk commented on issue #28280:
URL: https://github.com/apache/airflow/issues/28280#issuecomment-1359916847

   Good catch.
   
   
   It is because this is what Jinja2 does by default - it replaces any new line 
it detects in the template (not in the values) with the value of 
newline_sequence passed as parameter of the Template (which is `\n` by 
default). This is nothing specific to Airflow - it's just how Jinja2 works.
   
   You can test it yourself:
   
   When you run this In a regular terminal this one (same with no 
newline_sequence):
   ```
   import jinja2
   
   jinjatemplate = jinja2.Template('Hello \r{{ name }}!', newline_sequence="\n")
   print(jinjatemplate.render(name="World"))
   ```
   
   You will get:
   
   ```
   Hello 
   World!
   ```
   
   Where this one:
   
   ```
   import jinja2
   
   jinjatemplate = jinja2.Template('Hello \r{{ name }}!', newline_sequence="\r")
   print(jinjatemplate.render(name="World"))
   ```
   
   Will print just:
   
   ```
   World!
   ```
   
   The thing is that Jinja2 treats the template as multi-line text file.  It 
reads the temple line by line and later joins the line using "newline_sequence" 
as separator. I think this has a lot to do with the ways how some formatting of 
jinja template and whitespace is done. For example it would be rather difficult 
to achieve whitespace manipulation that Jinja allows you to do with whitespace 
control: 
https://jinja.palletsprojects.com/en/3.1.x/templates/#whitespace-control - so 
JINJA authors actually opted to define what is the "canonical newline" they 
always use in the rendered template. Which makes a lot of sense - imagine the 
complexity if you have a template that mixes both \r and 'n and Jinja 
adding/removing those - with some level of consistency.
   
   Additionally, the `test_bash_operator_heredoc_contains_newlines` does not 
complete successfuly for you because it has a bug (in the task definition). 
   
   Proper command should be:
   
   ```
          bash_command="""
   diff <(
   cat <<EOF
   Line 1
   Line 2
   Line 3
   EOF
   ) <(
   cat <<EOF | dos2unix   # <- Here EOF was in a wrong place
   Line 1
   Line 2
   Line 3
   EOF
   ) || {
       echo >&2 "Bash heredoc contains newlines incorrectly converted to 
Windows CRLF"
       exit 1
   }
   """,
   ```
   
   The problem was that heredoc was pased as input to dos2unix command, not to 
the `cat` command and the cat command simply read from the stdin. Apparently 
astro sdk somehow replaces the stdin when it runs airflow tasks and cat simply 
gets empty string or smth like that.
   
   Solution:
   
   I am not sure if we want to do anything with it. This seems to be standard 
Jinja2 behaviour, and even if it kinda unexpected, I think this is the 
behaviour our users already likely rely on and changing it would be 
backwards-incompatible. Maybe we could specify newline_sequence somewhere in 
DAG definition (same as render_template_as_native_object?) 
   
   In any case, It would be great however if this behaviour is documented.
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to