Unit03 opened a new issue #9595:
URL: https://github.com/apache/airflow/issues/9595


   **Apache Airflow version**: 1.10.9, 1.10.10, trunk
   
   - **OS** (e.g. from /etc/os-release): Linux
   - **Others**: Bash/sh
   
   **What happened**:
   
   Password masking was added to `SparkSubmitOperator` (`SparkSubmitHook`, to 
be precise) in December 2019 (under 
[AIRFLOW-6350](https://issues.apache.org/jira/browse/AIRFLOW-6350); PR: #6917), 
but it only masks passwords given in the `--foo.password='value'` form: the 
value must be single-quoted and joined to the argument's name with an equals 
sign.
   
   **What you expected to happen**:
   
   I would expect this mechanism to also cover the forms a) with double quotes 
or no quotes at all, and b) with whitespace instead of an equals sign, e.g.
   * `--foo.password=value`
   * `--foo.password="value"`
   * `--foo.password 'value'`
   * `--foo.password value`
   * `--foo.password "value"`
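For illustration, a rough sketch of a pattern that would cover the forms listed above as well as the single-quoted one. It is not taken from Airflow or from any proposed patch; `BROAD_PATTERN` and `mask_sketch` are hypothetical names, and the pattern is only one possible way to do it.

```python
import re

# Hypothetical pattern, not Airflow's: a key containing "secret" or "password",
# then either "=" or whitespace, then a quoted or a bare value.
BROAD_PATTERN = re.compile(
    r"""(?P<key>\S*?(?:secret|password)\S*?(?:\s*=\s*|\s+))"""
    r"""(?:(?P<q>['"])(?:(?!(?P=q)).)*(?P=q)|\S+)""",
    flags=re.I,
)


def mask_sketch(command):
    """Replace every matched value with asterisks, keeping its quotes."""

    def _replace(match):
        quote = match.group("q") or ""  # empty when the value is unquoted
        return match.group("key") + quote + "******" + quote

    return BROAD_PATTERN.sub(_replace, command)


for form in (
    "--foo.password='value'",
    "--foo.password=value",
    '--foo.password="value"',
    "--foo.password 'value'",
    "--foo.password value",
    '--foo.password "value"',
):
    print(form, "->", mask_sketch(form))
```

One trade-off of accepting whitespace as a separator: the token following any argument whose name contains "password" or "secret" gets masked, even when that token is not a secret (a path, for example).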
   
   But I may be missing something. Is there any reason [the initial 
version](https://github.com/apache/airflow/pull/6917) only covers the 
single-quoted-with-equal-sign form? The regular expression used in the masking 
code ([1.10.9 
version](https://github.com/apache/airflow/blob/1.10.9/airflow/contrib/hooks/spark_submit_hook.py#L229-L236),
 [trunk 
version](https://github.com/apache/airflow/blob/master/airflow/providers/apache/spark/hooks/spark_submit.py#L236-L243))
 looks pretty intentional:
   
   ```python
       def _mask_cmd(self, connection_cmd):
           # Mask any password related fields in application args with key value pair
           # where key contains password (case insensitive), e.g. HivePassword='abc'
   
           connection_cmd_masked = re.sub(
               r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
               r'\1******', ' '.join(connection_cmd), flags=re.I)
   ```
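The behaviour of this pattern can be checked without Airflow by applying the same substitution to each form. A minimal sketch: the pattern and flags are copied from the snippet above, while `mask_current` and `forms` are names made up for this check.

```python
import re

# Pattern copied from the _mask_cmd snippet above.
CURRENT_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask_current(connection_cmd):
    """Apply the existing substitution to a command given as a list of args."""
    return re.sub(CURRENT_PATTERN, r"\1******", " ".join(connection_cmd), flags=re.I)


forms = [
    ["--foo.password='value'"],
    ["--foo.password=value"],
    ['--foo.password="value"'],
    ["--foo.password", "'value'"],
    ["--foo.password", "value"],
    ["--foo.password", '"value"'],
]
for args in forms:
    masked = mask_current(args)
    status = "masked" if "value" not in masked else "LEAKED"
    print("{:<8}{}".format(status, masked))
```

Only the first form comes out masked. The others fail to match because the pattern requires a single quote right after the equals sign.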
   
   **How to reproduce it**:
   
   ```python
   from airflow import DAG
   from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9
   
   dag = DAG(...)
   SparkSubmitOperator(
       ...,
       conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
       dag=dag,
   )
   ```
   
   Running such a task will leak the password into Airflow logs.
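The leak can be mimicked without Spark or Airflow, assuming the hook renders each `conf` entry as `--conf key=value` with no quoting. That rendering is an assumption about how `SparkSubmitHook` builds its command, and `render_conf` below is a hypothetical stand-in for it, not Airflow code. The regular expression is the one used by the masking code.

```python
import re

# Pattern used by the existing masking code.
CURRENT_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def render_conf(conf):
    """Hypothetical stand-in: turn a conf dict into spark-submit arguments."""
    cmd = ["spark-submit"]
    for key, value in conf.items():
        # Assumption: entries are emitted as key=value, without quotes.
        cmd += ["--conf", "{}={}".format(key, value)]
    return cmd


cmd = render_conf({"spark.foo.password": "this_should_get_masked_but_it_doesnt"})
logged = re.sub(CURRENT_PATTERN, r"\1******", " ".join(cmd), flags=re.I)
print(logged)
```

Under that assumption the value is unquoted, so the pattern never matches and the logged command still contains the password.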
   
   **Anything else we need to know**:
   
   Again, I may be missing something here, e.g. an OS-specific detail. I'd be 
happy to learn more. :)
   
   If some or all of the other forms I mentioned should also be masked, 
[I have a change ready for opening a 
PR](https://github.com/Unit03/airflow/commits/mask-not-single-quoted-passwords).
   
   (Note that no JIRA issue is referenced in the commit messages: I cannot 
create issues in [Airflow's 
Jira](https://issues.apache.org/jira/projects/AIRFLOW/summary) for some reason.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

