albertusk95 commented on issue #7075: [AIRFLOW-6212] SparkSubmitHook resolve connection
URL: https://github.com/apache/airflow/pull/7075#issuecomment-571470394
 
 
   @tooptoop4 You might want to try this sample DAG to reproduce the issue.
   
   a) Create an environment variable for the Spark connection.
   ```
   export AIRFLOW_CONN_SPARK_DEFAULT='{"conn_type": "spark", "host": "<host>", "port": <port>}'
   ```
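
If quoting the JSON by hand is error-prone, the same value could be built from Python instead. This is only a sketch: the host and port below are hypothetical placeholders, and it assumes the Airflow version in use accepts a JSON-serialized connection in `AIRFLOW_CONN_*` variables (older releases expect a URI there).

```python
import json
import os

# Hypothetical values for illustration only; substitute the real Spark master.
conn = {"conn_type": "spark", "host": "example-host", "port": 7077}

# json.dumps quotes every key and string value, so the result is valid JSON.
conn_json = json.dumps(conn)

# Visible to this process and to any child process started from it.
os.environ["AIRFLOW_CONN_SPARK_DEFAULT"] = conn_json

print(conn_json)
```

Setting the variable this way only affects the current process and its children, so the scheduler or `airflow` command would need to be started from the same environment.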
   
   b) Create a DAG file to run.
   
   ```python
   from airflow import DAG
   from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
   from datetime import datetime, timedelta
   import os
   
   # Path to the Spark application to submit
   job_file = 'path/to/job/file'
   
   # Replace each <fill_...> placeholder with a real value before running
   default_args = {
       'depends_on_past': False,
       'start_date': <fill_start_date>,
       'retries': <fill_retries>,
       'retry_delay': <fill_retry_delay>
   }
   dag = DAG('spark-submit-hook', default_args=default_args,
             schedule_interval=<fill_interval>)
   
   # conn_id is not set, so the operator uses its default, 'spark_default',
   # which is read from the AIRFLOW_CONN_SPARK_DEFAULT variable above
   avg = SparkSubmitOperator(task_id=<fill_task_id>, dag=dag,
                             application=job_file,
                             spark_binary='path/to/spark-submit')
   ```
