Hi

I am trying to set the HADOOP_CONF_DIR and YARN_CONF_DIR environment variables from my DAG file:

import datetime as dt
import os

import airflow
from airflow import DAG
from airflow.contrib.operators.spark_jdbc_operator import SparkJDBCOperator

dag = DAG(
    dag_id='SparkJDBC',
    schedule_interval=dt.timedelta(hours=4),
    start_date=airflow.utils.dates.days_ago(2),
)

os.environ['HADOOP_CONF_DIR'] = "/etc/hadoop/conf"
os.environ['YARN_CONF_DIR'] = "/etc/hadoop/conf"


task1 = SparkJDBCOperator(
    task_id='SparkJDBC',
    dag=dag,
    spark_app_name="TEST",
    cmd_type="jdbc_to_spark",
    conn_id="spark_default",
    spark_conn_id="spark_default",
    jdbc_conn_id="oracle_jdbc_test",
    jdbc_table="TEST",
    env_vars='{"HADOOP_CONF_DIR":"/etc/hadoop/conf","YARN_CONF_DIR":"/etc/hadoop/conf"}',
    metastore_table="TEST",
    verbose=True)
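
I am not sure whether SparkJDBCOperator forwards env_vars to spark-submit at all, or whether it expects a plain Python dict rather than a JSON string. This is only a sketch of the variant I would try next (everything the same as above except env_vars):

# Same task, but with env_vars as a plain dict instead of a JSON string --
# I don't know if this is what the operator expects, so treat it as a guess.
task1 = SparkJDBCOperator(
    task_id='SparkJDBC',
    dag=dag,
    spark_app_name="TEST",
    cmd_type="jdbc_to_spark",
    conn_id="spark_default",
    spark_conn_id="spark_default",
    jdbc_conn_id="oracle_jdbc_test",
    jdbc_table="TEST",
    env_vars={"HADOOP_CONF_DIR": "/etc/hadoop/conf",
              "YARN_CONF_DIR": "/etc/hadoop/conf"},
    metastore_table="TEST",
    verbose=True)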


but I get this error:

[2019-02-06 14:42:01,043] {{base_task_runner.py:101}} INFO - Job 13534:
Subtask SparkJDBC [2019-02-06 14:42:01,041] {{spark_submit_hook.py:283}}
INFO - Spark-Submit cmd: ['spark-submit', '--master', 'yarn', '--name',
'TEST', '--verbose', '--queue', 'root.default',
'/opt/conda/miniconda/envs/airflow-dask/lib/python3.6/site-packages/airflow/contrib/hooks/spark_jdbc_script.py',
'-cmdType', 'jdbc_to_spark', '-url', 'jdbc:oracle:thin:@//*****/******/',
'-user', '*****', '-password', '*******', '-metastoreTable', 'TEST',
'-jdbcTable', 'TEST']
[2019-02-06 14:42:01,951] {{base_task_runner.py:101}} INFO - Job 13534:
Subtask SparkJDBC [2019-02-06 14:42:01,950] {{spark_submit_hook.py:415}}
INFO - Using properties file: null
[2019-02-06 14:42:01,969] {{base_task_runner.py:101}} INFO - Job 13534:
Subtask SparkJDBC [2019-02-06 14:42:01,969] {{spark_submit_hook.py:415}}
INFO - Exception in thread "main" org.apache.spark.SparkException: When
running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be
set in the environment.
.................................
.................................
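
To narrow it down, I was also thinking of adding a small debug task that just prints the variables from inside the worker process, to check whether the os.environ assignments in the DAG file even reach the process that launches spark-submit. This is only a sketch; the task id and function name are made up:

import os

from airflow.operators.python_operator import PythonOperator


def print_hadoop_env():
    # If these print as None, the os.environ assignments in the DAG file
    # are not visible to the worker process that runs spark-submit.
    print("HADOOP_CONF_DIR =", os.environ.get('HADOOP_CONF_DIR'))
    print("YARN_CONF_DIR =", os.environ.get('YARN_CONF_DIR'))


debug_env = PythonOperator(
    task_id='print_hadoop_env',  # hypothetical debug-only task
    python_callable=print_hadoop_env,
    dag=dag)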


Could you help me?

Thanks!!!

Regards,
Iván Robla
