Ah, from that issue I see it's something to do with the use of unicode literals, though in local testing I can't reproduce the error you see.
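For context: if the DAG file (or something it imports) uses from __future__ import unicode_literals, every plain string literal in that module becomes unicode under Python 2, which is the kind of str/unicode mix that the future backport of urlparse refuses to accept. A minimal illustration (not your DAG, just a sketch):

    # Python 2.7
    from __future__ import unicode_literals

    handle = "https://tfhub.dev/google/nnlm-en-dim50/1"
    print(type(handle))  # <type 'unicode'>
    # without the __future__ import the same literal would be <type 'str'>, i.e. bytes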
Could you add the following print statements to your dag, inside the python callable:

    print(__import__("tensorflow_hub.compressed_module_resolver", fromlist=['_COMPRESSED_FORMAT_QUERY'])._COMPRESSED_FORMAT_QUERY[0].__class__)
    print("https://tfhub.dev/google/nnlm-en-dim50/1".__class__)

> On 19 Jun 2018, at 11:09, Andrew Maguire <andrewm4...@gmail.com> wrote:
> 
> There is a bit more info in the issue:
> 
> https://github.com/tensorflow/hub/issues/76
> 
> Seems like maybe Airflow is passing it as bytes when it's expected as a string...?
> 
> On Tue, Jun 19, 2018 at 11:02 AM Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
> 
>> Welp, nothing useful in there :(
>> 
>> The error is coming from this line:
>> 
>>> embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim50/1")
>> 
>> I'm next to unfamiliar with TensorFlow - is it possible this model was generated on Py3 but is being run on Python 2? Is that even a question that makes sense?
>> 
>> You could always try
>> 
>> embed = hub.Module(b"https://tfhub.dev/google/nnlm-en-dim50/1")
>> 
>> I'm not expecting that to make any difference but it doesn't hurt to try.
>> 
>> -ash
>> 
>>> On 19 Jun 2018, at 10:47, Andrew Maguire <andrewm4...@gmail.com> wrote:
>>> 
>>> Sure thing.
>>> 
>>> Attached is a minimal example.
>>> 
>>> The error I get is:
>>> 
>>> [2018-06-19 09:44:23,133] {cli.py:374} INFO - Running on host airflow-worker-f796f6bd-7qzwc
>>> [2018-06-19 09:44:23,577] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dev_dag.encode_posts 2018-06-19 09:15:00 [queued]>
>>> [2018-06-19 09:44:23,813] {models.py:1196} INFO - Dependencies all met for <TaskInstance: dev_dag.encode_posts 2018-06-19 09:15:00 [queued]>
>>> [2018-06-19 09:44:23,824] {models.py:1406} INFO -
>>> --------------------------------------------------------------------------------
>>> Starting attempt 1 of 6
>>> --------------------------------------------------------------------------------
>>> 
>>> [2018-06-19 09:44:24,188] {models.py:1427} INFO - Executing <Task(PythonOperator): encode_posts> on 2018-06-19 09:15:00
>>> [2018-06-19 09:44:24,222] {base_task_runner.py:115} INFO - Running: ['bash', '-c', u'airflow run dev_dag encode_posts 2018-06-19T09:15:00 --job_id 5753 --raw -sd DAGS_FOLDER/dev_dag.py']
>>> [2018-06-19 09:45:11,207] {base_task_runner.py:98} INFO - Subtask: [2018-06-19 09:45:11,164] {__init__.py:45} INFO - Using executor CeleryExecutor
>>> [2018-06-19 09:45:13,200] {base_task_runner.py:98} INFO - Subtask: [2018-06-19 09:45:13,168] {models.py:189} INFO - Filling up the DagBag from /home/airflow/gcs/dags/dev_dag.py
>>> [2018-06-19 09:45:13,222] {base_task_runner.py:98} INFO - Subtask: /usr/local/lib/python2.7/site-packages/airflow/utils/helpers.py:351: DeprecationWarning: Importing DummyOperator directly from <module 'airflow.operators' from '/usr/local/lib/python2.7/site-packages/airflow/operators/__init__.pyc'> has been deprecated. Please import from '<module 'airflow.operators' from '/usr/local/lib/python2.7/site-packages/airflow/operators/__init__.pyc'>.[operator_module]' instead. Support for direct imports will be dropped entirely in Airflow 2.0.
>>> [2018-06-19 09:45:13,224] {base_task_runner.py:98} INFO - Subtask:   DeprecationWarning)
>>> [2018-06-19 09:45:27,573] {base_task_runner.py:98} INFO - Subtask: [2018-06-19 09:45:27,571] {dev_dag.py:52} INFO - ... begin - get module from tf-hub ...
>>> [2018-06-19 09:45:28,228] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
>>> [2018-06-19 09:45:28,230] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/bin/airflow", line 27, in <module>
>>> [2018-06-19 09:45:28,232] {base_task_runner.py:98} INFO - Subtask:     args.func(args)
>>> [2018-06-19 09:45:28,232] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
>>> [2018-06-19 09:45:28,233] {base_task_runner.py:98} INFO - Subtask:     pool=args.pool,
>>> [2018-06-19 09:45:28,233] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
>>> [2018-06-19 09:45:28,234] {base_task_runner.py:98} INFO - Subtask:     result = func(*args, **kwargs)
>>> [2018-06-19 09:45:28,234] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1492, in _run_raw_task
>>> [2018-06-19 09:45:28,250] {base_task_runner.py:98} INFO - Subtask:     result = task_copy.execute(context=context)
>>> [2018-06-19 09:45:28,250] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 89, in execute
>>> [2018-06-19 09:45:28,251] {base_task_runner.py:98} INFO - Subtask:     return_value = self.execute_callable()
>>> [2018-06-19 09:45:28,251] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
>>> [2018-06-19 09:45:28,252] {base_task_runner.py:98} INFO - Subtask:     return self.python_callable(*self.op_args, **self.op_kwargs)
>>> [2018-06-19 09:45:28,252] {base_task_runner.py:98} INFO - Subtask:   File "/home/airflow/gcs/dags/dev_dag.py", line 53, in fn_encode_posts
>>> [2018-06-19 09:45:28,253] {base_task_runner.py:98} INFO - Subtask:     embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim50/1")
>>> [2018-06-19 09:45:28,253] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/module.py", line 105, in __init__
>>> [2018-06-19 09:45:28,276] {base_task_runner.py:98} INFO - Subtask:     self._spec = as_module_spec(spec)
>>> [2018-06-19 09:45:28,277] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/module.py", line 31, in as_module_spec
>>> [2018-06-19 09:45:28,278] {base_task_runner.py:98} INFO - Subtask:     return native_module.load_module_spec(spec)
>>> [2018-06-19 09:45:28,278] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/native_module.py", line 99, in load_module_spec
>>> [2018-06-19 09:45:28,280] {base_task_runner.py:98} INFO - Subtask:     path = compressed_module_resolver.get_default().get_module_path(path)
>>> [2018-06-19 09:45:28,280] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/resolver.py", line 385, in get_module_path
>>> [2018-06-19 09:45:28,295] {base_task_runner.py:98} INFO - Subtask:     return self._get_module_path(handle)
>>> [2018-06-19 09:45:28,296] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/resolver.py", line 467, in _get_module_path
>>> [2018-06-19 09:45:28,297] {base_task_runner.py:98} INFO - Subtask:     return resolver.get_module_path(handle)
>>> [2018-06-19 09:45:28,297] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/resolver.py", line 385, in get_module_path
>>> [2018-06-19 09:45:28,298] {base_task_runner.py:98} INFO - Subtask:     return self._get_module_path(handle)
>>> [2018-06-19 09:45:28,299] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/compressed_module_resolver.py", line 105, in _get_module_path
>>> [2018-06-19 09:45:28,342] {base_task_runner.py:98} INFO - Subtask:     self._lock_file_timeout_sec())
>>> [2018-06-19 09:45:28,343] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/resolver.py", line 313, in atomic_download
>>> [2018-06-19 09:45:28,343] {base_task_runner.py:98} INFO - Subtask:     download_fn(handle, tmp_dir)
>>> [2018-06-19 09:45:28,344] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/compressed_module_resolver.py", line 86, in download
>>> [2018-06-19 09:45:28,345] {base_task_runner.py:98} INFO - Subtask:     request = url.Request(_append_compressed_format_query(handle))
>>> [2018-06-19 09:45:28,345] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/tensorflow_hub/compressed_module_resolver.py", line 62, in _append_compressed_format_query
>>> [2018-06-19 09:45:28,346] {base_task_runner.py:98} INFO - Subtask:     return urlparse.urlunparse(parsed)
>>> [2018-06-19 09:45:28,346] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 387, in urlunparse
>>> [2018-06-19 09:45:28,368] {base_task_runner.py:98} INFO - Subtask:     _coerce_args(*components))
>>> [2018-06-19 09:45:28,370] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 115, in _coerce_args
>>> [2018-06-19 09:45:28,371] {base_task_runner.py:98} INFO - Subtask:     raise TypeError("Cannot mix str and non-str arguments")
>>> [2018-06-19 09:45:28,373] {base_task_runner.py:98} INFO - Subtask: TypeError: Cannot mix str and non-str arguments
>>>
>>> I'm running this on Google Cloud Composer, which is Airflow 1.9 I believe.
>>>
>>> Cheers,
>>> Andy
>>>
>>> On Tue, Jun 19, 2018 at 10:06 AM Ash Berlin-Taylor <ash_airflowl...@firemirror.com> wrote:
>>> There's nothing directly in Airflow itself that would cause this kind of issue that I can think of.
>>>
>>> It depends on what the PythonOperator you are using in the DAG does, really. Can you share that code?
>>>
>>> -ash
>>>
>>>> On 19 Jun 2018, at 10:01, Andrew Maguire <andrewm4...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> Just wondering if anyone might have a deeper insight into what, if anything, Airflow-related might be causing this issue: https://github.com/tensorflow/hub/issues/76
>>>>
>>>> When I try to load a tensorflow hub module within an Airflow operator I get the error in that issue.
>>>>
>>>> It works fine if I just run the Python script myself.
>>>>
>>>> The best I could figure out was that something Airflow was doing didn't agree with something tensorflow hub was expecting, and I'm not really sure if there is anything I can do to resolve it.
>>>>
>>>> Cheers,
>>>> Andy
>>>
>>> <dev_dag.py>
>>
>>
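P.S. If it's useful to see the failure outside Airflow entirely, the same TypeError can be provoked with just the future package under Python 2 (a rough sketch, assuming future is installed; it only demonstrates the str/bytes mixing, not which side is which in your case):

    # Python 2.7, with the "future" package installed (the backport the traceback goes through)
    from future.backports.urllib.parse import urlunparse

    # Five unicode components plus one native byte-string query is enough to trip the check.
    components = (u"https", u"tfhub.dev", u"/google/nnlm-en-dim50/1",
                  u"", b"tf-hub-format=compressed", u"")
    urlunparse(components)  # TypeError: Cannot mix str and non-str arguments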