Thanks Bolke and Feng! I seem to have a working connection with GCS, but there seems to be an error occurring in the gcs_task_handler in Airflow:
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 423, in run
    logging.shutdown()
  File "/usr/lib/python3.5/logging/__init__.py", line 1882, in shutdown
    h.close()
  File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 87, in close
    self.gcs_write(log, remote_loc)
  File "/usr/local/lib/python3.5/dist-packages/airflow/utils/log/gcs_task_handler.py", line 144, in gcs_write
    log = '\n'.join([old_log, log]) if old_log else log
UnboundLocalError: local variable 'old_log' referenced before assignment

I believe the connection is working because the tasks are getting a 404 instead of a 403 when trying to read from remote logs, but the logs aren't being written because of the above error. E.g.:

*** Unable to read remote log from gs://<mybucket>/<...>/2017-12-20T15:21:23.704614+00:00/1.log
*** <HttpError 404 when requesting https://www.googleapis.com/storage/v1/b/<mybucket>/o/<...>F2017-12-20T15%3A21%3A23.704614%2B00%3A00%2F1.log?alt=media returned "Not Found">

On Wed, Dec 20, 2017 at 1:48 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> Both will/should work; master is just cleaner and more manageable.
>
> B.
>
> Sent from my iPad
>
> > On 19 Dec 2017 at 23:44, Kevin Lam <ke...@fathomhealth.co> wrote:
> >
> > Looks like it might be related to
> > https://github.com/apache/incubator-airflow/commit/02ff8ae35dd16e6f23d29d7b24a5fb9c09d0b7a4?
> > Why isn't this fix on the v1-9 branches? Should I be using master instead?
> >
> >> On Tue, Dec 19, 2017 at 5:37 PM, Kevin Lam <ke...@fathomhealth.co> wrote:
> >>
> >> Hi Feng,
> >>
> >> Thanks for your help! Got it, will try to push on the Python-based logging
> >> config.
> >>
> >> I'm trying to set up GCS logging on airflow v1-9-stable, and my
> >> logging_config.py seems to be causing a Python import error, triggered by
> >> 'from airflow import configuration':
> >>
> >> "Initialize database...
> >> Unable to load the config, contains a configuration error.
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
> >>     self.importer(used)
> >> ImportError: No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
> >>
> >> The above exception was the direct cause of the following exception:
> >>
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python3.5/logging/config.py", line 558, in configure
> >>     handler = self.configure_handler(handlers[name])
> >>   File "/usr/lib/python3.5/logging/config.py", line 708, in configure_handler
> >>     klass = self.resolve(cname)
> >>   File "/usr/lib/python3.5/logging/config.py", line 391, in resolve
> >>     raise v
> >>   File "/usr/lib/python3.5/logging/config.py", line 384, in resolve
> >>     self.importer(used)
> >> ValueError: Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package
> >>
> >> During handling of the above exception, another exception occurred:
> >>
> >> Traceback (most recent call last):
> >>   File "/usr/local/bin/airflow", line 16, in <module>
> >>     from airflow import configuration
> >>   File "/usr/local/lib/python3.5/dist-packages/airflow/__init__.py", line 31, in <module>
> >>     from airflow import settings
> >>   File "/usr/local/lib/python3.5/dist-packages/airflow/settings.py", line 148, in <module>
> >>     configure_logging()
> >>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 75, in configure_logging
> >>     raise e
> >>   File "/usr/local/lib/python3.5/dist-packages/airflow/logging_config.py", line 70, in configure_logging
> >>     dictConfig(logging_config)
> >>   File "/usr/lib/python3.5/logging/config.py", line 795, in dictConfig
> >>     dictConfigClass(config).configure()
> >>   File "/usr/lib/python3.5/logging/config.py", line 566, in configure
> >>     '%r: %s' % (name, e))
> >> ValueError: Unable to configure handler 'console': Cannot resolve 'airflow.utils.log.logging_mixin.RedirectStdHandler': No module named 'airflow.utils.log.logging_mixin.RedirectStdHandler'; 'airflow.utils.log.logging_mixin' is not a package"
> >>
> >> Have you encountered this before?
> >>
> >> On Mon, Dec 18, 2017 at 8:53 PM, Feng Lu <fen...@google.com.invalid> wrote:
> >>
> >>> Hi Kevin,
> >>>
> >>> Kindly see my reply inline:
> >>>
> >>>> On Mon, Dec 18, 2017 at 3:28 PM, Kevin Lam <ke...@fathomhealth.co> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm trying to get Airflow to use GCS for logging purposes and had a few
> >>>> questions.
> >>>>
> >>>> We're currently using Airflow 1.9rc2, running in a Kubernetes Airflow
> >>>> deployment (similar to https://github.com/mumoshu/kube-airflow).
> >>>>
> >>>> 1/ It seems like the logging code has been going through some changes in
> >>>> recent versions. What's the correct way to set up GCS for logging? Is it
> >>>> by just specifying remote_base_log_folder and remote_log_conn_id in
> >>>> airflow.cfg? Or by following this guide,
> >>>> http://airflow.readthedocs.io/en/latest/integration.html#gcp, using the
> >>>> Python-based logging config? Is there an Airflow version that we should
> >>>> use to be most stable?
> >>>>
> >>> The Python-based logging config is the right place to make changes; in
> >>> our test setup, we override airflow_local_settings.py similarly to the
> >>> link you pasted.
> >>> You may also want to configure: [core] task_log_reader = gcs.task
> >>>
> >>>>
> >>>> 2/ Is there a way to encode the connection for GCS in a file so that one
> >>>> doesn't have to open the webserver and create it from the admin panel?
> >>>> It'd be nice if the GCS connection were created automatically.
> >>>>
> >>> Unfortunately, a GCS connection is tied to a specific GCP project, so it
> >>> is impossible to pre-populate.
> >>> Airflow 1.9 should fix the GCP connection type issue
> >>> (https://github.com/apache/incubator-airflow/commit/2f107d8a30910fd025774004d5c4c95407ed55c5),
> >>> so you can use the airflow connections CLI directly.
> >>>
> >>>>
> >>>> Thanks in advance for your help!
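The UnboundLocalError at the top of this thread is the classic pattern of a name that is only bound inside a try block. The sketch below is not Airflow's gcs_write; it is a minimal reproduction under the assumption (suggested by the 404 on read) that reading the existing remote log fails when no log has been uploaded yet and that the exception is swallowed. NotFound, read_remote_log, write_log_buggy and write_log_fixed are hypothetical names, and a plain dict stands in for the bucket.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the 404 error raised for a missing object."""


def read_remote_log(store, key):
    """Hypothetical reader: a dict plays the role of the GCS bucket."""
    if key not in store:
        # No log uploaded yet, e.g. the first try of a task.
        raise NotFound(key)
    return store[key]


def write_log_buggy(store, key, log):
    try:
        old_log = read_remote_log(store, key)
    except NotFound:
        pass  # error swallowed, so old_log is never bound
    # Raises UnboundLocalError whenever the read above failed.
    log = '\n'.join([old_log, log]) if old_log else log
    store[key] = log


def write_log_fixed(store, key, log):
    old_log = ''  # bind the name before the read that may fail
    try:
        old_log = read_remote_log(store, key)
    except NotFound:
        pass  # nothing to append to; upload the new log as-is
    log = '\n'.join([old_log, log]) if old_log else log
    store[key] = log


if __name__ == "__main__":
    bucket = {}
    try:
        write_log_buggy(bucket, "dag/task/ts/1.log", "try 1")
    except UnboundLocalError as exc:
        print("buggy:", type(exc).__name__)
    write_log_fixed(bucket, "dag/task/ts/1.log", "try 1")
    write_log_fixed(bucket, "dag/task/ts/1.log", "more output")
    print(repr(bucket["dag/task/ts/1.log"]))
```

Binding the name before the try keeps the "append if there is an old log" expression valid on both paths; the first upload simply has nothing to append to.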
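The 404 line in the thread is easier to read once the percent-encoding is undone: the GCS JSON API URL carries the object name as a single URL-encoded path segment, so "/" shows up as %2F, ":" as %3A and "+" as %2B. Decoding the visible tail with the standard urllib.parse.unquote gives back the same path shown in the gs:// message, which is consistent with Kevin's reading that authentication works and the object simply has not been written yet. The part of the name that was elided in the message is not reconstructed here.

```python
from urllib.parse import quote, unquote

# Visible tail of the object name in the 404 URL from the thread; the
# elided prefix is deliberately left out.
encoded_tail = "2017-12-20T15%3A21%3A23.704614%2B00%3A00%2F1.log"

decoded_tail = unquote(encoded_tail)
print(decoded_tail)

# Round trip: safe="" makes quote() encode "/" as well, as a single
# path segment requires.
assert quote(decoded_tail, safe="") == encoded_tail
```

Note that unquote (not unquote_plus) is the right call here, because "+" in the timestamp is a literal character and must not be turned into a space.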
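The logging_config.py error earlier in the thread comes from logging.config.dictConfig, which imports whatever dotted path is given as a handler's "class". When the module exists but lacks the attribute, the resolver retries the full dotted path as a module import, which is why the message ends in "'airflow.utils.log.logging_mixin' is not a package": the installed Airflow apparently has no RedirectStdHandler in that module. The sketch below reproduces the same failure with a standard-library module (json.decoder) and a deliberately missing class, and adds a hypothetical pre-flight helper, class_path_exists, that checks module-level class paths before the config reaches dictConfig.

```python
import importlib
import logging.config


def class_path_exists(dotted_path):
    """Hypothetical check: True if 'pkg.module.Attr' imports and has Attr."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        return False
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def try_dict_config(handler_class):
    """Configure one 'console' handler; return None or the error text."""
    config = {
        "version": 1,
        "handlers": {"console": {"class": handler_class}},
        "root": {"handlers": ["console"]},
    }
    try:
        logging.config.dictConfig(config)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    missing = "json.decoder.NoSuchHandler"  # module exists, class does not
    print(class_path_exists(missing))
    print(try_dict_config(missing))
    print(try_dict_config("logging.StreamHandler"))
```

The exact wording of the ValueError differs between Python versions, but in each case it names the failing handler ('console'), just as in the traceback above.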
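Feng's suggestion above (override the Python-based logging config and set [core] task_log_reader = gcs.task) could look roughly like the following. It is a sketch modeled on the GCS example in the Airflow 1.9 integration docs Kevin linked, not a verified configuration. The GCSTaskHandler keyword names (base_log_folder, gcs_log_folder, filename_template), the gs://my-bucket/logs path, the log format and the module name log_config are assumptions to check against the gcs_task_handler.py in your installed version. The console handler deliberately uses the stdlib logging.StreamHandler, so the config does not depend on RedirectStdHandler existing. Airflow would then be pointed at it with [core] logging_config_class (e.g. log_config.LOGGING_CONFIG if this file is saved as a hypothetical log_config.py on PYTHONPATH).

```python
import os

# Assumed values: adjust to your deployment.
BASE_LOG_FOLDER = os.path.expanduser("~/airflow/logs")
GCS_LOG_FOLDER = "gs://my-bucket/logs"  # hypothetical bucket
FILENAME_TEMPLATE = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"
LOG_FORMAT = "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow.task": {"format": LOG_FORMAT},
    },
    "handlers": {
        # Plain stdlib handler: no dependency on RedirectStdHandler.
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "airflow.task",
            "stream": "ext://sys.stdout",
        },
        # Handler name must match [core] task_log_reader = gcs.task
        "gcs.task": {
            "class": "airflow.utils.log.gcs_task_handler.GCSTaskHandler",
            "formatter": "airflow.task",
            # Keyword names assumed from the 1.9 docs example; verify locally.
            "base_log_folder": BASE_LOG_FOLDER,
            "gcs_log_folder": GCS_LOG_FOLDER,
            "filename_template": FILENAME_TEMPLATE,
        },
    },
    "loggers": {
        "airflow.task": {
            "handlers": ["gcs.task"],
            "level": "INFO",
            "propagate": False,
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}
```

The GCS connection itself is still configured separately, as discussed in the thread; this dict only decides which handler writes and reads task logs.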