I agree! :-) Looking forward to the Airflow Summit next week!

Howard

On Sun, May 22, 2022 at 10:29 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> I think once we get some of the people who commented (Elad/Malthe) to
> confirm that their comments were addressed, and maybe some other voices
> of support, it could be ready for a Voting attempt actually :). I'd
> wait with it, however, till after the Summit (like with a few other
> discussions we are having now).
>
> J
>
> On Sun, May 22, 2022 at 4:53 PM Howard Yoo <howard...@gmail.com> wrote:
> >
> > But isn't the span uniquely identified by the task instance and
> > attempt number?
> > --> True, but the thing about OpenTelemetry is that span ID and trace
> > ID are in UUID format (and have to be), so unless we devise a way to
> > uniquely convert the task instance and attempt number into UUID
> > format (I guess we could kind of do it, technically?), those
> > identifiers cannot be used directly, hence the need to persist the
> > span information somewhere for later retrieval and for 'ending' the
> > span when the dag run or task instance ends.
> >
> > Yes, the only problem with the logging part of OpenTelemetry right
> > now is that logging was the latest addition to it, and thus will be
> > subject to many changes and additions. This AIP does guarantee,
> > however, that it will include the logging feature, and according to
> > the opentelemetry docs, logging will be designed and implemented in
> > such a way that it will try to encompass the majority of existing
> > logging structures and schemes, since the project understands that
> > logging is a well-established practice.
> >
> > Would be more than happy to get this AIP approved and going, and to
> > address the logging part also. We are still waiting for it to get
> > voted on!
> >
> > Howard
> >
> > On Sat, May 21, 2022 at 2:22 AM Malthe <mbo...@gmail.com> wrote:
> >>
> >> On Wed, 18 May 2022 at 16:44, Howard Yoo <howard...@gmail.com> wrote:
> >> > 2. So, the reason why I ended up implementing span_json was that
> >> > the worker, which picks up the tasks submitted by the scheduler
> >> > from the queue (which is implemented in the meta database of
> >> > airflow), needs to get the current span in some way. It looked
> >> > like every time the worker gets a dagrun or task instance it does
> >> > so via the database, so in my POC it was necessary to have a means
> >> > to persist the current 'span' in the database tables. Well, dagrun
> >> > and task instances do not have anything related to storing spans,
> >> > so I had to implement some method to convert the span objects into
> >> > json and store them.
> >>
> >> But isn't the span uniquely identified by the task instance and
> >> attempt number?
> >>
> >> I would think that you've already got all the information without
> >> persisting any additional data.
> >>
> >> > 3. Yes, I believe the logs will be included in the scope of the
> >> > AIP, even in the draft stage (at least that is what I hope).
> >> > However, they may be implemented following the initial
> >> > implementation of metrics and traces.
> >>
> >> To me what is the most important, and what right now would motivate
> >> me personally in helping out here, is to get task execution logs
> >> (both worker-based and asynchronous, i.e. deferred tasks) out using
> >> OpenTelemetry. That's because right now there is a bit of a broken
> >> logging story if you're using deferred tasks, and distributed
> >> logging is really the only fix as far as I can tell.
> >>
> >> Cheers
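
The two approaches discussed in this thread (deriving span identity from the task instance and attempt number, versus persisting span data as JSON) can be compared with a small sketch. It rests on several assumptions. The OpenTelemetry specification defines a trace ID as 16 bytes and a span ID as 8 bytes, so the sketch hashes the identifying fields down to those sizes rather than building UUIDs. `SpanRef`, `span_ref_for`, `to_json` and `from_json` are hypothetical names, not part of Airflow or the OpenTelemetry SDK, and the choice of SHA-256 and of the key fields (`dag_id`, `run_id`, `task_id`, `try_number`) is illustrative only. How such IDs would be handed to a real tracer is not shown.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpanRef:
    """Hypothetical stand-in for the identifying part of a span context."""

    trace_id: str  # 32 hex characters (16 bytes)
    span_id: str   # 16 hex characters (8 bytes)


def _digest(*parts: str) -> bytes:
    # Join with a control character as separator so that field boundaries
    # stay unambiguous, then hash the result.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).digest()


def span_ref_for(dag_id: str, run_id: str, task_id: str, try_number: int) -> SpanRef:
    """Derive deterministic IDs: one trace per dag run, one span per attempt."""
    trace = _digest("trace", dag_id, run_id)[:16]
    span = _digest("span", dag_id, run_id, task_id, str(try_number))[:8]
    return SpanRef(trace_id=trace.hex(), span_id=span.hex())


def to_json(ref: SpanRef) -> str:
    """Serialize the reference, e.g. for storing in a database column."""
    return json.dumps(asdict(ref), sort_keys=True)


def from_json(payload: str) -> SpanRef:
    """Restore a reference previously written by to_json."""
    return SpanRef(**json.loads(payload))


if __name__ == "__main__":
    first = span_ref_for("example_dag", "manual__2022-05-22", "extract", 1)
    retry = span_ref_for("example_dag", "manual__2022-05-22", "extract", 2)
    print(to_json(first))
    print(first.trace_id == retry.trace_id, first.span_id == retry.span_id)
```

With deterministic derivation, the scheduler and the worker can each recompute the same IDs from data they already have, so only span attributes such as the start time would still need to be stored. The trade-off is that a truncated hash is not guaranteed to be collision-free, which persisting generated IDs avoids.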