But isn't the span uniquely identified by the task instance and attempt number? --> True, but the thing about OpenTelemetry is that the trace ID and span ID have a fixed format (a 16-byte trace ID and an 8-byte span ID). So unless we devise a way to map the task instance and attempt number onto IDs in that format (I guess we could kind of do it, technically?), those identifiers cannot be used directly. Hence the need to persist the span information somewhere, so it can be retrieved later and 'ended' when the dag run or task instance ends.
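To make the "we could kind of do it" idea a bit more concrete, here is a rough sketch of deriving IDs by hashing the identifying fields. It is not what the POC does: the helper names and the choice of SHA-256 are my own assumptions for illustration. The only facts it relies on are that OpenTelemetry uses a 128-bit trace ID and a 64-bit span ID, and that all-zero IDs are invalid.

```python
import hashlib


def derive_trace_id(dag_id: str, run_id: str) -> int:
    """Hypothetical helper: map a dag run onto a 128-bit trace ID."""
    key = "\x00".join([dag_id, run_id])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the first 16 bytes; fall back to 1 because an all-zero ID is invalid.
    return int.from_bytes(digest[:16], "big") or 1


def derive_span_id(dag_id: str, run_id: str, task_id: str, try_number: int) -> int:
    """Hypothetical helper: map a task instance attempt onto a 64-bit span ID."""
    key = "\x00".join([dag_id, run_id, task_id, str(try_number)])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the first 8 bytes; fall back to 1 because an all-zero ID is invalid.
    return int.from_bytes(digest[:8], "big") or 1


def to_hex(trace_id: int, span_id: int) -> tuple[str, str]:
    """Render the IDs as the lowercase hex strings used on the wire."""
    return format(trace_id, "032x"), format(span_id, "016x")


if __name__ == "__main__":
    # Example values only; any dag/run/task identifiers would do.
    trace_id = derive_trace_id("example_dag", "manual__2022-05-21")
    span_id = derive_span_id("example_dag", "manual__2022-05-21", "extract", 1)
    print(to_hex(trace_id, span_id))
```

With something like this, the scheduler and the worker could each recompute the same IDs without a database round trip. It would only cover the IDs, though: any other span state (start time, attributes, parent link) would still have to be rebuilt or stored somewhere, and to get these IDs onto real spans they would have to be fed into the SDK somehow, for example through a custom ID generator.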
Yes. The only problem with logging in OpenTelemetry right now is that it was the latest addition, so it is still subject to many changes and additions. This AIP does guarantee, however, that it will include the logging feature. According to the OpenTelemetry docs, logging is being designed and implemented to encompass the majority of existing logging structures and schemes, since the project understands that logging is a well-established practice.

I would be more than happy to get this AIP approved and under way so that we can address the logging part as well. We are still waiting for it to be voted on!

Howard

On Sat, May 21, 2022 at 2:22 AM Malthe <mbo...@gmail.com> wrote:

> On Wed, 18 May 2022 at 16:44, Howard Yoo <howard...@gmail.com> wrote:
> > 2. So, the reason why I ended up implementing span_json was that between
> > the scheduler who submits the tasks to be processed, and the worker that
> > needs to pick them up from the queue (which is implemented in meta database
> > of airflow) - needs to get the current span in some way. It looked like
> > every time the worker gets dagrun or task instance it does so via
> > databases, so in my POC, it was necessary to have means to persist the
> > current 'span' in the database tables. Well, dagrun and task instances do
> > not have anything related to storing spans, so had to implement some method
> > to convert the span objects into json and store them.
>
> But isn't the span uniquely identified by the task instance and attempt
> number?
>
> I would think that you've already got all the information without
> persisting any additional data.
>
> > 3. Yes, I believe the logs will be included into the scope of AIP, even
> > in the draft stage (at least that is what I hope). However, it may be
> > implemented following the initial implementation of metrics and traces.
>
> To me what is the most important and what right now would motivate me
> personally in helping out here is to get task execution logs (both
> worker-based on asynchronous, task deferreds) out using OpenTelemetry.
> That's because right now there is a bit of a broken logging story if
> you're using deferred tasks and distributed logging is really the only
> fix as far as I can tell.
>
> Cheers
>