Re: logging baby step -- worth pursuing?
Tyson Norris wrote on 08/15/2018 08:29:48 PM: > > FWIW This won’t help with concurrent activations since the logs from > concurrent activations will be interleaved (I think Dave was not > suggesting to use this for concurrent activations). It will only > help in the case where log processing is done outside of the > invoker, and logs are not interleaved from multiple activations. Agreed. This is the case I'm attempting to optimize for on Kubernetes. Kube clusters typically already have an external logging service deployed and the current KubernetesContainerFactory log processing by the Invoker is a major performance bottleneck. > I’m not sure having a start sentinel is simpler than just including > the activation id in the existing sentinel line (end of log segment, > not the beginning), but it would be probably simpler to read for a human. I think it makes offline processing to add the activationId to every logline more efficient because you will already have the activationId in hand when you read each "real" log line. > If people use blackbox actions, and if blackbox containers have > different log collection than managed actions, I think that would be > a reason to not do anything until there is better support for > structured logging, since if you are still using invoker to collect > blackbox logs, you might as well use it to collect all logs? It may > be that majority log collection is not blackbox so you could get > some efficiencies there, but the added mess of multiple log > collection approaches may bring different problems (my logs behave > different for different types of actions, etc). On large deployments, we segregate blackbox/non-blackbox actions to different invokers. So if all the non-blackbox runtimes are updated to have start sentinels, those invokers can be cut out of the log processing entirely. It's a slightly more complex deployment configuration, but I think it will be worth it (at least on Kubernetes). > > One option might be to allow the /init endpoint to return some > details about the container image, so that it can hint how it > expects logs to be handled (if at all) at the invoker - currently / > init response is only interpreted in case of a non-200 response. > This same approach may be useful for other optional facilities like > support of concurrency or gpu, where the container can signal it’s > support and fail early if there is a mismatch with the action being > executed. This would not resolve the different behavior problem, but > would provide a smooth transition for older blackbox images. > An interesting idea. --dave
Re: logging baby step -- worth pursuing?
This was a pretty simple change, so to make things concrete I have PRs with a prototype of the enabling change in the invoker [1] and a change to the nodejs runtime to emit the start sentinels [2]. If we go ahead with this design, here's an example from an action that writes one line to stdout and no lines to stderr: stdout stream: XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id cafca5b74be94eb8bca5b74be9beb80f for namespace guest Here's a friendly message XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX stderr stream: XXX_THE_START_OF_A_WHISK_ACTIVATION_XXX with id cafca5b74be94eb8bca5b74be9beb80f for namespace guest XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX It should be really straightforward to write a streaming agent that reads the json formatted logstreams and uses the START sentinels to keep track of the activationId that should be injected into each logline and the namespace to associate that logline with to the platform logging service. Arguably the namespace information is redundant since all activations that run in the container belong to the same namespace, but it seemed like including it could make the processing marginally simpler. Pausing for feedback before doing any more of the runtimes... --dave [1] https://github.com/apache/incubator-openwhisk/pull/3974 [2] https://github.com/apache/incubator-openwhisk-runtime-nodejs/pull/81
Re: logging baby step -- worth pursuing?
Hi - FWIW This won’t help with concurrent activations since the logs from concurrent activations will be interleaved (I think Dave was not suggesting to use this for concurrent activations). It will only help in the case where log processing is done outside of the invoker, and logs are not interleaved from multiple activations. I’m not sure having a start sentinel is simpler than just including the activation id in the existing sentinel line (end of log segment, not the beginning), but it would be probably simpler to read for a human. If people use blackbox actions, and if blackbox containers have different log collection than managed actions, I think that would be a reason to not do anything until there is better support for structured logging, since if you are still using invoker to collect blackbox logs, you might as well use it to collect all logs? It may be that majority log collection is not blackbox so you could get some efficiencies there, but the added mess of multiple log collection approaches may bring different problems (my logs behave different for different types of actions, etc). One option might be to allow the /init endpoint to return some details about the container image, so that it can hint how it expects logs to be handled (if at all) at the invoker - currently /init response is only interpreted in case of a non-200 response. This same approach may be useful for other optional facilities like support of concurrency or gpu, where the container can signal it’s support and fail early if there is a mismatch with the action being executed. This would not resolve the different behavior problem, but would provide a smooth transition for older blackbox images. Thanks Tyson > On Aug 14, 2018, at 2:49 PM, Dragos Dascalita Haut > wrote: > > "...we should be able to fully > process the logs offline and in a streaming manner and get the needed > activation id injected into every logline..." > > > +1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going > down this path as well. Having this natively supported by all OpenWhisk > runtimes can only make things easier. > > > From: David P Grove > Sent: Tuesday, August 14, 2018 2:29:12 PM > To: dev@openwhisk.apache.org > Subject: logging baby step -- worth pursuing? > > > > Even if we think structured logging is the right eventual goal, it could > take a while to get there (especially since it is changing functionality > users may have grown accustomed to). > > However, for non-concurrent, non-blackbox runtimes we could make a small, > not-user visible change, that could enable fully offline and streaming log > processing. We already generate an end-of-log sentinel to stdout/stderr > for these runtimes. If we also generated a start-of-log sentinel to > stdout/stderr that included the activation id, we should be able to fully > process the logs offline and in a streaming manner and get the needed > activation id injected into every logline. > > Is this worth pursuing? I'm motivated to get log processing out of the > Invoker/ContainerRouter so we can push ahead with some of the scheduler > redesignwithout tackling logging, I don't think we'll be able to assess > the true scalability potential of the new scheduling architectures. > > --dave
Re: logging baby step -- worth pursuing?
"...we should be able to fully process the logs offline and in a streaming manner and get the needed activation id injected into every logline..." +1 IIRC for concurrent activations Tyson Norris and Dan McWeeney were going down this path as well. Having this natively supported by all OpenWhisk runtimes can only make things easier. From: David P Grove Sent: Tuesday, August 14, 2018 2:29:12 PM To: dev@openwhisk.apache.org Subject: logging baby step -- worth pursuing? Even if we think structured logging is the right eventual goal, it could take a while to get there (especially since it is changing functionality users may have grown accustomed to). However, for non-concurrent, non-blackbox runtimes we could make a small, not-user visible change, that could enable fully offline and streaming log processing. We already generate an end-of-log sentinel to stdout/stderr for these runtimes. If we also generated a start-of-log sentinel to stdout/stderr that included the activation id, we should be able to fully process the logs offline and in a streaming manner and get the needed activation id injected into every logline. Is this worth pursuing? I'm motivated to get log processing out of the Invoker/ContainerRouter so we can push ahead with some of the scheduler redesignwithout tackling logging, I don't think we'll be able to assess the true scalability potential of the new scheduling architectures. --dave
logging baby step -- worth pursuing?
Even if we think structured logging is the right eventual goal, it could take a while to get there (especially since it is changing functionality users may have grown accustomed to). However, for non-concurrent, non-blackbox runtimes we could make a small, not-user visible change, that could enable fully offline and streaming log processing. We already generate an end-of-log sentinel to stdout/stderr for these runtimes. If we also generated a start-of-log sentinel to stdout/stderr that included the activation id, we should be able to fully process the logs offline and in a streaming manner and get the needed activation id injected into every logline. Is this worth pursuing? I'm motivated to get log processing out of the Invoker/ContainerRouter so we can push ahead with some of the scheduler redesignwithout tackling logging, I don't think we'll be able to assess the true scalability potential of the new scheduling architectures. --dave