This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-website.git

The following commit(s) were added to refs/heads/master by this push:
     new 136a730af7b Add CN online session s01e02 (#709)
136a730af7b is described below

commit 136a730af7b71f0246626aa67f0714b471e0f7af
Author: weixiang1862 <652048...@qq.com>
AuthorDate: Tue May 14 23:39:05 2024 +0800

    Add CN online session s01e02 (#709)

    Co-authored-by: weixiang <weixiang1...@gmail.com>
---
 .../index.md                 | 132 +++++++++++++++++++++
 .../log-dashboard.jpg        | Bin 0 -> 169847 bytes
 .../log-metrics-alerting.jpg | Bin 0 -> 41988 bytes
 .../log-metrics.jpg          | Bin 0 -> 136861 bytes
 4 files changed, 132 insertions(+)

diff --git a/content/zh/2024-05-09-skywalking-in-practice-s01e02/index.md b/content/zh/2024-05-09-skywalking-in-practice-s01e02/index.md
new file mode 100644
index 00000000000..b540cd2790f
--- /dev/null
+++ b/content/zh/2024-05-09-skywalking-in-practice-s01e02/index.md
@@ -0,0 +1,132 @@
+---
+title: "SkyWalking from Beginner to Expert - 2024 Online Sharing Series (Session 2)"
+date: 2024-05-09
+author: 魏翔
+description: Introduction to SkyWalking LAL (Log Analysis Language) syntax, a hands-on log analysis demo, and a source-code walkthrough of the log-analyzer module
+---
+
+This live session is the second in the sharing series jointly hosted by the Apache SkyWalking community and 纵目. In it, 魏翔 introduces SkyWalking LAL (Log Analysis Language), covering the following topics:
+
+- SkyWalking LAL (Log Analysis Language) syntax
+- A hands-on lab: using LAL to monitor exceptions in service logs
+- A source-code walkthrough of the OAP log-analyzer module
+
+[Video on Bilibili](https://www.bilibili.com/video/BV1Ti421C7b6)
+
+The lab touches on several scattered pieces of configuration, so the steps are collected below to make the results easier to reproduce:
+
+# 1. Ingest service logs into SkyWalking
+First, start the demo service and simulate failures with a scheduled task that writes each exception to the log. The method below runs every second (a fixed delay of 1000 ms) and, because it divides by zero, throws `java.lang.ArithmeticException: / by zero` on every run:
+```java
+@Scheduled(fixedDelay = 1000)
+public void mockException() throws Exception {
+    int i = 1 / 0; // integer division by zero always throws ArithmeticException
+}
+```
+```text
+2024-04-22 23:03:54 SW_CTX:[gateway,3a96549cb6474607be27e3ce481c2629@198.18.0.1,N/A,N/A,-1] [scheduling-1] ERROR [org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler:95] - Unexpected error occurred in scheduled task
+java.lang.ArithmeticException: / by zero
+    at com.test.ConsumerApplication.mockException(ConsumerApplication.java:47)
+    at sun.reflect.GeneratedMethodAccessor195.invoke(Unknown Source)
+    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+    at java.lang.reflect.Method.invoke(Method.java:498)
+    at org.springframework.scheduling.support.ScheduledMethodRunnable.$sw$original$run$c8tpsq2(ScheduledMethodRunnable.java:84)
+    at org.springframework.scheduling.support.ScheduledMethodRunnable.$sw$original$run$c8tpsq2$accessor$$sw$p2boiv3(ScheduledMethodRunnable.java)
+    at org.springframework.scheduling.support.ScheduledMethodRunnable$$sw$auxiliary$k466ps2.call(Unknown Source)
+    at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
+    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java)
+```
+Next, add the SkyWalking agent options to the service's startup parameters and ship its logs to SkyWalking. The demo uses logback, so add the following dependency to pom.xml:
+```xml
+<dependency>
+    <groupId>org.apache.skywalking</groupId>
+    <artifactId>apm-toolkit-logback-1.x</artifactId>
+    <version>${version}</version>
+</dependency>
+```
+Then add the SkyWalking gRPC appender to logback.xml:
+```xml
+<appender name="grpc-log" class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.log.GRPCLogClientAppender">
+    <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
+        <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
+            <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n</Pattern>
+        </layout>
+    </encoder>
+</appender>
+```
+Start the SkyWalking OAP server. If everything is working, the logs reported by the demo service appear in the SkyWalking log panel:
+
+![log-dashboard](./log-dashboard.jpg)
+
+# 2. Configure LAL to parse the reported logs and extract metrics
+By default, SkyWalking only stores the raw log data and does no further processing or analysis. Edit `config/lal/default.yaml`:
+```yaml
+rules:
+  - name: default
+    layer: GENERAL
+    dsl: |
+      filter {
+        text {
+          abortOnFailure false
+          regexp $/(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[.+] \[.+] (?<level>\w+) (?<msg>.*)/$
+        }
+
+        extractor {
+          tag level: parsed.level
+          timestamp parsed.time as String, "yyyy-MM-dd HH:mm:ss.SSS"
+
+          if (parsed.level == "ERROR") {
+            metrics {
+              timestamp log.timestamp as Long
+              labels service: log.service, service_instance_id: log.serviceInstance
+              name "log_exception_count"
+              value 1
+            }
+          }
+        }
+
+        sink {
+        }
+      }
+```
+In the DSL above, the text regexp parser first parses the log body and extracts fields such as the timestamp and the log level. Adjust the regexp to fit your own log format (if your logs are JSON, you can also try the [json parser](https://skywalking.apache.org/docs/main/next/en/concepts-and-designs/lal/#json)).
+
+The extractor then takes the additional tag and the timestamp from the regexp result and checks the level. When the level is ERROR, it generates a metric named `log_exception_count` with value 1, labels it with service and service_instance_id, and hands it to the SkyWalking meter system for further processing.
+
+# 3. Define a log-mal rule to further analyze the metrics extracted by LAL
+The previous step defined the log parsing rule and extracted the `log_exception_count` metric. Next, define the metric analysis rule by creating `config/log-mal-rules/default.yaml`:
+```yaml
+metricPrefix: instance
+metricsRules:
+  - name: log_exception_count
+    exp: log_exception_count.sum(['service','service_instance_id']).downsampling(SUM).instance(['service'], ['service_instance_id'], Layer.GENERAL)
+```
+The MAL rule above sets the downsampling function to SUM, which gives the total number of errors per minute. Because this is a newly created file, remember to register it in `config/application.yml`:
+```yaml
+log-analyzer:
+  selector: ${SW_LOG_ANALYZER:default}
+  default:
+    lalFiles: ${SW_LOG_LAL_FILES:envoy-als,mesh-dp,mysql-slowsql,pgsql-slowsql,redis-slowsql,k8s-service,nginx,default}
+    malFiles: ${SW_LOG_MAL_FILES:"nginx,default"}
+```
+Finally, open skywalking-ui, add the metric `instance_log_exception_count` to a dashboard, and verify that the values are correct:
+
+![log-metrics](./log-metrics.jpg)
+
+# 4. Configure the alarm rule
+With the metric data in place, add a matching alarm rule to `config/alarm-settings.yml`. This rule raises an alarm when the number of log exceptions within one minute exceeds 5:
+```yaml
+instance_error_log_rule:
+  expression: sum(instance_log_exception_count > 5) >= 1
+  period: 1
+  tags:
+    level: WARNING
+```
+After the rule is configured, wait about 1 minute and the corresponding alarm appears in the alarm records panel:
+
+![log-metrics-alerting](./log-metrics-alerting.jpg)
+
+---
+*P.S. If you would like to join the live sessions, follow the upcoming schedule and reserve a seat on our Bilibili live channel.*
+
+![schedule](../2024-04-26-skywalking-in-practice-s01e01/schedule.png)
diff --git a/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-dashboard.jpg b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-dashboard.jpg
new file mode 100644
index 00000000000..4523454a5a4
Binary files /dev/null and b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-dashboard.jpg differ
diff --git a/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics-alerting.jpg b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics-alerting.jpg
new file mode 100644
index 00000000000..57af92c45e7
Binary files /dev/null and b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics-alerting.jpg differ
diff --git a/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics.jpg b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics.jpg
new file mode 100644
index 00000000000..a3257e884f7
Binary files /dev/null and b/content/zh/2024-05-09-skywalking-in-practice-s01e02/log-metrics.jpg differ
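
Note on step 2 of the post in this commit: the LAL `regexp` only yields a `level` if it matches the layout produced by the logback `Pattern` from step 1, so it can help to try the expression outside OAP first. The following is a minimal Java sketch, not OAP code. The class `LalRegexCheck`, the helper `levelOf`, and the sample log lines are hypothetical, and it assumes the parser applies the expression with find-style matching, which is an assumption about OAP rather than something the post states.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying the LAL regexp locally; not part of SkyWalking.
final class LalRegexCheck {
    // Same expression as the `regexp` line in the LAL rule, with Java string escaping.
    static final Pattern LOG_LINE = Pattern.compile(
        "(?<time>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) "
            + "\\[.+] \\[.+] (?<level>\\w+) (?<msg>.*)");

    /** Returns the captured log level, or null when the line does not match. */
    static String levelOf(String line) {
        Matcher m = LOG_LINE.matcher(line);
        // Assumption: find-style matching, so trailing stack-trace lines do not matter.
        return m.find() ? m.group("level") : null;
    }

    public static void main(String[] args) {
        // Hypothetical lines shaped like the logback Pattern from step 1.
        String[] samples = {
            "2024-04-22 23:03:54.123 [TID:N/A] [scheduling-1] ERROR c.t.ConsumerApplication -Unexpected error occurred in scheduled task",
            "2024-04-22 23:03:55.001 [TID:N/A] [main] INFO  c.t.ConsumerApplication -started",
            "java.lang.ArithmeticException: / by zero"
        };
        int errors = 0; // mirrors what log_exception_count would accumulate
        for (String line : samples) {
            String level = levelOf(line);
            if ("ERROR".equals(level)) {
                errors++;
            }
            System.out.println(level + " <- " + line);
        }
        System.out.println("ERROR lines: " + errors);
    }
}
```

A line that comes back as `null` here would presumably also fail to parse in OAP, in which case the `if (parsed.level == "ERROR")` branch never fires and no `log_exception_count` sample is produced.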