[ 
https://issues.apache.org/jira/browse/KYLIN-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786281#comment-17786281
 ] 

Hongrong Cao edited comment on KYLIN-5706 at 11/15/23 10:02 AM:
----------------------------------------------------------------

background

In the current code, there are many scenarios where a cmd needs to be spliced, 
and then executed by ProcessBuilder. The parameters of the spliced cmd may come 
from the interface, and the lack of checking the legitimacy of the parameter 
makes it possible to be maliciously attacked.

When splicing spark commands, the checkCommandInjection method is used to avoid 
injection attacks, but it only avoids injection attacks caused by backquotes 
and $(), such as `rm -rf /` $(rm -rf /), but not other scenarios, such as cat 
nohup.out2 && echo success || echo failed echo failed

 

dev design

Option 1

The org.apache.commons.text.StringEscapeUtils class provides the escapeXSI 
method for escaping special characters on command line parameters

Unify in org.apache.kylin.common.util.CliCommandExecutor#execute method, before 
executing the command, cut the command by space, and then just escape each part 
in turn.

Option 1 is not feasible, in our code logic will also splice some complex 
commands, in the execution of the command is difficult to distinguish whether 
the command is the client injected.

Option 2 (Adoption)

Check the parameters when splicing cmd commands, including the following four 
scenarios:

When typing the diagnostic package, the parameters of the diag.sh script will 
be spliced, such as project, jobId, path, etc., and each parameter will be 
checked in turn, and conforming to ^[a-zA-Z0-9_. /-]+$ is enough
When exporting influxDB data, it will splice the database address and database 
name as the parameter of influx command, the former meets 
[a-zA-Z0-9._-](:[0-9])? and ^[0-9a-zA-Z_-]+$ for the latter.
When fetching yarn's stats, the url of yarn is spliced as an argument to the 
curl command, conforming to ^(http(s)? ://)? [a-zA-Z0-9._-](:[0-9])? 
(/[a-zA-Z0-9._-]+)*/? $ That's it!
When executing the beeline command, the beeline-params in the configuration 
will be spliced into the command. The composition of the beeline-params is more 
complicated, forcing each parameter value to be converted to a string by 
wrapping it with ', such as abc → 'abc', ab'c → 'ab'\''c'

Option 3

The current CliCommandExecutor's execute method accepts a command string and 
then executes it directly, which is not reasonable because it lacks 
verification of the command arguments.

You can do a refactoring, CliCommandExecutor's execute accepts a CMD object, 
the CMD object encapsulates a number of commands to be executed, the parameters 
of each command, the validation method of the parameters, the parameter 
escaping method, etc. The CMD object can be used as an interface class, the 
specific validation method, the escaping method is implemented by the caller of 
the code.


was (Author: JIRAUSER303052):
*background*

在当前代码中,有众多场景需要拼接出一条 cmd, 然后通过 {{ProcessBuilder}} 
来执行,拼接cmd的参数可能会来自接口,并且缺少参数合法性的检验,有被恶意攻击的可能。

当拼接 spark 命令时,使用了 {{checkCommandInjection}} 方法来避免注入攻击,但是该方法仅规避了 反引号 和 $() 
导致的注入攻击,如 {{`rm -rf /`}} {{{}$(rm -rf /){}}},无法规避其他场景,如 {{cat nohup.out2 && 
echo success || echo failed}}

 

*dev design*

*方案一*

org.apache.commons.text.StringEscapeUtils 类中提供了escapeXSI 方法,用于对命令行参数做特殊字符的转义

统一在org.apache.kylin.common.util.CliCommandExecutor#execute 
方法中,执行命令之前,对命令按空格切割,然后对每一部分依次进行转义即可。

方案一不可行,在我们的代码逻辑里也会拼接一些复杂的命令,在执行时难以区分命令是否是客户注入的。

*方案二(采用)*

在拼接cmd命令时对参数进行检查,包括以下四种场景:
 # 打诊断包时,会拼接 diag.sh 脚本的参数,如项目、jobId、路径等,依次检查每一个参数,符合 {{^[a-zA-Z0-9_./-]+$}} 即可

 # 导出influxDB 数据时,会在命令里拼接 *数据库地址* 以及 {*}数据库名称{*}作为 influx命令的参数,前者符合 
{{[a-zA-Z0-9._-]+(:[0-9]+)?}} 即可,后者符合{{{}^[0-9a-zA-Z_-]+${}}} 即可

 # 获取yarn的统计指标时,会拼接yarn 的url地址作为 curl 命令的参数,符合 
{{^(http(s)?://)?[a-zA-Z0-9._-]+(:[0-9]+)?(/[a-zA-Z0-9._-]+)*/?$}} 即可

 # 执行 beeline 命令时,会将配置中的 beeline-params 
拼接到命令中,beeline-params的构成较为复杂,强制将每一个参数值使用{{{}'{}}}包起来转为字符串,如 abc → ‘abc',ab’c → 
‘ab’\''c'

*方案三*

当前 {{CliCommandExecutor}} 的 {{{}execute{}}}方法接受命令字符串然后直接执行,缺少对命令参数的检验,不太合理。

可以做一次重构,{{{}CliCommandExecutor{}}}的{{{}execute{}}}接受一个CMD对象,CMD对象内封装要执行的若干条命令、每条命令的参数、参数的验证方法、参数的转义方法等,CMD对象可以作为一个接口类,具体的验证方法、转义方法由代码的调用方来实现。

 

代码中执行shell命令的场景汇总:
 
||*序号*||*功能描述*||*代码位置*||*风险*||*是否需要修复*||
|1|执行推荐任务|{{ForkBasedJobRunner#doExecute}}|无|否|
|2|打诊断包|{{SystemService#dumpLocalDiagPackage}}|用户可通过构造jobId或queryId 来实现命令行注入|是|
|3|执行异步查询|{{AsyncQueryJob#runSparkSubmit}}|无|否|
|4|提交spark任务|{{DefaultSparkBuildJobHandler#runSparkSubmit}}|无|否|
|5|获取hostname|{{LicenseInfoService#gatherEnv}}|无|否|
|6|执行脚步任务|{{ShellExecutable#doWork}}|无 (疑似废弃代码)|否|
|7|杀远程进程|{{NExecutableManager#killRemoteProcess}}|无|否|
|8|导入SSB数据源|{{TableService#importSSBDataBase}}|无|否|
|9|清理临时表|{{SparkCleanupTransactionalTableStep#doExecuteCliCommand}}|代码会读取参数 
{{kylin.source.hive.beeline-params}} 并拼接到命令中,若该参数含有恶意代码,则会导致恶意代码执行|是|
|10|生成临时表|{{HiveTransactionTableHelper#generateTxTable}}|与序号9一致|是|
|11|获取流式任务进程ID|{{JobKiller#grepProcess}}|无|否|
|12|杀流式任务进程|{{JobKiller#doKillProcess}}|无|否|
|13|元数据导出工具复制文件|{{AbstractInfoExtractorTool#addFile}}|无|否|
|14|元数据导出工具获取系统信息(linux版本、内存、磁盘、hadoop版本 
等)|{{AbstractInfoExtractorTool#addShellOutput}}|无|否|
|15|元数据导出工具获取环境变量及KE目录结构|{{ClientEnvTool#extractInfoByCmd}}|无|否|
|16|导出 influxDB数据|{{InfluxDBTool#dumpInfluxDB}}|influxDB的host 和 database 
参数均是从配置中读取的,若该配置含有恶意代码,则会导致恶意代码执行|是|
|17|获取yarn日志|{{YarnApplicationTool#extractYarnLogs}}|无|否|
|18|KG高可用|{{KapGuardianHATask#run}}|无|否|
|19|获取GC时间|{{FullGCDurationChecker#getGCTime}}|无|否|
|20|KE状态检查|{{KEProcessChecker#doCheck}}|无|否|
|21|重启KE|{{RestartStateHandler#doHandle}}|无|否|
|22|垃圾清理工具清理临时表|{{ProjectTemporaryTableCleaner#doExecuteCmd}}|与序号9一致|是|
|23|获取yarn的统计指标|{{KapGetClusterInfo#getYarnMetrics}}|代码会读取参数 
{{kylin.job.yarn-app-rest-check-status-url}} 并拼接到命令中,若该配置含有恶意代码,则会导致恶意代码执行|是|
|24|检测是否是数据权限分离的版本|{{UpdateUserAclTool#isDataPermissionSeparateVersion}}|无|否|
|25|打JStack|{{ToolUtil#dumpKylinJStack}}|无|否|

> Fix the command line injection vulnerability when generating diagnostic 
> packages through scripts
> ------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5706
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5706
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: 5.0-beta
>            Reporter: Hongrong Cao
>            Assignee: Zhiting Guo
>            Priority: Major
>             Fix For: 5.0.0
>
>
> The diagnostic package will call the command line to execute the shell script 
> through java, and kylin does not escape the input from the user, and directly 
> splices it into the command line, resulting in command line injection.
> Therefore, we need to escape the user input that will be spliced into the cmd.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to