[ https://issues.apache.org/jira/browse/KYLIN-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756692#comment-17756692 ]

Yaguang Jia edited comment on KYLIN-5700 at 8/21/23 6:26 AM:
-------------------------------------------------------------

Summary of the places in the code where shell commands are executed:
||*Index*||*Function Description*||*Code Location*||*Risk*||*Fix Needed*||
|2|Generate diagnostic package|{{SystemService#dumpLocalDiagPackage}}|A user can achieve command injection by crafting the jobId or queryId (see the sketch below this table)|Yes|
|3|Execute asynchronous query|{{AsyncQueryJob#runSparkSubmit}}|None|No|
|4|Submit Spark job|{{DefaultSparkBuildJobHandler#runSparkSubmit}}|None|No|
|5|Get hostname|{{LicenseInfoService#gatherEnv}}|None|No|
|6|Execute script task|{{ShellExecutable#doWork}}|None (possibly deprecated code)|No|
|7|Kill remote process|{{NExecutableManager#killRemoteProcess}}|None|No|
|8|Import SSB data source|{{TableService#importSSBDataBase}}|None|No|
|9|Clean up temporary tables|{{SparkCleanupTransactionalTableStep#doExecuteCliCommand}}|The code reads the {{kylin.source.hive.beeline-params}} parameter and concatenates it into the command; if the parameter contains malicious code, that code is executed|Yes|
|10|Generate temporary tables|{{HiveTransactionTableHelper#generateTxTable}}|Same as item 9|Yes|
|13|Copy files in the metadata export tool|{{AbstractInfoExtractorTool#addFile}}|None|No|
|14|Get system information in the metadata export tool (Linux version, memory, disk, Hadoop version, etc.)|{{AbstractInfoExtractorTool#addShellOutput}}|None|No|
|15|Get environment variables and the Kylin directory structure in the metadata export tool|{{ClientEnvTool#extractInfoByCmd}}|None|No|
|16|Export InfluxDB data|{{InfluxDBTool#dumpInfluxDB}}|The InfluxDB host and database parameters are read from the configuration; if the configuration contains malicious code, that code is executed|Yes|
|17|Get YARN logs|{{YarnApplicationTool#extractYarnLogs}}|None|No|
|19|Get GC time|{{FullGCDurationChecker#getGCTime}}|None|No|
|21|Restart Kylin|{{RestartStateHandler#doHandle}}|None|No|
|22|Clean up temporary tables with the garbage cleaner|{{ProjectTemporaryTableCleaner#doExecuteCmd}}|Same as item 9|Yes|
|23|Get YARN statistics|{{KapGetClusterInfo#getYarnMetrics}}|The code reads the {{kylin.job.yarn-app-rest-check-status-url}} parameter and concatenates it into the command; if the parameter contains malicious code, that code is executed|Yes|
|24|Check whether this is a data-permission-separation version|{{UpdateUserAclTool#isDataPermissionSeparateVersion}}|None|No|
|25|Dump JStack|{{ToolUtil#dumpKylinJStack}}|None|No|
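
Item 2 is the vulnerability tracked by this issue. Below is a minimal sketch of the kind of allow-list check described in the fix design quoted further down, applied to user-supplied jobId/queryId and path arguments before the diag command is assembled. The class and method names ({{DiagArgumentChecker}}, {{checkDiagArgs}}) are hypothetical, not the actual Kylin code:
{code:java}
import java.util.regex.Pattern;

// Hypothetical illustration of the allow-list check proposed in the fix design:
// every argument appended to the diag.sh command line must match ^[a-zA-Z0-9_./-]+$.
public final class DiagArgumentChecker {

    private static final Pattern SAFE_ARG = Pattern.compile("^[a-zA-Z0-9_./-]+$");

    private DiagArgumentChecker() {
    }

    /** Rejects any argument containing shell metacharacters (spaces, &&, ;, |, $(), backticks, ...). */
    public static void checkDiagArgs(String... args) {
        for (String arg : args) {
            if (arg == null || !SAFE_ARG.matcher(arg).matches()) {
                throw new IllegalArgumentException("Illegal diagnostic argument: " + arg);
            }
        }
    }

    public static void main(String[] args) {
        checkDiagArgs("job-1a2b3c", "/tmp/diag_dump");   // passes: only allow-listed characters
        try {
            checkDiagArgs("job-1 && rm -rf /tmp/kylin"); // rejected: contains spaces and &&
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
{code}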

 




> Command line injection vulnerability when generating diagnostic packages via 
> scripts
> ------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5700
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5700
>             Project: Kylin
>          Issue Type: Bug
>          Components: Tools, Build and Test
>    Affects Versions: 5.0-alpha
>            Reporter: Yaguang Jia
>            Assignee: Yaguang Jia
>            Priority: Critical
>             Fix For: 5.0-beta
>
>
> h2. Background
> In the current code there are many places that build a cmd string and then execute it via {{ProcessBuilder}}. The parameters concatenated into the cmd may come from API input, and they are not checked for validity, so they can be exploited for malicious attacks.
> When the spark command is assembled, the {{checkCommandInjection}} method is used to prevent injection attacks, but it only blocks injection via backticks and $(), such as {{`rm -rf /`}} and {{$(rm -rf /)}}; it cannot block other cases, such as {{cat nohup.out2 && echo success || echo failed}}.
> h2. Fix Design
> Check the parameters when concatenating the cmd command, covering the following four scenarios:
>  # When generating a diagnostic package, the arguments of the diag.sh script (project, jobId, path, etc.) are concatenated into the command; check each argument in turn, and it is sufficient that it matches {{^[a-zA-Z0-9_./-]+$}}.
>  # When exporting influxDB data, the *database host* and *database name* are concatenated into the influx command as arguments; the former only needs to match {{[a-zA-Z0-9._-]+(:[0-9]+)?}} and the latter {{^[0-9a-zA-Z_-]+$}}.
>  # When fetching yarn statistics, the yarn URL is concatenated as an argument of the curl command; it only needs to match {{^(http(s)?://)?[a-zA-Z0-9._-]+(:[0-9]+)?(/[a-zA-Z0-9._-]+)*/?$}}.
>  # When executing the beeline command, the beeline-params value from the configuration is concatenated into the command. Since beeline-params can be fairly complex, every parameter value is forcibly wrapped in {{'}} and turned into a quoted string, e.g. abc → 'abc', ab'c → 'ab'\''c' (a minimal sketch of this quoting follows the quoted description).
>  
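
A minimal sketch of the single-quote wrapping described in item 4 of the fix design above; the class and method names ({{ShellQuoting}}, {{quoteForShell}}) are hypothetical and only illustrate the quoting rule, not the actual Kylin implementation:
{code:java}
// Hypothetical illustration of the quoting rule in fix-design item 4:
// wrap every parameter value in single quotes and escape embedded single
// quotes as '\'' so the shell treats the whole value as one literal argument.
public final class ShellQuoting {

    private ShellQuoting() {
    }

    /** abc -> 'abc', ab'c -> 'ab'\''c' */
    public static String quoteForShell(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quoteForShell("abc"));                      // 'abc'
        System.out.println(quoteForShell("ab'c"));                     // 'ab'\''c'
        System.out.println(quoteForShell("x=1 && rm -rf /tmp/kylin")); // one harmless literal argument once quoted
    }
}
{code}
With this quoting, a value such as {{x=1 && rm -rf /tmp/kylin}} reaches the shell as a single literal string instead of being interpreted as additional commands.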



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
