BsoBird commented on code in PR #5934:
URL: https://github.com/apache/hive/pull/5934#discussion_r2191831532

##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java:
##########
@@ -71,8 +70,8 @@ public class SplitGrouper {
   // TODO This needs to be looked at. Map of Map to Map... Made concurrent for now since split generation
   // can happen in parallel.
-  private static final Map<Map<Path, PartitionDesc>, Map<Path, PartitionDesc>> cache =
-      new ConcurrentHashMap<>();
+  private final Map<Map<Path, PartitionDesc>, Map<Path, PartitionDesc>> cache =

Review Comment:
   > @BsoBird, thanks for the details, it helped a lot. I managed to create a test that initializes the cache from `SplitGrouper`. Note, I struggled to create an Iceberg table backed by HadoopCatalog:
   >
   > ```
   > set iceberg.catalog.ice01.type=hadoop;
   > set iceberg.catalog.ice01.warehouse=/tmp;
   >
   > CREATE EXTERNAL TABLE orders (orderid INT, quantity INT, itemid INT, tradets TIMESTAMP)
   >     PARTITIONED BY (p1 STRING, p2 STRING)
   > STORED BY ICEBERG STORED AS ORC
   > LOCATION '/tmp/ice01/orders'
   > TBLPROPERTIES('format-version'='2', 'iceberg.catalog'='ice01');
   > ```
   >
   > Caused by: java.lang.IllegalArgumentException: Cannot set a custom location for a path-based table. Expected /tmp/default/orders but got file:/tmp/ice01/orders
   >
   > ```
   > CREATE EXTERNAL TABLE orders (orderid INT, quantity INT, itemid INT, tradets TIMESTAMP)
   >     PARTITIONED BY (p1 STRING, p2 STRING)
   > STORED BY ICEBERG STORED AS ORC
   > LOCATION '/tmp/ice01/orders'
   > TBLPROPERTIES('format-version'='2', 'iceberg.catalog'='ice01');
   > ```
   >
   > Caused by: java.lang.IllegalArgumentException: Table location not set
   >
   > Wonder how were you able to create one?

   @deniskuzZ Example:
   ```
   -- Spark 3.4.1 + Iceberg 1.4.2
   CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
     id string, name string
   )
   PARTITIONED BY (name, bucket(id,16))
   TBLPROPERTIES (
     'read.orc.vectorization.enabled'='true',
     'write.format.default'='orc',
     'write.orc.bloom.filter.columns'='id',
     'write.orc.compression-codec'='zstd',
     'write.metadata.previous-versions-max'='3',
     'write.metadata.delete-after-commit.enabled'='true'
   )
   STORED AS iceberg;

   insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');

   -- Hive
   CREATE EXTERNAL TABLE iceberg_dwd.test_data_02
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   LOCATION 'hdfs://xxxxxx/iceberg-catalog/warehouse/test/test_data_02'
   TBLPROPERTIES ('iceberg.catalog'='location_based_table', 'engine.hive.enabled'='true');
   ```

   In practice, though, using Iceberg tables from other catalogs in Hive does not always go smoothly. For example, if the table's location is set only through Iceberg properties and the Hive table's LOCATION property is not set, there is a high chance the table won't work properly.