BsoBird commented on code in PR #5934:
URL: https://github.com/apache/hive/pull/5934#discussion_r2191831532


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java:
##########
@@ -71,8 +70,8 @@ public class SplitGrouper {
 
   // TODO This needs to be looked at. Map of Map to Map... Made concurrent for now since split generation
   // can happen in parallel.
-  private static final Map<Map<Path, PartitionDesc>, Map<Path, PartitionDesc>> cache =
-      new ConcurrentHashMap<>();
+  private final Map<Map<Path, PartitionDesc>, Map<Path, PartitionDesc>> cache =

Review Comment:
   > @BsoBird, thanks for the details, they helped a lot. I managed to create a
   > test that initializes the cache from `SplitGrouper`. Note that I struggled to
   > create an Iceberg table backed by HadoopCatalog:
   > 
   > ```
   > set iceberg.catalog.ice01.type=hadoop;
   > set iceberg.catalog.ice01.warehouse=/tmp;
   > 
   > CREATE EXTERNAL TABLE orders (orderid INT, quantity INT, itemid INT, tradets TIMESTAMP)
   >     PARTITIONED BY (p1 STRING, p2 STRING)
   > STORED BY ICEBERG STORED AS ORC
   > LOCATION '/tmp/ice01/orders'
   > TBLPROPERTIES('format-version'='2', 'iceberg.catalog'='ice01');
   > ```
   > 
   > Caused by: java.lang.IllegalArgumentException: Cannot set a custom location for a path-based table. Expected /tmp/default/orders but got file:/tmp/ice01/orders
   > 
   > ```
   > CREATE EXTERNAL TABLE orders (orderid INT, quantity INT, itemid INT, tradets TIMESTAMP)
   >     PARTITIONED BY (p1 STRING, p2 STRING)
   > STORED BY ICEBERG STORED AS ORC
   > LOCATION '/tmp/ice01/orders'
   > TBLPROPERTIES('format-version'='2', 'iceberg.catalog'='ice01');
   > ```
   > 
   > Caused by: java.lang.IllegalArgumentException: Table location not set
   > 
   > I wonder how you were able to create one?
   
   @deniskuzZ 
   Example:
   ```
   -- Spark 3.4.1 + Iceberg 1.4.2
   CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
   id string,name string
   )
   PARTITIONED BY (name, bucket(16, id))
   TBLPROPERTIES (
     'read.orc.vectorization.enabled'='true',
     'write.format.default'='orc',
     'write.orc.bloom.filter.columns'='id',
     'write.orc.compression-codec'='zstd',
     'write.metadata.previous-versions-max'='3',
     'write.metadata.delete-after-commit.enabled'='true'
   )
   STORED AS iceberg;
   
   insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 
   
   
   -- Hive
   CREATE EXTERNAL TABLE iceberg_dwd.test_data_02
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
   LOCATION 'hdfs://xxxxxx/iceberg-catalog/warehouse/test/test_data_02'
   TBLPROPERTIES ('iceberg.catalog'='location_based_table', 'engine.hive.enabled'='true');
   ```
   
   In practice, though, using Iceberg tables from other catalogs in Hive is not
   always smooth. For example, if the table's location is set only through Iceberg
   properties and the Hive table's LOCATION is left unset, there is a high chance
   the table won't work properly.
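
   A minimal sketch of how one might check this, assuming the Hive table from
   the example above has already been created. `DESCRIBE FORMATTED` is standard
   HiveQL; the row names mentioned in the comments are what I would expect to
   see, not output captured from a real run.

   ```sql
   -- Inspect what the Hive metastore recorded for the table.
   DESCRIBE FORMATTED iceberg_dwd.test_data_02;
   -- Things to look for (assumed, not verified output):
   --   * the "Location:" row points at the Iceberg table path used in LOCATION
   --   * "Table Parameters" lists iceberg.catalog = location_based_table

   -- A simple read to confirm the table is usable from Hive.
   SELECT * FROM iceberg_dwd.test_data_02 LIMIT 10;
   ```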



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

