Some problems encountered when reading ICEBERG with vectorisation turned on

lisoda Sun, 10 Mar 2024 01:59:24 -0800

Hi.

I am using HIVE 4.0.0 to read ICEBERG tables. I am having some problems with 
it, so if someone could guide me, that would be great.



Env: hadoop3.3.6  hive4.0.0  tez0.10.2  iceberg1.4.3


iceberg-table: hadoop-catalog-table/location_based_table


Question 1: How does tez.mrreader.config.update.properties work?


I'm testing hive-iceberg. My current problem is that, with vectorisation 
turned on, I can't read all of the non-partition columns of a partitioned 
table.
Reading through the code, I found that vectorised reads depend on the value of 
"hive.io.file.readcolumn.ids".
When vectorisation is turned on, TEZ-MAP-TASK relies on the values of the 
following two attributes:
hive.io.file.readcolumn.names  and  hive.io.file.readcolumn.ids
Currently, these two values are dynamically set in TEZ-Driver depending on the 
SQL submitted by the user. 
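
To make the dependency concrete, here is a minimal plain-Java sketch of how I understand the two properties to be interpreted. It is not Hive's actual code: the class ReadColumnProps and its methods are hypothetical, the column names are made up, and I am assuming both values are comma-separated lists, with the ids being integer column positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reader-side lookup, NOT Hive's implementation.
public class ReadColumnProps {
    static final String IDS = "hive.io.file.readcolumn.ids";
    static final String NAMES = "hive.io.file.readcolumn.names";

    // Assumption: the ids value is a comma-separated list of integers.
    static List<Integer> parseIds(String value) {
        List<Integer> ids = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            return ids; // property missing or empty: no projected columns known
        }
        for (String part : value.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) {
                ids.add(Integer.parseInt(p));
            }
        }
        return ids;
    }

    // Assumption: the names value is a comma-separated list of column names.
    static List<String> parseNames(String value) {
        List<String> names = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            return names;
        }
        for (String part : value.split(",")) {
            String p = part.trim();
            if (!p.isEmpty()) {
                names.add(p);
            }
        }
        return names;
    }

    // Summarises what a task would see for a given set of properties.
    static String describe(Map<String, String> conf) {
        return "ids=" + parseIds(conf.get(IDS)) + " names=" + parseNames(conf.get(NAMES));
    }

    public static void main(String[] args) {
        // Made-up values standing in for what the driver sets per query.
        Map<String, String> conf = Map.of(IDS, "0,2", NAMES, "id,name");
        System.out.println(describe(conf));
    }
}
```

If the ids property never reaches the task, parseIds returns an empty list in this sketch, which is the situation I seem to be hitting.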
According to https://issues.apache.org/jira/browse/TEZ-4248 , the authors seem 
to expect to be able to pass both values to tez-worker.
But, I found that in TezChild, I am not able to get the value of 
hive.io.file.readcolumn.ids which is set in TEZ-ApplicationMaster.
When I set "hive.io.file.readcolumn.ids" directly from the console, the 
ICEBERG partitioned table is read just fine. But I can't do this in a 
production environment.
So, how should I troubleshoot this problem?
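
One way I could imagine narrowing this down is to dump the two properties on the driver side and inside the task, then compare the snapshots. The sketch below only shows the comparison step in plain Java. The class ReadColumnConfDiff is hypothetical, the values are made up, and how the two maps get captured (for example from log output) is left open.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for comparing two configuration snapshots.
public class ReadColumnConfDiff {
    static final List<String> KEYS = List.of(
        "hive.io.file.readcolumn.ids",
        "hive.io.file.readcolumn.names");

    // Returns one entry per watched key whose value differs between the snapshots.
    static Map<String, String> diff(Map<String, String> driver, Map<String, String> task) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : KEYS) {
            String d = driver.get(key); // null when the key is absent
            String t = task.get(key);
            if (!Objects.equals(d, t)) {
                out.put(key, "driver=" + d + " task=" + t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up snapshots: the ids property is missing on the task side.
        Map<String, String> driver = Map.of(
            "hive.io.file.readcolumn.ids", "0,2",
            "hive.io.file.readcolumn.names", "id,name");
        Map<String, String> task = Map.of(
            "hive.io.file.readcolumn.names", "id,name");
        diff(driver, task).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

A non-empty result for "hive.io.file.readcolumn.ids" would at least confirm that the value is lost somewhere between the driver and the task, rather than being ignored by the reader.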


Question 2: Does HIVE depend on "hive.io.file.readcolumn.ids" when reading 
ICEBERG non-partitioned tables?


For non-partitioned tables, I found that even when I couldn't get the value 
of "hive.io.file.readcolumn.ids", or when its value was wrong, I could still 
read the ICEBERG non-partitioned tables just fine.
But from the code, partitioned and non-partitioned tables appear to go 
through the same code path.
So why do they behave differently?


I'm very confused at the moment and I'd be grateful if someone could help me. 
I'd appreciate it. Thank you.
