[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=713004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713004
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 17:36
Start Date: 21/Jan/22 17:36
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018721560


   FYI I've uploaded a PR to Iceberg: 
https://github.com/apache/iceberg/pull/3947
   
   It only contains the 1-based indexing part of this PR, as the table migration code is not present in the Iceberg repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 713004)
Time Spent: 2.5h  (was: 2h 20m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.
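A name mapping lets a reader assign Iceberg field ids to the columns of data files that were written without them, matching by column name instead. The Python sketch below makes several assumptions. It takes a flat schema given as `(field_id, name)` pairs and uses Iceberg's JSON name-mapping shape, which is a list of `{"field-id": ..., "names": [...]}` objects; nested `"fields"` are omitted. `build_name_mapping` and `resolve_file_columns` are hypothetical helpers, not Hive or Iceberg API. Iceberg's `schema.name-mapping.default` table property is the usual place for such a JSON string, but whether Hive's migration code writes exactly this key is also an assumption here.

```python
import json
from typing import Dict, List, Tuple


def build_name_mapping(schema: List[Tuple[int, str]]) -> str:
    """Serialize a flat schema as a name-mapping JSON string.

    Each entry maps one field id to the name(s) the column may have in
    data files that carry no field ids (nested "fields" are omitted here).
    """
    mapping = [{"field-id": field_id, "names": [name]} for field_id, name in schema]
    return json.dumps(mapping)


def resolve_file_columns(mapping_json: str, file_columns: List[str]) -> Dict[str, int]:
    """Assign field ids to the columns of an id-less data file by name."""
    name_to_id: Dict[str, int] = {}
    for entry in json.loads(mapping_json):
        for name in entry["names"]:
            name_to_id[name] = entry["field-id"]
    # Columns missing from the mapping get no id in this sketch.
    return {col: name_to_id[col] for col in file_columns if col in name_to_id}


# Field ids as Iceberg assigns them (1-based) for a migrated table.
table_schema = [(1, "id"), (2, "year_field"), (3, "month_field")]
mapping = build_name_mapping(table_schema)
print(mapping)
print(resolve_file_columns(mapping, ["year_field", "id", "legacy_col"]))
```

In this sketch, a file column the mapping does not cover (here `legacy_col`) gets no field id, so it cannot be matched to any table column.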



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712922
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:59
Start Date: 21/Jan/22 14:59
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018586068


   Ah right, things are currently being duplicated between Hive and Iceberg. 
Sure, I'll happily add these changes to Iceberg as well!



Issue Time Tracking
---

Worklog Id: (was: 712922)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712913
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:43
Start Date: 21/Jan/22 14:43
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018568724


   Thanks for the contribution @boroknagyz! As Peter mentioned, it would be great to get the relevant parts into the upstream Iceberg code base as well. Is this something you would fancy doing too?



Issue Time Tracking
---

Worklog Id: (was: 712913)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712911
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:42
Start Date: 21/Jan/22 14:42
Worklog Time Spent: 10m 
  Work Description: marton-bod merged pull request #2948:
URL: https://github.com/apache/hive/pull/2948


   



Issue Time Tracking
---

Worklog Id: (was: 712911)
Time Spent: 2h  (was: 1h 50m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712908
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:40
Start Date: 21/Jan/22 14:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018564045


   LGTM +1.
   I think this change should go into the Iceberg repo as well. What do you think?



Issue Time Tracking
---

Worklog Id: (was: 712908)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712896
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:28
Start Date: 21/Jan/22 14:28
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789702644



##
File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;
 CREATE EXTERNAL TABLE ice_t_transform (year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) PARTITIONED BY SPEC (year(year_field), month(month_field), day(day_field), hour(hour_field), truncate(2, truncate_field), bucket(2, bucket_field), identity_field) STORED BY ICEBERG;
 
 DROP TABLE IF EXISTS ice_t_transform_prop;
-CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":1,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":2,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":3,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":4,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":5,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":6,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":7,"field-id":1006}]}');
+CREATE EXTERNAL TABLE ice_t_transform_prop (id int, year_field date, month_field date, day_field date, hour_field timestamp, truncate_field string, bucket_field int, identity_field int) STORED BY ICEBERG TBLPROPERTIES ('iceberg.mr.table.partition.spec'='{"spec-id":0,"fields":[{"name":"year_field_year","transform":"year","source-id":2,"field-id":1000},{"name":"month_field_month","transform":"month","source-id":3,"field-id":1001},{"name":"day_field_day","transform":"day","source-id":4,"field-id":1002},{"name":"hour_field_hour","transform":"hour","source-id":5,"field-id":1003},{"name":"truncate_field_trunc","transform":"truncate[2]","source-id":6,"field-id":1004},{"name":"bucket_field_bucket","transform":"bucket[2]","source-id":7,"field-id":1005},{"name":"identity_field","transform":"identity","source-id":8,"field-id":1006}]}');

Review comment:
   Makes sense, thx!





Issue Time Tracking
---

Worklog Id: (was: 712896)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712899
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 14:28
Start Date: 21/Jan/22 14:28
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1018551920


   LGTM, will merge this today unless @pvary has further comments



Issue Time Tracking
---

Worklog Id: (was: 712899)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712874
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 13:45
Start Date: 21/Jan/22 13:45
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789669255



##
File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;

Review comment:
   Prior to this patch `HiveSchemaConverter` used 0-based indexing when it assigned the field ids. E.g. in the above statement it would assign field id 0 to `id`, field id 1 to `year_field`, and so on. Hence in 'iceberg.mr.table.partition.spec' the source-id 1 referred to `year_field`. Everything was fine, but when Iceberg creates a table it reassigns the field ids using 1-based indexing (field id 1 is `id`, field id 2 is `year_field`). And Iceberg is smart enough to use the correct ids in the partition spec, i.e. it replaces source id 1 with source id 2, and so on.
   
   So everything worked, but you had to specify different field/source ids in Hive than the actual field/source ids assigned by Iceberg.
   
   With this change, you need to use the same 1-based indexing in the partition spec that Iceberg will use later.
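The renumbering described above can be sketched in Python. This is a simplified model, not Iceberg's code. It assumes a flat schema that is renumbered 1, 2, 3, ... in declaration order. It then rewrites each partition field's `source-id` through the old-to-new id map, so that each entry still points at the same column. `reassign_ids` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple


def reassign_ids(columns: List[Tuple[int, str]], spec_json: str) -> Tuple[List[Tuple[int, str]], str]:
    """Renumber columns 1..n and remap partition spec source-ids to match."""
    old_to_new: Dict[int, int] = {}
    new_columns = []
    for position, (old_id, name) in enumerate(columns, start=1):
        old_to_new[old_id] = position
        new_columns.append((position, name))

    spec = json.loads(spec_json)
    for field in spec["fields"]:
        # A source-id that referred to a 0-based column id now refers to
        # the same column under its new 1-based id.
        field["source-id"] = old_to_new[field["source-id"]]
    return new_columns, json.dumps(spec)


# 0-based ids, as HiveSchemaConverter assigned them before the patch.
hive_columns = [(0, "id"), (1, "year_field"), (2, "month_field")]
hive_spec = json.dumps({"spec-id": 0, "fields": [
    {"name": "year_field_year", "transform": "year", "source-id": 1, "field-id": 1000},
    {"name": "month_field_month", "transform": "month", "source-id": 2, "field-id": 1001},
]})

columns, spec = reassign_ids(hive_columns, hive_spec)
print(columns)
print(spec)
```

After the patch, the Hive statement already uses the 1-based source-ids. In this model, that corresponds to an identity `old_to_new` map.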





Issue Time Tracking
---

Worklog Id: (was: 712874)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712720
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 21/Jan/22 10:29
Start Date: 21/Jan/22 10:29
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789540515



##
File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;

Review comment:
   I'm probably missing something obvious: can you explain why the source-id values had to be incremented? Thanks!





Issue Time Tracking
---

Worklog Id: (was: 712720)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=712274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-712274
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 20/Jan/22 18:32
Start Date: 20/Jan/22 18:32
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r789047277



##
File path: iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q
##
@@ -8,7 +8,7 @@ DROP TABLE IF EXISTS ice_t_transform;

Review comment:
   I've issued these commands in Hive (with the original 0-based field ids), then checked the Iceberg snapshot file: Iceberg had transformed the field ids to 1-based anyway. So now the values in the statements are the same as the values in the snapshots.





Issue Time Tracking
---

Worklog Id: (was: 712274)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=711573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711573
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 18:53
Start Date: 19/Jan/22 18:53
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1016768717


   > > HiveSchemaConverter assigned field ids starting from 0, while Iceberg assigns field ids starting from 1. This caused test failures because the name mapping had wrong field ids. So in the second commit I fixed HiveSchemaConverter as well.
   > 
   > If we provide a schema where the ids start from 0, does that mean the schema is incorrect? Or is this just to be more aligned with the way Iceberg generates ids?
   
   It seems that Iceberg doesn't respect the field ids during table creation; it always reassigns them.
   E.g. I was trying to create a table with this schema:
   
   `iceberg.mr.table.schema={"type":"struct","schema-id":0,"fields":[{"id":0,"name":"a","required":false,"type":"int"}]}`
   
   or
   
   `iceberg.mr.table.schema={"type":"struct","schema-id":0,"fields":[{"id":5,"name":"a","required":false,"type":"int"}]}`
   
   And Iceberg created the following schema:
   
   `table {
 1: a: optional int
   }`
   
   OTOH, I don't think the value 0 is invalid. The spec reserves only these ids:
   https://iceberg.apache.org/#spec/#reserved-field-ids
   
   So I edited the Iceberg snapshot file and rewrote the field ids to start from 0, then inserted data files and queried the table. Iceberg didn't complain about the 0 field id.
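The behaviour observed above can be mimicked with a short Python sketch: the created table numbers its fields from 1, whatever ids the supplied schema carried. The sketch models a flat schema only and is not Iceberg's implementation. `assign_fresh_ids` is a hypothetical helper, and its output only imitates the `1: a: optional int` line shown above.

```python
import json
from typing import List


def assign_fresh_ids(schema_json: str) -> List[str]:
    """Ignore the ids of a flat schema JSON and number its fields from 1.

    Returns one "<id>: <name>: <optional|required> <type>" string per field.
    """
    schema = json.loads(schema_json)
    lines = []
    for new_id, field in enumerate(schema["fields"], start=1):
        # The incoming "id" value is deliberately not consulted.
        kind = "required" if field["required"] else "optional"
        lines.append(f"{new_id}: {field['name']}: {kind} {field['type']}")
    return lines


for incoming_id in (0, 5):
    schema_json = json.dumps({"type": "struct", "schema-id": 0, "fields": [
        {"id": incoming_id, "name": "a", "required": False, "type": "int"}]})
    print(assign_fresh_ids(schema_json))
```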



Issue Time Tracking
---

Worklog Id: (was: 711573)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=711404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711404
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 14:10
Start Date: 19/Jan/22 14:10
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1016501611


   > HiveSchemaConverter assigned field ids starting from 0, while Iceberg assigns field ids starting from 1. This caused test failures because the name mapping had wrong field ids. So in the second commit I fixed HiveSchemaConverter as well.
   
   If we provide a schema where the ids start from 0, does that mean the schema is incorrect? Or is this just to be more aligned with the way Iceberg generates ids?



Issue Time Tracking
---

Worklog Id: (was: 711404)
Time Spent: 40m  (was: 0.5h)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.





[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=711369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711369
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 12:45
Start Date: 19/Jan/22 12:45
Worklog Time Spent: 10m 
  Work Description: boroknagyz commented on pull request #2948:
URL: https://github.com/apache/hive/pull/2948#issuecomment-1016431804


   HiveSchemaConverter assigned field ids starting from 0, while Iceberg assigns field ids starting from 1.
   This caused test failures because the name mapping had wrong field ids.
   So in the second commit I fixed HiveSchemaConverter as well.
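To illustrate the off-by-one described in this comment, here is a minimal, hypothetical Java sketch. `FieldIdAssignment` and `assignIds` are illustrative names, not Hive or Iceberg APIs, and the sketch handles only a flat list of top-level columns (Iceberg's real id assignment also covers nested fields).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive's HiveSchemaConverter): assigns sequential field ids
 * to a flat list of column names, starting from a configurable first id.
 */
final class FieldIdAssignment {

  // Iceberg starts assigning field ids at 1; starting at 0 is the bug discussed above.
  static final int FIRST_FIELD_ID = 1;

  /** Maps each column name to a field id, in column order, starting at startId. */
  static Map<String, Integer> assignIds(List<String> columns, int startId) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    int next = startId;
    for (String column : columns) {
      ids.put(column, next++);
    }
    return ids;
  }

  public static void main(String[] args) {
    List<String> columns = List.of("id", "name", "ts");
    Map<String, Integer> zeroBased = assignIds(columns, 0);
    Map<String, Integer> oneBased = assignIds(columns, FIRST_FIELD_ID);
    // Every 0-based id is one lower than the id Iceberg would assign, so a name
    // mapping built from the 0-based ids points each name at the wrong field.
    for (String column : columns) {
      System.out.println(column + ": 0-based=" + zeroBased.get(column)
          + ", 1-based=" + oneBased.get(column));
    }
  }
}
```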


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 711369)
Time Spent: 0.5h  (was: 20m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.





[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=710433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-710433
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 18/Jan/22 11:47
Start Date: 18/Jan/22 11:47
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2948:
URL: https://github.com/apache/hive/pull/2948#discussion_r786672462



##
File path: iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergMigration.java
##
@@ -225,12 +226,15 @@ private void validateSd(Table hmsTable, String format) {
   private void validateTblProps(Table hmsTable, boolean migrationSucceeded) {
     String migratedProp = hmsTable.getParameters().get(HiveIcebergMetaHook.MIGRATED_TO_ICEBERG);
     String tableTypeProp = hmsTable.getParameters().get(BaseMetastoreTableOperations.TABLE_TYPE_PROP);
+    String nameMappingProp = hmsTable.getParameters().get(TableProperties.DEFAULT_NAME_MAPPING);
     if (migrationSucceeded) {
       Assert.assertTrue(Boolean.parseBoolean(migratedProp));
       Assert.assertEquals(BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE.toUpperCase(), tableTypeProp);
+      Assert.assertTrue(nameMappingProp != null && !nameMappingProp.isEmpty());
     } else {
       Assert.assertNull(migratedProp);
       Assert.assertNotEquals(BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE.toUpperCase(), tableTypeProp);
+      Assert.assertTrue(nameMappingProp == null);

Review comment:
   nit: could use Assert.assertNull here






Issue Time Tracking
---

Worklog Id: (was: 710433)
Time Spent: 20m  (was: 10m)

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.





[jira] [Work logged] (HIVE-25871) Hive should set name mapping table property for migrated Iceberg tables

2022-01-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25871?focusedWorklogId=710112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-710112
 ]

ASF GitHub Bot logged work on HIVE-25871:
-

Author: ASF GitHub Bot
Created on: 17/Jan/22 19:01
Start Date: 17/Jan/22 19:01
Worklog Time Spent: 10m 
  Work Description: boroknagyz opened a new pull request #2948:
URL: https://github.com/apache/hive/pull/2948


   …Iceberg tables
   
   
   
   ### What changes were proposed in this pull request?
   With this PR, Hive sets the table property 'schema.name-mapping.default' for migrated Iceberg tables.
   The value of this property contains a mapping between Iceberg field ids and column names.
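A minimal sketch of what such a property value can look like for a flat schema, assuming the JSON name-mapping serialization described in the Iceberg spec (a list of objects with `field-id` and `names` keys). `NameMappingJson` is a hypothetical helper, not the code in this PR; production code would more likely rely on Iceberg's own `org.apache.iceberg.mapping` utilities, which also handle nested fields and JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: builds a JSON name-mapping value for a flat (non-nested) schema.
 * Nested "fields" entries and JSON string escaping are intentionally not handled.
 */
final class NameMappingJson {

  // Property key quoted in the PR description above.
  static final String NAME_MAPPING_PROPERTY = "schema.name-mapping.default";

  /** Serializes field id -> column name pairs as a list of {"field-id", "names"} objects. */
  static String toJson(Map<Integer, String> idToName) {
    StringJoiner entries = new StringJoiner(",", "[", "]");
    for (Map.Entry<Integer, String> e : idToName.entrySet()) {
      // Assumes column names contain no characters that need JSON escaping.
      entries.add("{\"field-id\":" + e.getKey() + ",\"names\":[\"" + e.getValue() + "\"]}");
    }
    return entries.toString();
  }

  public static void main(String[] args) {
    Map<Integer, String> columns = new LinkedHashMap<>();
    columns.put(1, "id");
    columns.put(2, "data");
    Map<String, String> tableProps = new LinkedHashMap<>();
    // value: [{"field-id":1,"names":["id"]},{"field-id":2,"names":["data"]}]
    tableProps.put(NAME_MAPPING_PROPERTY, toJson(columns));
    System.out.println(tableProps);
  }
}
```

Using a `LinkedHashMap` keeps the entries in column order, which makes the generated value deterministic and easy to compare in tests.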
   
   ### Why are the changes needed?
   This table property is useful for column projection of legacy data files:
   https://iceberg.apache.org/#spec/#column-projection
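The projection use case can be sketched as follows, assuming a flat schema: for a legacy data file whose columns carry no field ids, each file column name is looked up in the name mapping to find its field id. In this sketch, file columns absent from the mapping are ignored and projected ids with no matching file column are reported as -1 (a reader would produce nulls for them). `NameMappedProjection` and its methods are hypothetical, not part of Hive or Iceberg.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of name-mapping-based column projection for a legacy data file
 * whose columns have no field ids. Flat schemas only.
 */
final class NameMappedProjection {

  /** Builds a name -> field id lookup from a flat mapping of field id -> accepted names. */
  static Map<String, Integer> invert(Map<Integer, List<String>> mapping) {
    Map<String, Integer> byName = new HashMap<>();
    mapping.forEach((id, names) -> names.forEach(n -> byName.put(n, id)));
    return byName;
  }

  /**
   * For each projected field id, returns the index of the file column that provides it,
   * or -1 when no file column maps to that id.
   */
  static List<Integer> project(
      List<String> fileColumns, Map<Integer, List<String>> mapping, List<Integer> projectedIds) {
    Map<String, Integer> byName = invert(mapping);
    Map<Integer, Integer> idToFileIndex = new HashMap<>();
    for (int i = 0; i < fileColumns.size(); i++) {
      Integer id = byName.get(fileColumns.get(i));
      if (id != null) {
        idToFileIndex.putIfAbsent(id, i); // file columns missing from the mapping are skipped
      }
    }
    List<Integer> result = new ArrayList<>();
    for (int id : projectedIds) {
      result.add(idToFileIndex.getOrDefault(id, -1));
    }
    return result;
  }

  public static void main(String[] args) {
    // Field 2 is known under its current name "data" and an older name "payload".
    Map<Integer, List<String>> mapping = Map.of(1, List.of("id"), 2, List.of("data", "payload"));
    // Legacy file written with the old column name and an extra unmapped column "tmp".
    List<String> fileColumns = List.of("payload", "tmp", "id");
    System.out.println(project(fileColumns, mapping, List.of(1, 2, 3))); // prints [2, 0, -1]
  }
}
```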
   
   ### Does this PR introduce _any_ user-facing change?
   When users migrate a legacy table to Iceberg, this new table property will be set.
   
   ### How was this patch tested?
   Extended the unit test TestHiveIcebergMigration.
   Q tests.
   




Issue Time Tracking
---

Worklog Id: (was: 710112)
Remaining Estimate: 0h
Time Spent: 10m

> Hive should set name mapping table property for migrated Iceberg tables
> ---
>
> Key: HIVE-25871
> URL: https://issues.apache.org/jira/browse/HIVE-25871
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive should set the name-mapping table property during table migration.
> It would be useful for [column projection|https://iceberg.apache.org/#spec/#column-projection] for files without field ids.


