[
https://issues.apache.org/jira/browse/HIVE-29451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-29451:
--------------------------------
Description:
https://github.com/apache/hive/blob/98da62c93f198126c78d3352bf3ac6aeacefa53c/ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java#L662
so the code below is executed repeatedly, once for every single partition, even
though the logic is table-level and has no way to distinguish between partitions:
https://github.com/apache/hive/blob/98da62c93f198126c78d3352bf3ac6aeacefa53c/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L912-L930
{code}
  public static void configureJobConf(TableDesc tableDesc, JobConf jobConf) {
    try {
      HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(
          jobConf,
          tableDesc.getProperties().getProperty(hive_metastoreConstants.META_TABLE_STORAGE));
      if (storageHandler != null) {
        storageHandler.configureJobConf(tableDesc, jobConf);
      }
      if (tableDesc.getJobSecrets() != null) {
        for (Map.Entry<String, String> entry : tableDesc.getJobSecrets().entrySet()) {
          String key = TableDesc.SECRET_PREFIX + TableDesc.SECRET_DELIMIT +
              tableDesc.getTableName() + TableDesc.SECRET_DELIMIT + entry.getKey();
          jobConf.getCredentials().addSecretKey(new Text(key), entry.getValue().getBytes());
        }
        tableDesc.getJobSecrets().clear();
      }
    } catch (HiveException e) {
      throw new RuntimeException(e);
    }
  }
{code}
Consider a job reading hundreds of partitions (this can grow into the thousands,
even though that is suboptimal for Hive): the same table-level work is redone for
each one. We might want to collect the distinct tables affected by the MapWork
beforehand and run this logic only once per TableDesc.
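The deduplication step above could be sketched roughly as follows. This is a minimal, self-contained illustration, not Hive code: the `TableDesc` record and `configureDistinct` method are hypothetical stand-ins (only the table name matters for the dedup), and the comment marks where the real `PlanUtils.configureJobConf` call would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConfigureOnce {
    // Hypothetical stand-in for Hive's TableDesc; only the name is needed here.
    record TableDesc(String tableName) {}

    // Walk the per-partition descriptors, but run the table-level
    // configuration only once per distinct table.
    static List<String> configureDistinct(List<TableDesc> perPartitionDescs) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> configured = new ArrayList<>();
        for (TableDesc desc : perPartitionDescs) {
            if (seen.add(desc.tableName())) {
                // In Hive, this is where PlanUtils.configureJobConf(tableDesc, jobConf)
                // would be invoked, once per TableDesc instead of once per partition.
                configured.add(desc.tableName());
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        List<TableDesc> descs = List.of(
            new TableDesc("t1"), new TableDesc("t1"),
            new TableDesc("t2"), new TableDesc("t1"));
        System.out.println(configureDistinct(descs)); // prints [t1, t2]
    }
}
```

With hundreds of partitions over a handful of tables, this reduces the number of storage-handler configuration calls from O(partitions) to O(tables).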
> PlanUtils.configureJobConf is called with a table-level logic for every
> single partition
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-29451
> URL: https://issues.apache.org/jira/browse/HIVE-29451
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)