[
https://issues.apache.org/jira/browse/HIVE-29451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-29451:
--------------------------------
Description:
https://github.com/apache/hive/blob/98da62c93f198126c78d3352bf3ac6aeacefa53c/ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java#L662
so the code below is executed repeatedly, once for every single partition, even
though the logic is table-level and has no way to distinguish between partitions:
https://github.com/apache/hive/blob/98da62c93f198126c78d3352bf3ac6aeacefa53c/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java#L912-L930
{code}
  public static void configureJobConf(TableDesc tableDesc, JobConf jobConf) {
    try {
      HiveStorageHandler storageHandler = HiveUtils.getStorageHandler(
          jobConf,
          tableDesc.getProperties().getProperty(hive_metastoreConstants.META_TABLE_STORAGE));
      if (storageHandler != null) {
        storageHandler.configureJobConf(tableDesc, jobConf);
      }
      if (tableDesc.getJobSecrets() != null) {
        for (Map.Entry<String, String> entry : tableDesc.getJobSecrets().entrySet()) {
          String key = TableDesc.SECRET_PREFIX + TableDesc.SECRET_DELIMIT +
              tableDesc.getTableName() + TableDesc.SECRET_DELIMIT + entry.getKey();
          jobConf.getCredentials().addSecretKey(new Text(key), entry.getValue().getBytes());
        }
        tableDesc.getJobSecrets().clear();
      }
    } catch (HiveException e) {
      throw new RuntimeException(e);
    }
  }
{code}
Consider a job reading hundreds of partitions (this can grow into the thousands,
even though that is suboptimal for Hive): the same table-level work is redone for
each one. We might want to collect the distinct tables affected by the MapWork
beforehand and run this logic only once per TableDesc.
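The deduplication step above could be sketched roughly as follows. This is a minimal, self-contained illustration, not Hive code: the `TableDesc` record and `configureDistinct` method are hypothetical stand-ins (only the table name matters for the dedup), and the comment marks where the real `PlanUtils.configureJobConf` call would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConfigureOnce {
    // Hypothetical stand-in for Hive's TableDesc; only the name is needed here.
    record TableDesc(String tableName) {}

    // Walk the per-partition descriptors, but run the table-level
    // configuration only once per distinct table.
    static List<String> configureDistinct(List<TableDesc> perPartitionDescs) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> configured = new ArrayList<>();
        for (TableDesc desc : perPartitionDescs) {
            if (seen.add(desc.tableName())) {
                // In Hive, this is where PlanUtils.configureJobConf(tableDesc, jobConf)
                // would be invoked, once per TableDesc instead of once per partition.
                configured.add(desc.tableName());
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        List<TableDesc> descs = List.of(
            new TableDesc("t1"), new TableDesc("t1"),
            new TableDesc("t2"), new TableDesc("t1"));
        System.out.println(configureDistinct(descs)); // prints [t1, t2]
    }
}
```

With hundreds of partitions over a handful of tables, this reduces the number of storage-handler configuration calls from O(partitions) to O(tables).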
> PlanUtils.configureJobConf is called with a table-level logic for every
> single partition
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-29451
> URL: https://issues.apache.org/jira/browse/HIVE-29451
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)