Marta Kuczora created HIVE-18696:
------------------------------------
Summary: The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs
Key: HIVE-18696
URL: https://issues.apache.org/jira/browse/HIVE-18696
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Marta Kuczora
Assignee: Marta Kuczora
When adding multiple partitions where one of them cannot be created
successfully, none of the partitions are created, but their folders might not be
cleaned up properly. See the test case "testAddPartitionsOneInvalid" in the
TestAddPartitions test.
This is the problematic code in the HiveMetaStore.add_partitions_core method:
{code:java}
for (final Partition part : parts) {
  if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
    throw new MetaException("Partition does not belong to target table "
        + dbName + "." + tblName + ": " + part);
  }

  boolean shouldAdd = startAddPartition(ms, part, ifNotExists);
  if (!shouldAdd) {
    existingParts.add(part);
    LOG.info("Not adding partition " + part + " as it already exists");
    continue;
  }

  final UserGroupInformation ugi;
  try {
    ugi = UserGroupInformation.getCurrentUser();
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
  partFutures.add(threadPool.submit(new Callable<Partition>() {
    @Override
    public Partition call() throws Exception {
      ugi.doAs(new PrivilegedExceptionAction<Object>() {
        @Override
        public Object run() throws Exception {
          try {
            boolean madeDir = createLocationForAddedPartition(table, part);
            if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
              // Technically, for ifNotExists case, we could insert one and discard the other
              // because the first one now "exists", but it seems better to report the problem
              // upstream as such a command doesn't make sense.
              throw new MetaException("Duplicate partitions in the list: " + part);
            }
            initializeAddedPartition(table, part, madeDir);
          } catch (MetaException e) {
            throw new IOException(e.getMessage(), e);
          }
          return null;
        }
      });
      return part;
    }
  }));
}
{code}
Suppose that, while iterating over the partitions, the tasks that create the
folders for the first two partitions are submitted successfully, but an
exception is thrown for the third partition before its task is submitted. (This
can happen if the third partition has a different table or database name than
the others, or if it has an invalid value.)
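The scenario above can be modeled with a small, self-contained Java sketch. This is not Hive code: the class name AddPartitionsRace, the "fileSystem" set (standing in for the warehouse directories), the "invalid" partition and the IllegalArgumentException (standing in for the MetaException) are all hypothetical. A CountDownLatch holds the folder-creation tasks back until the cleanup has run, so the problematic interleaving happens on every run instead of only occasionally. It assumes Java 9+ for List.of.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the race, not Hive code.
class AddPartitionsRace {

    // Same shape as the loop above: validate, submit, clean up in finally.
    static void addPartitions(List<String> parts, Set<String> fileSystem,
                              ExecutorService pool, CountDownLatch slowFs) {
        // Stand-in for the "addedPartitions" map: partition -> madeDir.
        Map<String, Boolean> addedPartitions = new ConcurrentHashMap<>();
        boolean success = false;
        try {
            for (String part : parts) {
                // Validation before submission (stand-in for the MetaException).
                if (part.startsWith("invalid")) {
                    throw new IllegalArgumentException("Invalid partition: " + part);
                }
                // The returned Future is ignored, so nothing waits for the task.
                pool.submit(() -> {
                    slowFs.await();                  // simulate slow folder creation
                    fileSystem.add(part);            // folder created ...
                    addedPartitions.put(part, true); // ... and only then recorded
                    return null;
                });
            }
            success = true;
        } finally {
            if (!success) {
                // The cleanup only sees entries that are already in the map.
                for (Map.Entry<String, Boolean> e : addedPartitions.entrySet()) {
                    if (e.getValue()) {
                        fileSystem.remove(e.getKey());
                    }
                }
            }
        }
    }

    // Returns the folders still present after every task has finished.
    static Set<String> run() {
        Set<String> fileSystem = ConcurrentHashMap.newKeySet();
        CountDownLatch slowFs = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            addPartitions(List.of("p1", "p2", "invalid"), fileSystem, pool, slowFs);
        } catch (IllegalArgumentException expected) {
            // The call failed as a whole, so no folder should survive it.
        }
        slowFs.countDown(); // only now let the pending folder creations finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new TreeSet<>(fileSystem);
    }

    public static void main(String[] args) {
        System.out.println("Leaked folders: " + run());
    }
}
{code}

Running main prints "Leaked folders: [p1, p2]".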
In this case, execution jumps to the finally block, where the folders recorded in
the "addedPartitions" map are cleaned up. However, the threads for the first two
partitions may not have finished creating their folders yet, so at that point the
map may be empty or contain only one of the partitions. Any folder that is
created after the cleanup has run is never removed.
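One possible way to close the race is for the finally block to wait for every task that was already submitted before it reads "addedPartitions". This is an assumption, not necessarily the change committed for this issue. The sketch below uses a small hypothetical model rather than Hive code: a "fileSystem" set stands in for the warehouse directories, an IllegalArgumentException stands in for the MetaException, Thread.sleep(100) simulates slow folder creation, and awaitQuietly is a hypothetical helper. In add_partitions_core the submitted tasks are already collected in partFutures, so a corresponding change there would iterate that list. It assumes Java 9+ for List.of.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of one possible fix, not Hive code.
class AddPartitionsCleanupFix {

    static void addPartitions(List<String> parts, Set<String> fileSystem, ExecutorService pool) {
        Map<String, Boolean> addedPartitions = new ConcurrentHashMap<>();
        List<Future<?>> futures = new ArrayList<>(); // plays the role of partFutures
        boolean success = false;
        try {
            for (String part : parts) {
                if (part.startsWith("invalid")) {
                    throw new IllegalArgumentException("Invalid partition: " + part);
                }
                futures.add(pool.submit(() -> {
                    Thread.sleep(100);               // simulate slow folder creation
                    fileSystem.add(part);
                    addedPartitions.put(part, true);
                    return null;
                }));
            }
            success = true;
        } finally {
            if (!success) {
                // Wait for every submitted task first, so the map is complete.
                for (Future<?> f : futures) {
                    awaitQuietly(f);
                }
                for (Map.Entry<String, Boolean> e : addedPartitions.entrySet()) {
                    if (e.getValue()) {
                        fileSystem.remove(e.getKey());
                    }
                }
            }
        }
    }

    // Blocks until f completes; task failures are ignored, interrupts are re-asserted.
    private static void awaitQuietly(Future<?> f) {
        boolean interrupted = false;
        while (true) {
            try {
                f.get();
                break;
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting: the cleanup needs the final state
            } catch (ExecutionException e) {
                break; // the task failed, but it has finished either way
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    static Set<String> run() {
        Set<String> fileSystem = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            addPartitions(List.of("p1", "p2", "invalid"), fileSystem, pool);
        } catch (IllegalArgumentException expected) {
            // Expected: the third partition is rejected.
        } finally {
            pool.shutdown();
        }
        return new TreeSet<>(fileSystem);
    }

    public static void main(String[] args) {
        System.out.println("Remaining folders: " + run());
    }
}
{code}

Running main prints "Remaining folders: []". The trade-off is that the failing call now blocks until the already-submitted folder creations finish; if a task could hang, a bounded Future.get(timeout, unit) would be the safer choice, at the cost of possibly missing a late folder again.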
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)