[jira] [Commented] (IMPALA-8243) ConcurrentModificationException in Catalog stress tests

2019-03-14 Thread Gabor Kaszab (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792672#comment-16792672
 ] 

Gabor Kaszab commented on IMPALA-8243:
--

Hey [~bharathv],
I see this was submitted. Can this be resolved and the fix version set to 3.2?

fced1cc IMPALA-8243: Fix racy access to nonPartFieldSchemas_

> ConcurrentModificationException in Catalog stress tests
> ---
>
> Key: IMPALA-8243
> URL: https://issues.apache.org/jira/browse/IMPALA-8243
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.1.0
> Reporter: bharath v
> Assignee: bharath v
> Priority: Blocker
>
> Following is the full stack from the Catalog server logs.
> {noformat}
> 14:09:29.474424 14829 jni-util.cc:256] 
> java.util.ConcurrentModificationException
> java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> java.util.ArrayList$Itr.next(ArrayList.java:851)
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1449)
> org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1278)
> org.apache.hadoop.hive.metastore.api.StorageDescriptor.write(StorageDescriptor.java:1144)
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1062)
> org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:919)
> org.apache.hadoop.hive.metastore.api.Partition.write(Partition.java:815)
> org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:862)
> org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:759)
> org.apache.impala.thrift.TPartialPartitionInfo.write(TPartialPartitionInfo.java:665)
> org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:731)
> org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:624)
> org.apache.impala.thrift.TPartialTableInfo.write(TPartialTableInfo.java:543)
> org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:977)
> org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:857)
> org.apache.impala.thrift.TGetPartialCatalogObjectResponse.write(TGetPartialCatalogObjectResponse.java:739)
> org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:233)
> {noformat}
> It looks like the bug is in the following piece of code.
> {noformat}
> /**
>  * Returns a Hive-compatible partition object that may be used in calls to the
>  * metastore.
>  */
> public org.apache.hadoop.hive.metastore.api.Partition toHmsPartition() {
>   if (cachedMsPartitionDescriptor_ == null) return null;
>   Preconditions.checkNotNull(table_.getNonPartitionFieldSchemas());
>   // Update the serde library class based on the currently used file format.
>   org.apache.hadoop.hive.metastore.api.StorageDescriptor storageDescriptor =
>       new org.apache.hadoop.hive.metastore.api.StorageDescriptor(
>           table_.getNonPartitionFieldSchemas(),  // <= Reference to the actual field schema list.
>           getLocation(),
>           cachedMsPartitionDescriptor_.sdInputFormat,
>           cachedMsPartitionDescriptor_.sdOutputFormat,
>           cachedMsPartitionDescriptor_.sdCompressed,
> {noformat}
> It appears we are leaking a reference to {{nonPartFieldSchemas_}} into the
> Thrift object. Once the thread leaves the lock scope, another thread
> (load(), for example) can modify the source list while it is being
> serialized, and the serialization code then throws
> {{ConcurrentModificationException}}.
> While the stack above is Catalog-v2 only, other threads could race in a
> similar fashion.
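
For illustration only, the failure mode described above can be sketched outside Impala. The class and method names below (LeakedListDemo, Table, getColsLeaky, reload) are hypothetical and are not Impala code. The reload is also triggered from the iterating thread itself to make the outcome deterministic, whereas in the report the modification comes from a different thread after the lock is released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class LeakedListDemo {
  // Hypothetical stand-in for a table that owns a mutable schema list.
  static class Table {
    private final List<String> cols = new ArrayList<>(Arrays.asList("id", "name"));

    // Leaks the internal list: callers see every later modification.
    List<String> getColsLeaky() { return cols; }

    // Stand-in for a refresh/load that rebuilds the list in place.
    void reload() {
      cols.clear();
      cols.addAll(Arrays.asList("id", "name", "ts"));
    }
  }

  // Iterates over the leaked list while the owner modifies it.
  // Returns true if the fail-fast iterator detected the modification.
  static boolean iterateWhileReloading(Table t) {
    List<String> leaked = t.getColsLeaky();
    try {
      for (String col : leaked) {
        t.reload();  // simulates the concurrent writer
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(iterateWhileReloading(new Table()));  // prints "true"
  }
}
```

In the stack trace above, the iterating code is the Thrift serializer walking the list held by StorageDescriptor, which fits the same pattern.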



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8243) ConcurrentModificationException in Catalog stress tests

2019-02-25 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777588#comment-16777588
 ] 

ASF subversion and git services commented on IMPALA-8243:
-

Commit fced1cc1bbe62bb94b82962aea0560df8ed00d2d in impala's branch 
refs/heads/master from Bharath Vissapragada
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fced1cc ]

IMPALA-8243: Fix racy access to nonPartFieldSchemas_

** Please refer to the jira for the full stacktrace.

When constructing the HMS partition state for a given partition,
we are leaking a reference to HdfsTable#nonPartFieldSchemas_ list.
The construction happens under a table lock. But once the thread
exits the lock scope, the source list could be racily modified by
another operation (say refresh) and that interferes with the original
thread if it tries to access the list.

The fix is to make a shallow copy of the source list so that any
changes to the list do not affect the original caller.
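
A minimal sketch of the shallow-copy idea, not the actual patch (the diff is not part of this thread). The names ShallowCopyDemo, Table, getColsSnapshot and reload are hypothetical. The sketch assumes the copy is taken while the caller still holds the lock, and that the elements themselves are not mutated afterwards, since a shallow copy shares them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShallowCopyDemo {
  // Hypothetical stand-in for a table that owns a mutable schema list.
  static class Table {
    private final List<String> cols = new ArrayList<>(Arrays.asList("id", "name"));

    // Returns a shallow copy: a new list holding the same element references.
    // Later structural changes to cols do not affect the returned list.
    List<String> getColsSnapshot() { return new ArrayList<>(cols); }

    // Stand-in for a refresh/load that rebuilds the list in place.
    void reload() {
      cols.clear();
      cols.addAll(Arrays.asList("id", "name", "ts"));
    }
  }

  // Iterates over the snapshot while the owner modifies its own list.
  static List<String> iterateWhileReloading(Table t) {
    List<String> snapshot = t.getColsSnapshot();
    List<String> seen = new ArrayList<>();
    for (String col : snapshot) {
      t.reload();  // simulates the concurrent writer; the snapshot is untouched
      seen.add(col);
    }
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(iterateWhileReloading(new Table()));  // prints "[id, name]"
  }
}
```

The trade-off is one extra list allocation per call in exchange for the caller never observing later changes to the source list.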

This was found by a stress test under heavy concurrency of refresh
operations + GetPartialCatalogObject() calls.

Testing:
-
- I tried a bunch of combinations of operations in the unit-test
framework but I couldn't reproduce the stack trace, probably because
the operations are very short-lived

- However, after deploying this patched jar on the stress test cluster,
this exception never happened again.

Change-Id: I7d68b54af2ba954cf0ffa7b2533cde7be835be77
Reviewed-on: http://gerrit.cloudera.org:8080/12572
Tested-by: Impala Public Jenkins 
Reviewed-by: Paul Rogers 


> ConcurrentModificationException in Catalog stress tests
> ---
>
> Key: IMPALA-8243
> URL: https://issues.apache.org/jira/browse/IMPALA-8243
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 3.1.0
> Reporter: bharath v
> Assignee: bharath v
> Priority: Major