[ https://issues.apache.org/jira/browse/HIVE-26419?focusedWorklogId=794338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794338 ]
ASF GitHub Bot logged work on HIVE-26419:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jul/22 18:24
            Start Date: 22/Jul/22 18:24
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on code in PR #3466:
URL: https://github.com/apache/hive/pull/3466#discussion_r927895735


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java:
##########
@@ -253,8 +253,12 @@ private static PersistenceManagerFactory initPMF(Configuration conf, boolean for
     } else {
       try {
         DataSource ds = (maxPoolSize > 0) ? dsp.create(conf, maxPoolSize) : dsp.create(conf);
+        // The secondary connection factory is used for schema generation, and for value generation operations.
+        // We should use a different pool for the secondary connection factory to avoid resource starvation.
+        // Since DataNucleus uses locks for schema generation and value generation, 2 connections should be sufficient.
+        DataSource ds2 = dsp.create(conf, /* maxPoolSize */ 2);

Review Comment:
   @hsnusonic, according to the documentation:

   Datastore connections are obtained from up to 2 connection factories.
   The primary connection factory is used for persistence operations, and
   optionally for value generation operations. The secondary connection
   factory is used for schema generation, and optionally for value
   generation operations.

   Schema generation happens at HMS startup. Where do you see a potential
   issue with resource starvation?

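The starvation scenario reported in the issue (every thread holds one connection and then asks for a second one for value generation) can be pictured with a small, self-contained sketch. This is not Hive or DataNucleus code: a `java.util.concurrent.Semaphore` stands in for a connection pool, and the names `PoolStarvationSketch` and `countStarved` are hypothetical, chosen only for this illustration. It assumes the primary pool has at least as many permits as there are worker threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolStarvationSketch {

  /**
   * Each worker takes one permit from `primary`, waits until every worker
   * holds one, then asks `secondary` for a second permit (the stand-in for a
   * value generation connection). Returns how many workers timed out.
   * Assumption: `primary` has at least `threads` permits.
   */
  public static int countStarved(Semaphore primary, Semaphore secondary,
                                 int threads, long timeoutMs)
      throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    CountDownLatch allHoldPrimary = new CountDownLatch(threads);
    CountDownLatch allAttempted = new CountDownLatch(threads);
    AtomicInteger starved = new AtomicInteger();

    for (int i = 0; i < threads; i++) {
      executor.submit(() -> {
        primary.acquire();                  // connection for the persistence operation
        try {
          allHoldPrimary.countDown();
          allHoldPrimary.await();           // worst case: every worker holds one connection
          if (secondary.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            secondary.release();            // value generation is short; return it at once
          } else {
            starved.incrementAndGet();      // no second connection became available
          }
          allAttempted.countDown();
          allAttempted.await();             // keep the first connection until all have tried
        } finally {
          primary.release();
        }
        return null;                        // makes the lambda a Callable
      });
    }

    executor.shutdown();
    executor.awaitTermination(1, TimeUnit.MINUTES);
    return starved.get();
  }

  public static void main(String[] args) throws InterruptedException {
    int threads = 4;

    // One pool shared by both connection factories.
    Semaphore shared = new Semaphore(threads);
    System.out.println("shared pool, starved workers: "
        + countStarved(shared, shared, threads, 200));

    // A dedicated pool of 2 for the secondary connection factory.
    System.out.println("separate pool, starved workers: "
        + countStarved(new Semaphore(threads), new Semaphore(2), threads, 200));
  }
}
```

In the shared case no worker can obtain its second permit, because all permits are held by workers that are themselves waiting. With a small dedicated pool the second request does not compete with the connections already held, which is the reasoning behind `ds2` in the diff above.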
Issue Time Tracking
-------------------

    Worklog Id:     (was: 794338)
    Time Spent: 0.5h  (was: 20m)

> Use a different pool for DataNucleus' secondary connection factory
> ------------------------------------------------------------------
>
>                 Key: HIVE-26419
>                 URL: https://issues.apache.org/jira/browse/HIVE-26419
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and
> secondary connection factories. In the worst case, each thread holds one
> connection and requests another connection for value generation, but no
> connection is available in the pool. The request keeps retrying and
> eventually fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)