Copilot commented on code in PR #17800: URL: https://github.com/apache/pinot/pull/17800#discussion_r2876673586
########## pinot-spi/src/main/java/org/apache/pinot/spi/utils/PinotMd5Mode.java: ########## @@ -0,0 +1,46 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.pinot.spi.utils; + +/** + * Global runtime switch for disabling MD5-dependent code paths in Pinot. + */ +public class PinotMd5Mode { + private static volatile boolean _pinotMd5Disabled = readFromSystemProperty(); + + private PinotMd5Mode() { + } + + public static synchronized void setPinotMd5Disabled(boolean pinotMd5Disabled) { + _pinotMd5Disabled = pinotMd5Disabled; + } + + public static synchronized boolean isPinotMd5Disabled() { + return _pinotMd5Disabled; + } + + // Visible for tests in the same package. + static synchronized void resetFromSystemProperty() { Review Comment: `PinotMd5Mode.isPinotMd5Disabled()` is `synchronized` even though the backing field is `volatile`. This method is called in hot paths (e.g., `HashUtils.hashPrimaryKey()` used during upsert/dedup), so the synchronization can introduce avoidable lock contention. Consider removing `synchronized` from the getter (and likely the setter) and rely on `volatile` or an `AtomicBoolean` instead. ```suggestion public static void setPinotMd5Disabled(boolean pinotMd5Disabled) { _pinotMd5Disabled = pinotMd5Disabled; } public static boolean isPinotMd5Disabled() { return _pinotMd5Disabled; } // Visible for tests in the same package. static void resetFromSystemProperty() { ``` ########## pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java: ########## @@ -141,6 +143,35 @@ public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceC assertTrue(sasTokenFS != null); } + @Test + public void testChecksumDisabledWhenMd5Disabled() + throws Exception { + PinotConfiguration pinotConfiguration = new PinotConfiguration(); + pinotConfiguration.setProperty("authenticationType", "SAS_TOKEN"); + pinotConfiguration.setProperty("sasToken", "sp=rwdl&se=2025-12-31T23:59:59Z&sv=2022-11-02&sr=c&sig=test"); + pinotConfiguration.setProperty("accountName", "testaccount"); + pinotConfiguration.setProperty("fileSystemName", "testcontainer"); + pinotConfiguration.setProperty("enableChecksum", "true"); + + ADLSGen2PinotFS adlsGen2PinotFs = new ADLSGen2PinotFS() { + @Override + public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient, + String fileSystemName) { + return _mockFileSystemClient; + } + }; + + try { + PinotMd5Mode.setPinotMd5Disabled(true); + adlsGen2PinotFs.init(pinotConfiguration); + Field enableChecksumField = ADLSGen2PinotFS.class.getDeclaredField("_enableChecksum"); + enableChecksumField.setAccessible(true); + assertFalse(enableChecksumField.getBoolean(adlsGen2PinotFs)); Review Comment: This test uses reflection to access the private `_enableChecksum` field, which is brittle (it will break on refactors/renames without changing behavior). Prefer asserting behavior via the public API (e.g., verifying checksum headers/hashes are not computed/sent when MD5 is disabled) or adding a small `@VisibleForTesting` accessor instead of reflective field access. ```suggestion // Initialization should succeed even when checksum is requested in the configuration // but MD5 computation is globally disabled. adlsGen2PinotFs.init(pinotConfiguration); ``` ########## pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java: ########## @@ -123,7 +125,13 @@ public ADLSGen2PinotFS(DataLakeFileSystemClient fileSystemClient) { @Override public void init(PinotConfiguration config) { - _enableChecksum = config.getProperty(ENABLE_CHECKSUM, false); + boolean checksumEnabled = config.getProperty(ENABLE_CHECKSUM, false); + if (checksumEnabled && PinotMd5Mode.isPinotMd5Disabled()) { + LOGGER.warn("Checksum is enabled for ADLSGen2PinotFS but MD5 is globally disabled via '{}=true'; " + + "running with checksum disabled", + CommonConstants.CONFIG_OF_PINOT_MD5_DISABLED); + } + _enableChecksum = checksumEnabled && !PinotMd5Mode.isPinotMd5Disabled(); Review Comment: `_enableChecksum` is correctly forced off when MD5 is globally disabled, but `copySrcToDst()` still forwards `pathProperties.getContentMd5()` into `copyInputStreamToDst(...)`, which will set MD5 metadata whenever it exists (even when `_enableChecksum` is false). To fully honor the global MD5 disable switch, gate that metadata propagation as well (e.g., pass `null` when MD5 is disabled). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
