Eliaaazzz opened a new pull request, #37354:
URL: https://github.com/apache/beam/pull/37354

   Fixes #37351
   This fixes a bug where ByteBuddyDoFnInvokerFactory would return the same 
cached invoker for different generic instantiations of the same DoFn class.
   
   Description:
   This PR addresses a critical bug in the ByteBuddyDoFnInvokerFactory where 
the caching mechanism for generated DoFnInvoker classes failed to distinguish 
between different generic instantiations of the same DoFn class.
   
   1. The Problem: Cache Collision on Generic Types
   The previous implementation utilized a ConcurrentHashMap<Class<?>, 
Constructor<?>> to cache the generated invoker constructors. The cache key was 
solely the DoFn class itself.
   
   In scenarios involving generic DoFn classes (e.g., class MyGenericFn<T> 
extends DoFn<T, T>), the raw class object is identical regardless of the type 
parameter T.
   
   Scenario: If a pipeline instantiates MyGenericFn<String> and later 
MyGenericFn<Integer>, the factory would generate an invoker for String and 
cache it.
   
   2. The Bug: When MyGenericFn<Integer> is requested, the factory hits the 
cache using the same MyGenericFn.class key and returns the String-specialized 
invoker.
   
   Consequence: This leads to runtime ClassCastException or incorrect Coder 
inference, as the bytecode for the invoker is specialized for the wrong type.
   
   3. The Solution: Type-Aware Caching & Naming
   This PR introduces a composite cache key and updates the class naming 
strategy to ensure full isolation between generic types.
   
   A. Introduced InvokerCacheKey
   I replaced the simple Class<?> key with a new static inner class 
InvokerCacheKey. This key encapsulates three components:
   
   fnClass: The raw class of the DoFn.
   
   inputType: The TypeDescriptor for the input element.
   
   outputType: The TypeDescriptor for the output element.
   
   By implementing equals() and hashCode() based on all three fields, we ensure 
that MyGenericFn<String> and MyGenericFn<Integer> map to distinct cache entries.
   
   B. Unique Class Naming Strategy
   ByteBuddy cannot define two different classes with the exact same name in 
the same ClassLoader. To support multiple generated invokers for the same raw 
DoFn class, I modified generateInvokerClass.
   
   Old Strategy: <DoFnClassName>$DoFnInvoker
   
   New Strategy: <DoFnClassName>$DoFnInvoker$<TypeHash>
   
   The suffix now includes a hexadecimal hash derived from the input and output 
TypeDescriptors. This guarantees that the generated bytecode class names are 
unique for each generic instantiation.
   
   
   Test Plan
   I have added a regression test testCacheKeyCollisionProof to 
DoFnInvokersTest.java.
   
   Methodology: The test defines a local generic class DynamicTypeDoFn<T> and 
manually forces specific TypeDescriptor returns.
   
   Verification: It instantiates the DoFn twice (once for String, once for 
Integer) and asserts that the generated Invoker classes are not the same 
(assertNotSame).
   
   Result: The test passes with the fix (confirming distinct invokers are 
generated) and fails without it.
   
   Existing tests passed:
   
   DoFnInvokersTest
   
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [x] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit 
[https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   
------------------------------------------------------------------------------------------------
   [![Build python source distribution and 
wheels](https://github.com/apache/beam/actions/workflows/build_wheels.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python 
tests](https://github.com/apache/beam/actions/workflows/python_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java 
tests](https://github.com/apache/beam/actions/workflows/java_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go 
tests](https://github.com/apache/beam/actions/workflows/go_tests.yml/badge.svg?event=schedule&&?branch=master)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more 
information about GitHub Actions CI or the [workflows 
README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) 
to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to