It's working! Huge thank you to Steve Niemitz, who pointed out the need for
"--experiments=enable_custom_pubsub_sink" to prevent Dataflow from overriding
the module for which I wanted to use my custom source.

Here is my full process in case it's helpful to anyone in the future (note
that you may need to change the version identifiers):


   1. Modify files in sdks/java/io/google-cloud-platform
   2. Add id 'com.github.johnrengelman.shadow' to plugins in
   sdks/java/io/google-cloud-platform/build.gradle
   3. Build a shadowJar via
      "./gradlew :sdks:java:io:google-cloud-platform:shadowJar"
   4. Copy the shadowJar from
      my/path/to/beam/sdks/java/io/google-cloud-platform/build/libs/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT-all.jar
      to
      my/path/to/user/pipeline/top-level/libs/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.40.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT.jar
      (note that the "-all" suffix is dropped from the copied file name)
   5. Add a pom file for the shadowJar in the same directory, named
      beam-sdks-java-io-google-cloud-platform-2.40.0-SNAPSHOT.pom (to emulate
      a local Maven repo):

   <?xml version="1.0" encoding="UTF-8"?>
   <project xmlns="http://maven.apache.org/POM/4.0.0"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
       <version>2.40.0-SNAPSHOT</version>
   </project>

   6. In the user pipeline's "build.gradle", add a local Maven repo (here
   "./libs" refers to "my/path/to/user/pipeline/top-level/libs"):

   repositories {
       maven {
           url = uri('./libs')
       }
       // ... other repos ...
   }

   7. In the user pipeline's "build.gradle", substitute the released 2.40.0
   version of beam-sdks-java-io-google-cloud-platform with the local
   2.40.0-SNAPSHOT build:

   configurations {
       all {
           resolutionStrategy.dependencySubstitution {
               substitute(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0"))
                   .using(module("org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.40.0-SNAPSHOT"))
           }
       }
   }

   8. Deploy the user code pipeline including the flag:
   --experiments=enable_custom_pubsub_sink
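
The copy-and-POM part (steps 4 and 5) can be sketched in plain Java, assuming a file-based Maven repository rooted at the pipeline's libs directory. LocalRepoInstaller and its methods are hypothetical names for illustration; the code only relies on the standard Maven layout (groupId as directories, then artifactId, then version, holding artifactId-version.jar and artifactId-version.pom).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: installs a locally built jar plus a minimal POM into a file-based Maven repo. */
public class LocalRepoInstaller {

  /** Returns repoRoot/<groupId as dirs>/<artifactId>/<version>. */
  static Path artifactDir(Path repoRoot, String groupId, String artifactId, String version) {
    Path dir = repoRoot;
    // Maven layout: each dot-separated part of the groupId becomes a directory.
    for (String part : groupId.split("\\.")) {
      dir = dir.resolve(part);
    }
    return dir.resolve(artifactId).resolve(version);
  }

  /** Minimal POM with no dependencies, mirroring the one in step 5. */
  static String minimalPom(String groupId, String artifactId, String version) {
    return String.join("\n",
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<project xmlns=\"http://maven.apache.org/POM/4.0.0\"",
        "    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"",
        "    xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd\">",
        "  <modelVersion>4.0.0</modelVersion>",
        "  <groupId>" + groupId + "</groupId>",
        "  <artifactId>" + artifactId + "</artifactId>",
        "  <version>" + version + "</version>",
        "</project>",
        "");
  }

  /** Copies the built jar (step 4), writes the POM next to it (step 5), and returns the jar path. */
  static Path install(Path builtJar, Path repoRoot, String groupId, String artifactId, String version)
      throws IOException {
    Path dir = artifactDir(repoRoot, groupId, artifactId, version);
    Files.createDirectories(dir);
    String baseName = artifactId + "-" + version;
    // The target name has no "-all" suffix, so Gradle finds it as a plain artifact.
    Path jar = dir.resolve(baseName + ".jar");
    Files.copy(builtJar, jar, StandardCopyOption.REPLACE_EXISTING);
    Files.write(dir.resolve(baseName + ".pom"),
        minimalPom(groupId, artifactId, version).getBytes(StandardCharsets.UTF_8));
    return jar;
  }

  public static void main(String[] args) throws IOException {
    // args[0]: the built ...-SNAPSHOT-all.jar; args[1]: the pipeline's libs directory.
    Path installed = install(Paths.get(args[0]), Paths.get(args[1]),
        "org.apache.beam", "beam-sdks-java-io-google-cloud-platform", "2.40.0-SNAPSHOT");
    System.out.println("Installed " + installed);
  }
}
```

For example, `java LocalRepoInstaller.java <path-to-...-SNAPSHOT-all.jar> <path-to-libs>` on Java 11 or later. Because the generated POM declares no dependencies, Gradle has no transitive dependencies to resolve for the substituted module, which is presumably why the shadow (fat) jar was needed rather than the plain jar.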



On Thu, Jul 21, 2022 at 4:42 PM Evan Galpin <[email protected]> wrote:

> Thanks Tomo, I'll check that out too as a good safeguard!  Are you
> familiar with any process for building pre-release artifacts?  I suppose
> what I'm really after is building a pre-release version of PubsubIO to
> validate on Dataflow.
>
> - Evan
>
>
> On Thu, Jul 21, 2022 at 4:21 PM Tomo Suzuki via dev <[email protected]>
> wrote:
>
>> I don't have a solution (I'm not familiar with the method you're using).
>> However, I often use "getProtectionDomain()"
>> https://stackoverflow.com/a/56000383/975074 to find which JAR file a
>> class was loaded from. This lets you confirm that the class you modified
>> is actually the one being used.
>>
>> On Thu, Jul 21, 2022 at 3:35 PM Evan Galpin <[email protected]> wrote:
>>
>>> Spoke too soon... I still can't get the new behaviour to appear in
>>> Dataflow; possibly something is being overridden?
>>>
>>> On Thu, Jul 21, 2022 at 3:15 PM Evan Galpin <[email protected]> wrote:
>>>
>>>> Making a shadowJar from "beam-sdks-java-io-google-cloud-platform" looks
>>>> to be working. I added `id 'com.github.johnrengelman.shadow'` to
>>>> `build.gradle` for "beam-sdks-java-io-google-cloud-platform" in the Beam
>>>> source and used the resulting jar as a dependency replacement when
>>>> deploying the job to Dataflow. Looks OK.
>>>>
>>>> On Thu, Jul 21, 2022 at 3:02 PM Evan Galpin <[email protected]> wrote:
>>>>
>>>>> I believe I have the dependencySubstitution working, but it seems as
>>>>> though the substitution is removing transitive deps of
>>>>> "beam-sdks-java-io-google-cloud-platform", hmm...
>>>>>
>>>>> On Thu, Jul 21, 2022 at 1:15 PM Evan Galpin <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm trying to validate a change I've made locally by running it on
>>>>>> Dataflow. It works locally, but I want to confirm it on Dataflow too.
>>>>>> I've tried a few different approaches to module substitution in the
>>>>>> build.gradle config file for the pipeline I'm trying to deploy, but I
>>>>>> haven't had any success yet.
>>>>>>
>>>>>> How might I replace the beam-sdks-java-io-google-cloud-platform module
>>>>>> usually installed from Maven with a local jar generated by running:
>>>>>>
>>>>>> "./gradlew :sdk:java:io:google-cloud-platform:jar"
>>>>>>
>>>>>> Thanks,
>>>>>> Evan
>>>>>>
>>>>>
>>
>> --
>> Regards,
>> Tomo
>>
>
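
Tomo's getProtectionDomain() suggestion can be sketched as a small Java helper. JarLocator and locationOf are hypothetical names; the code relies only on the standard Class.getProtectionDomain() and CodeSource.getLocation() calls, and guards against a missing code source, which is what JDK classes loaded by the bootstrap loader report.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports the JAR (or class directory) a class was loaded from. */
public final class JarLocator {

  private JarLocator() {}

  /** Returns the code source location of clazz, or a marker string if it has none. */
  public static String locationOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      // Bootstrap classes such as java.lang.String have no code source.
      return "<no code source: " + clazz.getName() + ">";
    }
    URL location = source.getLocation();
    return location.toString();
  }

  public static void main(String[] args) {
    // In a pipeline, pass a class from the patched module instead (for example the
    // PubsubIO class) to see which jar actually supplied it.
    System.out.println(locationOf(JarLocator.class));
    System.out.println(locationOf(String.class));
  }
}
```

Logging this from code that runs on the workers (for example inside a DoFn), rather than only in the launcher, would presumably show which jar Dataflow actually loaded, which is exactly the kind of override the thread ran into.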
