I think the goals are good:

 - be able to release fixes quicker
 - have users discover PubsubLiteIO

Just to clarify a little: a user currently has to depend on (probably)
org.apache.beam:beam-sdks-java-core,
org.apache.beam:beam-runners-direct-java,
org.apache.beam:beam-sdks-java-io-google-cloud-platform, and some production
runner. Without the GCP IO dependency, there will be no IDE autocomplete
anyhow. So the proposal is almost entirely about sparing the user from adding
com.google.cloud:pubsublite-beam-io or depending on the nightly
snapshot.

As Luke mentioned, IOs outside of the Beam repo already exist, and that is
fine. Decoupled releases are the hard part. I've had a few discussions
about decoupled releases within the same repo, and they have all the same
problems whether the IO lives in the same repo or not. In some ways it is
easier outside the repo because it removes the temptation to couple things
too much. I think getting a good version compatibility test matrix and
benchmarking in place might be the big task here, and you'd want much more
automation in the release. Incidentally, fixes already do not have to be
coupled with an upgrade of all of Beam: you can use a different version for
an IO, or choose a snapshot just for the IO dependency. The missing piece is
the testing mentioned above. You want to be sure your new version of the IO
is going to work with old versions of the core SDK.
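A compatibility test matrix of the kind described above could start as a plain enumeration of (IO release, core SDK release) pairs. The sketch below is hypothetical: the class name `CompatMatrix`, the version strings (including `0.17.0`), and the policy that each IO release must support the newest `window` core releases are illustrative assumptions, not an existing Beam or Pub/Sub Lite policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerate (IO release, core SDK release) test cells. */
class CompatMatrix {
  /**
   * @param ioVersions IO releases to validate
   * @param coreVersions core SDK releases, oldest first
   * @param window number of newest core releases each IO release must support (assumed policy)
   */
  static List<String> cells(List<String> ioVersions, List<String> coreVersions, int window) {
    if (window < 1 || window > coreVersions.size()) {
      throw new IllegalArgumentException("window must be between 1 and " + coreVersions.size());
    }
    // Keep only the newest `window` core releases.
    List<String> supported =
        coreVersions.subList(coreVersions.size() - window, coreVersions.size());
    List<String> cells = new ArrayList<>();
    for (String io : ioVersions) {
      for (String core : supported) {
        cells.add(io + " x core " + core);
      }
    }
    return cells;
  }

  public static void main(String[] args) {
    // Illustrative versions only.
    List<String> cells =
        cells(List.of("0.16.0", "0.17.0"), List.of("2.28.0", "2.29.0", "2.30.0"), 2);
    cells.forEach(System.out::println);
  }
}
```

Here `main` prints four cells, pairing each IO release with core 2.29.0 and 2.30.0. Each cell would then drive one run of the IO's integration tests against that core version, which keeps "works with old versions of the core SDK" an explicit, checkable promise.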

Regarding the circular dep: I agree that there should not be one. In your
proposal, org.apache.beam:beam-sdks-java-io-google-cloud-platform depends
on com.google.cloud:pubsublite-beam-io, and both of those modules
depend on org.apache.beam:beam-sdks-java-core. The core SDK does not depend
on any IO (and we should keep it this way, for sure).
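The absence of a cycle can be checked mechanically at the artifact level, with versions ignored. A minimal sketch, assuming the three edges described above are the only relevant ones; `DepGraph` is a hypothetical helper that runs a depth-first search and reports a cycle when it reaches a node that is still on the current path:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Artifact-level dependency graph (versions ignored) with a DFS cycle check. */
class DepGraph {
  private final Map<String, List<String>> edges = new HashMap<>();

  /** Records that {@code from} depends on each artifact in {@code to}. */
  void add(String from, String... to) {
    edges.computeIfAbsent(from, k -> new ArrayList<>()).addAll(List.of(to));
  }

  boolean hasCycle() {
    Set<String> finished = new HashSet<>();
    Set<String> onPath = new HashSet<>();
    for (String node : edges.keySet()) {
      if (visit(node, finished, onPath)) {
        return true;
      }
    }
    return false;
  }

  private boolean visit(String node, Set<String> finished, Set<String> onPath) {
    if (onPath.contains(node)) {
      return true; // back edge to a node on the current path: cycle
    }
    if (finished.contains(node)) {
      return false; // fully explored earlier without finding a cycle
    }
    onPath.add(node);
    for (String next : edges.getOrDefault(node, List.of())) {
      if (visit(next, finished, onPath)) {
        return true;
      }
    }
    onPath.remove(node);
    finished.add(node);
    return false;
  }

  /** The edges described in the proposal. */
  static DepGraph proposal() {
    DepGraph g = new DepGraph();
    g.add("beam-sdks-java-io-google-cloud-platform", "pubsublite-beam-io", "beam-sdks-java-core");
    g.add("pubsublite-beam-io", "beam-sdks-java-core");
    return g;
  }

  public static void main(String[] args) {
    DepGraph g = proposal();
    System.out.println(g.hasCycle()); // false
    // A core -> IO edge is exactly what must never be added:
    g.add("beam-sdks-java-core", "beam-sdks-java-io-google-cloud-platform");
    System.out.println(g.hasCycle()); // true
  }
}
```

Versions are deliberately left out: the worry about `beam-sdks-java-io-google-cloud-platform:2.30.0` pulling in `beam-sdks-java-core:2.29.0` is a version-mediation question, not a cycle in this graph.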

But in addition to Reuven's simple idea, I also have to push on whether we
can do this the "normal" way: refer to it in docs, and have examples for
users to copy/paste/modify that already include the needed deps. Our
current example pipelines do not serve this purpose because they are
integrated with our build system rather than being standalone, but it would
be very easy to make a standalone example "PubsubLite to blobstore" pipeline
or something similar, including a working pom.xml, and I expect most users
would start from that.

Kenn

On Thu, Jul 1, 2021 at 5:41 PM Reuven Lax wrote:

> There already is a nightly snapshot that users can use.
>
> On Thu, Jul 1, 2021 at 5:22 PM Evan Galpin wrote:
>
>> Is there any possibility of changing the build cadence to allow builds
>> released as alpha versions or similar? It's not uncommon for projects to
>> have nightly builds, for example. Could that help deliver fixes to
>> customers more quickly, while also avoiding the nuisances mentioned in
>> this thread?
>>
>> Thanks,
>> Evan
>>
>> On Thu, Jul 1, 2021 at 12:39 Luke Cwik wrote:
>>
>>> I wouldn't say this is uncharted territory, as there are Apache Beam
>>> IOs [1] that live outside of the Apache Beam git repo.
>>>
>>> The most annoying aspect will be the versioning story: users will want
>>> to use the library with different versions of Apache Beam, since some
>>> people won't want to upgrade because they have something working, and
>>> others will want it against the latest version because they want some
>>> feature. Apache Beam has had a pretty good track record of maintaining API
>>> compatibility, so I wouldn't be too worried about that. The real issue is
>>> third-party dependency convergence and managing a BOM that works for your
>>> users.
>>>
>>> 1: https://github.com/SolaceProducts/solace-apache-beam
>>>
>>> On Wed, Jun 30, 2021 at 3:45 PM Ahmet Altay wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I think you are in a better place to make this decision. You are the
>>>> primary contributor and maintainer for this IO and you clearly know the
>>>> pubsub lite user base as well. If you think this is the best course of
>>>> action I will support that.
>>>>
>>>> That said, afaik you are moving into uncharted territory. The
>>>> questions raised here (support, testing, discoverability,
>>>> compatibility, potential circular dependency issues) are all unknowns
>>>> for this model. You might have good answers to them, but some of them
>>>> (or other unknowns) might still become problematic and result in user
>>>> confusion and frustration, and you will have to address those if/when
>>>> that happens.
>>>>
>>>> I like that this model still allows discoverability through Beam and by
>>>> default supports an out-of-the-box tested version. I guess that will be
>>>> good enough for most Beam + Pub/Sub Lite users. And I hope the model
>>>> will, as you predict, give you a quick way to address user requests.
>>>>
>>>> One more question of my own: do you expect the Pub/Sub Lite IO to
>>>> continue to receive frequent updates in the long term? (For example,
>>>> afaik the Pub/Sub IO no longer needs or gets frequent updates.) If not,
>>>> keeping the IO external might eventually become irrelevant.
>>>>
>>>> What do you need from this community to make progress on this question?
>>>>
>>>> Ahmet
>>>>
>>>>
>>>>
>>>> On Fri, Jun 18, 2021 at 11:27 AM Daniel Collins wrote:
>>>>
>>>>> > How will this be communicated to the user?
>>>>>
>>>>> The docstring on PubsubLiteIO in Beam will mention this. If users get
>>>>> the one subject to the long release cycle, that's usually okay unless
>>>>> they need recently added features/fixes. Pub/Sub Lite's documentation
>>>>> will recommend the one from our artifact, but the expectation is that
>>>>> the one in Beam will work fine in recent releases.
>>>>>
>>>>> > Will it just be documented somewhere that users should prefer
>>>>> > com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix
>>>>> > they need?
>>>>>
>>>>> Yes, both in our public docs and the docstring for the Beam
>>>>> PubsubLiteIO.
>>>>>
>>>>> An interesting side effect of subclassing in this way is that if the
>>>>> user adds a newer version of the PubsubLiteIO implementation-specific
>>>>> artifact in their pom, they won't actually need to make any code changes:
>>>>> the beam PubsubLiteIO will transparently refer to the new implementation
>>>>> version.
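A minimal sketch of why the Beam subclass picks up a newer implementation transparently, with hypothetical names: `UpstreamIO` stands in for `com.google.cloud.pubsublite.beam.PubsubLiteIO`, `BeamIO` for the empty Beam-side subclass, and `read` for whatever static factories the real class exposes. Static methods are found through the subclass name, and `javac` records `BeamIO` as the qualifying type of a call like `BeamIO.read(...)`, so the JVM resolves the method by searching `BeamIO`'s superclasses at link time. A newer `UpstreamIO` on the classpath therefore supplies the implementation without recompiling user code, provided the method signatures stay compatible.

```java
/** Hypothetical stand-in for com.google.cloud.pubsublite.beam.PubsubLiteIO. */
class UpstreamIO {
  // The subclass must be able to reach a constructor; a private constructor
  // here would make "extends UpstreamIO" fail to compile.
  protected UpstreamIO() {}

  /** Hypothetical static factory; the real class's methods differ. */
  public static String read(String subscription) {
    return "upstream read of " + subscription;
  }
}

/** Hypothetical stand-in for the Beam-side class: an empty subclass. */
class BeamIO extends UpstreamIO {}

class SubclassDemo {
  public static void main(String[] args) {
    // Declared in UpstreamIO, but callable through the subclass name.
    System.out.println(BeamIO.read("my-subscription"));
  }
}
```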
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 18, 2021 at 1:53 PM Brian Hulette wrote:
>>>>>
>>>>>> How will this be communicated to the user? The idea is that they will
>>>>>> discover PubsubLiteIO through their IDE as you described, but that will
>>>>>> get them to the Beam one that's subject to the long release cycle. Will
>>>>>> it just be documented somewhere that users should prefer
>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO if there's a recent fix
>>>>>> they need?
>>>>>>
>>>>>> I wonder if a similar result could be achieved just by making Beam's
>>>>>> PubsubLiteIO a stub with no implementation that somehow directs users to
>>>>>> the com.google.cloud one?
>>>>>>
>>>>>> The Matcher interface bundled with JUnit 4 (org.hamcrest.Matcher) comes
>>>>>> to mind as a precedent here. I have been warned many times by
>>>>>> Matcher._dont_implement_Matcher___instead_extend_BaseMatcher_ [1].
>>>>>>
>>>>>> [1]
>>>>>> https://junit.org/junit4/javadoc/4.13/org/hamcrest/Matcher.html#_dont_implement_Matcher___instead_extend_BaseMatcher_()
>>>>>>
>>>>>> Brian
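Brian's precedent relies on a method whose name is itself the instruction. A minimal sketch of that pattern, using hypothetical names (`Check`, `BaseCheck`, `IsPositive`) rather than Hamcrest's actual classes: the interface declares a deprecated sentinel method and the abstract base class implements it, so extending the base is the path of least resistance, while implementing the interface directly forces the author to write out the warning.

```java
/** Hypothetical interface in the style of org.hamcrest.Matcher. */
interface Check<T> {
  boolean test(Object item);

  /**
   * Sentinel: the name is the instruction. Implementing Check directly forces
   * you to write this method out; extending BaseCheck does not.
   */
  @Deprecated
  void _dont_implement_Check___instead_extend_BaseCheck_();
}

/** The supported base class; it implements the sentinel once for everyone. */
abstract class BaseCheck<T> implements Check<T> {
  @Override
  @Deprecated
  public final void _dont_implement_Check___instead_extend_BaseCheck_() {
    // Intentionally empty.
  }
}

/** A user-written check that takes the intended path. */
class IsPositive extends BaseCheck<Integer> {
  @Override
  public boolean test(Object item) {
    return item instanceof Integer && (Integer) item > 0;
  }
}
```

A Beam-side stub could apply the same idea through its Javadoc and deprecation markers, pointing users at `com.google.cloud.pubsublite.beam.PubsubLiteIO`.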
>>>>>>
>>>>>> On Thu, Jun 17, 2021 at 3:56 PM Daniel Collins wrote:
>>>>>>
>>>>>>> > Question 1: How are you going to approach testing/CI?
>>>>>>> > The pull requests in the java-pubsublite repo do not trigger the
>>>>>>> > Beam repo's CI. You want to deliver things to your customers after
>>>>>>> > they are tested as much as possible.
>>>>>>>
>>>>>>> I'd like to run the integration tests in both locations. They would
>>>>>>> only be meaningful in the beam setup when we went to validate a version
>>>>>>> bump on the I/O.
>>>>>>>
>>>>>>> > Question 2: In the code below, what is the purpose of keeping the
>>>>>>> > PubsubLiteIO in the Beam repo?
>>>>>>>
>>>>>>> Visibility and autocomplete. It means the core class will be in the
>>>>>>> Beam javadoc, and if you type `import org.apache.beam.sdk.io.gcp.pubsu`
>>>>>>> in an IDE you'll see pubsublite and PubsubLiteIO.
>>>>>>>
>>>>>>> On Thu, Jun 17, 2021 at 5:35 PM Tomo Suzuki wrote:
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>> (You helped me apply some change to this strange setup a few months
>>>>>>>> back. Thank you for working on rectifying the situation.)
>>>>>>>>
>>>>>>>> I like that idea overall.
>>>>>>>>
>>>>>>>> Question 1: How are you going to approach testing/CI?
>>>>>>>> The pull requests in the java-pubsublite repo do not trigger the Beam
>>>>>>>> repo's CI. You want to deliver things to your customers after they
>>>>>>>> are tested as much as possible.
>>>>>>>>
>>>>>>>>
>>>>>>>> Question 2: In the code below, what is the purpose of keeping the
>>>>>>>> PubsubLiteIO in the Beam repo?
>>>>>>>>
>>>>>>>> ```
>>>>>>>> class PubsubLiteIO extends
>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>> ```
>>>>>>>>
>>>>>>>> The backward compatibility came to my mind but I thought you may
>>>>>>>> have more reasons.
>>>>>>>>
>>>>>>>>
>>>>>>>> My notes:
>>>>>>>> The java-pubsublite repository has:
>>>>>>>> https://github.com/googleapis/java-pubsublite/blob/master/pubsublite-beam-io/src/main/java/com/google/cloud/pubsublite/beam/PubsubLiteIO.java
>>>>>>>> The Beam repo has:
>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsublite/PubsubLiteIO.java
>>>>>>>> (and other files in the same directory).
>>>>>>>> google-cloud-pubsublite is not part of the Libraries BOM (yet)
>>>>>>>> because of its pre-1.0 status.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 17, 2021 at 5:07 PM Daniel Collins wrote:
>>>>>>>>
>>>>>>>>> I don't know that the cycle would cause a problem; wouldn't
>>>>>>>>> dependency resolution override it and use beam-sdks-java-core:2.30.0
>>>>>>>>> (at least until Beam goes to 3.X.X)?
>>>>>>>>>
>>>>>>>>> If this is an issue, we could mark pubsublite-beam-io's dependency
>>>>>>>>> on beam-sdks-java-core as 'provided'. But I'd prefer to avoid that
>>>>>>>>> and just let overriding fix it, if that works.
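This expectation matches Maven's documented "nearest definition" mediation rule: when several versions of one artifact are reachable, Maven uses the one with the shortest path from the project, and among equally short paths, the one declared first. The sketch below models only that rule (not `<dependencyManagement>`, exclusions, or scopes); `NearestWins` and its `Path` type are hypothetical, and depth counts hops from the user's project.

```java
import java.util.List;

/** Simplified, hypothetical model of Maven's "nearest definition" rule. */
class NearestWins {
  /** One way a version of the artifact is reachable; depth = hops from the project. */
  static final class Path {
    final int depth;
    final String version;

    Path(int depth, String version) {
      this.depth = depth;
      this.version = version;
    }
  }

  /** Candidates in declaration order; the shortest path wins, ties keep the earlier one. */
  static String resolve(List<Path> candidates) {
    Path best = null;
    for (Path p : candidates) {
      if (best == null || p.depth < best.depth) {
        best = p;
      }
    }
    if (best == null) {
      throw new IllegalArgumentException("no candidate versions");
    }
    return best.version;
  }

  public static void main(String[] args) {
    // beam-sdks-java-core as reached from a project that depends on
    // beam-sdks-java-io-google-cloud-platform:2.30.0 (chain from this thread):
    List<Path> core = List.of(
        new Path(3, "2.29.0"), // project -> io-gcp:2.30.0 -> pubsublite-beam-io:0.16.0 -> core
        new Path(2, "2.30.0")); // project -> io-gcp:2.30.0 -> core
    System.out.println(resolve(core)); // prints 2.30.0
  }
}
```

Here the direct edge from `beam-sdks-java-io-google-cloud-platform` to `beam-sdks-java-core:2.30.0` beats the longer path through `pubsublite-beam-io:0.16.0`. Gradle's default conflict resolution picks the highest version, which is also 2.30.0 here. Either way, `pubsublite-beam-io` then runs against a newer core than it was built with, which is the compatibility-testing question. Marking the dependency `provided`, the fallback mentioned above, would keep `beam-sdks-java-core:2.29.0` off consumers' classpaths entirely, since Maven does not pass `provided` dependencies on transitively.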
>>>>>>>>>
>>>>>>>>> On Thu, Jun 17, 2021 at 4:15 PM Andrew Pilloud wrote:
>>>>>>>>>
>>>>>>>>>> How do you plan to address the circular dependency? Won't this
>>>>>>>>>> end up with Beam depending on older versions of itself?
>>>>>>>>>>
>>>>>>>>>> beam-sdks-java-io-google-cloud-platform:2.30.0 ->
>>>>>>>>>> pubsublite-beam-io:0.16.0 -> beam-sdks-java-core:2.29.0
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 17, 2021 at 11:56 AM Daniel Collins wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello Beam developers,
>>>>>>>>>>>
>>>>>>>>>>> I'm the primary author of the Pub/Sub Lite I/O, and I'd like to
>>>>>>>>>>> get some feedback on a change to the model for hosting this I/O in
>>>>>>>>>>> Beam. Our team has been frustrated by the fact that we have no way
>>>>>>>>>>> to release features or bug fixes to customers on time scales
>>>>>>>>>>> shorter than the 1-2 months of the Beam release cycle, and that
>>>>>>>>>>> those fixes are necessarily coupled with a Beam version upgrade. To
>>>>>>>>>>> work around this, I forked the I/O in Beam to our own repo about 6
>>>>>>>>>>> months ago and have been maintaining both copies in parallel.
>>>>>>>>>>>
>>>>>>>>>>> I'd like to retain our ability to quickly fix and improve the
>>>>>>>>>>> I/O while keeping end-user visibility within the Beam repo. To do
>>>>>>>>>>> this, I'd like to remove all the implementation from the Beam repo
>>>>>>>>>>> and leave the I/O there implemented as:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> class PubsubLiteIO extends
>>>>>>>>>>> com.google.cloud.pubsublite.beam.PubsubLiteIO {}
>>>>>>>>>>> ```
>>>>>>>>>>> and add a dependency on our Beam artifact.
>>>>>>>>>>>
>>>>>>>>>>> This enables Beam users who just want to use the
>>>>>>>>>>> beam-sdks-java-io-google-cloud-platform artifact to do so, while
>>>>>>>>>>> also letting them track the canonical version separately in our
>>>>>>>>>>> repo to get fixes and improvements at a faster rate. All static
>>>>>>>>>>> methods from the parent class would be available on the class in
>>>>>>>>>>> the Beam repo.
>>>>>>>>>>>
>>>>>>>>>>> I'd be interested to hear anyone's thoughts and suggestions
>>>>>>>>>>> surrounding this.
>>>>>>>>>>>
>>>>>>>>>>> -Daniel
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Tomo
>>>>>>>>
>>>>>>>
