Based on my local test, functions that use the String schema are not affected.
On 2021/07/19 18:47:49 Devin Bost wrote:
> > This leads to an IncompatibleClassChangeError when you have a Function or a Connector that is using Schema.JSON(Pojo.class)
>
> I just noticed this detail. Do we have a sense of how often people are using Schema.JSON in Functions/Connectors? Most of our functions use a string schema, so it's not clear to me whether they would be impacted.
>
> Devin G. Bost
>
> On Mon, Jul 19, 2021 at 12:41 PM Devin Bost <devin.b...@gmail.com> wrote:
>
> > > I think Sijie is referring to using KubernetesRuntime to deploy functions where each function/source/sink runs as an independent StatefulSet in K8s. In this scenario, it is possible to have fine-grained control over which version of the function container the function is using.
> >
> > Not everybody is using the KubernetesRuntime yet (especially since the Helm charts aren't feature-complete), and it appears that those who aren't running KubernetesRuntime would be impacted the most by this issue.
> >
> > Devin G. Bost
> >
> > On Mon, Jul 19, 2021 at 12:36 PM Devin Bost <devin.b...@gmail.com> wrote:
> >
> >> > For example, if you are upgrading Flink from one version to another, you have to make a savepoint in the previous version for all the Flink jobs, upgrade the Flink cluster, and resume the jobs in the new version.
> >> >
> >> > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/
> >> >
> >> > So it is not unreasonable to ask people to do that when upgrading a centralized computing engine.
> >>
> >> One difference with Flink is that organizations running Flink in job mode or application mode can upgrade jobs independently of one another, so teams can upgrade jobs when they are ready without impacting other teams.
> >> In the Pulsar case, Pulsar is multi-tenant, so upgrading the entire cluster would break every tenant simultaneously and would block the flow of all messages until all functions are upgraded. If one team takes a year to upgrade their one function, the cluster could not be upgraded until that happened. Also, after all the functions have been upgraded, there would be production downtime while deploying all the upgraded functions, which would be a major outage. It might be possible to write a script to speed up the deployment and shrink the outage window, but there's currently a bug that wipes out existing userConfigs when a function is upgraded. That adds to the complexity of upgrading all the functions, since someone would need to know the userConfigs of every function.
> >>
> >> So, I don't think we're really comparing the same things here.
> >>
> >> Devin G. Bost
> >>
> >> On Mon, Jul 19, 2021 at 12:17 PM Sijie Guo <guosi...@gmail.com> wrote:
> >>
> >>> On Mon, Jul 19, 2021 at 10:32 AM Jerry Peng <jerry.boyang.p...@gmail.com> wrote:
> >>> >
> >>> > I agree that the best we can do right now is to just clearly document this as a potential problem when upgrading from 2.7 to 2.8.
> >>> >
> >>> > We should definitely make every attempt to avoid BC-breaking changes. However, there are times when we have to make these tough decisions for one reason or another. The bigger problem I see here is not necessarily that a BC-breaking change occurred, but rather that we didn't know about it beforehand, so we could not clearly document this caveat when 2.8 was released. Perhaps this is where we can improve our backwards compatibility testing. We already have some, but probably not enough, as this case highlights.
> >>> >
> >>> > Regarding
> >>> >
> >>> > > This is partially correct, because you can wait to upgrade the workers pod, but there is no fine-grained control over which version of each pod will be running your function, especially in a big cluster with many tenants and functions with this problem
> >>> >
> >>> > I think Sijie is referring to using KubernetesRuntime to deploy functions where each function/source/sink runs as an independent StatefulSet in K8s. In this scenario, it is possible to have fine-grained control over which version of the function container the function is using. There might not currently be tools that make this easy for users, but using kubectl one can definitely determine which container version is running and potentially update the container version on a per-function basis.
> >>>
> >>> Jerry, thank you! That was what I meant.
> >>>
> >>> > Best,
> >>> >
> >>> > Jerry
> >>> >
> >>> > On Mon, Jul 19, 2021 at 12:50 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> >>> >
> >>> > > Sijie,
> >>> > > Thank you for your feedback.
> >>> > > Some additional considerations inline.
> >>> > >
> >>> > > On Mon, 19 Jul 2021, 06:47 Sijie Guo <guosi...@gmail.com> wrote:
> >>> > >
> >>> > > > I don't think this is a big problem, because people can recompile the function and submit it again. Most computing/streaming engines ask users to recompile and resubmit their jobs when the engine is upgraded to a new version.
> >>> > >
> >>> > > Unfortunately, this is not easily feasible if the org that manages the Pulsar service is different from the org that develops the Functions. In particular, it is practically impossible to prevent a service interruption.
> >>> > >
> >>> > > BTW, I believe that there is no way to fix this at this point.
> >>> > >
> >>> > > > The best approach here is to document this behavior.
> >>> > >
> >>> > > I agree that the best thing we can do is to document this requirement.
> >>> > >
> >>> > > Therefore, we must ensure that in the future we don't fall into this kind of issue again.
> >>> > >
> >>> > > Pulsar is increasingly used by large enterprises, and backward compatibility is of great value.
> >>> > >
> >>> > > Fortunately, not all Functions need rebuilding.
> >>> > >
> >>> > > > Also, if you are using the Kubernetes runtime to schedule functions, you are not really impacted.
> >>> > >
> >>> > > This is partially correct, because you can wait to upgrade the workers pod, but there is no fine-grained control over which version of each pod will be running your function, especially in a big cluster with many tenants and functions with this problem.
> >>> > >
> >>> > > Enrico
> >>> > >
> >>> > > > - Sijie
> >>> > > >
> >>> > > > On Fri, Jul 16, 2021 at 2:44 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> >>> > > >
> >>> > > > > Hello,
> >>> > > > > I have reported this issue [1] about upgrading from Pulsar 2.7 to 2.8.
> >>> > > > > More information is on the ticket, but the short version of the story is that in Pulsar 2.8 we introduced a breaking change in the Schema API by switching SchemaInfo from a class to an interface.
> >>> > > > >
> >>> > > > > This leads to an IncompatibleClassChangeError when you have a Function or a Connector that is using Schema.JSON(Pojo.class) and you upgrade your Pulsar cluster (the functions worker pod, for instance) from Pulsar 2.7.x to Pulsar 2.8.0.
> >>> > > > >
> >>> > > > > The main problem is that you cannot upgrade Pulsar without interrupting the service and coordinating with the upgrade of the Functions. Your functions need to be recompiled against the Pulsar 2.8 API and deployed again in production.
> >>> > > > >
> >>> > > > > I have tried to move SchemaInfo back to an "abstract class", but without success, because then you run into errors.
> >>> > > > >
> >>> > > > > I am not sure there is a way to provide a good "upgrade path" for Functions/IO users.
> >>> > > > >
> >>> > > > > If we do not find a way, we have to document the upgrade in the official Pulsar documentation.
> >>> > > > >
> >>> > > > > We must do our best to prevent users from falling into this bad situation again.
> >>> > > > >
> >>> > > > > Any suggestions or thoughts?
> >>> > > > >
> >>> > > > > Regards
> >>> > > > > Enrico
> >>> > > > >
> >>> > > > > [1] https://github.com/apache/pulsar/issues/11338
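The failure described in this thread is a binary-compatibility break rather than a source-compatibility one: bytecode compiled while SchemaInfo was a class calls its methods with `invokevirtual`, and the JVM's method resolution (JVMS §5.4.3.3) throws IncompatibleClassChangeError when the referenced type has since become an interface. The self-contained Java sketch below reproduces that mechanism in miniature. It is an illustration under assumptions, not Pulsar code: `SchemaInfo`, `SchemaInfoImpl` and `Caller` are hypothetical toy classes compiled on the fly with the JDK's `javax.tools` compiler API, so it needs a JDK (Java 11 or later), not just a JRE.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/**
 * Toy reproduction of a class-to-interface binary break. SchemaInfo, SchemaInfoImpl and
 * Caller are hypothetical stand-ins compiled at runtime; they are NOT Pulsar's classes.
 */
public class ClassToInterfaceDemo {

    /** Writes a Java source file into dir and returns its path. */
    static Path write(Path dir, String fileName, String code) throws IOException {
        return Files.writeString(dir.resolve(fileName), code);
    }

    /** Compiles the given sources into outDir with javac; classpath may be null. */
    static void compile(Path outDir, String classpath, Path... sources) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK compiler
        if (javac == null) {
            throw new IllegalStateException("this demo needs a JDK, not just a JRE");
        }
        List<String> args = new ArrayList<>(List.of("-d", outDir.toString()));
        if (classpath != null) {
            args.add("-cp");
            args.add(classpath);
        }
        for (Path source : sources) {
            args.add(source.toString());
        }
        if (javac.run(null, null, null, args.toArray(new String[0])) != 0) {
            throw new IllegalStateException("javac failed for " + args);
        }
    }

    /** Runs a caller compiled against the "class" version using only the "interface" version. */
    static Throwable runOldCallerAgainstNewLibrary() throws Exception {
        Path root = Files.createTempDirectory("class-to-interface"); // not cleaned up
        Path v1Src = Files.createDirectories(root.resolve("v1src"));
        Path v2Src = Files.createDirectories(root.resolve("v2src"));
        Path callerSrc = Files.createDirectories(root.resolve("callersrc"));
        Path v1Out = Files.createDirectories(root.resolve("v1"));
        Path v2Out = Files.createDirectories(root.resolve("v2"));
        Path callerOut = Files.createDirectories(root.resolve("caller"));

        // "Old" library: SchemaInfo is a concrete class.
        compile(v1Out, null, write(v1Src, "SchemaInfo.java",
                "public class SchemaInfo { public String getName() { return \"v1\"; } }"));

        // Caller compiled against the old library, so javac emits invokevirtual SchemaInfo.getName.
        compile(callerOut, v1Out.toString(), write(callerSrc, "Caller.java",
                "public class Caller { public static String describe(SchemaInfo i) { return i.getName(); } }"));

        // "New" library: SchemaInfo is now an interface, with a separate implementation class.
        compile(v2Out, null,
                write(v2Src, "SchemaInfo.java", "public interface SchemaInfo { String getName(); }"),
                write(v2Src, "SchemaInfoImpl.java",
                        "public class SchemaInfoImpl implements SchemaInfo { public String getName() { return \"v2\"; } }"));

        // Load the OLD caller binary next to the NEW library only, like an un-rebuilt function.
        URL[] urls = {callerOut.toUri().toURL(), v2Out.toUri().toURL()};
        try (URLClassLoader loader = new URLClassLoader(urls, ClassLoader.getPlatformClassLoader())) {
            Class<?> schemaInfo = loader.loadClass("SchemaInfo");
            Object info = loader.loadClass("SchemaInfoImpl").getDeclaredConstructor().newInstance();
            Method describe = loader.loadClass("Caller").getMethod("describe", schemaInfo);
            try {
                describe.invoke(null, info);
                return null; // not expected: the old call site should fail to link
            } catch (InvocationTargetException e) {
                return e.getCause(); // error thrown while Caller.describe resolves getName()
            } catch (LinkageError e) {
                return e; // in case the JVM reports the problem while linking Caller
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Expected to print a java.lang.IncompatibleClassChangeError.
        System.out.println(runOldCallerAgainstNewLibrary());
    }
}
```

Recompiling `Caller` against the interface version makes javac emit `invokeinterface` instead, and the call then succeeds; in this toy setting, that corresponds to the "recompile and redeploy the Functions" workaround discussed in the thread.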