Re: Question Regarding State Migrations in Ververica Platform

2022-08-31 Thread Rion Williams
+dev

> On Aug 30, 2022, at 11:20 AM, Rion Williams  wrote:

Question Regarding State Migrations in Ververica Platform

2022-08-30 Thread Rion Williams
Hi all,

I wasn't sure if this would be the best audience; if not, please advise if
you know of a better place to ask. I figured that at least some folks here
either work for Ververica or might have used their platform.

*tl;dr: I'm trying to migrate an existing stateful Flink job to run in
Ververica Platform (Community), and I'm noticing that it doesn't seem that
all of the state is being properly handed off (only _metadata).*

I'm currently in the process of migrating an existing Flink job, which runs
in Kubernetes on its own, to run within the Ververica platform. The issue
is that the job itself is stateful, so I want to ensure I can migrate that
state over so that when the new job kicks off, the transition is fairly
seamless.

Basically, what I've done up to this point is create a script as part of
the Ververica platform deployment that will:

   1. Check for the existence of any of the known jobs that have been
      migrated.
      - If one is found, it stops the job with a full savepoint and stores
        the savepoint path in a configmap for that job, used solely for
        migration purposes (a rough sketch of this call follows the list).
      - If one is not found, it assumes the job has already been migrated.
   2. Create a Deployment for each of the new jobs, pointing to the
      appropriate configuration, jars, etc.
   3. Check for the presence of one of the migration configmaps and, if one
      exists, issue a request to create a savepoint for that deployment.
      - This involves using the Ververica REST API to grab the appropriate
        deployment information and issuing a request to the Savepoints
        endpoint of the same REST API to "add" the savepoint.
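
Step 1 boils down to a stop-with-savepoint call against the open-source
Flink REST API. A minimal sketch in Python (the job-manager URL, job id,
and target directory are placeholders for what the real script discovers at
runtime):

import time
import requests

# Placeholders; the real script discovers these from the cluster.
FLINK_URL = "http://legacy-job-jobmanager:8081"
JOB_ID = "<legacy-job-id>"
TARGET_DIR = "gs://my-bucket/savepoints/migration"  # hypothetical path

# Ask Flink to stop the job while taking a savepoint.
resp = requests.post(
    f"{FLINK_URL}/jobs/{JOB_ID}/stop",
    json={"targetDirectory": TARGET_DIR, "drain": False},
)
resp.raise_for_status()
trigger_id = resp.json()["request-id"]

# Poll the asynchronous operation until the savepoint location is known.
while True:
    status = requests.get(
        f"{FLINK_URL}/jobs/{JOB_ID}/savepoints/{trigger_id}"
    ).json()
    if status["status"]["id"] == "COMPLETED":
        savepoint_path = status["operation"]["location"]
        break
    time.sleep(2)

# This path is what gets written into the migration configmap.
print(savepoint_path)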

I've confirmed the above "works": it stops any legacy jobs, creates the
resources (i.e., configmaps) used for the migration, and starts up the new
job within Ververica. I can also see evidence within the UI that a
savepoint was "COPIED" for that deployment.

However, when comparing (in GCS) the savepoint taken from the old job with
the one now managed by Ververica, I notice that the new one contains only a
single _metadata file:

[image: GCS listing of the new savepoint showing only a _metadata file]

Whereas the previous one contained a _metadata file plus a related data
file:

[image: GCS listing of the old savepoint showing _metadata and a data file]
This leads me to believe that the new job might not know about any items
previously stored in state, which could be problematic.
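
(For reference, the comparison itself is just a listing of the two
savepoint prefixes in GCS; a minimal sketch with the google-cloud-storage
client, with the bucket and prefixes as placeholders:)

from google.cloud import storage

# Placeholder bucket/prefixes; substitute the real savepoint paths.
BUCKET = "my-flink-bucket"
OLD_PREFIX = "savepoints/legacy-job/"
NEW_PREFIX = "savepoints/vvp-managed/"

client = storage.Client()
for label, prefix in (("old", OLD_PREFIX), ("new", NEW_PREFIX)):
    blobs = client.list_blobs(BUCKET, prefix=prefix)
    print(label, [(b.name, b.size) for b in blobs])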

When reviewing the documentation for "manually adding a savepoint" in
Ververica Platform 2.6, I noticed that the payload to the Savepoints
endpoint looked like the following, which is what I used:

metadata:
  deploymentId: ${deploymentId}
  annotations:
    com.dataartisans.appmanager.controller.deployment.spec.version: ${deploymentSpecVersion}
  type: ${type}  # used FULL in my case
spec:
  savepointLocation: ${savepointLocation}
  flinkSavepointId: 00000000-0000-0000-0000-000000000000
status:
  state: COMPLETED
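
Concretely, my script POSTs that payload (here rendered as JSON) to the
Savepoints endpoint, roughly like the Python below. The base URL,
namespace, and values are placeholders, and the endpoint path is based on
my reading of the App Manager API docs, so treat it as a sketch rather than
the exact call:

import uuid
import requests

VVP_URL = "http://ververica-platform"  # App Manager base URL (placeholder)
NAMESPACE = "default"

payload = {
    "metadata": {
        "deploymentId": "<deploymentId>",
        "annotations": {
            "com.dataartisans.appmanager.controller.deployment.spec.version": "<deploymentSpecVersion>",
        },
        "type": "FULL",
    },
    "spec": {
        "savepointLocation": "<savepointLocation from the configmap>",
        # The 2.6 docs show the empty UUID here; in 2.7 this field is optional.
        "flinkSavepointId": str(uuid.UUID(int=0)),
    },
    "status": {"state": "COMPLETED"},
}

resp = requests.post(
    f"{VVP_URL}/api/v1/namespaces/{NAMESPACE}/savepoints",
    json=payload,
)
resp.raise_for_status()
print(resp.json())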


The empty UUID was a bit concerning, and I was curious if that might be the
reason the additional data files didn't come across from the savepoint as
well (I noticed that in 2.7 this is an optional field in the payload). I
don't see any other configuration that would tell the platform to pull over
everything rather than just the _metadata file.

Any ideas or guidance would be helpful.

Rion