Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Galen Warren via user
Gotcha, makes sense as to the original division.

>> Can this be solved by simply passing in the path to the artifacts

This definitely works if we're going to be copying the artifacts on the
host side -- into the build context -- and then from the context into the
image. It only gets tricky to have a potentially varying path to the
artifacts if we're trying to *directly* include the artifacts in the Docker
context -- then we have a situation where the Docker context must contain
both the artifacts and playground files, with (potentially) different root
locations.

Maybe the simplest thing to do here is just to leave the playground as-is
and then copy the artifacts into the Docker context manually, prior to
building the playground images. I'm fine with that. It will mean that each
Statefun release will require two PRs and two sets of build/publish steps
instead of one, but if everyone else is fine with that I am, too. Unless
anyone objects, I'll go ahead and queue up a PR for the playground that
makes these changes.
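For what it's worth, the copy-then-build step could be driven by environment variables along the lines Gordon suggests. A hypothetical sketch of what that might look like in a compose file (service, variable, and tag names here are illustrative assumptions, not the actual flink-statefun-playground configuration):

```yaml
# Hypothetical sketch only -- names are illustrative, not the actual
# playground files.
# Usage (after building flink-statefun locally):
#   ARTIFACTS_PATH=../flink-statefun STATEFUN_IMAGE_TAG=3.3-SNAPSHOT docker compose build
services:
  greeter-functions:
    build:
      # The Docker build context must contain the locally built artifacts:
      context: ${ARTIFACTS_PATH:-.}
    image: apache/flink-statefun-playground:${STATEFUN_IMAGE_TAG:-latest}
```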

Also, I should mention -- in case it's not clear -- that I have already
built and run the playground examples with the code from the PR and
everything worked. So that PR is ready to move forward with review, etc.,
at this point.

Thanks.

On Fri, Aug 18, 2023 at 4:16 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Galen,
>
> The original intent of having a separate repo for the playground repo, was
> that StateFun users can just go to that and start running simple examples
> without any other distractions from the core code. I personally don't have
> a strong preference here and can understand how it would make the workflow
> more streamlined, but just FYI on the reasoning why they are separate in the
> first place.
>
> re: paths for locating StateFun artifacts.
> Can this be solved by simply passing in the path to the artifacts? As well
> as the image tag for the locally built base StateFun image. They could
> probably be environment variables.
>
> Cheers,
> Gordon
>
> On Fri, Aug 18, 2023 at 12:13 PM Galen Warren via user <
> user@flink.apache.org> wrote:
>
>> Yes, exactly! And in addition to the base Statefun jars and the jar for
>> the Java SDK, it does an equivalent copy/register operation for each of the
>> other SDK libraries (Go, Python, Javascript) so that those libraries are
>> also available when building the playground examples.
>>
>> One more question: In order to copy the various build artifacts into the
>> Docker containers, those artifacts need to be part of the Docker context.
>> With the playground being a separate project, that's slightly tricky to do,
>> as there is no guarantee (other than convention) about the relative paths
>> of *flink-statefun* and *flink-statefun-playground* in someone's local
>> filesystem. The way I've set this up locally is to copy the playground into
>> the *flink-statefun* project -- i.e. to *flink-statefun*/playground --
>> such that I can set the Docker context to the root folder of
>> *flink-statefun* and then have access to any local code and/or build
>> artifacts.
>>
>> I'm wondering if there might be any appetite for making that move
>> permanent, i.e. moving the playground to *flink-statefun*/playground and
>> deprecating the standalone playground project. In addition to making the
>> problem of building with unreleased artifacts a bit simpler to solve, it
>> would also simplify the process of releasing a new Statefun version, since
>> the entire process could be handled with a single PR and associated
>> build/deploy tasks. In other words, a single PR could both update and
>> deploy the Statefun package and the playground code and images.
>>
>> As it stands, at least two PRs would be required for each Statefun
>> version update -- one for *flink-statefun* and one for
>> *flink-statefun-playground*.
>>
>> Anyway, just an idea. Maybe there's an important reason for these
>> projects to remain separate. If we do want to keep the playground project
>> where it is, I could solve the copying problem by requiring the two
>> projects to be siblings in the file system and by pre-copying the local
>> build artifacts into a location accessible by the existing Docker contexts.
>> This would still leave us with the need to have two PRs and releases
>> instead of one, though.
>>
>> Thanks for your help!
>>
>>
>> On Fri, Aug 18, 2023 at 2:45 PM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi Galen,
>>>
>>> > locally built code is copied into the build containers
>>> so that it can be accessed during the build.
>>>
>>> That's exactly what we had been doing for release testing, yes. Sorry I
>>> missed that detail in my previous response.
>>>
>>> And yes, that sounds like a reasonable approach. If I understand you
>>> correctly, the workflow would become this:
>>>
>>>1. Build the StateFun repo locally to install the snapshot artifact
>>>jars + have a local base StateFun image.
>>>2. Run the playground in "local" mode, so that it uses the local base
>>>    StateFun image + builds the playground code using copied artifact jars
>>>    (instead of pulling from Maven).
>>>
>>> That looks good to me!
>>>
>>> Thanks,
>>> Gordon

Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Galen Warren via user
Thanks.

If you were to build a local image, as you suggest, how do you access that
image when building the playground images? All the compilation of
playground code happens inside containers, so local images on the host
aren't available in those containers. Unless I'm missing something?

I've slightly reworked things such that the playground images can be run in
one of two modes -- the default mode, which works like before, and a
"local" mode where locally built code is copied into the build containers
so that it can be accessed during the build. It works fine, you just have
to define a couple of environment variables when running docker-compose to
specify default vs. local mode and what versions of Flink and Statefun
should be referenced, and then you can build and run the local examples
without any additional steps. Does that sound like a reasonable approach?
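As a rough sketch of what the two-mode setup might look like in a compose file (all names and versions here are illustrative assumptions, not the actual playground configuration):

```yaml
# Hypothetical sketch -- names/versions are illustrative.
# Default mode: docker compose build
# Local mode:   BUILD_MODE=local STATEFUN_VERSION=3.3-SNAPSHOT docker compose build
services:
  example-function:
    build:
      context: .
      args:
        # Passed into the Dockerfile to decide whether to pull released
        # artifacts from public repos or copy locally built ones.
        BUILD_MODE: ${BUILD_MODE:-default}
        STATEFUN_VERSION: ${STATEFUN_VERSION:-3.2.0}
```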



Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Tzu-Li (Gordon) Tai
Hi Galen,

> Gordon, is there a trick to running the sample code in
flink-statefun-playground against yet-unreleased code that I'm missing?

You'd have to locally build an image from the release branch, with a
temporary image version tag. Then, in the flink-statefun-playground, change
the image versions in the docker-compose files to use that locally built
image. IIRC, that's what we have been doing in the past. Admittedly, it's
pretty manual - I don't think the CI manages this workflow.

Thanks,
Gordon

On Mon, Aug 14, 2023 at 10:42 AM Galen Warren 
wrote:

> I created a pull request for this: [FLINK-31619] Upgrade Stateful
> Functions to Flink 1.16.1 by galenwarren · Pull Request #331 ·
> apache/flink-statefun (github.com).
>
> JIRA is here: [FLINK-31619] Upgrade Stateful Functions to Flink 1.16.1 -
> ASF JIRA (apache.org).
>
> Statefun references 1.16.2, despite the title -- that version has come out
> since the issue was created.
>
> I figured out how to run all the playground tests locally, but it took a
> bit of reworking of the playground setup with respect to Docker;
> specifically, the Docker contexts used to build the example functions
> needed to be broadened (i.e. moved up the tree) so that, if needed, local
> artifacts/code can be accessed from within the containers at build time.
> Then I made the Docker compose.yml configurable through environment
> variables to allow for them to run in either the original manner -- i.e.
> pulling artifacts from public repos -- or in a "local" mode, where
> artifacts are pulled from local builds.
>
> This process is cleaner if the playground is a subfolder of the
> flink-statefun project rather than being its own repository
> (flink-statefun-playground), because then all the relative paths between
> the playground files and the build artifacts are fixed. So, I'd like to
> propose to move the playground files, modified as described above, to
> flink-statefun/playground and retire flink-statefun-playground. I can
> submit separate PRs for those changes if everyone is on board.
>
> Also, should I plan to do the same upgrade to handle Flink 1.17.x? It
> should be easy to do, especially while the 1.16.x upgrade is fresh on my
> mind.
>
> Thanks.
>
>
> On Fri, Aug 11, 2023 at 6:40 PM Galen Warren 
> wrote:
>
>> I'm done with the code to make Statefun compatible with Flink 1.16, and
>> all the tests (including e2e) succeed. The required changes were pretty
>> minimal.
>>
>> I'm running into a bit of a chicken/egg problem executing the tests in
>> flink-statefun-playground, though. That
>> project seems to assume that all the various Statefun artifacts are built
>> and deployed to the various public repositories already. I've looked into
>> trying to redirect references to local artifacts; however, that's also
>> tricky since all the building occurs in Docker containers.
>>
>> Gordon, is there a trick to running the sample code in
>> flink-statefun-playground against yet-unreleased code that I'm missing?
>>
>> Thanks.
>>
>> On Sat, Jun 24, 2023 at 12:40 PM Galen Warren 
>> wrote:
>>
>>> Great -- thanks!
>>>
>>> I'm going to be out of town for about a week but I'll take a look at
>>> this when I'm back.
>>>
>>> On Tue, Jun 20, 2023 at 8:46 AM Martijn Visser 
>>> wrote:
>>>
 Hi Galen,

 Yes, I'll be more than happy to help with Statefun releases.

 Best regards,

 Martijn

 On Tue, Jun 20, 2023 at 2:21 PM Galen Warren 
 wrote:

> Thanks.
>
> Martijn, to answer your question, I'd need to do a small amount of
> work to get a PR ready, but not much. Happy to do it if we're deciding to
> restart Statefun releases -- are we?
>
> -- Galen
>
> On Sat, Jun 17, 2023 at 9:47 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org> wrote:
>
>> > Perhaps he could weigh in on whether the combination of automated
>> tests plus those smoke tests should be sufficient for testing with new
>> Flink versions
>>
>> What we usually did at the bare minimum for new StateFun releases was
>> the following:
>>
>>1. Build tests (including the smoke tests in the e2e module,
>>which covers important tests like exactly-once verification)
>>2. Updating the flink-statefun-playground repo and manually
>>running all language examples there.
>>
>> If upgrading Flink versions was the only change in the release, I'd
>> probably say that this is sufficient.
>>
>> Best,
>> Gordon
>>
>> On Thu, Jun 15, 2023 at 5:25 AM Martijn Visser <
>> martijnvis...@apache.org> wrote:
>>
>>> Let me know if you have a PR for a Flink update :)
>>>
>>> On Thu, Jun 8, 2023 at 5:52 PM Galen Warren via user <
>>> user@flink.apache.org> wrote:
>>>
 Thanks 

RE: [E] RE: Recommendations on using multithreading in flink map functions in java

2023-08-18 Thread Schwalbe Matthias
… mirrored back to user list …

One additional thing you can do is to not split into 10 additional tasks but:

  *   Fan out your original event into 10 copies (original key, 1-of-10 algorithm key, event)
  *   key by the combined key (original key, algorithm key)
  *   have a single operator chain that internally switches by algorithm key
  *   then collect by event id to enrich a final result
  *   much like mentioned in [1]

This made all the difference for us, with orders of magnitude better overall
latency and backpressure, because we avoided multiple layers of parallelism
(job parallelism * algorithm parallelism).
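The fan-out/re-key idea above can be sketched framework-independently. The plain-Java sketch below (all names are mine, not from the thread; no Flink dependency) shows the two steps a Flink flatMap + keyBy would perform: replicating one event into N tagged copies, then grouping by the combined (original key, algorithm key):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of the fan-out pattern: names (Tagged, ALGORITHMS)
// are illustrative, not from the original post.
public class FanOutSketch {
    static final int ALGORITHMS = 10;

    record Tagged(String originalKey, int algorithmKey, String payload) {}

    // Fan out: one event -> 10 tagged copies (the flatMap step in Flink).
    static List<Tagged> fanOut(String originalKey, String payload) {
        return IntStream.range(0, ALGORITHMS)
                .mapToObj(a -> new Tagged(originalKey, a, payload))
                .collect(Collectors.toList());
    }

    // keyBy on the combined key: each group corresponds to one keyed
    // partition, i.e. one algorithm instance per original key.
    static Map<String, List<Tagged>> keyByCombined(List<Tagged> events) {
        return events.stream().collect(Collectors.groupingBy(
                t -> t.originalKey() + "/" + t.algorithmKey()));
    }

    public static void main(String[] args) {
        List<Tagged> copies = fanOut("order-42", "event-payload");
        System.out.println(copies.size());               // 10 copies
        System.out.println(keyByCombined(copies).size()); // 10 combined keys
    }
}
```

In Flink itself, the "single operator chain that internally switches by algorithm key" would be one keyed operator reading `algorithmKey` from each element, and the final "collect by event id" step a stateful process function that waits for all 10 partial results.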

Thias

[1] Master Thesis, Dominik Bünzli, University of Zurich, 2021: 
https://www.merlin.uzh.ch/contributionDocument/download/14168


From: Vignesh Kumar Kathiresan 
Sent: Thursday, August 17, 2023 10:27 PM
To: Schwalbe Matthias 
Cc: liu ron ; dominik.buen...@swisscom.com
Subject: Re: [E] RE: Recommendations on using multithreading in flink map functions in java


Hello Thias,

Thanks for the explanation. The objective of achieving e2e 100 ms latency was to
establish the latency vs. ser/deser + I/O tradeoff: 1 sec (when you execute all
10 algorithms in sequence) vs. ~100 ms (when you execute them in parallel).

My takeaway is that in streaming frameworks, when you want to move your e2e
latency towards the 100 ms end of the latency spectrum:

  *   Separate the 10 algorithms into different tasks (unchain) so that they are executed in different threads
  *   Fan out the element (branch out) and send it to each algorithm task (a separate task)
  *   Incur the serialize/deserialize cost and try to avoid a network shuffle as much as possible (by having the same parallelism in all 11 operators, so that they run in different threads but on the same worker?)
  *   Combine the results using some stateful process function at the end.


On Wed, Aug 16, 2023 at 12:01 AM Schwalbe Matthias <matthias.schwa...@viseca.ch> wrote:
Hi Ron,

What you say is pretty much similar to what I’ve written , the difference is 
the focus:


  *   When you use a concurrency library, things are not necessarily running 
in parallel; the library serializes/schedules the execution of tasks onto the 
available CPU cores and needs synchronization
  *   i.e. you end up with a total latency bigger than the 100 ms (even if 
you’ve got 10 dedicated CPU cores, because of synchronization)
  *   the whole matter is governed by Amdahl’s law [1]
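To make the Amdahl's-law point concrete, a quick back-of-the-envelope calculation (the 90% parallel fraction below is an assumed illustrative number, not a measurement from this workload):

```java
// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
// parallelizable fraction of the work and n the number of cores.
class Amdahl {
    static double speedup(double parallelFraction, int cores) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void main(String[] args) {
        // Even with 10 dedicated cores, a 10% serial share (synchronization)
        // caps the speedup well below 10x:
        System.out.printf("p=0.9, n=10  -> %.2fx%n", speedup(0.9, 10));
        // Adding cores helps less and less; the cap is 1/(1-p) = 10x:
        System.out.printf("p=0.9, n=512 -> %.2fx%n", speedup(0.9, 512));
    }
}
```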

Back to your most prominent question, “How to achieve low latency per element 
processing”:
Strictly speaking, the only way to achieve the 100 ms overall latency is to 
have 10 dedicated CPU cores that don’t do anything else, to avoid 
synchronization at any cost, and for events to arrive at least 100 ms apart.
That is not possible, but with a couple of nifty tricks you can come very 
close to it.
Another aspect is that this practically only scales up to the maximum feasible 
number of CPU cores in a system (e.g. 512 cores), beyond which you cannot 
avoid serialization and synchronization.

The way I understand the design of Flink is:

  *   That the focus is on throughput with decent latency values
  *   Flink jobs can be scaled linearly in a wide range of parallelism
  *   i.e. within that range Flink does not run into the effects of Amdahl’s 
law, because it avoids synchronization among tasks
  *   This comes with a price: serialization efforts, and I/O cost
  *   A Flink (sub-)task is basically a message queue

 *   where incoming events sit in a buffer and are processed one after the 
other (= latency),
 *   buffering incurs serialization (latency),
 *   outgoing messages for non-chained operators are also serialized and 
buffered (latency)
 *   before they get sent out to a downstream (sub-)task (configurable size 
and time triggers on buffer (latency))

  *   the difference that makes the difference: all these forms of latency 
are linear in the number of events, i.e. the effects of Amdahl’s law don’t 
kick in

Independent of the runtime (Flink or non-Flink), it is good to use only a 
single means/granularity of parallelism/concurrency.
This way we avoid a lot of synchronization cost and prevent one level from 
stealing resources from other levels in an unpredictable way (= latency/jitter)

The solution that I proposed in my previous mail does exactly this:

  *   it unifies the parallelism used for sharding (per-key-group parallelism) 
with the parallelism for the 10 calculation algorithms
  *   it scales linearly, avoids backpressure given enough resources, and has 
a decent overall latency (although not the 100 ms)

In order to minimize serialization cost you could consider serialization 

Re: [Question] How to scale application based on 'reactive' mode

2023-08-18 Thread Gyula Fóra
Hi!

I think what you need is probably not the reactive mode but a proper
autoscaler. The reactive mode, as you say, doesn't do anything by itself; you
need to build a lot of logic around it.

Check this instead:
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/

The Kubernetes Operator has a built-in autoscaler that can scale jobs based
on Kafka data rate / processing throughput. It also doesn't rely on the
reactive mode.
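As a rough illustration, the autoscaler is enabled per job through the deployment's `flinkConfiguration`. Treat the option names and values below as assumptions to verify against the operator docs for your version — the keys have changed between operator releases:

```yaml
# Sketch of a FlinkDeployment fragment enabling the operator's autoscaler.
# Key names and sensible values differ across operator releases -- check
# the autoscaler documentation linked above before using.
spec:
  flinkConfiguration:
    job.autoscaler.enabled: "true"
    job.autoscaler.stabilization.interval: "1m"
    job.autoscaler.metrics.window: "5m"
    job.autoscaler.target.utilization: "0.7"
```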

Cheers,
Gyula

On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung  wrote:

> Hello,
> Sorry for frequent questions. This is a question about 'reactive' mode.
>
> 1. As far as I understand, though I've setup `scheduler-mode: reactive`,
> it will not change parallelism automatically by itself, by CPU usage or
> Kafka consumer rate. It needs additional resource monitor features (such as
> Horizontal Pod Autoscaler, or else). Is this correct?
> 2. Is it possible to create a custom resource monitor provider
> application? For example, if I want to increase/decrease parallelism by
> Kafka consumer rate, do I need to send specific API from outside, to order
> rescaling?
> 3. If 2 is correct, what is the difference when using 'reactive' mode?
> Because as far as I think, calling a specific API will rescale either using
> 'reactive' mode or not...(or is the API just working based on this mode)?
>
> Thanks.
>
> Regards
>
>


[Question] How to scale application based on 'reactive' mode

2023-08-18 Thread Dennis Jung
Hello,
Sorry for frequent questions. This is a question about 'reactive' mode.

1. As far as I understand, though I've setup `scheduler-mode: reactive`, it
will not change parallelism automatically by itself, by CPU usage or Kafka
consumer rate. It needs additional resource monitor features (such as
Horizontal Pod Autoscaler, or else). Is this correct?
2. Is it possible to create a custom resource monitor provider application?
For example, if I want to increase/decrease parallelism by Kafka consumer
rate, do I need to send specific API from outside, to order rescaling?
3. If 2 is correct, what is the difference when using 'reactive' mode?
Because as far as I think, calling a specific API will rescale either using
'reactive' mode or not...(or is the API just working based on this mode)?

Thanks.

Regards


Re: Flink throws exception when submitting a job through Jenkins and Spinnaker

2023-08-18 Thread Shammon FY
Hi Elakiya,

In general I think there are two steps to starting a job: launch the jm node,
including the dispatcher and resource manager, and then submit the sql job to
the dispatcher. The dispatcher launches a rest server, and the client
connects to that rest server to submit the job.

From your error message, I see the timeout exception is thrown from
`RestClusterClient`, which is used to submit jobs to the rest server in the
`Dispatcher`. So I suspect that the address or port for the Dispatcher is
incorrect, causing the rest connection to time out. You can check the
configuration for the rest server, or check whether the dispatcher started
successfully.
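For reference, the client resolves the Dispatcher's REST endpoint from the `rest.address` / `rest.port` options; the values below are hypothetical placeholders for a standalone Kubernetes setup, not the poster's actual configuration:

```yaml
# flink-conf.yaml on the client side -- hypothetical values
rest.address: flink-jobmanager   # e.g. the k8s Service name fronting the JobManager
rest.port: 8081                  # the default REST port
```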

Best,
Shammon FY

On Wed, Aug 16, 2023 at 2:55 PM elakiya udhayanan 
wrote:

> Hi Shammon,
>
> Thanks for your response.
>
> If it is a network issue as you have mentioned, how does it read the
> contents of the jar file? We can see that the code is read, and it throws an
> error only when executing the SQL. Also, can you let us know exactly which
> address could be wrong here, so that we can correct it on our end.
> My other doubt is whether we should port-forward the job manager (is that
> necessary when using standalone Kubernetes?) before submitting the job using
> the run command.
>
> Thanks,
> Elakiya
>
> On Mon, Aug 14, 2023 at 11:15 AM Shammon FY  wrote:
>
>> Hi,
>>
>> It seems that the client cannot access the right network to submit your
>> job; maybe the address option in k8s is wrong. You can check the error
>> message in the k8s log
>>
>> Best,
>> Shammon FY
>>
>> On Fri, Aug 11, 2023 at 11:40 PM elakiya udhayanan 
>> wrote:
>>
>>>
>>> Hi Team,
>>> We are using Apache Flink 1.16.1 configured as a standalone Kubernetes
>>> pod ,for one of our applications to read from confluent Kafka topics to do
>>> event correlation. We are using the flink's Table API join for the same (in
>>> SQL format).We are able to submit the job using the flink's UI. For our DEV
>>> environment , we implemented a jenkins pipeline, which downloads the jar
>>> that is required to submit the job and also creates the flink kubernetes
>>> pods and copy the downloaded jar to the flink pod's folder and uses the
>>> flink's run command to submit the job.The deployment step happens through
>>> the spinnaker webhook. We use a docker file to create the kubernetes pods,
>>> also have a docker-entrypoin.sh which has the flink run command to submit
>>> the job.
>>>
>>> Everything works fine, but when the job is being submitted, we get the
>>> below exception.
>>>
>>> The flink run command used is
>>>
>>> *flink run  /opt/flink/lib/application-0.0.1.jar*
>>> Any help is appreciated.
>>>
>>> 
>>>  The program finished with the following exception:
>>>
>>> org.apache.flink.client.program.ProgramInvocationException: The main method 
>>> caused an error: Failed to execute sql
>>> at 
>>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
>>> at 
>>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
>>> at 
>>> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:98)
>>> at 
>>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:843)
>>> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:240)
>>> at 
>>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1087)
>>> at 
>>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1165)
>>> at 
>>> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>> at 
>>> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1165)
>>> Caused by: org.apache.flink.table.api.TableException: Failed to execute sql
>>> at 
>>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:867)
>>> at 
>>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:827)
>>> at 
>>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:918)
>>> at 
>>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:730)
>>> at com.sample.SampleStreamingApp.main(SampleStreamingApp.java:157)
>>> at 
>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>> Method)
>>> at 
>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
>>> Source)
>>> at 
>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
>>> Source)
>>> at java.base/java.lang.reflect.Method.invoke(Unknown Source)
>>> at 
>>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
>>> ... 8 more
>>> Caused by: 

Re: 404 Jar File Not Found w/ Web Submit Disabled

2023-08-18 Thread patricia lee
Hi Jiadong,

Thanks for the feedback.

Our team has decided to just upload jar files via CLI.


Thanks


Regards
Patricia

On Thu, Aug 17, 2023, 11:52 PM jiadong.lu  wrote:

> Hi Patricia
> Sorry for giving wrong advice. I tried the URL path of "/v1/jars/upload"
> and it responded with the same 404. For now, we can be fairly sure that
> '/jars/upload' does not work with the configuration `web.ui.submit=false`.
>
> If you really need to disable the `web.ui.submit` configuration,
> maybe you should try @Shammon's solution.
>
> By the way, I have been submitting my Flink application by
> RestClusterClient for a long time.
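For anyone curious what that looks like, a submission via `RestClusterClient` goes roughly as below. This is a sketch under the assumption of a standalone session cluster; the address, jar path, and parallelism are placeholders, and it needs the flink-clients dependency plus a reachable cluster:

```java
import java.io.File;

import org.apache.flink.client.deployment.StandaloneClusterId;
import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;

// Sketch: programmatic job submission to a session cluster's REST endpoint.
// Address, port, jar path, and parallelism are hypothetical placeholders.
class SubmitViaRest {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setString(RestOptions.ADDRESS, "flink-jobmanager");
        config.setInteger(RestOptions.PORT, 8081);

        // Build a JobGraph from the application jar's main class
        PackagedProgram program = PackagedProgram.newBuilder()
                .setJarFile(new File("/path/to/application.jar"))
                .build();
        JobGraph jobGraph = PackagedProgramUtils.createJobGraph(
                program, config, /* parallelism */ 1, /* suppressOutput */ false);

        try (RestClusterClient<StandaloneClusterId> client =
                 new RestClusterClient<>(config, StandaloneClusterId.getInstance())) {
            // Submission goes through the REST API, the same path the CLI uses
            System.out.println("Submitted job: " + client.submitJob(jobGraph).get());
        }
    }
}
```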
>
> Or you can use the CLI by starting a subprocess.
>
> Best
> Jiadong Lu
>
> On 2023/8/17 23:07, jiadong.lu wrote:
> > Hi Patricia
> >
> > Have you tried the url path of '/v1/jars/upload' ?
> >
> > Best
> > Jiadong Lu
> >
> > On 2023/8/16 14:00, patricia lee wrote:
> >> Hi,
> >>
> >> Below are the steps that I take to replicate the issue. I have a
> >> requirement to disable both the capability to run and submit jobs in
> >> Flink Web UI and Rest Endpoint.
> >> I created a docker compose of job manager and task manager.
> >>
> >> When the property in job manager's config is set to
> >> web.submit.enabled: true
> >> I can submit a job via rest api as shown below (which is expected)
> >>
> >> Screenshot 2023-08-16 at 1.05.00 PM.png
> >>
> >> However, when I disabled the job manager's property
> >> web.submit.enabled: false
> >>
> >> Screenshot 2023-08-16 at 1.55.25 PM.png
> >>
> >> I can no longer upload a jar file via the REST endpoint. According to
> >> the Apache Flink documentation, web.submit.enable=false is only a
> >> front-end flag and should not take effect on the REST API.
> >>
> >> *Reference:*
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
> >> Screenshot 2023-08-16 at 1.18.25 PM.png
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Aug 15, 2023 at 12:31 AM jiadong.lu <archzi...@gmail.com> wrote:
> >>
> >> Hi, Patricia
> >>
> >> I think you should have a look the REST API[1].
> >>
> >>   > "Even it is disabled sessions clusters still accept jobs through
> >> REST
> >> requests (Http calls). This flag only guards the feature to upload
> >> jobs
> >> in the UI"
> >>
> >> means  you cannot upload flink application jar in the flink
> >> dashboard UI
> >> if the `web.ui.submit=false`,
> >>
> >> you still can sumbit the flink application jar by the REST API.
> >>
> >> Best
> >>
> >> Jiadong Lu.
> >>
> >> 1.
> >>
> >> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/
> 
> >>
> >> On 2023/8/15 0:14, patricia lee wrote:
> >>  > Hi,
> >>  >
> >>  > Just to add, when I set back to "true" the web.ui submit
> property,
> >>  > that is when the rest endpoint /jars/upload worked again. But
> >> in the
> >>  > documentation reference:
> >>  >
> >>  >
> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
> >>  >
> >>  >
> >>  > Disabling the UI doesnt disable the endpoint. Is this the
> expected
> >>  > behavior?
> >>  >
> >>  > Regards,
> >>  > Patricia
> >>  >
> >>  > On Mon, Aug 14, 2023, 5:07 PM patricia lee wrote:
> >>  >
> >>  > Hi,
> >>  >
> >>  > I disabled the web.ui.submit=false, after that uploading jar
> >> files
> >>  > via rest endpoint is now throwing 404. In the documentation
> >> it says:
> >>  >
> >>  > "Even it is disabled sessions clusters still accept jobs
> >> through
> >>  > REST requests (Http calls). This flag only guards the
> >> feature to
> >>  > upload jobs in the UI"
> >>  >
> >>  > I also set the io.tmp.dirs to my specified directory.
> >>  >
> >>  >
> >>  > But I can no longer upload jar via rest endpoint.
> >>  >
> >>  >
> >>  > Regards,
> >>  > Patricia
> >>  >
> >>
>


Re: Migrating Flink SQL job state across storage systems

2023-08-18 Thread Tianwang Li
You can take a savepoint to HDFS and then configure the checkpoint location 
to point at object storage.

Our Flink setup supports both object storage and HDFS.
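In config terms, that approach combines a restore path on HDFS with new state locations on object storage. The bucket and savepoint paths below are placeholders, not the poster's actual setup:

```yaml
# Restore from the old savepoint on HDFS ...
execution.savepoint.path: hdfs:///flink/savepoints/savepoint-abc123
# ... while all new checkpoints and savepoints go to object storage
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.savepoints.dir: s3://my-bucket/flink/savepoints
```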

Hangxiang Yu  wrote on Wed, Aug 2, 2023 at 14:03:

> Hi, I can think of two approaches:
> 1. Restore from one storage cluster and snapshot to the other, i.e. set [1]
> to the HDFS address and [2] to the new object-storage address
> 2. Keep starting/stopping the job against the HDFS cluster, but set the
> savepoint directory [3] to object storage
>
> Regarding the state processor API: it is indeed hard to use for SQL jobs at
> the moment; you can only get the uids and similar information from the logs,
> and you have to understand the state the SQL actually produces before you
> can use it;
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#execution-savepoint-path
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#state-checkpoints-dir
> [3]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#state-savepoints-dir
>
> On Sat, Jul 29, 2023 at 11:09 AM casel.chen  wrote:
>
> > We need to migrate the Flink SQL jobs currently running on Hadoop YARN to
> > K8S, and switch the state storage from HDFS to object storage, so that the
> > jobs can restore from their previous savepoints and the upgrade is
> > invisible to users.
> > Because Flink state files contain absolute paths, simply copying the
> > files physically won't work.
> >
> > From the docs, the Flink state processor API currently requires the uid
> > and the Flink state type to read state, but the uid of a Flink SQL job is
> > auto-generated and we cannot know the state types either. Is there an API
> > that iterates over all state saved under a directory and re-saves it to a
> > directory on another file system? The state processor API feels better
> > suited to jobs written with the Stream API; SQL jobs are almost impossible
> > to handle with it. Is that right?
>
>
>
> --
> Best,
> Hangxiang.
>


-- 
tivanli