Re: Unable to run pyflink job - NetUtils getAvailablePort Error

2022-09-06 Thread Ramana
Hi Xingbo,

I have double-checked this; both the Flink and PyFlink versions I have are
1.14.4 on the JobManager and the TaskManager.
However, I still get this error.
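For reference, one way to cross-check the two versions on each node is along
these lines (a sketch; paths and commands depend on how Flink was installed):

```
# Illustrative: verify the Java-side distribution and the PyFlink wheel agree.
ls "$FLINK_HOME"/lib/flink-dist*.jar   # version is embedded in the jar name
python -m pip show apache-flink        # PyFlink package version
```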

Thanks
Ramana



On Tue, Sep 6, 2022, 14:23 Xingbo Huang  wrote:

> Hi Ramana,
>
> This problem comes from the inconsistency between your flink version and
> pyflink version
>
> Best,
> Xingbo
>
> Ramana wrote on Tue, Sep 6, 2022 at 15:08:
>
>> Hello there,
>>
>> I have a pyflink setup of 1 : JobManager - 1 : Task Manager.
>>
>> Trying to run a pyflink job, and no matter what I do, I get the following
>> error message.
>>
>> -
>> The program finished with the following exception:
>>
>> org.apache.flink.client.program.ProgramInvocationException: The main
>> method caused an error: java.lang.NoSuchMethodError:
>> org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;
>> 
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;
>> at
>> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonUtils.java:365)
>> at java.lang.Thread.run(Thread.java:750)
>> 
>>
>> Tried executing with some out of the box examples, yet I get the same
>> error above.
>>
>> Could anybody shed some light on why the error is occurring, and how I
>> can have it resolved?
>>
>> Appreciate any help here.
>>
>> Thanks.
>> Ramana
>> --
>> DREAM IT, DO IT
>>
>


Re: Flink upgrade path

2022-09-06 Thread Hangxiang Yu
Hi Alexey,
You can check state compatibility in the compatibility table [1].
That page describes how to upgrade and which versions are compatible with
each other.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/upgrading/#compatibility-table
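For a savepoint-based upgrade, the flow described on that page looks roughly
like this (job id, savepoint paths, and jar name are illustrative):

```
# On the 1.13.6 cluster: stop the job while taking a savepoint.
flink stop --savepointPath s3://my-bucket/savepoints <job-id>

# After upgrading the cluster binaries to 1.15.2: resume from that savepoint.
flink run -s s3://my-bucket/savepoints/savepoint-xxxx -d my-job.jar
```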

On Wed, Sep 7, 2022 at 7:04 AM Alexey Trenikhun  wrote:

> Hello,
> Can I upgrade from 1.13.6 directly to 1.15.2, skipping 1.14.x?
>
> Thanks,
> Alexey
>
>

-- 
Best,
Hangxiang.


[Flink Kubernetes Operator] FlinkSessionJob crd spec jarURI

2022-09-06 Thread Vignesh Kumar Kathiresan via user
Hi,

I have a session cluster deployed in Kubernetes and am trying to submit a job
following the example given in the docs.

When I set
1) spec.job.jarURI:
local:///opt/flink/examples/streaming/StateMachineExample.jar

getting
Error:  org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
Could not find a file system implementation for scheme 'local'. The scheme
is not directly supported by Flink and no Hadoop file system to support
this scheme could be loaded. For a full list of supported file systems,
please see
https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
2) When I change the scheme to file
spec.job.jarURI:
file:///opt/flink/examples/streaming/StateMachineExample.jar

getting
 Error:  java.io.FileNotFoundException:
/opt/flink/examples/streaming/TopSpeedWindowing.jar (No such file or
directory)

Is there anything I am missing? From the docs I gather that I do not need
any extra filesystem plugin for referencing a local file system jar.
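For comparison, the manifest shape in question is roughly the following; since
the operator itself fetches a FlinkSessionJob jar, an http(s) URI reachable
from the operator pod is one commonly working alternative (all names here are
illustrative, not a confirmed fix):

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: example-session-job          # illustrative
spec:
  deploymentName: my-session-cluster # the FlinkDeployment session cluster
  job:
    # Fetched by the operator, so it must be reachable from the operator pod:
    jarURI: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.15.2/flink-examples-streaming_2.12-1.15.2.jar
    parallelism: 1
```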


Flink upgrade path

2022-09-06 Thread Alexey Trenikhun
Hello,
Can I upgrade from 1.13.6 directly to 1.15.2, skipping 1.14.x?

Thanks,
Alexey



serviceAccount permissions issue for high availability in operator 1.1

2022-09-06 Thread Javier Vegas
I am migrating a HA standalone Kubernetes app to use the Flink operator.
The HA store is S3 using IRSA so the app needs to run with a serviceAccount
that is authorized to access S3. In standalone mode HA worked once I gave
the account permissions to edit configMaps. But when trying the operator
with the custom serviceAccount, I am getting this error:

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing:
GET at:
https://172.20.0.1/apis/apps/v1/namespaces/MYNAMESPACE/deployments/MYAPPNAME.
Message: Forbidden!Configured service account doesn't have access. Service
account may have been revoked. deployments.apps "MYAPPNAME" is forbidden:
User "system:serviceaccount:MYNAMESPACE:MYSERVICEACCOUNT" cannot get
resource "deployments" in API group "apps" in the namespace "MYNAMESPACE".

Does the serviceAccount need additional permissions besides configMap edit
to be able to run HA using the operator?
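For reference, the quoted error points at missing read access to deployments
in the apps API group; a namespaced Role along these lines (names illustrative,
a sketch rather than the operator's exact RBAC) would cover both that and the
configMap editing HA needs:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flink-ha-role        # illustrative
  namespace: MYNAMESPACE
rules:
  - apiGroups: [""]          # HA metadata is stored in configMaps
    resources: ["configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]      # the access the quoted error says is missing
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
```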

Thanks,

Javier Vegas


[ANNOUNCE] Apache Kyuubi (Incubating) released 1.6.0-incubating

2022-09-06 Thread Nicholas Jiang
Hi all,


The Apache Kyuubi (Incubating) community is pleased to announce that
Apache Kyuubi (Incubating) 1.6.0-incubating has been released!

Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for
large-scale data processing and analytics, built on top of multiple compute
engines, including Apache Spark, Apache Flink, Apache Hive, and Trino.


Kyuubi provides a pure SQL gateway through a Thrift JDBC/ODBC interface
for end-users to manipulate large-scale data with pre-programmed and
extensible Spark/Flink SQL engines.


We are aiming to make Kyuubi an "out-of-the-box" tool for data warehouses
and data lakes.

This "out-of-the-box" model minimizes the barriers and costs for end-users
to use Spark and Flink on the client side.

On the server side, Kyuubi's server and engine multi-tenant architecture
provides administrators a way to achieve computing resource isolation,
data security, high availability, high client concurrency, etc.

The full release notes and download links are available at:
Release Notes: https://kyuubi.apache.org/release/1.6.0-incubating.html

To learn more about Apache Kyuubi (Incubating), please see
https://kyuubi.apache.org/

Kyuubi Resources:
- Issue: https://github.com/apache/incubator-kyuubi/issues
- Mailing list: d...@kyuubi.apache.org

We would like to thank all contributors of the Kyuubi community and the
Incubator community who made this release possible!



Thanks,

On behalf of Apache Kyuubi (Incubating) community

Re: Slow Tests in Flink 1.15

2022-09-06 Thread Matthias Pohl via user
Hi David,
I guess you're referring to [1]. But as Chesnay already pointed out in the
previous thread, it would be helpful to get more insights into what exactly
your tests are executing (logs, code, ...). That would help identify the
cause.
> Can you give us a more complete stacktrace so we can see what call in
> Flink is waiting for something?
>
> Does this happen to all of your tests?
> Can you provide us with an example that we can try ourselves? If not,
> can you describe the test structure (e.g., is it using a
> MiniClusterResource)?

Matthias

[1] https://lists.apache.org/thread/yhhprwyf29kgypzzqdmjgft4qs25yyhk

On Mon, Sep 5, 2022 at 4:59 PM David Jost  wrote:

> Hi,
>
> we were going to upgrade our application from Flink 1.14.4 to Flink
> 1.15.2 when we noticed that all our job tests, using a
> MiniClusterWithClientResource, are several times slower on 1.15 than
> before on 1.14. Unfortunately, I have not found any mention of this in
> the changelog or documentation. The slowdown is rather extreme, and I hope
> to find a solution. I saw it mentioned once on the mailing list, but
> there was no (public) outcome.
>
> I would appreciate any help on this. Thank you in advance.
>
> Best
>  David


Re: Where will the state be stored in the taskmanager when using rocksdbstatebackend?

2022-09-06 Thread Alexander Fedulov
Well, in that case it is similar to hitting the limits of vertical
scaling - you'll have to scale out horizontally.
You could consider reducing the CPU and RAM you allocate to each
TaskManager while increasing their count (and your job's parallelism).
This may come with its own downsides, so measure as you go. It might also
be problematic if you have significant key skew for some of your key ranges.
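As a rough flink-conf.yaml sketch of that direction (all values illustrative
and workload-dependent, not a recommendation):

```yaml
# Smaller TaskManagers, but more of them; RocksDB state on a fast local disk.
taskmanager.memory.process.size: 4g
taskmanager.numberOfTaskSlots: 2
state.backend: rocksdb
state.backend.rocksdb.localdir: /mnt/local-ssd/rocksdb
```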

Best,
Alex

On Tue, Sep 6, 2022 at 8:09 AM hjw <1010445...@qq.com> wrote:

> Hi Alexander,
>
> When a Flink job is deployed on native Kubernetes, the TaskManager is a
> Pod. The data directory size of a single container is limited in our
> company. Are there any ideas for dealing with this?
>
> --
> Best,
> Hjw
>
>
>
> -- Original Message --
> *From:* "Alexander Fedulov" ;
> *Sent:* Tuesday, September 6, 2022, 3:19 AM
> *To:* "hjw"<1010445...@qq.com>;
> *Cc:* "user";
> *Subject:* Re: Where will the state be stored in the taskmanager when using
> rocksdbstatebackend?
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#state-backend-rocksdb-localdir
> Make sure to use a local SSD disk (not NFS/EBS).
>
> Best,
> Alexander Fedulov
>
> On Mon, Sep 5, 2022 at 7:24 PM hjw <1010445...@qq.com> wrote:
>
>> The EmbeddedRocksDBStateBackend holds in-flight data in a RocksDB
>> database that is (per default) stored in the TaskManager's local data
>> directories.
>> Which operating-system path do the TaskManager's local data directories
>> (where the RocksDB database is stored) point to?
>> If the job state is very large, I think I should take some measures to
>> deal with it (e.g., mount a volume for the local data directories that
>> store the RocksDB database).
>>
>> thx.
>>
>> --
>> Best,
>> Hjw
>>
>


Re: Unable to run pyflink job - NetUtils getAvailablePort Error

2022-09-06 Thread Xingbo Huang
Hi Ramana,

This problem comes from the inconsistency between your flink version and
pyflink version

Best,
Xingbo

Ramana wrote on Tue, Sep 6, 2022 at 15:08:

> Hello there,
>
> I have a pyflink setup of 1 : JobManager - 1 : Task Manager.
>
> Trying to run a pyflink job, and no matter what I do, I get the following
> error message.
>
> -
> The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The main
> method caused an error: java.lang.NoSuchMethodError:
> org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;
> 
> Caused by: java.lang.NoSuchMethodError:
> org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;
> at
> org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonUtils.java:365)
> at java.lang.Thread.run(Thread.java:750)
> 
>
> Tried executing with some out of the box examples, yet I get the same
> error above.
>
> Could anybody shed some light on why the error is occurring, and how I can
> have it resolved?
>
> Appreciate any help here.
>
> Thanks.
> Ramana
> --
> DREAM IT, DO IT
>


Flink Kinesis Connector EFO not working for cross-accounts

2022-09-06 Thread Iris Grace Endozo
Hey there,

I'm checking whether the Flink Kinesis connector (version 1.15.2) EFO can
work cross-account. My configuration looks like this:
if (kinesisIamRole != null && !kinesisIamRole.isEmpty()) {
    kinesisConsumerProps.put(AWSConfigConstants.AWS_ROLE_ARN, kinesisIamRole);
    kinesisConsumerProps.put(AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "ASSUME_ROLE");
    kinesisConsumerProps.put(AWSConfigConstants.AWS_ROLE_SESSION_NAME, "flink-kinesis-kafka-connector-session");
}

if (kinesisUseEfo && kinesisEfoConsumerName != null && !kinesisEfoConsumerName.isBlank()) {
    kinesisConsumerProps.put(ConsumerConfigConstants.RECORD_PUBLISHER_TYPE, ConsumerConfigConstants.RecordPublisherType.EFO.name());
    kinesisConsumerProps.put(ConsumerConfigConstants.EFO_CONSUMER_NAME, kinesisEfoConsumerName);
}

From the docs I'm expecting that to just flow through. All the IAM policies
and permissions have been set. However, we get the following error (xxx is
the AWS account where the Flink job is hosted, not where the Kinesis stream
is):

Caused by: java.util.concurrent.ExecutionException:
org.apache.flink.kinesis.shaded.software.amazon.awssdk.services.kinesis.model.KinesisException:
User:
arn:aws:sts::xxx:assumed-role/flink-flinkkinesiskafkaconnector0d95fc3-role-ap-southeast-2/aws-sdk-java-xxx
is not authorized to perform: kinesis:DescribeStreamSummary on resource:
arn:aws:kinesis:us-west-2:xxx:stream/dev-logs because no identity-based
policy allows the kinesis:DescribeStreamSummary action
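For reference, the action named in the error suggests the role assumed in the
stream account needs an identity-based policy roughly along these lines
(a sketch only; ARNs are illustrative, and the complete set of EFO permissions
should be checked against the connector documentation):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStreamSummary",
        "kinesis:RegisterStreamConsumer",
        "kinesis:DescribeStreamConsumer",
        "kinesis:SubscribeToShard"
      ],
      "Resource": [
        "arn:aws:kinesis:us-west-2:STREAM_ACCOUNT:stream/dev-logs",
        "arn:aws:kinesis:us-west-2:STREAM_ACCOUNT:stream/dev-logs/consumer/*"
      ]
    }
  ]
}
```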

Cheers, Iris.

--

Iris Grace Endozo, Senior Software Engineer
Mob +61 435 108 697
E iris.end...@gmail.com


Mixed up session aggregations for same key

2022-09-06 Thread Kristinn Danielsson via user
Hi,

I'm trying to migrate a KafkaStreams application to Flink (using DataStream 
API).
The application consumes a high traffic (millions of events per second) Kafka
topic and collects events into sessions keyed by id. To reduce the load on
subsequent processing steps I want to output one event on session start and one
event on session end. So I set up a pipeline that keys the stream by id and
aggregates the events over an event-time session window with a gap of 4
seconds. I also implemented a custom trigger that fires when the first event
arrives in a window.

When I run this pipeline I sometimes observe multiple calls to the
aggregator's "createAccumulator" method for a given session id, and therefore
I also get duplicate session-start and session-end events for that session id.
So it looks to me like Flink is collecting the events into multiple sessions
even though they have the same session id.

Examples:

Input events:
Event timestamp Id
2022-09-06 08:00:00 ABC
2022-09-06 08:00:01 ABC
2022-09-06 08:00:02 ABC
2022-09-06 08:00:03 ABC
2022-09-06 08:00:04 ABC
2022-09-06 08:00:05 ABC

Problem 1:
Output events:
Event time  Id  Type
2022-09-06 08:00:00 ABC Start
2022-09-06 08:00:03 ABC End
2022-09-06 08:00:04 ABC Start
2022-09-06 08:00:05 ABC End
Problem 2:
Output events:
Event time  Id  Type
2022-09-06 08:00:00 ABC Start
2022-09-06 08:00:03 ABC Start
2022-09-06 08:00:04 ABC End
2022-09-06 08:00:05 ABC End

Expected output:
Event time  Id  Type
2022-09-06 08:00:00 ABC Start
2022-09-06 08:00:05 ABC End
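The merge semantics expected above can be mimicked outside Flink to make the
intended grouping explicit — a plain-Python sketch with a hypothetical
`sessionize` helper (not Flink code), merging events for one key whenever the
gap between consecutive timestamps stays under 4 seconds:

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap_seconds=4):
    """Hypothetical helper: collapse sorted event times into (start, end)
    sessions, opening a new session once the inactivity gap is reached."""
    sessions = []
    start = prev = None
    for ts in sorted(timestamps):
        if prev is not None and (ts - prev) >= timedelta(seconds=gap_seconds):
            sessions.append((start, prev))  # gap reached: close current session
            start = ts
        elif prev is None:
            start = ts
        prev = ts
    if start is not None:
        sessions.append((start, prev))
    return sessions

# The six input events from the example above, one second apart:
events = [datetime(2022, 9, 6, 8, 0, s) for s in range(6)]
print(sessionize(events))  # one merged session: start 08:00:00, end 08:00:05
```

If Flink nonetheless emits multiple sessions for the same id, watermarking and
out-of-order arrival are worth checking, since session-window merging happens
on event time.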


Is this expected behaviour? How can I avoid getting duplicate session windows?

Thanks for your help
Kristinn


Unable to run pyflink job - NetUtils getAvailablePort Error

2022-09-06 Thread Ramana
Hello there,

I have a pyflink setup of 1 : JobManager - 1 : Task Manager.

Trying to run a pyflink job, and no matter what I do, I get the following
error message.

-
The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: java.lang.NoSuchMethodError:
org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;

Caused by: java.lang.NoSuchMethodError:
org.apache.flink.util.NetUtils.getAvailablePort()Lorg/apache/flink/util/NetUtils$Port;
at
org.apache.flink.client.python.PythonEnvUtils.lambda$startGatewayServer$3(PythonUtils.java:365)
at java.lang.Thread.run(Thread.java:750)


Tried executing with some out of the box examples, yet I get the same error
above.

Could anybody shed some light on why the error is occurring, and how I can
have it resolved?

Appreciate any help here.

Thanks.
Ramana
-- 
DREAM IT, DO IT