Re: Deserialize generic kafka json message in pyflink. Single kafka topic, multiple message schemas (debezium).

2021-11-21 Thread Dian Fu
Hi Kamil,

Actually, FlinkKafkaConsumer expects a DeserializationSchema rather than
specifically a JsonRowDeserializationSchema, so I guess you could try
SimpleStringSchema and parse the JSON in your own code.
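
Something along these lines might work as a starting point (an untested
sketch; the topic name, Kafka properties and the handling of the parsed
payload are just placeholders):

import json

from pyflink.common.serialization import SimpleStringSchema
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors import FlinkKafkaConsumer

env = StreamExecutionEnvironment.get_execution_environment()

# Read the raw JSON as plain strings instead of binding a fixed row schema.
consumer = FlinkKafkaConsumer(
    topics='my-topic',                                    # placeholder
    deserialization_schema=SimpleStringSchema(),
    properties={'bootstrap.servers': 'localhost:9092',    # placeholder
                'group.id': 'my-group'})                  # placeholder

# Parse each message in user code, so records whose 'schema' / 'before' /
# 'after' fields differ can be dispatched or transformed individually.
stream = env.add_source(consumer).map(
    lambda raw: json.dumps(json.loads(raw)['payload']),
    output_type=Types.STRING())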

Regards,
Dian

On Sat, Nov 20, 2021 at 5:55 AM Kamil ty  wrote:

> Hello all,
>
> I'm working on a pyflink job that's supposed to consume JSON messages from
> Kafka and save them to a partitioned Avro file sink.
> I'm having difficulty finding a solution for processing the messages,
> because a single Kafka topic carries multiple message schemas. As pyflink's
> FlinkKafkaConsumer expects a JsonRowDeserializationSchema, I assume that all
> of the messages need a constant, predefined schema. I expect the same for
> the Kafka Table API.
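>
> To make this concrete, my current consumer looks roughly like this (a
> sketch; the row fields, topic name and properties are placeholders and in
> reality the fields differ per message type):
>
> from pyflink.common.serialization import JsonRowDeserializationSchema
> from pyflink.common.typeinfo import Types
> from pyflink.datastream.connectors import FlinkKafkaConsumer
>
> # A single, fixed row layout has to be declared up front, which is the
> # problem when the 'before'/'after' payloads differ between messages.
> row_type = Types.ROW_NAMED(
>     ['id', 'name', 'description', 'weight'],            # placeholder fields
>     [Types.INT(), Types.STRING(), Types.STRING(), Types.DOUBLE()])
>
> deserialization_schema = JsonRowDeserializationSchema.builder() \
>     .type_info(type_info=row_type).build()
>
> consumer = FlinkKafkaConsumer(
>     topics='my-topic',                                  # placeholder
>     deserialization_schema=deserialization_schema,
>     properties={'bootstrap.servers': 'localhost:9092',  # placeholder
>                 'group.id': 'my-group'})                # placeholder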
>
> The messages follow a general debezium message schema:
> Example data taken from flink docs:
>
> {
>   "schema": {...},
>   "payload": {
>     "before": {
>       "id": 111,
>       "name": "scooter",
>       "description": "Big 2-wheel scooter",
>       "weight": 5.18
>     },
>     "after": {
>       "id": 111,
>       "name": "scooter",
>       "description": "Big 2-wheel scooter",
>       "weight": 5.15
>     },
>     "source": {...},
>     "op": "u",
>     "ts_ms": 1589362330904,
>     "transaction": null
>   }
> }
>
> The messages are coming to a single Kafka topic, where the 'schema',
> 'after', 'before' fields can be different for each message. The Kafka
> message key also contains the 'schema' field from the above example. My
> question is whether there is a way to process such messages coming from a
> single Kafka topic with pyflink without writing a custom
> DeserializationSchema. Any help would be appreciated.
>
> Kind Regards
> Kamil
>


Re: Flink S3 Presto Checkpointing Permission Forbidden

2021-11-21 Thread bat man
Hi Dennis,

Were you able to use checkpointing on S3 with native Kubernetes? I am using
Flink 1.13.1 and tried your solution of passing the
WebIdentityTokenCredentialsProvider:

*-Dfs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider*

I am getting this error in the job-manager logs - *Caused by:
com.amazonaws.SdkClientException: Unable to locate specified web identity
token file: /var/run/secrets/eks.amazonaws.com/serviceaccount/token*

Describing the pod shows that the token volume is mounted to the pod.
Is there anything specific that needs to be done? On the same EKS cluster,
for testing, I ran a sample pod with the aws cli image and it is able to do
*ls* on the same s3 bucket.

Thanks,
Hemant

On Mon, Oct 11, 2021 at 1:56 PM Denis Nutiu  wrote:

> Hi Rommel,
>
>
>
> Thanks for getting back to me and for your time.
>
> I switched to the Hadoop plugin and used the following authentication
> method that worked:
> *fs.s3a.aws.credentials.provider:
> "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"*
>
>
> Turns out that I was using the wrong credentials provider. Reading the
> AWSCredentialsProvider docs [1] and seeing that the
> AWS_WEB_IDENTITY_TOKEN_FILE variable is set in the container allowed me to
> find the correct one.
>
>
> [1]
> https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/AWSCredentialsProvider.html
>
>
> Best,
>
> Denis
>
>
>
>
>
> *From:* Rommel Holmes 
> *Sent:* Saturday, October 9, 2021 02:09
> *To:* Denis Nutiu 
> *Cc:* user 
> *Subject:* Re: Flink S3 Presto Checkpointing Permission Forbidden
>
>
>
> You already have the S3 request ID, so you can easily reach out to AWS tech
> support to find out which account was used to write to S3. I guess that
> account probably doesn't have permission to do the following:
>
> "s3:GetObject",
> "s3:PutObject",
> "s3:DeleteObject",
> "s3:ListBucket"
>
> Then grant the account those permissions in k8s, and you should be good
> to go.
>
> On Fri, Oct 8, 2021 at 6:06 AM Denis Nutiu  wrote:
>
> Hello,
>
>
>
> I'm trying to deploy my Flink cluster inside an AWS EKS cluster using Flink
> native Kubernetes. I want to use S3 as a filesystem for checkpointing, and I
> am passing the following options related to flink-s3-fs-presto:
>
>
>
> "-Dhive.s3.endpoint": "https://s3.eu-central-1.amazonaws.com";
> "-Dhive.s3.iam-role": "arn:aws:iam::xxx:role/s3-flink"
> "-Dhive.s3.use-instance-credentials": "true"
> "-Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS":
> "flink-s3-fs-presto-1.13.2.jar"
> "-Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS":
> "flink-s3-fs-presto-1.13.2.jar"
> "-Dstate.backend": "rocksdb"
> "-Dstate.backend.incremental": "true"
> "-Dstate.checkpoints.dir": "s3://bucket/checkpoints/"
> "-Dstate.savepoints.dir": "s3://bucket/savepoints/"
>
>
>
> But my job fails with:
>
>
>
> 2021-10-08 11:38:49,771 WARN
>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Could
> not properly dispose the private states in the pending checkpoint 45 of job
> 75bdd6fb6e689961ef4e096684e867bc.
> com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException:
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service:
> Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID:
> JEZ3X8YPDZ2TF4T9; S3 Extended Request ID:
> u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=;
> Proxy: null), S3 Extended Request ID:
> u2RBcDpifTnzO4hIOGqgTOKDY+nw6iSeSepd4eYThITCPCpVddIUGMU7jY5DpJBg1LkPuYXiH9c=
> (Path: s3://bucket/checkpoints/75bdd6fb6e689961ef4e096684e867bc/chk-45)
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:573)
> ~[?:?]
> at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem.getS3ObjectMetadata(PrestoS3FileSystem.java:560)
> ~[?:?]
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem.getFileStatus(PrestoS3FileSystem.java:311)
> ~[?:?]
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem.directory(PrestoS3FileSystem.java:450)
> ~[?:?]
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem.delete(PrestoS3FileSystem.java:427)
> ~[?:?]
> at
> org.apache.flink.fs.s3presto.common.HadoopFileSystem.delete(HadoopFileSystem.java:160)
> ~[?:?]
> at
> org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.delete(PluginFileSystemFactory.java:155)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.disposeOnFailure(FsCheckpointStorageLocation.java:117)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]
> at
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.discard(PendingCheckpoint.java:588)
> ~[flink-dist_2.11-1.13.2.jar:1.13.2]
> at
> org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:60)
> ~[flink-dist_2.11-

Flink on Native Kubernetes S3 checkpointing error

2021-11-21 Thread bat man
Hi,

I am using Flink 1.13.1 with RocksDB checkpointing on S3 with native
Kubernetes.
I am passing this parameter to the job:

*-Dfs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider*

I am getting this error in the job-manager logs:

*Caused by: com.amazonaws.AmazonClientException: No AWS Credentials
provided by WebIdentityTokenCredentialsProvider :
com.amazonaws.SdkClientException: Unable to locate specified web identity
token file: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
 at
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:139)
~[?:?]*

Describing the pod shows that the token volume is mounted to the jobmanager
pod.
Is there anything specific that needs to be done? On the same EKS cluster,
for testing, I ran a sample pod with the aws cli image and it is able to do
*ls* on the s3 buckets.
Is this related to the AWS SDK used in Flink 1.13.1? Shall I try a more
recent Flink version?

Any help would be appreciated.

Thanks.


GroupOffsetReset functionality

2021-11-21 Thread Sangbida Dutta
Hi,

Currently I am using Flink 1.13. To configure the Kafka consumer's start
position I am using the setStartFromGroupOffsets() method of
FlinkKafkaConsumer.
If group offsets are committed, it should read from them, but if they are
not found, how will it work?
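
For reference, my setup looks roughly like this (a PyFlink-style sketch of
the same configuration; the topic name and properties are placeholders):

from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream.connectors import FlinkKafkaConsumer

consumer = FlinkKafkaConsumer(
    topics='my-topic',                                    # placeholder
    deserialization_schema=SimpleStringSchema(),
    properties={'bootstrap.servers': 'localhost:9092',    # placeholder
                'group.id': 'my-group',                   # placeholder
                # my understanding is that this property decides where to
                # start when no committed group offset is found
                'auto.offset.reset': 'earliest'})

# start reading from the committed group offsets
consumer.set_start_from_group_offsets()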

Any help would be appreciated.

Thanks.