Re: Support for ConfigMap for Runtime Arguments in Flink Kubernetes Operator

2024-03-04 Thread Surendra Singh Lilhore
Hi Arjun,

I have raised a Jira for this case and attached a patch:

https://issues.apache.org/jira/browse/FLINK-34565

-Surendra

On Wed, Feb 21, 2024 at 12:48 AM Surendra Singh Lilhore <
surendralilh...@apache.org> wrote:

> Hi Arjun,
>
> Yes, direct support for external configuration files within the Flink-managed
> ConfigMap is limited. The current implementation simply copies two local
> log-configuration files from the operator.
>
> Please check: FlinkConfMountDecorator#getLocalLogConfFiles()
> <https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/FlinkConfMountDecorator.java#L180>
>
> As a workaround, you can mount an external ConfigMap through a pod template.
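>
> A minimal sketch of that approach (the ConfigMap name "app-config", the
> mount path, and the job argument below are assumptions for illustration,
> not operator defaults):
>
> ```yaml
> # FlinkDeployment excerpt: mount a user-managed ConfigMap into the Flink
> # pods and point the job at the mounted properties file.
> spec:
>   podTemplate:
>     spec:
>       containers:
>         # The operator requires this exact name for the main container
>         - name: flink-main-container
>           volumeMounts:
>             - name: app-config
>               mountPath: /opt/flink/app-config
>       volumes:
>         - name: app-config
>           configMap:
>             name: app-config   # created beforehand, e.g. via kubectl
>   job:
>     # Assumed application flag that reads the mounted properties file
>     args: ["--configFile", "/opt/flink/app-config/app.properties"]
> ```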
>
> Thanks
> Surendra
>
> On Mon, Feb 19, 2024 at 11:17 PM arjun s  wrote:
>
>> Hi team,
>>
>> I am currently in the process of deploying Flink on Kubernetes using the
>> Flink Kubernetes Operator and have encountered a scenario where I need to
>> pass runtime arguments to my Flink application from a properties file.
>> Given the dynamic nature of Kubernetes environments and the need for
>> flexibility in configuration management, I was wondering if the Flink
>> Kubernetes Operator supports the use of Kubernetes ConfigMaps for this
>> purpose. Specifically, I am interested in understanding:
>>
>> 1. How can I use a ConfigMap to pass runtime arguments or configurations
>> stored in a properties file to a Flink job deployed using the Kubernetes
>> operator?
>> 2. Are there best practices or recommended approaches for managing
>> application-specific configurations, such as database connections or other
>> external resource settings, using ConfigMaps with the Flink Kubernetes
>> Operator?
>> 3. If direct support for ConfigMaps is not available or limited, could you
>> suggest any workarounds or alternative strategies that align with Flink's
>> deployment model on Kubernetes?
>>
>> I appreciate any guidance or documentation you could provide on this
>> matter, as it would greatly assist in streamlining our deployment process
>> and maintaining configuration flexibility in our Flink applications.
>>
>> Thank you for your time and support. I look forward to your response.
>>
>


Re: Flink 1.18.0 Checkpoints on cancelled jobs

2023-12-10 Thread Surendra Singh Lilhore
Hi Ethan,

Looks like this got changed after
https://issues.apache.org/jira/browse/FLINK-32469.

Now the checkpoint history call throws the exception below for a canceled job:

2023-12-10 21:50:12,990 ERROR org.apache.flink.runtime.rest.handler.job.checkpoints.CheckpointingStatisticsHandler [] - Exception occurred in REST handler: Job 7504e7a6106093a3a9c7ef35f52ce6cf not found


Thanks
Surendra


On Sat, Dec 9, 2023 at 12:26 PM Ethan T Yang  wrote:

> Hello Surendra,
> Thank you for replying to my question. I already have this code:
>
>
> env.getCheckpointConfig().setExternalizedCheckpointCleanup(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>
> I also tried using the REST API to retrieve a cancelled job, and no
> checkpoint was found there either. We use this conf:
>
> # s3 checkpointing
> state.backend: filesystem
> state.checkpoints.dir: {{ .Values.jobManager.checkpointUrl }}
> state.savepoints.dir: {{ .Values.jobManager.savepointUrl }}
>
> The actual checkpoint is there in S3 after cancellation. Can someone point
> me to the code where the checkpoint history is maintained?
>
> Thanks,
> Ethan
>
> On Dec 8, 2023, at 8:23 AM, Surendra Singh Lilhore <
> surendralilh...@gmail.com> wrote:
>
>
> Hi Ethan,
>
> Can you try:
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints/#retained-checkpoints
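>
> For reference, a flink-conf.yaml sketch of the same setting (the retained
> count below is an assumed example, not taken from this thread):
>
> ```yaml
> # Keep externalized checkpoints when the job is cancelled
> execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
> # How many completed checkpoints to keep in history (assumed value)
> state.checkpoints.num-retained: 10
> ```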
>
> Thanks
> Surendra
>
>
> On Thu, Dec 7, 2023 at 4:47 PM Ethan T Yang  wrote:
>
>> Hi Flink Users,
>>
>> After migrating from Flink 1.13.1 to 1.18.0, I am no longer seeing the
>> checkpoint history after cancelling a job. I am wondering which setting to
>> enable so that I can see the checkpoint history for a cancelled job in Flink
>> 1.18.0. Below is a screenshot of what I can see in Flink 1.13.1; I hope to
>> get back the same view in the new version.
>>
>> Thanks,
>> Ethan
>>
>>
>


Re: Continuous errors with Azure ABFSS

2023-09-28 Thread Surendra Singh Lilhore
Hi Alexis,

Could you please check the TaskManager log for any exceptions?

Thanks
Surendra


On Thu, Sep 28, 2023 at 7:06 AM Alexis Sarda-Espinosa <
sarda.espin...@gmail.com> wrote:

> Hello,
>
> We are using ABFSS for RocksDB's backend as well as the storage dir
> required for Kubernetes HA. In the Azure Portal's monitoring insights I see
> that every single operation contains failing transactions for the
> GetPathStatus API. Unfortunately I don't see any additional details, but I
> know the storage account is only used by Flink. Checkpointing isn't
> failing, but I wonder if this could be an issue in the long term?
>
> Regards,
> Alexis.
>
>


Re: MSI Auth to Azure Storage Account with Flink Apache Operator not working

2023-05-17 Thread Surendra Singh Lilhore
Hi Derocco,

Good to hear that it is working. Let me create a Jira ticket and update the
document.

-Surendra


On Wed, May 17, 2023 at 9:29 PM DEROCCO, CHRISTOPHER  wrote:

> Surendra,
>
>
>
> Your recommended config change fixed my issue. Azure Managed Service
> Identity works for me now and I can write checkpoints to ADLS Gen2 storage.
> My client ID is the managed identity that is attached to the Azure
> Kubernetes node pools. For anyone else facing this issue, the configurations
> to get this working in the Kubernetes YAML are:
>
>
>
> flinkConfiguration:
>
>   fs.azure.createRemoteFileSystemDuringInitialization: "true"
>
>   fs.azure.account.oauth.provider.type.<storage account>.dfs.core.windows.net:
>     org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
>
>   fs.azure.account.oauth2.msi.tenant.<storage account>.dfs.core.windows.net: <tenant ID>
>
>   fs.azure.account.oauth2.client.id.<storage account>.dfs.core.windows.net: <client ID>
>
>   fs.azure.account.oauth2.client.endpoint.<storage account>.dfs.core.windows.net:
>     https://login.microsoftonline.com/<tenant ID>/oauth2/token
>
>
>
> Also, this environment variable has to be added to the Kubernetes YAML
> configuration:
>
>
>
>   containers:
>
> # Do not change the main container name
>
> - name: flink-main-container
>
>   env:
>
>   - name: ENABLE_BUILT_IN_PLUGINS
>
> value: flink-azure-fs-hadoop-1.16.1.jar
>
>
>
>
>
> This Azure managed service identity configuration should be added to the
> Flink docs. I couldn’t find anywhere that the
> fs.azure.account.oauth.provider.type had to be set to
> *org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider*.
>
>
>
>
>
> *From:* Surendra Singh Lilhore 
> *Sent:* Tuesday, May 16, 2023 11:46 PM
> *To:* Ivan Webber 
> *Cc:* DEROCCO, CHRISTOPHER ; Shammon FY ;
> user@flink.apache.org
> *Subject:* Re: MSI Auth to Azure Storage Account with Flink Apache
> Operator not working
>
>
>
> Hi DEROCCO,
>
>
>
> Flink uses shaded jars for the Hadoop Azure Storage plugin, so in order to
> correct the ClassNotFoundException, you need to adjust the configuration.
> Please configure the MSITokenProvider as shown below.
>
>
>
> fs.azure.account.oauth.provider.type:
> *org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider*
>
>
>
> Thanks
>
> Surendra
>
>
>
>
>
> On Wed, May 17, 2023 at 5:32 AM Ivan Webber via user <
> user@flink.apache.org> wrote:
>
> When you create your cluster you probably need to ensure the following
> settings are set. I briefly looked into MSI but ended up using Azure Key
> Vault with CSI-storage driver for initial prototype (
> https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/csi-secrets-store-driver.md#upgrade-an-existing-aks-cluster-with-azure-key-vault-provider-for-secrets-store-csi-driver-support
> ).
>
>
>
> For me it helped to think about it as Hadoop configuration.
>
>
>
> If you do get MSI working I would be interested in hearing what made it
> work for you, so be sure to update the docs or put it on this thread.
>
>
>
> * To create from scratch*
>
> Create an AKS cluster with the required settings.
>
> ```bash
>
> # create an AKS cluster with pod-managed identity and Azure CNI
>
> az aks create --resource-group $RESOURCE_GROUP --name $CLUSTER
> --enable-managed-identity --network-plugin azure --enable-pod-identity
>
> ```
>
>
>
> I hope that is somehow helpful.
>
>
>
> Best of luck,
>
>
>
> Ivan
>
>
>
> *From: *DEROCCO, CHRISTOPHER 
> *Sent: *Monday, May 8, 2023 3:40 PM
> *To: *Shammon FY 
> *Cc: *user@flink.apache.org
> *Subject: *[EXTERNAL] RE: MSI Auth to Azure Storage Account with Flink
> Apache Operator not working
>
>
>
>
> Shammon,
>
>
>
> I’m still having trouble setting the package in my cluster environment. I
> have these lines added to my Dockerfile:
>
> mkdir ./plugins/azure-fs-hadoop
>
> cp ./opt

Re: MSI Auth to Azure Storage Account with Flink Apache Operator not working

2023-05-16 Thread Surendra Singh Lilhore
Hi DEROCCO,

Flink uses shaded jars for the Hadoop Azure Storage plugin, so in order to
correct the ClassNotFoundException, you need to adjust the configuration.
Please configure the MSITokenProvider as shown below.

fs.azure.account.oauth.provider.type:
*org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider*


Thanks
Surendra


On Wed, May 17, 2023 at 5:32 AM Ivan Webber via user 
wrote:

> When you create your cluster you probably need to ensure the following
> settings are set. I briefly looked into MSI but ended up using Azure Key
> Vault with CSI-storage driver for initial prototype (
> https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/csi-secrets-store-driver.md#upgrade-an-existing-aks-cluster-with-azure-key-vault-provider-for-secrets-store-csi-driver-support
> ).
>
>
>
> For me it helped to think about it as Hadoop configuration.
>
>
>
> If you do get MSI working I would be interested in hearing what made it
> work for you, so be sure to update the docs or put it on this thread.
>
>
>
> * To create from scratch*
>
> Create an AKS cluster with the required settings.
>
> ```bash
>
> # create an AKS cluster with pod-managed identity and Azure CNI
>
> az aks create --resource-group $RESOURCE_GROUP --name $CLUSTER
> --enable-managed-identity --network-plugin azure --enable-pod-identity
>
> ```
>
>
>
> I hope that is somehow helpful.
>
>
>
> Best of luck,
>
>
>
> Ivan
>
>
>
> *From: *DEROCCO, CHRISTOPHER 
> *Sent: *Monday, May 8, 2023 3:40 PM
> *To: *Shammon FY 
> *Cc: *user@flink.apache.org
> *Subject: *[EXTERNAL] RE: MSI Auth to Azure Storage Account with Flink
> Apache Operator not working
>
>
>
>
> Shammon,
>
>
>
> I’m still having trouble setting the package in my cluster environment. I
> have these lines added to my Dockerfile:
>
> mkdir ./plugins/azure-fs-hadoop
>
> cp ./opt/flink-azure-fs-hadoop-1.16.0.jar ./plugins/azure-fs-hadoop/
>
>
>
> according to the flink docs here (
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/filesystems/azure/
> )
>
> This should enable the flink-azure-fs-hadoop jar in the environment which
> has the classes to enable the adls2 MSI authentication.
>
> I also have the following dependency in my pom to add it to the FAT Jar.
>
>
>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-azure-fs-hadoop</artifactId>
>     <version>${flink.version}</version>
> </dependency>
>
>
>
> However, I still get the class-not-found error and the Flink job is not
> able to authenticate to the Azure storage account to store its checkpoints.
> I’m not sure what other configuration pieces I’m missing. Has anyone had
> success with writing checkpoints to Azure ADLS Gen2 storage with managed
> service identity (MSI) authentication?
>
>
>
>
>
>
>
> *From:* Shammon FY 
> *Sent:* Friday, May 5, 2023 8:38 PM
> *To:* DEROCCO, CHRISTOPHER 
> *Cc:* user@flink.apache.org
> *Subject:* Re: MSI Auth to Azure Storage Account with Flink Apache
> Operator not working
>
>
>
> Hi DEROCCO,
>
>
>
> I think you can check the startup command of the job on k8s to see if the
> jar file is in the classpath.
>
>
>
> If your job is a DataStream job, you need to add the hadoop-azure dependency
> in your project, and if it is an SQL job, you need to include this jar file
> in your Flink release package. Or you can also add this package in your
> cluster environment, as sketched below.
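>
> As a concrete sketch of that last option on Kubernetes (this assumes the
> official Flink image and a FlinkDeployment pod template; the plugin jar
> name must match your Flink version):
>
> ```yaml
> # FlinkDeployment excerpt: have the stock image enable the bundled
> # azure-fs-hadoop plugin at container startup.
> spec:
>   podTemplate:
>     spec:
>       containers:
>         - name: flink-main-container
>           env:
>             - name: ENABLE_BUILT_IN_PLUGINS
>               value: flink-azure-fs-hadoop-1.16.0.jar
> ```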
>
>
>
> Best,
>
> Shammon FY
>
>
>
>
>
> On Fri, May 5, 2023 at 10:21 PM DEROCCO, CHRISTOPHER 
> wrote:
>
> How can I add the package to the flink job or check if it is there?
>
>
>
> *From:* Shammon FY 
> *Sent:* Thursday, May 4, 2023 9:59 PM
> *To:* DEROCCO, CHRISTOPHER 
> *Cc:* user@flink.apache.org
> *Subject:* Re: MSI Auth to Azure Storage Account with Flink Apache
> Operator not working
>
>
>
> Hi DEROCCO,
>
>
>
> I think you need to check whether there is a hadoop-azure jar file in the
> classpath of your flink job. From an error message '*Caused by:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider not found.*', your
> flink job may be missing this package.
>
>
>
> Best,
>
> Shammon FY
>
>
>
>
>
> On Fri, May 5, 2023 at 4:40 AM DEROCCO, CHRISTOPHER 
> wrote:
>
>
>
> I receive the error:  *Caused by: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider not found.*
>
> I’m using flink 1.16 running in Azure Kubernetes using the Flink Apache
> Kubernetes Operator.
>
> I have the following specified in spec.flinkConfiguration, as per the
> Flink Kubernetes Operator documentation.
>
>
>
> fs.azure.createRemoteFileSystemDuringInitialization: "true"
>
> fs.azure.account.auth.type.storageaccountname.dfs.core.windows.net
>