Flink Table Kinesis sink not failing when sink fails

2022-11-29 Thread Dan Hill
I set up a simple Flink SQL job (Flink v1.14.4) that writes to Kinesis.
The job looks healthy but the records are not being written.  I did not
give enough IAM permissions to write to Kinesis.  However, the Flink SQL
job acts like it's healthy and checkpoints even though the Kinesis PutRecords
call fails.  I'd expect this error to kill the Flink job.

I looked through Flink Jira and the Flink user group but didn't see a
similar issue.

Is the silent failure a known issue?  If the Flink job doesn't fail, it'll
be hard to detect production issues.

```

2022-11-29 23:30:27,587 ERROR
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader
[] - [2022-11-29 23:30:27.578072] [0x1e3b][0x7f12ef8fc700]
[error] [AWS Log: ERROR](AWSClient)HTTP response code: 400
Exception name: AccessDeniedException
Error message: User:
arn:aws:sts::055315558257:assumed-role/dev-workers-us-east-1b-202203101433138915000a/i-09e4f747a4bdbb1f0
is not authorized to perform: kinesis:ListShards on resource:
arn:aws:kinesis:us-east-1:055315558257:stream/dan-dev-content-metrics
because no identity-based policy allows the kinesis:ListShards action
6 response headers:
connection : close
content-length : 379
content-type : application/x-amz-json-1.1
date : Tue, 29 Nov 2022 23:30:27 GMT
x-amz-id-2 : 
q8kuplUOMJILzVU97YA+TYSyy6aozeoST+yws26rOkyzEUUZT0zKBdcMWUAjV/8RrnMeed/+em7CbjpwzGYEANgkwCihZWdC
x-amzn-requestid : e4a39674-66fa-4dcd-b8a3-0e273e5e628a

```


Re: Flink Table Kinesis sink not failing when sink fails

2022-11-29 Thread yuxia
Which code line the error message happens? Maybe it will swallow the exception 
and then log the error message, in which case Flink job won't fail since it 
seems like no exception happens. 

Best regards, 
Yuxia 


发件人: "Dan Hill"  
收件人: "User"  
发送时间: 星期三, 2022年 11 月 30日 上午 8:06:52 
主题: Flink Table Kinesis sink not failing when sink fails 

I set up a simple Flink SQL job (Flink v1.14.4) that writes to Kinesis . The 
job looks healthy but the records are not being written. I did not give enough 
IAM permissions to write to Kinesis . However, the Flink SQL job acts like it's 
healthy and checkpoints even though the Kinesis PutRecords call fails. I'd 
expect this error to kill the Flink job. 
I looked through Flink Jira and the Flink user group but didn't see a similar 
issue. 

Is the silent failure a known issue? If the Flink job doesn't fail, it'll be 
hard to detect production issues. 

``` 
2022 - 11 - 29 23 : 30 : 27 , 587 ERROR org.apache.flink. kinesis .shaded. com 
. amazonaws . services . kinesis . producer . LogInputStreamReader [] - [ 2022 
- 11 - 29 23 : 30 : 27.578072 ] [ 0 x1e3b][ 0 x7f12ef8fc700] [error] 
[AWS Log: ERROR](AWSClient)HTTP response code: 400 
Exception name: AccessDeniedException 
Error message: User : arn:aws:sts:: 055315558257 
:assumed-role/dev-workers-us-east-1b-202203101433138915000a/i-09e4f747a4bdbb1f0
 is not authorized to perform: kinesis :ListShards on resource: arn:aws: 
kinesis :us-east-1: 055315558257 :stream/dan-dev-content-metrics because no 
identity -based policy allows the kinesis :ListShards action 
6 response headers: 
connection : close 
content-length : 379 
content-type : application/x-amz-json-1. 1 
date : Tue, 29 Nov 2022 23 : 30 : 27 GMT 
x-amz-id-2 : q8kuplUOMJILzVU97YA+TYSyy6aozeoST+yws26rOkyzEUUZT0zKBdcMWUAjV/ 8 
RrnMeed/+em7CbjpwzGYEANgkwCihZWdC 
x-amzn-requestid : e4a39674-66fa-4dcd-b8a3-0e273e5e628a 
``` 



Re: Flink Table Kinesis sink not failing when sink fails

2022-11-29 Thread Dan Hill
My text logs don't have a stack trace with this exception.  I'm doing this
inside Flink SQL with a standard Kinesis connector and JSON formatter.

On Tue, Nov 29, 2022 at 6:38 PM yuxia  wrote:

> Which code line the error message happens? Maybe it will swallow the
> exception and then log the error message, in which case Flink job won't
> fail since it seems like no exception happens.
>
> Best regards,
> Yuxia
>
> --
> *发件人: *"Dan Hill" 
> *收件人: *"User" 
> *发送时间: *星期三, 2022年 11 月 30日 上午 8:06:52
> *主题: *Flink Table Kinesis sink not failing when sink fails
>
> I set up a simple Flink SQL job (Flink v1.14.4) that writes to Kinesis.
> The job looks healthy but the records are not being written.  I did not
> give enough IAM permissions to write to Kinesis.  However, the Flink SQL
> job acts like it's healthy and checkpoints even though the Kinesis PutRecords
> call fails.  I'd expect this error to kill the Flink job.
> I looked through Flink Jira and the Flink user group but didn't see a
> similar issue.
>
> Is the silent failure a known issue?  If the Flink job doesn't fail, it'll
> be hard to detect production issues.
>
> ```
>
> 2022-11-29 23:30:27,587 ERROR 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader
>  [] - [2022-11-29 23:30:27.578072] [0x1e3b][0x7f12ef8fc700] [error] 
> [AWS Log: ERROR](AWSClient)HTTP response code: 400
> Exception name: AccessDeniedException
> Error message: User: 
> arn:aws:sts::055315558257:assumed-role/dev-workers-us-east-1b-202203101433138915000a/i-09e4f747a4bdbb1f0
>  is not authorized to perform: kinesis:ListShards on resource: 
> arn:aws:kinesis:us-east-1:055315558257:stream/dan-dev-content-metrics because 
> no identity-based policy allows the kinesis:ListShards action
> 6 response headers:
> connection : close
> content-length : 379
> content-type : application/x-amz-json-1.1
> date : Tue, 29 Nov 2022 23:30:27 GMT
> x-amz-id-2 : 
> q8kuplUOMJILzVU97YA+TYSyy6aozeoST+yws26rOkyzEUUZT0zKBdcMWUAjV/8RrnMeed/+em7CbjpwzGYEANgkwCihZWdC
> x-amzn-requestid : e4a39674-66fa-4dcd-b8a3-0e273e5e628a
>
> ```
>
>


Re: Flink Table Kinesis sink not failing when sink fails

2022-11-30 Thread Danny Cranmer
Hello,

By default the sink will not fail, the underlying connector has a flag
"failOnError" which defaults to false. Unfortunately this cannot be set for
Flink 1.14 in the Table API, however in 1.15 it can via
'sink.fail-on-error: true'

Thanks

On Wed, Nov 30, 2022 at 5:41 AM Dan Hill  wrote:

> My text logs don't have a stack trace with this exception.  I'm doing this
> inside Flink SQL with a standard Kinesis connector and JSON formatter.
>
> On Tue, Nov 29, 2022 at 6:38 PM yuxia  wrote:
>
>> Which code line the error message happens? Maybe it will swallow the
>> exception and then log the error message, in which case Flink job won't
>> fail since it seems like no exception happens.
>>
>> Best regards,
>> Yuxia
>>
>> --
>> *发件人: *"Dan Hill" 
>> *收件人: *"User" 
>> *发送时间: *星期三, 2022年 11 月 30日 上午 8:06:52
>> *主题: *Flink Table Kinesis sink not failing when sink fails
>>
>> I set up a simple Flink SQL job (Flink v1.14.4) that writes to Kinesis.
>> The job looks healthy but the records are not being written.  I did not
>> give enough IAM permissions to write to Kinesis.  However, the Flink SQL
>> job acts like it's healthy and checkpoints even though the Kinesis PutRecords
>> call fails.  I'd expect this error to kill the Flink job.
>> I looked through Flink Jira and the Flink user group but didn't see a
>> similar issue.
>>
>> Is the silent failure a known issue?  If the Flink job doesn't fail,
>> it'll be hard to detect production issues.
>>
>> ```
>>
>> 2022-11-29 23:30:27,587 ERROR 
>> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader
>>  [] - [2022-11-29 23:30:27.578072] [0x1e3b][0x7f12ef8fc700] [error] 
>> [AWS Log: ERROR](AWSClient)HTTP response code: 400
>> Exception name: AccessDeniedException
>> Error message: User: 
>> arn:aws:sts::055315558257:assumed-role/dev-workers-us-east-1b-202203101433138915000a/i-09e4f747a4bdbb1f0
>>  is not authorized to perform: kinesis:ListShards on resource: 
>> arn:aws:kinesis:us-east-1:055315558257:stream/dan-dev-content-metrics 
>> because no identity-based policy allows the kinesis:ListShards action
>> 6 response headers:
>> connection : close
>> content-length : 379
>> content-type : application/x-amz-json-1.1
>> date : Tue, 29 Nov 2022 23:30:27 GMT
>> x-amz-id-2 : 
>> q8kuplUOMJILzVU97YA+TYSyy6aozeoST+yws26rOkyzEUUZT0zKBdcMWUAjV/8RrnMeed/+em7CbjpwzGYEANgkwCihZWdC
>> x-amzn-requestid : e4a39674-66fa-4dcd-b8a3-0e273e5e628a
>>
>> ```
>>
>>


Re: Flink Table Kinesis sink not failing when sink fails

2022-11-30 Thread Dan Hill
Thanks!

On Wed, Nov 30, 2022, 00:37 Danny Cranmer  wrote:

> Hello,
>
> By default the sink will not fail, the underlying connector has a flag
> "failOnError" which defaults to false. Unfortunately this cannot be set for
> Flink 1.14 in the Table API, however in 1.15 it can via
> 'sink.fail-on-error: true'
>
> Thanks
>
> On Wed, Nov 30, 2022 at 5:41 AM Dan Hill  wrote:
>
>> My text logs don't have a stack trace with this exception.  I'm doing
>> this inside Flink SQL with a standard Kinesis connector and JSON formatter.
>>
>> On Tue, Nov 29, 2022 at 6:38 PM yuxia 
>> wrote:
>>
>>> Which code line the error message happens? Maybe it will swallow the
>>> exception and then log the error message, in which case Flink job won't
>>> fail since it seems like no exception happens.
>>>
>>> Best regards,
>>> Yuxia
>>>
>>> --
>>> *发件人: *"Dan Hill" 
>>> *收件人: *"User" 
>>> *发送时间: *星期三, 2022年 11 月 30日 上午 8:06:52
>>> *主题: *Flink Table Kinesis sink not failing when sink fails
>>>
>>> I set up a simple Flink SQL job (Flink v1.14.4) that writes to Kinesis.
>>> The job looks healthy but the records are not being written.  I did not
>>> give enough IAM permissions to write to Kinesis.  However, the Flink
>>> SQL job acts like it's healthy and checkpoints even though the Kinesis 
>>> PutRecords
>>> call fails.  I'd expect this error to kill the Flink job.
>>> I looked through Flink Jira and the Flink user group but didn't see a
>>> similar issue.
>>>
>>> Is the silent failure a known issue?  If the Flink job doesn't fail,
>>> it'll be hard to detect production issues.
>>>
>>> ```
>>>
>>> 2022-11-29 23:30:27,587 ERROR 
>>> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.LogInputStreamReader
>>>  [] - [2022-11-29 23:30:27.578072] [0x1e3b][0x7f12ef8fc700] [error] 
>>> [AWS Log: ERROR](AWSClient)HTTP response code: 400
>>> Exception name: AccessDeniedException
>>> Error message: User: 
>>> arn:aws:sts::055315558257:assumed-role/dev-workers-us-east-1b-202203101433138915000a/i-09e4f747a4bdbb1f0
>>>  is not authorized to perform: kinesis:ListShards on resource: 
>>> arn:aws:kinesis:us-east-1:055315558257:stream/dan-dev-content-metrics 
>>> because no identity-based policy allows the kinesis:ListShards action
>>> 6 response headers:
>>> connection : close
>>> content-length : 379
>>> content-type : application/x-amz-json-1.1
>>> date : Tue, 29 Nov 2022 23:30:27 GMT
>>> x-amz-id-2 : 
>>> q8kuplUOMJILzVU97YA+TYSyy6aozeoST+yws26rOkyzEUUZT0zKBdcMWUAjV/8RrnMeed/+em7CbjpwzGYEANgkwCihZWdC
>>> x-amzn-requestid : e4a39674-66fa-4dcd-b8a3-0e273e5e628a
>>>
>>> ```
>>>
>>>