RE: Flink bulk and record file source format metrics

2023-06-18 Thread Kamal Mittal via user
Thanks for the approach, but a few follow-up points on my query:

A Parquet file is a binary file, so when I said “corrupt record”, isn’t it the
complete file itself that can’t be processed?
In that case, Flink would not be counting corrupt records but rather corrupt
files or splits?

From: Ken Krugler 
Sent: 16 June 2023 08:19 PM
To: user@flink.apache.org
Cc: Shammon FY ; Kamal Mittal 
Subject: Re: Flink bulk and record file source format metrics

Hi Kamal,

In a similar situation, when a decoding failure happened I would generate a 
special record that I could then detect/filter out (and increment a counter) in 
a FilterFunction immediately following the source.
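A self-contained sketch of that pattern follows. The sentinel value and the plain-Java stand-ins are illustrative assumptions; in an actual Flink job the `keep` logic would live in a `RichFilterFunction`, with the counter obtained from `getRuntimeContext().getMetricGroup().counter(...)` instead of an `AtomicLong`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CorruptRecordFilterSketch {
    // Illustrative sentinel; the decoder emits this instead of throwing.
    static final String CORRUPT_MARKER = "__CORRUPT__";

    // Stand-in for a Flink metrics Counter.
    static final AtomicLong corruptCounter = new AtomicLong();

    // Equivalent of FilterFunction.filter(): keep the record unless it is
    // the sentinel, in which case increment the counter and drop it.
    static boolean keep(String record) {
        if (CORRUPT_MARKER.equals(record)) {
            corruptCounter.incrementAndGet();
            return false;
        }
        return true;
    }

    // Applies the filter to an in-memory "stream" for demonstration.
    static List<String> applyFilter(List<String> stream) {
        List<String> out = new ArrayList<>();
        for (String r : stream) {
            if (keep(r)) {
                out.add(r);
            }
        }
        return out;
    }
}
```

Because the filter sits immediately after the source, the counter reflects exactly the records the source failed to decode, without the source needing any metric handle of its own.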

— Ken



On Jun 16, 2023, at 2:11 AM, Kamal Mittal via user <user@flink.apache.org> wrote:

Hello,

Any way-forward, please suggest.

Rgds,
Kamal

From: Kamal Mittal via user <user@flink.apache.org>
Sent: 15 June 2023 10:39 AM
To: Shammon FY <zjur...@gmail.com>
Cc: user@flink.apache.org
Subject: RE: Flink bulk and record file source format metrics

Hello,

I need one counter metric for the number of corrupt records encountered while
decoding parquet records at the data source level. I know where the
corrupt-record handling needs to happen, but because no “SourceContext” or
“RuntimeContext” exists there, I am unable to do anything metric-wise.

It is needed in the same way the “SourceReaderBase” class maintains a counter
for the number of records emitted.

Rgds,
Kamal

From: Shammon FY <zjur...@gmail.com>
Sent: 14 June 2023 05:33 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: Flink bulk and record file source format metrics

Hi Kamal,

Can you give more information about the metrics you want? In Flink, each source
task has one source reader, which already has some metrics; you can refer to
the metrics doc[1] for more detailed information.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/

Best,
Shammon FY

On Tue, Jun 13, 2023 at 11:13 AM Kamal Mittal via user <user@flink.apache.org> wrote:
Hello,

Using the Flink record stream format file source API as below for reading
parquet records.

FileSource.FileSourceBuilder source =
    FileSource.forRecordStreamFormat(streamformat, path);
source.monitorContinuously(Duration.ofMillis(1));

I want to log/generate metrics for corrupt records, and for that I need to
register Flink metrics at the source level in the parquet reader class. Is
there any way to do that, given that right now no handle to SourceContext is
available?
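Absent a metric handle inside the reader, one workable arrangement is to make the decode step itself non-throwing: wrap it so a failure yields a sentinel record that downstream operators can count and drop. The following is a sketch under stated assumptions; `decodeParquetRecord` is a hypothetical stand-in for the real parquet decoding call, and the digit check merely simulates a decode failure.

```java
import java.nio.charset.StandardCharsets;

public class SafeDecodeSketch {
    // Illustrative sentinel emitted in place of a record that failed to decode.
    static final String CORRUPT_MARKER = "__CORRUPT__";

    // Hypothetical decoder stand-in: treats any non-digit input as corrupt,
    // simulating a parquet decode failure.
    static String decodeParquetRecord(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        if (s.isEmpty() || !s.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("corrupt record");
        }
        return s;
    }

    // Wraps decode so a failure yields the sentinel instead of failing the
    // source; a filter immediately after the source counts and drops sentinels.
    static String decodeOrMark(byte[] raw) {
        try {
            return decodeParquetRecord(raw);
        } catch (RuntimeException e) {
            return CORRUPT_MARKER;
        }
    }
}
```

This keeps the reader free of any metrics dependency: counting happens downstream, where a RuntimeContext is available.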

Rgds,
Kamal

--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch




