Re: About Flink parquet format
Hi Kamal,

Indeed, Flink does not handle this exception. When it occurs, the Flink job fails and is internally restarted over and over, continuously creating new files.

Personally, I think this logic could be optimized: when this exception occurs, the file that triggered it should be deleted before the Flink job exits, to avoid generating too many unnecessary files.

Best,
Feng

On Mon, Sep 25, 2023 at 10:27 AM Kamal Mittal wrote:
> Can you please share why Flink is not able to handle the exception and
> keeps creating files continuously without closing them?
RE: About Flink parquet format
Hello,

Can you please share why Flink is not able to handle the exception and keeps creating files continuously without closing them?

Rgds,
Kamal
RE: About Flink parquet format
Yes.

Due to the error below, the Flink bulk writer never closes the part file and keeps creating new part files continuously. Is Flink not handling exceptions like this?

> java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
Re: About Flink parquet format
Hi,

I tested it on my side and also got the same error. This should be a limitation of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
	at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.jav
```

So I think the current minimum page size that can be set for parquet is 64 bytes.

Best,
Feng
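The 64-byte floor can be reproduced outside Flink. Below is a minimal sketch of the precondition that fires, based only on what the stack trace shows: the RLE encoder requests an initial slab of at least 64 bytes and passes the configured page size as the capacity hint to `CapacityByteArrayOutputStream`, whose constructor rejects a hint smaller than the slab. The constant and method names here are illustrative stand-ins, not Parquet's actual internals.

```java
public class SlabSizeCheck {
    // Assumption from the "initialSlabSize 64" in the stack trace: the encoder's
    // minimum initial slab is 64 bytes.
    static final int MIN_SLAB_SIZE = 64;

    // Simplified stand-in for the precondition in CapacityByteArrayOutputStream's
    // constructor (Preconditions.checkArgument in the real library).
    static void checkCapacity(int initialSlabSize, int maxCapacityHint) {
        if (maxCapacityHint < initialSlabSize) {
            throw new IllegalArgumentException(String.format(
                "maxCapacityHint can't be less than initialSlabSize %d %d",
                initialSlabSize, maxCapacityHint));
        }
    }

    public static void main(String[] args) {
        int pageSize = 1; // the page size configured in the thread
        try {
            checkCapacity(MIN_SLAB_SIZE, pageSize);
        } catch (IllegalArgumentException e) {
            // Same message as in the Flink logs.
            System.out.println(e.getMessage());
        }
        checkCapacity(MIN_SLAB_SIZE, 64); // 64 bytes passes, matching the observed minimum
        System.out.println("page size 64 accepted");
    }
}
```

This is why any configured page size below 64 bytes fails at encoder construction time, before a single value is written.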
RE: About Flink parquet format
Hello,

If the page size is given as 1 byte, the exception ‘maxCapacityHint can't be less than initialSlabSize %d %d’ is encountered.

This comes from the class CapacityByteArrayOutputStream, contained in the parquet-common library.

Rgds,
Kamal
Re: About Flink parquet format
Hi Kamal,

What exception did you encounter? I have tested it locally and it works fine.

Best,
Feng
RE: About Flink parquet format
Hello,

Checkpointing is enabled and works fine if the configured parquet page size is at least 64 bytes; otherwise an exception is thrown at the back end.

Looks to be an issue that is not handled by the file sink bulk writer?

Rgds,
Kamal
Re: About Flink parquet format
Hi Kamal,

Check whether checkpointing is enabled and triggered correctly for the task. By default, the parquet bulk writer rolls a new part file on each checkpoint.

Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user wrote:
> Hello,
>
> Tried parquet file creation with the file sink bulk writer.
>
> If the parquet page size is configured as low as 1 byte (an allowed
> configuration), Flink keeps creating multiple ‘in-progress’ state files
> with content only ‘PAR1’, and never closes the file.
>
> I want to know the reason for not closing the file and creating multiple
> ‘in-progress’ part files, or why no error is given, if applicable?
>
> Rgds,
> Kamal
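For reference, the kind of setup being discussed looks roughly like the sketch below, assuming the Flink 1.15+ `FileSink` API with the Parquet/Avro bridge (requires the flink-parquet and Avro dependencies on the classpath). `MyRecord` and the output path are placeholders, not from the thread.

```java
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

public class ParquetSinkJob {

    // Hypothetical record type for illustration.
    public static class MyRecord {
        public String field = "value";
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bulk formats only finalize part files on checkpoint, so without
        // checkpointing the files stay 'in-progress' forever.
        env.enableCheckpointing(60_000);

        FileSink<MyRecord> sink = FileSink
                .forBulkFormat(new Path("file:///tmp/out"), // placeholder path
                        AvroParquetWriters.forReflectRecord(MyRecord.class))
                // This is already the default rolling policy for bulk formats.
                .withRollingPolicy(OnCheckpointRollingPolicy.build())
                .build();

        env.fromElements(new MyRecord()).sinkTo(sink);
        env.execute("parquet-sink-sketch");
    }
}
```

With this setup, part files move from ‘in-progress’ to finished as part of checkpoint completion; the page-size failure in this thread happens before that, when the Parquet writer is constructed.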