Re: Backpressure handling in FileSource APIs - Flink 1.16

2023-05-28 Thread Shammon FY
Hi Kamal,

The network buffers of the `FileSource` task fill up when the job is under
back pressure, which blocks the source subtask from emitting further
records. You can refer to the network stack deep dive [1] for more
information.

[1]
https://flink.apache.org/2019/06/05/a-deep-dive-into-flinks-network-stack/
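
For illustration, here is a minimal, self-contained sketch (the input path
and the artificial delay are placeholders, not taken from this thread) of a
job where a deliberately slow map operator backpressures a `FileSource`:
once the slow operator's input buffers fill up, credit-based flow control
stops the source subtask from emitting.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackpressureDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                                       new Path("/tmp/input"))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
           // Artificially slow consumer: its input buffers fill up and the
           // backpressure propagates upstream until the source stops reading.
           .map(line -> { Thread.sleep(10); return line.toUpperCase(); })
           .print();

        env.execute("file-source-backpressure-demo");
    }
}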

Best,
Shammon FY


On Fri, May 26, 2023 at 7:13 PM Kamal Mittal  wrote:

> Hello Shammon,
>
> Can you please point out the classes where the slow-down logic for
> "FileSource" is placed?
>
> Just wanted to understand it better and try it at my end by running
> various performance tests, and apply changes to my application if needed.
>
> Rgds,
> Kamal
>
> On Thu, May 25, 2023 at 9:16 AM Kamal Mittal  wrote:
>
>> Hello Shammon,
>>
>> Can you please point out the classes where the slow-down logic for
>> "FileSource" is placed?
>>
>> Just wanted to understand it better and try it at my end by running
>> various performance tests, and apply changes to my application if needed.
>>
>> Rgds,
>> Kamal
>>
>> On Wed, May 24, 2023 at 11:41 AM Kamal Mittal  wrote:
>>
>>> Thanks Shammon for the clarification.
>>>
>>> On Wed, May 24, 2023 at 11:01 AM Shammon FY  wrote:
>>>
 Hi Kamal,

 The source will slow down when there is backpressure in the Flink job;
 you can refer to docs [1] and [2] for more detailed information about the
 backpressure mechanism.

 Currently there is no API or callback in the source for users to perform
 customized operations on backpressure, but users can collect and analyze
 the job's metrics, for example the metrics in [1] and [3]. I hope this
 helps.

 [1]
 https://flink.apache.org/2021/07/07/how-to-identify-the-source-of-backpressure/#:~:text=Backpressure%20is%20an%20indicator%20that,the%20queues%20before%20being%20processed
 [2]
 https://www.alibabacloud.com/blog/analysis-of-network-flow-control-and-back-pressure-flink-advanced-tutorials_596632
 [3]
 https://nightlies.apache.org/flink/flink-docs-master/docs/ops/monitoring/back_pressure/
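
If it helps, those numbers can also be pulled programmatically from Flink's
REST API. A small sketch (the host is a placeholder; the job id and vertex
id come from the /jobs and /jobs/<jobid> overview endpoints) that polls the
back pressure endpoint described in [3]:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackpressureProbe {
    public static void main(String[] args) throws Exception {
        // args[0] = job id, args[1] = vertex id, both from the REST overview.
        String url = "http://localhost:8081/jobs/" + args[0]
                + "/vertices/" + args[1] + "/backpressure";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request =
                HttpRequest.newBuilder(URI.create(url)).GET().build();

        // The JSON response carries an overall backpressure level plus
        // per-subtask ratios; parse it with any JSON library.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}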

 On Tue, May 23, 2023 at 9:40 PM Kamal Mittal 
 wrote:

> Hello Community,
>
> Can you please share your views on the query asked above w.r.t. back
> pressure for the FileSource APIs for Bulk and Record stream formats?
> Planning to use these APIs for AVRO-to-Parquet conversion and vice
> versa.
>
> Rgds,
> Kamal
>
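
Regarding the AVRO/Parquet use case above, a minimal sketch of the
record-stream side, assuming the flink-parquet and Avro dependencies are on
the classpath (the schema and input path are made up for illustration):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParquetToAvroJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Illustrative schema: a single long field.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":"
              + "[{\"name\":\"id\",\"type\":\"long\"}]}");

        // Read Parquet files as Avro GenericRecords.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(AvroParquetReaders.forGenericRecord(schema),
                                       new Path("/tmp/parquet-in"))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source")
           .map(GenericRecord::toString)
           .print();

        env.execute("parquet-to-avro");
    }
}

The bulk-format side works analogously via `FileSource.forBulkFileFormat`.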
> On Tue, 23 May 2023, 12:26 pm Kamal Mittal, 
> wrote:
>
>> Added Flink community DL as well.
>>
>> -- Forwarded message -
>> From: Kamal Mittal 
>> Date: Tue, May 23, 2023 at 7:57 AM
>> Subject: Re: Backpressure handling in FileSource APIs - Flink 1.16
>> To: Shammon FY 
>>
>>
>> Hello,
>>
>> Yes, I want to take some custom actions, and I also want to know whether
>> there is any default behavior that slows down sending data further along
>> the pipeline or reading data from the source.
>>
>> Rgds,
>> Kamal
>>
>> On Tue, May 23, 2023 at 6:06 AM Shammon FY  wrote:
>>
>>> Hi Kamal,
>>>
>>> If I understand correctly, do you want the source to perform some custom
>>> actions, such as rate limiting, when there is backpressure in the
>>> job?
>>>
>>> Best,
>>> Shammon FY
>>>
>>>
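
Since there is no such hook today, one user-side stand-in is a blocking
limiter placed directly after the source; blocking in map() itself creates
backpressure upstream. A sketch using Guava's RateLimiter (the class and
names are illustrative, not a Flink API; note the limit applies per
parallel subtask, so the total rate is parallelism x permitsPerSecond):

import com.google.common.util.concurrent.RateLimiter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical user-side throttle: caps throughput with a Guava
// RateLimiter. Blocking in map() itself backpressures the source.
public class ThrottleMap<T> extends RichMapFunction<T, T> {
    private final double permitsPerSecond;
    private transient RateLimiter limiter;

    public ThrottleMap(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    @Override
    public void open(Configuration parameters) {
        // One limiter instance per parallel subtask.
        limiter = RateLimiter.create(permitsPerSecond);
    }

    @Override
    public T map(T value) {
        limiter.acquire(); // blocks until a permit is available
        return value;
    }
}

Used as `stream.map(new ThrottleMap<>(1000.0))` right after the source.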
>>> On Mon, May 22, 2023 at 2:12 PM Kamal Mittal 
>>> wrote:
>>>
 Hello Community,

 Can you please share your views on the query asked above w.r.t. back
 pressure for the FileSource APIs for Bulk and Record stream formats?
 Planning to use these APIs for AVRO-to-Parquet conversion and vice
 versa.

 Rgds,
 Kamal

 On Thu, May 18, 2023 at 2:33 PM Kamal Mittal 
 wrote:

> Hello Community,
>
> Does the FileSource API for Bulk and Record stream formats handle
> back pressure in any way, such as slowing down sending data further
> along the pipeline or reading data from the source?
> Or does it provide any callback/handle so that action can be taken?
> Can you please share details, if any?
>
> Rgds,
> Kamal
>



Re: [ANNOUNCE] Apache Flink 1.16.2 released

2023-05-28 Thread weijie guo
Hi Jing,

Thank you for asking about the release process. It has to be said that
the entire process went smoothly. We have very comprehensive
documentation [1] to guide the work, thanks to the contributions of
previous release managers and the community.

Regarding the obstacles, I actually ran into only one minor problem: we
used an older twine (1.12.0) to deploy the Python artifacts to PyPI, and
its compatible dependencies (such as urllib3) are also older. When I tried
`twine upload`, the process couldn't work as expected because the version
of urllib3 installed on my machine was relatively newer. To solve this, I
had to proactively downgrade some of the dependencies. I added a notice to
the cwiki page [1] to prevent future release managers from encountering the
same problem. It seems that this is a known issue (see the comments in [2])
which has been resolved in newer versions of twine. I wonder if we can
upgrade the version of twine? Does anyone remember the reason why we pinned
such an old version (1.12.0)?

Best regards,

Weijie


[1]
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release

[2] https://github.com/pypa/twine/issues/997


Jing Ge wrote on Sat, May 27, 2023 at 00:15:

> Hi Weijie,
>
> Thanks again for your effort. I was wondering if there were any obstacles
> you had to overcome while releasing 1.16.2 and 1.17.1 that could point us
> to any improvements w.r.t. the release process and management?
>
> Best regards,
> Jing
>
> On Fri, May 26, 2023 at 4:41 PM Martijn Visser 
> wrote:
>
> > Thank you Weijie and those who helped with testing!
> >
> > On Fri, May 26, 2023 at 1:06 PM weijie guo 
> > wrote:
> >
> > > The Apache Flink community is very happy to announce the release of
> > > Apache Flink 1.16.2, which is the second bugfix release for the Apache
> > > Flink 1.16 series.
> > >
> > >
> > >
> > > Apache Flink® is an open-source stream processing framework for
> > > distributed, high-performing, always-available, and accurate data
> > > streaming applications.
> > >
> > >
> > >
> > > The release is available for download at:
> > >
> > > https://flink.apache.org/downloads.html
> > >
> > >
> > >
> > > Please check out the release blog post for an overview of the
> > > improvements for this bugfix release:
> > >
> > > https://flink.apache.org/news/2023/05/25/release-1.16.2.html
> > >
> > >
> > >
> > > The full release notes are available in Jira:
> > >
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352765
> > >
> > >
> > >
> > > We would like to thank all contributors of the Apache Flink community
> > > who made this release possible!
> > >
> > >
> > >
> > > Feel free to reach out to the release managers (or respond to this
> > > thread) with feedback on the release process. Our goal is to
> > > constantly improve the release process. Feedback on what could be
> > > improved or things that didn't go so well are appreciated.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Release Manager
> > >
> >
>


[DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-05-28 Thread yuxia
Hi community,

I want to start a discussion about making the Hive dialect no longer fall
back to Flink's default dialect.

Currently, when the HiveParser fails to parse a SQL statement in Hive
dialect, it falls back to Flink's default parser [1] to handle
Flink-specific statements like "CREATE CATALOG xx with (xx);".

As I have been involved with the Hive dialect and have recently had some
communication with community users who use it, I'm thinking of throwing an
exception directly instead of falling back to Flink's default dialect when
the SQL fails to parse in Hive dialect.

Here are some reasons:

First of all, the fallback hides errors in the Hive dialect. For example,
we found we could no longer use the Hive dialect with the Flink SQL client
in the release validation phase [2]. We eventually found that a
modification in the Flink SQL client caused it, but our test case couldn't
catch it earlier: although HiveParser failed to parse the statement, it
fell back to the default parser and the test case passed successfully.

Second, conceptually, the Hive dialect should have nothing to do with
Flink's default dialect; they are two totally different dialects. If we do
need a dialect mixing the Hive dialect and the default dialect, maybe we
should propose a new hybrid dialect and announce the hybrid behavior to
users.
Also, the fallback behavior has confused some users; in fact, I have been
asked about it by community users. Throwing an exception directly when a
SQL statement fails to parse in Hive dialect will be more intuitive.

Last but not least, it's important to decouple Hive from the Flink planner
[3] before we can externalize the Hive connector [4]. If we still fall back
to Flink's default dialect, we will need to depend on `ParserImpl` in the
Flink planner, which will block us from removing the provided dependency of
the Hive dialect as well as from externalizing the Hive connector.

Although we never announced the fallback behavior, some users may
implicitly depend on it in their SQL jobs. So I hereby open the discussion
about abandoning the fallback behavior to make the Hive dialect clean and
isolated.
Please note that this won't break Hive syntax, but Flink-specific syntax
may fail afterwards. For the failing SQL, you can use `SET
table.sql-dialect=default;` to switch to the Flink dialect.
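
For example, a script mixing both dialects would then switch explicitly. A
minimal Table API sketch (the catalog name and options are only
illustrative):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public class DialectSwitchExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());

        // Flink-specific DDL runs under the default dialect ...
        tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
        tEnv.executeSql(
                "CREATE CATALOG my_catalog WITH ('type' = 'generic_in_memory')");

        // ... while Hive syntax runs under the Hive dialect, which would no
        // longer silently fall back for statements it cannot parse.
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        // e.g. tEnv.executeSql("CREATE TABLE t (i INT) STORED AS ORC");
        // (requires a HiveCatalog to be the current catalog)
    }
}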
If there are some Flink-specific statements that we find should be
included in the Hive dialect for ease of use, I think we can still add
them to the Hive dialect as special cases.

Looking forward to your feedback. I'd love to hear the community's
feedback in order to take the next steps.

[1] https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParser.java#L348
[2] https://issues.apache.org/jira/browse/FLINK-26681
[3] https://issues.apache.org/jira/browse/FLINK-31413
[4] https://issues.apache.org/jira/browse/FLINK-30064



Best regards, 
Yuxia