Re: [DISCUSS] Drop Gelly

2022-01-04 Thread Jing Ge
Hi,

Thanks, Martijn, for bringing this up for discussion. I think we could make
the discussion a little clearer by splitting it into two questions:

1. should Flink drop Gelly?
2. should Flink drop the graph computing?

The answer to the first question could be yes, since there have been no
changes for years. +1 for dropping Gelly.

But for the second question, I would suggest answering it with no, or with a
strategic yes that commits to supporting graph computing again in the near
future, because there are many use cases that could be solved more elegantly
by combining streaming/batch with graph processing. AFAIK, fintech companies
have a lot of those use cases [1]. It would be great if we could find a way
to drop Gelly but keep the graph computing ability within Flink's ecosystem.

Best regards
Jing

[1]
https://california18.com/the-ant-graph-calculation-is-upgraded-to-tugraph-and-it-won-the-2021-world-internet-leading-scientific-and-technological-achievement-award/2117052021/


On Tue, Jan 4, 2022 at 2:02 PM Zhipeng Zhang 
wrote:

> Hi Martijn,
>
> Thanks for the feedback. I am not proposing to bundle the new graph
> library with Alink. I am +1 for dropping the DataSet-based Gelly library,
> but we probably need a new graph library in Flink for the possible
> migration.
>
> We haven't decided what to do yet and probably need more discussion. There
> are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML[1], given
> that graphs and machine learning algorithms are more often used together
> [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in
> FlinkML.
> 2. We include a new DataStream-based graph library as a separate
> module/repo. This is consistent with existing libraries like Spark [5].
>
> What do you think?
>
>
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
>
> Best,
> Zhipeng
>
> Martijn Visser wrote on Tue, Jan 4, 2022 at 15:27:
>
>> Hi Zhipeng,
>>
>> Good that you've reached out, I wasn't aware that Gelly is being used in
>> Alink. Are you proposing to write a new graph library as a successor of
>> Gelly and bundle that with Alink?
>>
>> Best regards,
>>
>> Martijn
>>
>> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Thanks for starting the discussion :)
>>>
>>> We (Alink team [1]) are actually using part of the Gelly library to
>>> support graph algorithms (connected component, single source shortest path,
>>> etc.) for users in Alibaba Inc.
>>>
>>> As DataSet API is going to be dropped, shall we also provide a new graph
>>> library based on DataStream runtime (similar as we did for machine
>>> learning)?
>>>
>>> [1] https://github.com/Alibaba/alink
>>>
 David Anderson wrote on Tue, Jan 4, 2022 at 00:01:
>>>
 Most of the inquiries I've had about Gelly in recent memory have been
 from folks looking for a streaming solution, and it's only been a handful.

 +1 for dropping Gelly

 David

 On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann 
 wrote:

> I haven't seen any changes or requests to/for Gelly in ages. Hence, I
> would assume that it is not really used and can be removed.
>
> +1 for dropping Gelly.
>
> Cheers,
> Till
>
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> Flink is bundled with Gelly, a Graph API library [1]. This has been
>> marked as approaching end-of-life for quite some time [2].
>>
>> Gelly is built on top of Flink's DataSet API, which is deprecated and
>> slowly being phased out [3]. It only works on batch jobs. Based on the
>> activity in the Dev and User mailing lists, I don't see a lot of 
>> questions
>> popping up regarding the usage of Gelly. Removing Gelly would reduce CI
>> time and resources because we won't need to run tests for this anymore.
>>
>> I'm cross-posting this to the User mailing list to see if there are
>> any users of Gelly at the moment.
>>
>> Let me know your thoughts.
>>
>> Martijn Visser | Product Manager
>>
>> mart...@ververica.com
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
>>
>> [2] https://flink.apache.org/roadmap.html
>>
>> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
>>
>> 
>>
>>
>> Follow us @VervericaData
>>
>> --
>>
>> Join Flink Forward  - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>>
>>>
>>> --
>>> best,
>>> Zhipeng
>>>
>>>
>
> --
> best,
> Zhipeng
>
>


[jira] [Created] (FLINK-25506) Generate HBase Connector Options doc from HBaseConnectorOptions

2022-01-03 Thread Jing Ge (Jira)
Jing Ge created FLINK-25506:
---

 Summary: Generate HBase Connector Options doc from 
HBaseConnectorOptions
 Key: FLINK-25506
 URL: https://issues.apache.org/jira/browse/FLINK-25506
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / HBase
Reporter: Jing Ge
Assignee: Jing Ge


Currently the content is written manually. It would be better to generate it 
from the class HBaseConnectorOptions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-31 Thread Jing Ge
Hi Thomas,

Thanks for the feedback and clarification. The Flink community should then
make it clear that downstream developers are expected to maintain multiple
implementations for different Flink versions where necessary, which is also
a valid concept, so that we can focus on backward compatibility. Hopefully
the Iceberg and Hudi developers can join this discussion and confirm it.

Best regards
Jing

On Wed, Dec 29, 2021 at 8:10 PM Thomas Weise  wrote:

> Hi Jing,
>
> AFAIK most of the pain is caused by lack of backward compatibility
> (binary).
>
> And to make sure I'm not adding to the confusion: It would be
> necessary to be able to run the iceberg connector built against Flink
> 1.12 with a Flink 1.13 distribution. That would solve most problems
> downstream projects like Iceberg, Beam etc. face today. Those projects
> could then publish their artifacts built against the lowest Flink
> version they support and rest assured users with newer versions of
> Flink don't run into trouble.
>
> Thanks,
> Thomas
>
> On Wed, Dec 29, 2021 at 8:04 AM Jing Ge  wrote:
> >
> > Hi Piotrek,
> >
> > thanks for asking. To be honest, I hope it could be good enough if Flink
> > could only provide backward compatibility, which is easier than providing
> > forward compatibility described in the proposal. That is also one of the
> > reasons why I started this discussion. If, after the discussion, the
> > community could reach a consensus that backward compatibility is fine
> > enough, it is also a clear decision for us to avoid any further confusion
> > and complaints, i.e. make it clear that there is no guarantee for
> > the downstream vendors to provide backward compatibility of their own
> > software, they have to maintain implementations with different Flink
> minor
> > versions simultaneously. In this case, I would move the proposal about
> > forward compatibility to the Rejected Alternatives section.
> >
> > As far as I understood, there are at least three groups of users, the
> first
> > group is the end users who write Flink job/SQL; the second group is
> > platform developers who use Flink to build data infrastructure and let
> > users in the first group submit their jobs; and the third group is
> > downstream software developers(vendors) who build software (module) that
> > depends on Flink. Platform developers in the second group will also use
> the
> > software provided by the third group in their data infra. The developers
> > from the second and the third groups play an important role for the
> > adoption/upgrade of the new Flink version in the industry. There are
> > compatibility issues between them, afaik, both backward and forward. I
> was
> > not able to join the discussion caused by iceberg upgrade issue
> previously,
> > after reading the whole story, building the connector with Flink 1.12
> will
> > not solve their problem. It was the forward compatibility issue (built
> with
> > 1.13 and run on 1.12) that made it difficult for them to upgrade.
> >
> > Best regards
> > Jing
> >
> > On Wed, Dec 29, 2021 at 3:42 PM Piotr Nowojski 
> wrote:
> >
> > > Hi Jink,
> > >
> > > I haven't yet fully reviewed the FLIP document, but I wanted to clarify
> > > something.
> > >
> > > > Flink Forward Compatibility
> > > > Based on the previous clarification, Flink forward compatibility
> should
> > > mean that Flink jobs or ecosystems like external connectors/formats
> built
> > > with newer
> > > > Flink version Y like 1.15 can be running on an older Flink version X
> like
> > > 1.14 with no issue. With this definition, we might realize that the
> > > original requirement
> > > > in the thread [1] was asking for Flink forward compatibility, i.e.
> > > iceberg-flink module compiled with Flink 1.13 wanted to be running on
> Flink
> > > 1.12 cluster. And
> > > > from Flink user perspective, this is a backward compatibility issue.
> This
> > > is the same case: Flink forward compatibility equals Flink
> job/ecosystems
> > > backward
> > > > compatibility.
> > >
> > > Is it really important to provide this kind of "forward binary"
> > > compatibility? You are referring to the iceberg example, but the
> workaround
> > > there was quite obvious: just build the connector using Flink 1.12. As
> long
> > > as the connector is using stable, @Public API, AND assuming we provide
> > > binary compatibility, it should work fine with Flink 1.13. So to solve
> this
> > > user r

Re: [DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Jing Ge
Hi Piotrek,

Thanks for asking. To be honest, I hope it would be good enough if Flink
provided only backward compatibility, which is easier than providing the
forward compatibility described in the proposal. That is also one of the
reasons why I started this discussion. If, after the discussion, the
community reaches a consensus that backward compatibility alone is
sufficient, that would also be a clear decision that avoids further
confusion and complaints, i.e. it would make clear that there is no
guarantee that downstream vendors can provide backward compatibility for
their own software; they would have to maintain implementations for
different Flink minor versions simultaneously. In that case, I would move
the proposal about forward compatibility to the Rejected Alternatives
section.

As far as I understand, there are at least three groups of users: the first
group is the end users who write Flink jobs/SQL; the second group is
platform developers who use Flink to build data infrastructure and let
users in the first group submit their jobs; and the third group is
downstream software developers (vendors) who build software (modules) that
depends on Flink. Platform developers in the second group will also use the
software provided by the third group in their data infrastructure. The
developers from the second and third groups play an important role in the
adoption/upgrade of new Flink versions in the industry. There are
compatibility issues between them, AFAIK, both backward and forward. I was
not able to join the earlier discussion caused by the Iceberg upgrade
issue, but after reading the whole story, building the connector with Flink
1.12 would not solve their problem. It was the forward compatibility issue
(built with 1.13 and run on 1.12) that made it difficult for them to
upgrade.

Best regards
Jing

On Wed, Dec 29, 2021 at 3:42 PM Piotr Nowojski  wrote:

> Hi Jink,
>
> I haven't yet fully reviewed the FLIP document, but I wanted to clarify
> something.
>
> > Flink Forward Compatibility
> > Based on the previous clarification, Flink forward compatibility should
> mean that Flink jobs or ecosystems like external connectors/formats built
> with newer
> > Flink version Y like 1.15 can be running on an older Flink version X like
> 1.14 with no issue. With this definition, we might realize that the
> original requirement
> > in the thread [1] was asking for Flink forward compatibility, i.e.
> iceberg-flink module compiled with Flink 1.13 wanted to be running on Flink
> 1.12 cluster. And
> > from Flink user perspective, this is a backward compatibility issue. This
> is the same case: Flink forward compatibility equals Flink job/ecosystems
> backward
> > compatibility.
>
> Is it really important to provide this kind of "forward binary"
> compatibility? You are referring to the iceberg example, but the workaround
> there was quite obvious: just build the connector using Flink 1.12. As long
> as the connector is using stable, @Public API, AND assuming we provide
> binary compatibility, it should work fine with Flink 1.13. So to solve this
> user request, we could provide either forward OR backward binary
> compatibility, Right? The question is which one of those is easier to
> provide and explain to the users.
>
> Of course the big missing piece after FLIP-197 is that we haven't agreed on
> any type of binary compatibility.
>
> Best,
> Piotrek
>
> On Wed, Dec 29, 2021 at 14:07 Jing Ge wrote:
>
> > Hi everyone,
> >
> > with great interest I have read all discussions [1][2][3] w.r.t. the
> (API?)
> > compatibility issues. The feedback coming from the Flink user's point of
> > view is very valuable. Many thanks for it. In these discussions, there
> were
> > many explanations that talked about backward and forward compatibility
> and
> > broke down to API and ABI compatibility. Since each time when a developer
> > referred to compatibility, there was an implicit context behind, which
> > might cause confusion, e.g. the forward compatibility mentioned in the
> API
> > compatibility discussion thread[1] was actually the Flink ABI backward
> > compatibility mentioned in FLIP-196. The original requirement posted in
> the
> > API compatibility discussion thread[1] was actually Flink ABI forward
> > compatibility, afaik. I will explain it in the proposal. They were all
> > correct because they talked from different perspectives. But it was hard
> > for audiences to follow it.
> >
> > I’d like to start a discussion about backward/forward compatibility from
> > both users perspective and Flink perspective. I Tried to put myself in
> > users' shoes and then to see whether we could do anything to reduce
> users'
> > effort for upgrading Flink.
> >
> >

[DISCUSS] FLIP-207: Flink backward and forward compatibility

2021-12-29 Thread Jing Ge
Hi everyone,

With great interest I have read all the discussions [1][2][3] w.r.t. the
(API?) compatibility issues. The feedback coming from the Flink users'
point of view is very valuable. Many thanks for it. In these discussions,
there were many explanations that talked about backward and forward
compatibility and broke them down into API and ABI compatibility. Since
each time a developer referred to compatibility there was an implicit
context behind it, confusion could arise, e.g. the forward compatibility
mentioned in the API compatibility discussion thread [1] was actually the
Flink ABI backward compatibility mentioned in FLIP-196. The original
requirement posted in the API compatibility discussion thread [1] was
actually Flink ABI forward compatibility, AFAIK. I will explain this in the
proposal. They were all correct because they spoke from different
perspectives, but it was hard for the audience to follow.

I’d like to start a discussion about backward/forward compatibility from
both the users' perspective and the Flink perspective. I tried to put
myself in users' shoes and see whether we could do anything to reduce
users' effort when upgrading Flink.

The proposal contains four parts:
- Clarify the definition of API and ABI, backward and forward compatibility
to make sure everyone is on the same page.
- Analyze and classify issues from users' feedback.
- Summarize the definition and the gap between the current status and what
users need.
- Proposed changes

Please find the details on
https://cwiki.apache.org/confluence/display/FLINK/FLIP-207%3A+Flink+backward+and+forward+compatibility

Looking forward to your feedback. Many thanks.

best regards
Jing


[jira] [Created] (FLINK-25416) Build unified Parquet BulkFormat for both Table API and DataStream API

2021-12-22 Thread Jing Ge (Jira)
Jing Ge created FLINK-25416:
---

 Summary: Build unified Parquet BulkFormat for both Table API and 
DataStream API
 Key: FLINK-25416
 URL: https://issues.apache.org/jira/browse/FLINK-25416
 Project: Flink
  Issue Type: New Feature
Reporter: Jing Ge


*Background information*

The current AvroParquet implementation, AvroParquetRecordFormat, uses the high
level API ParquetReader, which does not provide offset information; as a
result, the restoreReader logic has significant room for improvement.

Beyond AvroParquetRecordFormat there is another Parquet format implementation,
ParquetVectorizedInputFormat, which is tightly coupled with the Table API.

It would be better to provide a unified Parquet BulkFormat with one
implementation that can support both the Table API and the DataStream API.

 

*Some thoughts*

Use the low-level API {{ParquetFileReader}} directly with {{BulkFormat}}, like
ParquetVectorizedInputFormat did, instead of with {{StreamFormat}}, for the
following reasons:
 * The read logic is built into the internal low-level class
{{InternalParquetRecordReader}}, which has package-private visibility in the
parquet-hadoop lib and internally uses another low-level class,
{{ParquetFileReader}}. This makes a StreamFormat implementation very
complicated. The design idea of StreamFormat is to simplify implementations;
the two do not seem to work together.

 * {{ParquetFileReader}} reads data in batch mode, i.e. {{PageReadStore pages =
reader.readNextFilteredRowGroup();}}. If we build this logic into a
StreamFormat ({{AvroParquetRecordFormat}} in this case),
{{AvroParquetRecordFormat}} has to take over the role that
{{InternalParquetRecordReader}} plays, including but not limited to:
 ## reading the {{PageReadStore}} in batch mode.
 ## managing the {{PageReadStore}}, i.e. reading the next page once all records
in the current page have been consumed, and caching it.
 ## managing the read index within the current {{PageReadStore}}, because
StreamFormat has its own setting for the read size, etc.
All of these effectively turn {{AvroParquetRecordFormat}} into a {{BulkFormat}}
instead of a {{StreamFormat}}.

 * {{StreamFormat}} can only be used via {{StreamFormatAdapter}}, which means
everything we do with the low-level APIs of the parquet-hadoop lib must not
conflict with the built-in logic provided by {{StreamFormatAdapter}}.

We can now see that if we build this logic into a {{StreamFormat}}
implementation, i.e. {{AvroParquetRecordFormat}}, all the convenient built-in
logic provided by the {{StreamFormatAdapter}} turns into obstacles. There is
also a violation of the single responsibility principle, i.e.
{{AvroParquetRecordFormat}} would take over some of the responsibility of
{{BulkFormat}}. These might be the reasons why ParquetVectorizedInputFormat
implemented {{BulkFormat}} instead of {{StreamFormat}}.

In order to build a unified Parquet implementation for both the Table API and
the DataStream API, it makes more sense to build this code into a
{{BulkFormat}} implementation class. Since the output data types are different
({{RowData}} vs. Avro), extra converter logic should be introduced into the
architecture design. Depending on how complicated the issue turns out to be
and how big its impact on the current code base is, a new FLIP might be
required.

The following code was suggested by Arvid Heise for the next optimized
AvroParquetReader:
{code:java}
// Injected
GenericData model = GenericData.get();
org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();

// Low level reader - fetch metadata
ParquetFileReader reader = null;
MessageType fileSchema = reader.getFileMetaData().getSchema();
Map<String, String> metaData = reader.getFileMetaData().getKeyValueMetaData();

// init Avro specific things
AvroReadSupport<GenericRecord> readSupport = new AvroReadSupport<>(model);
ReadSupport.ReadContext readContext =
        readSupport.init(
                new InitContext(
                        conf,
                        metaData.entrySet().stream()
                                .collect(Collectors.toMap(
                                        e -> e.getKey(),
                                        e -> Collections.singleton(e.getValue()))),
                        fileSchema));
RecordMaterializer<GenericRecord> recordMaterializer =
        readSupport.prepareForRead(conf, metaData, fileSchema, readContext);
MessageType requestedSchema = readContext.getRequestedSchema();

// prepare record reader
ColumnIOFactory columnIOFactory = new ColumnIOFactory(reader.getFileMetaData().getCreatedBy());
MessageColumnIO columnIO = columnIOFactory.getColumnIO(requestedSchema, fileSchema, true);

// for recovery
 

Re: [DISCUSS] JUnit 5 Migration

2021-12-17 Thread Jing Ge
Thanks Hang and Qingsheng for your effort and for starting this discussion.
As additional information, I've created an umbrella ticket
(https://issues.apache.org/jira/browse/FLINK-25325). It is recommended to
create all JUnit5 migration related tasks under it, so we can track the
whole migration easily.

I think, for the parameterized test issue, the major problem is that, on
the one hand, JUnit 5 has its own approach to parameterized tests and does
not allow using parameterized fixtures at the class level. This is a huge
difference compared to JUnit4. On the other hand, there is currently a lot
of cross-module test class inheritance, which means that the migration
cannot be done in one shot. It must be possible to run JUnit4 and JUnit5
tests simultaneously during the migration process. As long as there are
parameterized subclasses still on JUnit4, it will be risky to migrate the
parent class to JUnit5. And if the parent class has to stick with JUnit4
during the migration, any migrated JUnit5 subclass might need to duplicate
the test methods defined in the parent class. In this case, I would prefer
to duplicate the test methods with different names in the parent class for
both JUnit4 and JUnit5, but only during the migration process as a
temporary solution, and remove the JUnit4 test methods once the migration
is finished, i.e. when all subclasses are JUnit5 tests (see the sketch
below). It is a trade-off solution. Hopefully we can find a better solution
during the discussion.
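
To illustrate the temporary duplication, here is a minimal sketch. The class
and method names are hypothetical, and it assumes the migrated JUnit 5
subclasses register a TestTemplateInvocationContextProvider (or an equivalent
parameterized extension) that makes @TestTemplate methods runnable:

    // Hypothetical parent class kept on JUnit 4 during the migration.
    public abstract class AbstractFormatTest {

        // Used by the remaining JUnit 4 (parameterized) subclasses.
        @org.junit.Test
        public void testReadRecord() {
            runReadRecord();
        }

        // Temporary duplicate for already migrated JUnit 5 parameterized
        // subclasses; removed once all subclasses are on JUnit 5.
        @org.junit.jupiter.api.TestTemplate
        public void testReadRecordJUnit5() {
            runReadRecord();
        }

        // Shared test logic used by both variants.
        protected abstract void runReadRecord();
    }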

Speaking of replacing @Test with @TestTemplate, since I did read all tests,
do we really need to replace all of them with @TestTemplate w.r.t. the
parameterized tests?

For the PowerMock tests, it is a good opportunity to remove them.

best regards
Jing

On Fri, Dec 17, 2021 at 2:14 PM Hang Ruan  wrote:

> Hi, all,
>
> Apache Flink is using JUnit for unit tests and integration tests widely in
> the project, however, it binds to the legacy JUnit 4 deeply. It is
> important to migrate existing cases to JUnit 5 in order to avoid splitting
> the project into different JUnit versions.
>
> Qingsheng Ren and I have conducted some trials about the JUnit 5 migration,
> but there are too many modules that need to migrate. We would like to get
> more help from the community. It is planned to migrate module by module,
> and a JUnit 5 migration guide
> <
> https://docs.google.com/document/d/1514Wa_aNB9bJUen4xm5uiuXOooOJTtXqS_Jqk9KJitU/edit?usp=sharing
> >[1]
> has been provided to new helpers on the cooperation method and how to
> migrate.
>
> There are still some problem to discuss:
>
> 1. About parameterized test:
>
> Some test classes inherit from other base test classes. We have discussed
> different situations in the guide, but the situation where a parameterized
> test subclass inherits from a non parameterized parent class has not been
> resolved.
>
> In JUnit 4, the parent test class always has some test cases annotated by
> @Test. And  the parameterized subclass will run these test cases in the
> parent class in a parameterized way.
>
> In JUnit 5, if we want a test case to be invoked multiple times, the test
> case must be annotated by @TestTemplate. A test case can not be annotated
> by both @Test and @TestTemplate, which means a test case can not be invoked
> as both a parameterized test and a non parameterized test.
>
> We thought of two ways to migrate this situation, but not good enough. Both
> two ways will introduce redundant codes, and make it hard to maintain.
>
> The first way is to change the parent class to a parameterized test and
> replace @Test tests to @TestTemplate tests. For its non parameterized
> subclasses, we provide them a fake parameter method, which will provide
> nothing.
>
> The second way is to change the parameterized subclasses. We should
> override all @Test tests in the parent class and annotate them with
> @TestTemplate, these methods in the parameterized subclass should invoke
> the same method in its parent class.
>
>
> 2. About PowerMock:
>
> According to the code style rules[2] of Flink project, it’s discouraged to
> use mockito for test cases, as well as Powermock, an extension of mockito.
> Considering the situation that PowerMock lacks JUnit 5 support [3], we
> suggest rewriting Powermock-based test class (~10 classes) with reusable
> test implementations, and finally deprecate Powermock from the project.
>
>
> 3. Commit Author:
>
> JUnit 5 migration will cause a lot of changed codes. Maybe we should use a
> common author for the JUnit 5 migration commits.
>
> Looking forward to your suggestions, thanks!
>
> Best,
>
> Hang
>
> [1]
>
> https://docs.google.com/document/d/1514Wa_aNB9bJUen4xm5uiuXOooOJTtXqS_Jqk9KJitU/edit?usp=sharing
>
> [2]
>
> https://flink.apache.org/contributing/code-style-and-quality-common.html#avoid-mockito---use-reusable-test-implementations
>
>
> [3] https://github.com/powermock/powermock/issues/929
>


[jira] [Created] (FLINK-25325) Migration Flink from Junit4 to Junit5

2021-12-15 Thread Jing Ge (Jira)
Jing Ge created FLINK-25325:
---

 Summary: Migration Flink from Junit4 to Junit5
 Key: FLINK-25325
 URL: https://issues.apache.org/jira/browse/FLINK-25325
 Project: Flink
  Issue Type: New Feature
  Components: Tests
Affects Versions: 1.14.0
Reporter: Jing Ge
 Fix For: 1.15.0


Based on the consensus from the mailing list discussions [1][2], we have
started working on the JUnit4 to JUnit5 migration.

This is the umbrella ticket which describes the big picture of the migration 
with following steps:
 * AssertJ integration and guideline
 * Test Framework upgrade from JUnit4 to JUnit5
 * JUnit5 migration guideline(document and reference migration)
 * Optimization for issues found while writing new tests in JUnit5
 * [Long-term]Module based graceful migration of old tests in JUnit4 to JUnit5

 

[1] 

[[DISCUSS]Moving to 
JUnit5|https://lists.apache.org/thread/jsjvc2cqb91pyh47d4p6olk3c1vxqm3w]

[2] [[DISCUSS] Conventions on assertions to use in 
tests|https://lists.apache.org/thread/33t7hz8w873p1bc5msppk65792z08rgt]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25324) Migration Flink from Junit4 to Junit5

2021-12-15 Thread Jing Ge (Jira)
Jing Ge created FLINK-25324:
---

 Summary: Migration Flink from Junit4 to Junit5
 Key: FLINK-25324
 URL: https://issues.apache.org/jira/browse/FLINK-25324
 Project: Flink
  Issue Type: New Feature
  Components: Tests
Affects Versions: 1.14.0
Reporter: Jing Ge
 Fix For: 1.15.0


Based on the consensus from the mailing list discussions [1][2], we have
started working on the JUnit4 to JUnit5 migration.

This is the umbrella ticket which describes the big picture of the migration 
with following steps:
 * AssertJ integration and guideline
 * Test Framework upgrade from JUnit4 to JUnit5
 * JUnit5 migration guideline(document and reference migration)
 * Optimization for issues found while writing new tests in JUnit5
 * [Long-term]Module based graceful migration of old tests in JUnit4 to JUnit5

 

[1] 

[[DISCUSS]Moving to 
JUnit5|https://lists.apache.org/thread/jsjvc2cqb91pyh47d4p6olk3c1vxqm3w]

[2] [[DISCUSS] Conventions on assertions to use in 
tests|https://lists.apache.org/thread/33t7hz8w873p1bc5msppk65792z08rgt]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Conventions on assertions to use in tests

2021-12-14 Thread Jing Ge
Hi all,

I took a close look at AssertJ and found there are two concepts for writing
tests, with two entry points: WithAssertions for the normal style and
BDDAssertions for the BDD style. I would not suggest using both in one
project simultaneously. Since all related work done previously used the
normal style, AFAIK, the normal style seems to be the right one to stick
with.
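
For illustration, a minimal sketch of the two styles (hypothetical test code,
just to show the entry points):

    import static org.assertj.core.api.Assertions.assertThat;
    import static org.assertj.core.api.BDDAssertions.then;

    class StyleExample {
        void normalStyle() {
            // "normal" style; also available by implementing the
            // WithAssertions mixin interface instead of static imports
            assertThat(1 + 1).isEqualTo(2);
        }

        void bddStyle() {
            // BDD style entry point
            then(1 + 1).isEqualTo(2);
        }
    }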

WDYT?

Best regards
Jing

On Fri, Dec 3, 2021 at 12:15 PM Marios Trivyzas  wrote:

> Definitely +1 from me as well. Otherwise backporting tests (accompanying
> fixes) would consume significant time.
>
> On Fri, Dec 3, 2021 at 11:42 AM Till Rohrmann 
> wrote:
>
> > I think this is a very good idea, Matthias. +1 for backporting the
> jassert
> > changes to 1.14 and 1.13 if possible.
> >
> > Cheers,
> > Till
> >
> > On Fri, Dec 3, 2021 at 11:38 AM Matthias Pohl 
> > wrote:
> >
> > > Currently, we only added the jassert to the master branch. I was
> > wondering
> > > whether we could backport the corresponding PR [1] to release-1.14 and
> > > release-1.13, too. Otherwise, we would have to implement tests twice
> when
> > > providing PRs with new tests that need to be backported: The jassert
> > > version for master and a hamcrest (or any other available library) for
> > the
> > > backports.
> > >
> > > It's not really a bugfix. But it might help developers with their
> > > backports.
> > >
> > > Matthias
> > >
> > > [1] https://github.com/apache/flink/pull/17871
> > >
> > > On Thu, Nov 25, 2021 at 12:54 PM Marios Trivyzas 
> > wrote:
> > >
> > > > As @Matthias Pohl  mentioned, I agree that
> no1
> > > is
> > > > to end up with consistency
> > > > regarding the assertions in our tests, but I also like how those
> > > assertions
> > > > shape up with the AssertJ approach.
> > > >
> > > > On Thu, Nov 25, 2021 at 9:38 AM Francesco Guardiani <
> > > > france...@ververica.com>
> > > > wrote:
> > > >
> > > > > This is the result of experimenting around creating custom
> assertions
> > > for
> > > > > Table API types
> > > > > https://github.com/slinkydeveloper/flink/commit/
> > > > > d1ce37a62c2200b2c3008a9cc2cac91234222fd5[1]. I will PR it once the
> > two
> > > > PRs
> > > > > in the
> > > > > previous mail get merged
> > > > >
> > > > > On Monday, 22 November 2021 17:59:29 CET Francesco Guardiani wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > Given I see generally consensus around having a convention and
> > using
> > > > > > assertj, I propose to merge these 2 PRs:
> > > > > >
> > > > > > * Add the explanation of this convention in our code quality
> guide:
> > > > > > https://github.com/apache/flink-web/pull/482
> > > > > > * Add assertj to dependency management in the parent pom and link
> > in
> > > > the
> > > > > PR
> > > > > > template the code quality guide:
> > > > > https://github.com/apache/flink/pull/17871
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Once we merge those, I'll work in the next days to add some
> custom
> > > > > > assertions in table-common for RowData and Row (commonly asserted
> > > > > > everywhere in the table codebase).
> > > > > >
> > > > > > @Matthias Pohl  about the confluence
> page,
> > > it
> > > > > seems
> > > > > > a bit outdated, judging from the last modified date. I propose to
> > > > > continue
> > > > > > to use this guide
> > > > > >
> > > >
> > https://flink.apache.org/contributing/code-style-and-quality-common.html
> > > > > as
> > > > > > it seems more complete.
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 22, 2021 at 8:58 AM Matthias Pohl <
> > > matth...@ververica.com>
> > > > > >
> > > > > > wrote:
> > > > > > > Agree. Clarifying once more what our preferred option is here,
> > is a
> > > > > good
> > > > > > > idea. So, +1 for unification. I don't have a strong opinion on
> > what
> > > > > > > framework to use. But we may want to add this at the end of the
> > > > > discussion
> > > > > > > to our documentation (e.g. [1] or maybe the PR description?) to
> > > make
> > > > > users
> > > > > > > aware of it and be able to provide a reference in case it comes
> > up
> > > > > again
> > > > > > > (besides this ML thread). Or do we already have something like
> > that
> > > > > > > somewhere in the docs where I missed it?
> > > > > > >
> > > > > > > Matthias
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lesso
> > > > > > > ns+Learned>
> > > > > > > On Wed, Nov 17, 2021 at 11:13 AM Marios Trivyzas <
> > mat...@gmail.com
> > > >
> > > > > wrote:
> > > > > > >> I'm also +1 both for unification and specifically for assertJ.
> > > > > > >> I think it covers a wide variety of assertions and as
> Francesco
> > > > > mentioned
> > > > > > >> it's easily extensible, so that
> > > > > > >> we can create custom assertions where needed, and avoid
> > repeating
> > > > test
> > > > > > >> code.
> > > > > > >>
> > > > > > >> On Tue, Nov 16, 2021 at 9:57 AM David Morávek <
> 

Re: [VOTE] Release 1.11.5/1.12.6/1.13.4/1.14.1, release candidate #1

2021-12-13 Thread Jing Ge
+1   LGTM. Many thanks for your effort!

On Mon, Dec 13, 2021 at 8:28 PM Chesnay Schepler  wrote:

> Hi everyone,
>
> This vote is for the emergency patch releases for 1.11, 1.12, 1.13 and
> 1.14 to address CVE-2021-44228.
> It covers all 4 releases as they contain the same changes (upgrading
> Log4j to 2.15.0) and were prepared simultaneously by the same person.
> (Hence, if something is broken, it likely applies to all releases)
>
> Please review and vote on the release candidate #1 for the versions
> 1.11.5, 1.12.6, 1.13.4 and 1.14.1, as follows:
> [ ] +1, Approve the releases
> [ ] -1, Do not approve the releases (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source releases and binary convenience releases to
> be deployed to dist.apache.org [2], which are signed with the key with
> fingerprint C2EED7B111D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
>  * *the jars for 1.13/1.14 are still being built*
> * source code tags [5],
> * website pull request listing the new releases and adding announcement
> blog post [6].
>
> The vote will be open for at least 24 hours. The minimum vote time has
> been shortened as the changes are minimal and the matter is urgent.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
> 1.11:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350327
> 1.12:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350328
> 1.13:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350686
> 1.14:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350512
> [2]
> 1.11: https://dist.apache.org/repos/dist/dev/flink/flink-1.11.5-rc1/
> 1.12: https://dist.apache.org/repos/dist/dev/flink/flink-1.12.6-rc1/
> 1.13: https://dist.apache.org/repos/dist/dev/flink/flink-1.13.4-rc1/
> 1.14: https://dist.apache.org/repos/dist/dev/flink/flink-1.14.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> 1.11/1.12:
> https://repository.apache.org/content/repositories/orgapacheflink-1455
> 1.13:
> https://repository.apache.org/content/repositories/orgapacheflink-1457
> 1.14:
> https://repository.apache.org/content/repositories/orgapacheflink-1456
> [5]
> 1.11: https://github.com/apache/flink/releases/tag/release-1.11.5-rc1
> 1.12: https://github.com/apache/flink/releases/tag/release-1.12.6-rc1
> 1.13: https://github.com/apache/flink/releases/tag/release-1.13.4-rc1
> 1.14: https://github.com/apache/flink/releases/tag/release-1.14.1-rc1
> [6] https://github.com/apache/flink-web/pull/489
>


Re: [DISCUSS] Immediate dedicated Flink releases for log4j vulnerability

2021-12-13 Thread Jing Ge
+1

As I suggested to publish the blog post last week asap, users have been
keen to have such urgent releases. Many thanks for it.



On Mon, Dec 13, 2021 at 8:29 AM Konstantin Knauf  wrote:

> +1
>
> I didn't think this was necessary when I published the blog post on Friday,
> but this has made higher waves than I expected over the weekend.
>
>
>
> On Mon, Dec 13, 2021 at 8:23 AM Yuan Mei  wrote:
>
> > +1 for quick release.
> >
> > On Mon, Dec 13, 2021 at 2:55 PM Martijn Visser 
> > wrote:
> >
> > > +1 to address the issue like this
> > >
> > > On Mon, 13 Dec 2021 at 07:46, Jingsong Li 
> > wrote:
> > >
> > > > +1 for fixing it in these versions and doing quick releases. Looks
> good
> > > to
> > > > me.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, Dec 13, 2021 at 2:18 PM Becket Qin 
> > wrote:
> > > > >
> > > > > +1. The solution sounds good to me. There have been a lot of
> > inquiries
> > > > > about how to react to this.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Dec 13, 2021 at 12:40 PM Prasanna kumar <
> > > > > prasannakumarram...@gmail.com> wrote:
> > > > >
> > > > > > 1+ for making Updates for 1.12.5 .
> > > > > > We are looking for fix in 1.12 version.
> > > > > > Please notify once the fix is done.
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 13, 2021 at 9:45 AM Leonard Xu 
> > > wrote:
> > > > > >
> > > > > > > +1 for the quick release and the special vote period 24h.
> > > > > > >
> > > > > > > > On Dec 13, 2021, at 11:49 AM, Dian Fu wrote:
> > > > > > > >
> > > > > > > > +1 for the proposal and creating a quick release.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Dian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 13, 2021 at 11:15 AM Kyle Bendickson <
> > > k...@tabular.io>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> +1 to doing a release for this widely publicized
> > vulnerability.
> > > > > > > >>
> > > > > > > >> In my experience, users will often update to the latest
> minor
> > > > patch
> > > > > > > version
> > > > > > > >> without much fuss. Plus, users have also likely heard about
> > this
> > > > and
> > > > > > > will
> > > > > > > >> appreciate a simple fix (updating their version where
> > possible).
> > > > > > > >>
> > > > > > > >> The work-around will need to still be noted for users who
> > can’t
> > > > > > upgrade
> > > > > > > for
> > > > > > > >> whatever reason (EMR hasn’t caught up, etc).
> > > > > > > >>
> > > > > > > >> I also agree with your assessment to apply a patch on each
> of
> > > > those
> > > > > > > >> previous versions with only the log4j commit, so that they
> > don’t
> > > > need
> > > > > > > to be
> > > > > > > >> as rigorously tested.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Kyle (GitHub @kbendick)
> > > > > > > >>
> > > > > > > >> On Sun, Dec 12, 2021 at 2:23 PM Stephan Ewen <
> > se...@apache.org>
> > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Hi all!
> > > > > > > >>>
> > > > > > > >>> Without doubt, you heard about the log4j vulnerability [1].
> > > > > > > >>>
> > > > > > > >>> There is an advisory blog post on how to mitigate this in
> > > Apache
> > > > > > Flink
> > > > > > > >> [2],
> > > > > > > >>> which involves setting a config option and restarting the
> > > > processes.
> > > > > > > That
> > > > > > > >>> is fortunately a relatively simple fix.
> > > > > > > >>>
> > > > > > > >>> Despite this workaround, I think we should do an immediate
> > > > release
> > > > > > with
> > > > > > > >> the
> > > > > > > >>> updated dependency. Meaning not waiting for the next bug
> fix
> > > > releases
> > > > > > > >>> coming in a few weeks, but releasing asap.
> > > > > > > >>> The mood I perceive in the industry is pretty much panicky
> > over
> > > > this,
> > > > > > > >> and I
> > > > > > > >>> expect we will see many requests for a patched release and
> > many
> > > > > > > >> discussions
> > > > > > > >>> why the workaround alone would not be enough due to certain
> > > > > > guidelines.
> > > > > > > >>>
> > > > > > > >>> I suggest that we preempt those discussions and create
> > releases
> > > > the
> > > > > > > >>> following way:
> > > > > > > >>>
> > > > > > > >>>  - we take the latest already released versions from each
> > > release
> > > > > > > >> branch:
> > > > > > > >>> ==> 1.14.0, 1.13.3, 1.12.5, 1.11.4
> > > > > > > >>>  - we add a single commit to those that just updates the
> > log4j
> > > > > > > >> dependency
> > > > > > > >>>  - we release those as 1.14.1, 1.13.4, 1.12.6, 1.11.5, etc.
> > > > > > > >>>  - that way we don't need to do functional release tests,
> > > > because the
> > > > > > > >>> released code is identical to the previous release, except
> > for
> > > > the
> > > > > > > log4j
> > > > > > > >>> dependency
> > > > > > > >>>  - we can then continue the work on the upcoming bugfix
> > > releases
> > > > as
> > > > > > > >>> planned, without high pressure
> > > > > > > >>>
> > > > > 

[jira] [Created] (FLINK-25220) Writing an architectural rule for all IT cases w.r.t. MiniCluster

2021-12-08 Thread Jing Ge (Jira)
Jing Ge created FLINK-25220:
---

 Summary: Writing an architectural rule for all IT cases w.r.t. 
MiniCluster
 Key: FLINK-25220
 URL: https://issues.apache.org/jira/browse/FLINK-25220
 Project: Flink
  Issue Type: Improvement
Reporter: Jing Ge


Write an architectural rule that verifies that all IT cases (within
o.a.f.table.*?) have either:
 # a public, static member of type MiniClusterWithClientResource annotated with
ClassRule, or
 # a public, non-static member of type MiniClusterWithClientResource annotated
with Rule.
A rough sketch is shown below.
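
The following is only a rough sketch of the ClassRule half of such a rule,
assuming ArchUnit's fluent member syntax is available and that
MiniClusterWithClientResource lives in org.apache.flink.test.util; the real
rule (covering the Rule variant and checking that every IT case actually
declares such a field) would additionally need a custom ArchCondition:
{code:java}
import com.tngtech.archunit.core.domain.JavaModifier;
import com.tngtech.archunit.lang.ArchRule;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.fields;

public class ITCaseMiniClusterRuleSketch {

    // Any MiniClusterWithClientResource field declared in an *ITCase class
    // must be public, static, and annotated with @ClassRule.
    public static final ArchRule MINI_CLUSTER_FIELDS_ARE_CLASS_RULES =
            fields().that()
                    .areDeclaredInClassesThat().haveSimpleNameEndingWith("ITCase")
                    .and().haveRawType(MiniClusterWithClientResource.class)
                    .should().bePublic()
                    .andShould().haveModifier(JavaModifier.STATIC)
                    .andShould().beAnnotatedWith(ClassRule.class);
}
{code}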



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Deprecate Java 8 support

2021-12-01 Thread Jing Ge
Thanks Chesnay for bringing this up for discussion. +1 for Java 8
deprecation. Every decision has pros and cons. All concerns mentioned
previously are fair enough. I think that the active support for Java 8
ending in 4 months will have an impact on the projects that still stick
with Java 8 and on Flink users too. Flink should not be the last one to do
the migration. It's a good time to move forward.

Best regards
Jing

On Wed, Dec 1, 2021 at 2:31 AM wenlong.lwl  wrote:

> hi, @Chesnay Schepler  would you explain more about
> what would happen when deprecating Java 8, but still support it. IMO, if we
> still generate packages based on Java 8 which seems to be a  consensus, we
> still can not take the advantages you mentioned even if we announce that
> Java 8 support is deprecated.
>
>
> Best,
> Wenlong
>
> On Mon, 29 Nov 2021 at 17:22, Marios Trivyzas  wrote:
>
> > +1 from me as well on the Java 8 deprecation!
> > It's important to make the users aware, and "force" them but also the
> > communities of other
> > related projects (like the aforementioned Hive) to start preparing for
> the
> > future drop of Java 8
> > support and the upgrade to the recent stable versions.
> >
> >
> > On Sun, Nov 28, 2021 at 11:15 PM Thomas Weise  wrote:
> >
> > > +1 for Java 8 deprecation. It's an important signal for users and we
> > > need to give sufficient time to adopt. Thanks Chesnay for starting the
> > > discussion! Maybe user@ can be included into this discussion?
> > >
> > > Thomas
> > >
> > >
> > > On Fri, Nov 26, 2021 at 6:49 AM Becket Qin 
> wrote:
> > > >
> > > > Thanks for raising the discussion, Chesnay.
> > > >
> > > > I think it is OK to deprecate Java 8 to let the users know that Java
> 11
> > > > migration should be put into the schedule. However, According to some
> > of
> > > > the statistics of the Java version adoption[1], a large number of
> users
> > > are
> > > > still using Java 8 in production. I doubt that Java 8 users will drop
> > to
> > > a
> > > > negligible amount within the next 2 - 3 Flink releases. I would
> suggest
> > > > making the time to drop Java 8 support flexible.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > [1] https://www.infoq.com/news/2021/07/snyk-jvm-2021/
> > > >
> > > > On Fri, Nov 26, 2021 at 5:09 AM Till Rohrmann 
> > > wrote:
> > > >
> > > > > +1 for the deprecation and reaching out to the user ML to ask for
> > > feedback
> > > > > from our users. Thanks for driving this Chesnay!
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Nov 25, 2021 at 10:15 AM Roman Khachatryan <
> ro...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > The situation is probably a bit different now compared to the
> > > previous
> > > > > > upgrade: some users might be using Amazon Coretto (or other
> builds)
> > > > > > which have longer support.
> > > > > >
> > > > > > Still +1 for deprecation to trigger migration, and thanks for
> > > bringing
> > > > > > this up!
> > > > > >
> > > > > > Regards,
> > > > > > Roman
> > > > > >
> > > > > > On Thu, Nov 25, 2021 at 10:09 AM Arvid Heise 
> > > wrote:
> > > > > > >
> > > > > > > +1 to deprecate Java 8, so we can hopefully incorporate the
> > module
> > > > > > concept
> > > > > > > in Flink.
> > > > > > >
> > > > > > > On Thu, Nov 25, 2021 at 9:49 AM Chesnay Schepler <
> > > ches...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Users can already use APIs from Java 8/11.
> > > > > > > >
> > > > > > > > On 25/11/2021 09:35, Francesco Guardiani wrote:
> > > > > > > > > +1 with what both Ingo and Matthias sad, personally, I
> cannot
> > > wait
> > > > > to
> > > > > > > > start using some of
> > > > > > > > > the APIs introduced in Java 9. And I'm pretty sure that's
> the
> > > same
> > > > > > for
> > > > > > > > our users as well.
> > > > > > > > >
> > > > > > > > > On Tuesday, 23 November 2021 13:35:07 CET Ingo Bürk wrote:
> > > > > > > > >> Hi everyone,
> > > > > > > > >>
> > > > > > > > >> continued support for Java 8 can also create project
> risks,
> > > e.g.
> > > > > if
> > > > > > a
> > > > > > > > >> vulnerability arises in Flink's dependencies and we cannot
> > > upgrade
> > > > > > them
> > > > > > > > >> because they no longer support Java 8. Some projects
> already
> > > > > started
> > > > > > > > >> deprecating support as well, like Kafka, and other
> projects
> > > will
> > > > > > likely
> > > > > > > > >> follow.
> > > > > > > > >> Let's also keep in mind that the proposal here is not to
> > drop
> > > > > > support
> > > > > > > > right
> > > > > > > > >> away, but to deprecate it, send the message, and motivate
> > > users to
> > > > > > start
> > > > > > > > >> migrating. Delaying this process could ironically mean
> users
> > > have
> > > > > > less
> > > > > > > > time
> > > > > > > > >> to prepare for it.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Ingo
> > > > > > > > >>
> > > > > > > > >> On Tue, Nov 23, 2021 at 8:54 AM Matthias Pohl 

[jira] [Created] (FLINK-24741) Deprecate FileRecordFormat, use StreamFormat instead

2021-11-02 Thread Jing Ge (Jira)
Jing Ge created FLINK-24741:
---

 Summary: Deprecate FileRecordFormat, use StreamFormat instead
 Key: FLINK-24741
 URL: https://issues.apache.org/jira/browse/FLINK-24741
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream, API / Python
Reporter: Jing Ge
 Fix For: 1.15.0


Issue: FileRecordFormat and StreamFormat have too much in common. This confuses
users.

Suggestion: The FileRecordFormat interface, currently marked as PublicEvolving,
should be deprecated. StreamFormat should be extended and recommended instead.
All relevant usages should be refactored and users informed appropriately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24721) Update the <relativePath> based on the Maven official indication

2021-11-01 Thread Jing Ge (Jira)
Jing Ge created FLINK-24721:
---

 Summary: Update the <relativePath> based on the Maven official
indication
 Key: FLINK-24721
 URL: https://issues.apache.org/jira/browse/FLINK-24721
 Project: Flink
  Issue Type: Technical Debt
Reporter: Jing Ge


Example:
<parent>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-formats</artifactId>
    <version>1.15-SNAPSHOT</version>
    <relativePath>..</relativePath>
</parent>

IntelliJ IDEA will point out the issue.

There are two solutions that follow the official indication:
[http://maven.apache.org/ref/3.3.9/maven-model/maven.html#class_parent].

Option 1:
 <relativePath>../pom.xml</relativePath>

Option 2:
 since we use the default value of relativePath, the whole line can be omitted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discussion] Avoid redundancy of Javadoc @return tag comment

2021-10-15 Thread Jing Ge
Since images do not work with pony mail, enclosed please find them as
attachments.

Best regards
Jing

On Thu, Oct 14, 2021 at 6:43 PM Jing Ge  wrote:

> Agree. If the description contains more information than the return tag
> comment. Part of content overlap is acceptable. Otherwise I think it
> depends on the visibility and influence of the API. Agree again, it is not
> a big issue for internal interfaces and classes. But for some core public
> APIs that will be used by many application developers, such redundancy
> looks unprofessional and sometimes even confused, e.g. using read mode in
> Intellij Idea, it will look like:
>
> [image: image.png]
> [image: image.png]
>
> When I read these for the first time, I was confused, why again and again?
> Is there anything wrong with my Intellij idea, maybe the rendering has a
> bug? After I realized it was the "paragon of redundancy" - word from
> Robert C. Martin, you could imagine my feeling about the code quality at
> that time :-).
>
> I would suggest doing clean-up for some core public APIs that have an
> impact on application developers.
>
> best regards
> Jing
>
> On Thu, Oct 14, 2021 at 3:24 PM Piotr Nowojski 
> wrote:
>
>> +1 for trying to clean this up, but I'm not entirely sure in which
>> direction. Usually I was fixing this towards option 1. I agree, in your
>> example option 2. looks much better, but usually the javadoc contains a
>> bit
>> more information, not only copy/pasted @return text. For example:
>>
>> /**
>>  * Send out a request to a specified coordinator and return the
>> response.
>>  *
>>  * @param operatorId specifies which coordinator to receive the
>> request
>>  * @param request the request to send
>>  * @return the response from the coordinator
>>  */
>> CompletableFuture sendCoordinationRequest
>>
>> Sometimes this can be split quite nicely between @return and main java
>> doc:
>>
>> /**
>>  * Try to transition the execution state from the current state to the
>> new state.
>>  *
>>  * @param currentState of the execution
>>  * @param newState of the execution
>>  * @return true if the transition was successful, otherwise false
>>  */
>> private boolean transitionState(ExecutionState currentState,
>> ExecutionState newState);
>>
>> but that's not always the case.
>>
>> At the same time I don't have hard feelings either direction. After all it
>> doesn't seem to be that big of an issue even if we leave it as is.
>>
>> Best,
>> Piotrek
>>
>> On Thu, Oct 14, 2021 at 14:25 Jing Ge wrote:
>>
>> > Hi Flink developers,
>> >
>> > It might be a good idea to avoid the redundant javadoc comment found in
>> > some classes, e.g. org.apache.flink.core.fs.Path w.r.t. the @Return tag
>> > comment on some methods.
>> >
>> > To make the discussion clear, let's focus on a concrete example(there
>> are
>> > many more):
>> >
>> > > /**
>> > >  * Returns the FileSystem that owns this Path.
>> > >  *
>> > >  * @return the FileSystem that owns this Path
>> > >  * @throws IOException thrown if the file system could not be
>> retrieved
>> > >  */
>> > > public FileSystem getFileSystem() throws IOException {
>> > > return FileSystem.get(this.toUri());
>> > > }
>> > >
>> > >
>> > In order to remove the redundancy, there are two options:
>> >
>> > option 1: keep the description and remove the @Return tag comment:
>> >
>> > > /**
>> > >  * Returns the FileSystem that owns this Path.
>> > >  *
>> > >  * @throws IOException thrown if the file system could not be
>> retrieved
>> > >  */
>> > > public FileSystem getFileSystem() throws IOException {
>> > > return FileSystem.get(this.toUri());
>> > > }
>> > >
>> > > option 2: keep the @return tag comment and remove the duplicated
>> > description:
>> >
>> > > /**
>> > >  * @return the FileSystem that owns this Path
>> > >  * @throws IOException thrown if the file system could not be
>> retrieved
>> > >  */
>> > > public FileSystem getFileSystem() throws IOException {
>> > > return FileSystem.get(this.toUri());
>> > > }
>> > >
>> > > It looks like these two options are similar. From the develo

Re: [Discussion] Avoid redundancy of Javadoc @return tag comment

2021-10-14 Thread Jing Ge
Agree. If the description contains more information than the return tag
comment, some content overlap is acceptable. Otherwise I think it depends
on the visibility and influence of the API. Agree again, it is not a big
issue for internal interfaces and classes. But for some core public APIs
that will be used by many application developers, such redundancy looks
unprofessional and is sometimes even confusing, e.g. using reader mode in
IntelliJ IDEA, it will look like:

[image: image.png]
[image: image.png]

When I read these for the first time, I was confused: why again and again?
Was there something wrong with my IntelliJ IDEA, maybe a rendering bug?
After I realized it was the "paragon of redundancy" - a phrase from Robert
C. Martin - you can imagine my feeling about the code quality at that time
:-).

I would suggest doing a clean-up of some core public APIs that have an
impact on application developers.

best regards
Jing

On Thu, Oct 14, 2021 at 3:24 PM Piotr Nowojski  wrote:

> +1 for trying to clean this up, but I'm not entirely sure in which
> direction. Usually I was fixing this towards option 1. I agree, in your
> example option 2. looks much better, but usually the javadoc contains a bit
> more information, not only copy/pasted @return text. For example:
>
> /**
>  * Send out a request to a specified coordinator and return the
> response.
>  *
>  * @param operatorId specifies which coordinator to receive the request
>  * @param request the request to send
>  * @return the response from the coordinator
>  */
> CompletableFuture sendCoordinationRequest
>
> Sometimes this can be split quite nicely between @return and main java doc:
>
> /**
>  * Try to transition the execution state from the current state to the
> new state.
>  *
>  * @param currentState of the execution
>  * @param newState of the execution
>  * @return true if the transition was successful, otherwise false
>  */
> private boolean transitionState(ExecutionState currentState,
> ExecutionState newState);
>
> but that's not always the case.
>
> At the same time I don't have hard feelings either direction. After all it
> doesn't seem to be that big of an issue even if we leave it as is.
>
> Best,
> Piotrek
>
> On Thu, Oct 14, 2021 at 14:25 Jing Ge wrote:
>
> > Hi Flink developers,
> >
> > It might be a good idea to avoid the redundant javadoc comment found in
> > some classes, e.g. org.apache.flink.core.fs.Path w.r.t. the @Return tag
> > comment on some methods.
> >
> > To make the discussion clear, let's focus on a concrete example(there are
> > many more):
> >
> > > /**
> > >  * Returns the FileSystem that owns this Path.
> > >  *
> > >  * @return the FileSystem that owns this Path
> > >  * @throws IOException thrown if the file system could not be retrieved
> > >  */
> > > public FileSystem getFileSystem() throws IOException {
> > > return FileSystem.get(this.toUri());
> > > }
> > >
> > >
> > In order to remove the redundancy, there are two options:
> >
> > option 1: keep the description and remove the @Return tag comment:
> >
> > > /**
> > >  * Returns the FileSystem that owns this Path.
> > >  *
> > >  * @throws IOException thrown if the file system could not be retrieved
> > >  */
> > > public FileSystem getFileSystem() throws IOException {
> > > return FileSystem.get(this.toUri());
> > > }
> > >
> > > option 2: keep the @return tag comment and remove the duplicated
> > description:
> >
> > > /**
> > >  * @return the FileSystem that owns this Path
> > >  * @throws IOException thrown if the file system could not be retrieved
> > >  */
> > > public FileSystem getFileSystem() throws IOException {
> > > return FileSystem.get(this.toUri());
> > > }
> > >
> > > It looks like these two options are similar. From the developer's
> > perspective, I would prefer using @return tag comment, i.e. option 2.
> > Having an explicit @return tag makes it easier for me to find the return
> > value quickly.
> >
> > This issue is very common, it has been used as a Noise Comments example
> in
> > Uncle Bob's famous "Clean Code" book on page 64 but unfortunately without
> > giving any clear recommendation about how to solve it. From
> Stackoverflow,
> > we can find an interesting discussion about this issue and developer's
> > thoughts behind it:
> >
> >
> https://stackoverflow.com/questions/10088311/javadoc-return-tag-comment-duplication-necessary
> > .
> > Javadoc 16 provides even a new feature to solve this common issue.
> >
> > Since @return is recommended to use for the javadoc, I would suggest
> Flink
> > community following it and therefore open this discussion to know your
> > thoughts. Hopefully we can come to an agreement about this. Many thanks.
> >
> > Best regards
> > Jing
> >
>


[Discussion] Avoid redundancy of Javadoc @return tag comment

2021-10-14 Thread Jing Ge
Hi Flink developers,

It might be a good idea to avoid the redundant Javadoc comments found in
some classes, e.g. org.apache.flink.core.fs.Path, w.r.t. the @return tag
comment on some methods.

To make the discussion clear, let's focus on a concrete example (there are
many more):

> /**
>  * Returns the FileSystem that owns this Path.
>  *
>  * @return the FileSystem that owns this Path
>  * @throws IOException thrown if the file system could not be retrieved
>  */
> public FileSystem getFileSystem() throws IOException {
> return FileSystem.get(this.toUri());
> }
>
>
In order to remove the redundancy, there are two options:

option 1: keep the description and remove the @return tag comment:

> /**
>  * Returns the FileSystem that owns this Path.
>  *
>  * @throws IOException thrown if the file system could not be retrieved
>  */
> public FileSystem getFileSystem() throws IOException {
> return FileSystem.get(this.toUri());
> }
>
option 2: keep the @return tag comment and remove the duplicated
description:

> /**
>  * @return the FileSystem that owns this Path
>  * @throws IOException thrown if the file system could not be retrieved
>  */
> public FileSystem getFileSystem() throws IOException {
> return FileSystem.get(this.toUri());
> }
>
It looks like these two options are similar. From the developer's
perspective, I would prefer using the @return tag comment, i.e. option 2.
Having an explicit @return tag makes it easier for me to find the return
value quickly.

This issue is very common; it is used as a "noise comments" example in
Uncle Bob's famous "Clean Code" book on page 64, unfortunately without any
clear recommendation on how to solve it. On Stack Overflow, we can find an
interesting discussion about this issue and developers' thoughts behind it:
https://stackoverflow.com/questions/10088311/javadoc-return-tag-comment-duplication-necessary.
Javadoc 16 even provides a new feature to solve this common issue.
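
For completeness, assuming JDK 16+ javadoc, that feature is the inline
{@return} tag, which generates both the method description and the @return
text from a single sentence (a sketch only, reusing the example above):

> /**
>  * {@return the FileSystem that owns this Path}
>  * @throws IOException thrown if the file system could not be retrieved
>  */
> public FileSystem getFileSystem() throws IOException {
>     return FileSystem.get(this.toUri());
> }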

Since using @return is recommended for Javadoc, I would suggest the Flink
community follow it, and I am therefore opening this discussion to hear
your thoughts. Hopefully we can come to an agreement on this. Many thanks.

Best regards
Jing

