[jira] [Created] (FLINK-35669) Release Testing: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-21 Thread Junrui Li (Jira)
Junrui Li created FLINK-35669:
-

 Summary: Release Testing: Verify FLIP-383: Support Job Recovery 
from JobMaster Failures for Batch Jobs
 Key: FLINK-35669
 URL: https://issues.apache.org/jira/browse/FLINK-35669
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Junrui Li
Assignee: Junrui Li
 Fix For: 1.20.0


Follow-up release testing for https://issues.apache.org/jira/browse/FLINK-33892



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-21 Thread Sergey Nuyanzin
+1 (binding)

On Fri, Jun 21, 2024 at 6:52 PM Jeyhun Karimov  wrote:
>
> Hi Sergio,
>
> +1 (non-binding)
>
> Regards,
> Jeyhun
>
> On Fri, Jun 21, 2024 at 4:59 PM Jim Hughes 
> wrote:
>
> > Hi Sergio,
> >
> > +1 (non-binding)
> >
> > Thanks,
> >
> > Jim
> >
> > On Fri, Jun 21, 2024 at 10:50 AM Timo Walther  wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > > On 21.06.24 16:18, Sergio Pena wrote:
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback about FLIP-463: Schema Definition in CREATE
> > > > TABLE AS Statement [1]. The discussion thread is here [2].
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least 72
> > > > hours unless there is an objection or insufficient votes. The FLIP will be
> > > > considered accepted if 3 binding votes (from active committers according to
> > > > the Flink bylaws [3]) are gathered by the community.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > > > [2] https://lists.apache.org/thread/1ryxxyyg3h9v4rbosc80zryvjk6c8k83
> > > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals
> > > >
> > > > Thanks,
> > > > Sergio Peña
> > > >
> > >
> > >
> >



-- 
Best regards,
Sergey


Re: [DISCUSS] Support to group rows by column ordinals

2024-06-21 Thread Sergey Nuyanzin
>I think so. But this is orthogonal to this issue [1] (deciding whether to
>provide the proposed feature or not), no?

Thanks for your response.
Yep, I looked into the issue; however, it is still not clear to me what
problem is solved with this approach.
From my point of view, the query there is good at highlighting the feature but
not very good at highlighting the problem it is going to solve.
That's why I was asking about use cases.

From my past experience I can recall such queries (very simplified version):
SELECT some_expression FROM t1
GROUP BY some_expression
ORDER BY some_expression;

Here, to be able to group/order by the expressions (which could be
quite complex), there are several options:
1. duplicate the expression in GROUP/ORDER BY, as in the example above
2. an extra SELECT wrapper
3. the ability to group/order by ordinals
4. the ability to group/order by aliases in SELECT (for this reason I
mentioned this option here)


On Fri, Jun 21, 2024 at 7:15 PM Jeyhun Karimov  wrote:
>
> Hi Sergey,
>
> Thanks for your comments.
>
> > Could you please elaborate more on use cases of this feature?
>
> IMHO, the main use-cases can be
> - simplifying some SQL queries (e.g., with queries including many/long
> column names)
> - having consistency when refactoring (e.g., columns are renamed)
> - handling calculated columns (e.g., for columns with large expressions)
>
> > Giving the fact that was already mentioned by Timo in the PR
> > > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> > > aggregation/sorting.
> > and IMHO referencing by ordinals might be error prone if someone adds
> > more columns in SELECT and forgets about ordinals.
>
> Yes, I completely agree. I also expressed similar ideas about the cons of
> the feature above.
>
> > Would it make sense to consider enabling reference by aliases as
> > another option here?
> > Or did I miss anything?
>
> I think so. But this is orthogonal to this issue [1] (deciding whether to
> provide the proposed feature or not), no?
> WDYT?
>
>
> Regards,
> Jeyhun
>
> [1] https://issues.apache.org/jira/browse/FLINK-34366
>
>
> On Thu, Jun 20, 2024 at 1:27 PM Sergey Nuyanzin  wrote:
>
> > Hey Jeyhun,
> >
> > Thanks for starting the discussion.
> > Could you please elaborate more on use cases of this feature?
> >
> > The one that I see in FLINK-34366[1] is to simplify referencing to
> > aliases in SELECT from GROUP BY
> > (also potentially ORDER BY and HAVING). I wonder whether there is some
> > other use cases where
> > support of addressing by ordinals is required?
> >
> > I'm asking since SqlConformance in Calcite and as a result
> > FlinkSqlConformance in Flink give ability
> > to reference to aliases from SELECT by enabling it e.g. [2] where javadoc
> > says
> > >   * Whether to allow aliases from the {@code SELECT} clause to be used as
> > >   * column names in the {@code GROUP BY} clause.
> >
> > Giving the fact that was already mentioned by Timo in the PR
> > > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> > aggregation/sorting.
> > and IMHO referencing by ordinals might be error prone if someone adds
> > more columns in SELECT and forgets about ordinals.
> >
> > Would it make sense to consider enabling reference by aliases as
> > another option here?
> > Or did I miss anything?
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-34366
> > [2]
> > https://github.com/apache/calcite/blob/c0a53f6b17daaca9d057e70d7fae0a0e9c2cd02a/core/src/main/java/org/apache/calcite/sql/validate/SqlConformance.java#L92-L103
> >
> > On Thu, Jun 20, 2024 at 12:12 PM Muhammet Orazov
> >  wrote:
> > >
> > > Hey Jeyhun,
> > >
> > > Thanks for bringing it up! +-1 from my side.
> > >
> > > Personally, I find this feature confusing, it feels always natural to
> > > use
> > > column names. SQL power users will ask for it, I have seen it used in
> > > automated complex queries also.
> > >
> > > But it seems counterintuitive to enable flag for this feature. Enabling
> > > it, should not disable grouping/ordering by the column names?
> > >
> > > Best,
> > > Muhammet
> > >
> > >
> > > On 2024-06-17 20:30, Jeyhun Karimov wrote:
> > > > Hi devs,
> > > >
> > > > I am moving our discussion on the PR thread [1] to the dev mailing list
> > > > to
> > > > close the loop on the related issue [2]. The end goal of the PR is to
> > > > support grouping/ordering by via column ordinals. The target
> > > > implementation
> > > > (currently there is no flag) should support a flag, so that a user can
> > > > also
> > > > use the old behavior as suggested by @Timo.
> > > >
> > > > Some vendors such as Postgres [3], SQLite [4], MySQL/MariaDB [5],
> > > > Oracle
> > > > [6], Spark [7], and BigQuery[8] support group/order by clauses with
> > > > column
> > > > ordinals.
> > > >
> > > > Obviously, supporting this clause might lead to less readable and
> > > > maintainable SQL code. This might also cause a bit of complications
> > > > both on
> > > > the codebase and on the user-experience 

Re: [DISCUSS] Support to group rows by column ordinals

2024-06-21 Thread Jeyhun Karimov
Hi Sergey,

Thanks for your comments.

> Could you please elaborate more on use cases of this feature?

IMHO, the main use cases can be
- simplifying some SQL queries (e.g., queries including many/long
column names)
- having consistency when refactoring (e.g., columns are renamed)
- handling calculated columns (e.g., for columns with large expressions)

> Giving the fact that was already mentioned by Timo in the PR
> > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> > aggregation/sorting.
> and IMHO referencing by ordinals might be error prone if someone adds
> more columns in SELECT and forgets about ordinals.

Yes, I completely agree. I also expressed similar ideas about the cons of
the feature above.

> Would it make sense to consider enabling reference by aliases as
> another option here?
> Or did I miss anything?

I think so. But this is orthogonal to this issue [1] (deciding whether to
provide the proposed feature or not), no?
WDYT?


Regards,
Jeyhun

[1] https://issues.apache.org/jira/browse/FLINK-34366


On Thu, Jun 20, 2024 at 1:27 PM Sergey Nuyanzin  wrote:

> Hey Jeyhun,
>
> Thanks for starting the discussion.
> Could you please elaborate more on use cases of this feature?
>
> The one that I see in FLINK-34366[1] is to simplify referencing to
> aliases in SELECT from GROUP BY
> (also potentially ORDER BY and HAVING). I wonder whether there is some
> other use cases where
> support of addressing by ordinals is required?
>
> I'm asking since SqlConformance in Calcite and as a result
> FlinkSqlConformance in Flink give ability
> to reference to aliases from SELECT by enabling it e.g. [2] where javadoc
> says
> >   * Whether to allow aliases from the {@code SELECT} clause to be used as
> >   * column names in the {@code GROUP BY} clause.
>
> Giving the fact that was already mentioned by Timo in the PR
> > In some DBMSs it is common to write GROUP BY 1 or ORDER BY 1 for global
> aggregation/sorting.
> and IMHO referencing by ordinals might be error prone if someone adds
> more columns in SELECT and forgets about ordinals.
>
> Would it make sense to consider enabling reference by aliases as
> another option here?
> Or did I miss anything?
>
> [1] https://issues.apache.org/jira/browse/FLINK-34366
> [2]
> https://github.com/apache/calcite/blob/c0a53f6b17daaca9d057e70d7fae0a0e9c2cd02a/core/src/main/java/org/apache/calcite/sql/validate/SqlConformance.java#L92-L103
>
> On Thu, Jun 20, 2024 at 12:12 PM Muhammet Orazov
>  wrote:
> >
> > Hey Jeyhun,
> >
> > Thanks for bringing it up! +-1 from my side.
> >
> > Personally, I find this feature confusing, it feels always natural to
> > use
> > column names. SQL power users will ask for it, I have seen it used in
> > automated complex queries also.
> >
> > But it seems counterintuitive to enable flag for this feature. Enabling
> > it, should not disable grouping/ordering by the column names?
> >
> > Best,
> > Muhammet
> >
> >
> > On 2024-06-17 20:30, Jeyhun Karimov wrote:
> > > Hi devs,
> > >
> > > I am moving our discussion on the PR thread [1] to the dev mailing list
> > > to
> > > close the loop on the related issue [2]. The end goal of the PR is to
> > > support grouping/ordering by via column ordinals. The target
> > > implementation
> > > (currently there is no flag) should support a flag, so that a user can
> > > also
> > > use the old behavior as suggested by @Timo.
> > >
> > > Some vendors such as Postgres [3], SQLite [4], MySQL/MariaDB [5],
> > > Oracle
> > > [6], Spark [7], and BigQuery[8] support group/order by clauses with
> > > column
> > > ordinals.
> > >
> > > Obviously, supporting this clause might lead to less readable and
> > > maintainable SQL code. This might also cause a bit of complications
> > > both on
> > > the codebase and on the user-experience side. On the other hand, we
> > > already
> > > see that numerous vendors support this feature out of the box, because
> > > there was/is a need for this feature.
> > >
> > > That is why, I would like to discuss and hear your opinions about
> > > introducing/abandoning this feature.
> > >
> > > Regards,
> > > Jeyhun
> > >
> > > [1] https://github.com/apache/flink/pull/24270
> > > [2] https://issues.apache.org/jira/browse/FLINK-34366
> > > [3] https://www.postgresql.org/docs/6.5/sql-select.htm
> > > [4] https://www.sqlite.org/lang_select.html
> > > [5] https://www.db-fiddle.com/f/uTrfRrNs4uXLr4Q9j2piCk/1
> > > [6]
> > >
> https://oracle-base.com/articles/23/group-by-and-having-clause-using-column-alias-or-column-position-23
> > > [7]
> > >
> https://github.com/apache/spark/commit/90613df652d45e121ab2b3a5bbb3b63cb15d297a
> > > [8]
> > >
> https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_col_ordinals
>
>
>
> --
> Best regards,
> Sergey
>


Re: [VOTE] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-21 Thread Jeyhun Karimov
Hi Sergio,

+1 (non-binding)

Regards,
Jeyhun

On Fri, Jun 21, 2024 at 4:59 PM Jim Hughes 
wrote:

> Hi Sergio,
>
> +1 (non-binding)
>
> Thanks,
>
> Jim
>
> On Fri, Jun 21, 2024 at 10:50 AM Timo Walther  wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > Timo
> >
> >
> > On 21.06.24 16:18, Sergio Pena wrote:
> > > Hi everyone,
> > >
> > > Thanks for all the feedback about FLIP-463: Schema Definition in CREATE
> > > TABLE AS Statement [1]. The discussion thread is here [2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours unless there is an objection or insufficient votes. The FLIP will be
> > > considered accepted if 3 binding votes (from active committers according to
> > > the Flink bylaws [3]) are gathered by the community.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > > [2] https://lists.apache.org/thread/1ryxxyyg3h9v4rbosc80zryvjk6c8k83
> > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals
> > >
> > > Thanks,
> > > Sergio Peña
> > >
> >
> >
>


Re: [VOTE] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-21 Thread Jim Hughes
Hi Sergio,

+1 (non-binding)

Thanks,

Jim

On Fri, Jun 21, 2024 at 10:50 AM Timo Walther  wrote:

> +1 (binding)
>
> Thanks,
> Timo
>
>
> On 21.06.24 16:18, Sergio Pena wrote:
> > Hi everyone,
> >
> > Thanks for all the feedback about FLIP-463: Schema Definition in CREATE
> > TABLE AS Statement [1]. The discussion thread is here [2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours unless there is an objection or insufficient votes. The FLIP will be
> > considered accepted if 3 binding votes (from active committers according to
> > the Flink bylaws [3]) are gathered by the community.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
> > [2] https://lists.apache.org/thread/1ryxxyyg3h9v4rbosc80zryvjk6c8k83
> > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals
> >
> > Thanks,
> > Sergio Peña
> >
>
>


Re: [VOTE] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-21 Thread Timo Walther

+1 (binding)

Thanks,
Timo


On 21.06.24 16:18, Sergio Pena wrote:

Hi everyone,

Thanks for all the feedback about FLIP-463: Schema Definition in CREATE
TABLE AS Statement [1]. The discussion thread is here [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours unless there is an objection or insufficient votes. The FLIP will be
considered accepted if 3 binding votes (from active committers according to
the Flink bylaws [3]) are gathered by the community.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
[2] https://lists.apache.org/thread/1ryxxyyg3h9v4rbosc80zryvjk6c8k83
[3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals


Thanks,
Sergio Peña





[VOTE] FLIP-463: Schema Definition in CREATE TABLE AS Statement

2024-06-21 Thread Sergio Pena
Hi everyone,

Thanks for all the feedback about FLIP-463: Schema Definition in CREATE
TABLE AS Statement [1]. The discussion thread is here [2].

I'd like to start a vote for it. The vote will be open for at least 72
hours unless there is an objection or insufficient votes. The FLIP will be
considered accepted if 3 binding votes (from active committers according to
the Flink bylaws [3]) are gathered by the community.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-463%3A+Schema+Definition+in+CREATE+TABLE+AS+Statement
[2] https://lists.apache.org/thread/1ryxxyyg3h9v4rbosc80zryvjk6c8k83
[3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Approvals


Thanks,
Sergio Peña


[jira] [Created] (FLINK-35668) Throw exception "java.lang.OutOfMemoryError" when import data of a MySQL table to StarRocks

2024-06-21 Thread Lv Luo Gang (Jira)
Lv Luo Gang created FLINK-35668:
---

 Summary: Throw exception "java.lang.OutOfMemoryError" when import 
data of a MySQL table to StarRocks
 Key: FLINK-35668
 URL: https://issues.apache.org/jira/browse/FLINK-35668
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: cdc-3.1.0
 Environment: flink-1.18.1

flink-cdc-3.1.0

MySQL 8.0.33

StarRocks-3.2.7
Reporter: Lv Luo Gang


I have 40 MySQL INSERT SQL files for a big table whose total record count is
about 100 million; each file is 100 MB. I restore these files into a MySQL
table named "standby_atomic_action" using the mysql CLI program in a loop. At
the same time, I started a Flink CDC pipeline with scan.startup.mode "initial"
to copy the MySQL table data to a StarRocks table. When the Flink task executes
the last SQL split to get snapshot data, it returns more than 1 million
records; the Flink TaskExecutor then throws "java.lang.OutOfMemoryError" and is
terminated.


I have checked the Flink CDC source code of the method
org.apache.flink.cdc.connectors.mysql.source.utils.StatementUtils#buildSplitScanQuery
in module flink-connector-mysql-cdc. It calls buildSplitQuery with limitSize
-1, so when isLastSplit is true the split SQL is "select * from
standby_atomic_action where id>=?" without a LIMIT clause. At the same time,
the mysql CLI has just committed more than 1 million records, which is far
greater than the scan.incremental.snapshot.chunk.size default value of 8096.
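
To make the suspected cause easier to follow, here is a minimal sketch of how a per-split scan query can end up unbounded for the last split. It is not the Flink CDC code: the class name, method name, and the exact range conditions are hypothetical and only mirror the behaviour described above (no upper bound and no LIMIT when isLastSplit is true).

```java
/**
 * Hypothetical sketch of per-split snapshot query construction.
 * This is NOT the Flink CDC implementation; names and range conditions are illustrative.
 */
final class SplitQuerySketch {

    /** Builds the scan SQL for one snapshot split of a table chunked on {@code key}. */
    static String buildSplitScanSql(
            String table, String key, boolean isFirstSplit, boolean isLastSplit) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (isFirstSplit && isLastSplit) {
            // Single split: the whole table is read.
            return sql.toString();
        }
        if (isFirstSplit) {
            // Only an upper bound.
            sql.append(" WHERE ").append(key).append(" < ?");
        } else if (isLastSplit) {
            // Only a lower bound: every row written after the split was planned
            // also matches, so the result size is not limited by the chunk size.
            sql.append(" WHERE ").append(key).append(" >= ?");
        } else {
            // Middle split: bounded on both sides.
            sql.append(" WHERE ").append(key).append(" >= ? AND ").append(key).append(" < ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String table = "`qwgas`.`standby_atomic_action`";
        System.out.println(buildSplitScanSql(table, "`id`", false, false));
        System.out.println(buildSplitScanSql(table, "`id`", false, true));
    }
}
```

With the table from this report, the second call produces the same statement that appears in the log below for split 10569, which is why rows inserted concurrently by the mysql CLI all land in that one split.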

 
*Exception:*
2024-06-21 05:52:35,440 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 1 - Determining low watermark {ts_sec=0, file=binlog.45, pos=488153283, kind=SPECIFIC, gtids=334bec6e-849e-11eb-9157-b8599f49e7f4:1-936486941,
778ec805-a69b-11eb-9250-506b4b02d56e:1-900365321,
d6a416c1-a2de-11e9-adcf-506b4b233658:1-166810501,
d6bb9ebd-a2de-11e9-936f-506b4bfd5c94:1-80755324, row=0, event=0} for split MySqlSnapshotSplit{tableId=qwgas.standby_atomic_action, splitId='qwgas.standby_atomic_action:10569', splitKeyType=[`id` BIGINT NOT NULL], splitStart=[92277940], splitEnd=null, highWatermark=null}
2024-06-21 05:52:35,440 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Snapshot step 2 - Snapshotting data
2024-06-21 05:52:35,440 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exporting data from split 'qwgas.standby_atomic_action:10569' of table qwgas.standby_atomic_action
2024-06-21 05:52:35,440 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - For split 'qwgas.standby_atomic_action:10569' of table qwgas.standby_atomic_action using select statement: 'SELECT * FROM `qwgas`.`standby_atomic_action` WHERE `id` >= ?'
...
...
...
2024-06-21 05:54:59,970 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 1877835 records for split 'qwgas.standby_atomic_action:10569' after 00:02:24.53
2024-06-21 05:55:06,410 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 5149 records for split 'qwgas.standby_atomic_action_counter:148' after 00:00:33.897
2024-06-21 05:55:07,484 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 6646 records for split 'qwgas.standby_atomic_action_counter:147' after 00:00:34.971
2024-06-21 05:55:10,714 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 4214 records for split 'qwgas.standby_atomic_action_counter:149' after 00:00:37.106
2024-06-21 05:55:11,784 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 1878137 records for split 'qwgas.standby_atomic_action:10569' after 00:02:36.344
2024-06-21 05:55:17,187 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 5826 records for split 'qwgas.standby_atomic_action_counter:148' after 00:00:44.674
2024-06-21 05:55:18,274 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 7150 records for split 'qwgas.standby_atomic_action_counter:147' after 00:00:45.761
2024-06-21 05:55:21,522 INFO  org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask [] - Exported 4312 records for split 'qwgas.standby_atomic_action_counter:149' after 00:00:47.914
2024-06-21 05:56:15,500 ERROR com.starrocks.data.load.stream.v2.StreamLoadManagerV2        [] - StarRocks-Sink-Manager error
java.lang.OutOfMemoryError: Java heap space
2024-06-21 05:56:15,500 ERROR org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager [] - Received uncaught

[jira] [Created] (FLINK-35667) Implement Reducing Async State API for ForStStateBackend

2024-06-21 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35667:
---

 Summary: Implement Reducing Async State API for ForStStateBackend
 Key: FLINK-35667
 URL: https://issues.apache.org/jira/browse/FLINK-35667
 Project: Flink
  Issue Type: Sub-task
Reporter: Zakelly Lan








[jira] [Created] (FLINK-35666) Implement Aggregating Async State API for ForStStateBackend

2024-06-21 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35666:
---

 Summary: Implement Aggregating Async State API for 
ForStStateBackend
 Key: FLINK-35666
 URL: https://issues.apache.org/jira/browse/FLINK-35666
 Project: Flink
  Issue Type: Sub-task
Reporter: Zakelly Lan








Re: [DISCUSS] Support to group rows by column ordinals

2024-06-21 Thread Jeyhun Karimov
Hi Muhammet,

Thanks for your comment.

> Personally, I find this feature confusing, it feels always natural to use
> column names.

- I also have a similar experience w.r.t. using column ordinals.

> But it seems counterintuitive to enable flag for this feature. Enabling
> it, should not disable grouping/ordering by the column names?

- No, it won't disable grouping/ordering by column names. One
"disadvantage" is that once the flag is enabled, "GROUP BY 1" will have a
different meaning and query result.
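
To illustrate that change in meaning, here is a small, self-contained sketch of how a GROUP BY item could be resolved with and without such a flag. It is only an illustration under my own assumptions: the class, the method, and the boolean flag are hypothetical and are not taken from the PR, from Flink, or from Calcite.

```java
import java.util.List;

/**
 * Hypothetical sketch of GROUP BY ordinal resolution.
 * Not Flink or Calcite code; the flag and all names are illustrative.
 */
final class GroupByOrdinalSketch {

    /** Resolves one GROUP BY item against the SELECT list. */
    static String resolveGroupByItem(
            String item, List<String> selectList, boolean ordinalsEnabled) {
        if (!ordinalsEnabled || !item.matches("\\d+")) {
            // Column name, expression, or (flag disabled) a plain constant.
            return item;
        }
        int ordinal = Integer.parseInt(item);
        if (ordinal < 1 || ordinal > selectList.size()) {
            throw new IllegalArgumentException("GROUP BY ordinal out of range: " + ordinal);
        }
        // Ordinals are 1-based positions in the SELECT list.
        return selectList.get(ordinal - 1);
    }

    public static void main(String[] args) {
        List<String> selectList = List.of("UPPER(name)", "COUNT(*)");
        // Flag disabled: "1" stays a constant, i.e. a single global group.
        System.out.println(resolveGroupByItem("1", selectList, false)); // prints 1
        // Flag enabled: "1" now refers to the first SELECT item.
        System.out.println(resolveGroupByItem("1", selectList, true)); // prints UPPER(name)
        // Column names keep working either way.
        System.out.println(resolveGroupByItem("name", selectList, true)); // prints name
    }
}
```

The same query text therefore groups differently depending on the flag, while grouping by column names is unaffected.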

Regards,
Jeyhun

On Thu, Jun 20, 2024 at 12:12 PM Muhammet Orazov 
wrote:

> Hey Jeyhun,
>
> Thanks for bringing it up! +-1 from my side.
>
> Personally, I find this feature confusing, it feels always natural to
> use
> column names. SQL power users will ask for it, I have seen it used in
> automated complex queries also.
>
> But it seems counterintuitive to enable flag for this feature. Enabling
> it, should not disable grouping/ordering by the column names?
>
> Best,
> Muhammet
>
>
> On 2024-06-17 20:30, Jeyhun Karimov wrote:
> > Hi devs,
> >
> > I am moving our discussion on the PR thread [1] to the dev mailing list
> > to
> > close the loop on the related issue [2]. The end goal of the PR is to
> > support grouping/ordering by via column ordinals. The target
> > implementation
> > (currently there is no flag) should support a flag, so that a user can
> > also
> > use the old behavior as suggested by @Timo.
> >
> > Some vendors such as Postgres [3], SQLite [4], MySQL/MariaDB [5],
> > Oracle
> > [6], Spark [7], and BigQuery[8] support group/order by clauses with
> > column
> > ordinals.
> >
> > Obviously, supporting this clause might lead to less readable and
> > maintainable SQL code. This might also cause a bit of complications
> > both on
> > the codebase and on the user-experience side. On the other hand, we
> > already
> > see that numerous vendors support this feature out of the box, because
> > there was/is a need for this feature.
> >
> > That is why, I would like to discuss and hear your opinions about
> > introducing/abandoning this feature.
> >
> > Regards,
> > Jeyhun
> >
> > [1] https://github.com/apache/flink/pull/24270
> > [2] https://issues.apache.org/jira/browse/FLINK-34366
> > [3] https://www.postgresql.org/docs/6.5/sql-select.htm
> > [4] https://www.sqlite.org/lang_select.html
> > [5] https://www.db-fiddle.com/f/uTrfRrNs4uXLr4Q9j2piCk/1
> > [6]
> >
> https://oracle-base.com/articles/23/group-by-and-having-clause-using-column-alias-or-column-position-23
> > [7]
> >
> https://github.com/apache/spark/commit/90613df652d45e121ab2b3a5bbb3b63cb15d297a
> > [8]
> >
> https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#group_by_col_ordinals
>


[jira] [Created] (FLINK-35665) Release Testing: FLIP-441: Show the JobType and remove Execution Mode on Flink WebUI

2024-06-21 Thread Rui Fan (Jira)
Rui Fan created FLINK-35665:
---

 Summary: Release Testing:  FLIP-441: Show the JobType and remove 
Execution Mode on Flink WebUI 
 Key: FLINK-35665
 URL: https://issues.apache.org/jira/browse/FLINK-35665
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
 Fix For: 1.20.0
 Attachments: image-2024-06-21-15-51-53-480.png

Test suggestion:

 

1. Use the following job to check the jobType
{code:java}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.util.ratelimit.RateLimiterStrategy;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.connector.datagen.source.DataGeneratorSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/** Test for showing job type in Flink WebUI. */
public class JobTypeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);

        env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
//                env.setRuntimeMode(RuntimeExecutionMode.BATCH);
//                env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

        // Bounded source: emits 600 Long values at 10 records per second.
        DataGeneratorSource<Long> generatorSource =
                new DataGeneratorSource<>(
                        value -> value,
                        600,
                        RateLimiterStrategy.perSecond(10),
                        Types.LONG);
        env.fromSource(generatorSource, WatermarkStrategy.noWatermarks(), "Data Generator")
                .map((MapFunction<Long, Long>) value -> value)
                .name("Map___1")
                .print();
        env.execute(JobTypeDemo.class.getSimpleName());
    }
}
{code}
2. Start it and check whether the jobType is Streaming in the Flink web UI.

  !image-2024-06-21-15-49-40-729.png|width=1581,height=662!

3. Apply env.setRuntimeMode(RuntimeExecutionMode.BATCH); and check whether
the jobType is Batch in the Flink web UI.

4. Apply env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC); and check
whether the jobType is Batch in the Flink web UI.


