[jira] [Created] (FLINK-31275) Generate job id in planner
Shammon created FLINK-31275: --- Summary: Generate job id in planner Key: FLINK-31275 URL: https://issues.apache.org/jira/browse/FLINK-31275 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Affects Versions: 1.18.0 Reporter: Shammon Currently, Flink generates the job id in `JobGraph`. We need to generate the job id in the planner and create relations between source and sink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-31274) Python code examples in documentation are not complete
Ari Huttunen created FLINK-31274: Summary: Python code examples in documentation are not complete Key: FLINK-31274 URL: https://issues.apache.org/jira/browse/FLINK-31274 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ari Huttunen Because the Python examples in the documentation do not contain all the code needed to run them, a new user cannot easily run the examples. This ticket is done when:
* Each documentation page contains all the code needed to run the examples on that page, without needing to copy code from other pages
* Code is not partially left out, e.g. `t_env = TableEnvironment.create(...)`
[jira] [Created] (FLINK-31273) Left join with IS_NULL filter is wrongly pushed down and produces wrong join results
Yunhong Zheng created FLINK-31273: - Summary: Left join with IS_NULL filter is wrongly pushed down and produces wrong join results Key: FLINK-31273 URL: https://issues.apache.org/jira/browse/FLINK-31273 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.16.1, 1.17.0 Reporter: Yunhong Zheng Fix For: 1.17.1 A left join with an IS_NULL filter is wrongly pushed down, producing wrong join results. The SQL is:
{code:java}
SELECT * FROM MyTable1 LEFT JOIN MyTable2 ON a1 = a2 WHERE a2 IS NULL AND a1 < 10
{code}
The wrong plan is:
{code:java}
LogicalProject(a1=[$0], b1=[$1], c1=[$2], b2=[$3], c2=[$4], a2=[$5])
+- LogicalFilter(condition=[IS NULL($5)])
   +- LogicalJoin(condition=[=($0, $5)], joinType=[left])
      :- LogicalValues(tuples=[[]])
      +- LogicalTableScan(table=[[default_catalog, default_database, MyTable2]])
{code}
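To see why the pushdown is incorrect, here is a small plain-Python sketch (illustrative only; the table contents are made up) comparing filter-after-join semantics with the effect of the wrongly pushed-down filter:

```python
def left_join(left, right):
    # naive LEFT JOIN on a1 = a2; an unmatched left row gets a2 = None
    rows = []
    for a1 in left:
        matches = [a2 for a2 in right if a2 == a1]
        if matches:
            rows.extend((a1, a2) for a2 in matches)
        else:
            rows.append((a1, None))
    return rows

left_table, right_table = [1, 5, 20], [5]

# correct semantics: apply "a2 IS NULL AND a1 < 10" AFTER the join
correct = [(a1, a2) for (a1, a2) in left_join(left_table, right_table)
           if a2 is None and a1 < 10]

# the buggy plan: "a1 = a2 AND a2 IS NULL" is unsatisfiable, so the left
# input collapses to an empty relation (the LogicalValues(tuples=[[]]))
wrong = [(a1, a2) for (a1, a2) in left_join([], right_table) if a1 < 10]

print(correct)  # (1, None) survives: 1 has no match and 1 < 10
print(wrong)    # empty: the pushed-down filter drops the unmatched rows
```

The unmatched left rows, which are exactly the rows an `a2 IS NULL` filter is meant to select, only exist after the join, so the filter must not be pushed below it.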
[jira] [Created] (FLINK-31272) Duplicate operators appear in the StreamGraph for Python DataStream API jobs
Dian Fu created FLINK-31272: --- Summary: Duplicate operators appear in the StreamGraph for Python DataStream API jobs Key: FLINK-31272 URL: https://issues.apache.org/jira/browse/FLINK-31272 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.16.0 Reporter: Dian Fu For the following job:
{code}
import argparse
import json
import sys
import time
from typing import Iterable, cast

from pyflink.common import Types, Time, Encoder
from pyflink.datastream import StreamExecutionEnvironment, ProcessWindowFunction, EmbeddedRocksDBStateBackend, \
    PredefinedOptions, FileSystemCheckpointStorage, CheckpointingMode, ExternalizedCheckpointCleanup
from pyflink.datastream.connectors.file_system import FileSink, RollingPolicy, OutputFileConfig
from pyflink.datastream.state import ReducingState, ReducingStateDescriptor
from pyflink.datastream.window import TimeWindow, Trigger, TriggerResult, T, TumblingProcessingTimeWindows, \
    ProcessingTimeTrigger


class CountWithProcessTimeoutTrigger(ProcessingTimeTrigger):

    def __init__(self, window_size: int):
        self._window_size = window_size
        self._count_state_descriptor = ReducingStateDescriptor(
            "count", lambda a, b: a + b, Types.LONG())

    @staticmethod
    def of(window_size: int) -> 'CountWithProcessTimeoutTrigger':
        return CountWithProcessTimeoutTrigger(window_size)

    def on_element(self, element: T, timestamp: int, window: TimeWindow,
                   ctx: 'Trigger.TriggerContext') -> TriggerResult:
        count_state = cast(ReducingState, ctx.get_partitioned_state(self._count_state_descriptor))
        count_state.add(1)
        # print("element arrive:", element, "count_state:", count_state.get(), window.max_timestamp(),
        #       ctx.get_current_watermark())
        if count_state.get() >= self._window_size:  # must fire & purge
            print("fire element count", element, count_state.get(), window.max_timestamp(),
                  ctx.get_current_watermark())
            count_state.clear()
            return TriggerResult.FIRE_AND_PURGE
        if timestamp >= window.end:
            count_state.clear()
            return TriggerResult.FIRE_AND_PURGE
        else:
            return TriggerResult.CONTINUE

    def on_processing_time(self, timestamp: int, window: TimeWindow,
                           ctx: Trigger.TriggerContext) -> TriggerResult:
        if timestamp >= window.end:
            return TriggerResult.CONTINUE
        else:
            print("fire with process_time:", timestamp)
            count_state = cast(ReducingState, ctx.get_partitioned_state(self._count_state_descriptor))
            count_state.clear()
            return TriggerResult.FIRE_AND_PURGE

    def on_event_time(self, timestamp: int, window: TimeWindow,
                      ctx: 'Trigger.TriggerContext') -> TriggerResult:
        return TriggerResult.CONTINUE

    def clear(self, window: TimeWindow, ctx: 'Trigger.TriggerContext') -> None:
        count_state = ctx.get_partitioned_state(self._count_state_descriptor)
        count_state.clear()


def to_dict_map(v):
    time.sleep(1)
    dict_value = json.loads(v)
    return dict_value


def get_group_key(value, keys):
    group_key_values = []
    for key in keys:
        one_key_value = 'null'
        if key in value:
            list_value = value[key]
            if list_value:
                one_key_value = str(list_value[0])
        group_key_values.append(one_key_value)
    group_key = '_'.join(group_key_values)
    # print("group_key=", group_key)
    return group_key


class CountWindowProcessFunction(ProcessWindowFunction[dict, dict, str, TimeWindow]):

    def __init__(self, uf):
        self._user_function = uf

    def process(self, key: str, context: ProcessWindowFunction.Context[TimeWindow],
                elements: Iterable[dict]) -> Iterable[dict]:
        result_list = self._user_function.process_after_group_by_function(elements)
        return result_list


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--output',
        dest='output',
        required=False,
        help='Output file to write results to.')
    argv = sys.argv[1:]
    known_args, _ = parser.parse_known_args(argv)
    output_path = known_args.output

    env = StreamExecutionEnvironment.get_execution_environment()
    # write all the data to one file
    env.set_parallelism(1)
    # process time
    env.get_config().set_auto_watermark_interval(0)

    state_backend = EmbeddedRocksDBStateBackend(True)
    state_backend.set_predefined_options(PredefinedOptions.FLASH_SSD_OP
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
+1 (binding) Looking forward to this :) Gyula On Wed, 1 Mar 2023 at 04:02, feng xiangyu wrote: > +1 (non-binding) > > ConradJam wrote on Wed, Mar 1, 2023 at 10:37: > > > +1 (non-binding) > > > > Zhanghao Chen wrote on Wed, Mar 1, 2023 at 10:18: > > > > > Thanks for driving this. +1 (non-binding) > > > > > > Best, > > > Zhanghao Chen > > > > > > From: David Morávek > > > Sent: Tuesday, February 28, 2023 21:46 > > > To: dev > > > Subject: [VOTE] FLIP-291: Externalized Declarative Resource Management > > > > > > Hi Everyone, > > > > > > I want to start the vote on FLIP-291: Externalized Declarative Resource > > > Management [1]. The FLIP was discussed in this thread [2]. > > > > > > The goal of the FLIP is to enable external declaration of the resource > > > requirements of a running job. > > > > > > The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 > > CET) > > > unless > > > there is an objection or insufficient votes. > > > > > > [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh > > > [2] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > > > > > Best, > > > D. > > > > > > > > > -- > > Best > > > > ConradJam > > >
[jira] [Created] (FLINK-31271) Introduce system database for catalog in table store
Shammon created FLINK-31271: --- Summary: Introduce system database for catalog in table store Key: FLINK-31271 URL: https://issues.apache.org/jira/browse/FLINK-31271 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Shammon Introduce a system database for each catalog in table store to manage catalog information, such as table dependencies and relations between snapshots and checkpoints for each table.
Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager
Thanks Weihua for preparing this FLIP. +1 for the proposal. As one of the contributors of the fine-grained slot manager, I'd like to share some background here.

- There used to be a default slot manager implementation, which was non-declarative and has since been removed. The two features, declarative / reactive resource management and fine-grained resource management, were proposed at about the same time. We were aware that by design the declarative slot manager is a subset of the fine-grained slot manager, but we still decided to implement two slot managers for the purpose of decoupling efforts and reducing cross-team synchronization overhead. Merging the two slot managers once they are proved stable is IMO a technical debt that was planned at the very beginning.
- The FineGrainedSlotManager has been verified in Alibaba's internal production as well as Alibaba Cloud services as the default slot manager for about 2 years.

Concerning test cases, we currently have a CI stage for fine-grained resource management. To avoid adding too much burden, the stage only includes tests from the flink-runtime and flink-test modules. I think switching the default slot manager and applying the whole set of tests to the fine-grained slot manager would help us be more confident about it.

Concerning cutting functionality out of the slot manager, I think Yangze and I have tried our best to shape the FineGrainedSlotManager into reasonable components. I personally don't have other ideas to further disassemble the component, but I'm open to such suggestions. However, from the stability perspective, I'd be in favor of not introducing significant changes to the FineGrainedSlotManager while switching it to the default, because the current implementation has already been verified (or at least partially verified, since Alibaba does not cover all the Flink use cases), and introducing more changes also means more chances of breaking things.
Best, Xintong On Wed, Mar 1, 2023 at 11:12 AM Shammon FY wrote: > Hi > > Thanks for starting this work weihua, I think unifying > DeclarativeSlotManager and FineGrainedSlotManager is valuable. > > I agree with @Matthias and @John that we need a way to ensure that > DeclarativeSlotManager's capabilities are fully covered by > FineGrainedSlotManager > > 1. For their functional differences, can you give some detailed tests to > verify that the new FineGrainedSlotManager has these capabilities? This can > effectively verify the new functions > > 2. I'm worried that many functions are not independent and it is difficult > to migrate step-by-step. You can list the relationship between them in > detail. > > 3. As John mentioned, give a smoke test for FineGrainedSlotManager is a > good idea. Or you can add some test information to the > DeclarativeSlotManager to determine how many tests have used it. In this > way, we can gradually construct test cases for FineGrainedSlotManager > during the development process. > > > Best, > Shammon > > > On Tue, Feb 28, 2023 at 10:22 PM John Roesler wrote: > > > Thanks for the FLIP, Weihua! > > > > I’ve read the FLIP, and it sounds good to me. We need to avoid > > proliferating alternative implementations wherever possible. I have just > a > > couple of comments: > > > > 1. I share Matthias’s concern about ensuring the behavior is really the > > same. One suggestion I’ve used for this kind of thing is, as a smoke > test, > > to update the DeclarativeSlotManager to just delegate to the > > FineGrainedSlotManager. If the full test suite still passes, you can be > > pretty sure the new default is really ok. It would not be a good idea to > > actually keep that in for the release, since it would remove the option > to > > fall back in case of bugs. Either way, we need to make sure all test > > scenarios are present for the FGSM. > > > > 4. 
In addition to changing the default, would it make sense to log a > > deprecation warning on initialization if the DeclarativeSlotManager is > used? > > > > Thanks again, > > John > > > > On Tue, Feb 28, 2023, at 07:20, Matthias Pohl wrote: > > > Hi Weihua, > > > Thanks for your proposal. From a conceptual point: AFAIU, the > > > DeclarativeSlotManager covers a subset (i.e. only evenly sized slots) > of > > > what the FineGrainedSlotManager should be able to achieve (variable > slot > > > size per task manager). Is this the right assumption/understanding? In > > this > > > sense, merging both implementations into a single one sounds good. A > few > > > more general comments, though: > > > > > > 1. Did you do a proper test coverage analysis? That's not mentioned in > > the > > > current version of the FLIP. I'm bringing this up because we ran into > the > > > same issue when fixing the flaws that popped up after introducing the > > > multi-component leader election (see FLIP-285 [1]). There is a risk > that > > by > > > removing the legacy code we decrease test coverage because certain > > > test cases that were covered
[jira] [Created] (FLINK-31270) Fix flink jar name in docs for table store
Shammon created FLINK-31270: --- Summary: Fix flink jar name in docs for table store Key: FLINK-31270 URL: https://issues.apache.org/jira/browse/FLINK-31270 Project: Flink Issue Type: Bug Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Shammon
[jira] [Created] (FLINK-31269) Split hive connector to each module of each version
Shammon created FLINK-31269: --- Summary: Split hive connector to each module of each version Key: FLINK-31269 URL: https://issues.apache.org/jira/browse/FLINK-31269 Project: Flink Issue Type: Improvement Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Shammon
[jira] [Created] (FLINK-31268) OperatorCoordinator.Context#metricGroup will return null when restoring from a savepoint
Hang Ruan created FLINK-31268: - Summary: OperatorCoordinator.Context#metricGroup will return null when restoring from a savepoint Key: FLINK-31268 URL: https://issues.apache.org/jira/browse/FLINK-31268 Project: Flink Issue Type: Improvement Components: Runtime / Metrics Reporter: Hang Ruan The `metricGroup` is initialized lazily in `OperatorCoordinatorHandler#initializeOperatorCoordinators`. This causes a NullPointerException when it is used in a method like `Source#restoreEnumerator`, which is invoked through `SchedulerBase#createAndRestoreExecutionGraph` before `OperatorCoordinatorHandler#initializeOperatorCoordinators` runs in `SchedulerBase`.
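The ordering problem can be sketched in a few lines of plain Python (hypothetical names, not the actual Flink classes):

```python
class CoordinatorContext:
    # stand-in for OperatorCoordinator.Context with a lazily created metric group
    def __init__(self):
        self._metric_group = None  # created lazily, as in initializeOperatorCoordinators

    def metric_group(self):
        return self._metric_group  # None until initialization has run

    def initialize(self):
        self._metric_group = {"numEvents": 0}  # placeholder for a real metric group


ctx = CoordinatorContext()

# the restore-from-savepoint path runs before initialization...
restored_group = ctx.metric_group()
assert restored_group is None  # in Java, dereferencing this is the NullPointerException

# ...and only afterwards does initialization create the metric group
ctx.initialize()
assert ctx.metric_group() is not None
```

Creating the metric group eagerly, before the restore path runs (or guarding the getter), would avoid the null access.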
[jira] [Created] (FLINK-31267) Fine-Grained Resource Management supports table and sql levels
waywtdcc created FLINK-31267: Summary: Fine-Grained Resource Management supports table and sql levels Key: FLINK-31267 URL: https://issues.apache.org/jira/browse/FLINK-31267 Project: Flink Issue Type: New Feature Components: Table SQL / API Affects Versions: 1.16.1 Reporter: waywtdcc Fine-Grained Resource Management should support the table and SQL levels. Currently, fine-grained resources can only be used at the DataStream API level; table and SQL level settings are not supported.
Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager
Hi Thanks for starting this work weihua, I think unifying DeclarativeSlotManager and FineGrainedSlotManager is valuable. I agree with @Matthias and @John that we need a way to ensure that DeclarativeSlotManager's capabilities are fully covered by FineGrainedSlotManager 1. For their functional differences, can you give some detailed tests to verify that the new FineGrainedSlotManager has these capabilities? This can effectively verify the new functions 2. I'm worried that many functions are not independent and it is difficult to migrate step-by-step. You can list the relationship between them in detail. 3. As John mentioned, give a smoke test for FineGrainedSlotManager is a good idea. Or you can add some test information to the DeclarativeSlotManager to determine how many tests have used it. In this way, we can gradually construct test cases for FineGrainedSlotManager during the development process. Best, Shammon On Tue, Feb 28, 2023 at 10:22 PM John Roesler wrote: > Thanks for the FLIP, Weihua! > > I’ve read the FLIP, and it sounds good to me. We need to avoid > proliferating alternative implementations wherever possible. I have just a > couple of comments: > > 1. I share Matthias’s concern about ensuring the behavior is really the > same. One suggestion I’ve used for this kind of thing is, as a smoke test, > to update the DeclarativeSlotManager to just delegate to the > FineGrainedSlotManager. If the full test suite still passes, you can be > pretty sure the new default is really ok. It would not be a good idea to > actually keep that in for the release, since it would remove the option to > fall back in case of bugs. Either way, we need to make sure all test > scenarios are present for the FGSM. > > 4. In addition to changing the default, would it make sense to log a > deprecation warning on initialization if the DeclarativeSlotManager is used? 
> > Thanks again, > John > > On Tue, Feb 28, 2023, at 07:20, Matthias Pohl wrote: > > Hi Weihua, > > Thanks for your proposal. From a conceptual point: AFAIU, the > > DeclarativeSlotManager covers a subset (i.e. only evenly sized slots) of > > what the FineGrainedSlotManager should be able to achieve (variable slot > > size per task manager). Is this the right assumption/understanding? In > this > > sense, merging both implementations into a single one sounds good. A few > > more general comments, though: > > > > 1. Did you do a proper test coverage analysis? That's not mentioned in > the > > current version of the FLIP. I'm bringing this up because we ran into the > > same issue when fixing the flaws that popped up after introducing the > > multi-component leader election (see FLIP-285 [1]). There is a risk that > by > > removing the legacy code we decrease test coverage because certain > > test cases that were covered for the legacy classes might not be > > necessarily covered in the new implementation, yet (see FLINK-30338 [2] > > which covers this issue for the leader election case). Ideally, we don't > > want to remove test cases accidentally because they were only implemented > > for the DeclarativeSlotManager but missed for the FineGrainedSlotManager. > > > > 2. DeclarativeSlotManager and FineGrainedSlotManager feel quite big in > > terms of lines of code. Without knowing whether it's actually a > reasonable > > thing to do: Instead of just adding more features to the > > FineGrainedSlotManager, have you thought of cutting out functionality > into > > smaller sub-components along this refactoring? Such a step-by-step > approach > > might improve the overall codebase and might make reviewing the > refactoring > > easier. I did a first pass over the code and struggled to identify code > > blocks that could be moved out of the SlotManager implementation(s). > > Therefore, I might be wrong with this proposal. 
I haven't worked on this > > codebase in that detail that it would allow me to come up with a > judgement > > call. I wanted to bring it up, anyway, because I'm curious whether that > > could be an option. There's a comment created by Chesnay (CC'd) in the > > JavaDoc of TaskExecutorManager [3] indicating something similar. I'm > > wondering whether he can add some insights here. > > > > 3. For me personally, having a more detailed summary comparing the > > subcomponents of both SlotManager implementations with where > > their functionality matches and where they differ might help understand > the > > consequences of the changes proposed in FLIP-298. > > > > Best, > > Matthias > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > > [2] https://issues.apache.org/jira/browse/FLINK-30338 > > [3] > > > https://github.com/apache/flink/blob/f611ea8cb5deddb42429df2c99f0c68d7382e9bd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/TaskExecutorManager.java#L66-L68 > > > > On Tue, Feb 28, 2023 at 6:14
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
+1 (non-binding) ConradJam wrote on Wed, Mar 1, 2023 at 10:37: > +1 (non-binding) > > Zhanghao Chen wrote on Wed, Mar 1, 2023 at 10:18: > > > Thanks for driving this. +1 (non-binding) > > > > Best, > > Zhanghao Chen > > > > From: David Morávek > > Sent: Tuesday, February 28, 2023 21:46 > > To: dev > > Subject: [VOTE] FLIP-291: Externalized Declarative Resource Management > > > > Hi Everyone, > > > > I want to start the vote on FLIP-291: Externalized Declarative Resource > > Management [1]. The FLIP was discussed in this thread [2]. > > > > The goal of the FLIP is to enable external declaration of the resource > > requirements of a running job. > > > > The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 > CET) > > unless > > there is an objection or insufficient votes. > > > > [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > > > Best, > > D. > > > > > -- > Best > > ConradJam >
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
+1 (non-binding) Zhanghao Chen wrote on Wed, Mar 1, 2023 at 10:18: > Thanks for driving this. +1 (non-binding) > > Best, > Zhanghao Chen > > From: David Morávek > Sent: Tuesday, February 28, 2023 21:46 > To: dev > Subject: [VOTE] FLIP-291: Externalized Declarative Resource Management > > Hi Everyone, > > I want to start the vote on FLIP-291: Externalized Declarative Resource > Management [1]. The FLIP was discussed in this thread [2]. > > The goal of the FLIP is to enable external declaration of the resource > requirements of a running job. > > The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 CET) > unless > there is an objection or insufficient votes. > > [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > Best, > D. > -- Best ConradJam
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
Thanks for driving this. +1 (non-binding) Best, Zhanghao Chen From: David Morávek Sent: Tuesday, February 28, 2023 21:46 To: dev Subject: [VOTE] FLIP-291: Externalized Declarative Resource Management Hi Everyone, I want to start the vote on FLIP-291: Externalized Declarative Resource Management [1]. The FLIP was discussed in this thread [2]. The goal of the FLIP is to enable external declaration of the resource requirements of a running job. The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 CET) unless there is an objection or insufficient votes. [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management Best, D.
[jira] [Created] (FLINK-31266) dashboard info error (received and sent always show 0 when there is data)
linqichen created FLINK-31266: - Summary: dashboard info error (received and sent always show 0 when there is data) Key: FLINK-31266 URL: https://issues.apache.org/jira/browse/FLINK-31266 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.14.4 Reporter: linqichen Attachments: receivedAndSend0.jpg !receivedAndSend0.jpg!
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
+1 (binding) On Tue, Feb 28, 2023 at 15:02, John Roesler wrote: > Thanks for the FLIP, David! > > I’m +1 (non-binding) > > -John > > On Tue, Feb 28, 2023, at 07:46, David Morávek wrote: > > Hi Everyone, > > > > I want to start the vote on FLIP-291: Externalized Declarative Resource > > Management [1]. The FLIP was discussed in this thread [2]. > > > > The goal of the FLIP is to enable external declaration of the resource > > requirements of a running job. > > > > The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 > CET) > > unless > > there is an objection or insufficient votes. > > > > [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > > > Best, > > D. > -- https://twitter.com/snntrable https://github.com/knaufk
Re: [VOTE] Release 1.15.4, release candidate #1
Thanks Danny, +1 (non-binding) - Verified hashes and signatures - Built Source archive using maven - Web PR looks good. - Started WordCount Example On Tue, 28 Feb 2023 at 16:37, Jing Ge wrote: > Thanks Danny, > > +1 (non-binding) > > - GPG signatures looks good > - checked dist and maven repo > - maven clean install from source > - checked version consistency in pom files > - went through the web release notes and found one task is still open: > FLINK-31133 [1] > - download artifacts > - started/stopped local cluster and ran WordCount job in streaming and > batch > > Best regards, > Jing > > [1] https://issues.apache.org/jira/browse/FLINK-31133 > > On Tue, Feb 28, 2023 at 3:12 PM Matthias Pohl > wrote: > > > Thanks Danny. > > > > +1 (non-binding) > > > > * Downloaded artifacts > > * Built Flink from sources > > * Verified SHA512 checksums GPG signatures > > * Compared checkout with provided sources > > * Verified pom file versions > > * Went over NOTICE file/pom files changes without finding anything > > suspicious > > * Deployed standalone session cluster and ran WordCount example in batch > > and streaming: Nothing suspicious in log files found > > > > On Tue, Feb 28, 2023 at 9:50 AM Teoh, Hong > > > wrote: > > > > > Thanks Danny for driving this > > > > > > +1 (non-binding) > > > > > > * Hashes and Signatures look good > > > * All required files on dist.apache.org > > > * Source archive builds using maven > > > * Started packaged example WordCountSQLExample job > > > * Web PR looks good. > > > > > > Cheers, > > > Hong > > > > > > > > > > > > > On 24 Feb 2023, at 05:36, Weihua Hu wrote: > > > > > > > > CAUTION: This email originated from outside of the organization. Do > not > > > click links or open attachments unless you can confirm the sender and > > know > > > the content is safe. > > > > > > > > > > > > > > > > Thanks Danny. 
> > > > +1(non-binding) > > > > > > > > Tested the following: > > > > - Download the artifacts and build image > > > > - Ran WordCount on Kubernetes(session mode and application mode) > > > > > > > > > > > > Best, > > > > Weihua > > > > > > > > > > > > On Fri, Feb 24, 2023 at 12:29 PM Yanfei Lei > > wrote: > > > > > > > >> Thanks Danny. > > > >> +1 (non-binding) > > > >> > > > >> - Downloaded artifacts & built Flink from sources > > > >> - Verified GPG signatures of bin and source. > > > >> - Verified version in poms > > > >> - Ran WordCount example in streaming and batch mode(standalone > > cluster) > > > >> - Went over flink-web PR, looks good except for Sergey's remark. > > > >> > > > >> Danny Cranmer wrote on Fri, Feb 24, 2023 at 02:08: > > > >>> > > > >>> Hi everyone, > > > >>> Please review and vote on the release candidate #1 for the version > > > >> 1.15.4, > > > >>> as follows: > > > >>> [ ] +1, Approve the release > > > >>> [ ] -1, Do not approve the release (please provide specific > comments) > > > >>> > > > >>> > > > >>> The complete staging area is available for your review, which > > includes: > > > >>> * JIRA release notes [1], > > > >>> * the official Apache source release and binary convenience > releases > > to > > > >> be > > > >>> deployed to dist.apache.org [2], which are signed with the key > with > > > >>> fingerprint 125FD8DB [3], > > > >>> * all artifacts to be deployed to the Maven Central Repository [4], > > > >>> * source code tag "release-1.15.4-rc1" [5], > > > >>> * website pull request listing the new release and adding > > announcement > > > >> blog > > > >>> post [6]. > > > >>> > > > >>> The vote will be open for at least 72 hours (excluding weekends > > > >> 2023-02-28 > > > >>> 19:00). It is adopted by majority approval, with at least 3 PMC > > > >> affirmative > > > >>> votes. 
> > > >>> > > > >>> Thanks, > > > >>> Danny > > > >>> > > > >>> [1] > > > >>> > > > >> > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352526 > > > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc1/ > > > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > >>> [4] > > > >>> > > > >> > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1588/org/apache/flink/ > > > >>> [5] > https://github.com/apache/flink/releases/tag/release-1.15.4-rc1 > > > >>> [6] https://github.com/apache/flink-web/pull/611 > > > >> > > > >> > > > >> > > > >> -- > > > >> Best, > > > >> Yanfei > > > >> > > > > > > > > >
[jira] [Created] (FLINK-31265) Add smoke test for Pulsar connector
Weijie Guo created FLINK-31265: -- Summary: Add smoke test for Pulsar connector Key: FLINK-31265 URL: https://issues.apache.org/jira/browse/FLINK-31265 Project: Flink Issue Type: Sub-task Reporter: Weijie Guo At present, the Pulsar connector only supports DataStream jobs. We should take the following two steps: 1. Add a smoke test for the DataStream uber jar. 2. During the code review process for SQL/Table API support, ensure that the corresponding smoke test is not missing.
[jira] [Created] (FLINK-31264) Add smoke test for HBase connector
Weijie Guo created FLINK-31264: -- Summary: Add smoke test for HBase connector Key: FLINK-31264 URL: https://issues.apache.org/jira/browse/FLINK-31264 Project: Flink Issue Type: Sub-task Components: Connectors / HBase, Tests Reporter: Weijie Guo 1. Sync `SQLClientHBaseITCase` to flink-connector-hbase and re-write it using {{TestContainer}}. 2. Introduce a smoke test for the HBase DataStream job.
[jira] [Created] (FLINK-31263) Add smoke test for ElasticSearch connector
Weijie Guo created FLINK-31263: -- Summary: Add smoke test for ElasticSearch connector Key: FLINK-31263 URL: https://issues.apache.org/jira/browse/FLINK-31263 Project: Flink Issue Type: Sub-task Components: Connectors / ElasticSearch Reporter: Weijie Guo Assignee: Weijie Guo
Re: [VOTE] Release 1.15.4, release candidate #1
Thanks Danny, +1 (non-binding) - GPG signatures looks good - checked dist and maven repo - maven clean install from source - checked version consistency in pom files - went through the web release notes and found one task is still open: FLINK-31133 [1] - download artifacts - started/stopped local cluster and ran WordCount job in streaming and batch Best regards, Jing [1] https://issues.apache.org/jira/browse/FLINK-31133 On Tue, Feb 28, 2023 at 3:12 PM Matthias Pohl wrote: > Thanks Danny. > > +1 (non-binding) > > * Downloaded artifacts > * Built Flink from sources > * Verified SHA512 checksums GPG signatures > * Compared checkout with provided sources > * Verified pom file versions > * Went over NOTICE file/pom files changes without finding anything > suspicious > * Deployed standalone session cluster and ran WordCount example in batch > and streaming: Nothing suspicious in log files found > > On Tue, Feb 28, 2023 at 9:50 AM Teoh, Hong > wrote: > > > Thanks Danny for driving this > > > > +1 (non-binding) > > > > * Hashes and Signatures look good > > * All required files on dist.apache.org > > * Source archive builds using maven > > * Started packaged example WordCountSQLExample job > > * Web PR looks good. > > > > Cheers, > > Hong > > > > > > > > > On 24 Feb 2023, at 05:36, Weihua Hu wrote: > > > > > > CAUTION: This email originated from outside of the organization. Do not > > click links or open attachments unless you can confirm the sender and > know > > the content is safe. > > > > > > > > > > > > Thanks Danny. > > > > > > +1(non-binding) > > > > > > Tested the following: > > > - Download the artifacts and build image > > > - Ran WordCount on Kubernetes(session mode and application mode) > > > > > > > > > Best, > > > Weihua > > > > > > > > > On Fri, Feb 24, 2023 at 12:29 PM Yanfei Lei > wrote: > > > > > >> Thanks Danny. > > >> +1 (non-binding) > > >> > > >> - Downloaded artifacts & built Flink from sources > > >> - Verified GPG signatures of bin and source. 
> > >> - Verified version in poms > > >> - Ran WordCount example in streaming and batch mode(standalone > cluster) > > >> - Went over flink-web PR, looks good except for Sergey's remark. > > >> > > >> Danny Cranmer 于2023年2月24日周五 02:08写道: > > >>> > > >>> Hi everyone, > > >>> Please review and vote on the release candidate #1 for the version > > >> 1.15.4, > > >>> as follows: > > >>> [ ] +1, Approve the release > > >>> [ ] -1, Do not approve the release (please provide specific comments) > > >>> > > >>> > > >>> The complete staging area is available for your review, which > includes: > > >>> * JIRA release notes [1], > > >>> * the official Apache source release and binary convenience releases > to > > >> be > > >>> deployed to dist.apache.org [2], which are signed with the key with > > >>> fingerprint 125FD8DB [3], > > >>> * all artifacts to be deployed to the Maven Central Repository [4], > > >>> * source code tag "release-1.15.4-rc1" [5], > > >>> * website pull request listing the new release and adding > announcement > > >> blog > > >>> post [6]. > > >>> > > >>> The vote will be open for at least 72 hours (excluding weekends > > >> 2023-02-28 > > >>> 19:00). It is adopted by majority approval, with at least 3 PMC > > >> affirmative > > >>> votes. > > >>> > > >>> Thanks, > > >>> Danny > > >>> > > >>> [1] > > >>> > > >> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352526 > > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc1/ > > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > >>> [4] > > >>> > > >> > > > https://repository.apache.org/content/repositories/orgapacheflink-1588/org/apache/flink/ > > >>> [5] https://github.com/apache/flink/releases/tag/release-1.15.4-rc1 > > >>> [6] https://github.com/apache/flink-web/pull/611 > > >> > > >> > > >> > > >> -- > > >> Best, > > >> Yanfei > > >> > > > > >
[jira] [Created] (FLINK-31262) Add sql connector fat jar test to SmokeKafkaITCase
Weijie Guo created FLINK-31262: -- Summary: Add sql connector fat jar test to SmokeKafkaITCase Key: FLINK-31262 URL: https://issues.apache.org/jira/browse/FLINK-31262 Project: Flink Issue Type: Sub-task Components: Connectors / Kafka, Tests Reporter: Weijie Guo Assignee: Weijie Guo {{SmokeKafkaITCase}} currently only covers the application packaging test for DataStream jobs; we should also bring this kind of test to SQL jobs submitted via the sql-client, to cover the sql connector fat jar. In fact, we already have a test class with the same purpose, {{SQLClientKafkaITCase}}, but it has been ignored since 2021-04-20. It depends on {{LocalStandaloneKafkaResource}}, which downloads {{kafka}} and sets up a local cluster, and we often encounter download failures in CI environments. Fortunately, we now prefer to use {{TestContainer}} in E2E tests. So I suggest integrating the corresponding test logic into {{SmokeKafkaITCase}} and then removing {{SQLClientKafkaITCase}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-287: Extend Sink#InitContext to expose ExecutionConfig and JobID
I have updated the FLIP with the two options that we have discussed. Option 1: Expose ExecutionConfig directly on InitContext. This has minimal impact, as we only have to expose the new methods. Option 2: Expose ReadableExecutionConfig on InitContext. This option has more impact, as we need to add a new method to TypeInformation and change all of its implementations (72 implementations currently exist). Waiting for feedback or concerns about the two options.
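To make the trade-off concrete, here is a minimal plain-Java sketch of the two API shapes under discussion. The interface and method names below (InitContextOption1, InitContextOption2, ReadableExecutionConfig, isObjectReuseEnabled) are hypothetical stand-ins, not Flink's actual classes:

```java
public class Flip287Sketch {

    // Option 1: expose the (mutable) ExecutionConfig directly.
    interface InitContextOption1 {
        ExecutionConfig getExecutionConfig();
    }

    // Option 2: expose only a read-only view; every consumer
    // (e.g. TypeInformation implementations) must then accept it.
    interface ReadableExecutionConfig {
        boolean isObjectReuseEnabled();
    }

    interface InitContextOption2 {
        ReadableExecutionConfig getExecutionConfig();
    }

    // Stand-in for Flink's ExecutionConfig.
    static class ExecutionConfig implements ReadableExecutionConfig {
        private boolean objectReuse;
        public void enableObjectReuse() { objectReuse = true; }
        @Override public boolean isObjectReuseEnabled() { return objectReuse; }
    }

    public static void main(String[] args) {
        ExecutionConfig config = new ExecutionConfig();
        config.enableObjectReuse();

        // Option 1 hands out the mutable config itself...
        InitContextOption1 ctx1 = () -> config;
        // ...while Option 2 only exposes the read-only view.
        InitContextOption2 ctx2 = () -> config;

        System.out.println(ctx1.getExecutionConfig().isObjectReuseEnabled()); // true
        System.out.println(ctx2.getExecutionConfig().isObjectReuseEnabled()); // true
    }
}
```

Option 1 leaks a mutable object into the sink; Option 2 keeps the contract read-only at the cost of touching every TypeInformation implementation.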
Re: Contributing a Google Cloud Pub/Sub Lite source and sink?
Thanks! My Jira username is `dpcollins-google`. -Daniel On Tue, Feb 28, 2023 at 10:19 AM Martijn Visser wrote: > Hi Daniel, > > If you can provide your Jira username, those permissions can be granted. > > Best regards, > > Martijn > > On Tue, Feb 28, 2023 at 3:13 PM Daniel Collins > > wrote: > > > Hello, > > > > Absent any comments, I was attempting to create a connector FLIP as > > described above, however it appears that I do not have permissions to > edit > > confluence to do so. Can someone please add access for me? > > > > -Daniel > > > > On Mon, Feb 27, 2023 at 10:28 AM Daniel Collins > > wrote: > > > > > Hello Martijn, > > > > > > Thanks for the redirect! > > > > > > > The process for contributing a connector is to create a Connector > FLIP > > > [1], which needs to be discussed and voted on in the Dev mailing list > [2] > > > > > > I don't mind doing this, but would like to get a read of the room > before > > > doing so. If people object to it, then it probably doesn't make sense. > > > > > > > One thing in particular is who can help with the maintenance of the > > > connector: will there be more volunteers who can help with bug fixes, > new > > > features etc. > > > > > > In this case, the product team is willing to handle feature requests > and > > > bug reports for the connector, as we currently do for our beam and > spark > > > connectors. I don't know if there is any mechanism for sending emails > > when > > > jira tickets with a specific tag are opened? But if there is I can > ensure > > > that any tickets get routed to a member of my team to look at. > > > > > > What would people's thoughts be about this? > > > > > > -Daniel > > > > > > On Mon, Feb 27, 2023 at 3:53 AM Martijn Visser < > martijnvis...@apache.org > > > > > > wrote: > > > > > >> Hi Daniel, > > >> > > >> Thanks for reaching out. Keep in mind that you weren't subscribed to > the > > >> Flink Dev mailing list, I've just pushed this through the mailing list > > >> moderation. 
> > >> > > >> The process for contributing a connector is to create a Connector FLIP > > >> [1], > > >> which needs to be discussed and voted on in the Dev mailing list [2]. > > One > > >> thing in particular is who can help with the maintenance of the > > connector: > > >> will there be more volunteers who can help with bug fixes, new > features > > >> etc. As we've seen with the current PubSub connector, that's already > > quite > > >> hard: it's currently lacking volunteers overall. I do recall a > proposal > > to > > >> contribute a PubSub Lite connector a while back, but that ultimately > was > > >> not followed through. > > >> > > >> Best regards, > > >> > > >> Martijn > > >> > > >> [1] > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template > > >> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws > > >> > > >> On Mon, Feb 27, 2023 at 9:44 AM Daniel Collins > > >> > > >> wrote: > > >> > > >> > Hello flink devs, > > >> > > > >> > My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite > > streaming > > >> > system, which is a lower cost streaming data system with semantics > > more > > >> > similar to OSS streaming solutions. I've authored a source and sink > > >> > connector for flink and load tested it at GiB/s scale- what would be > > the > > >> > process for contributing this to flink? > > >> > > > >> > I've opened a JIRA to do this > > >> > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, > if > > >> this > > >> > seems like a reasonable thing to do could someone assign it to me? > > >> > > > >> > The code for the connector currently lives here > > >> > https://github.com/googleapis/java-pubsublite-flink, I believe it > is > > >> > following the FLIP-27 guidelines, but please let me know if I'm > > >> > implementing the wrong interfaces. Which repo and in what folder > > should > > >> I > > >> > move this code into? > > >> > > > >> > -Daniel > > >> > > > >> > > > > > >
Re: Contributing a Google Cloud Pub/Sub Lite source and sink?
Hi Daniel, If you can provide your Jira username, those permissions can be granted. Best regards, Martijn On Tue, Feb 28, 2023 at 3:13 PM Daniel Collins wrote: > Hello, > > Absent any comments, I was attempting to create a connector FLIP as > described above, however it appears that I do not have permissions to edit > confluence to do so. Can someone please add access for me? > > -Daniel > > On Mon, Feb 27, 2023 at 10:28 AM Daniel Collins > wrote: > > > Hello Martijn, > > > > Thanks for the redirect! > > > > > The process for contributing a connector is to create a Connector FLIP > > [1], which needs to be discussed and voted on in the Dev mailing list [2] > > > > I don't mind doing this, but would like to get a read of the room before > > doing so. If people object to it, then it probably doesn't make sense. > > > > > One thing in particular is who can help with the maintenance of the > > connector: will there be more volunteers who can help with bug fixes, new > > features etc. > > > > In this case, the product team is willing to handle feature requests and > > bug reports for the connector, as we currently do for our beam and spark > > connectors. I don't know if there is any mechanism for sending emails > when > > jira tickets with a specific tag are opened? But if there is I can ensure > > that any tickets get routed to a member of my team to look at. > > > > What would people's thoughts be about this? > > > > -Daniel > > > > On Mon, Feb 27, 2023 at 3:53 AM Martijn Visser > > > wrote: > > > >> Hi Daniel, > >> > >> Thanks for reaching out. Keep in mind that you weren't subscribed to the > >> Flink Dev mailing list, I've just pushed this through the mailing list > >> moderation. > >> > >> The process for contributing a connector is to create a Connector FLIP > >> [1], > >> which needs to be discussed and voted on in the Dev mailing list [2]. 
> One > >> thing in particular is who can help with the maintenance of the > connector: > >> will there be more volunteers who can help with bug fixes, new features > >> etc. As we've seen with the current PubSub connector, that's already > quite > >> hard: it's currently lacking volunteers overall. I do recall a proposal > to > >> contribute a PubSub Lite connector a while back, but that ultimately was > >> not followed through. > >> > >> Best regards, > >> > >> Martijn > >> > >> [1] > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template > >> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws > >> > >> On Mon, Feb 27, 2023 at 9:44 AM Daniel Collins > >> > >> wrote: > >> > >> > Hello flink devs, > >> > > >> > My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite > streaming > >> > system, which is a lower cost streaming data system with semantics > more > >> > similar to OSS streaming solutions. I've authored a source and sink > >> > connector for flink and load tested it at GiB/s scale- what would be > the > >> > process for contributing this to flink? > >> > > >> > I've opened a JIRA to do this > >> > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, if > >> this > >> > seems like a reasonable thing to do could someone assign it to me? > >> > > >> > The code for the connector currently lives here > >> > https://github.com/googleapis/java-pubsublite-flink, I believe it is > >> > following the FLIP-27 guidelines, but please let me know if I'm > >> > implementing the wrong interfaces. Which repo and in what folder > should > >> I > >> > move this code into? > >> > > >> > -Daniel > >> > > >> > > >
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Ran Tao, Thanks for working on this FLIP. The FLIP is in pretty good shape already and I don't have much to add. Will we also support ILIKE in queries? Or is this a pure DDL expression? For consistency, we should support it in SELECT and the Table API as well. I hope this is not too much effort. I hope everybody is aware that ILIKE is not standard compliant but it seems to be used by a variety of vendors. > Because it may be modified during the discussion, we put it on Google Docs. Please see FLIP-297: Improve Auxiliary Sql Statements Docs This comment confused me. It would be nice to have the Wiki page as the single source of truth and abandon the Google doc. In the past we used Google Docs more, but nowadays I support using only the Wiki to avoid any confusion. Regards, Timo On 28.02.23 12:51, Ran Tao wrote: thanks Sergey, sounds good. You can add it in the FLIP ticket[1]. [1] https://issues.apache.org/jira/browse/FLINK-31256 Best Regards, Ran Tao https://github.com/chucheng92 On Tue, Feb 28, 2023 at 19:44, Sergey Nuyanzin wrote: > Currently I think we can load from the jar and check the services file to get the connector type. but is it necessary we may continue to discuss. Hi, Sergey, WDYT? Another idea is FactoryUtil#discoverFactories and checking whether a factory implements DynamicTableSourceFactory or DynamicTableSinkFactory; with versions it could be trickier... Moreover, it seems the version could sometimes be part of the name[1]. I think name and type could be enough, or please correct me if I'm wrong. > or can we open a single ticket under this FLIP? I have a relatively old jira issue[2] for showing connectors with a PoC PR. Could I propose to move this jira issue as a subtask under the FLIP one and revive it? [1] https://github.com/apache/flink/blob/161014149e803bfd1d3653badb230b2ed36ce3cb/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/Factory.java#L65-L69 [2] https://issues.apache.org/jira/browse/FLINK-25788 On Tue, Feb 28, 2023 at 11:56 AM Ran Tao wrote: Hi, Jark. 
thanks. About ILIKE: I have updated the FLIP with ILIKE support (including how the existing showTables & showColumns need to change). About show connectors @Sergey: currently I think we can load from the jar and check the services file to get the connector type, but whether it is necessary we may continue to discuss. Hi Sergey, WDYT? Or can we open a single ticket under this FLIP? Best Regards, Ran Tao On Tue, Feb 28, 2023 at 17:45, Jark Wu wrote: Besides, if we introduce the ILIKE, we should also add this feature for the previous SHOW with LIKE statements. They should be included in this FLIP. Best, Jark On Feb 28, 2023, at 17:40, Jark Wu wrote: Hi Ran, Could you add descriptions about the behavior of, and the differences between, LIKE and ILIKE? Besides, I don't see the SHOW CONNECTOR syntax, its description, or how it works in the FLIP. Is it intended to be included in this FLIP? Best, Jark On Feb 28, 2023, at 10:58, Ran Tao wrote: Hi, guys. Thanks for the advice. Allow me to make a small summary: 1. Support ILIKE 2. Use the catalog API to support show operations 3. A dedicated FLIP is needed to support INFORMATION_SCHEMA 4. Support SHOW CONNECTORS If there are no other questions, I will try to start a VOTE for this FLIP. WDYT? Best Regards, Ran Tao On Mon, Feb 27, 2023 at 21:12, Sergey Nuyanzin wrote: Hi Jark, thanks for your comment. > Considering they are orthogonal and information schema requires more complex design and discussion, it deserves a separate FLIP I'm ok with a separate FLIP for INFORMATION_SCHEMA. > Sergey, are you willing to contribute this FLIP? It seems I need to do more research for that. I would try to help/contribute here On Mon, Feb 27, 2023 at 3:46 AM Ran Tao wrote: Hi, Jing. Thanks. @about ILIKE: from my survey of some popular engines, I found that only Snowflake has this syntax in SHOW with filtering. Do we need to support it? If yes, then some existing show operations need to be addressed as well. @about ShowOperation with LIKE: it's a good idea. Yes, two parameters for the constructor can work. 
thanks for your advice. Best Regards, Ran Tao On Mon, Feb 27, 2023 at 06:29, Jing Ge wrote: Hi, @Aitozi This is exactly why LoD has been introduced: to avoid exposing internal structure (2nd and lower level API). @Jark IMHO, there is no conflict between LoD and "High power-to-weight ratio" with the given example: List.subList() returns the List interface itself; no internal or further interface has been exposed. After offering tEnv.getCatalog(), "all" methods in the Catalog interface have been provided by TableEnvironment (via getCatalog()). From the user's perspective and the maintenance perspective there is no/less difference between providing them directly via TableEnvironment or via getCatalog(). They are all exposed. Using getCatalog() will reduce the number of boring wrapper methods, but on the other hand not every method in Catalog needs to be exposed, so the number of wrapper methods would b
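For readers unfamiliar with ILIKE: it behaves like SQL LIKE ('%' matches any character sequence, '_' any single character) but compares case-insensitively. A minimal plain-Java sketch of those semantics — this is only an illustration, not Flink's parser or planner code:

```java
import java.util.regex.Pattern;

public class IlikeSketch {

    // Convert a SQL LIKE pattern to a regex; the caseInsensitive flag
    // switches between LIKE and ILIKE behavior.
    static boolean like(String input, String pattern, boolean caseInsensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            switch (c) {
                case '%': regex.append(".*"); break;  // any sequence of characters
                case '_': regex.append('.'); break;   // exactly one character
                default:  regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex.toString(), flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // LIKE is case-sensitive, ILIKE is not.
        System.out.println(like("my_table", "MY%", false)); // false (LIKE)
        System.out.println(like("my_table", "MY%", true));  // true  (ILIKE)
    }
}
```

This is why introducing ILIKE for one SHOW statement naturally raises the question of supporting it uniformly across all SHOW ... LIKE statements and in queries.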
Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager
Thanks for the FLIP, Weihua! I’ve read the FLIP, and it sounds good to me. We need to avoid proliferating alternative implementations wherever possible. I have just a couple of comments: 1. I share Matthias’s concern about ensuring the behavior is really the same. One suggestion I’ve used for this kind of thing is, as a smoke test, to update the DeclarativeSlotManager to just delegate to the FineGrainedSlotManager. If the full test suite still passes, you can be pretty sure the new default is really ok. It would not be a good idea to actually keep that in for the release, since it would remove the option to fall back in case of bugs. Either way, we need to make sure all test scenarios are present for the FGSM. 4. In addition to changing the default, would it make sense to log a deprecation warning on initialization if the DeclarativeSlotManager is used? Thanks again, John On Tue, Feb 28, 2023, at 07:20, Matthias Pohl wrote: > Hi Weihua, > Thanks for your proposal. From a conceptual point: AFAIU, the > DeclarativeSlotManager covers a subset (i.e. only evenly sized slots) of > what the FineGrainedSlotManager should be able to achieve (variable slot > size per task manager). Is this the right assumption/understanding? In this > sense, merging both implementations into a single one sounds good. A few > more general comments, though: > > 1. Did you do a proper test coverage analysis? That's not mentioned in the > current version of the FLIP. I'm bringing this up because we ran into the > same issue when fixing the flaws that popped up after introducing the > multi-component leader election (see FLIP-285 [1]). There is a risk that by > removing the legacy code we decrease test coverage because certain > test cases that were covered for the legacy classes might not be > necessarily covered in the new implementation, yet (see FLINK-30338 [2] > which covers this issue for the leader election case). 
Ideally, we don't > want to remove test cases accidentally because they were only implemented > for the DeclarativeSlotManager but missed for the FineGrainedSlotManager. > > 2. DeclarativeSlotManager and FineGrainedSlotManager feel quite big in > terms of lines of code. Without knowing whether it's actually a reasonable > thing to do: Instead of just adding more features to the > FineGrainedSlotManager, have you thought of cutting out functionality into > smaller sub-components along this refactoring? Such a step-by-step approach > might improve the overall codebase and might make reviewing the refactoring > easier. I did a first pass over the code and struggled to identify code > blocks that could be moved out of the SlotManager implementation(s). > Therefore, I might be wrong with this proposal. I haven't worked on this > codebase in that detail that it would allow me to come up with a judgement > call. I wanted to bring it up, anyway, because I'm curious whether that > could be an option. There's a comment created by Chesnay (CC'd) in the > JavaDoc of TaskExecutorManager [3] indicating something similar. I'm > wondering whether he can add some insights here. > > 3. For me personally, having a more detailed summary comparing the > subcomponents of both SlotManager implementations with where > their functionality matches and where they differ might help understand the > consequences of the changes proposed in FLIP-298. 
> > Best, > Matthias > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > [2] https://issues.apache.org/jira/browse/FLINK-30338 > [3] > https://github.com/apache/flink/blob/f611ea8cb5deddb42429df2c99f0c68d7382e9bd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/TaskExecutorManager.java#L66-L68 > > On Tue, Feb 28, 2023 at 6:14 AM Matt Wang wrote: > >> This is a good proposal for me, it will make the code of the SlotManager >> more clear. >> >> >> >> -- >> >> Best, >> Matt Wang >> >> >> Replied Message >> | From | David Morávek | >> | Date | 02/27/2023 22:45 | >> | To | | >> | Subject | Re: [DISCUSS] FLIP-298: Unifying the Implementation of >> SlotManager | >> Hi Weihua, I still need to dig into the details, but the overall sentiment >> of this change sounds reasonable. >> >> Best, >> D. >> >> On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen >> wrote: >> >> Thanks for driving this topic. I think this FLIP could help clean up the >> codebase to make it easier to maintain. +1 on it. >> >> Best, >> Zhanghao Chen >> >> From: Weihua Hu >> Sent: Monday, February 27, 2023 20:40 >> To: dev >> Subject: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager >> >> Hi everyone, >> >> I would like to begin a discussion on FLIP-298: Unifying the Implementation >> of SlotManager[1]. There are currently two types of SlotManager in Flink: >> DeclarativeSlotManager and FineGrainedSlotManager. FineGrainedSlotMana
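The delegation smoke test suggested earlier in this thread can be sketched as follows. The class names mirror the ones in the discussion, but the interface and its method are invented stand-ins, not Flink's real SlotManager API:

```java
public class DelegationSmokeTestSketch {

    interface SlotManager {
        int allocateSlots(int required);
    }

    static class FineGrainedSlotManager implements SlotManager {
        @Override
        public int allocateSlots(int required) {
            // Stand-in for the real allocation logic.
            return required;
        }
    }

    // Temporary shim for the smoke test: every legacy call is forwarded
    // to the new implementation, so the legacy test suite exercises the
    // new code path. Not meant to be shipped in a release.
    static class DeclarativeSlotManager implements SlotManager {
        private final SlotManager delegate = new FineGrainedSlotManager();

        @Override
        public int allocateSlots(int required) {
            return delegate.allocateSlots(required);
        }
    }

    public static void main(String[] args) {
        SlotManager legacy = new DeclarativeSlotManager();
        System.out.println(legacy.allocateSlots(4)); // served by the delegate
    }
}
```

If the full DeclarativeSlotManager test suite still passes against such a shim, that is strong evidence the behavioral coverage carries over to the FineGrainedSlotManager.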
Re: Contributing a Google Cloud Pub/Sub Lite source and sink?
Hello, Absent any comments, I was attempting to create a connector FLIP as described above, however it appears that I do not have permissions to edit confluence to do so. Can someone please add access for me? -Daniel On Mon, Feb 27, 2023 at 10:28 AM Daniel Collins wrote: > Hello Martijn, > > Thanks for the redirect! > > > The process for contributing a connector is to create a Connector FLIP > [1], which needs to be discussed and voted on in the Dev mailing list [2] > > I don't mind doing this, but would like to get a read of the room before > doing so. If people object to it, then it probably doesn't make sense. > > > One thing in particular is who can help with the maintenance of the > connector: will there be more volunteers who can help with bug fixes, new > features etc. > > In this case, the product team is willing to handle feature requests and > bug reports for the connector, as we currently do for our beam and spark > connectors. I don't know if there is any mechanism for sending emails when > jira tickets with a specific tag are opened? But if there is I can ensure > that any tickets get routed to a member of my team to look at. > > What would people's thoughts be about this? > > -Daniel > > On Mon, Feb 27, 2023 at 3:53 AM Martijn Visser > wrote: > >> Hi Daniel, >> >> Thanks for reaching out. Keep in mind that you weren't subscribed to the >> Flink Dev mailing list, I've just pushed this through the mailing list >> moderation. >> >> The process for contributing a connector is to create a Connector FLIP >> [1], >> which needs to be discussed and voted on in the Dev mailing list [2]. One >> thing in particular is who can help with the maintenance of the connector: >> will there be more volunteers who can help with bug fixes, new features >> etc. As we've seen with the current PubSub connector, that's already quite >> hard: it's currently lacking volunteers overall. 
I do recall a proposal to >> contribute a PubSub Lite connector a while back, but that ultimately was >> not followed through. >> >> Best regards, >> >> Martijn >> >> [1] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Connector+Template >> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws >> >> On Mon, Feb 27, 2023 at 9:44 AM Daniel Collins >> >> wrote: >> >> > Hello flink devs, >> > >> > My name is Daniel, I'm the tech lead for Google's Pub/Sub Lite streaming >> > system, which is a lower cost streaming data system with semantics more >> > similar to OSS streaming solutions. I've authored a source and sink >> > connector for flink and load tested it at GiB/s scale- what would be the >> > process for contributing this to flink? >> > >> > I've opened a JIRA to do this >> > https://issues.apache.org/jira/projects/FLINK/issues/FLINK-31229, if >> this >> > seems like a reasonable thing to do could someone assign it to me? >> > >> > The code for the connector currently lives here >> > https://github.com/googleapis/java-pubsublite-flink, I believe it is >> > following the FLIP-27 guidelines, but please let me know if I'm >> > implementing the wrong interfaces. Which repo and in what folder should >> I >> > move this code into? >> > >> > -Daniel >> > >> >
Re: [VOTE] Release 1.15.4, release candidate #1
Thanks Danny. +1 (non-binding) * Downloaded artifacts * Built Flink from sources * Verified SHA512 checksums GPG signatures * Compared checkout with provided sources * Verified pom file versions * Went over NOTICE file/pom files changes without finding anything suspicious * Deployed standalone session cluster and ran WordCount example in batch and streaming: Nothing suspicious in log files found On Tue, Feb 28, 2023 at 9:50 AM Teoh, Hong wrote: > Thanks Danny for driving this > > +1 (non-binding) > > * Hashes and Signatures look good > * All required files on dist.apache.org > * Source archive builds using maven > * Started packaged example WordCountSQLExample job > * Web PR looks good. > > Cheers, > Hong > > > > > On 24 Feb 2023, at 05:36, Weihua Hu wrote: > > > > Thanks Danny. > > > > +1(non-binding) > > > > Tested the following: > > - Download the artifacts and build image > > - Ran WordCount on Kubernetes(session mode and application mode) > > > > > > Best, > > Weihua > > > > > > On Fri, Feb 24, 2023 at 12:29 PM Yanfei Lei wrote: > > > >> Thanks Danny. > >> +1 (non-binding) > >> > >> - Downloaded artifacts & built Flink from sources > >> - Verified GPG signatures of bin and source. 
> >> > >> Danny Cranmer 于2023年2月24日周五 02:08写道: > >>> > >>> Hi everyone, > >>> Please review and vote on the release candidate #1 for the version > >> 1.15.4, > >>> as follows: > >>> [ ] +1, Approve the release > >>> [ ] -1, Do not approve the release (please provide specific comments) > >>> > >>> > >>> The complete staging area is available for your review, which includes: > >>> * JIRA release notes [1], > >>> * the official Apache source release and binary convenience releases to > >> be > >>> deployed to dist.apache.org [2], which are signed with the key with > >>> fingerprint 125FD8DB [3], > >>> * all artifacts to be deployed to the Maven Central Repository [4], > >>> * source code tag "release-1.15.4-rc1" [5], > >>> * website pull request listing the new release and adding announcement > >> blog > >>> post [6]. > >>> > >>> The vote will be open for at least 72 hours (excluding weekends > >> 2023-02-28 > >>> 19:00). It is adopted by majority approval, with at least 3 PMC > >> affirmative > >>> votes. > >>> > >>> Thanks, > >>> Danny > >>> > >>> [1] > >>> > >> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352526 > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc1/ > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS > >>> [4] > >>> > >> > https://repository.apache.org/content/repositories/orgapacheflink-1588/org/apache/flink/ > >>> [5] https://github.com/apache/flink/releases/tag/release-1.15.4-rc1 > >>> [6] https://github.com/apache/flink-web/pull/611 > >> > >> > >> > >> -- > >> Best, > >> Yanfei > >> > >
Re: [VOTE] FLIP-291: Externalized Declarative Resource Management
Thanks for the FLIP, David! I’m +1 (non-binding) -John On Tue, Feb 28, 2023, at 07:46, David Morávek wrote: > Hi Everyone, > > I want to start the vote on FLIP-291: Externalized Declarative Resource > Management [1]. The FLIP was discussed in this thread [2]. > > The goal of the FLIP is to enable external declaration of the resource > requirements of a running job. > > The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 CET) > unless > there is an objection or insufficient votes. > > [1] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > Best, > D.
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
Thanks for the answer, David! It sounds like there is a race condition, but it’s a known issue not specific to this FLIP, and the failure case isn’t too bad. I’m satisfied with that. Thanks, John On Thu, Feb 23, 2023, at 10:39, David Morávek wrote: > Hi Everyone, > > @John > > This is a problem that we've spent some time trying to crack; in the end, > we've decided to go against doing any upgrades to JobGraphStore from > JobMaster to avoid having multiple writers that are guarded by different > leader election lock (Dispatcher and JobMaster might live in a different > process). The contract we've decided to choose instead is leveraging the > idempotency of the endpoint and having the user of the API retry in case > we're unable to persist new requirements in the JobGraphStore [1]. We > eventually need to move JobGraphStore out of the dispatcher, but that's way > out of the scope of this FLIP. The solution is a deliberate trade-off. The > worst scenario is that the Dispatcher fails over in between retries, which > would simply rescale the job to meet the previous resource requirements > (more extended unavailability of underlying HA storage would have worse > consequences than this). Does that answer your question? > > @Matthias > > Good catch! I'm fixing it now, thanks! > > [1] > https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9#diff-a4b690fb2c4975d25b05eb4161617af0d704a85ff7b1cad19d3c817c12f1e29cR1151 > > Best, > D. > > On Tue, Feb 21, 2023 at 12:24 AM John Roesler wrote: > >> Thanks for the FLIP, David! >> >> I just had one small question. IIUC, the REST API PUT request will go >> through the new DispatcherGateway method to be handled. Then, after >> validation, the dispatcher would call the new JobMasterGateway method to >> actually update the job. >> >> Which component will write the updated JobGraph? 
I just wanted to make >> sure it's the JobMaster, because if it were the dispatcher, there could be a >> race condition with the async JobMaster method. >> >> Thanks! >> -John >> >> On Mon, Feb 20, 2023, at 07:34, Matthias Pohl wrote: >> > Thanks for your clarifications, David. I don't have any additional major >> > points to add. One thing about the FLIP: The RPC layer API for updating >> the >> > JRR returns a future with a JRR? I don't see value in returning a JRR >> here >> > since it's an idempotent operation? Wouldn't it be enough to return >> > CompletableFuture here? Or am I missing something? >> > >> > Matthias >> > >> > On Mon, Feb 20, 2023 at 1:48 PM Maximilian Michels >> wrote: >> > >> >> Thanks David! If we could get the pre-allocation working as part of >> >> the FLIP, that would be great. >> >> >> >> Concerning the downscale case, I agree this is a special case for the >> >> (single-job) application mode where we could re-allocate slots in a >> >> way that could leave entire task managers unoccupied which we would >> >> then be able to release. The goal essentially is to reduce slot >> >> fragmentation on scale down by packing the slots efficiently. The >> >> easiest way to add this optimization when running in application mode >> >> would be to drop as many task managers during the restart such that >> >> NUM_REQUIRED_SLOTS >= NUM_AVAILABLE_SLOTS stays true. We can look into >> >> this independently of the FLIP. >> >> >> >> Feel free to start the vote. >> >> >> >> -Max >> >> >> >> On Mon, Feb 20, 2023 at 9:10 AM David Morávek wrote: >> >> > >> >> > Hi everyone, >> >> > >> >> > Thanks for the feedback! I've updated the FLIP to use idempotent PUT >> API >> >> instead of PATCH and to properly handle lower bound settings, to support >> >> the "pre-allocation" of the resources. >> >> > >> >> > @Max >> >> > >> >> > > How hard would it be to address this issue in the FLIP? >> >> > >> >> > I've included this in the FLIP. 
It might not be too hard to implement >> >> this in the end. >> >> > >> >> > > B) drop as many superfluous task managers as needed >> >> > >> >> > I've intentionally left this part out for now because this ultimately >> >> needs to be the responsibility of the Resource Manager. After all, in >> the >> >> Session Cluster scenario, the Scheduler doesn't have the bigger picture >> of >> >> other tasks of other jobs running on those TMs. This will most likely >> be a >> >> topic for another FLIP. >> >> > >> >> > WDYT? If there are no other questions or concerns, I'd like to start >> the >> >> vote on Wednesday. >> >> > >> >> > Best, >> >> > D. >> >> > >> >> > On Wed, Feb 15, 2023 at 3:34 PM Maximilian Michels >> >> wrote: >> >> >> >> >> >> I missed that the FLIP states: >> >> >> >> >> >> > Currently, even though we’d expose the lower bound for clarity and >> >> API completeness, we won’t allow setting it to any other value than one >> >> until we have full support throughout the stack. >> >> >> >> >> >> How hard would it be to address this issue in the FLIP? >> >> >> >> >> >> There is not much value to offer setting a lower bound which won't be >> >>
[VOTE] FLIP-291: Externalized Declarative Resource Management
Hi Everyone, I want to start the vote on FLIP-291: Externalized Declarative Resource Management [1]. The FLIP was discussed in this thread [2]. The goal of the FLIP is to enable external declaration of the resource requirements of a running job. The vote will last for at least 72 hours (Friday, 3rd of March, 15:00 CET) unless there is an objection or insufficient votes. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management [2] https://lists.apache.org/thread/b8fnj127jsl5ljg6p4w3c4wvq30cnybh Best, D.
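For illustration, the external declaration of per-vertex resource requirements can be sketched as a small payload builder. This is a minimal sketch only: the actual REST endpoint and payload schema are defined by FLIP-291 itself, so the field names (`parallelism`, `lowerBound`, `upperBound`) and the `build_requirements_payload` helper are assumptions for illustration, not the real API.

```python
import json


def build_requirements_payload(per_vertex_bounds):
    """Build a JSON payload declaring per-vertex parallelism bounds.

    per_vertex_bounds maps a job-vertex id to a (lower, upper) tuple.
    The payload shape is an assumption for illustration; the real schema
    is defined by FLIP-291, not by this sketch.
    """
    return {
        vertex_id: {"parallelism": {"lowerBound": lo, "upperBound": hi}}
        for vertex_id, (lo, hi) in per_vertex_bounds.items()
    }


# The min/max example discussed later in the thread: two vertices.
payload = build_requirements_payload({"v1": (50, 70), "v2": (10, 20)})
print(json.dumps(payload, indent=2))

# The declared upper bounds add up to the 90 task slots mentioned there.
total_max_slots = sum(v["parallelism"]["upperBound"] for v in payload.values())
print(total_max_slots)  # 90
```

Because the endpoint is idempotent, a client would simply retry the same PUT with this payload if the Dispatcher fails to persist the requirements.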
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
I agree that it is useful to have a configurable lower bound. Thanks for looking into it as part of a follow up! No objections from my side to move forward with the vote. -Max On Tue, Feb 28, 2023 at 1:36 PM David Morávek wrote: > > > I suppose we could further remove the min because it would always be > safer to scale down if resources are not available than not to run at > all [1]. > > Apart from what @Roman has already mentioned, there are still cases where > we're certain that there is no point in running the jobs with resources > lower than X; e.g., because the state is too large to be processed with > parallelism of 1; this allows you not to waste resources if you're certain > that the job would go into the restart loop / won't be able to checkpoint > > I believe that for most use cases, simply keeping the lower bound at 1 will > be sufficient. > > > I saw that the minimum bound is currently not used in the code you posted > above [2]. Is that still planned? > > Yes. We already allow setting the lower bound via API, but it's not > considered by the scheduler. I'll address this limitation in a separate > issue. > > > Note that originally we had assumed min == max but I think that would be > a less safe scaling approach because we would get stuck waiting for > resources when they are not available, e.g. k8s resource limits reached. > > 100% agreed; The above-mentioned knobs should allow you to balance the > trade-off. > > > Does that make sense? > > Best, > D. > > > > On Tue, Feb 28, 2023 at 1:14 PM Roman Khachatryan wrote: > > > Hi, > > > > Thanks for the update, I think distinguishing the rescaling behaviour and > > the desired parallelism declaration is important. > > > > Having the ability to specify min parallelism might be useful in > > environments with multiple jobs: Scheduler will then have an option to stop > > the less suitable job. > > In other setups, where the job should not be stopped at all, the user can > > always set it to 0. 
> > > > Regards, > > Roman > > > > > > On Tue, Feb 28, 2023 at 12:58 PM Maximilian Michels > > wrote: > > > >> Hi David, > >> > >> Thanks for the update! We consider using the new declarative resource > >> API for autoscaling. Currently, we treat a scaling decision as a new > >> deployment which means surrendering all resources to Kubernetes and > >> subsequently reallocating them for the rescaled deployment. The > >> declarative resource management API is a great step forward because it > >> allows us to do faster and safer rescaling. Faster, because we can > >> continue to run while resources are pre-allocated which minimizes > >> downtime. Safer, because we can't get stuck when the desired resources > >> are not available. > >> > >> An example with two vertices and their respective parallelisms: > >> v1: 50 > >> v2: 10 > >> Let's assume slot sharing is disabled, so we need 60 task slots to run > >> the vertices. > >> > >> If the autoscaler was to decide to scale up v1 and v2, it could do so > >> in a safe way by using min/max configuration: > >> v1: [min: 50, max: 70] > >> v2: [min: 10, max: 20] > >> This would then need 90 task slots to run at max capacity. > >> > >> I suppose we could further remove the min because it would always be > >> safer to scale down if resources are not available than to not run at > >> all [1]. In fact, I saw that the minimum bound is currently not used > >> in the code you posted above [2]. Is that still planned? > >> > >> -Max > >> > >> PS: Note that originally we had assumed min == max but I think that > >> would be a less safe scaling approach because we would get stuck > >> waiting for resources when they are not available, e.g. k8s resource > >> limits reached. > >> > >> [1] However, there might be costs involved with executing the > >> rescaling, e.g. for using external storage like s3, especially without > >> local recovery. 
> >> [2] > >> https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9 > >> > >> On Tue, Feb 28, 2023 at 9:33 AM David Morávek wrote: > >> > > >> > Hi Everyone, > >> > > >> > We had some more talks about the pre-allocation of resources with @Max, > >> and > >> > here is the final state that we've converged to for now: > >> > > >> > The vital thing to note about the new API is that it's declarative, > >> meaning > >> > we're declaring the desired state to which we want our job to converge; > >> If, > >> > after the requirements update job no longer holds the desired resources > >> > (fewer resources than the lower bound), it will be canceled and > >> transition > >> > back into the waiting for resources state. > >> > > >> > In some use cases, you might always want to rescale to the upper bound > >> > (this goes along the lines of "preallocating resources" and minimizing > >> the > >> > number of rescales, which is especially useful with the large state). > >> This > >> > can be controlled by two knobs that already exist: > >> > > >> > 1) "jobmanager.adaptive-scheduler.
Re: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager
Hi Weihua, Thanks for your proposal. From a conceptual point: AFAIU, the DeclarativeSlotManager covers a subset (i.e. only evenly sized slots) of what the FineGrainedSlotManager should be able to achieve (variable slot size per task manager). Is this the right assumption/understanding? In this sense, merging both implementations into a single one sounds good. A few more general comments, though: 1. Did you do a proper test coverage analysis? That's not mentioned in the current version of the FLIP. I'm bringing this up because we ran into the same issue when fixing the flaws that popped up after introducing the multi-component leader election (see FLIP-285 [1]). There is a risk that by removing the legacy code we decrease test coverage because certain test cases that were covered for the legacy classes might not be necessarily covered in the new implementation, yet (see FLINK-30338 [2] which covers this issue for the leader election case). Ideally, we don't want to remove test cases accidentally because they were only implemented for the DeclarativeSlotManager but missed for the FineGrainedSlotManager. 2. DeclarativeSlotManager and FineGrainedSlotManager feel quite big in terms of lines of code. Without knowing whether it's actually a reasonable thing to do: Instead of just adding more features to the FineGrainedSlotManager, have you thought of cutting out functionality into smaller sub-components along this refactoring? Such a step-by-step approach might improve the overall codebase and might make reviewing the refactoring easier. I did a first pass over the code and struggled to identify code blocks that could be moved out of the SlotManager implementation(s). Therefore, I might be wrong with this proposal. I haven't worked on this codebase in that detail that it would allow me to come up with a judgement call. I wanted to bring it up, anyway, because I'm curious whether that could be an option. 
There's a comment created by Chesnay (CC'd) in the JavaDoc of TaskExecutorManager [3] indicating something similar. I'm wondering whether he can add some insights here. 3. For me personally, having a more detailed summary comparing the subcomponents of both SlotManager implementations with where their functionality matches and where they differ might help understand the consequences of the changes proposed in FLIP-298. Best, Matthias [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box [2] https://issues.apache.org/jira/browse/FLINK-30338 [3] https://github.com/apache/flink/blob/f611ea8cb5deddb42429df2c99f0c68d7382e9bd/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/TaskExecutorManager.java#L66-L68 On Tue, Feb 28, 2023 at 6:14 AM Matt Wang wrote: > This is a good proposal for me, it will make the code of the SlotManager > more clear. > > > > -- > > Best, > Matt Wang > > > Replied Message > | From | David Morávek | > | Date | 02/27/2023 22:45 | > | To | | > | Subject | Re: [DISCUSS] FLIP-298: Unifying the Implementation of > SlotManager | > Hi Weihua, I still need to dig into the details, but the overall sentiment > of this change sounds reasonable. > > Best, > D. > > On Mon, Feb 27, 2023 at 2:26 PM Zhanghao Chen > wrote: > > Thanks for driving this topic. I think this FLIP could help clean up the > codebase to make it easier to maintain. +1 on it. > > Best, > Zhanghao Chen > > From: Weihua Hu > Sent: Monday, February 27, 2023 20:40 > To: dev > Subject: [DISCUSS] FLIP-298: Unifying the Implementation of SlotManager > > Hi everyone, > > I would like to begin a discussion on FLIP-298: Unifying the Implementation > of SlotManager[1]. There are currently two types of SlotManager in Flink: > DeclarativeSlotManager and FineGrainedSlotManager. 
FineGrainedSlotManager > should behave as DeclarativeSlotManager if the user does not configure the > slot request profile. > > Therefore, this FLIP aims to unify the implementation of SlotManager in > order to reduce maintenance costs. > > Looking forward to hearing from you. > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-298%3A+Unifying+the+Implementation+of+SlotManager > > Best, > Weihua > >
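As a rough illustration of Matthias's point that the DeclarativeSlotManager's behaviour (evenly sized slots) is a subset of what the FineGrainedSlotManager supports (variable slot sizes), here is a toy slot-matching sketch. This is not Flink code: `ResourceProfile` and the first-fit matcher are simplified stand-ins invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """Simplified stand-in for a slot's resource profile (cpu, memory in MB)."""
    cpu: float
    memory_mb: int

    def fits_into(self, other):
        return self.cpu <= other.cpu and self.memory_mb <= other.memory_mb


def allocate(requested, free_slots):
    """Greedy first-fit: match each requested profile against a free slot."""
    free = list(free_slots)
    assigned = []
    for req in requested:
        for slot in free:
            if req.fits_into(slot):
                free.remove(slot)
                assigned.append((req, slot))
                break
    return assigned


# Fine-grained: slots of different sizes can coexist on one task manager.
varied = [ResourceProfile(1.0, 1024), ResourceProfile(2.0, 4096)]
# The declarative behaviour is the special case where every slot is identical.
even = [ResourceProfile(1.0, 1024)] * 2

print(len(allocate([ResourceProfile(1.0, 1024)] * 2, even)))  # 2
print(len(allocate([ResourceProfile(2.0, 4096)], varied)))    # 1
```

The same matcher handles both cases, which is the intuition behind unifying the two implementations into one.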
[jira] [Created] (FLINK-31261) Make AdaptiveScheduler aware of the (local) state size
Roman Khachatryan created FLINK-31261: - Summary: Make AdaptiveScheduler aware of the (local) state size Key: FLINK-31261 URL: https://issues.apache.org/jira/browse/FLINK-31261 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.18.0 Reporter: Roman Khachatryan Assignee: Roman Khachatryan Fix For: 1.18.0 FLINK-21450 makes the Adaptive Scheduler aware of Local Recovery. Each slot-group pair is assigned a score based on the keyGroupRange size. That score isn't always optimal - it could be improved by computing the score based on the actual state size on disk. -- This message was sent by Atlassian Jira (v8.20.10#820010)
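The difference between the two scoring approaches can be sketched as follows. This is a hypothetical model, not the actual AdaptiveScheduler code, and the byte counts are made up for illustration.

```python
def score_by_key_groups(slot_key_groups):
    """Current heuristic (per FLINK-21450): score a slot/group pair by how
    many key groups of local state it previously held."""
    return len(slot_key_groups)


def score_by_state_size(key_group_state_bytes, slot_key_groups):
    """Proposed refinement (FLINK-31261): score by the actual on-disk state
    size of those key groups, so slots holding heavy state are preferred
    for local recovery."""
    return sum(key_group_state_bytes.get(kg, 0) for kg in slot_key_groups)


# Illustrative on-disk sizes: two slots previously held equally many key
# groups, but the groups differ wildly in state size.
sizes = {0: 10, 1: 10, 2: 5_000_000, 3: 8_000_000}  # bytes per key group
slot_a, slot_b = [0, 1], [2, 3]

print(score_by_key_groups(slot_a), score_by_key_groups(slot_b))  # 2 2 (a tie)
print(score_by_state_size(sizes, slot_a), score_by_state_size(sizes, slot_b))
# 20 13000000 - under the size-based score, slot_b clearly wins.
```

Under the range-size heuristic the two slots tie even though reassigning slot_b away would force re-downloading far more state, which is exactly the gap the ticket describes.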
[jira] [Created] (FLINK-31260) PushLocalHashAggIntoScanRule should also work with union RelNode
Aitozi created FLINK-31260: -- Summary: PushLocalHashAggIntoScanRule should also work with union RelNode Key: FLINK-31260 URL: https://issues.apache.org/jira/browse/FLINK-31260 Project: Flink Issue Type: Improvement Reporter: Aitozi As discussed in [comments|https://github.com/apache/flink/pull/22001#discussion_r1119652784] Currently, {{PushLocalHashAggIntoScanRule}} match for the Exchange -> LocalHashAggregate -> Scan. As a result, the following pattern can not be optimized {code:java} +- Union(all=[true], union=[type, sum$0]) :- Union(all=[true], union=[type, sum$0]) : :- LocalHashAggregate(groupBy=[type], select=[type, Partial_SUM(price) AS sum$0]) : : +- TableSourceScan(table=[[default_catalog, default_database, table1, project=[type, price], metadata=[]]], fields=[type, price]) : +- LocalHashAggregate(groupBy=[type], select=[type, Partial_SUM(price) AS sum$0]) : +- TableSourceScan(table=[[default_catalog, default_database, table2, project=[type, price], metadata=[]]], fields=[type, price]) +- LocalHashAggregate(groupBy=[type], select=[type, Partial_SUM(price) AS sum$0]) +- TableSourceScan(table=[[default_catalog, default_database, table3, project=[type, price], metadata=[]]], fields=[type, price]) {code} We should extend the rule to support this pattern. -- This message was sent by Atlassian Jira (v8.20.10#820010)
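The gap in the rule's match pattern can be sketched with a toy plan-tree matcher. This is purely illustrative Python, not the Calcite/Flink rule; the `Node` class and helper names are invented for this sketch.

```python
# Toy model of the plan shown in the ticket: the original rule only matches a
# LocalHashAggregate sitting directly on a scan, so aggregates nested under
# Union nodes are never visited. Extending the traversal through Union fixes it.
class Node:
    def __init__(self, kind, *children):
        self.kind, self.children = kind, list(children)


def matches_local_agg_into_scan(node):
    """Original pattern: LocalHashAggregate directly on top of a TableSourceScan."""
    return (node.kind == "LocalHashAggregate"
            and len(node.children) == 1
            and node.children[0].kind == "TableSourceScan")


def find_pushable_aggs(node):
    """Extended pattern: also descend through Union nodes to reach the
    agg-over-scan pairs underneath."""
    if matches_local_agg_into_scan(node):
        return [node]
    if node.kind == "Union":
        return [m for child in node.children for m in find_pushable_aggs(child)]
    return []


scan = lambda table: Node("TableSourceScan")
agg = lambda table: Node("LocalHashAggregate", scan(table))

# Mirrors the plan in the ticket: Union(Union(agg, agg), agg).
plan = Node("Union", Node("Union", agg("table1"), agg("table2")), agg("table3"))

print(len(find_pushable_aggs(plan)))  # 3 pushable aggregates under the unions
```

The real change would extend `PushLocalHashAggIntoScanRule`'s operand tree analogously, so each LocalHashAggregate/Scan pair below the unions gets rewritten.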
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
> I suppose we could further remove the min because it would always be safer to scale down if resources are not available than not to run at all [1]. Apart from what @Roman has already mentioned, there are still cases where we're certain that there is no point in running the jobs with resources lower than X; e.g., because the state is too large to be processed with parallelism of 1; this allows you not to waste resources if you're certain that the job would go into the restart loop / won't be able to checkpoint I believe that for most use cases, simply keeping the lower bound at 1 will be sufficient. > I saw that the minimum bound is currently not used in the code you posted above [2]. Is that still planned? Yes. We already allow setting the lower bound via API, but it's not considered by the scheduler. I'll address this limitation in a separate issue. > Note that originally we had assumed min == max but I think that would be a less safe scaling approach because we would get stuck waiting for resources when they are not available, e.g. k8s resource limits reached. 100% agreed; The above-mentioned knobs should allow you to balance the trade-off. Does that make sense? Best, D. On Tue, Feb 28, 2023 at 1:14 PM Roman Khachatryan wrote: > Hi, > > Thanks for the update, I think distinguishing the rescaling behaviour and > the desired parallelism declaration is important. > > Having the ability to specify min parallelism might be useful in > environments with multiple jobs: Scheduler will then have an option to stop > the less suitable job. > In other setups, where the job should not be stopped at all, the user can > always set it to 0. > > Regards, > Roman > > > On Tue, Feb 28, 2023 at 12:58 PM Maximilian Michels > wrote: > >> Hi David, >> >> Thanks for the update! We consider using the new declarative resource >> API for autoscaling. 
Currently, we treat a scaling decision as a new >> deployment which means surrendering all resources to Kubernetes and >> subsequently reallocating them for the rescaled deployment. The >> declarative resource management API is a great step forward because it >> allows us to do faster and safer rescaling. Faster, because we can >> continue to run while resources are pre-allocated which minimizes >> downtime. Safer, because we can't get stuck when the desired resources >> are not available. >> >> An example with two vertices and their respective parallelisms: >> v1: 50 >> v2: 10 >> Let's assume slot sharing is disabled, so we need 60 task slots to run >> the vertices. >> >> If the autoscaler was to decide to scale up v1 and v2, it could do so >> in a safe way by using min/max configuration: >> v1: [min: 50, max: 70] >> v2: [min: 10, max: 20] >> This would then need 90 task slots to run at max capacity. >> >> I suppose we could further remove the min because it would always be >> safer to scale down if resources are not available than to not run at >> all [1]. In fact, I saw that the minimum bound is currently not used >> in the code you posted above [2]. Is that still planned? >> >> -Max >> >> PS: Note that originally we had assumed min == max but I think that >> would be a less safe scaling approach because we would get stuck >> waiting for resources when they are not available, e.g. k8s resource >> limits reached. >> >> [1] However, there might be costs involved with executing the >> rescaling, e.g. for using external storage like s3, especially without >> local recovery. 
>> [2] >> https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9 >> >> On Tue, Feb 28, 2023 at 9:33 AM David Morávek wrote: >> > >> > Hi Everyone, >> > >> > We had some more talks about the pre-allocation of resources with @Max, >> and >> > here is the final state that we've converged to for now: >> > >> > The vital thing to note about the new API is that it's declarative, >> meaning >> > we're declaring the desired state to which we want our job to converge; >> If, >> > after the requirements update job no longer holds the desired resources >> > (fewer resources than the lower bound), it will be canceled and >> transition >> > back into the waiting for resources state. >> > >> > In some use cases, you might always want to rescale to the upper bound >> > (this goes along the lines of "preallocating resources" and minimizing >> the >> > number of rescales, which is especially useful with the large state). >> This >> > can be controlled by two knobs that already exist: >> > >> > 1) "jobmanager.adaptive-scheduler.min-parallelism-increase" - this >> affects >> > a minimal parallelism increase step of a running job; we'll slightly >> change >> > the semantics, and we'll trigger rescaling either once this condition is >> > met or when you hit the ceiling; setting this to the high number will >> > ensure that you always rescale to the upper bound >> > >> > 2) "jobmanager.adaptive-scheduler.resource-stabilization-timeout" - for >> new >> > and already restarting jobs, we'll a
[jira] [Created] (FLINK-31259) Gateway supports initialization of catalog at startup
Shammon created FLINK-31259: --- Summary: Gateway supports initialization of catalog at startup Key: FLINK-31259 URL: https://issues.apache.org/jira/browse/FLINK-31259 Project: Flink Issue Type: Improvement Components: Table SQL / Gateway Affects Versions: 1.18.0 Reporter: Shammon Support initializing catalogs in the gateway when it starts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
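A hypothetical sketch of what such startup catalog initialization could look like. None of these keys exist yet — adding such support is exactly what this ticket requests; the shape is loosely borrowed from the legacy SQL Client YAML, and all names here are illustrative only.

```yaml
# Hypothetical gateway startup configuration (keys do not exist today).
catalogs:
  - name: hive_prod
    type: hive
    hive-conf-dir: /etc/hive/conf
  - name: my_catalog
    type: generic_in_memory
```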
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
Hi, Thanks for the update, I think distinguishing the rescaling behaviour and the desired parallelism declaration is important. Having the ability to specify min parallelism might be useful in environments with multiple jobs: Scheduler will then have an option to stop the less suitable job. In other setups, where the job should not be stopped at all, the user can always set it to 0. Regards, Roman On Tue, Feb 28, 2023 at 12:58 PM Maximilian Michels wrote: > Hi David, > > Thanks for the update! We consider using the new declarative resource > API for autoscaling. Currently, we treat a scaling decision as a new > deployment which means surrendering all resources to Kubernetes and > subsequently reallocating them for the rescaled deployment. The > declarative resource management API is a great step forward because it > allows us to do faster and safer rescaling. Faster, because we can > continue to run while resources are pre-allocated which minimizes > downtime. Safer, because we can't get stuck when the desired resources > are not available. > > An example with two vertices and their respective parallelisms: > v1: 50 > v2: 10 > Let's assume slot sharing is disabled, so we need 60 task slots to run > the vertices. > > If the autoscaler was to decide to scale up v1 and v2, it could do so > in a safe way by using min/max configuration: > v1: [min: 50, max: 70] > v2: [min: 10, max: 20] > This would then need 90 task slots to run at max capacity. > > I suppose we could further remove the min because it would always be > safer to scale down if resources are not available than to not run at > all [1]. In fact, I saw that the minimum bound is currently not used > in the code you posted above [2]. Is that still planned? > > -Max > > PS: Note that originally we had assumed min == max but I think that > would be a less safe scaling approach because we would get stuck > waiting for resources when they are not available, e.g. k8s resource > limits reached. 
> > [1] However, there might be costs involved with executing the > rescaling, e.g. for using external storage like s3, especially without > local recovery. > [2] > https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9 > > On Tue, Feb 28, 2023 at 9:33 AM David Morávek wrote: > > > > Hi Everyone, > > > > We had some more talks about the pre-allocation of resources with @Max, > and > > here is the final state that we've converged to for now: > > > > The vital thing to note about the new API is that it's declarative, > meaning > > we're declaring the desired state to which we want our job to converge; > If, > > after the requirements update job no longer holds the desired resources > > (fewer resources than the lower bound), it will be canceled and > transition > > back into the waiting for resources state. > > > > In some use cases, you might always want to rescale to the upper bound > > (this goes along the lines of "preallocating resources" and minimizing > the > > number of rescales, which is especially useful with the large state). > This > > can be controlled by two knobs that already exist: > > > > 1) "jobmanager.adaptive-scheduler.min-parallelism-increase" - this > affects > > a minimal parallelism increase step of a running job; we'll slightly > change > > the semantics, and we'll trigger rescaling either once this condition is > > met or when you hit the ceiling; setting this to the high number will > > ensure that you always rescale to the upper bound > > > > 2) "jobmanager.adaptive-scheduler.resource-stabilization-timeout" - for > new > > and already restarting jobs, we'll always respect this timeout, which > > allows you to wait for more resources even though you already have more > > resources than defined in the lower bound; again, in the case we reach > the > > ceiling (the upper bound), we'll transition into the executing state. 
> > > > > > We're still planning to dig deeper in this direction with other efforts, > > but this is already good enough and should allow us to move the FLIP > > forward. > > > > WDYT? Unless there are any objectives against the above, I'd like to > > proceed to a vote. > > > > Best, > > D. > > > > On Thu, Feb 23, 2023 at 5:39 PM David Morávek wrote: > > > > > Hi Everyone, > > > > > > @John > > > > > > This is a problem that we've spent some time trying to crack; in the > end, > > > we've decided to go against doing any upgrades to JobGraphStore from > > > JobMaster to avoid having multiple writers that are guarded by > different > > > leader election lock (Dispatcher and JobMaster might live in a > different > > > process). The contract we've decided to choose instead is leveraging > the > > > idempotency of the endpoint and having the user of the API retry in > case > > > we're unable to persist new requirements in the JobGraphStore [1]. We > > > eventually need to move JobGraphStore out of the dispatcher, but > that's way > > > out of the scope of this FLIP. The solution is a deliberate
[jira] [Created] (FLINK-31258) can not get kerberos keytab in flink operator
Jun Zhang created FLINK-31258: - Summary: can not get kerberos keytab in flink operator Key: FLINK-31258 URL: https://issues.apache.org/jira/browse/FLINK-31258 Project: Flink Issue Type: Bug Components: Kubernetes Operator Reporter: Jun Zhang Environment: Flink Kubernetes Operator 1.4, Flink 1.14.6. The configuration: {code:java} flinkConfiguration: security.kerberos.login.keytab=/path/your/user.keytab security.kerberos.login.principal=y...@hadoop.com {code} With this configuration I get the following exception: {code:java} Status: Cluster Info: Error: {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could not create Kubernetes cluster \"basic-example\"."},{"type":"org.apache.flink.configuration.IllegalConfigurationException","message":"Kerberos login configuration is invalid: keytab [/path/your/user.keytab] doesn't exist!"}]} {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
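The error says the keytab file does not exist at the configured path inside the container. A common remedy is to mount the keytab from a Kubernetes Secret via the deployment's pod template so the configured path actually exists in the pods. A hedged sketch — the secret, volume, and mount names here are illustrative, not part of the ticket:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
spec:
  flinkConfiguration:
    security.kerberos.login.keytab: /opt/keytabs/user.keytab
    security.kerberos.login.principal: y...@hadoop.com
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: keytab-volume     # illustrative name
              mountPath: /opt/keytabs
      volumes:
        - name: keytab-volume
          secret:
            secretName: kerberos-keytab  # created beforehand from the keytab file
```

Whether the operator should additionally ship the keytab automatically (as the standalone Kubernetes integration does) is the open question behind this ticket.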
[jira] [Created] (FLINK-31257) Fix errors in “CSV Formats" page
ZhengYi Weng created FLINK-31257: Summary: Fix errors in "CSV Formats" page Key: FLINK-31257 URL: https://issues.apache.org/jira/browse/FLINK-31257 Project: Flink Issue Type: Improvement Components: chinese-translation Reporter: ZhengYi Weng Attachments: csv-1.png As shown in the attached picture, I found some errors in the Format Options section. There are also some punctuation errors that can be corrected together. The page URL is https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/formats/csv/ The markdown file is located in docs/content.zh/docs/connectors/table/formats/csv.md -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
Hi David, Thanks for the update! We consider using the new declarative resource API for autoscaling. Currently, we treat a scaling decision as a new deployment which means surrendering all resources to Kubernetes and subsequently reallocating them for the rescaled deployment. The declarative resource management API is a great step forward because it allows us to do faster and safer rescaling. Faster, because we can continue to run while resources are pre-allocated which minimizes downtime. Safer, because we can't get stuck when the desired resources are not available. An example with two vertices and their respective parallelisms: v1: 50 v2: 10 Let's assume slot sharing is disabled, so we need 60 task slots to run the vertices. If the autoscaler was to decide to scale up v1 and v2, it could do so in a safe way by using min/max configuration: v1: [min: 50, max: 70] v2: [min: 10, max: 20] This would then need 90 task slots to run at max capacity. I suppose we could further remove the min because it would always be safer to scale down if resources are not available than to not run at all [1]. In fact, I saw that the minimum bound is currently not used in the code you posted above [2]. Is that still planned? -Max PS: Note that originally we had assumed min == max but I think that would be a less safe scaling approach because we would get stuck waiting for resources when they are not available, e.g. k8s resource limits reached. [1] However, there might be costs involved with executing the rescaling, e.g. for using external storage like s3, especially without local recovery. 
[2] https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9 On Tue, Feb 28, 2023 at 9:33 AM David Morávek wrote: > > Hi Everyone, > > We had some more talks about the pre-allocation of resources with @Max, and > here is the final state that we've converged to for now: > > The vital thing to note about the new API is that it's declarative, meaning > we're declaring the desired state to which we want our job to converge; If, > after the requirements update job no longer holds the desired resources > (fewer resources than the lower bound), it will be canceled and transition > back into the waiting for resources state. > > In some use cases, you might always want to rescale to the upper bound > (this goes along the lines of "preallocating resources" and minimizing the > number of rescales, which is especially useful with the large state). This > can be controlled by two knobs that already exist: > > 1) "jobmanager.adaptive-scheduler.min-parallelism-increase" - this affects > a minimal parallelism increase step of a running job; we'll slightly change > the semantics, and we'll trigger rescaling either once this condition is > met or when you hit the ceiling; setting this to the high number will > ensure that you always rescale to the upper bound > > 2) "jobmanager.adaptive-scheduler.resource-stabilization-timeout" - for new > and already restarting jobs, we'll always respect this timeout, which > allows you to wait for more resources even though you already have more > resources than defined in the lower bound; again, in the case we reach the > ceiling (the upper bound), we'll transition into the executing state. > > > We're still planning to dig deeper in this direction with other efforts, > but this is already good enough and should allow us to move the FLIP > forward. > > WDYT? Unless there are any objectives against the above, I'd like to > proceed to a vote. > > Best, > D. 
> > On Thu, Feb 23, 2023 at 5:39 PM David Morávek wrote: > > > Hi Everyone, > > > > @John > > > > This is a problem that we've spent some time trying to crack; in the end, > > we've decided to go against doing any upgrades to JobGraphStore from > > JobMaster to avoid having multiple writers that are guarded by different > > leader election lock (Dispatcher and JobMaster might live in a different > > process). The contract we've decided to choose instead is leveraging the > > idempotency of the endpoint and having the user of the API retry in case > > we're unable to persist new requirements in the JobGraphStore [1]. We > > eventually need to move JobGraphStore out of the dispatcher, but that's way > > out of the scope of this FLIP. The solution is a deliberate trade-off. The > > worst scenario is that the Dispatcher fails over in between retries, which > > would simply rescale the job to meet the previous resource requirements > > (more extended unavailability of underlying HA storage would have worse > > consequences than this). Does that answer your question? > > > > @Matthias > > > > Good catch! I'm fixing it now, thanks! > > > > [1] > > https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9#diff-a4b690fb2c4975d25b05eb4161617af0d704a85ff7b1cad19d3c817c12f1e29cR1151 > > > > Best, > > D. > > > > On Tue, Feb 21, 2023 at 12:24 AM John Roesler wrote: > > > >> Thanks for the FLIP, David! > >> > >> I just had one
Re: [VOTE] Flink minor version support policy for old releases
+1 (non-binding) Thanks for driving this Danny. On Tue, Feb 28, 2023 at 9:41 AM Samrat Deb wrote: > +1 (non binding) > > Thanks for driving it > > Bests, > Samrat > > On Tue, 28 Feb 2023 at 1:36 PM, Junrui Lee wrote: > > > Thanks Danny for driving it. > > > > +1 (non-binding) > > > > Best regards, > > Junrui > > > > On Tue, Feb 28, 2023 at 14:04, yuxia wrote: > > > > > Thanks Danny for driving it. > > > > > > +1 (non-binding) > > > > > > Best regards, > > > Yuxia > > > > > > - Original Message - > > > From: "Weihua Hu" > > > To: "dev" > > > Sent: Tuesday, February 28, 2023, 12:48:09 PM > > > Subject: Re: [VOTE] Flink minor version support policy for old releases > > > > > > Thanks, Danny. > > > > > > +1 (non-binding) > > > > > > Best, > > > Weihua > > > > > > > > > On Tue, Feb 28, 2023 at 12:38 PM weijie guo > > > > wrote: > > > > > > > Thanks Danny for bringing this. > > > > > > > > +1 (non-binding) > > > > > > > > Best regards, > > > > > > > > Weijie > > > > > > > > > > > > On Mon, Feb 27, 2023 at 20:23, Jing Ge wrote: > > > > > +1 (non-binding) > > > > > > > > > > BTW, should we follow the content style [1] to describe the new > rule > > > > using > > > > > 1.2.x, 1.1.y, 1.1.z? > > > > > > > > > > [1] > > https://flink.apache.org/downloads/#update-policy-for-old-releases > > > > > > > > > > Best regards, > > > > > Jing > > > > > > > > > > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl > > > > > wrote: > > > > > > > > > > > Thanks, Danny. Sounds good to me. > > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer < > > > > dannycran...@apache.org> > > > > > > wrote: > > > > > > > > > > > > > I am starting a vote to update the "Update Policy for old > > releases" > > > > [1] > > > > > > to > > > > > > > include additional bugfix support for end of life versions. 
> > > > > > > > > > > > > > As per the discussion thread [2], the change we are voting on > is: > > > > > > > - Support policy: updated to include: "Upon release of a new > > Flink > > > > > minor > > > > > > > version, the community will perform one final bugfix release > for > > > > > resolved > > > > > > > critical/blocker issues in the Flink minor version losing > > support." > > > > > > > - Release process: add a step to start the discussion thread > for > > > the > > > > > > final > > > > > > > patch version, if there are resolved critical/blocking issues > to > > > > flush. > > > > > > > > > > > > > > Voting schema: since our bylaws [3] do not cover this > particular > > > > > > scenario, > > > > > > > and releases require PMC involvement, we will use a consensus > > vote > > > > with > > > > > > PMC > > > > > > > binding votes. > > > > > > > > > > > > > > Thanks, > > > > > > > Danny > > > > > > > > > > > > > > [1] > > > > > > > > > https://flink.apache.org/downloads.html#update-policy-for-old-releases > > > > > > > [2] > > > https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv > > > > > > > [3] > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Best regards, Sergey
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
thanks Sergey, sounds good. You can add in FLIP ticket[1]. [1] https://issues.apache.org/jira/browse/FLINK-31256 Best Regards, Ran Tao https://github.com/chucheng92 Sergey Nuyanzin 于2023年2月28日周二 19:44写道: > >>Currently I think we can load from the jar and check the services file to > >> get the connector type. but is it necessary we may continue to discuss. > >>Hi, Sergey, WDYT? > > Another idea is FactoryUtil#discoverFactories and > check if it implements DynamicTableSourceFactory or DynamicTableSinkFactory > with versions it could be trickier... > Moreover it seems the version could be a part of the name sometimes[1]. > I think name and type could be enough or please correct me if I'm wrong > > >>or can we open a single ticket under this FLIP? > I have a relatively old jira issue[2] for showing connectors with a poc pr. > Could I propose to move this jira issue as a subtask under the FLIP one and > revive it? > > [1] > > https://github.com/apache/flink/blob/161014149e803bfd1d3653badb230b2ed36ce3cb/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/Factory.java#L65-L69 > [2] https://issues.apache.org/jira/browse/FLINK-25788 > > On Tue, Feb 28, 2023 at 11:56 AM Ran Tao wrote: > > > Hi, Jark. thanks. > > > About ILIKE > > I have updated the FLIP for ILIKE support (Including existing showTables > & > > showColumns how to change). > > > > > About show connectors @Sergey, > > Currently I think we can load from the jar and check the services file to > > get the connector type. but is it necessary we may continue to discuss. > > Hi, Sergey, WDYT?or can we open a single ticket under this FLIP? > > > > > > Best Regards, > > Ran Tao > > > > > > Jark Wu 于2023年2月28日周二 17:45写道: > > > > > Besides, if we introduce the ILIKE, we should also add this feature for > > > the previous SHOW with LIKE statements. They should be included in this > > > FLIP. 
> > > > > > Best, > > > Jark > > > > > > > 2023年2月28日 17:40,Jark Wu 写道: > > > > > > > > Hi Ran, > > > > > > > > Could you add descriptions about what’s the behavior and differences > > > between the LIKE and ILIKE? > > > > > > > > Besides, I don’t see the SHOW CONNECTOR syntax and description and > how > > > it works in the FLIP. Is it intended to be included in this FLIP? > > > > > > > > Best, > > > > Jark > > > > > > > > > > > >> 2023年2月28日 10:58,Ran Tao 写道: > > > >> > > > >> Hi, guys. thanks for advices. > > > >> > > > >> allow me to make a small summary: > > > >> > > > >> 1.Support ILIKE > > > >> 2.Using catalog api to support show operations > > > >> 3.Need a dedicated FLIP try to support INFORMATION_SCHEMA > > > >> 4.Support SHOW CONNECTORS > > > >> > > > >> If there are no other questions, i will try to start a VOTE for this > > > FLIP. > > > >> WDYT? > > > >> > > > >> Best Regards, > > > >> Ran Tao > > > >> > > > >> > > > >> Sergey Nuyanzin 于2023年2月27日周一 21:12写道: > > > >> > > > >>> Hi Jark, > > > >>> > > > >>> thanks for your comment. > > > >>> > > > Considering they > > > are orthogonal and information schema requires more complex design > > and > > > discussion, it deserves a separate FLIP > > > >>> I'm ok with a separate FLIP for INFORMATION_SCHEMA. > > > >>> > > > Sergey, are you willing to contribute this FLIP? > > > >>> Seems I need to have more research done for that. > > > >>> I would try to help/contribute here > > > >>> > > > >>> > > > >>> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao > > wrote: > > > >>> > > > HI, Jing. thanks. > > > > > > @about ILIKE, from my collections of some popular engines founds > > that > > > >>> just > > > snowflake has this syntax in show with filtering. > > > do we need to support it? if yes, then current some existed show > > > >>> operations > > > need to be addressed either. > > > @about ShowOperation with like. it's a good idea. yes, two > > parameters > > > for > > > constructor can work. thanks for your advice. 
> > > > > > > > > Best Regards, > > > Ran Tao > > > > > > > > > Jing Ge 于2023年2月27日周一 06:29写道: > > > > > > > Hi, > > > > > > > > @Aitozi > > > > > > > > This is exactly why LoD has been introduced: to avoid exposing > > > internal > > > > structure(2nd and lower level API). > > > > > > > > @Jark > > > > > > > > IMHO, there is no conflict between LoD and "High power-to-weight > > > ratio" > > > > with the given example, List.subList() returns List interface > > itself, > > > >>> no > > > > internal or further interface has been exposed. After offering > > > > tEvn.getCatalog(), "all" methods in Catalog Interface have been > > > >>> provided > > > by > > > > TableEnvironment(via getCatalog()). From user's perspective and > > > maintenance > > > > perspective there is no/less difference between providing them > > > directly > > > via > > > > TableEnvironment or via getCatalog(). They are all exposed. Using > >
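Sergey's suggestion earlier in this thread — discover all factories and keep only those implementing `DynamicTableSourceFactory` or `DynamicTableSinkFactory` — can be sketched with stand-in types. This is a toy under stated assumptions: the stand-in `Factory`, `SourceFactory`, and `SinkFactory` interfaces below are invented for the example, and the real discovery step (Flink's `FactoryUtil` / Java's `ServiceLoader`, which needs `META-INF/services` registration) is elided so the sketch stays runnable. It only illustrates the `instanceof` filtering step:

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectorListingSketch {

    // Stand-ins for the factory SPI types named in the thread; the real
    // Flink interfaces live in flink-table-common.
    interface Factory { String factoryIdentifier(); }
    interface SourceFactory extends Factory {}
    interface SinkFactory extends Factory {}

    /** Keeps only factories that can act as a table source or a table sink. */
    static List<String> listConnectors(List<Factory> discovered) {
        List<String> connectors = new ArrayList<>();
        for (Factory f : discovered) {
            if (f instanceof SourceFactory || f instanceof SinkFactory) {
                connectors.add(f.factoryIdentifier());
            }
        }
        return connectors;
    }

    public static void main(String[] args) {
        Factory kafka = new SourceFactory() {
            public String factoryIdentifier() { return "kafka"; }
        };
        Factory blackhole = new SinkFactory() {
            public String factoryIdentifier() { return "blackhole"; }
        };
        Factory formatOnly = () -> "json"; // neither a source nor a sink factory
        System.out.println(listConnectors(List.of(kafka, blackhole, formatOnly)));
        // prints: [kafka, blackhole]
    }
}
```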
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
>>Currently I think we can load from the jar and check the services file to >> get the connector type. but is it necessary we may continue to discuss. >>Hi, Sergey, WDYT? Another idea is FactoryUtil#discoverFactories and check if it implements DynamicTableSourceFactory or DynamicTableSinkFactory with versions it could be trickier... Moreover it seems the version could be a part of the name sometimes[1]. I think name and type could be enough or please correct me if I'm wrong >>or can we open a single ticket under this FLIP? I have a relatively old jira issue[2] for showing connectors with a poc pr. Could I propose to move this jira issue as a subtask under the FLIP one and revive it? [1] https://github.com/apache/flink/blob/161014149e803bfd1d3653badb230b2ed36ce3cb/flink-table/flink-table-common/src/main/java/org/apache/flink/table/factories/Factory.java#L65-L69 [2] https://issues.apache.org/jira/browse/FLINK-25788 On Tue, Feb 28, 2023 at 11:56 AM Ran Tao wrote: > Hi, Jark. thanks. > > About ILIKE > I have updated the FLIP for ILIKE support (Including existing showTables & > showColumns how to change). > > > About show connectors @Sergey, > Currently I think we can load from the jar and check the services file to > get the connector type. but is it necessary we may continue to discuss. > Hi, Sergey, WDYT?or can we open a single ticket under this FLIP? > > > Best Regards, > Ran Tao > > > Jark Wu 于2023年2月28日周二 17:45写道: > > > Besides, if we introduce the ILIKE, we should also add this feature for > > the previous SHOW with LIKE statements. They should be included in this > > FLIP. > > > > Best, > > Jark > > > > > 2023年2月28日 17:40,Jark Wu 写道: > > > > > > Hi Ran, > > > > > > Could you add descriptions about what’s the behavior and differences > > between the LIKE and ILIKE? > > > > > > Besides, I don’t see the SHOW CONNECTOR syntax and description and how > > it works in the FLIP. Is it intended to be included in this FLIP? 
> > > > > > Best, > > > Jark > > > > > > > > >> 2023年2月28日 10:58,Ran Tao 写道: > > >> > > >> Hi, guys. thanks for advices. > > >> > > >> allow me to make a small summary: > > >> > > >> 1.Support ILIKE > > >> 2.Using catalog api to support show operations > > >> 3.Need a dedicated FLIP try to support INFORMATION_SCHEMA > > >> 4.Support SHOW CONNECTORS > > >> > > >> If there are no other questions, i will try to start a VOTE for this > > FLIP. > > >> WDYT? > > >> > > >> Best Regards, > > >> Ran Tao > > >> > > >> > > >> Sergey Nuyanzin 于2023年2月27日周一 21:12写道: > > >> > > >>> Hi Jark, > > >>> > > >>> thanks for your comment. > > >>> > > Considering they > > are orthogonal and information schema requires more complex design > and > > discussion, it deserves a separate FLIP > > >>> I'm ok with a separate FLIP for INFORMATION_SCHEMA. > > >>> > > Sergey, are you willing to contribute this FLIP? > > >>> Seems I need to have more research done for that. > > >>> I would try to help/contribute here > > >>> > > >>> > > >>> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao > wrote: > > >>> > > HI, Jing. thanks. > > > > @about ILIKE, from my collections of some popular engines founds > that > > >>> just > > snowflake has this syntax in show with filtering. > > do we need to support it? if yes, then current some existed show > > >>> operations > > need to be addressed either. > > @about ShowOperation with like. it's a good idea. yes, two > parameters > > for > > constructor can work. thanks for your advice. > > > > > > Best Regards, > > Ran Tao > > > > > > Jing Ge 于2023年2月27日周一 06:29写道: > > > > > Hi, > > > > > > @Aitozi > > > > > > This is exactly why LoD has been introduced: to avoid exposing > > internal > > > structure(2nd and lower level API). > > > > > > @Jark > > > > > > IMHO, there is no conflict between LoD and "High power-to-weight > > ratio" > > > with the given example, List.subList() returns List interface > itself, > > >>> no > > > internal or further interface has been exposed. 
After offering > > > tEvn.getCatalog(), "all" methods in Catalog Interface have been > > >>> provided > > by > > > TableEnvironment(via getCatalog()). From user's perspective and > > maintenance > > > perspective there is no/less difference between providing them > > directly > > via > > > TableEnvironment or via getCatalog(). They are all exposed. Using > > > getCatalog() will reduce the number of boring wrapper methods, but > on > > >>> the > > > other hand not every method in Catalog needs to be exposed, so the > > >>> number > > > of wrapper methods would be limited/less, if we didn't expose > > Catalog. > > > Nevertheless, since we already offered getCatalog(), it makes sense > > to > > > continue using it. The downside is the learning effort for users - > > they > > > have to know that listDatabases() is hidden in Cata
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi, Jark. thanks. > About ILIKE I have updated the FLIP for ILIKE support (Including existing showTables & showColumns how to change). > About show connectors @Sergey, Currently I think we can load from the jar and check the services file to get the connector type. but is it necessary we may continue to discuss. Hi, Sergey, WDYT?or can we open a single ticket under this FLIP? Best Regards, Ran Tao Jark Wu 于2023年2月28日周二 17:45写道: > Besides, if we introduce the ILIKE, we should also add this feature for > the previous SHOW with LIKE statements. They should be included in this > FLIP. > > Best, > Jark > > > 2023年2月28日 17:40,Jark Wu 写道: > > > > Hi Ran, > > > > Could you add descriptions about what’s the behavior and differences > between the LIKE and ILIKE? > > > > Besides, I don’t see the SHOW CONNECTOR syntax and description and how > it works in the FLIP. Is it intended to be included in this FLIP? > > > > Best, > > Jark > > > > > >> 2023年2月28日 10:58,Ran Tao 写道: > >> > >> Hi, guys. thanks for advices. > >> > >> allow me to make a small summary: > >> > >> 1.Support ILIKE > >> 2.Using catalog api to support show operations > >> 3.Need a dedicated FLIP try to support INFORMATION_SCHEMA > >> 4.Support SHOW CONNECTORS > >> > >> If there are no other questions, i will try to start a VOTE for this > FLIP. > >> WDYT? > >> > >> Best Regards, > >> Ran Tao > >> > >> > >> Sergey Nuyanzin 于2023年2月27日周一 21:12写道: > >> > >>> Hi Jark, > >>> > >>> thanks for your comment. > >>> > Considering they > are orthogonal and information schema requires more complex design and > discussion, it deserves a separate FLIP > >>> I'm ok with a separate FLIP for INFORMATION_SCHEMA. > >>> > Sergey, are you willing to contribute this FLIP? > >>> Seems I need to have more research done for that. > >>> I would try to help/contribute here > >>> > >>> > >>> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao wrote: > >>> > HI, Jing. thanks. 
> > @about ILIKE, from my collections of some popular engines founds that > >>> just > snowflake has this syntax in show with filtering. > do we need to support it? if yes, then current some existed show > >>> operations > need to be addressed either. > @about ShowOperation with like. it's a good idea. yes, two parameters > for > constructor can work. thanks for your advice. > > > Best Regards, > Ran Tao > > > Jing Ge 于2023年2月27日周一 06:29写道: > > > Hi, > > > > @Aitozi > > > > This is exactly why LoD has been introduced: to avoid exposing > internal > > structure(2nd and lower level API). > > > > @Jark > > > > IMHO, there is no conflict between LoD and "High power-to-weight > ratio" > > with the given example, List.subList() returns List interface itself, > >>> no > > internal or further interface has been exposed. After offering > > tEvn.getCatalog(), "all" methods in Catalog Interface have been > >>> provided > by > > TableEnvironment(via getCatalog()). From user's perspective and > maintenance > > perspective there is no/less difference between providing them > directly > via > > TableEnvironment or via getCatalog(). They are all exposed. Using > > getCatalog() will reduce the number of boring wrapper methods, but on > >>> the > > other hand not every method in Catalog needs to be exposed, so the > >>> number > > of wrapper methods would be limited/less, if we didn't expose > Catalog. > > Nevertheless, since we already offered getCatalog(), it makes sense > to > > continue using it. The downside is the learning effort for users - > they > > have to know that listDatabases() is hidden in Catalog, go to another > > haystack and then find the needle in there. > > > > +1 for Information schema with a different FLIP. From a design > perspective, > > information schema should be the first choice for most cases and easy > >>> to > > use. Catalog, on the other hand, will be more powerful and offer more > > advanced features. 
> > > > Best regards, > > Jing > > > > > > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu wrote: > > > >> Hi Sergey, > >> > >> I think INFORMATION_SCHEMA is a very interesting idea, and I hope we > can > >> support it. However, it doesn't conflict with the idea of auxiliary > >> statements. I can see different benefits of them. The information > schema > >> provides powerful and flexible capabilities but needs to learn the > > complex > >> entity relationship[1]. The auxiliary SQL statements are super handy > and > >> can resolve most problems, but they offer limited features. > >> > >> I can see almost all the mature systems support both of them. I > think > it > >> also makes sense to support both of them in Flink. Considering they > >> are orthogonal and information schema re
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Besides, if we introduce the ILIKE, we should also add this feature for the previous SHOW with LIKE statements. They should be included in this FLIP. Best, Jark > 2023年2月28日 17:40,Jark Wu 写道: > > Hi Ran, > > Could you add descriptions about what’s the behavior and differences between > the LIKE and ILIKE? > > Besides, I don’t see the SHOW CONNECTOR syntax and description and how it > works in the FLIP. Is it intended to be included in this FLIP? > > Best, > Jark > > >> 2023年2月28日 10:58,Ran Tao 写道: >> >> Hi, guys. thanks for advices. >> >> allow me to make a small summary: >> >> 1.Support ILIKE >> 2.Using catalog api to support show operations >> 3.Need a dedicated FLIP try to support INFORMATION_SCHEMA >> 4.Support SHOW CONNECTORS >> >> If there are no other questions, i will try to start a VOTE for this FLIP. >> WDYT? >> >> Best Regards, >> Ran Tao >> >> >> Sergey Nuyanzin 于2023年2月27日周一 21:12写道: >> >>> Hi Jark, >>> >>> thanks for your comment. >>> Considering they are orthogonal and information schema requires more complex design and discussion, it deserves a separate FLIP >>> I'm ok with a separate FLIP for INFORMATION_SCHEMA. >>> Sergey, are you willing to contribute this FLIP? >>> Seems I need to have more research done for that. >>> I would try to help/contribute here >>> >>> >>> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao wrote: >>> HI, Jing. thanks. @about ILIKE, from my collections of some popular engines founds that >>> just snowflake has this syntax in show with filtering. do we need to support it? if yes, then current some existed show >>> operations need to be addressed either. @about ShowOperation with like. it's a good idea. yes, two parameters for constructor can work. thanks for your advice. Best Regards, Ran Tao Jing Ge 于2023年2月27日周一 06:29写道: > Hi, > > @Aitozi > > This is exactly why LoD has been introduced: to avoid exposing internal > structure(2nd and lower level API). 
> > @Jark > > IMHO, there is no conflict between LoD and "High power-to-weight ratio" > with the given example, List.subList() returns List interface itself, >>> no > internal or further interface has been exposed. After offering > tEvn.getCatalog(), "all" methods in Catalog Interface have been >>> provided by > TableEnvironment(via getCatalog()). From user's perspective and maintenance > perspective there is no/less difference between providing them directly via > TableEnvironment or via getCatalog(). They are all exposed. Using > getCatalog() will reduce the number of boring wrapper methods, but on >>> the > other hand not every method in Catalog needs to be exposed, so the >>> number > of wrapper methods would be limited/less, if we didn't expose Catalog. > Nevertheless, since we already offered getCatalog(), it makes sense to > continue using it. The downside is the learning effort for users - they > have to know that listDatabases() is hidden in Catalog, go to another > haystack and then find the needle in there. > > +1 for Information schema with a different FLIP. From a design perspective, > information schema should be the first choice for most cases and easy >>> to > use. Catalog, on the other hand, will be more powerful and offer more > advanced features. > > Best regards, > Jing > > > On Sat, Feb 25, 2023 at 3:57 PM Jark Wu wrote: > >> Hi Sergey, >> >> I think INFORMATION_SCHEMA is a very interesting idea, and I hope we can >> support it. However, it doesn't conflict with the idea of auxiliary >> statements. I can see different benefits of them. The information schema >> provides powerful and flexible capabilities but needs to learn the > complex >> entity relationship[1]. The auxiliary SQL statements are super handy and >> can resolve most problems, but they offer limited features. >> >> I can see almost all the mature systems support both of them. I think it >> also makes sense to support both of them in Flink. 
Considering they >> are orthogonal and information schema requires more complex design >>> and >> discussion, it deserves a separate FLIP. Sergey, are you willing to >> contribute this FLIP? >> >> Best, >> Jark >> >> [1]: >> >> > >>> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html >> >> >> On Fri, 24 Feb 2023 at 22:43, Ran Tao wrote: >> >>> Thanks John. >>> >>> It seems that most people prefer the information_schema implementation. >>> information_schema does have more benefits (however, the show operation >> is >>> also an option and supplement). >>> Otherwise, the sql syntax and keywords ma
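On the LIKE vs ILIKE question raised in this thread: in the engines that support it (e.g. Snowflake and PostgreSQL), ILIKE behaves exactly like LIKE except that matching is case-insensitive. A minimal sketch — not Flink's planner code, and deliberately ignoring escape characters — that models both via a LIKE-pattern-to-regex translation with an optional case-insensitive flag:

```java
import java.util.regex.Pattern;

public class ShowFilterSketch {

    /**
     * Converts a SQL LIKE pattern ('%' = any sequence, '_' = any single
     * character) into a regex and matches it against a name. With
     * caseInsensitive = true this models ILIKE, whose only difference
     * from LIKE is case-insensitive matching. Escape-character handling
     * is intentionally omitted from this sketch.
     */
    static boolean likeMatches(String name, String pattern, boolean caseInsensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex.toString(), flags).matcher(name).matches();
    }

    public static void main(String[] args) {
        // LIKE is case-sensitive, ILIKE is not:
        System.out.println(likeMatches("MyTable", "my%", false)); // prints: false
        System.out.println(likeMatches("MyTable", "my%", true));  // prints: true
        System.out.println(likeMatches("MyTable", "_yTable", true)); // prints: true
    }
}
```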
Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema
Hi, I have to agree with what Huang and Jingsong said. We should think more about it from the user's(developers who use the API) perspective. The first method T deserialize(byte[]) is convenient for users to deserialize a single event. Many formats are using it, i.e. Avro, csv, etc. There should also be Json cases for single event deserialization, if I am not mistaken. void deserialize(byte[], Collector) method is a default interface method. There will be a big code refactoring if we use it to replace T deserialize(byte[]). The benefit is very limited after considering all concerns. For few cases that do not need T deserialize(byte[]), the maintenance effort is IMHO acceptable. It is, after all, only one method. Best regards, Jing On Tue, Feb 28, 2023 at 9:47 AM Benchao Li wrote: > I share the same concerns with Jingsong and Hang, however, I'll raise a > point why keeping both is also not a good idea. > > In FLINK-18590[1], we are introducing a feature that we'll deserialize JSON > array into multiple records. This feature can only be used in `void > deserialize(byte[] message, Collector out)`. And many more cdc formats > are doing the similar thing. If we keep both methods, many formats/features > will not be available for `T deserialize(byte[] message)`. > > And for format maintenance, we usually need to keep these two methods, > which is also a burden for format maintainers. > > [1] https://issues.apache.org/jira/browse/FLINK-18590 > > Jingsong Li 于2023年2月28日周二 16:03写道: > > > - `T deserialize(byte[] message)` is widely used and it is a public > > api. It is very friendly for single record deserializers. > > - `void deserialize(byte[] message, Collector out)` supports > > multiple records. > > > > I think we can just keep them as they are. 
> > > > Best, > > Jingsong > > > > > > On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan > wrote: > > > > > > Hi, Shammon, > > > > > > I think the method `void deserialize(byte[] message, Collector out)` > > > with a default implementation encapsulate how to deal with null for > > > developers. If we remove the `T deserialize(byte[] message)`, the > > > developers have to remember to handle null. Maybe we will get duplicate > > > code among them. > > > And I find there are only 5 implementations override the method `void > > > deserialize(byte[] message, Collector out)`. Other implementations > > reuse > > > the same code to handle null. > > > I don't know the benefits of removing this method. Looking forward to > > other > > > people's opinions. > > > > > > Best, > > > Hang > > > > > > > > > > > > Shammon FY 于2023年2月28日周二 14:14写道: > > > > > > > Hi devs > > > > > > > > Currently there are two deserialization methods in > > `DeserializationSchema` > > > > 1. `T deserialize(byte[] message)`, only deserialize one record from > > > > binary, if there is no record it should return null. > > > > 2. `void deserialize(byte[] message, Collector out)`, supports > > > > deserializing none, one or multiple records gracefully, it can > > completely > > > > replace method `T deserialize(byte[] message)`. > > > > > > > > The deserialization logic in the above two methods is basically > > coincident, > > > > we recommend users use the second method to deserialize data. To > > improve > > > > code maintainability, I'd like to mark the first function as > > `@Deprecated`, > > > > and remove it when it is no longer used in the future. > > > > > > > > I have created an issue[1] to track it, looking forward to your > > feedback, > > > > thanks > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31251 > > > > > > > > > > > > Best, > > > > Shammon > > > > > > > > > -- > > Best, > Benchao Li >
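The relationship between the two methods discussed in this thread can be illustrated with a minimal, self-contained sketch. Note this is not Flink's actual `DeserializationSchema` — the stand-in `Schema` and `Collector` types below are invented for the example — but it models the bridging Hang describes: the `Collector` variant has a default implementation that calls the single-record method and silently skips `null`, so implementors of the single-record method never handle the "no record" case themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DeserializationSketch {

    /** Stand-in for a per-record collector. */
    interface Collector<T> {
        void collect(T record);
    }

    /** Minimal stand-in for DeserializationSchema with both variants. */
    interface Schema<T> {
        /** Single-record variant: returns null when the message yields no record. */
        T deserialize(byte[] message);

        /**
         * Multi-record variant. The default implementation bridges to the
         * single-record method and hides the null handling from implementors,
         * which is the maintenance point raised in the thread.
         */
        default void deserialize(byte[] message, Collector<T> out) {
            T record = deserialize(message);
            if (record != null) {
                out.collect(record);
            }
        }
    }

    /** Toy schema: decodes a UTF-8 string; empty payloads yield no record. */
    static final Schema<String> STRING_SCHEMA = message -> {
        String s = new String(message, StandardCharsets.UTF_8);
        return s.isEmpty() ? null : s;
    };

    static List<String> collectAll(byte[]... messages) {
        List<String> out = new ArrayList<>();
        for (byte[] m : messages) {
            STRING_SCHEMA.deserialize(m, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = collectAll(
                "a".getBytes(StandardCharsets.UTF_8),
                new byte[0], // skipped by the default bridge
                "b".getBytes(StandardCharsets.UTF_8));
        System.out.println(records); // prints: [a, b]
    }
}
```

A format that emits several records per message (the FLINK-18590 case) would instead override the `Collector` variant directly, which is exactly the capability the single-record signature cannot express.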
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Ran, Could you add descriptions about what’s the behavior and differences between the LIKE and ILIKE? Besides, I don’t see the SHOW CONNECTOR syntax and description and how it works in the FLIP. Is it intended to be included in this FLIP? Best, Jark > 2023年2月28日 10:58,Ran Tao 写道: > > Hi, guys. thanks for advices. > > allow me to make a small summary: > > 1.Support ILIKE > 2.Using catalog api to support show operations > 3.Need a dedicated FLIP try to support INFORMATION_SCHEMA > 4.Support SHOW CONNECTORS > > If there are no other questions, i will try to start a VOTE for this FLIP. > WDYT? > > Best Regards, > Ran Tao > > > Sergey Nuyanzin 于2023年2月27日周一 21:12写道: > >> Hi Jark, >> >> thanks for your comment. >> >>> Considering they >>> are orthogonal and information schema requires more complex design and >>> discussion, it deserves a separate FLIP >> I'm ok with a separate FLIP for INFORMATION_SCHEMA. >> >>> Sergey, are you willing to contribute this FLIP? >> Seems I need to have more research done for that. >> I would try to help/contribute here >> >> >> On Mon, Feb 27, 2023 at 3:46 AM Ran Tao wrote: >> >>> HI, Jing. thanks. >>> >>> @about ILIKE, from my collections of some popular engines founds that >> just >>> snowflake has this syntax in show with filtering. >>> do we need to support it? if yes, then current some existed show >> operations >>> need to be addressed either. >>> @about ShowOperation with like. it's a good idea. yes, two parameters for >>> constructor can work. thanks for your advice. >>> >>> >>> Best Regards, >>> Ran Tao >>> >>> >>> Jing Ge 于2023年2月27日周一 06:29写道: >>> Hi, @Aitozi This is exactly why LoD has been introduced: to avoid exposing internal structure(2nd and lower level API). @Jark IMHO, there is no conflict between LoD and "High power-to-weight ratio" with the given example, List.subList() returns List interface itself, >> no internal or further interface has been exposed. 
After offering tEvn.getCatalog(), "all" methods in Catalog Interface have been >> provided >>> by TableEnvironment(via getCatalog()). From user's perspective and >>> maintenance perspective there is no/less difference between providing them directly >>> via TableEnvironment or via getCatalog(). They are all exposed. Using getCatalog() will reduce the number of boring wrapper methods, but on >> the other hand not every method in Catalog needs to be exposed, so the >> number of wrapper methods would be limited/less, if we didn't expose Catalog. Nevertheless, since we already offered getCatalog(), it makes sense to continue using it. The downside is the learning effort for users - they have to know that listDatabases() is hidden in Catalog, go to another haystack and then find the needle in there. +1 for Information schema with a different FLIP. From a design >>> perspective, information schema should be the first choice for most cases and easy >> to use. Catalog, on the other hand, will be more powerful and offer more advanced features. Best regards, Jing On Sat, Feb 25, 2023 at 3:57 PM Jark Wu wrote: > Hi Sergey, > > I think INFORMATION_SCHEMA is a very interesting idea, and I hope we >>> can > support it. However, it doesn't conflict with the idea of auxiliary > statements. I can see different benefits of them. The information >>> schema > provides powerful and flexible capabilities but needs to learn the complex > entity relationship[1]. The auxiliary SQL statements are super handy >>> and > can resolve most problems, but they offer limited features. > > I can see almost all the mature systems support both of them. I think >>> it > also makes sense to support both of them in Flink. Considering they > are orthogonal and information schema requires more complex design >> and > discussion, it deserves a separate FLIP. Sergey, are you willing to > contribute this FLIP? 
> > Best, > Jark > > [1]: > > >>> >> https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html > > > On Fri, 24 Feb 2023 at 22:43, Ran Tao wrote: > >> Thanks John. >> >> It seems that most people prefer the information_schema >>> implementation. >> information_schema does have more benefits (however, the show >>> operation > is >> also an option and supplement). >> Otherwise, the sql syntax and keywords may be changed frequently. >> Of course, it will be more complicated than the extension of the >> show >> operation. >> It is necessary to design various tables in information_schema, >>> which > may >> take a period of effort. >> >> I will try to design the information_schema and integrate it with flink. >> This may be a rela
[SUMMARY] Flink 1.17 Release Sync 2/28/2023
Hi devs and users, I'd like to share some highlights from Flink 1.17 release sync on 2/28/2023. Release testing: - All release testing tasks have finished in the last week. Big thanks to our contributors and volunteers for the effort on this! 1.17 Blockers: There are 2 blockers currently: FLINK-31092 / FLINK-31104. We hope that these blockers can be addressed in the next day so we can create the RC later this week. CI Instabilities & other issues: There are two CI instability issues: FLINK-31134 / FLINK-18356. Release management: - Release announcement is under review by release managers. - Tasks related to 1.17 release management are tracked in FLINK-31146. The next release meeting will be on Mar 7, 2023. Feel free to join us! Google Meet: https://meet.google.com/wcx-fjbt-hhz Dial-in: https://tel.meet/wcx-fjbt-hhz?pin=1940846765126 Best regards, Qingsheng, Matthias, Martijn and Leonard
Re: [VOTE] Release 1.15.4, release candidate #1
Thanks Danny for driving this +1 (non-binding) * Hashes and Signatures look good * All required files on dist.apache.org * Source archive builds using maven * Started packaged example WordCountSQLExample job * Web PR looks good. Cheers, Hong > On 24 Feb 2023, at 05:36, Weihua Hu wrote: > > Thanks Danny. > > +1 (non-binding) > > Tested the following: > - Download the artifacts and build image > - Ran WordCount on Kubernetes (session mode and application mode) > > > Best, > Weihua > > > On Fri, Feb 24, 2023 at 12:29 PM Yanfei Lei wrote: > >> Thanks Danny. >> +1 (non-binding) >> >> - Downloaded artifacts & built Flink from sources >> - Verified GPG signatures of bin and source. >> - Verified version in poms >> - Ran WordCount example in streaming and batch mode (standalone cluster) >> - Went over flink-web PR, looks good except for Sergey's remark. >> >> Danny Cranmer wrote on Fri, Feb 24, 2023, 02:08: >>> >>> Hi everyone, >>> Please review and vote on the release candidate #1 for the version >> 1.15.4, >>> as follows: >>> [ ] +1, Approve the release >>> [ ] -1, Do not approve the release (please provide specific comments) >>> >>> >>> The complete staging area is available for your review, which includes: >>> * JIRA release notes [1], >>> * the official Apache source release and binary convenience releases to >> be >>> deployed to dist.apache.org [2], which are signed with the key with >>> fingerprint 125FD8DB [3], >>> * all artifacts to be deployed to the Maven Central Repository [4], >>> * source code tag "release-1.15.4-rc1" [5], >>> * website pull request listing the new release and adding announcement >> blog >>> post [6]. >>> >>> The vote will be open for at least 72 hours (excluding weekends) until >> 2023-02-28 >>> 19:00. It is adopted by majority approval, with at least 3 PMC >> affirmative >>> votes. 
>>> >>> Thanks, >>> Danny >>> >>> [1] >>> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352526 >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.4-rc1/ >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS >>> [4] >>> >> https://repository.apache.org/content/repositories/orgapacheflink-1588/org/apache/flink/ >>> [5] https://github.com/apache/flink/releases/tag/release-1.15.4-rc1 >>> [6] https://github.com/apache/flink-web/pull/611 >> >> >> >> -- >> Best, >> Yanfei >>
[jira] [Created] (FLINK-31256) FLIP-297: Improve Auxiliary Sql Statements
Ran Tao created FLINK-31256: --- Summary: FLIP-297: Improve Auxiliary Sql Statements Key: FLINK-31256 URL: https://issues.apache.org/jira/browse/FLINK-31256 Project: Flink Issue Type: Improvement Reporter: Ran Tao The FLIP design doc can be found at page. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema
I share the same concerns as Jingsong and Hang; however, I'll raise a point on why keeping both is also not a good idea. In FLINK-18590 [1], we are introducing a feature that deserializes a JSON array into multiple records. This feature can only be implemented with `void deserialize(byte[] message, Collector out)`, and many CDC formats do a similar thing. If we keep both methods, many formats/features will not be available through `T deserialize(byte[] message)`. As for format maintenance, we usually need to keep both methods, which is also a burden for format maintainers. [1] https://issues.apache.org/jira/browse/FLINK-18590 Jingsong Li wrote on Tue, Feb 28, 2023 at 16:03: > - `T deserialize(byte[] message)` is widely used and it is a public > api. It is very friendly for single record deserializers. > - `void deserialize(byte[] message, Collector out)` supports > multiple records. > > I think we can just keep them as they are. > > Best, > Jingsong > > > On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan wrote: > > > > Hi, Shammon, > > > > I think the method `void deserialize(byte[] message, Collector out)` > > with a default implementation encapsulate how to deal with null for > > developers. If we remove the `T deserialize(byte[] message)`, the > > developers have to remember to handle null. Maybe we will get duplicate > > code among them. > > And I find there are only 5 implementations override the method `void > > deserialize(byte[] message, Collector out)`. Other implementations > reuse > > the same code to handle null. > > I don't know the benefits of removing this method. Looking forward to > other > > people's opinions. > > > > Best, > > Hang > > > > > > > > Shammon FY wrote on Tue, Feb 28, 2023 at 14:14: > > > Hi devs > > > > > > Currently there are two deserialization methods in > `DeserializationSchema` > > > 1. `T deserialize(byte[] message)`, only deserialize one record from > > > binary, if there is no record it should return null. > > > 2. 
`void deserialize(byte[] message, Collector out)`, supports > > > deserializing none, one or multiple records gracefully, it can > completely > > > replace method `T deserialize(byte[] message)`. > > > > > > The deserialization logic in the above two methods is basically > coincident, > > > we recommend users use the second method to deserialize data. To > improve > > > code maintainability, I'd like to mark the first function as > `@Deprecated`, > > > and remove it when it is no longer used in the future. > > > > > > I have created an issue[1] to track it, looking forward to your > feedback, > > > thanks > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-31251 > > > > > > > > > Best, > > > Shammon > > > > -- Best, Benchao Li
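For readers skimming the thread, here is a minimal, self-contained sketch of the relationship between the two methods being discussed. The `Collector` and `DeserializationSchema` interfaces below are simplified stand-ins, not Flink's real API (the real interface lives in `org.apache.flink.api.common.serialization` and has additional members); the default-method wiring mirrors what the thread describes: the `Collector` variant wraps the single-record variant and drops nulls, which is why it can express everything the single-record variant can, plus zero-or-many-record formats such as the JSON-array case in FLINK-18590.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Flink's interfaces -- a sketch, not the real API.
interface Collector<T> {
    void collect(T record);
}

interface DeserializationSchema<T> {
    // Variant 1: at most one record; returns null when the bytes hold no record.
    T deserialize(byte[] message) throws IOException;

    // Variant 2: zero, one, or many records. The default implementation
    // wraps variant 1 and silently drops the null ("no record") case.
    default void deserialize(byte[] message, Collector<T> out) throws IOException {
        T record = deserialize(message);
        if (record != null) {
            out.collect(record);
        }
    }
}

// A format that only works with variant 2: one input message fans out into
// several records (analogous to deserializing a JSON array).
class CsvFieldsSchema implements DeserializationSchema<String> {
    @Override
    public String deserialize(byte[] message) {
        throw new UnsupportedOperationException("multi-record format; use the Collector variant");
    }

    @Override
    public void deserialize(byte[] message, Collector<String> out) {
        for (String field : new String(message, StandardCharsets.UTF_8).split(",")) {
            out.collect(field);
        }
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        // Multi-record format: one message produces three records.
        List<String> fields = new ArrayList<>();
        new CsvFieldsSchema().deserialize("a,b,c".getBytes(StandardCharsets.UTF_8), fields::add);
        System.out.println(fields); // prints [a, b, c]

        // Single-record format written against variant 1 only; the default
        // method handles the null case (empty input -> no record emitted).
        DeserializationSchema<String> single =
                bytes -> bytes.length == 0 ? null : new String(bytes, StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        single.deserialize(new byte[0], out::add);
        System.out.println(out.isEmpty()); // prints true
    }
}
```

The second half of `main` illustrates Hang's point: as long as the default method exists, single-record implementors never see the null-handling code, so deprecating variant 1 shifts that responsibility onto each format.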
Re: [VOTE] Flink minor version support policy for old releases
+1 (non-binding) Thanks for driving it. Best, Samrat On Tue, 28 Feb 2023 at 1:36 PM, Junrui Lee wrote: > Thanks Danny for driving it. > > +1 (non-binding) > > Best regards, > Junrui > > yuxia wrote on Tue, Feb 28, 2023 at 14:04: > > Thanks Danny for driving it. > > > > +1 (non-binding) > > > > Best regards, > > Yuxia > > > > ----- Original Message ----- > > From: "Weihua Hu" > > To: "dev" > > Sent: Tuesday, Feb 28, 2023, 12:48:09 PM > > Subject: Re: [VOTE] Flink minor version support policy for old releases > > > > Thanks, Danny. > > > > +1 (non-binding) > > > > Best, > > Weihua > > > > > > On Tue, Feb 28, 2023 at 12:38 PM weijie guo > > wrote: > > > > > Thanks Danny for bringing this. > > > > > > +1 (non-binding) > > > > > > Best regards, > > > > > > Weijie > > > > > > > > > Jing Ge wrote on Mon, Feb 27, 2023 at 20:23: > > > > +1 (non-binding) > > > > > > > > BTW, should we follow the content style [1] to describe the new rule > > > using > > > > 1.2.x, 1.1.y, 1.1.z? > > > > > > > > [1] > https://flink.apache.org/downloads/#update-policy-for-old-releases > > > > > > > > Best regards, > > > > Jing > > > > > > > > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl > > > > wrote: > > > > > > > > > Thanks, Danny. Sounds good to me. > > > > > > > > > > +1 (non-binding) > > > > > > > > > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer < > > > dannycran...@apache.org> > > > > > wrote: > > > > > > > > > > > I am starting a vote to update the "Update Policy for old > releases" > > > [1] > > > > > to > > > > > > include additional bugfix support for end of life versions. > > > > > > > > > > > > As per the discussion thread [2], the change we are voting on is: > > > > > > - Support policy: updated to include: "Upon release of a new > Flink > > > > minor > > > > > > version, the community will perform one final bugfix release for > > > > resolved > > > > > > critical/blocker issues in the Flink minor version losing > support." 
> > > > > > - Release process: add a step to start the discussion thread for > > the > > > > > final > > > > > > patch version, if there are resolved critical/blocking issues to > > > flush. > > > > > > > > > > > > Voting schema: since our bylaws [3] do not cover this particular > > > > > scenario, > > > > > > and releases require PMC involvement, we will use a consensus > vote > > > with > > > > > PMC > > > > > > binding votes. > > > > > > > > > > > > Thanks, > > > > > > Danny > > > > > > > > > > > > [1] > > > > > > > https://flink.apache.org/downloads.html#update-policy-for-old-releases > > > > > > [2] > > https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv > > > > > > [3] > https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws > > > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management
Hi Everyone, We had some more talks about the pre-allocation of resources with @Max, and here is the final state that we've converged to for now: The vital thing to note about the new API is that it's declarative, meaning we're declaring the desired state to which we want our job to converge. If, after the requirements update, the job no longer holds the desired resources (fewer resources than the lower bound), it will be canceled and transition back into the waiting-for-resources state. In some use cases, you might always want to rescale to the upper bound (this goes along the lines of "preallocating resources" and minimizing the number of rescales, which is especially useful with large state). This can be controlled by two knobs that already exist: 1) "jobmanager.adaptive-scheduler.min-parallelism-increase" - this affects the minimal parallelism increase step of a running job; we'll slightly change the semantics, and we'll trigger rescaling either once this condition is met or when you hit the ceiling; setting this to a high number will ensure that you always rescale to the upper bound. 2) "jobmanager.adaptive-scheduler.resource-stabilization-timeout" - for new and already restarting jobs, we'll always respect this timeout, which allows you to wait for more resources even though you already have more resources than defined in the lower bound; again, in case we reach the ceiling (the upper bound), we'll transition into the executing state. We're still planning to dig deeper in this direction with other efforts, but this is already good enough and should allow us to move the FLIP forward. WDYT? Unless there are any objections to the above, I'd like to proceed to a vote. Best, D. 
On Thu, Feb 23, 2023 at 5:39 PM David Morávek wrote: > Hi Everyone, > > @John > > This is a problem that we've spent some time trying to crack; in the end, > we've decided to go against doing any upgrades to JobGraphStore from > JobMaster to avoid having multiple writers that are guarded by different > leader election locks (Dispatcher and JobMaster might live in a different > process). The contract we've decided to choose instead is leveraging the > idempotency of the endpoint and having the user of the API retry in case > we're unable to persist new requirements in the JobGraphStore [1]. We > eventually need to move JobGraphStore out of the dispatcher, but that's way > out of the scope of this FLIP. The solution is a deliberate trade-off. The > worst scenario is that the Dispatcher fails over in between retries, which > would simply rescale the job to meet the previous resource requirements > (more extended unavailability of underlying HA storage would have worse > consequences than this). Does that answer your question? > > @Matthias > > Good catch! I'm fixing it now, thanks! > > [1] > https://github.com/dmvk/flink/commit/5e7edcb77d8522c367bc6977f80173b14dc03ce9#diff-a4b690fb2c4975d25b05eb4161617af0d704a85ff7b1cad19d3c817c12f1e29cR1151 > > Best, > D. > > On Tue, Feb 21, 2023 at 12:24 AM John Roesler wrote: > >> Thanks for the FLIP, David! >> >> I just had one small question. IIUC, the REST API PUT request will go >> through the new DispatcherGateway method to be handled. Then, after >> validation, the dispatcher would call the new JobMasterGateway method to >> actually update the job. >> >> Which component will write the updated JobGraph? I just wanted to make >> sure it's the JobMaster because if it were the dispatcher, there could be a >> race condition with the async JobMaster method. >> >> Thanks! >> -John >> >> On Mon, Feb 20, 2023, at 07:34, Matthias Pohl wrote: >> > Thanks for your clarifications, David. 
I don't have any additional major >> > points to add. One thing about the FLIP: The RPC layer API for updating >> the >> > JRR returns a future with a JRR? I don't see value in returning a JRR >> here >> > since it's an idempotent operation? Wouldn't it be enough to return >> > CompletableFuture here? Or am I missing something? >> > >> > Matthias >> > >> > On Mon, Feb 20, 2023 at 1:48 PM Maximilian Michels >> wrote: >> > >> >> Thanks David! If we could get the pre-allocation working as part of >> >> the FLIP, that would be great. >> >> >> >> Concerning the downscale case, I agree this is a special case for the >> >> (single-job) application mode where we could re-allocate slots in a >> >> way that could leave entire task managers unoccupied which we would >> >> then be able to release. The goal essentially is to reduce slot >> >> fragmentation on scale down by packing the slots efficiently. The >> >> easiest way to add this optimization when running in application mode >> >> would be to drop as many task managers during the restart such that >> >> NUM_REQUIRED_SLOTS >= NUM_AVAILABLE_SLOTS stays true. We can look into >> >> this independently of the FLIP. >> >> >> >> Feel free to start the vote. >> >> >> >> -Max >> >> >> >> On Mon, Feb 20, 2023 at 9:10 AM David Morávek wrote: >> >> > >> >> > Hi everyon
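For reference, the two existing knobs discussed in the message above could be combined in a `flink-conf.yaml` fragment along these lines (a sketch only; the values are illustrative placeholders, not recommendations, and the semantics described for `min-parallelism-increase` are the proposed, slightly changed ones):

```yaml
# Use the adaptive scheduler, which FLIP-291's declarative resource API targets.
jobmanager.scheduler: adaptive

# Minimal parallelism increase before a running job is rescaled; with the
# proposed semantics, rescaling triggers once this is met OR the upper bound
# (the "ceiling") is reached. A very high value effectively means
# "only rescale to the upper bound".
jobmanager.adaptive-scheduler.min-parallelism-increase: 100

# How long new/restarting jobs keep waiting for more resources after the
# lower bound is already satisfied, unless the upper bound is reached first.
jobmanager.adaptive-scheduler.resource-stabilization-timeout: 60s
```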
[jira] [Created] (FLINK-31255) OperatorUtils#createWrappedOperatorConfig fails to wrap operator config
Zhipeng Zhang created FLINK-31255: - Summary: OperatorUtils#createWrappedOperatorConfig fails to wrap operator config Key: FLINK-31255 URL: https://issues.apache.org/jira/browse/FLINK-31255 Project: Flink Issue Type: Bug Components: Library / Machine Learning Affects Versions: ml-2.1.0, ml-2.0.0, ml-2.2.0 Reporter: Zhipeng Zhang Currently we use an operator wrapper to enable using normal operators in iterations. However, the operatorConfig is not correctly unwrapped. For example, the following code fails because of a wrong type serializer.
{code:java}
@Test
public void testIterationWithMapPartition() throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Long> input =
            env.fromParallelCollection(new NumberSequenceIterator(0L, 5L), Types.LONG);

    DataStreamList result =
            Iterations.iterateBoundedStreamsUntilTermination(
                    DataStreamList.of(input),
                    ReplayableDataStreamList.notReplay(input),
                    IterationConfig.newBuilder()
                            .setOperatorLifeCycle(OperatorLifeCycle.PER_ROUND)
                            .build(),
                    new IterationBodyWithMapPartition());

    List<Long> counts = IteratorUtils.toList(result.get(0).executeAndCollect());
    System.out.println(counts.size());
}

private static class IterationBodyWithMapPartition implements IterationBody {
    @Override
    public IterationBodyResult process(
            DataStreamList variableStreams, DataStreamList dataStreams) {
        DataStream<Long> input = variableStreams.get(0);

        DataStream<Long> mapPartitionResult =
                DataStreamUtils.mapPartition(
                        input,
                        new MapPartitionFunction<Long, Long>() {
                            @Override
                            public void mapPartition(
                                    Iterable<Long> iterable, Collector<Long> collector)
                                    throws Exception {
                                for (Long iter : iterable) {
                                    collector.collect(iter);
                                }
                            }
                        });

        DataStream<Integer> terminationCriteria =
                mapPartitionResult.flatMap(new TerminateOnMaxIter(2)).returns(Types.INT);

        return new IterationBodyResult(
                DataStreamList.of(mapPartitionResult), variableStreams, terminationCriteria);
    }
}
{code}
The error stack is:
{noformat}
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.flink.iteration.IterationRecord
    at org.apache.flink.iteration.typeinfo.IterationRecordSerializer.serialize(IterationRecordSerializer.java:34)
    at org.apache.flink.iteration.datacache.nonkeyed.FileSegmentWriter.addRecord(FileSegmentWriter.java:79)
    at org.apache.flink.iteration.datacache.nonkeyed.DataCacheWriter.addRecord(DataCacheWriter.java:107)
    at org.apache.flink.iteration.datacache.nonkeyed.ListStateWithCache.add(ListStateWithCache.java:148)
    at org.apache.flink.ml.common.datastream.DataStreamUtils$MapPartitionOperator.processElement(DataStreamUtils.java:445)
    at org.apache.flink.iteration.operator.perround.OneInputPerRoundWrapperOperator.processElement(OneInputPerRoundWrapperOperator.java:69)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
Re: [VOTE] Flink minor version support policy for old releases
Thanks Danny for driving it. +1 (non-binding) Best regards, Junrui yuxia wrote on Tue, Feb 28, 2023 at 14:04: > Thanks Danny for driving it. > > +1 (non-binding) > > Best regards, > Yuxia > > ----- Original Message ----- > From: "Weihua Hu" > To: "dev" > Sent: Tuesday, Feb 28, 2023, 12:48:09 PM > Subject: Re: [VOTE] Flink minor version support policy for old releases > > Thanks, Danny. > > +1 (non-binding) > > Best, > Weihua > > > On Tue, Feb 28, 2023 at 12:38 PM weijie guo > wrote: > > > Thanks Danny for bringing this. > > > > +1 (non-binding) > > > > Best regards, > > > > Weijie > > > > > > Jing Ge wrote on Mon, Feb 27, 2023 at 20:23: > > > +1 (non-binding) > > > > > > BTW, should we follow the content style [1] to describe the new rule > > using > > > 1.2.x, 1.1.y, 1.1.z? > > > > > > [1] https://flink.apache.org/downloads/#update-policy-for-old-releases > > > > > > Best regards, > > > Jing > > > > > > On Mon, Feb 27, 2023 at 1:06 PM Matthias Pohl > > > wrote: > > > > > > > Thanks, Danny. Sounds good to me. > > > > > > > > +1 (non-binding) > > > > > > > > On Wed, Feb 22, 2023 at 10:11 AM Danny Cranmer < > > dannycran...@apache.org> > > > > wrote: > > > > > > > > > I am starting a vote to update the "Update Policy for old releases" > > [1] > > > > to > > > > > include additional bugfix support for end of life versions. > > > > > > > > > > As per the discussion thread [2], the change we are voting on is: > > > > > - Support policy: updated to include: "Upon release of a new Flink > > > minor > > > > > version, the community will perform one final bugfix release for > > > resolved > > > > > critical/blocker issues in the Flink minor version losing support." 
> > > > > > > > > > Voting schema: since our bylaws [3] do not cover this particular > > > > scenario, > > > > > and releases require PMC involvement, we will use a consensus vote > > with > > > > PMC > > > > > binding votes. > > > > > > > > > > Thanks, > > > > > Danny > > > > > > > > > > [1] > > > > > https://flink.apache.org/downloads.html#update-policy-for-old-releases > > > > > [2] > https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv > > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws > > > > > > > > > > > > > > >
Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema
- `T deserialize(byte[] message)` is widely used and it is a public api. It is very friendly for single record deserializers. - `void deserialize(byte[] message, Collector out)` supports multiple records. I think we can just keep them as they are. Best, Jingsong On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan wrote: > > Hi, Shammon, > > I think the method `void deserialize(byte[] message, Collector out)` > with a default implementation encapsulate how to deal with null for > developers. If we remove the `T deserialize(byte[] message)`, the > developers have to remember to handle null. Maybe we will get duplicate > code among them. > And I find there are only 5 implementations override the method `void > deserialize(byte[] message, Collector out)`. Other implementations reuse > the same code to handle null. > I don't know the benefits of removing this method. Looking forward to other > people's opinions. > > Best, > Hang > > > > Shammon FY 于2023年2月28日周二 14:14写道: > > > Hi devs > > > > Currently there are two deserialization methods in `DeserializationSchema` > > 1. `T deserialize(byte[] message)`, only deserialize one record from > > binary, if there is no record it should return null. > > 2. `void deserialize(byte[] message, Collector out)`, supports > > deserializing none, one or multiple records gracefully, it can completely > > replace method `T deserialize(byte[] message)`. > > > > The deserialization logic in the above two methods is basically coincident, > > we recommend users use the second method to deserialize data. To improve > > code maintainability, I'd like to mark the first function as `@Deprecated`, > > and remove it when it is no longer used in the future. > > > > I have created an issue[1] to track it, looking forward to your feedback, > > thanks > > > > [1] https://issues.apache.org/jira/browse/FLINK-31251 > > > > > > Best, > > Shammon > >