[ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-03 Thread Xintong Song
Dear Community, I'm pleased to share this good news with everyone. As some of you may have already heard, Apache Flink has won the 2023 SIGMOD Systems Award [1]. "Apache Flink greatly expanded the use of stream data-processing." -- SIGMOD Awards Committee SIGMOD is one of the most influential

Re: [DISCUSS] Issue tracking workflow

2022-10-24 Thread Xintong Song
ithub Issue? > 3. There's no longer one central administration, which is especially > valuable to track all issues across projects like the different connectors, > Flink ML, Table Store etc. > 4. Our current CI labeling works on the Jira issues, not on the Github > Issues labels. > > Be

[DISCUSS] Issue tracking workflow

2022-10-23 Thread Xintong Song
Hi devs and users, As many of you may have already noticed, Infra announced that they will soon disable public Jira account signups [1]. That means, in order for someone who is not yet a Jira user to open or comment on an issue, he/she has to first reach out to a PMC member to create an account

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-13 Thread Xintong Song
RecordsOut" of tasks). >>>>> > > The original issue was that the numRecordsOut of the sink counted >>>>> both (which is completely wrong). >>>>> > > >>>>> > > A new metric was always required; otherwise you inevi

Re: Is Flink 1.15.3 planned in foreseeable future?

2022-10-13 Thread Xintong Song
Actually, this is an on-going discussion related to 1.15.3. The community discovered a breaking change in 1.15.x and is discussing how to resolve this right now [1]. There is very likely a 1.15.3 release after this is resolved. Best, Xintong [1]

Re: Job Manager getting restarted while restarting task manager

2022-10-12 Thread Xintong Song
TaskManagers won't make the JobManager restart. You can > provide the whole log as an attachment to investigate. > > On Wed, 12 Oct 2022 at 6:01 PM, Puneet Duggal > wrote: > >> Hi Xintong Song, >> >> Thanks for your immediate reply. Yes, I do restart task manager via k

Re: Job Manager getting restarted while restarting task manager

2022-10-11 Thread Xintong Song
The log shows that the jobmanager received a SIGTERM signal from an external source. Depending on how you deploy Flink, that could be a 'kill ' command, or a Kubernetes pod removal / eviction, etc. You may want to check where the signal came from. Best, Xintong On Wed, Oct 12, 2022 at 6:26 AM Puneet

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-10 Thread Xintong Song
+1 for reverting these changes in Flink 1.16. For 1.15.3, can we make these metrics available via both names (numXXXOut and numXXXSend)? In this way we don't break it for those who already migrated to 1.15 and numXXXSend. That means we still need to change SinkWriterOperator to use another metric

Re: Flink TaskManager memory configuration failed

2022-06-22 Thread Xintong Song
512mb is just too small for a TaskManager. You would need to either increase it, or decrease the other memory components (which currently use default values). The 64mb Total Flink Memory comes from the 512mb Total Process Memory minus 192mb minimum JVM Overhead and 256mb default JVM Metaspace.
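As a quick check of the arithmetic in the reply above, the following sketch subtracts the two JVM components from the total process memory. The function name is hypothetical and the numbers are the ones quoted in the reply; this is not Flink's own configuration code.

```python
def total_flink_memory_mb(process_mb, jvm_overhead_mb, jvm_metaspace_mb):
    """Memory left for Flink itself after the JVM-level components are carved out."""
    return process_mb - jvm_overhead_mb - jvm_metaspace_mb


# 512mb process memory, 192mb minimum JVM Overhead, 256mb default JVM Metaspace
print(total_flink_memory_mb(512, 192, 256))  # 64
```

With only 64mb left, every remaining component (heap, managed memory, network buffers) has to fit into that budget, which is why the configuration fails.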

[ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 Thread Xintong Song
Hi everyone, I'm very happy to announce that the Apache Flink community has created a dedicated Slack workspace [1]. Welcome to join us on Slack. ## Join the Slack workspace You can join the Slack workspace in either of the following two ways: 1. Click the invitation link posted on the project

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Xintong Song
e is more about making communication more efficient, rather than making information easier to find. Thank you~ Xintong Song On Wed, May 11, 2022 at 5:39 PM Konstantin Knauf wrote: > I don't think we can maintain two additional channels. Some people have > already concerns about covering o

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-10 Thread Xintong Song
iliar with Discourse or Reddit. My impression is that they are not as easy to set up and maintain as Slack. Thank you~ Xintong Song [1] https://asktug.com/ On Tue, May 10, 2022 at 4:50 PM Konstantin Knauf wrote: > Thanks for starting this discussion again. I am pretty much with Timo >

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-07 Thread Xintong Song
ncerning StackOverFlow, it is definitely worth more attention from the community. Thanks for the suggestion / reminder, Piotr & David. I think Slack and StackOverFlow are probably not mutually exclusive. Thank you~ Xintong Song [1] https://zapier.com/ On Sat, May 7, 2022 at 9:50 AM Jingsong Li wrote:

Fwd: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Xintong Song
Thank you~ Xintong Song -- Forwarded message - From: Xintong Song Date: Fri, May 6, 2022 at 5:07 PM Subject: Re: [Discuss] Creating an Apache Flink slack workspace To: private Cc: Chesnay Schepler Hi Chesnay, Correct me if I'm wrong, I don't find this is *repeatedly

Re: Missing metrics in Flink v 1.15.0 rc-0

2022-04-06 Thread Xintong Song
ading / sink writing data from / to external systems, are not counted. In your case, there's only 1 vertex in the DAG, thus no internal data exchanges. Thank you~ Xintong Song On Wed, Apr 6, 2022 at 11:21 PM Peter Schrott wrote: > Hi there, > > I just successfully upgraded our Flink cluster t

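Re: Questions about Flink's ad-hoc direction

2022-03-24 Thread Xintong Song
Yes, this is on the roadmap. Some relatively scattered scheduling-performance optimizations are already under way in the community [1], and some larger features are still being worked out. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-25318 On Thu, Mar 24, 2022 at 1:54 PM LuNing Wang wrote: > > Will Flink invest in the ad-hoc direction in the future? For example, Flink shipping with performance optimizations similar to those of Trino/Presto, so that batch, streaming, and OLAP/ad-hoc would need only one engine. >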

Re: TM OOMKilled

2022-02-15 Thread Xintong Song
blem. If the problem is not fixed, but the job runs longer before the OOM happens, then it's likely the 3rd case. Moreover, you can monitor the pod memory footprint changes if such metrics are available. Thank you~ Xintong Song On Tue, Feb 15, 2022 at 11:56 PM Alexey Trenikhun wrote: > Hi

Re: TM OOMKilled

2022-02-14 Thread Xintong Song
you share what that is for? Thank you~ Xintong Song On Tue, Feb 15, 2022 at 12:10 PM Alexey Trenikhun wrote: > Hello, > We use RocksDB, but there is no problem with Java heap, which is limited > by 3.523gb, the problem with total container memory. The pod is killed > not due OutO

Re: [DISCUSS] Future of Per-Job Mode

2022-01-24 Thread Xintong Song
support shipping local dependencies. - I'm not sure about dropping the per-job mode soonish, as many users are still working with it. We'd better not force these users to migrate to the application mode when upgrading the Flink version. Thank you~ Xintong Song On Fri, Jan 21, 2022 at 4:30 PM

Re: Flink native k8s integration vs. operator

2022-01-13 Thread Xintong Song
Thanks for volunteering to drive this effort, Marton, Thomas and Gyula. Looking forward to the public discussion. Please feel free to reach out if there's anything you need from us. Thank you~ Xintong Song On Fri, Jan 14, 2022 at 8:27 AM Chenya Zhang wrote: > Thanks Thomas, Gy

Re: Flink native k8s integration vs. operator

2022-01-06 Thread Xintong Song
his way, users are free to choose between active and reactive (e.g., HPA) rescaling, while always benefiting from the beyond-deployment lifecycle (upgrades, savepoint management, etc.) and alignment with the K8s ecosystem (Flink client free, operating via kubectl, etc.). Thank you~ Xintong Song On T

Re: How to handle java.lang.OutOfMemoryError: Metaspace

2021-12-26 Thread Xintong Song
`taskmanager.numberOfTaskSlots`. If you have multiple jobs submitted to a shared Flink cluster, decreasing the number of slots in a task manager should also reduce the amount of classes loaded by the JVM, thus requiring less metaspace. Thank you~ Xintong Song On Mon, Dec 27, 2021 at 9:08 AM John Smith

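Re: Why can't a minimum or maximum be specified for managed memory?

2021-12-21 Thread Xintong Song
Network and JVM Overhead use min-max because too small a value for either tends to cause failures, while too large a value does little for performance and just wastes resources. Managed memory, by contrast, can sometimes be very small, even 0 in some scenarios, and increasing it usually does help performance, so the design does not introduce min-max options for it. Thank you~ Xintong Song On Wed, Dec 22, 2021 at 10:53 AM johnjlong wrote: > Hi experts, why can't a minimum or maximum be specified for managed memory?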
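The min-max idea described above can be sketched as a fraction-derived size clamped to a range. This is only an illustration: the function names and the numbers are assumptions, not Flink's implementation or its defaults.

```python
def derive_with_min_max(total_mb, fraction, min_mb, max_mb):
    """Size a component as a fraction of the total, clamped to [min_mb, max_mb]."""
    return max(min_mb, min(max_mb, total_mb * fraction))


def derive_fraction_only(total_mb, fraction):
    """Size a component as a plain fraction; it may be tiny or even 0."""
    return total_mb * fraction


# Small total: the lower bound keeps the component from becoming too small.
print(derive_with_min_max(200, 0.1, 64, 1024))    # 64
# Large total: the upper bound avoids wasting memory.
print(derive_with_min_max(20480, 0.1, 64, 1024))  # 1024
# A managed-memory-like component with no bounds can legitimately be 0.
print(derive_fraction_only(4096, 0.0))            # 0.0
```

The bounded variant protects components that fail when undersized; the unbounded variant fits a component whose size is purely a performance trade-off.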

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2021-12-21 Thread Xintong Song
ger support hadoop versions < 2.8 at all. And if that is not permitted by our users, we may consider keeping the codebase as is and waiting a bit longer. WDYT? Thank you~ Xintong Song [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compati

Re: Direct buffer memory in job with hbase client

2021-12-15 Thread Xintong Song
job needs, which probably depends on your hbase client configurations. Thank you~ Xintong Song On Wed, Dec 15, 2021 at 1:40 PM Anton wrote: > Hi, from time to time my job is stopping to process messages with warn > message listed below. Tried to increase jobmanager.memory.proces

Re: High Availability on Kubernetes

2021-10-25 Thread Xintong Song
to not only pod evictions, but also other problems (jvm out-of-memory, remote storage connection downtime, etc.). Thank you~ Xintong Song On Tue, Oct 26, 2021 at 7:39 AM Deshpande, Omkar wrote: > Hello, > > We are running flink on Kubernetes(Standalone) in application cluster > mode. The

Re: How to view the Chinese documentation for 1.10

2021-10-08 Thread Xintong Song
https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/ Thank you~ Xintong Song On Sat, Oct 9, 2021 at 10:57 AM 杨浩 wrote: > Our company uses Flink release-1.10. How can we view the Chinese documentation for that version? > > > English docs: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ > For Chinese we can only see the latest: https://flin

[ANNOUNCE] Release 1.14.0, release candidate #0

2021-08-29 Thread Xintong Song
Hi everyone, The RC0 for Apache Flink 1.14.0 has been created. This is still a preview-only release candidate to drive the current testing efforts and so no official votes will take place. It has all the artifacts that we would typically have for a release, except for the release note and the

Re: [ANNOUNCE] Apache Flink 1.13.2 released

2021-08-09 Thread Xintong Song
Thanks Yun and everyone~! Thank you~ Xintong Song On Mon, Aug 9, 2021 at 10:14 PM Till Rohrmann wrote: > Thanks Yun Tang for being our release manager and the great work! Also > thanks a lot to everyone who contributed to this release. > > Cheers, > Till > > On Mon, A

Re: Memory usage UI

2021-07-01 Thread Xintong Song
lead to confusion. Since Flink 1.12, we have introduced a new web UI for the memory metrics, where the legacy metrics are preserved only for backward compatibility and are placed in an `Advanced` pane. I'd recommend ignoring them in 99% of the cases. Thank you~ Xintong Song On Fri, Jul 2

Re: Flink v1.12.2 Kubernetes Session Mode cannot mount log4j.properties in configMap

2021-06-21 Thread Xintong Song
ies in a session cluster [3]. Please be aware that in standalone Kubernetes deployment, Flink looks for log4j-console.properties instead of log4j.properties. By default, this will write the logs to stdout, so that the logs can be viewed by the `kubectl logs` command. Thank you~ Xintong Song [1] ht

Re: Resource Planning

2021-06-15 Thread Xintong Song
Hi Thomas, It would be helpful if you could provide the jobmanager/taskmanager logs, and gc logs if possible. Additionally, you may consider monitoring the cpu/memory related metrics [1] to see if there's anything abnormal when the problem is observed. Thank you~ Xintong Song [1] https

Re: Add control mode for flink

2021-06-08 Thread Xintong Song
case of general control messages. - Watermarks are probably similar to the other control messages. However, it's already exposed to users as public APIs. If we want to migrate it to the new control flow, we'd be very careful not to break any compatibility. Thank you~ Xintong Song On Wed, Jun 9,

Re: Re: Add control mode for flink

2021-06-08 Thread Xintong Song
events from JobMaster 3. Consume control events from arbitrary operators downstream where the events are produced Thank you~ Xintong Song On Tue, Jun 8, 2021 at 1:37 PM Yun Gao wrote: > Very thanks Jiangang for bringing this up and very thanks for the > discussion! > >

Re: Add control mode for flink

2021-06-07 Thread Xintong Song
ut potentially other future features as well. - AFAICS, it's non-trivial to make a 3rd-party dynamic configuration framework work together with Flink's consistency mechanism. Thank you~ Xintong Song On Mon, Jun 7, 2021 at 11:05 AM 刘建刚 wrote: > Thank you for the reply. I have checked the p

Re: In native k8s application mode, how can I know whether the job is failed or finished?

2021-06-03 Thread Xintong Song
the session cluster. Thus, status of historical jobs can be accessed via the JM. 2. You can try setting up a history server [1], where information of finished jobs can be archived. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/advanced

Re: yarn ship from s3

2021-05-26 Thread Xintong Song
be straightforward. Unfortunately, these efforts are still in progress, and have more or less stalled recently. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-20681 [2] https://issues.apache.org/jira/browse/FLINK-20811 [3] https://issues.apache.org/jira/browse/FLINK-20867

Re: reactive mode and back pressure

2021-05-17 Thread Xintong Song
Yes, it does. Internally, each re-scheduling is performed as a stop-and-resume of the job, similar to a failover. Without checkpoints, the job will always restore from the very beginning. Thank you~ Xintong Song On Mon, May 17, 2021 at 2:54 PM Alexey Trenikhun wrote: > Hi Xintong, >

Re: The heartbeat of JobManager timed out

2021-05-16 Thread Xintong Song
usually observed for large scale jobs (in terms of number of vertices and parallelism). In that case, we would have to increase the heartbeat timeout. Thank you~ Xintong Song On Mon, May 17, 2021 at 11:12 AM Smile wrote: > JM log shows this: > > INFO org.apache.flink.yarn.Ya

Re: reactive mode and back pressure

2021-05-16 Thread Xintong Song
with both the default and the new reactive modes. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/checkpoints/#unaligned-checkpoints On Fri, May 14, 2021 at 11:29 PM Alexey Trenikhun wrote: > Hello, > > Is new reactive

Re: How does JobManager terminate dangling task manager

2021-05-13 Thread Xintong Song
by the checkpointing mechanism. The new task does not resume from the exact position where the old task is stopped. Instead, it resumes from the last successful checkpoint. Thank you~ Xintong Song On Thu, May 13, 2021 at 5:38 PM Guowei Ma wrote: > Hi, > In fact, not only JobManager(ResoruceM
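The resume behaviour described in the reply can be pictured with a toy model. It assumes, purely for illustration, that a checkpoint completes after every fixed number of records and that positions are record counts; neither assumption comes from the thread, and the function is not a Flink API.

```python
def resume_position(failed_at, checkpoint_interval):
    """Return the position of the last completed checkpoint before a failure."""
    return (failed_at // checkpoint_interval) * checkpoint_interval


# The old task stopped after record 17; checkpoints completed at 5, 10 and 15.
print(resume_position(failed_at=17, checkpoint_interval=5))  # 15
```

In this model the new task starts again from position 15, so records 16 and 17 are processed a second time rather than being skipped.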

Re: Question regarding cpu limit config in Flink standalone mode

2021-05-06 Thread Xintong Song
`kubernetes.taskmanager.cpu` controls the cpu resource of pods Flink requests from Kubernetes. Thank you~ Xintong Song On Fri, May 7, 2021 at 10:35 AM Fan Xie wrote: > Hi Flink Community, > > Recently I am working on an auto-scaling project that needs to dynamically > adjust the cpu config of Flin

Re: [ANNOUNCE] Apache Flink 1.13.0 released

2021-05-05 Thread Xintong Song
Thanks Dawid & Guowei as the release managers, and everyone who has contributed to this release. Thank you~ Xintong Song On Thu, May 6, 2021 at 9:51 AM Leonard Xu wrote: > Thanks Dawid & Guowei for the great work, thanks everyone involved. > > Best, > Leonard >

Re: [ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Xintong Song
Thanks for driving this, Konstantin. Great job~! Thank you~ Xintong Song On Thu, Apr 22, 2021 at 11:57 PM Matthias Pohl wrote: > Thanks for setting this up, Konstantin. +1 > > On Thu, Apr 22, 2021 at 11:16 AM Konstantin Knauf > wrote: > >> Hi everyone, >>

Re: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Xintong Song
These metrics should also be available via REST. You can check the original design doc [1] for which metrics the UI is using. Thank you~ Xintong Song [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager On Tue, Apr 13, 2021 at 9:08 PM Alexis Sarda

Re: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Xintong Song
' and is not controlled by the garbage collectors. Thank you~ Xintong Song On Tue, Apr 13, 2021 at 7:53 PM Alexis Sarda-Espinosa < alexis.sarda-espin...@microfocus.com> wrote: > Hello, > > > > I have a Flink TM configured with taskmanager.memory.managed.size: 1372m. > There is a str

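Re: Re: CPU and memory resource allocation in Flink

2021-04-12 Thread Xintong Song
The log in your screenshot also clearly shows the size of each memory component. Heap is only one of them; all of them added together make up the 1728m you configured. Adjusting the configuration can let the TM use more memory; whether that improves performance depends on whether your job's bottleneck is memory. If the bottleneck is CPU, IO, or even the upstream data source, simply increasing memory will not help much. Thank you~ Xintong Song On Mon, Apr 12, 2021 at 10:32 AM penguin. wrote: > Thanks! Since I use one machine as one TM, the default taskmanager.memory.process.size in the Flink configuration file >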

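Re: CPU and memory resource allocation in Flink

2021-04-11 Thread Xintong Song
Part of the native memory may be overused, and CPU may be overused as well. However, K8s/Yarn environments usually provide external resource limits, such as disallowing overuse or allowing only a certain ratio of overuse; this depends on the specific environment configuration. You can take a look at the official documents on the memory model and configuration [1]. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/deployment/memory/mem_setup.html On Sun, Apr 11, 2021 at 9:16 PM p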

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-28 Thread Xintong Song
+1 It's already a matter of fact for a while that we no longer port new features to the Mesos deployment. Thank you~ Xintong Song On Fri, Mar 26, 2021 at 10:37 PM Till Rohrmann wrote: > +1 for officially deprecating this component for the 1.13 release. > > Cheers, > Till >

Re: Flink deployed on Yarn fails with: Maximum Memory: 8192 Requested: 10240MB. Please check the 'yarn.scheduler.maximum-allocation-mb' and the 'yarn.nodemanager.resource.memory-mb' configuration values

2021-03-21 Thread Xintong Song
The error message already explains it: the largest container your Yarn cluster configuration allows is 2g, while the TM size in your Flink configuration is 10g. Thank you~ Xintong Song On Sat, Mar 20, 2021 at 7:52 PM william <712677...@qq.com> wrote: > org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't > deploy Yarn ses

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Xintong Song
If all the tasks have the same parallelism 36, your job should only allocate 36 slots. The evenly-spread-out-slots option should help in your case. Is it possible for you to share the complete jobmanager logs? Thank you~ Xintong Song On Tue, Mar 16, 2021 at 12:46 AM Aeden Jameson wrote

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Xintong Song
subtask of it, and there's no guarantee which 36 out of the 54 contain it. Thank you~ Xintong Song On Mon, Mar 15, 2021 at 3:54 AM Chesnay Schepler wrote: > Is this a brand-new job, with the cluster having all 18 TMs at the time > of submission? (or did you add more TMs while the job was running)

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2021-03-07 Thread Xintong Song
Hi Hemant, I don't see any problem in your settings. Any exceptions suggesting why TM containers are not coming up? Thank you~ Xintong Song On Sat, Mar 6, 2021 at 3:53 PM bat man wrote: > Hi Xintong Song, > I tried using the java options to generate heap dump referring to docs[1] >

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2021-03-05 Thread Xintong Song
]. If a memory leak is suspected, to further understand where the memory is consumed, you may need to dump the heap on OOMs and look for unexpected memory usage with profiling tools. Thank you~ Xintong Song [1] https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html

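Re: Question about using yarn.containers.vcores

2021-03-04 Thread Xintong Song
Which Flink version are you using? Is the deployment mode per-job or session? Where exactly did you see that "the job configuration parameter took effect"? Thank you~ Xintong Song On Thu, Mar 4, 2021 at 4:35 PM 阿华田 wrote: > I use -yD yarn.containers.vcores=4 > to set the total number of CPU cores for the Flink job, and I can see that the job configuration parameter took effect, but the cores actually requested are still one per slot. > Does using yarn.containers.vcores also require enabling CPU scheduling in Yarn?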

Re: Scaling Higher than 10k Nodes

2021-03-04 Thread Xintong Song
, such as tremendous memory consumption, busy rpc main thread, etc. To make that case work, we did many optimizations on our internal flink version, which we are trying to contribute to the community. See FLINK-21110 [1] for the details. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse

Re: Flink problem

2021-02-19 Thread Xintong Song
What you're looking for might be Session Window[1]. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/stream/operators/windows.html#session-windows On Fri, Feb 19, 2021 at 7:35 PM ゞ野蠻遊戲χ wrote: > hi all > > For example, if A use
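Since the original question is truncated, the following is only a general sketch of what a session window does: events are grouped together until a period of inactivity (the gap) separates them. The function name and the sample timestamps are assumptions; this is plain Python, not the Flink windowing API.

```python
def session_windows(timestamps, gap):
    """Group event timestamps into sessions separated by at least `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session while the event arrives before the gap elapses.
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


print(session_windows([1, 2, 3, 10, 11, 30], gap=5))
# [[1, 2, 3], [10, 11], [30]]
```

Unlike fixed-size windows, the session boundaries here depend entirely on the data: a session stays open for as long as events keep arriving within the gap.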

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
bytes > INFO [] - Network: 128.000mb (134217730 bytes) > INFO [] - JVM Metaspace: 256.000mb (268435456 bytes) > INFO [] - JVM Overhead: 192.000mb (201326592 bytes) Thank you~ Xintong Song On Tue, Feb 2, 2021 at 8

Re: Memory usage increases on every job restart resulting in eventual OOMKill

2021-02-02 Thread Xintong Song
Hi Randal, The image is too blurred to be clearly seen. I have a few questions. - IIUC, you are using the standalone K8s deployment [1], not the native K8s deployment [2]. Could you confirm that? - How is the memory measured? Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink

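Re: flink on yarn, the relationship between JobManager and ApplicationMaster

2021-02-02 Thread Xintong Song
Your earlier understanding is correct. Yarn's AM is Flink's JM. The documentation you saw is problematic. I checked the git history: the passage you quoted was written in 2014 and presumably describes the on-Yarn deployment from the early days of the project, which has long been outdated. This content has been removed from the latest 1.12 documentation. Thank you~ Xintong Song On Tue, Feb 2, 2021 at 6:43 PM lp <973182...@qq.com> wrote: > Or, put differently: I know that for MapReduce jobs, the ApplicationMaster implementation is MRAp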

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-31 Thread Xintong Song
any issue related to the upgrading of the ZK version that may cause the leadership loss. Thank you~ Xintong Song On Sun, Jan 31, 2021 at 4:14 AM Colletta, Edward wrote: > “but I'm not aware of any similar issue reported since the upgrading” > > For the record, we experienced this same er

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-29 Thread Xintong Song
Thank you~ Xintong Song On Sat, Jan 30, 2021 at 8:27 AM Xintong Song wrote: > There's indeed a ZK version upgrading during 1.9 and 1.11, but I'm not > aware of any similar issue reported since the upgrading. > I would suggest the following: > - Turn on the DEBUG log see if

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-29 Thread Xintong Song
issue. Thank you~ Xintong Song On Sat, Jan 30, 2021 at 6:47 AM Lu Niu wrote: > Hi, Xintong > > Thanks for replying. Could it relate to the zk version? We are a platform > team at Pinterest in the middle of migrating from 1.9.1 to 1.11. Both 1.9 > and 1.

[ANNOUNCE] Apache Flink 1.10.3 released

2021-01-28 Thread Xintong Song
in Jira: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348668 We would like to thank all contributors of the Apache Flink community who made this release possible! Regards, Xintong Song

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2021-01-28 Thread Xintong Song
about timeout, and there's no gc issue spotted, I would consider a network instability. Thank you~ Xintong Song On Fri, Jan 29, 2021 at 3:15 AM Lu Niu wrote: > After checking the log I found the root cause is zk client timeout on TM: > ``` > 2021-01-25 14:01:49

Re: Flink 1.12 batch mode: words with a count of 1 are not printed in word count

2021-01-28 Thread Xintong Song
You are probably using 1.12.0. This is a known issue [1]; it is fixed in 1.12.1. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-20764 On Thu, Jan 28, 2021 at 4:55 PM xhyan0427 <15527609...@163.com> wrote: > Code: > val env = StreamExecutionEnvironment.getExecutio

Re: pyflink 1.12.1 has no installation files for python 3.8

2021-01-23 Thread Xintong Song
It can also be downloaded from the official Apache site. https://dist.apache.org/repos/dist/release/flink/flink-1.12.1/python/ Thank you~ Xintong Song On Sun, Jan 24, 2021 at 11:39 AM macdoor wrote: > Thanks! Sorry for not reading the docs carefully. Where can I now download a pre-built pyflink 1.12.1 for Python 3.8 on Linux? > I would not feel comfortable with one I built myself > > > > -- > Sen

Re: pyflink 1.12.1 has no installation files for python 3.8

2021-01-23 Thread Xintong Song
flinkDev/building.html#build-pyflink> > if >needed. > > Thank you~ Xintong Song [1] https://flink.apache.org/news/2021/01/19/release-1.12.1.html On Sun, Jan 24, 2021 at 9:12 AM macdoor wrote: > pyflink 1.12.1 cannot be installed on Linux with python 3.8; the highest available is 1.12.0. Looking at the installation files provided > https://p

Re: How does FlinkUserCodeClassLoader unload classes in a session-mode cluster

2021-01-21 Thread Xintong Song
For a standalone cluster in session mode, the job's main method should be executed on the client side. Thank you~ Xintong Song On Fri, Jan 22, 2021 at 9:52 AM Asahi Lee <978466...@qq.com> wrote: > Hello! > > I am using flink-1.12.0 with a single-machine cluster. In the main method of my Flink program, I used URLClassLoader to load a jar, > http://a.jar

Re: flink heartbeat timeout

2021-01-20 Thread Xintong Song
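1. A 50s timeout should usually be sufficient. I suggest checking whether there was network jitter in the environment at the time of the timeout, or whether the JM/TM processes were unresponsive due to long GC pauses. 2. Currently, Flink cluster configuration cannot be hot-updated without a restart. Thank you~ Xintong Song On Thu, Jan 21, 2021 at 11:39 AM guoxb__...@sina.com wrote: > Hi > > *Problem description:* > > > I am using Flink for a stream processing job. My program was started at 21:00 last night; at that time it looked normal and data was processed normally,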

Re: Locating a Pyflink JVM Metaspace memory leak

2021-01-20 Thread Xintong Song
cc @Jark It looks like a problem with the JDBC connector. Are you familiar with this part? Or do you know who is? Thank you~ Xintong Song On Wed, Jan 20, 2021 at 8:07 PM YueKun wrote: > hi, not sure whether the image is visible; the analysis of the data exported by Jmap is as follows: < > http://apache-flink.147419.n8.nabble.com/file/t1276/WX20210120-191436.png> > > > > > -- > Sent from:

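Re: Locating a Pyflink JVM Metaspace memory leak

2021-01-20 Thread Xintong Song
Who created the JDBC connection? Can you find the relevant call stack? Is it the connector provided by Flink or user code? Thank you~ Xintong Song On Wed, Jan 20, 2021 at 6:32 PM YueKun wrote: > At the moment the leak appears to be caused by the MySQL JDBC driver, the same as > > http://apache-flink.147419.n8.nabble.com/1-11-1-OutOfMemoryError-Metaspace-td8367.html#a8399 > this issue. Is there any solution? Do I need to replace mys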

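Re: Locating a Pyflink JVM Metaspace memory leak

2021-01-19 Thread Xintong Song
Which version of Flink are you using? Flink has some known, already-fixed metaspace leak issues [1] [2]; see whether they match your case. Besides, it may also be related to your own code or to the dependencies you use. To locate the problem, use jstack / jmap to check whether there are leftover threads / objects from previous jobs. Thank you~ Xintong Song [1] https://issues.apache.org/jira/browse/FLINK-16408 [2] https://issues.apache.org/jira/browse/FLINK-20333 On Tue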

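Re: Startup exception with 1.12.0 in on-yarn per-job mode

2021-01-19 Thread Xintong Song
Check whether your job jar also bundles the hadoop dependencies. Normally the hadoop dependencies should be set to provided; if the job really needs a hadoop version different from the yarn cluster's, it needs to be shaded. Thank you~ Xintong Song On Tue, Jan 19, 2021 at 3:31 PM guanyq wrote: > Judging from the error, it conflicts with hadoop-common-2.7.4.jar, but I don't know how to resolve it. > help > 2021-0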

[ANNOUNCE] Apache Flink 1.12.1 released

2021-01-18 Thread Xintong Song
The Apache Flink community is very happy to announce the release of Apache Flink 1.12.1, which is the first bugfix release for the Apache Flink 1.12 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: Resource changed on src filesystem after upgrade

2021-01-18 Thread Xintong Song
as `yarn.ship-files`, `yarn.ship-archives` or `yarn.provided.lib.dirs`? This helps us to locate the code path that this file went through. Thank you~ Xintong Song On Sun, Jan 17, 2021 at 10:32 PM Mark Davis wrote: > Hi all, > I am upgrading my DataSet jobs from Flink 1.8 to 1.12. > After th

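Re: Regarding the Apache Flink directory traversal vulnerability advisory detected by Tencent Security, will the community fix it for earlier versions?

2021-01-09 Thread Xintong Song
The community currently has no plan to fix this for earlier versions. In general, the Flink community only maintains the two most recent versions. That is, the latest version is currently 1.12, so bugfixes go into 1.11 and 1.12, and once 1.13 is released, 1.11 will no longer be maintained. Thank you~ Xintong Song On Fri, Jan 8, 2021 at 6:24 PM zhouyajun wrote: > Report link: https://s.tencent.com/research/bsafe/1215.html > > > > -- > Sent from: h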

Re: How does Flink handle shorted lived keyed streams

2020-12-24 Thread Xintong Song
I believe what you are looking for is the State TTL [1][2]. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl [2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-exec-state
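To illustrate the idea behind state time-to-live for short-lived keys, here is a minimal toy model in plain Python. The class name, its methods, and the injectable clock are all hypothetical and exist only for this sketch; Flink's actual State TTL is configured through the APIs described in the linked documentation.

```python
import time


class TtlState:
    """Toy keyed state whose entries expire `ttl_seconds` after the last write."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time of last write)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if self._clock() - written_at >= self._ttl:
            # Expired: drop the entry so state for dead keys does not accumulate.
            del self._entries[key]
            return None
        return value


# A fake clock makes the expiry visible without waiting.
now = [0.0]
state = TtlState(ttl_seconds=10, clock=lambda: now[0])
state.put("user-42", "active")
print(state.get("user-42"))  # active
now[0] = 10.0
print(state.get("user-42"))  # None
```

The point of the sketch is that state for a key disappears on its own once the key stops being updated, so short-lived keys do not make the state grow without bound.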

[ANNOUNCE] Apache Flink 1.11.3 released

2020-12-18 Thread Xintong Song
The Apache Flink community is very happy to announce the release of Apache Flink 1.11.3, which is the third bugfix release for the Apache Flink 1.11 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2020-12-17 Thread Xintong Song
I'm not aware of any significant changes to the HA components between 1.9/1.11. Would you mind sharing the complete jobmanager/taskmanager logs? Thank you~ Xintong Song On Fri, Dec 18, 2020 at 8:53 AM Lu Niu wrote: > Hi, Xintong > > Thanks for replying and your suggestion. I

Re: flink 1.12 RocksDBStateBackend error

2020-12-17 Thread Xintong Song
https://issues.apache.org/jira/browse/FLINK-20646 Thank you~ Xintong Song On Thu, Dec 17, 2020 at 11:40 PM zhisheng wrote: > hi, xintong > > Is there a corresponding issue ID? > > Xintong Song wrote on Thu, Dec 17, 2020 at 4:48 PM: > > > It is indeed a bug in 1.12.0. > > Everywhere we use state, we should declare ManagedMe

Re: flink 1.12 RocksDBStateBackend error

2020-12-17 Thread Xintong Song
It is indeed a bug in 1.12.0. Everywhere we use state, we should declare ManagedMemoryUseCase.STATE_BACKEND. A newly added ReduceTransformation does not make this declaration, so every job involving this operator runs into problems when using RocksDB. I will create an issue right away; we may need to push the community to expedite a bugfix release. Thank you~ Xintong Song On Thu, Dec 17, 2020 at 11:05 AM HunterXHunter <1356469...@qq.com> wrote:

Re: Flink 1.11 job hit error "Job leader lost leadership" or "ResourceManager leader changed to new address null"

2020-12-16 Thread Xintong Song
into the ZooKeeper logs checking why RM's leadership is revoked. Thank you~ Xintong Song On Thu, Dec 17, 2020 at 8:42 AM Lu Niu wrote: > Hi, Flink users > > Recently we migrated to flink 1.11 and see exceptions like: > ``` > 2020-12-15 12

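Re: Re: Direct memory overflow

2020-12-16 Thread Xintong Song
The beginning of the log file prints environment information, including the JVM parameters. Thank you~ Xintong Song On Wed, Dec 16, 2020 at 10:01 PM aven wrote: > Thanks for the reply; I'll try these two parameters. > I have another question: is there a way to view Flink's memory configuration parameters at startup or at runtime? > Or can they be printed in the log at startup? > > -- > > Best! > Aven >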

Re: Direct memory overflow

2020-12-16 Thread Xintong Song
conditions permit, upgrade to a newer version as soon as possible. Thank you~ Xintong Song On Wed, Dec 16, 2020 at 7:07 PM 巫旭阳 wrote: > The error message is as follows > Caused by: java.lang.OutOfMemoryError: Direct buffer memory at > java.nio.Bits.reserveMemory(Bits.java:693) > > at java.nio.DirectByteBuffer.<init>(DirectByte

Re: Flink 1.10.0 on yarn: job submission fails

2020-12-15 Thread Xintong Song
can run. Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/hadoop.html#adding-hadoop-to-lib On Sat, Dec 12, 2020 at 8:05 PM Jacob <17691150...@163.com> wrote: > Hello, what causes this problem when submitting a job with flink 1.10.0 on yarn? Is it a hadoop > jar dependency issue? This program on 1.1

Re: flink 1.11.2 on yarn: available slots always 0, job cannot be submitted

2020-12-09 Thread Xintong Song
Could you share the jobmanager logs? Also, check whether yarn has allocated containers for the taskmanagers; if so, fetch the taskmanager logs through yarn. Thank you~ Xintong Song On Thu, Dec 10, 2020 at 9:55 AM Jacob <17691150...@163.com> wrote: > < > http://apache-flink.147419.n8.nabble.com/file/t1162/Screenshot_2020-12-09_153858.

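Re: Problem loading external files in the Flink-yarn module

2020-12-06 Thread Xintong Song
/browse/FLINK-20505 Thank you~ Xintong Song On Mon, Dec 7, 2020 at 10:03 AM zhou chao wrote: > hi all, I recently ran into a small problem on 1.11 when using io.extra-file to load an external http file > > When FileSystem.getFileStatus fetches the status of an http file, it goes through the getFileStatus method of the HttpFileSystem class, and the FileStatus that method returns has a length of -1. > In cl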

Re: taskmanager.cpu.cores 1.7976931348623157E308

2020-12-06 Thread Xintong Song
FYI, I've opened FLINK-20503 for this. https://issues.apache.org/jira/browse/FLINK-20503 Thank you~ Xintong Song On Mon, Dec 7, 2020 at 11:10 AM Xintong Song wrote: > I forgot to mention that it is designed that task managers always have > `Double#MAX_VALUE` cpu cores in local exe
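
The figure in the subject line is not arbitrary: 1.7976931348623157E308 is how Java prints `Double.MAX_VALUE`, the value the reply above describes for task managers in local execution. A minimal Java sketch to check this; the class and method names are hypothetical and are not taken from Flink's code.

```java
public class CpuCoresSentinel {

    // Hypothetical helper: formats a core count the way Java prints a double.
    static String render(double cpuCores) {
        return String.valueOf(cpuCores);
    }

    // Hypothetical helper: true when the value equals Double.MAX_VALUE.
    static boolean isPlaceholder(double cpuCores) {
        return cpuCores == Double.MAX_VALUE;
    }

    public static void main(String[] args) {
        // The value as it appears in the thread subject.
        double reported = Double.parseDouble("1.7976931348623157E308");

        System.out.println(render(Double.MAX_VALUE)); // prints 1.7976931348623157E308
        System.out.println(isPlaceholder(reported));  // prints true
    }
}
```

A check like `isPlaceholder` is one way to tell a "no limit configured" value apart from a real core count before displaying it.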

Re: taskmanager.cpu.cores 1.7976931348623157E308

2020-12-06 Thread Xintong Song
for users. Will file an issue on that. Thank you~ Xintong Song On Mon, Dec 7, 2020 at 11:03 AM Xintong Song wrote: > Hi Rex, > > We're running this in a local environment so that may be contributing to >> what we're seeing. >> > Just to double check on this. By `local envi

Re: taskmanager.cpu.cores 1.7976931348623157E308

2020-12-06 Thread Xintong Song
in such cases. - kubernetes.jobmanager.cpu - kubernetes.taskmanager.cpu - yarn.appmaster.vcores - yarn.containers.vcores - mesos.resourcemanager.tasks.cpus Thank you~ Xintong Song [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_set

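Re: Re: flink-1.11.2 out-of-memory problem

2020-11-16 Thread Xintong Song
In theory there is no hard limit on how many slots a TM can be split into, but higher parallelism does not necessarily mean better performance. Increasing parallelism increases the job's memory requirements. Too many slots on a TM can cause heavy GC pressure, insufficient network memory, OOM, and similar problems. In addition, more slots on the same TM means more running tasks, which also puts some pressure on the framework. I suggest first observing the TM's CPU usage; only if the job really lacks processing capacity (growing latency, backpressure) while the TM container's (multi-core) CPU utilization stays low should you consider increasing parallelism. Thank you~ Xintong Song On Tue, Nov 17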

Re: Re: flink-1.11.2 out-of-memory problem

2020-11-16 Thread Xintong Song
> > OK, thanks for the reply. Can the taskmanager.memory.task.off-heap.size parameter be set dynamically with the code below? > > streamTableEnv.getConfig().getConfiguration().setString(key, value); > No, this is a cluster configuration. It can be set in the flink-conf.yaml configuration file, or specified dynamically with -yD key=value when submitting the job. Thank you~ Xintong Song On Tue, Nov 17,

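Re: flink-1.11.2 out-of-memory problem

2020-11-16 Thread Xintong Song
What is the deployment mode? standalone? After the job ran for a while and failed, were all TMs restarted when you reran it, or were some of the previous TMs reused? Thank you~ Xintong Song On Mon, Nov 16, 2020 at 5:53 PM 史 正超 wrote: > We use rocksdb, parallelism is 5, 1 tm, 5 slots, tm memory is set to > 10G, and starting the job reports the error below. It started successfully before, but after running for a while it also reported an out-of-memory error; then, when we restarted the job from the original offset, it would not start at all. > > 2020-11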

Re: Flink AutoScaling EMR

2020-11-15 Thread Xintong Song
ntains on the decommissioning node will be killed. Thank you~ Xintong Song On Fri, Nov 13, 2020 at 2:57 PM Robert Metzger wrote: > Hi, > it seems that YARN has a feature for targeting specific hardware: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.htm
