Re: Re:flink s3 checkpoint 一直IN_PROGRESS(100%)直到失败

2022-03-08 Thread Yun Tang
Hi 一般是卡在最后一步从JM写checkpoint meta上面了,建议使用jstack等工具检查一下JM的cpu栈,看问题出在哪里。 祝好 唐云 From: Sun.Zhu <17626017...@163.com> Sent: Tuesday, March 8, 2022 14:12 To: user-zh@flink.apache.org Subject: Re:flink s3 checkpoint 一直IN_PROGRESS(100%)直到失败 图挂了

Re: Move savepoint to another s3 bucket

2022-03-08 Thread Dawid Wysakowicz
Hi Lukas, I am afraid you're hitting this bug: https://issues.apache.org/jira/browse/FLINK-25952 Best, Dawid On 08/03/2022 16:37, Lukáš Drbal wrote: Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error. We are using lyft

Problem about adding custom kryo serializer

2022-03-08 Thread guoliubin85
Hi, I have an entity class built by Google Flatbuf, to raise the performance, I have tried written a serializer class. public class TransactionSerializer extends Serializer { @Override public void write(Kryo kryo, Output output, Transaction transaction) { ByteBuffer

?????? Flink????????????

2022-03-08 Thread hjw
streaming api ??sql api streaming api ---- ??: "user-zh"

Re: Flatmap operator in an Asynchronous call

2022-03-08 Thread Diwakar Jha
Thanks Gen, I will look into customized Source and SpiltEnumerator. On Mon, Mar 7, 2022 at 10:20 PM Gen Luo wrote: > Hi Diwakar, > > An asynchronous flatmap function without the support of the framework can > be problematic. You should not call collector.collect outside the main > thread of the

Re: flink on yarn任务停止发生异常

2022-03-08 Thread Jiangang Liu
异常提示的很明确了,做savepoint的过程中有的task不在running状态,你可以看下你的作业是否发生了failover。 QiZhu Chan 于2022年3月8日周二 17:37写道: > Hi, > > 各位社区大佬们,帮忙看一下如下报错是什么原因造成的?正常情况下客户端日志应该返回一个savepoint路径,但却出现如下异常日志,同时作业已被停止并且查看hdfs有发现当前job产生的savepoint文件 > > > > >

Re: Flink计算机制疑问

2022-03-08 Thread Jiangang Liu
只存计算结果,来一条数据更新一次状态并且下发出去。具体可以参考下state: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/ hjw <1010445...@qq.com.invalid> 于2022年3月9日周三 01:32写道: > 如下一段sql:SELECT color, sum(id) FROM T GROUP BY >

Re: Left join query not clearing state after migrating from 1.9.0 to 1.14.3

2022-03-08 Thread Prakhar Mathur
Hi Roman, Thanks for the reply, here is the screenshot of the latest failed checkpoint. [image: Screenshot 2022-03-09 at 11.44.46 AM.png] I couldn't find the details for the last successful one as we only store the last 10 checkpoints' details. Also, can you give us an idea of exactly what

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Guowei Ma
Hi, Ken If you are talking about the Batch scene, there may be another idea that the engine automatically and evenly distributes the amount of data to be processed by each Stage to each worker node. This also means that, in some cases, the user does not need to manually define a Partitioner. At

Re: Re: k8s native session 问题咨询

2022-03-08 Thread Yang Wang
你用新版本试一下,看着是已经修复了 https://issues.apache.org/jira/browse/FLINK-19212 Best, Yang 崔深圳 于2022年3月9日周三 10:31写道: > > > > web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server > side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: > Failed to serialize the result for RPC call : >

Re:Re: k8s native session 问题咨询

2022-03-08 Thread 崔深圳
web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed to serialize the result for RPC call : requestMultipleJobDetails.\n\tat

Re:Re: k8s native session 问题咨询

2022-03-08 Thread 崔深圳
web ui报错:请求这个接口: /jobs/overview,时而报错, Exception on server side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed to serialize the result for RPC call : requestMultipleJobDetails.\n\tat

Re: k8s native session 问题咨询

2022-03-08 Thread yu'an huang
你好,路由到非master节点会有什么问题吗,非master节点在处理请求时应该会通过HA服务找到Active的Job manager, 然后向Active Job manager拿到结果再返回给client. > On 7 Mar 2022, at 7:46 PM, 崔深圳 wrote: > > k8s native session 模式下,配置了ha,job_manager 的数量为3,然后web ui,通过rest > service访问,总是路由到非master节点,有什么办法使其稳定吗?

Re: Controlling group partitioning with DataStream

2022-03-08 Thread Ken Krugler
Hi Dario, Just to close the loop on this, I answered my own question on SO. Unfortunately it seems like the recommended solution is to do the same hack I did a while ago, which is to generate (via trial-and-error) a key that gets assigned to the target slot. I was hoping for something a bit

replay kinesis events

2022-03-08 Thread Guoqin Zheng
Hi Flink experts, Wondering if there is a built-in way to replay already-processed events in an event queue. For example, if I have a flink app processing event stream from Kinesis. Now if I find a bug in the flink app and make a fix. And I would like to re-process events that are already

Re: Question about Flink counters

2022-03-08 Thread Shane Bishop
Hi, My issue has been resolved through discussion with AWS support. It turns out that Kinesis Data Analytics reports to CloudWatch in a way I did not expect. The way to view the accurate values for Flink counters is with Average in CloudWatch metrics. Below is the response from AWS support,

Re: PyFlink : submission via rest

2022-03-08 Thread aryan m
Thanks Dian! That worked ! On Sun, Mar 6, 2022 at 10:47 PM Dian Fu wrote: > The dedicated REST API is still not supported. However, you could try to > use PythonDriver just like you said and just submit it like a Java Flink > job. > > Regards, > Dian > > On Sun, Mar 6, 2022 at 3:38 AM aryan m

Flink????????????

2022-03-08 Thread hjw
sql??SELECT color, sum(id) FROM T GROUP BY colorFlinkTgroup by key??color)??Flink???

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Filip Karnicki
Hi Roman, Igal (@ below) Thank you for your answer. I don't think I'll have access to flink's lib folder given it's a shared Cloudera cluster. The only thing I could think of is to not include com.google.protobuf in the classloader.parent-first-patterns.additional setting, and including

Move savepoint to another s3 bucket

2022-03-08 Thread Lukáš Drbal
Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error. We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can see version 1.13.6-396a8d44-szn which is just internal build from flink commit

Re: MapState.entries()

2022-03-08 Thread Alexey Trenikhun
Thank you ! From: Schwalbe Matthias Sent: Monday, March 7, 2022 11:36:22 PM To: Alexey Trenikhun ; Flink User Mail List Subject: RE: MapState.entries() Hi Alexey, To my best knowledge it’s lazy with RocksDBStateBackend, using the Java iterator you could

RE: Incremental checkpointing & RocksDB Serialization

2022-03-08 Thread Schwalbe Matthias
Hi Vidya, As to the choice of serializer: * Flink provides two implementations that support state migration, AVRO serializer, and Pojo serializer * Pojo serializer happens to be one of the fastest available serializers (faster than AVRO) * If your record sticks to Pojo coding rules

Re: Flink Checkpoint Timeout

2022-03-08 Thread Mahantesh Patil
I see for every consequential checkpoint timeout fail , number of tasks which completed checkpointing keeps decreasing, why would that happen? Does flink try to process data beyond old checkpoint barrier which failed to complete due to timeout? On Tue, Mar 8, 2022 at 12:48 AM yidan zhao wrote:

Re: [statefun] hadoop dependencies and StatefulFunctionsConfigValidator

2022-03-08 Thread Roman Khachatryan
Hi Filip, Have you tried putting protobuf-java 3.7.1 into the Flink's lib/ folder? Or maybe re-writing the dependencies you mentioned to be loaded as plugins? [1] I don't see any other ways to solve this problem. Probably Chesnay or Seth will suggest a better solution. [1]

Savepoint API challenged with large savepoints

2022-03-08 Thread Schwalbe Matthias
Dear Flink Team, In the last weeks I was faced with a large savepoint (around 40GiB) that contained lots of obsolete data points and overwhelmed our infrastructure (i.e. failed to load/restart). We could not afford to lose the state, hence I spent the time to transcode the savepoint into

Re: Left join query not clearing state after migrating from 1.9.0 to 1.14.3

2022-03-08 Thread Roman Khachatryan
Hi Prakhar, Could you please share the statistics about the last successful and failed checkpoints, e.g. from the UI. Ideally, with detailed breakdown for the operators that seems problematic. Regards, Roman On Fri, Mar 4, 2022 at 8:48 AM Prakhar Mathur wrote: > > Hi, > > Can someone kindly

Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jingsong Li
Thanks all for your discussions. I'll share my opinion here: 1. Hive SQL and Hive-like SQL are the absolute mainstay of current Batch ETL in China. Hive+Spark (HiveSQL-like)+Databricks also occupies a large market worldwide. - Unlike OLAP SQL (such as presto, which is ansi-sql rather than hive

Re: Flink Checkpoint Timeout

2022-03-08 Thread yidan zhao
If the checkpoint timeout leads to the job's fail, then the job will be recovered and data will be reprocessed from the last completed checkpoint. If the job doesn't fail, then not. Mahantesh Patil 于2022年3月8日周二 14:47写道: > Hello Team, > > What happens after checkpoint timeout? > > Does Flink

Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Zou Dan
Hi Martijn, Thanks for bringing this up. Hive SQL (using in Hive & Spark) plays an important role in batch processing, it has almost become de facto standard in batch processing. In our company, there are hundreds of thousands of spark jobs each day. IMO, if we want to promote Flink batch, Hive

Re: Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Jark Wu
Hi Martijn, Thanks for starting this discussion. I think it's great for the community to to reach a consensus on the roadmap of Hive query syntax. I agree that the Hive project is not actively developed nowadays. However, Hive still occupies the majority of the batch market and the Hive