Hi
Usually this gets stuck at the final step, where the JM writes the checkpoint metadata. I suggest using jstack or a similar tool to inspect the JM's CPU stack and see where the problem lies.
Best,
Yun Tang
From: Sun.Zhu <17626017...@163.com>
Sent: Tuesday, March 8, 2022 14:12
To: user-zh@flink.apache.org
Subject: Re: flink s3 checkpoint stays IN_PROGRESS (100%) until it fails
The image didn't come through.
Hi Lukas,
I am afraid you're hitting this bug:
https://issues.apache.org/jira/browse/FLINK-25952
Best,
Dawid
On 08/03/2022 16:37, Lukáš Drbal wrote:
Hello everyone,
I'm trying to move savepoint to another s3 account but restore always
failed with some weird 404 error.
We are using lyft
Hi,
I have an entity class built by Google Flatbuf, to raise the performance, I
have tried written a serializer class.
public class TransactionSerializer extends Serializer<Transaction> {
@Override
public void write(Kryo kryo, Output output, Transaction transaction) {
ByteBuffer
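The snippet above is cut off, but the heart of such a write() is getting the FlatBuffers object's backing ByteBuffer out as a byte[]. A minimal, Kryo-free sketch of just that copy step (the non-zero starting position mirrors how a generated FlatBuffers dataBuffer() typically behaves; this is an illustration, not the poster's actual code):

```java
import java.nio.ByteBuffer;

public class ByteBufferCopy {
    // Copy the remaining (unread) bytes of a possibly offset ByteBuffer.
    static byte[] toBytes(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() keeps the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
        buf.position(2); // a FlatBuffers buffer often starts at a non-zero position
        byte[] bytes = toBytes(buf);
        System.out.println(bytes.length);    // 3
        System.out.println(bytes[0]);        // 3
        System.out.println(buf.remaining()); // 3: original position unchanged
    }
}
```

In an actual Kryo serializer, write() would then emit a length prefix followed by these bytes, and read() would reverse the process.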
streaming api ??sql api
streaming api
Thanks Gen, I will look into customized Source and SpiltEnumerator.
On Mon, Mar 7, 2022 at 10:20 PM Gen Luo wrote:
> Hi Diwakar,
>
> An asynchronous flatmap function without the support of the framework can
> be problematic. You should not call collector.collect outside the main
> thread of the
The exception message is quite clear: some tasks were not in RUNNING state while the savepoint was being taken. Check whether your job went through a failover.
QiZhu Chan wrote on Tue, Mar 8, 2022 at 17:37:
> Hi,
>
> Could anyone in the community help figure out what causes the error below? Normally the client log should return a savepoint path, but instead it prints the following exception; meanwhile the job has been stopped, and checking HDFS we do find the savepoint files produced by this job.
Store only the computed result: update the state once per incoming record and emit it downstream. For details see state:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/
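A plain-Java sketch of what this answer describes for the GROUP BY query (no Flink APIs involved; a HashMap stands in for keyed state): only one running sum is kept per key, and each incoming record updates it and emits the new result downstream:

```java
import java.util.HashMap;
import java.util.Map;

public class RunningSumSketch {
    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>(); // keyed state: color -> sum(id)
        Object[][] records = {{"red", 1L}, {"blue", 2L}, {"red", 3L}};
        for (Object[] r : records) {
            String color = (String) r[0];
            long id = (Long) r[1];
            long sum = state.merge(color, id, Long::sum); // update the per-key result
            System.out.println(color + "=" + sum);        // emit the updated result
        }
    }
}
```

Note that state size grows with the number of distinct keys, not with the number of input rows.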
hjw <1010445...@qq.com.invalid> wrote on Wed, Mar 9, 2022 at 01:32:
> For a piece of SQL like: SELECT color, sum(id) FROM T GROUP BY
>
Hi Roman,
Thanks for the reply, here is the screenshot of the latest failed
checkpoint.
[image: Screenshot 2022-03-09 at 11.44.46 AM.png]
I couldn't find the details for the last successful one as we only store
the last 10 checkpoints' details. Also, can you give us an idea of exactly
what
Hi, Ken
If you are talking about the Batch scene, there may be another idea that
the engine automatically and evenly distributes the amount of data to be
processed by each Stage to each worker node. This also means that, in some
cases, the user does not need to manually define a Partitioner.
At
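As a toy illustration of the even-distribution idea above (nothing engine-specific, just the arithmetic of splitting): n records divided into k near-equal contiguous splits, one per worker, with no user-defined Partitioner involved:

```java
public class EvenSplit {
    // Split n records into k near-equal parts, spreading the remainder
    // over the first (n % k) workers.
    static int[] splitSizes(int n, int k) {
        int[] sizes = new int[k];
        for (int i = 0; i < k; i++) {
            sizes[i] = n / k + (i < n % k ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = splitSizes(10, 3);
        System.out.println(sizes[0] + "," + sizes[1] + "," + sizes[2]); // 4,3,3
    }
}
```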
Try a newer version; this looks like it has already been fixed:
https://issues.apache.org/jira/browse/FLINK-19212
Best,
Yang
崔深圳 wrote on Wed, Mar 9, 2022 at 10:31:
> The web UI errors intermittently when requesting the endpoint /jobs/overview: Exception on server
> side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException:
> Failed to serialize the result for RPC call :
>
The web UI errors intermittently when requesting the endpoint /jobs/overview: Exception on server
side:\norg.apache.flink.runtime.rpc.akka.exceptions.AkkaRpcException: Failed to
serialize the result for RPC call : requestMultipleJobDetails.\n\tat
Hi, what problem does routing to a non-master node cause? A non-master node handling the request should locate the active JobManager through the HA service, fetch the result from the active JobManager, and return it to the client.
> On 7 Mar 2022, at 7:46 PM, 崔深圳 wrote:
>
> In k8s native session mode with HA configured and three JobManagers, web UI requests going through the rest
> service are always routed to a non-master node. Is there any way to make this stable?
Hi Dario,
Just to close the loop on this, I answered my own question on SO.
Unfortunately it seems like the recommended solution is to do the same hack I
did a while ago, which is to generate (via trial-and-error) a key that gets
assigned to the target slot.
I was hoping for something a bit
Hi Flink experts,
Wondering if there is a built-in way to replay already-processed events in
an event queue. For example, if I have a flink app processing event stream
from Kinesis. Now if I find a bug in the flink app and make a fix. And I
would like to re-process events that are already
Hi,
My issue has been resolved through discussion with AWS support.
It turns out that Kinesis Data Analytics reports to CloudWatch in a way I did
not expect. The way to view the accurate values for Flink counters is with
Average in CloudWatch metrics.
Below is the response from AWS support,
Thanks Dian! That worked !
On Sun, Mar 6, 2022 at 10:47 PM Dian Fu wrote:
> The dedicated REST API is still not supported. However, you could try to
> use PythonDriver just like you said and just submit it like a Java Flink
> job.
>
> Regards,
> Dian
>
> On Sun, Mar 6, 2022 at 3:38 AM aryan m
For SQL like: SELECT color, sum(id) FROM T GROUP BY color, does Flink keep the data of table T in state, or only state per group-by key (color)? How does Flink handle this?
Hi Roman, Igal (@ below)
Thank you for your answer. I don't think I'll have access to flink's lib
folder given it's a shared Cloudera cluster. The only thing I could think
of is to not include com.google.protobuf in the
classloader.parent-first-patterns.additional setting, and
including
Hello everyone,
I'm trying to move savepoint to another s3 account but restore always
failed with some weird 404 error.
We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can
see version 1.13.6-396a8d44-szn which is just internal build from flink
commit
Thank you !
From: Schwalbe Matthias
Sent: Monday, March 7, 2022 11:36:22 PM
To: Alexey Trenikhun ; Flink User Mail List
Subject: RE: MapState.entries()
Hi Alexey,
To my best knowledge it’s lazy with RocksDBStateBackend, using the Java
iterator you could
Hi Vidya,
As to the choice of serializer:
* Flink provides two implementations that support state migration, AVRO
serializer, and Pojo serializer
* Pojo serializer happens to be one of the fastest available serializers
(faster than AVRO)
* If your record sticks to Pojo coding rules
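For reference, a small sketch of what those Pojo coding rules look like in practice (the TxnRecord class and its fields are made up for illustration): a public class with a public no-argument constructor, and fields that are either public or exposed via public getters/setters. The reflection checks below are just a way to demonstrate the rules outside Flink:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class PojoRulesDemo {
    public static class TxnRecord {           // public class
        public String id;                     // public field: OK
        private long amount;                  // private field with getter/setter: OK
        public TxnRecord() {}                 // public no-arg constructor
        public long getAmount() { return amount; }
        public void setAmount(long amount) { this.amount = amount; }
    }

    public static void main(String[] args) throws Exception {
        // The no-arg constructor must exist and be public.
        Constructor<TxnRecord> c = TxnRecord.class.getConstructor();
        System.out.println(Modifier.isPublic(c.getModifiers()));
        System.out.println(Modifier.isPublic(TxnRecord.class.getModifiers()));
    }
}
```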
I see that with every consecutive checkpoint-timeout failure, the number of tasks
that completed checkpointing keeps decreasing; why would that happen? Does
Flink try to process data beyond the old checkpoint barrier that failed to
complete due to the timeout?
On Tue, Mar 8, 2022 at 12:48 AM yidan zhao wrote:
Hi Filip,
Have you tried putting protobuf-java 3.7.1 into Flink's lib/ folder?
Or maybe re-writing the dependencies you mentioned to be loaded as plugins? [1]
I don't see any other ways to solve this problem.
Probably Chesnay or Seth will suggest a better solution.
[1]
Dear Flink Team,
In the last weeks I was faced with a large savepoint (around 40GiB) that
contained lots of obsolete data points and overwhelmed our infrastructure (i.e.
failed to load/restart).
We could not afford to lose the state, hence I spent the time to transcode the
savepoint into
Hi Prakhar,
Could you please share the statistics about the last successful and
failed checkpoints, e.g. from the UI.
Ideally, with detailed breakdown for the operators that seems problematic.
Regards,
Roman
On Fri, Mar 4, 2022 at 8:48 AM Prakhar Mathur wrote:
>
> Hi,
>
> Can someone kindly
Thanks all for your discussions.
I'll share my opinion here:
1. Hive SQL and Hive-like SQL are the absolute mainstay of current
Batch ETL in China. Hive+Spark (HiveSQL-like)+Databricks also occupies
a large market worldwide.
- Unlike OLAP SQL (such as presto, which is ansi-sql rather than hive
If the checkpoint timeout causes the job to fail, the job will be
recovered and data will be reprocessed from the last completed checkpoint.
If the job doesn't fail, it won't be.
Mahantesh Patil wrote on Tue, Mar 8, 2022 at 14:47:
> Hello Team,
>
> What happens after checkpoint timeout?
>
> Does Flink
Hi Martijn,
Thanks for bringing this up.
Hive SQL (used in Hive & Spark) plays an important role in batch processing;
it has almost become the de facto standard there. In our company,
there are hundreds of thousands of spark jobs each day.
IMO, if we want to promote Flink batch, Hive
Hi Martijn,
Thanks for starting this discussion. I think it's great
for the community to reach a consensus on the roadmap
of Hive query syntax.
I agree that the Hive project is not actively developed nowadays.
However, Hive still occupies the majority of the batch market
and the Hive