Re: What is the state of Scala wrappers?

2023-02-13 Thread Alexey Novakov via user
I think this blog article is good enough for the Scala support awareness: https://flink.apache.org/2022/02/22/scala-free.html In my opinion, it would be much better if one of the Scala-wrappers community project moves under umbrella of *org.apache.flink *to have a chance to survive longer. Alexey

Flink程序内存Dump不了

2023-02-13 Thread lxk
Flink version:1.16 java version: jdk1.8.0_251 问题:最近上线的Flink程序,频繁young gc,几秒一次,在调整了新生代大小之后,还是没有解决,目前整个jvm堆大小是3.57g。因此想通过程序内存情况来分析哪里问题有问题,我们通过yarn上的applicationId,使用ps -ef|grep 1666758697316_2639108,找到对应的pid,最后执行 jmap -dump:format b,file=user.dump 26326

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
Hi Alexis, If we change the operator uid and restart the job, the job will not be started successfully[1]. We have to use --allowNonRestoredState to start it. This means that the state for the old uid will not be used in the operator with the new uid. I think the data in the state will be lost.

Re: Updating Scala package names while preserving state

2023-02-13 Thread Thomas Eckestad
My conclusions. First, I think it would be good to clarify the background. The class for which I changed the package/namespace is a POJO class which is part of the applications state. According to the official Flink documentation on state evolution: Class name of the POJO type cannot change,

Re: Pyflink Side Output Question and/or suggested documentation change

2023-02-13 Thread Andrew Otto
Thank you! On Mon, Feb 13, 2023 at 5:55 AM Dian Fu wrote: > Thanks Andrew, I think this is a valid advice. I will update the > documentation~ > > Regards, > Dian > > , > > On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto wrote: > >> Question about side outputs and OutputTags in pyflink. The docs

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Alexis Sarda-Espinosa
Hi Hang, Thanks for the confirmation. One follow-up question with a somewhat convoluted scenario: 1. An unaligned checkpoint is created. 2. I stop the job *without* savepoint. 3. I want to start a modified job from the checkpoint, but I changed one of the operator's uids. If the

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
ps: the savepoint will also not contain in-flight data. Best, Hang Hang Ruan 于2023年2月13日周一 19:31写道: > Hi Alexis, > > No, aligned checkpoint will not contain the in-flight. Aligned checkpoint > makes sure that the data before the barrier has been processed and there is > no need to store

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
Hi Alexis, No, aligned checkpoint will not contain the in-flight. Aligned checkpoint makes sure that the data before the barrier has been processed and there is no need to store in-flight data for one checkpoint. I think these documents[1][2] will help you to understand it. Best, Hang [1]

Re: Pyflink Side Output Question and/or suggested documentation change

2023-02-13 Thread Dian Fu
Thanks Andrew, I think this is a valid advice. I will update the documentation~ Regards, Dian , On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto wrote: > Question about side outputs and OutputTags in pyflink. The docs >

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Theodor Wübker
Hey Hector, thanks for your reply. Your assumption is entirely correct, I have a few Million datasets on the topic already to test a streaming use case. I am planning on testing it with a variety of settings, but the problems occur with any cluster-configuration. For example Parallelism 1 with

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Hector Rios
Hi Theo In your initial email, you mentioned that you have "a bit of Data on it" when referring to your topic with ten partitions. Correct me if I'm wrong, but that sounds like the data in your topic is bounded and trying to test a streaming use-case. What kind of parallelism do you have

运行中的作业状态清除操作

2023-02-13 Thread Jason_H
遇到的问题:在flink中,使用状态的地方已经设置了ttl, 但是上下游把数据清空了,然后想重新测试发现, flink计算的结果不是从0开始累计的,是基于之前的结果继续累计的,这种情况在不重启作业的情况下 有什么办法处理吗? 具体来说就是,在不停止flink作业的情况下,怎么清楚运行中的flink作业的状态数据,以达到计算结果归0开始累计。 在不重启作业的情况下,清空状态数据是不是和重启作业运行是一样的效果,避免状态累计数据对结果产生影响呢。 | | Jason_H | | hyb_he...@163.com |

How to use Postgres UUID types in Flink SQL?

2023-02-13 Thread Frank Lyaruu
HI Flink community, I'm trying to use a JDBC dataset from Postgres. That dataset contains a column of the postgres type UUID. I don't know how to 'type' this column in SQL. If I try: CREATE TABLE my_table (id: STRING) WITH () I get this error: java.lang.ClassCastException: class java.util.UUID