Re: Flink Forward 2020 Recorded Sessions

2020-04-23 Thread Marta Paes Moreira
Hi, Sivaprasanna.

The talks will be up on YouTube sometime after the conference ends.

Today the schedule starts at a different time (9 AM CEST / 12:30 PM IST / 3 PM
CST), which is friendlier to Europe, India, and China. I hope you manage to join
some sessions!

Marta

On Fri, 24 Apr 2020 at 06:58, Sivaprasanna 
wrote:

> Hello,
>
> I had registered for Flink Forward 2020 and attended a couple of
> sessions, but due to the odd timings and overlapping sessions in the same
> slot, I wasn't able to attend some interesting talks. I have received mails
> with links to rewatch some 2-3 webinars, but not all of those that have happened so far.
> Where can I find the recorded sessions?
>
> Thanks,
> Sivaprasanna
>


Re: Task Assignment

2020-04-23 Thread Navneeth Krishnan
Hi Marta,

Thanks for your response. What I'm looking for is something like data
locality. If I have one TM which is processing a set of keys, I want to
ensure all keys of the same type go to the same TM rather than using
hashing to find the downstream slot. I could use a common key to do this,
but I need to parallelize as much as possible, since the number of
incoming messages is too large to narrow down to a single key and
process it.
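To make that concrete, here is a minimal Scala sketch of the keyBy approach
discussed below (the Event type and its eventType field are made-up placeholders,
not from our actual job):

```
import org.apache.flink.streaming.api.scala._

case class Event(eventType: String, payload: String)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val events: DataStream[Event] = env.fromElements(
  Event("A", "1"), Event("B", "2"), Event("A", "3"))

// All events with the same eventType are routed to the same
// parallel instance of the downstream keyed operator.
events
  .keyBy(_.eventType)
  .map(e => s"${e.eventType}: ${e.payload}")
  .print()

env.execute("keyBy example")
```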

Thanks

On Thu, Apr 23, 2020 at 2:02 AM Marta Paes Moreira 
wrote:

> Hi, Navneeth.
>
> If you *key* your stream using stream.keyBy(…), this will logically split
> your input, and all records with the same key will be processed in the
> same operator instance. This is the default behavior in Flink for keyed
> streams and is handled transparently.
>
> You can read more about it in the documentation [1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state-and-operator-state
>
> On Thu, Apr 23, 2020 at 7:44 AM Navneeth Krishnan <
> reachnavnee...@gmail.com> wrote:
>
>> Hi All,
>>
>> Is there a way for an upstream operator to know how the downstream
>> operator tasks are assigned? Basically I want to group my messages to be
>> processed on slots in the same node based on some key.
>>
>> Thanks
>>
>


Flink Forward 2020 Recorded Sessions

2020-04-23 Thread Sivaprasanna
Hello,

I had registered for Flink Forward 2020 and attended a couple of
sessions, but due to the odd timings and overlapping sessions in the same
slot, I wasn't able to attend some interesting talks. I have received mails
with links to rewatch some 2-3 webinars, but not all of those that have happened so far.
Where can I find the recorded sessions?

Thanks,
Sivaprasanna


Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-23 Thread Lu Niu
Hi, Robert

BTW, I did some investigation and I think it's possible to support the streaming
sink using the Presto S3 filesystem. That would let users rely on the Presto
S3 FS for all access to S3. I created this Jira ticket:
https://issues.apache.org/jira/browse/FLINK-17364 . What do you think?

Best
Lu
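For anyone following along, a minimal sketch of pointing checkpoints at the Presto
S3 filesystem (the bucket name is a placeholder; flink-s3-fs-presto registers the
s3:// and s3p:// schemes, while flink-s3-fs-hadoop registers s3a://):

```
# flink-conf.yaml (sketch)
state.backend: rocksdb
state.backend.incremental: true
# s3p:// explicitly selects the Presto S3 filesystem, so the Hadoop
# S3A code path is not used for checkpoints
state.checkpoints.dir: s3p://my-bucket/flink/checkpoints
```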

On Tue, Apr 21, 2020 at 1:46 PM Lu Niu  wrote:

> Cool, thanks!
>
> On Tue, Apr 21, 2020 at 4:51 AM Robert Metzger 
> wrote:
>
>> I'm not aware of anything. I think the presto s3 file system is generally
>> the recommended S3 FS implementation.
>>
>> On Mon, Apr 13, 2020 at 11:46 PM Lu Niu  wrote:
>>
>>> Thank you both. Given the debug overhead, I might just try out presto s3
>>> file system then. Besides that presto s3 file system doesn't support
>>> streaming sink, is there anything else I need to keep in mind? Thanks!
>>>
>>> Best
>>> Lu
>>>
>>> On Thu, Apr 9, 2020 at 12:29 AM Robert Metzger 
>>> wrote:
>>>
 Hey,
 Others have experienced this as well, yes:
 https://lists.apache.org/thread.html/5cfb48b36e2aa2b91b2102398ddf561877c28fdbabfdb59313965f0a%40%3Cuser.flink.apache.org%3EDiskErrorException
 I have also notified the Hadoop project about this issue:
 https://issues.apache.org/jira/browse/HADOOP-15915

 I agree with Congxian: You could try reaching out to the Hadoop user@
 list for additional help. Maybe logging on DEBUG level helps already?
 If you are up for an adventure, you could also consider adding some
 debugging code into Hadoop's DiskChecker and compile a custom Hadoop
 version.

 Best,
 Robert


 On Thu, Apr 9, 2020 at 6:39 AM Congxian Qiu 
 wrote:

> Hi LU
>
> I'm not familiar with S3 file system, maybe others in Flink community
> can help you in this case, or maybe you can also reach out to s3
> teams/community for help.
>
> Best,
> Congxian
>
>
> Lu Niu  wrote on Wed, Apr 8, 2020 at 11:05 AM:
>
>> Hi, Congxiao
>>
>> Thanks for replying. Yeah, I also found those references. However, as
>> I mentioned in the original post, there is enough capacity on all disks. Also,
>> when I switch to the Presto file system, the problem goes away. I'm wondering
>> whether others have encountered a similar issue.
>>
>> Best
>> Lu
>>
>> On Tue, Apr 7, 2020 at 7:03 PM Congxian Qiu 
>> wrote:
>>
>>> Hi
>>> From the stack, it seems the problem is
>>> "org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.
>>> util.DiskChecker$DiskErrorException: Could not find any valid local
>>> directory for s3ablock-0001-". I googled the exception and found a
>>> related page [1]; could you please make sure there is enough space on
>>> the local disk?
>>>
>>> [1]
>>> https://community.pivotal.io/s/article/Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt---m-x-file-out
>>> Best,
>>> Congxian
>>>
>>>
>>> Lu Niu  wrote on Wed, Apr 8, 2020 at 8:41 AM:
>>>
 Hi, flink users

 Did anyone encounter such error? The error comes from
 S3AFileSystem. But there is no capacity issue on any disk. we are using
 hadoop 2.7.1.
 ```

 Caused by: java.util.concurrent.ExecutionException: 
 java.io.IOException: Could not open output stream for state backend
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
 org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:450)
at 
 org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:47)
at 
 org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1011)
... 3 more
 Caused by: java.io.IOException: Could not open output stream for state 
 backend
at 
 org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.createStream(FsCheckpointStreamFactory.java:367)
at 
 org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.flush(FsCheckpointStreamFactory.java:234)
at 
 org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.write(FsCheckpointStreamFactory.java:209)
at 
 org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:131)
at 
 org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:99)
at 
 

Re: Debug Slowness in Async Checkpointing

2020-04-23 Thread Lu Niu
Hi, Robert

Thanks for replying. Yeah, after I added monitoring on the above path, it
shows the slowness did come from uploading files to S3. Right now I am still
investigating the issue. At the same time, I am trying PrestoS3FileSystem
to check whether that can mitigate the problem.

Best
Lu
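As a side note on checking for S3A retries in the TaskManager logs (Robert's
suggestion below): a rough log4j sketch, assuming the stock log4j.properties and the
usual Hadoop S3A logger name; depending on how the filesystem is bundled, the logger
may instead live under a shaded package such as
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a:

```
# conf/log4j.properties (sketch)
# DEBUG for the Hadoop S3A client to surface retries/throttling
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
log4j.logger.org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a=DEBUG
# keep the AWS SDK a bit quieter unless needed
log4j.logger.com.amazonaws=INFO
```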

On Thu, Apr 23, 2020 at 8:10 AM Robert Metzger  wrote:

> Hi Lu,
>
> were you able to resolve the issue with the slow async checkpoints?
>
> I've added Yu Li to this thread. He has more experience with the state
> backends to decide which monitoring is appropriate for such situations.
>
> Best,
> Robert
>
>
> On Tue, Apr 21, 2020 at 10:50 PM Lu Niu  wrote:
>
>> Hi, Robert
>>
> Thanks for replying. To improve observability, do you think we should
> expose more metrics in checkpointing? For example, in incremental
> checkpoints, the time spent uploading SST files?
>> https://github.com/apache/flink/blob/5b71c7f2fe36c760924848295a8090898cb10f15/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java#L319
>>
>> Best
>> Lu
>>
>>
>> On Fri, Apr 17, 2020 at 11:31 AM Robert Metzger 
>> wrote:
>>
>>> Hi,
>>> did you check the TaskManager logs if there are retries by the s3a file
>>> system during checkpointing?
>>>
>>> I'm not aware of any metrics in Flink that could be helpful in this
>>> situation.
>>>
>>> Best,
>>> Robert
>>>
>>> On Tue, Apr 14, 2020 at 12:02 AM Lu Niu  wrote:
>>>
 Hi, Flink users

 We notice sometimes async checkpointing can be extremely slow, leading
 to checkpoint timeout. For example, For a state size around 2.5MB, it could
 take 7~12min in async checkpointing:

 [image: Screen Shot 2020-04-09 at 5.04.30 PM.png]

 Notice all the slowness comes from async checkpointing, no delay in
 sync part and barrier assignment. As we use rocksdb incremental
 checkpointing, I notice the slowness might be caused by uploading the file
 to s3. However, I am not completely sure since there are other steps in
 async checkpointing. Does flink expose fine-granular metrics to debug such
 slowness?

 setup: flink 1.9.1, rocksdb incremental state backend,
 S3AHadoopFileSystem

 Best
 Lu

>>>


Re: Flink on k8s - setting taskmanager.heap.mb does not take effect for the JVM startup heap size

2020-04-23 Thread Xintong Song
Sorry, what I said just now about docker-compose.yaml applies to the case where
you only use Docker, without Kubernetes.

For Kubernetes, if you deploy Flink following the approach recommended in the
official documentation [1], then adding this parameter to the args in
taskmanager-deployment.yaml should be enough:

args:
  - taskmanager
  - -Dtaskmanager.heap.size=2000m


Thank you~

Xintong Song


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/deployment/kubernetes.html
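For illustration, a rough sketch of the relevant fragment of taskmanager-deployment.yaml
with that argument appended (the image name and surrounding fields are placeholders based
on the structure in the linked documentation):

```
# taskmanager-deployment.yaml (sketch, only the container part shown)
spec:
  containers:
  - name: taskmanager
    image: flink:latest        # placeholder image
    args:
    - taskmanager
    - -Dtaskmanager.heap.size=2000m
```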



On Fri, Apr 24, 2020 at 11:10 AM LakeShen  wrote:

> Hi Xintong,
>
> Thanks a lot for your reply. I'd like to ask one more question: where would
> docker-compose.yaml be used?
>
> For now I work around the problem by copying the conf directory out when
> building the image in the Dockerfile (the TaskManager memory in it is passed
> in dynamically), and then forcibly setting FLINK_CONF_DIR in config.sh. But I
> think your approach is more elegant.
>
> So I'd like to ask: where is docker-compose.yaml used?
>
> Best,
> LakeShen
>
> Xintong Song  wrote on Fri, Apr 24, 2020 at 10:49 AM:
>
> > There shouldn't be anywhere else that writes flink-conf.yaml. Could you share
> > the exact commands or scripts you use to build the image and write the
> > configuration dynamically?
> >
> > Besides, there is another solution to your problem: pass taskmanager.heap.mb
> > to taskmanager.sh via a -D parameter. You can append
> > -Dtaskmanager.heap.mb=2000m to the taskmanager command in docker-compose.yaml.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Thu, Apr 23, 2020 at 5:59 PM LakeShen 
> > wrote:
> >
> > > Hi community,
> > >
> > > Recently I have been working on Flink on k8s, with Flink 1.6, running in
> > > standalone per-job mode.
> > >
> > > When I create and start the jobmanager, I set taskmanager.heap.mb to 2000 MB.
> > > The jobmanager configuration shown in the Flink web UI does say that
> > > taskmanager.heap.mb is 2000 MB, but when I start the taskmanager deployment
> > > and log into one of the pods, I find the taskmanager was started with -Xms
> > > and -Xmx both set to 922 MB.
> > >
> > > I then set taskmanager.heap.mb to 1000 MB, stopped and restarted the job;
> > > again, on one of the taskmanager pods, -Xms and -Xmx are still 922 MB. In
> > > other words, the configured taskmanager.heap.mb has no effect on the JVM
> > > heap the taskmanager starts with.
> > >
> > > I read the source code: for Flink on k8s in standalone per-job mode, the
> > > taskmanager is started via taskmanager.sh. In taskmanager.sh, the
> > > taskmanager heap size is taken from the flink-conf.yaml in the conf
> > > directory under the flink-dist directory inside the image.
> > >
> > > When I build the image I also include the flink-dist directory and write
> > > taskmanager.heap.mb into flink-conf.yaml dynamically, but when I start my
> > > job and log into one of the taskmanager pods, its flink-conf.yaml always
> > > shows taskmanager.heap.mb as 1024.
> > >
> > > Is taskmanager.heap.mb being hard-coded into flink-conf.yaml somewhere?
> > >
> > >
> > > Best,
> > > LakeShen
> > >
> >
>


Re: Flink on k8s - setting taskmanager.heap.mb does not take effect for the JVM startup heap size

2020-04-23 Thread LakeShen
Hi Xintong,

Thanks a lot for your reply. I'd like to ask one more question: where would
docker-compose.yaml be used?

For now I work around the problem by copying the conf directory out when
building the image in the Dockerfile (the TaskManager memory in it is passed
in dynamically), and then forcibly setting FLINK_CONF_DIR in config.sh. But I
think your approach is more elegant.

So I'd like to ask: where is docker-compose.yaml used?

Best,
LakeShen

Xintong Song  wrote on Fri, Apr 24, 2020 at 10:49 AM:

> There shouldn't be anywhere else that writes flink-conf.yaml. Could you share
> the exact commands or scripts you use to build the image and write the
> configuration dynamically?
>
> Besides, there is another solution to your problem: pass taskmanager.heap.mb to
> taskmanager.sh via a -D parameter. You can append -Dtaskmanager.heap.mb=2000m
> to the taskmanager command in docker-compose.yaml.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Apr 23, 2020 at 5:59 PM LakeShen 
> wrote:
>
> > Hi community,
> >
> > Recently I have been working on Flink on k8s, with Flink 1.6, running in
> > standalone per-job mode.
> >
> > When I create and start the jobmanager, I set taskmanager.heap.mb to 2000 MB.
> > The jobmanager configuration shown in the Flink web UI does say that
> > taskmanager.heap.mb is 2000 MB, but when I start the taskmanager deployment
> > and log into one of the pods, I find the taskmanager was started with -Xms
> > and -Xmx both set to 922 MB.
> >
> > I then set taskmanager.heap.mb to 1000 MB, stopped and restarted the job;
> > again, on one of the taskmanager pods, -Xms and -Xmx are still 922 MB. In
> > other words, the configured taskmanager.heap.mb has no effect on the JVM
> > heap the taskmanager starts with.
> >
> > I read the source code: for Flink on k8s in standalone per-job mode, the
> > taskmanager is started via taskmanager.sh. In taskmanager.sh, the
> > taskmanager heap size is taken from the flink-conf.yaml in the conf
> > directory under the flink-dist directory inside the image.
> >
> > When I build the image I also include the flink-dist directory and write
> > taskmanager.heap.mb into flink-conf.yaml dynamically, but when I start my
> > job and log into one of the taskmanager pods, its flink-conf.yaml always
> > shows taskmanager.heap.mb as 1024.
> >
> > Is taskmanager.heap.mb being hard-coded into flink-conf.yaml somewhere?
> >
> >
> > Best,
> > LakeShen
> >
>


Re: IntelliJ java formatter

2020-04-23 Thread Xintong Song
Hi Flavio,

I'm not aware of any way to automatically format the code. The only thing I
can find that might help is to enable a Checkstyle plugin in your IDE:
https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/ide_setup.html#checkstyle-for-java

Thank you~

Xintong Song



On Thu, Apr 23, 2020 at 8:19 PM Flavio Pompermaier 
wrote:

> Hi to all,
> I'm migrating to IntelliJ because it's very complicated to have a fully
> working env in Eclipse (too many missing maven plugins). Is there a way to
> automatically format a Java class (respecting the configured checkstyle)?
> Or do I have to manually fix every Checkstyle problem?
>
> Thanks in advance,
> Flavio
>


Re: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory)

2020-04-23 Thread Fu, Kai
Hi, thanks for the reply.

It was indeed the class loading issue, and it was introduced by the latest version of
the "aws-kinesisanalytics-runtime" package. I resolved the issue by removing the
package and customizing the runtime myself.

-- Best wishes
Fu Kai
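In case it helps others hitting the same cast error, a hedged Maven sketch of
excluding a conflicting Woodstox that is pulled in transitively (the groupId and
version below are assumptions, and whether aws-kinesisanalytics-runtime is really
the source of the conflict should first be verified with
`mvn dependency:tree -Dincludes=com.fasterxml.woodstox`):

```
<!-- pom.xml sketch -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-kinesisanalytics-runtime</artifactId>
  <version>${kda.runtime.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.woodstox</groupId>
      <artifactId>woodstox-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```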


From: Arvid Heise 
Date: Thursday, April 23, 2020 at 9:03 PM
To: "Fu, Kai" 
Cc: Chesnay Schepler , "user@flink.apache.org" 

Subject: RE: [EXTERNAL] Unable to unmarshall response 
(com.ctc.wstx.stax.WstxInputFactory cannot be cast to 
javax.xml.stream.XMLInputFactory)




This looks like a typical issue with classloading.

Kinesis is probably residing in flink-dist/lib while Woodstox is added in your
job.jar (or vice versa).

Could you try to use both jars in the same way? Alternatively, could you 
provide more information regarding your dependencies?

On Tue, Apr 21, 2020 at 11:21 AM Fu, Kai 
mailto:k...@amazon.com>> wrote:
Hi, I’m using Flink 1.8 with JDK 8.

-- Best wishes
Fu Kai


From: Chesnay Schepler mailto:ches...@apache.org>>
Date: Tuesday, April 21, 2020 at 5:15 PM
To: "Fu, Kai" mailto:k...@amazon.com>>, 
"user@flink.apache.org" 
mailto:user@flink.apache.org>>
Subject: RE: [EXTERNAL] Unable to unmarshall response 
(com.ctc.wstx.stax.WstxInputFactory cannot be cast to 
javax.xml.stream.XMLInputFactory)

Which Flink version are you using?

On 21/04/2020 11:11, Fu, Kai wrote:
Hi,

I’m running Flink application on AWS Kinesis Flink platform to read a kinesis 
stream from another account with assumed role, while I’m getting exception like 
below. But it works when I’m running the application locally, I’ve given all 
the related roles admin permission. Could anyone help what’s the potential 
problem?

[
"org.apache.flink.kinesis.shaded.com.amazonaws.SdkClientException: 
Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be 
cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response Text: 
OK",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1738)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1434)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1356)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1719)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1686)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1675)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:589)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:561)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession(STSAssumeRoleSessionCredentialsProvider.java:321)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000(STSAssumeRoleSessionCredentialsProvider.java:37)",
"\tat 
org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:76)",

Re: Flink 1.10 Out of memory

2020-04-23 Thread Xintong Song
@Stephan,
I don't think so. If the JVM hits the direct memory limit, you should see the
error message "OutOfMemoryError: Direct buffer memory".

Thank you~

Xintong Song
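For reference, a minimal flink-conf.yaml sketch of the options discussed in this
thread (the values are placeholders to experiment with, not recommendations):

```
# flink-conf.yaml (sketch)
taskmanager.memory.process.size: 2200m
# reserve more room for native allocations outside Flink's control
taskmanager.memory.jvm-overhead.min: 256m
taskmanager.memory.jvm-overhead.max: 512m
# optionally also give the framework more off-heap / direct memory
taskmanager.memory.framework.off-heap.size: 256m
```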



On Thu, Apr 23, 2020 at 6:11 PM Stephan Ewen  wrote:

> @Xintong and @Lasse could it be that the JVM hits the "Direct Memory"
> limit here?
> Would increasing the "taskmanager.memory.framework.off-heap.size" help?
>
> On Mon, Apr 20, 2020 at 11:02 AM Zahid Rahman 
> wrote:
>
>> As you can see from the task manager tab of flink web dashboard
>>
>> Physical Memory: 3.80 GB
>> JVM Heap Size: 1.78 GB
>> Flink Managed Memory: 128 MB
>>
>> Flink is only using 128 MB of managed memory, which can easily cause an OOM
>> error.
>>
>> These are the DEFAULT settings.
>>
>> I dusted off an old laptop, so it only has 3.8 GB of RAM.
>>
>> What does your job metrics say  ?
>>
>> On Mon, 20 Apr 2020, 07:26 Xintong Song,  wrote:
>>
>>> Hi Lasse,
>>>
>>> From what I understand, your problem is that JVM tries to fork some
>>> native process (if you look at the exception stack the root exception is
>>> thrown from a native method) but there's no enough memory for doing that.
>>> This could happen when either Mesos is using cgroup strict mode for memory
>>> control, or there's no more memory on the machine. Flink cannot prevent
>>> native processes from using more memory. It can only reserve certain amount
>>> of memory for such native usage when requesting worker memory from the
>>> deployment environment (in your case Mesos) and allocating Java heap /
>>> direct memory.
>>>
>>> My suggestion is to try increasing the JVM overhead configuration. You
>>> can leverage the configuration options
>>> 'taskmanager.memory.jvm-overhead.[min|max|fraction]'. See more details in
>>> the documentation[1].
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-overhead-max
>>>
>>> On Sat, Apr 18, 2020 at 4:02 AM Zahid Rahman 
>>> wrote:
>>>
 https://betsol.com/java-memory-management-for-java-virtual-machine-jvm/

 Backbutton.co.uk
 ¯\_(ツ)_/¯
 ♡۶Java♡۶RMI ♡۶
 Make Use Method {MUM}
 makeuse.org
 


 On Fri, 17 Apr 2020 at 14:07, Lasse Nedergaard <
 lassenedergaardfl...@gmail.com> wrote:

> Hi.
>
> We have migrated to Flink 1.10 and face out of memory exception and
> hopeful can someone point us in the right direction.
>
> We have a job that use broadcast state, and we sometimes get out
> memory when it creates a savepoint. See stacktrack below.
> We have assigned 2.2 GB/task manager and
> configured  taskmanager.memory.process.size : 2200m
> In Flink 1.9 our container was terminated because OOM, so 1.10 do a
> better job, but it still not working and the task manager is leaking mem
> for each OOM and finial kill by Mesos
>
>
> Any idea what we can do to figure out what settings we need to change?
>
> Thanks in advance
>
> Lasse Nedergaard
>
>
> WARN o.a.flink.runtime.state.filesystem.FsCheckpointStreamFactory -
> Could not close the state stream for
> s3://flinkstate/dcos-prod/checkpoints/fc9318cc236d09f0bfd994f138896d6c/chk-3509/cf0714dc-ad7c-4946-b44c-96d4a131a4fa.
> java.io.IOException: Cannot allocate memory at
> java.io.FileOutputStream.writeBytes(Native Method) at
> java.io.FileOutputStream.write(FileOutputStream.java:326) at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at
> java.io.FilterOutputStream.flush(FilterOutputStream.java:140) at
> java.io.FilterOutputStream.close(FilterOutputStream.java:158) at
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:995)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at
> org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
> at
> org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:277)
> at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263) at
> org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250) at
> org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
> at
> org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:167)
> at
> org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:83)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> 

Re: A list of things in Flink that need improvement - hoping the Flink PMC will address them

2020-04-23 Thread Caizhi Weng
Hi xuefli,

Thanks for your suggestions. Although I'm not a PMC member, I'd like to share my
understanding of some of these issues.

1. Are you referring to the shaded dependencies? Since user code may itself depend
on common libraries (e.g. Guava), Flink shades the frequently used libraries to
avoid version conflicts with user code, so they behave as if they were Flink's own
code. This mechanism was introduced precisely to solve version-compatibility issues.

2. open should only establish the connection to the data source; different slots
process different input splits, so data is not read repeatedly (although the data
source may indeed be connected to repeatedly). For OLAP scenarios this is indeed an
optimization point, but for batch/streaming jobs the impact is probably not that big.

3. Resource isolation in a session cluster is very hard; using per-job mode is
recommended and prevents this problem.

4. Could you give an example for this one? I understand you want to see the causes
of historical failovers; these are already available in the web UI, right next to
the root exception.

5. Regardless of the data volume, a job's processing logic should be the same. I
don't quite understand this question...

6. I understand this as asking for a listener mechanism, so that the job notifies a
listener once it reaches a certain stage. Having the JM actively notify the client
is not feasible, because the client may sit behind a firewall or NAT. However, an
Operator Coordinator mechanism is currently being developed; its design will later
be extended so that the client can poll the coordinator, which solves the
job-to-client communication problem to some extent. See
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface .

xue...@outlook.com  wrote on Mon, Apr 20, 2020 at 2:23 PM:

> 1. Maven packaging and long-term project maintenance
>
> The Maven packaging approach currently recommended on the official website
> unpacks third-party packages and stores them as directories, instead of keeping
> the standard Maven style of versioned jar dependencies. This is bad for
> long-term project management: after a project has been running for months and
> years, with staff changes and version upgrades, it leads to many engineering
> problems around version compatibility and identification.
>
> I hope Flink improves this in the releases after 1.10.0; it may require changing
> the runtime classloader.
>
> 2. Dimension data loaded repeatedly via RichXXXFunction#open wastes memory
>
> Suppose a job is started with a parallelism of 1K, spread evenly over 10 compute
> hosts, so that one TaskManager has 100 slots, each executing
> RichXXXFunction#open. The same host then opens and loads the same data over and
> over, wasting CPU and memory. I would like the data to be pre-loaded once per
> TaskManager, keyed by job group name or job ID, after the slots start, instead
> of every slot thread / RichXXXFunction loading the data repeatedly; open would
> then fetch the pre-loaded data locally, in a synchronous blocking way with a
> timeout.
>
> 3. Soft resource isolation between different jobs inside a TaskManager
>
> Business code may be buggy or run out of memory, and Flink cannot provide
> Docker-style resource isolation. Because multiple jobs share slots or even a
> TaskManager, one problematic job can bring down a whole compute node. It would
> help if RuntimeExceptions could be caught and logged per job (at the job-ID
> level) without affecting other jobs, and then turned into control events that
> end that job's lifecycle and report metrics.
>
> 4. Exception information, especially business exceptions, is often hidden by the
> Flink framework, or, with checkpoint-based failover, the job just keeps
> retrying. I would like business exceptions to be surfaced directly, as in a
> standalone program, with metrics reporting for some of them and retries
> partially restricted.
>
> 5. Logic tested on small data during development often behaves differently from
> the same logic on production-scale data
>
> For example dual-stream joins, and even the simplest official WordCount example,
> have this problem. Better mocking/simulation of the real production situation is
> needed; otherwise the logic has to be changed, which makes the previous
> savepoint unusable.
>
> 6. Open interfaces to the business side: listen to the processing via callbacks,
> and let the business side influence checkpoints and job completion
>
> Example: my data is consumed, cleaned, and normalized by JOB1, which sinks it to
> HDFS with file rolling by size and time. Another job, JOB2, continuously
> monitors the HDFS files and directories written by JOB1's sink as its source,
> processes them in real time, and sinks to HDFS or elsewhere.
>
> Missing hooks in the source function:
>
> A. I cannot influence in which order JOB2 consumes the HDFS files.
>
> B. I cannot tell whether a file has been fully consumed.
>
> C. I cannot see the consumption progress of a file.
>
> D. I cannot intervene when the job fails.
> Missing hooks in the sink function:
>
> A. I cannot tell which data has already been checkpointed.
>
> B. When a job fails and restores, Flink only guarantees exactly-once for state
> inside the cluster; there is currently no effective way to intervene at the sink
> and source, and because sinks and sources differ it cannot be done generically;
> why not open this up to the business side to handle?
> Missing hooks at the job level:
>
> A. A job runs long-term, but the business processing has completion points
> defined by time or by the business itself. A callback is needed so the business
> side can decide that a phase is complete and the sunk data is usable, or
> terminate the job at a phase boundary, instead of only having the crude option
> of cancelling the job.
>
> Right now I can only refresh the HDFS sink over and over to check whether the
> data has been fully processed without intermediate state, and for windowed
> operators watch the web UI's Records Received and Records Sent. Since that data
> is buffered in state and not visible in the web UI, I have to wait for the
> StreamingFileSink's RolloverInterval and InactivityInterval to elapse before I
> can judge whether the business data has been fully processed.
>
> 7. TODO
>
> All of the above are problems I ran into when using the DataStream API directly,
> where I could not find observation points or hooks to intervene.
>
>
> Sent from Mail for Windows 10
>
>


Re: Flink on k8s - setting taskmanager.heap.mb does not take effect for the JVM startup heap size

2020-04-23 Thread Xintong Song
There shouldn't be anywhere else that writes flink-conf.yaml. Could you share the
exact commands or scripts you use to build the image and write the configuration
dynamically?

Besides, there is another solution to your problem: pass taskmanager.heap.mb to
taskmanager.sh via a -D parameter. You can append -Dtaskmanager.heap.mb=2000m to
the taskmanager command in docker-compose.yaml.

Thank you~

Xintong Song



On Thu, Apr 23, 2020 at 5:59 PM LakeShen  wrote:

> Hi community,
>
> Recently I have been working on Flink on k8s, with Flink 1.6, running in
> standalone per-job mode.
>
> When I create and start the jobmanager, I set taskmanager.heap.mb to 2000 MB.
> The jobmanager configuration shown in the Flink web UI does say that
> taskmanager.heap.mb is 2000 MB, but when I start the taskmanager deployment
> and log into one of the pods, I find the taskmanager was started with -Xms
> and -Xmx both set to 922 MB.
>
> I then set taskmanager.heap.mb to 1000 MB, stopped and restarted the job;
> again, on one of the taskmanager pods, -Xms and -Xmx are still 922 MB. In
> other words, the configured taskmanager.heap.mb has no effect on the JVM heap
> the taskmanager starts with.
>
> I read the source code: for Flink on k8s in standalone per-job mode, the
> taskmanager is started via taskmanager.sh. In taskmanager.sh, the taskmanager
> heap size is taken from the flink-conf.yaml in the conf directory under the
> flink-dist directory inside the image.
>
> When I build the image I also include the flink-dist directory and write
> taskmanager.heap.mb into flink-conf.yaml dynamically, but when I start my job
> and log into one of the taskmanager pods, its flink-conf.yaml always shows
> taskmanager.heap.mb as 1024.
>
> Is taskmanager.heap.mb being hard-coded into flink-conf.yaml somewhere?
>
>
> Best,
> LakeShen
>


Re: JDBC table api questions

2020-04-23 Thread Zhenghua Gao
FLINK-16471 introduced a JDBCCatalog, which implements the Catalog interface.
Currently we only support PostgresCatalog and listTables().
If you want to get the list of views, you can implement listViews()
(it currently returns an empty list).

*Best Regards,*
*Zhenghua Gao*
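Not part of FLINK-16471 itself, but as a rough illustration, a listViews()
implementation for Postgres could simply query information_schema over plain JDBC,
along these lines (Scala sketch; connection handling and catalog wiring are
simplified, and the URL/credentials are placeholders):

```
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Sketch: list the view names of a Postgres database via information_schema.
def listViews(jdbcUrl: String, user: String, password: String): List[String] = {
  val conn = DriverManager.getConnection(jdbcUrl, user, password)
  try {
    val stmt = conn.prepareStatement(
      "SELECT table_schema, table_name FROM information_schema.views " +
      "WHERE table_schema NOT IN ('pg_catalog', 'information_schema')")
    val rs = stmt.executeQuery()
    val views = ListBuffer[String]()
    while (rs.next()) {
      views += s"${rs.getString("table_schema")}.${rs.getString("table_name")}"
    }
    views.toList
  } finally {
    conn.close()
  }
}
```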


On Thu, Apr 23, 2020 at 8:48 PM Flavio Pompermaier 
wrote:

> Hi all,
> is there a way to get the list of existing views in a JDBC database?
> Is this something that could be supported somehow?
>
> Moreover, it would be interesting for us to also know the original field
> type of a table. Is there a way to get it (without implementing a dedicated
> API)? Do you think it makes sense to expose it in the Table API?
>
> Best,
> Flavio
>


OUT OF MEMORY : CORRECTION

2020-04-23 Thread Zahid Rahman
There was a post earlier where someone had a problem with an out-of-memory
error in Flink.

The answer is to reduce Flink managed memory from the default 70% to maybe
50%.

This error could be caused by memory that is simply missing;

or by the programmer maintaining a local list and over-using the user-allocated
memory with heavy processing;

or by using a small JVM;

or by the system spending too much time on GC.

Out of memory has nothing to do with Flink; Flink is not at fault.


This process is known as "pimping" Flink.

Also part of pimping is using the local disk for memory spill.
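For what it's worth, a minimal flink-conf.yaml sketch of where that fraction is
configured, assuming Flink 1.10's unified memory setup (the value is only an
example):

```
# flink-conf.yaml (sketch)
# fraction of total Flink memory reserved as managed memory
taskmanager.memory.managed.fraction: 0.5
```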


Re: JDBC Table and parameters provider

2020-04-23 Thread Flavio Pompermaier
I've created 3 tickets related to this discussion; feel free to comment on them:


   1. https://issues.apache.org/jira/browse/FLINK-17358 - JDBCTableSource
      support FilterableTableSource
   2. https://issues.apache.org/jira/browse/FLINK-17360 - Support custom
      partitioners in JDBCReadOptions
   3. https://issues.apache.org/jira/browse/FLINK-17361 - Support creating of
      a JDBC table using a custom query

Best,
Flavio
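To make the proposals concrete, here is a purely hypothetical DDL sketch combining
the property keys discussed below; none of the connector.read.* keys exist today,
they are only the proposed names, and the table, URL, and class names are made up:

```
CREATE TABLE joined_view (
  x BIGINT,
  y STRING
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:postgresql://host:5432/db',
  -- proposed: run a custom query instead of mapping 1-to-1 to a table
  'connector.read.query' =
    'SELECT A.x, B.y FROM public.A JOIN public.B ON A.pk = B.fk WHERE A.ts BETWEEN ? AND ?',
  -- proposed: plug in a custom provider for the placeholder values
  'connector.read.parametervalues.provider.class' = 'com.example.MyParametersProvider'
)
```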

On Wed, Apr 22, 2020 at 4:29 PM Jingsong Li  wrote:

> > Specify "query" and "provider"
> Yes, your proposal looks reasonable to me.
> Key can be "scan.***" like in [1].
>
> > specify parameters
> Maybe we need add something like "scan.parametervalues.provider.type", it
> can be "bound, specify, custom":
> - when bound, using old partitionLowerBound
> and partitionUpperBound, numPartitions
> - when specify, using specify parameters like your proposal
> - when custom, need "scan.parametervalues.provider.class"
>
> > not implement FilterableTableSource
> Just because we have not had time to finish it.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
>
> Best,
> Jingsong Lee
>
> On Wed, Apr 22, 2020 at 9:49 PM Flavio Pompermaier 
> wrote:
>
>> Ok, now I understand your proposal. However this looks like a workaround
>> to me..I want to be able to give a name to such a table and register also
>> to a catalog if I want.
>> Indeed my proposal is to add a "*connector.read.query*" as an
>> alternative to "connector.table" (that forces you to map tables as 1-to-1).
>> Then we can add a *connector.read.parametervalues.provider.class* in
>> order to customize the splitting of the query (we can also add a check that
>> the query contains at least 1 question mark).
>> If we introduce a custom parameters provider we need also to specify
>> parameters, using something like:
>>
>> *'connector.read.parametervalues.0.name
>> *' = 'minDate',
>> *'connector.read.parametervalues.0.value'*= '12/10/2019'
>> *'connector.read.parametervalues.1.name
>> *' = 'maxDate',
>> *'connector.read.parametervalues.1.value*'= '01/01/2020'
>>
>> Another question: why JDBC table source does not implement
>> *FilterableTableSource?*
>>
>> On Wed, Apr 22, 2020 at 3:27 PM Jingsong Li 
>> wrote:
>>
>>> Hi,
>>>
>>> Requirements: read data from "SELECT public.A.x, public.B.y FROM
>>> public.A JOIN public.B on public.A.pk  =
>>> public.B.fk "
>>>
>>> Solution: table name = "(SELECT public.A.x, public.B.y FROM public.A
>>> JOIN public.B on public.A.pk  = public.B.fk
>>> )"
>>>
>>> I don't see why there has to be a 1-to-1 mapping between a Flink table and a JDBC
>>> table. If there is, there is no way to support this requirement, because this
>>> Flink table comes from two JDBC tables.
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Wed, Apr 22, 2020 at 8:42 PM Flavio Pompermaier 
>>> wrote:
>>>
 Sorry Jingsong but I have to clarify this thing, which is not clear at
 all to me.

 From what I can see from the documentation of table API there's no way
 (currently) to associate an SQL query to a Flink Table, there's a 1-to-1
 mapping between a Flink table and a JDBC table.
 This means that, at the moment, if I want to join 2 tables from the
 same JDBC source (like in the example) Flink would fetch all the data of
 the 2 tables and then it will do the join, it will not execute the query
 directly and get results back. Right?
 If this is the case we could open an issue in the Blink optimizer that
 could improve performance if the query that involves a single JDBC source
 is executed directly to the database. and that's one point.
 Or maybe this is what you were trying to say with "Which means the
 "select ..." is dynamically generated by the Flink sql. We can not set it
 static."? Does it mean that we can't specify a query in a JDBC table?
 This sounds to go against what you write in the statement before: So
 this table name can be a rich sql: "(SELECT public.A.x, public.B.y FROM
 public.A JOIN public.B on public.A.pk  =
 public.B.fk )"

 I didn't understand what's your proposals here..I see two issues:

1. If a JDBC table is mapped 1-to-1 with a JDBC table, are queries
pushed down in a performant way?
   1. i.e. SELECT public.A.x, public.B.y FROM public.A JOIN
   public.B on public.A.pk  = public.B.fk
    is performed efficiently to the DB or is
   it performed in Flink after reading all the tables data?
   2. Add a way to handle custom parameter value provider class and
query statements. What is exactly your proposal here?


 On Wed, Apr 22, 2020 at 1:03 PM Jingsong Li 

Re: K8s native - checkpointing to S3 with RockDBStateBackend

2020-04-23 Thread Yun Tang
Hi Averell

Please build your own Flink Docker image with the S3 plugin, as described in the official documentation [1].

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html#using-plugins

Best
Yun Tang
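For reference, a minimal Dockerfile sketch of that approach (the Flink version and
the exact jar name under opt/ are assumptions; check the jar shipped with your
distribution):

```
# Dockerfile sketch: add the S3 (Hadoop) filesystem as a plugin
FROM flink:1.10.0
RUN mkdir -p ./plugins/s3-fs-hadoop && \
    cp ./opt/flink-s3-fs-hadoop-1.10.0.jar ./plugins/s3-fs-hadoop/
```

Both the JobManager and TaskManager images need the plugin, since the s3 scheme is
resolved on both sides.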

From: Averell 
Sent: Thursday, April 23, 2020 20:58
To: user@flink.apache.org 
Subject: K8s native - checkpointing to S3 with RockDBStateBackend

Hi,
I am trying to deploy my job to Kubernetes following the native-Kubernetes
guide. My job checkpoints to S3 with the RocksDBStateBackend. It also has an
S3 StreamingFileSink.
In my jar file, I already have flink-hadoop-fs, flink-connector-filesystem,
and flink-s3-fs-hadoop (to my understanding, these are for the S3 sink;
please correct me if I'm wrong).

When I tried to submit the job, I got the following error (only a few
seconds after submitting): "Could not find a file system implementation for
scheme 's3'. The scheme is not directly supported by Flink and no Hadoop
file system to support this scheme could be loaded."

Not sure how I can get past this.
Using s3a didn't help (s3 does work well when running on my dev machine).
I also tried to copy the file flink-shaded-hadoop-2-uber-2.8.3-10.0.jar to
the /opt/flink/lib/ folder of the JobManager pod, but it didn't help (is
it already too late? Should it be there before the JM is started?)

Thanks for your help.
Averell


Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create
checkpoint storage at checkpoint coordinator side.
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:282)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:205)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:486)
at
org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:338)
at
org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:255)
at
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:227)
at
org.apache.flink.runtime.scheduler.SchedulerBase.(SchedulerBase.java:215)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.(DefaultScheduler.java:120)
at
org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:105)
at
org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:278)
at
org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:266)
at
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
at
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
at
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.(JobManagerRunnerImpl.java:146)
... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
Could not find a file system implementation for scheme 's3a'. The scheme is
not directly supported by Flink and no Hadoop file system to support this
scheme could be loaded.
at
org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at
org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.(FsCheckpointStorage.java:64)
at
org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:490)
at
org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:477)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:279)
... 23 more



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Restore from save point but need to read from different Kafka topics

2020-04-23 Thread Casado Tejedor , Rubén
Hi

Let me introduce our scenario:


  1.  We have a Flink job reading from a Kafka topic, using the Flink Kafka
connector. The name of the Kafka topic is an input variable in a properties file.
  2.  A savepoint is created for that job, so the Kafka offsets for the input
topic are stored in that savepoint.
  3.  The job is cancelled.
  4.  The Kafka topic from which the job reads is changed in the properties
file.
  5.  The Flink job is executed from the savepoint.

What happens in this scenario? Does the job read from the beginning of the new 
Kafka topic? Or does the job fail?
What we would need is to read from earliest in the new topic. Is it possible?

Thanks in advance!

--
Rubén Casado Tejedor, PhD
Big Data Lead
> accenture technology
' + 34 629 009 429
• ruben.casado.teje...@accenture.com





Re: batch range sort support

2020-04-23 Thread Benchao Li
Hi Kurt,

I've created a Jira issue [1] to track this; we can move further
discussion to that issue.

[1] https://issues.apache.org/jira/browse/FLINK-17354
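For context, the flag can be toggled through the TableConfig as below (Scala); as
discussed in this thread it is not implemented end-to-end yet, so enabling it
currently leads to the UnsupportedOperationException in BatchExecExchange:

```
import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()
val tEnv = TableEnvironment.create(settings)

// the option checked by BatchExecSortRule
tEnv.getConfig.getConfiguration.setBoolean("table.exec.range-sort.enabled", true)
```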

Kurt Young  wrote on Thu, Apr 23, 2020 at 10:25 PM:

> Hi Benchao, you can create a jira issue to track this.
>
> Best,
> Kurt
>
>
> On Thu, Apr 23, 2020 at 2:27 PM Benchao Li  wrote:
>
>> Hi Jingsong,
>>
>> Thanks for your quick response. I've CC'ed Chongchen who understands the
>> scenario much better.
>>
>>
>> Jingsong Li  wrote on Thu, Apr 23, 2020 at 12:34 PM:
>>
>>> Hi, Benchao,
>>>
>>> Glad to see your requirement about range partition.
>>> I have a branch to support range partition: [1]
>>>
>>> Can you describe your scene in more detail? What sink did you use for
>>> your jobs? A simple and complete business scenario? This can help the
>>> community judge the importance of the range partition.
>>>
>>> [1]https://github.com/JingsongLi/flink/commits/range
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Thu, Apr 23, 2020 at 12:15 PM Benchao Li  wrote:
>>>
 Hi,

 Currently the sort operator in blink planner is global, which has
 bottleneck if we sort a lot of data.

 And I found 'table.exec.range-sort.enabled' config in
 BatchExecSortRule, which makes me very exciting.
 After enabling this config, I found that it's not implemented
 completely now. This config changes the distribution
  from SINGLETON to range for sort operator, however in
 BatchExecExchange we do not deal with range
 distribution, and will throw UnsupportedOperationException.

 My question is,
 1. Is this config just a mistake when we merge blink into flink, and we
 actually didn't plan to implement this?
 2. If this is in the plan, then which version may we expect it to be
 ready?


 --

 Benchao Li
 School of Electronics Engineering and Computer Science, Peking University
 Tel:+86-15650713730
 Email: libenc...@gmail.com; libenc...@pku.edu.cn


>>>
>>> --
>>> Best, Jingsong Lee
>>>
>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re:Re: RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread izual



I'll try to reduce the question to a simple model, using the code below:




```
val s = env.fromCollection(List(
  ("Book", 1, "")
))
tableEnv.registerDataStream("tableA", s, 'a, 'b, 'c)

class TestFunction extends ScalarFunction {
  def eval(data: String) = {
    println(s"test: ${data}")
    data
  }
}
tableEnv.registerFunction("my_test", new TestFunction)

val tableB = tableEnv.sqlQuery(
  """
    |SELECT Row(A, C) as body FROM (
    | SELECT my_test(a) as A, my_test(c) as C from tableA
    |)
    |""".stripMargin)
tableB.printSchema()
tableEnv.registerTable("tableB", tableB)

tableEnv.sqlQuery(
  """
    |SELECT body.EXPR$0, body.EXPR$1
    |FROM tableB
    |""".stripMargin).toAppendStream[Row].print()
```




The type of column `body` is Row, and it comes from tableA. Part of the plan is:

Calc(select=[CAST((my_test(a) ROW my_test(c))).EXPR$0 AS EXPR$0,
CAST((my_test(a) ROW my_test(c))).EXPR$1 AS EXPR$1])

so the column `body` will be "generated" twice.

In my real case, the column `body` has many fields, and if the SQL selects
body.EXPR$0, body.EXPR$1, ... body.EXPR$n, the plan gets much bigger and the
job fails.

Maybe this is the reason?
And is there any way to have `body` generated only once?

Thanks for your reply.
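(A small sketch of the threshold Caizhi mentions below, in case lowering it changes
anything; the value here is arbitrary:)

```
// lower the split threshold for generated code (the default is 64000)
tableEnv.getConfig.setMaxGeneratedCodeLength(4000)
```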







At 2020-04-23 20:32:07, "Caizhi Weng"  wrote:

This plan looks indeed complicated, however it is hard to see what the SQL is 
doing as the plan is too long... Could you provide your SQL to us? Also, what 
version of Flink are you using? It seems that there is a very long method in 
the generated code, but Flink should have split it into many shorter methods 
(see TableConfig#maxGeneratedCodeLength). By default Flink will split methods 
longer than 64KB into shorter ones.


izual  wrote on Thu, Apr 23, 2020 at 6:34 PM:

Hi, Community:
  I added 4 complicated SQLs in one job, and the job seems to run well.
  But when I try to add a 5th SQL, the job fails right at the beginning and
  throws the error below:
java.lang.RuntimeException: Could not instantiate generated class 
'StreamExecCalc$23166'
at 
org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
at 
org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:47)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:428)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:418)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:373)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.api.common.InvalidProgramException: Table program 
cannot be compiled. This is a bug. Please file an issue.
at 
org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
at 
org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:65)
at 
org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:78)
at 
org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:65)
... 10 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1750)
at java.util.HashMap.putVal(HashMap.java:642)
at java.util.HashMap.putMapEntries(HashMap.java:515)
at java.util.HashMap.putAll(HashMap.java:785)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3658)
at org.codehaus.janino.UnitCompiler.access$5800(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3543)
at 
org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3511)
at 
org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3511)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3510)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3499)


As the warning shows an OOM, I tried setting -yjm and -ytm to a larger value
(1024 -> 4096), but this does not help.


Thanks for your reply.




 

Flink table modules

2020-04-23 Thread Flavio Pompermaier
Hi to all,
I've seen that the Table API provides modules. What are they exactly?
Are they basically a way to group together a set of UDF functions?
Or can they add other things to the Table API?

Best,
Flavio


Re: Debug Slowness in Async Checkpointing

2020-04-23 Thread Robert Metzger
Hi Lu,

were you able to resolve the issue with the slow async checkpoints?

I've added Yu Li to this thread. He has more experience with the state
backends to decide which monitoring is appropriate for such situations.

Best,
Robert


On Tue, Apr 21, 2020 at 10:50 PM Lu Niu  wrote:

> Hi, Robert
>
> Thanks for replying. To improve observability, do you think we should
> expose more metrics in checkpointing? For example, in incremental
> checkpoints, the time spent uploading SST files?
> https://github.com/apache/flink/blob/5b71c7f2fe36c760924848295a8090898cb10f15/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java#L319
>
> Best
> Lu
>
>
> On Fri, Apr 17, 2020 at 11:31 AM Robert Metzger 
> wrote:
>
>> Hi,
>> did you check the TaskManager logs if there are retries by the s3a file
>> system during checkpointing?
>>
>> I'm not aware of any metrics in Flink that could be helpful in this
>> situation.
>>
>> Best,
>> Robert
>>
>> On Tue, Apr 14, 2020 at 12:02 AM Lu Niu  wrote:
>>
>>> Hi, Flink users
>>>
>>> We notice sometimes async checkpointing can be extremely slow, leading
>>> to checkpoint timeout. For example, For a state size around 2.5MB, it could
>>> take 7~12min in async checkpointing:
>>>
>>> [image: Screen Shot 2020-04-09 at 5.04.30 PM.png]
>>>
>>> Notice all the slowness comes from async checkpointing, no delay in sync
>>> part and barrier assignment. As we use rocksdb incremental checkpointing, I
>>> notice the slowness might be caused by uploading the file to s3. However, I
>>> am not completely sure since there are other steps in async checkpointing.
>>> Does flink expose fine-granular metrics to debug such slowness?
>>>
>>> setup: flink 1.9.1, rocksdb incremental state backend, S3AHadoopFileSystem
>>>
>>> Best
>>> Lu
>>>
>>


Re: Reading from sockets using dataset api

2020-04-23 Thread Arvid Heise
Hi Kaan,

afaik there is no (easy) way to switch from streaming back to batch API
while retaining all data in memory (correct me if I misunderstood).

However, from your description, I also have trouble understanding the setup.
Why can't you dump the data to some file? Do you really have more
main memory than disk space? Or do you have no shared storage between your
generating cluster and the Flink cluster?

It almost sounds as if the issue at heart is rather to find a good
serialization format on how to store the edges. The 70 billion edges could
be stored in an array of id pairs, which amount to ~560 GB uncompressed
data if stored in Avro (or any other binary serialization format) when ids
are longs. That's not much by today's standards and could also be easily
offloaded to S3.
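For illustration, such an edge record could be described by an Avro schema as small
as this (the names are placeholders):

```
{
  "type": "record",
  "name": "Edge",
  "namespace": "com.example.graph",
  "fields": [
    {"name": "src", "type": "long"},
    {"name": "dst", "type": "long"}
  ]
}
```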

Alternatively, if graph generation is rather cheap, you could also try to
incorporate it directly into the analysis job.

On Wed, Apr 22, 2020 at 2:58 AM Kaan Sancak  wrote:

> Hi,
>
> I have been running some experiments on  large graph data, smallest graph
> I have been using is around ~70 billion edges. I have a graph generator,
> which generates the graph in parallel and feeds to the running system.
> However, it takes a lot of time to read the edges, because even though the
> graph generation process is parallel, in Flink I can only listen from
> master node (correct me if I am wrong). Another option is dumping the
> generated data to a file and reading with readFromCsv, however this is not
> feasible in terms of storage management.
>
> What I want to do is, invoking my graph generator, using ipc/tcp
> protocols  and reading the generated data from the sockets. Since the graph
> data is also generated parallel in each node, I want to make use of ipc,
> and read the data in parallel at each node. I made some online digging  but
> couldn’t find something similar using dataset api. I would be glad if you
> have some similar use cases or examples.
>
> Is it possible to use streaming environment to create the data in parallel
> and switch to dataset api?
>
> Thanks in advance!
>
> Best
> Kaan



-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


Re: KeyedStream and chained forward operators

2020-04-23 Thread Piotr Nowojski
Hi,

I'm not sure how we can help you here. To my eye, your code looks OK. What you
figured out about pushing the keyBy in front of the ContinuousFileReader is also
valid and makes sense if you can indeed perform the keyBy correctly based on the
input splits. The problem should be somewhere in your custom logic: maybe your
KeySelector is not working exactly as expected? Maybe the bucketing is misbehaving?
I would suggest reproducing the problem locally (not on a cluster) with a minimal
data set, and then decomposing your job into smaller components and validating them
independently, or bisecting the job to pinpoint where the problem is.

I'm not sure if you are aware of it, but just in case, please take a look at the
ways you can test your job [1].
 
Piotrek

[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html 


> On 21 Apr 2020, at 17:35, Cliff Resnick  wrote:
> 
> I'm running a massive file sifting by timestamp DataSteam job from s3. 
> 
> The basic job is:
> FileMonitor -> ContinuousFileReader -> MultipleFileOutputSink 
> 
> The MultipleFileOutputSink sifts data based on timestamp to date-hour 
> directories
> 
> It's a lot of data, so I'm using high parallelism, but I want to maintain 
> reasonable output file size, so if I key post-ContinuousFileReader by 
> 5-minute timestamp keys I get the desired result of large files at the cost 
> of a network shuffle.
> 
> But since I also have timestamps on the input files I figured I could push 
> back the keyed stream to FileMonitor -> ContinuousFileReader and save the 
> network shuffle. I tested this and confirmed that it sort of worked and 
> ContinuousFileReaders are receiving properly partitioned input, but output 
> post reader is now rebalanced and sinks produce lots of tiny files. 
> 
> The code is below. Am I missing something?
> val source = env
>   .addSource(fileMonitor)
>   .name(s"Bucketed Log Source File Watcher: $path")
>   .keyBy(new KeySelector[TimestampedFileInputSplit, Long]() {
> override def getKey(split: TimestampedFileInputSplit): Long = {
>   val name = split.getPath.getName
>   val r= """(\d+)\.log""".r
>   r.findFirstMatchIn(name) match {
> case Some(m) ⇒ {
>   val t = m.group(1).toLong
>   t - (t % 300)
> }
> case _ ⇒ -1
>   }
> }
>   })
>   .transform[String]("Bucketed Log Source File Reader", fileReader)
>   .forward
>   .assignTimestampsAndWatermarks(WatermarkExtractor[String])
>   .forward
>   .addSink(SourceTrackingSink(Sift.outputBucket, BidDateFunc))
> 
> 
> 



Re: batch range sort support

2020-04-23 Thread Kurt Young
Hi Benchao, you can create a jira issue to track this.

Best,
Kurt


On Thu, Apr 23, 2020 at 2:27 PM Benchao Li  wrote:

> Hi Jingsong,
>
> Thanks for your quick response. I've CC'ed Chongchen who understands the
> scenario much better.
>
>
> Jingsong Li  wrote on Thu, Apr 23, 2020 at 12:34 PM:
>
>> Hi, Benchao,
>>
>> Glad to see your requirement about range partition.
>> I have a branch to support range partition: [1]
>>
>> Can you describe your scene in more detail? What sink did you use for
>> your jobs? A simple and complete business scenario? This can help the
>> community judge the importance of the range partition.
>>
>> [1]https://github.com/JingsongLi/flink/commits/range
>>
>> Best,
>> Jingsong Lee
>>
>> On Thu, Apr 23, 2020 at 12:15 PM Benchao Li  wrote:
>>
>>> Hi,
>>>
>>> Currently the sort operator in blink planner is global, which has
>>> bottleneck if we sort a lot of data.
>>>
>>> And I found 'table.exec.range-sort.enabled' config in BatchExecSortRule,
>>> which makes me very exciting.
>>> After enabling this config, I found that it's not implemented completely
>>> now. This config changes the distribution
>>>  from SINGLETON to range for sort operator, however in BatchExecExchange
>>> we do not deal with range
>>> distribution, and will throw UnsupportedOperationException.
>>>
>>> My question is,
>>> 1. Is this config just a mistake when we merge blink into flink, and we
>>> actually didn't plan to implement this?
>>> 2. If this is in the plan, then which version may we expect it to be
>>> ready?
>>>
>>>
>>> --
>>>
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>>
>>>
>>
>> --
>> Best, Jingsong Lee
>>
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: define WATERMARKS in queries/views?

2020-04-23 Thread lec ssmi
Can assignTimestampsAndWatermarks be applied again on a table that is already watermarked?

Jark Wu  wrote on Thu, Apr 23, 2020 at 20:18:

> Hi Matyas,
>
> You can create a new table based on the existing table using LIKE syntax
> [1] in the upcoming 1.11 version, e.g.
>
> CREATE  TABLE derived_table (
> WATERMARK FOR tstmp AS tsmp - INTERVAL '5' SECOND
> ) LIKE base_table;
>
> For now, maybe you have to manually create a new table using full DDL.
>
> Best,
> Jark
>
> [1]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> 
>
> On Apr 23, 2020, at 17:35, Őrhidi Mátyás  wrote:
>
> Dear Community,
>
> is it possible to define WATERMARKS in SQL queries/views? We have a read
> only catalog implementation and we would like to assign WMs to the tables
> somehow.
>
> Thanks,
> Matyas
>
>
>


Re: Processing Message after emitting to Sink

2020-04-23 Thread Sameer W
One idea that comes to my mind is to replace ProcessFunction1 with a
CoProcessFunction [1]. processElement1() can send to the side output and keep
the processed business message in state without emitting it. Then, as Arvid
mentioned, processElement2() can listen on the side output (emitted by
processElement1()), and when it receives the element, emit the result from the
state and clear the state.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/flink/streaming/api/functions/co/CoProcessFunction.html
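A minimal sketch of that idea, using the keyed variant so the buffered result can live in
ValueState. Types are simplified to String, and the wiring at the bottom assumes both streams
can be keyed by the same id (class and field names here are made up for illustration):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

// Input 1: the business result to hold back. Input 2: the "it is in the MQ now" signal.
public class HoldUntilConfirmed extends KeyedCoProcessFunction<String, String, String, String> {

    private transient ValueState<String> pending;

    @Override
    public void open(Configuration parameters) {
        pending = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pending-result", String.class));
    }

    @Override
    public void processElement1(String result, Context ctx, Collector<String> out) throws Exception {
        pending.update(result);          // buffer, do not emit yet
    }

    @Override
    public void processElement2(String confirmation, Context ctx, Collector<String> out) throws Exception {
        String buffered = pending.value();
        if (buffered != null) {
            out.collect(buffered);       // release only after the confirmation arrived
            pending.clear();
        }
    }
}

// wiring, assuming both streams expose the same key:
// resultStream.keyBy(r -> extractId(r))
//     .connect(confirmationStream.keyBy(c -> extractId(c)))
//     .process(new HoldUntilConfirmed())
//     .addSink(...);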

On Thu, Apr 23, 2020 at 7:20 AM Arvid Heise  wrote:

> Hi Kristoff,
>
> I see a few ways, none of which are perfect.
>
> The easiest way would be to not use a sink. Instead of outputting into a
> side-output, you could tag that element and have a successive asyncIO place
> that in RabbitMQ. If that asyncIO is ordered, then you can be sure that all
> following events are only processed after the element has been added. Of
> course, the downside is that you have to manually implement the
> communication with RabbitMQ and lose what Flink already has. This is what
> you already sketched out.
>
> A more complicated approach would be to implement a custom operator with
> input selection to replace processFunction2 [1]. Let's call it op2. You
> would add the feedback from the sink implicitly, by also consuming from
> that MQ queue on op2. Then, processFunction1 would also emit some flag
> event on the main output together with the side output. Op2 would block the
> input on receiving that flag until it has read the appropriate entry from
> the MQ. However, this approach is really complex to implement and input
> selection is somewhat based on a best-effort. So before going that route,
> I'd do a small POC to see if it fits your needs.
>
> The best solution, of course, would be to revise your overall
> architecture. It's quite smelly in a stream processing job that you need to
> halt execution at some point. If you give some more details, I could try to
> help.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/index.html?org/apache/flink/streaming/api/operators/InputSelectable.html
>
> On Wed, Apr 15, 2020 at 5:36 PM KristoffSC 
> wrote:
>
>> My point was, that as far as I know, Sinks are "terminating" operators,
>> that
>> ends the stream like .collect in Java 8 stream API. The don't emit
>> elements
>> further and I cannot link then in a way:
>>
>> source - proces - sink - process - sink
>>
>> Sink function produces DataStreamSink which is used for emitting elements
>> from a streaming topology.
>> It is not SingleOutputStreamOperator or DataStream that I can use as input
>> for next operator.
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> 
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward  - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>


Re: Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be cast to javax.xml.stream.XMLInputFactory)

2020-04-23 Thread Arvid Heise
This looks like a typical issue with classloading.

The Kinesis connector is probably residing in flink-dist/lib while Woodstox is added in
your job.jar (or vice versa).

Could you try to use both jars in the same way? Alternatively, could you
provide more information regarding your dependencies?
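
One quick way to narrow this down is to log which StAX implementation gets resolved and by
which classloader. A throwaway sketch — run it locally first, then put the same three print
lines into a rich function's open() to see what the cluster actually resolves:

import javax.xml.stream.XMLInputFactory;

public class StaxProbe {
    public static void main(String[] args) {
        // Which XMLInputFactory implementation is picked up, and who loaded it?
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("implementation: " + factory.getClass().getName());
        System.out.println("loaded by     : " + factory.getClass().getClassLoader());
        // The interface itself normally comes from the bootstrap/platform classloader (may print null)
        System.out.println("interface via : " + XMLInputFactory.class.getClassLoader());
    }
}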

On Tue, Apr 21, 2020 at 11:21 AM Fu, Kai  wrote:

> Hi, I’m using Flink 1.8 with JDK 8.
>
>
>
> *-- Best wishes*
>
> *Fu Kai*
>
>
>
>
>
> *From: *Chesnay Schepler 
> *Date: *Tuesday, April 21, 2020 at 5:15 PM
> *To: *"Fu, Kai" , "user@flink.apache.org" <
> user@flink.apache.org>
> *Subject: *RE: [EXTERNAL] Unable to unmarshall response
> (com.ctc.wstx.stax.WstxInputFactory cannot be cast to
> javax.xml.stream.XMLInputFactory)
>
>
>
> Which Flink version are you using?
>
>
>
> On 21/04/2020 11:11, Fu, Kai wrote:
>
> Hi,
>
>
>
> I’m running Flink application on AWS Kinesis Flink platform to read a
> kinesis stream from another account with assumed role, while I’m getting
> exception like below. But it works when I’m running the application
> locally, I’ve given all the related roles admin permission. Could anyone
> help what’s the potential problem?
>
>
>
> [
>
> "org.apache.flink.kinesis.shaded.com.amazonaws.SdkClientException:
> Unable to unmarshall response (com.ctc.wstx.stax.WstxInputFactory cannot be
> cast to javax.xml.stream.XMLInputFactory). Response Code: 200, Response
> Text: OK",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1738)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1434)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1356)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.doInvoke(AWSSecurityTokenServiceClient.java:1719)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1686)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.invoke(AWSSecurityTokenServiceClient.java:1675)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.executeAssumeRole(AWSSecurityTokenServiceClient.java:589)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient.assumeRole(AWSSecurityTokenServiceClient.java:561)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.newSession(STSAssumeRoleSessionCredentialsProvider.java:321)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.access$000(STSAssumeRoleSessionCredentialsProvider.java:37)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:76)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider$1.call(STSAssumeRoleSessionCredentialsProvider.java:73)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.RefreshableTask.refreshValue(RefreshableTask.java:257)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.RefreshableTask.blockingRefresh(RefreshableTask.java:213)",
>
> "\tat
> org.apache.flink.kinesis.shaded.com.amazonaws.auth.RefreshableTask.getValue(RefreshableTask.java:154)",
>
> "\tat
> 

K8s native - checkpointing to S3 with RocksDBStateBackend

2020-04-23 Thread Averell
Hi,
I am trying to deploy my job to Kubernetes following the native-Kubernetes
guide. My job is checkpointing to S3 with RockDBStateBackend. It also has a
S3 StreamingFileSink.
In my jar file, I've already had /flink-hadoop-fs,
flink-connector-filesystem, flink-s3-fs-hadoop /(as my understanding, these
are for the S3 sink, please correct me if I'm wrong)

When I tried to submit the job, I got the following error (only a few
seconds after submitting): /Could not find a file system implementation for
scheme 's3'. The scheme is not directly supported by Flink and no Hadoop
file system to support this scheme could be loaded/

Not sure how I can get over this. 
Using s3a didn't help (s3 does work well when running on my dev machine)
I also tried to copy the file /flink-shaded-hadoop-2-uber-2.8.3-10.0.jar/ to
the //opt/flink/lib// folder of the JobManager pod, but it didn't help (is
it already too late? should that be there before the JM is started?)

Thanks for your help.
Averell


Caused by: org.apache.flink.util.FlinkRuntimeException: Failed to create
checkpoint storage at checkpoint coordinator side.
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:282)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:205)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.enableCheckpointing(ExecutionGraph.java:486)
at
org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:338)
at
org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:255)
at
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:227)
at
org.apache.flink.runtime.scheduler.SchedulerBase.(SchedulerBase.java:215)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.(DefaultScheduler.java:120)
at
org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:105)
at
org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:278)
at
org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:266)
at
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
at
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
at
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.(JobManagerRunnerImpl.java:146)
... 10 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
Could not find a file system implementation for scheme 's3a'. The scheme is
not directly supported by Flink and no Hadoop file system to support this
scheme could be loaded.
at
org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
at
org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.(FsCheckpointStorage.java:64)
at
org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:490)
at
org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:477)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.(CheckpointCoordinator.java:279)
... 23 more



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
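
For reference, the job-side wiring is usually nothing more than the sketch below. The scheme
lookup happens inside the JobManager/TaskManager processes, so the flink-s3-fs-hadoop jar has
to be visible to those processes (baked into the image, e.g. under plugins/ or lib/, before
they start — jars dropped into lib/ after the JVM is already running are not picked up), not
only bundled inside the user jar. Bucket and paths below are placeholders:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3CheckpointWiring {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);
        // incremental RocksDB checkpoints going to S3; the bucket is a placeholder
        env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/checkpoints", true));
        env.fromElements("a", "b", "c").print();   // placeholder pipeline
        env.execute("s3-checkpointing-sketch");
    }
}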


JDBC table api questions

2020-04-23 Thread Flavio Pompermaier
Hi all,
is there a way to get the list of existing views in a JDBC database?
Is this something that could be supported somehow?

Moreover, it would be interesting for us to also know the original field
type of a table... Is there a way to get it (without implementing a dedicated
API)? Do you think it makes sense to expose it in the Table API?

Best,
Flavio
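
Until the connector exposes this, the information is available through plain JDBC metadata.
A small sketch (connection URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListViews {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
            DatabaseMetaData meta = conn.getMetaData();
            // Ask only for VIEW objects; catalog/schema left null to match everything
            try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"VIEW"})) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
                }
            }
            // The original column types are exposed the same way:
            // meta.getColumns(null, null, "my_table", "%") -> TYPE_NAME / DATA_TYPE columns
        }
    }
}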


Stateful Functions: java.lang.IllegalStateException: There are no routers defined

2020-04-23 Thread Annemarie Burger
Hi,
I'm getting to know Stateful Functions and was trying to run the Harness
RunnerTest example. If I clone the  repository and open and execute the
project from there it works fine, but when I copy the code into my own
project, it keeps giving a "java.lang.IllegalStateException: There are no
routers defined. " I changed nothing about the code, so the Module and the
Router are both defined, and the dependencies are the same. Am I overlooking
something?
This is the full error:

Exception in thread "main" java.lang.IllegalStateException: There are no
routers defined.
at
org.apache.flink.statefun.flink.core.StatefulFunctionsUniverseValidator.validate(StatefulFunctionsUniverseValidator.java:31)
at
org.apache.flink.statefun.flink.core.StatefulFunctionsJob.main(StatefulFunctionsJob.java:66)
at 
org.apache.flink.statefun.flink.harness.Harness.start(Harness.java:128)
at gellyStreaming.gradoop.StatefulFunctions.Test.main(Test.java:17)

Process finished with exit code 1





--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
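
One thing worth checking, since the code and dependencies are identical: the Harness discovers
modules via java.util.ServiceLoader. The example project registers its module either through the
@AutoService annotation processor or through a
META-INF/services/org.apache.flink.statefun.sdk.spi.StatefulFunctionModule resource file. If that
registration does not make it into the new project's classpath, the universe ends up with no
routers and fails exactly like this. For reference, a minimal sketch of such a module
(identifiers and class names are made up; the calls are from the StateFun 2.0 SDK as far as I can tell):

import java.util.Map;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.io.IngressIdentifier;
import org.apache.flink.statefun.sdk.io.Router;
import org.apache.flink.statefun.sdk.spi.StatefulFunctionModule;

public final class MyModule implements StatefulFunctionModule {

    // placeholder identifiers
    public static final IngressIdentifier<String> INGRESS =
            new IngressIdentifier<>(String.class, "example", "names");
    public static final FunctionType GREET = new FunctionType("example", "greet");

    @Override
    public void configure(Map<String, String> globalConfiguration, Binder binder) {
        binder.bindIngressRouter(INGRESS, new Router<String>() {
            @Override
            public void route(String message, Downstream<String> downstream) {
                // send each record to the greet function, keyed by the record itself
                downstream.forward(GREET, message, message);
            }
        });
        // the real module would also bind the ingress, egress and function provider here,
        // and its fully qualified class name must appear in the META-INF/services file
        // (or be generated by @AutoService) so the ServiceLoader can find it.
    }
}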


Re: Modelling time for complex events generated out of simple ones

2020-04-23 Thread Arvid Heise
We had a larger discussion on stackoverflow [1], so I'm adding a cross link
if any other user is coming here first.

[1]
https://stackoverflow.com/questions/61309174/modelling-time-for-complex-events-generated-out-of-simple-ones/
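
For readers following the cross-link: both timestamps discussed below are available directly
inside the operator that creates the derived event. A small sketch — the pattern check and the
Tuple3 output are placeholders for the real rule and event type:

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Emits (detectedPattern, processingTime, currentEventTimeWatermark) for each match.
public class PatternToEvent extends ProcessFunction<String, Tuple3<String, Long, Long>> {

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple3<String, Long, Long>> out) {
        boolean patternDetected = value.contains("ERROR");   // placeholder rule
        if (patternDetected) {
            long processingTime = ctx.timerService().currentProcessingTime();
            long eventTimeWatermark = ctx.timerService().currentWatermark();
            // a large gap between the two signals that old data is being (re)processed
            out.collect(Tuple3.of(value, processingTime, eventTimeWatermark));
        }
    }
}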

On Mon, Apr 20, 2020 at 6:52 AM Salva Alcántara 
wrote:

> In my case, the relationship between input and output events is that output
> events are generated out of some rules based on input events. Essentially,
> output events correspond to specific patterns / sequences of input events.
> You can think of output events as detecting certain anomalies or abnormal
> conditions. So I guess we are more in the second case you mention where the
> Flink TM can be regarded as a generator and hence using the processing time
> makes sense.
>
> Indeed, I am using both the processing time and the event time watermark
> value at the moment of generating the output events. I think both convey
> useful information. In particular, the processing time looks as the logical
> timestamp for the output events. However, although that would be an
> exception, it might also happen that my flink app is processing old data at
> some point. That is why I am also adding another timestamp with the current
> event-time watermark value. This allows the consumer of the output events
> to
> detect whether the output event corresponds to old data or not (by
> comparing
> the difference between the processing time and event time timestamps, which
> should in normal conditions be close to each other, except when processing
> old data).
>
> In the case of using both, what naming would you use for the two fields?
> Something along the lines of event_time and processing_time seems to leak
> implementation details of my app to the external services...
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


Re: RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread Caizhi Weng
This plan indeed looks complicated; however, it is hard to see what the SQL
is doing as the plan is too long... Could you provide your SQL to us? Also,
what version of Flink are you using? It seems that there is a very long
method in the generated code, but Flink should have split it into many
shorter methods (see TableConfig#maxGeneratedCodeLength). By default Flink
will split methods longer than 64KB into shorter ones.
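
If it turns out a single generated method really is huge, one knob to experiment with is the
code-splitting threshold mentioned above. A small sketch, assuming the blink planner; note this
only changes where generated code is split and will not by itself cure a GC-overhead problem:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class CodeSplitKnob {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(
                env, EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Ask the code generator to split generated methods earlier than the 64KB default.
        tableEnv.getConfig().setMaxGeneratedCodeLength(4000);

        // ... register tables and add the five SQL statements as before ...
    }
}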

On Thu, Apr 23, 2020 at 6:34 PM, izual  wrote:

> Hi,Community:
>   I add 4 complicated sqls in one job, and the job looks running well.
>   But when I try to add 5th sql,the job failed at the beginning。
>   And throws errors info below:
> java.lang.RuntimeException: Could not instantiate generated class
> 'StreamExecCalc$23166'
> at
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
> at
> org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:47)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:428)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:418)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:373)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.api.common.InvalidProgramException: Table
> program cannot be compiled. This is a bug. Please file an issue.
> at
> org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
> at
> org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:65)
> at
> org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:78)
> at
> org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:65)
> ... 10 more
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.HashMap.newNode(HashMap.java:1750)
> at java.util.HashMap.putVal(HashMap.java:642)
> at java.util.HashMap.putMapEntries(HashMap.java:515)
> at java.util.HashMap.putAll(HashMap.java:785)
> at
> org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3658)
> at org.codehaus.janino.UnitCompiler.access$5800(UnitCompiler.java:215)
> at
> org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3543)
> at
> org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3511)
> at
> org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3511)
> at
> org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3510)
> at
> org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3499)
>
> As the warning shows OOM, I then tried to set -yjm and -ytm to a bigger value (1024
> -> 4096), but this does not help.
>
> Thanks for your reply.
>
>
>
>


IntelliJ java formatter

2020-04-23 Thread Flavio Pompermaier
Hi to all,
I'm migrating to IntelliJ because it's very complicated to have a fully
working env in Eclipse (too many missing maven plugins). Is there a way to
automatically format a Java class (respecting the configured checkstyle)?
Or do I have to manually fix every Checkstyle problem?

Thanks in advance,
Flavio


Re: define WATERMARKS in queries/views?

2020-04-23 Thread Jark Wu
Hi Matyas,

You can create a new table based on the existing table using LIKE syntax [1] in 
the upcoming 1.11 version, e.g. 

CREATE  TABLE derived_table (
WATERMARK FOR tstmp AS tstmp - INTERVAL '5' SECOND
) LIKE base_table;

For now, maybe you have to manually create a new table using full DDL. 

Best,
Jark

[1]: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
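
For the "full DDL" route mentioned above, a sketch of what such a statement can look like on
1.10, issued from Java via sqlUpdate(). Field names, topic and broker address are placeholders:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class WatermarkDdlSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(
                env, EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Re-declare the read-only catalog table as a new table with a watermark.
        tableEnv.sqlUpdate(
                "CREATE TABLE derived_table (" +
                "  user_id STRING," +
                "  tstmp TIMESTAMP(3)," +
                "  WATERMARK FOR tstmp AS tstmp - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector.type' = 'kafka'," +
                "  'connector.version' = 'universal'," +
                "  'connector.topic' = 'events'," +
                "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
                "  'format.type' = 'json'" +
                ")");
    }
}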
 


> On 23 Apr 2020, at 17:35, Őrhidi Mátyás  wrote:
> 
> Dear Community,
> 
> is it possible to define WATERMARKS in SQL queries/views? We have a read only 
> catalog implementation and we would like to assign WMs to the tables somehow.
> 
> Thanks,
> Matyas



Re: Unsubscribe

2020-04-23 Thread Arvid Heise
Please unsubscribe by sending a mail to user-unsubscr...@flink.apache.org

On Thu, Apr 16, 2020 at 5:06 PM Jose Cisneros 
wrote:

> Unsubscribe
>


-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-23 Thread Arvid Heise
Hi Elkhan,

Theo's advice is spot-on, you should use asyncIO with AsyncFunction.

AsyncIO is not performing any task asynchronously by itself though, so you
should either use the async API of the library if existant or manage your
own thread pool. The thread pool should be as large as your desired
parallelism (4-8). AsyncIO then ensures that the results are emitted in the
correct way (to have consistent state). AsyncIO internally uses a queue for
the parallel tasks, which should be at least as large as the number of
threads in your pool. You can decide if results should be published as fast
as possible (UNORDERED) or in the order of arrival (ORDERED where a slow
element stops faster elements from being published). For ORDERED, your
queue length should be much longer for optimal performance. If your longest
elements take 2 min but the average 30 sec, you want to have a queue that
is roughly 4 times as big as the number of threads (no exact science here).

Multithreading within an operator is strongly discouraged in 1.6 and will
even be impossible in the near future. The main reason is that it's really
hard to reason about consistent state and have a consistent checkpoint.
It's very easy to have duplicates or lost elements in such a setting.
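
To make the asyncIO route concrete, a rough sketch of such an AsyncFunction with its own thread
pool. The expensive library call is faked with a sleep; pool size, timeout and queue capacity
are only illustrative numbers:

import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class LibraryAsyncFunction extends RichAsyncFunction<String, String> {

    private transient ExecutorService pool;

    @Override
    public void open(Configuration parameters) {
        pool = Executors.newFixedThreadPool(8);     // desired in-flight computations per subtask
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        pool.submit(() -> {
            try {
                resultFuture.complete(Collections.singleton(compute(input)));
            } catch (Exception e) {
                resultFuture.completeExceptionally(e);
            }
        });
    }

    // stand-in for the real client library call that takes 30 seconds to 2 minutes
    private static String compute(String input) throws Exception {
        Thread.sleep(30_000);
        return input.toUpperCase();
    }
}

// wiring, unordered so a slow element does not hold back faster ones:
// DataStream<String> results = AsyncDataStream.unorderedWait(
//         inputStream, new LibraryAsyncFunction(), 5, TimeUnit.MINUTES, 32);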

On Thu, Apr 16, 2020 at 3:43 PM Theo Diefenthal <
theo.diefent...@scoop-software.de> wrote:

> Hi,
>
> I think you could utilize AsyncIO in your case with just using a local
> thread pool [1].
>
> Best regards
> Theo
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
>
> --
> *Von: *"Elkhan Dadashov" 
> *An: *"user" 
> *Gesendet: *Donnerstag, 16. April 2020 10:37:55
> *Betreff: *How to scale a streaming Flink pipeline without abusing
> parallelism for long computation tasks?
>
> Hi Flink users,
> I have a basic Flink pipeline, doing flatmap.
>
> inside flatmap, I get the input, pass it to the client library to compute
> some result.
>
> That library execution takes around 30 seconds to 2 minutes (depending on
> the input) to produce the output from the given input (it is a
> time-series-based long-running computation).
>
> As the library takes a long time to compute, the input payloads keep getting
> buffered, and if not given enough parallelism, the job will crash/restart.
> (java.lang.RuntimeException: Buffer pool is destroyed.)
>
> Wanted to check what other options there are for scaling a Flink streaming
> pipeline without abusing parallelism for long-running computations in a Flink
> operator?
>
> Is multi-threading inside the operator recommended? (Even though a
> single input computation takes a long time, I can definitely run 4-8 of
> them in parallel threads, instead of one by one, inside the same FlatMap
> operator.)
>
> 1 core for each yarn slot (which will hold 1 flatmap operator) seems too
> expensive. If we could launch more flink operators with only 1 core, it
> would be easier.
>
> If anyone has faced a similar issue, please share your experience. I'm using
> Flink version 1.6.3.
>
> Thanks.
>


-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


Handling stale data enrichment

2020-04-23 Thread Vinay Patil
Hi,

I went through Konstantin's webinar on the 99 ways you can do enrichment. One
thing I am failing to understand is how to efficiently handle stale data
enrichment.

Context: Let's say I want to enrich user data with the subscription data.
Here subscription data is acting as reference data and will be used for
joining these two streams based on event time. Consider the following
scenario:


   1. We are going to enrich the Click Stream event (containing user_info) with
      Subscription details
   2. Subscription status for the user Alice is FREE
   3. The current internal state contains Alice with subscription status FREE
   4. Reference data stops flowing for 2 hrs because of some issue
   5. Alice upgrades the subscription to PREMIUM at 10.30 AM
   6. A watched-video event comes for Alice at 10.40 AM
      - the flink pipeline looks up the internal state and writes to the
        enrichment topic
      - the enrichment topic now contains Alice -> FREE
   7. Reference data starts flowing in again at 11 AM
      - let's assume we consider late elements up to 2 hours, so the click
        stream event of Alice is still buffered in the state
      - the enrichment topic will now contain duplicate records for Alice
        because of multiple firings of the window:
        1. Alice -> FREE -> 10 AM
        2. Alice -> PREMIUM -> 11 AM

The question is: how do I avoid sending duplicate records? I am not able to
figure it out. I can think of low-level joins, but I am not sure how we would
know whether the data is stale or not based on the timestamp (watermark), as
it can happen that a particular enrichment record is not updated for 6 hrs.

Regards,
Vinay Patil


Re: Streaming Job eventually begins failing during checkpointing

2020-04-23 Thread Stephan Ewen
If something requires Beam to register a new state each time, then this is
tricky, because currently you cannot unregister states from Flink.

@Yu @Yun I remember chatting about this (allowing to explicitly unregister
states so they get dropped from successive checkpoints) at some point, but
I could not find a jira ticket for this. Do you remember what the status of
that discussion is?

On Thu, Apr 16, 2020 at 6:37 PM Stephen Patel  wrote:

> I posted to the beam mailing list:
> https://lists.apache.org/thread.html/rb2ebfad16d85bcf668978b3defd442feda0903c20db29c323497a672%40%3Cuser.beam.apache.org%3E
>
> I think this is related to a Beam feature called RequiresStableInput
> (which my pipeline is using).  It will create a new operator (or keyed)
> state per checkpoint.  I'm not sure that there are any parameters that I
> have control over to tweak it's behavior (apart from increasing the
> checkpoint interval to let the pipeline run longer before building up that
> many states).
>
> Perhaps this is something that can be fixed (maybe by unregistering
> Operator States after they aren't used any more in the RequiresStableInput
> code).  It seems to me that this isn't a Flink issue, but rather a Beam
> issue.
>
> Thanks for pointing me in the right direction.
>
> On Thu, Apr 16, 2020 at 11:29 AM Yun Tang  wrote:
>
>> Hi Stephen
>>
>> I think the state name [1] which would be changed every time might the
>> root cause. I am not familiar with Beam code, would it be possible to
>> create so many operator states? Did you configure some parameters wrongly?
>>
>>
>> [1]
>> https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L95
>>
>> Best
>> Yun Tang
>> --
>> *From:* Stephen Patel 
>> *Sent:* Thursday, April 16, 2020 22:30
>> *To:* Yun Tang 
>> *Cc:* user@flink.apache.org 
>> *Subject:* Re: Streaming Job eventually begins failing during
>> checkpointing
>>
>> Correction.  I've actually found a place where it potentially might be
>> creating a new operator state per checkpoint:
>>
>> https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L91-L105
>> https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/BufferingDoFnRunner.java#L141-L149
>>
>> This gives me something I can investigate locally at least.
>>
>> On Thu, Apr 16, 2020 at 9:03 AM Stephen Patel  wrote:
>>
>> I can't say that I ever call that directly.  The beam library that I'm
>> using does call it in a couple places:
>> https://github.com/apache/beam/blob/v2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L422-L429
>>
>> But it seems to be the same descriptor every time.  Is that limit per
>> operator?  That is, can each operator host up to 32767 operator/broadcast
>> states?  I assume that's by name?
>>
>> On Wed, Apr 15, 2020 at 10:46 PM Yun Tang  wrote:
>>
>> Hi  Stephen
>>
>> This is not related with RocksDB but with default on-heap operator state
>> backend. From your exception stack trace, you have created too many
>> operator states (more than 32767).
>> How do you call context.getOperatorStateStore().getListState or
>> context.getOperatorStateStore().getBroadcastState ? Did you pass a
>> different operator state descriptor each time?
>>
>> Best
>> Yun Tang
>> --
>> *From:* Stephen Patel 
>> *Sent:* Thursday, April 16, 2020 2:09
>> *To:* user@flink.apache.org 
>> *Subject:* Streaming Job eventually begins failing during checkpointing
>>
>> I've got a flink (1.8.0, emr-5.26) streaming job running on yarn.  It's
>> configured to use rocksdb, and checkpoint once a minute to hdfs.  This job
>> operates just fine for around 20 days, and then begins failing with this
>> exception (it fails, restarts, and fails again, repeatedly):
>>
>> 2020-04-15 13:15:02,920 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering
>> checkpoint 32701 @ 1586956502911 for job 9953424f21e240112dd23ab4f8320b60.
>> 2020-04-15 13:15:05,762 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed
>> checkpoint 32701 for job 9953424f21e240112dd23ab4f8320b60 (795385496 bytes
>> in 2667 ms).
>> 2020-04-15 13:16:02,919 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering
>> checkpoint 32702 @ 1586956562911 for job 9953424f21e240112dd23ab4f8320b60.
>> 2020-04-15 13:16:03,147 INFO
>>  org.apache.flink.runtime.executiongraph.ExecutionGraph-
>>  (1/2) (f4737add01961f8b42b8eb4e791b83ba) switched from
>> RUNNING to FAILED.
>> 

Re: Processing Message after emitting to Sink

2020-04-23 Thread Arvid Heise
Hi Kristoff,

I see a few ways, none of which are perfect.

The easiest way would be to not use a sink. Instead of outputting into a
side-output, you could tag that element and have a successive asyncIO place
that in RabbitMQ. If that asyncIO is ordered, then you can be sure that all
following events are only processed after the element has been added. Of
course, the downside is that you have to manually implement the
communication with RabbitMQ and lose what Flink already has. This is what
you already sketched out.

A more complicated approach would be to implement a custom operator with
input selection to replace processFunction2 [1]. Let's call it op2. You
would add the feedback from the sink implicitly, by also consuming from
that MQ queue on op2. Then, processFunction1 would also emit some flag
event on the main output together with the side output. Op2 would block the
input on receiving that flag until it has read the appropriate entry from
the MQ. However, this approach is really complex to implement and input
selection is somewhat based on a best-effort. So before going that route,
I'd do a small POC to see if it fits your needs.

The best solution, of course, would be to revise your overall architecture.
It's quite smelly in a stream processing job that you need to halt
execution at some point. If you give some more details, I could try to help.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/api/java/index.html?org/apache/flink/streaming/api/operators/InputSelectable.html

On Wed, Apr 15, 2020 at 5:36 PM KristoffSC 
wrote:

> My point was, that as far as I know, Sinks are "terminating" operators,
> that
> ends the stream like .collect in Java 8 stream API. The don't emit elements
> further and I cannot link then in a way:
>
> source - proces - sink - process - sink
>
> Sink function produces DataStreamSink which is used for emitting elements
> from a streaming topology.
> It is not SingleOutputStreamOperator or DataStream that I can use as input
> for next operator.
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


-- 

Arvid Heise | Senior Java Developer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng


Re: Question about retract

2020-04-23 Thread Benchao Li
Yes, exactly. The over window indeed does not support retract input, it only supports append input.
And it also only produces append output.

On Thu, Apr 23, 2020 at 6:32 PM, lec ssmi  wrote:

> Sorry, I just took a look at the source code:
> [image: image.png]
> This is the aggregation operation of the over window.
> This implementation class does not implement producesUpdates or producesRetractions, and the default value of both methods is false. Does that mean there can only be INSERT-type records?
> If so, doesn't that mean the output of an over window operation is an append-only stream?
>
> On Thu, Apr 23, 2020 at 5:13 PM, lec ssmi  wrote:
>
>> Understood, thanks.
>>
>> On Thu, Apr 23, 2020 at 5:08 PM, Benchao Li  wrote:
>>
>>> That's not what I meant. I meant that a scalar function needs no extra handling, so one set of logic can process both kinds of messages.
>>> It does not need to distinguish the message type; it only processes the message itself (the message type is in the header).
>>>
>>> On Thu, Apr 23, 2020 at 5:00 PM, lec ssmi  wrote:
>>>
>>> > So that means a UDF, i.e. a ScalarFunction, has no way to handle retractions? Because it performs the same operation on both DELETE and INSERT records.
>>> >
>>> > On Thu, Apr 23, 2020 at 4:54 PM, Benchao Li  wrote:
>>> >
>>> > > Hi Jingsong,
>>> > > I created a JIRA issue [1] to track this.
>>> > >
>>> > > Hi lec,
>>> > > The sum function is not a scalar function. The built-in implementation of sum does have a retract version, see IntSumWithRetractAggFunction.
>>> > > A scalar function does not need this kind of handling because it has no state of its own. A scalar function never needs to check the message type; the processing is the same either way.
>>> > >
>>> > > [1] https://issues.apache.org/jira/browse/FLINK-17343
>>> > >
>>> > > On Thu, Apr 23, 2020 at 4:41 PM, lec ssmi  wrote:
>>> > >
>>> > > > What I actually want to say is: if the built-in SQL operators, including UDFs (ScalarFunction), can all handle retractions by default, take the simplest example, the sum function. Does its internal implementation need logic like
>>> > > >  if (type == 'DELETE') {
>>> > > >    sum = sum - value
>>> > > >  } else if (type == 'INSERT') {
>>> > > >    sum = sum + value
>>> > > >  }
>>> > > > ?
>>> > > > But a ScalarFunction only implements the eval method, i.e. there is only the INSERT "add" part of the logic and no DELETE "subtract" part.
>>> > > >
>>> > > > On Thu, Apr 23, 2020 at 4:33 PM, Benchao Li  wrote:
>>> > > >
>>> > > > > The Blink offered on Alibaba Cloud should be the internal version, which differs somewhat from the community version. Everything I said just now is based on the community version.
>>> > > > >
>>> > > > > On Thu, Apr 23, 2020 at 4:29 PM, lec ssmi  wrote:
>>> > > > >
>>> > > > > > Strange. We currently use Alibaba Cloud's Blink: the two streams before the join are both built with last_value plus an over window, then joined, and the join result is aggregated with a tumble window.
>>> > > > > >
>>> > > > > > On Thu, Apr 23, 2020 at 4:26 PM, Benchao Li  wrote:
>>> > > > > >
>>> > > > > > > A time interval join does not allow non-append input.
>>> > > > > > >
>>> > > > > > > On Thu, Apr 23, 2020 at 4:18 PM, lec ssmi  wrote:
>>> > > > > > >
>>> > > > > > > > Then if two streams produced by retracting operators are joined with a time interval join, will records that were already joined and emitted also first be DELETEd and then re-INSERTed, with both of these records sent downstream?
>>> > > > > > > >
>>> > > > > > > > On Thu, Apr 23, 2020 at 4:11 PM, Benchao Li  wrote:
>>> > > > > > > >
>>> > > > > > > > > The built-in *aggregate functions* should all be able to handle retract messages.
>>> > > > > > > > > Ordinary *scalar functions* need no special handling; retract and append messages look the same to them.
>>> > > > > > > > > My understanding is that it is mainly UDAFs that may need to care about whether retract messages must be handled. Over windows do indeed need to handle retractions, and besides that, a regular group by does too.
>>> > > > > > > > >
>>> > > > > > > > > On Thu, Apr 23, 2020 at 4:05 PM, lec ssmi  wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Thanks.
>>> > > > > > > > > > Actually, from the DataStream programming point of view, the downstream can receive a Tuple2-typed record, i.e. it can hard-code the handling of the retract result. But for the Table API, and especially SQL, the built-in functions themselves have no added logic for handling retractions (of course, the built-in operators may already contain it; I just haven't looked).
>>> > > > > > > > > > When I write a UDAF there is a retract method in it, and the comment says: This function must be implemented for datastream bounded over aggregate. Does that mean retractions only exist for over windows?
>>> > > > > > > > > > Also, for the UDFs and UDTFs we write, no retract mechanism is provided either; after all, the parameters passed in are only field values, without the Boolean of the Tuple2 from the DataStream. The other built-in methods are the same. It seems that, in SQL, handling of retractions is only mentioned for UDAFs.
>>> > > > > > > > > >
>>> > > > > > > > > > On Thu, Apr 23, 2020 at 3:59 PM, Benchao Li  wrote:
>>> > > > > > > > > >
>>> > > > > > > > > > > There is no document yet that covers this. If you want all the details, you probably can only get them from the source code.
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Thu, Apr 23, 2020 at 3:45 PM, lec ssmi  wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > > Isn't there a list, or some configuration switch, for this? Do we really have to try them one by one? With all kinds of operators chained together it is even harder to judge.
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > On Thu, Apr 23, 2020 at 3:39 PM, Benchao Li  wrote:
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > > Hi lec,
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > > 1. When is retract triggered? Is it on by default whenever there is a group by or a window, or does it need configuration?
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > Some operators have this behavior by themselves; for example, an ordinary group by sends retract messages.
>>> > > > > > > > > > > > > Some operators only have this behavior under certain configurations, e.g. the window operator when early fire or late fire is configured.
>>> > > > > > > > > > > > > And some operators do not produce retractions themselves but pass them on, e.g. the calc operator.
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > > 2. If an upstream operation has retractions, doesn't that mean all downstream operators carry the retract property as well? Otherwise the data computed downstream would be wrong.
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > Not absolutely, but most of the time yes.
>>> > > > > > > > > > > > > It depends on whether the operator itself consumes retractions; so far I don't think I have seen an operator that consumes retractions but does not produce them.
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > > 3. For SQL, if the upstream has retractions, will a downstream select-then-print print both the DELETE and the INSERT records?
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > Yes.
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > On Thu, Apr 23, 2020 at 3:25 PM, lec ssmi  wrote:
>>> > > > > > > > > > > > >
>>> > > > > > > > > > > > > > Hi:

RuntimeException: Could not instantiate generated class 'StreamExecCalc$23166'

2020-04-23 Thread izual
Hi, Community:
  I added 4 complicated SQLs in one job, and the job seems to run well.
  But when I try to add a 5th SQL, the job fails right at startup.
  It throws the errors below:
java.lang.RuntimeException: Could not instantiate generated class 
'StreamExecCalc$23166'
at 
org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:67)
at 
org.apache.flink.table.runtime.operators.CodeGenOperatorFactory.createStreamOperator(CodeGenOperatorFactory.java:47)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:428)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:418)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:354)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:144)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:373)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.api.common.InvalidProgramException: Table program 
cannot be compiled. This is a bug. Please file an issue.
at 
org.apache.flink.table.runtime.generated.CompileUtils.doCompile(CompileUtils.java:81)
at 
org.apache.flink.table.runtime.generated.CompileUtils.compile(CompileUtils.java:65)
at 
org.apache.flink.table.runtime.generated.GeneratedClass.compile(GeneratedClass.java:78)
at 
org.apache.flink.table.runtime.generated.GeneratedClass.newInstance(GeneratedClass.java:65)
... 10 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1750)
at java.util.HashMap.putVal(HashMap.java:642)
at java.util.HashMap.putMapEntries(HashMap.java:515)
at java.util.HashMap.putAll(HashMap.java:785)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3658)
at org.codehaus.janino.UnitCompiler.access$5800(UnitCompiler.java:215)
at 
org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3543)
at 
org.codehaus.janino.UnitCompiler$12.visitLocalVariableDeclarationStatement(UnitCompiler.java:3511)
at 
org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:3511)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3510)
at 
org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:3499)


As the warning shows OOM, I then tried to set -yjm and -ytm to a bigger value (1024 ->
4096), but this does not help.


Thanks for your reply.

Re: Question about retract

2020-04-23 Thread lec ssmi
Sorry, I just took a look at the source code:
[image: image.png]
This is the aggregation operation of the over window.
This implementation class does not implement producesUpdates or producesRetractions, and the default value of both methods is false. Does that mean there can only be INSERT-type records?
If so, doesn't that mean the output of an over window operation is an append-only stream?

On Thu, Apr 23, 2020 at 5:13 PM, lec ssmi  wrote:

> Understood, thanks.
>
> On Thu, Apr 23, 2020 at 5:08 PM, Benchao Li  wrote:
>
>> That's not what I meant. I meant that a scalar function needs no extra handling, so one set of logic can process both kinds of messages.
>> It does not need to distinguish the message type; it only processes the message itself (the message type is in the header).
>>
>> On Thu, Apr 23, 2020 at 5:00 PM, lec ssmi  wrote:
>>
>> > So that means a UDF, i.e. a ScalarFunction, has no way to handle retractions? Because it performs the same operation on both DELETE and INSERT records.
>> >
>> > On Thu, Apr 23, 2020 at 4:54 PM, Benchao Li  wrote:
>> >
>> > > Hi Jingsong,
>> > > I created a JIRA issue [1] to track this.
>> > >
>> > > Hi lec,
>> > > The sum function is not a scalar function. The built-in implementation of sum does have a retract version, see IntSumWithRetractAggFunction.
>> > > A scalar function does not need this kind of handling because it has no state of its own. A scalar function never needs to check the message type; the processing is the same either way.
>> > >
>> > > [1] https://issues.apache.org/jira/browse/FLINK-17343
>> > >
>> > > On Thu, Apr 23, 2020 at 4:41 PM, lec ssmi  wrote:
>> > >
>> > > > What I actually want to say is: if the built-in SQL operators, including UDFs (ScalarFunction), can all handle retractions by default, take the simplest example, the sum function. Does its internal implementation need logic like
>> > > >  if (type == 'DELETE') {
>> > > >    sum = sum - value
>> > > >  } else if (type == 'INSERT') {
>> > > >    sum = sum + value
>> > > >  }
>> > > > ?
>> > > > But a ScalarFunction only implements the eval method, i.e. there is only the INSERT "add" part of the logic and no DELETE "subtract" part.
>> > > >
>> > > > On Thu, Apr 23, 2020 at 4:33 PM, Benchao Li  wrote:
>> > > >
>> > > > > The Blink offered on Alibaba Cloud should be the internal version, which differs somewhat from the community version. Everything I said just now is based on the community version.
>> > > > >
>> > > > > On Thu, Apr 23, 2020 at 4:29 PM, lec ssmi  wrote:
>> > > > >
>> > > > > > Strange. We currently use Alibaba Cloud's Blink: the two streams before the join are both built with last_value plus an over window, then joined, and the join result is aggregated with a tumble window.
>> > > > > >
>> > > > > > On Thu, Apr 23, 2020 at 4:26 PM, Benchao Li  wrote:
>> > > > > >
>> > > > > > > A time interval join does not allow non-append input.
>> > > > > > >
>> > > > > > > On Thu, Apr 23, 2020 at 4:18 PM, lec ssmi  wrote:
>> > > > > > >
>> > > > > > > > Then if two streams produced by retracting operators are joined with a time interval join, will records that were already joined and emitted also first be DELETEd and then re-INSERTed, with both of these records sent downstream?
>> > > > > > > >
>> > > > > > > > On Thu, Apr 23, 2020 at 4:11 PM, Benchao Li  wrote:
>> > > > > > > >
>> > > > > > > > > The built-in *aggregate functions* should all be able to handle retract messages.
>> > > > > > > > > Ordinary *scalar functions* need no special handling; retract and append messages look the same to them.
>> > > > > > > > > My understanding is that it is mainly UDAFs that may need to care about whether retract messages must be handled. Over windows do indeed need to handle retractions, and besides that, a regular group by does too.
>> > > > > > > > >
>> > > > > > > > > On Thu, Apr 23, 2020 at 4:05 PM, lec ssmi  wrote:
>> > > > > > > > >
>> > > > > > > > > > Thanks.
>> > > > > > > > > > Actually, from the DataStream programming point of view, the downstream can receive a Tuple2-typed record, i.e. it can hard-code the handling of the retract result. But for the Table API, and especially SQL, the built-in functions themselves have no added logic for handling retractions (of course, the built-in operators may already contain it; I just haven't looked).
>> > > > > > > > > > When I write a UDAF there is a retract method in it, and the comment says: This function must be implemented for datastream bounded over aggregate. Does that mean retractions only exist for over windows?
>> > > > > > > > > > Also, for the UDFs and UDTFs we write, no retract mechanism is provided either; after all, the parameters passed in are only field values, without the Boolean of the Tuple2 from the DataStream. The other built-in methods are the same. It seems that, in SQL, handling of retractions is only mentioned for UDAFs.
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Apr 23, 2020 at 3:59 PM, Benchao Li  wrote:
>> > > > > > > > > >
>> > > > > > > > > > > There is no document yet that covers this. If you want all the details, you probably can only get them from the source code.
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Apr 23, 2020 at 3:45 PM, lec ssmi  wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Isn't there a list, or some configuration switch, for this? Do we really have to try them one by one? With all kinds of operators chained together it is even harder to judge.
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Thu, Apr 23, 2020 at 3:39 PM, Benchao Li  wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi lec,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > 1. When is retract triggered? Is it on by default whenever there is a group by or a window, or does it need configuration?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Some operators have this behavior by themselves; for example, an ordinary group by sends retract messages.
>> > > > > > > > > > > > > Some operators only have this behavior under certain configurations, e.g. the window operator when early fire or late fire is configured.
>> > > > > > > > > > > > > And some operators do not produce retractions themselves but pass them on, e.g. the calc operator.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > 2. If an upstream operation has retractions, doesn't that mean all downstream operators carry the retract property as well? Otherwise the data computed downstream would be wrong.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Not absolutely, but most of the time yes.
>> > > > > > > > > > > > > It depends on whether the operator itself consumes retractions; so far I don't think I have seen an operator that consumes retractions but does not produce them.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > 3. For SQL, if the upstream has retractions, will a downstream select-then-print print both the DELETE and the INSERT records?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Yes.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Thu, Apr 23, 2020 at 3:25 PM, lec ssmi  wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi:
>> > > > > > > > > > > > > > I have a few questions I'd like to ask the experts:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 1. When is retract triggered? Is it on by default whenever there is a group by or a window, or does it need configuration?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 2. If an upstream operation has retractions, doesn't that mean all downstream operators carry the retract property as well? Otherwise the data computed downstream would be wrong.
>> > > > > > > > > > > > > >
>> > 

Fault tolerance in Flink file Sink

2020-04-23 Thread Eyal Pe'er
Hi all,
I am using Flink streaming with the Kafka consumer connector (FlinkKafkaConsumer)
and the file sink (StreamingFileSink) in cluster mode with an exactly-once policy.
The file sink writes the files to the local disk.
I've noticed that if a job fails and automatic restart is on, the task managers
look for the leftover files from the last failing job (hidden files).
Obviously, since the tasks can be assigned to different task managers, this
leads to more failures over and over again.
The only solution I have found so far is to delete the hidden files and resubmit the
job.
If I get it right (and please correct me if I'm wrong), the events in the hidden
files were not committed to the bootstrap server, so there is no data loss.

Is there a way to force Flink to ignore the files that were already written? Or
maybe there is a better way to implement this (perhaps somehow with
savepoints)?

Best regards
Eyal Peer
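
Not a direct answer to skipping the leftover files, but the usual way to avoid the situation is
to point the StreamingFileSink at a filesystem every TaskManager can reach (HDFS, S3, NFS, ...),
so that after a failover the restored subtask, wherever it lands, can still find and commit or
truncate its in-progress files; with node-local disks that is exactly what breaks. A minimal
sketch of that setup — the path and the source are placeholders:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class DurableFileSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);   // part files are committed on checkpoints

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///data/output"),   // shared, durable filesystem
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.socketTextStream("localhost", 9999)   // placeholder source
           .addSink(sink);

        env.execute("durable-file-sink-sketch");
    }
}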



Re: Flink 1.10 Out of memory

2020-04-23 Thread Stephan Ewen
@Xintong and @Lasse could it be that the JVM hits the "Direct Memory" limit
here?
Would increasing the "taskmanager.memory.framework.off-heap.size" help?

On Mon, Apr 20, 2020 at 11:02 AM Zahid Rahman  wrote:

> As you can see from the task manager tab of flink web dashboard
>
> Physical Memory:3.80 GB
> JVM Heap Size:1.78 GB
> Flink Managed Memory:128 MB
>
> *Flink is only using 128 MB, which can easily cause an OOM*
> *error.*
>
> *These are DEFAULT settings.*
>
> *I dusted off an old laptop so it only 3.8 GB RAM.*
>
> What does your job metrics say  ?
>
> On Mon, 20 Apr 2020, 07:26 Xintong Song,  wrote:
>
>> Hi Lasse,
>>
>> From what I understand, your problem is that JVM tries to fork some
>> native process (if you look at the exception stack the root exception is
>> thrown from a native method) but there's no enough memory for doing that.
>> This could happen when either Mesos is using cgroup strict mode for memory
>> control, or there's no more memory on the machine. Flink cannot prevent
>> native processes from using more memory. It can only reserve certain amount
>> of memory for such native usage when requesting worker memory from the
>> deployment environment (in your case Mesos) and allocating Java heap /
>> direct memory.
>>
>> My suggestion is to try increasing the JVM overhead configuration. You
>> can leverage the configuration options
>> 'taskmanager.memory.jvm-overhead.[min|max|fraction]'. See more details in
>> the documentation[1].
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/config.html#taskmanager-memory-jvm-overhead-max
>>
>> On Sat, Apr 18, 2020 at 4:02 AM Zahid Rahman 
>> wrote:
>>
>>> https://betsol.com/java-memory-management-for-java-virtual-machine-jvm/
>>>
>>> Backbutton.co.uk
>>> ¯\_(ツ)_/¯
>>> ♡۶Java♡۶RMI ♡۶
>>> Make Use Method {MUM}
>>> makeuse.org
>>> 
>>>
>>>
>>> On Fri, 17 Apr 2020 at 14:07, Lasse Nedergaard <
>>> lassenedergaardfl...@gmail.com> wrote:
>>>
 Hi.

 We have migrated to Flink 1.10 and are facing an out-of-memory exception, and
 hopefully someone can point us in the right direction.

 We have a job that uses broadcast state, and we sometimes run out of memory
 when it creates a savepoint. See the stack trace below.
 We have assigned 2.2 GB per task manager and
 configured  taskmanager.memory.process.size : 2200m
 In Flink 1.9 our container was terminated because of OOM, so 1.10 does a
 better job, but it is still not working, and the task manager leaks memory
 on each OOM and is finally killed by Mesos


 Any idea what we can do to figure out what settings we need to change?

 Thanks in advance

 Lasse Nedergaard


 WARN o.a.flink.runtime.state.filesystem.FsCheckpointStreamFactory -
 Could not close the state stream for
 s3://flinkstate/dcos-prod/checkpoints/fc9318cc236d09f0bfd994f138896d6c/chk-3509/cf0714dc-ad7c-4946-b44c-96d4a131a4fa.
 java.io.IOException: Cannot allocate memory at
 java.io.FileOutputStream.writeBytes(Native Method) at
 java.io.FileOutputStream.write(FileOutputStream.java:326) at
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at
 java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) at
 java.io.FilterOutputStream.flush(FilterOutputStream.java:140) at
 java.io.FilterOutputStream.close(FilterOutputStream.java:158) at
 com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3OutputStream.close(PrestoS3FileSystem.java:995)
 at
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at
 org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
 at
 org.apache.flink.fs.s3presto.common.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
 at
 org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
 at
 org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.close(FsCheckpointStreamFactory.java:277)
 at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:263) at
 org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:250) at
 org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:122)
 at
 org.apache.flink.runtime.state.AsyncSnapshotCallable.closeSnapshotIO(AsyncSnapshotCallable.java:167)
 at
 org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:83)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) at
 org.apache.flink.runtime.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:458)
 at
 org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:53)
 at
 org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:1143)
 at
 

Flink on k8s: setting taskmanager.heap.mb has no effect on the JVM heap size at startup

2020-04-23 Thread LakeShen
Hi community,

I have recently been working on Flink on k8s, using Flink 1.6. The job runs in standalone per-job mode.

When I create and start the jobmanager, I set taskmanager.heap.mb to 2000 mb. Although the jobmanager
configuration shown in the Flink web UI does say taskmanager.heap.mb is 2000 mb, when I start the
taskmanager deployment and log into one of the pods, I see that the taskmanager is started with -Xms
and -Xmx both at 922 mb.

I set taskmanager.heap.mb to 1000 mb, stopped my job and restarted it; again, logging into one of the
taskmanager pods, -Xms and -Xmx are both 922 mb. In other words, the configured taskmanager.heap.mb has
no effect on the JVM heap the taskmanager starts with.

I looked at the source code: with Flink on k8s in standalone per-job mode, the taskmanager is started
with taskmanager.sh. In taskmanager.sh, the taskmanager heap mb is taken from the configuration in the
flink-conf.yaml in the conf directory under the flink dist directory in the image.

When I build the image, I also package the flink-dist directory into it, and I pass taskmanager.heap.mb
dynamically into flink-conf.yaml. But when I finally start my job and log into one of the taskmanager
pods to check, its flink-conf.yaml always has taskmanager.heap.mb as 1024.

Is taskmanager.heap.mb hard-coded into flink-conf.yaml somewhere?

Best,
LakeShen


Re: how to enable retract?

2020-04-23 Thread Benchao Li
FYI, the question has been answered in user-zh ML.

lec ssmi  于2020年4月23日周四 下午2:57写道:

> Hi:
>   Is there an aggregation or window operation whose result has retract
> characteristics?
>
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn
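
For readers of the English list who land on this thread: the simplest query whose result is
retracting is a plain, non-windowed GROUP BY aggregation, and the retractions become visible
when converting the result back to a DataStream. A small, self-contained sketch against 1.10
(table and field names are invented):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(
                env, EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        DataStream<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));
        tEnv.createTemporaryView("t", input, "word, num");

        // A plain (non-windowed) GROUP BY produces an updating/retracting result.
        Table counts = tEnv.sqlQuery("SELECT word, SUM(num) FROM t GROUP BY word");

        // Each element is (isInsert, row): false means a retraction of a previously emitted row.
        tEnv.toRetractStream(counts, Row.class).print();

        env.execute("retract-sketch");
    }
}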


Re: Setting up a standalone Flink 1.10 service on Windows 10

2020-04-23 Thread 宇张
OK, Windows is no longer maintained...
https://issues.apache.org/jira/browse/FLINK-15925

On Thu, Apr 23, 2020 at 5:30 PM 蒋佳成(Jiacheng Jiang) <920334...@qq.com>
wrote:

>
> Caused by: org.apache.flink.configuration.IllegalConfigurationException:
> The network memory min (64mb) and max (1gb) mismatch, the network memory
> has to be resolved and set to a fixed value before task executor starts
>
> Network memory error. The memory model changed a lot in 1.10; take a look at the documentation first.
>
>
>
>
> ---- Original message ----
> From: "宇张"  Sent: Thursday, April 23, 2020, 5:23 PM
> To: "user-zh"
> Subject: Re: Setting up a standalone Flink 1.10 service on Windows 10
>
>
>
> Er, yes — the default values of some settings have become null, so the TM fails at startup and asks me to set these three values one by one. But after setting these three, the error changes to the one below. How should I handle this?
> taskmanager.cpu.cores: 3
> taskmanager.memory.task.heap.size: 256mb
> taskmanager.memory.managed.size: 256mb
>
> org.apache.flink.configuration.IllegalConfigurationException: Failed to
> create TaskExecutorResourceSpec
> at
>
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner. at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
> at
>
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
> Caused by: org.apache.flink.configuration.IllegalConfigurationException:
> The network memory min (64 mb) and max (1 gb) mismatch, the network memory
> has to be resolved and set to a fixed value before task executor starts
> at
>
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorNetworkConfigSet(TaskExecutorResourceUtils.java:100)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:85)
> at
>
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
> ... 7 more
>
> On Thu, Apr 23, 2020 at 5:04 PM 蒋佳成(Jiacheng Jiang) <920334...@qq.com
> wrote:
>
>  Looking at the log, my guess is that the memory was not set.
> 
> 
> 
> 
>  ---- Original message ----
>  From: "宇张"  Sent: Thursday, April 23, 2020, 5:03 PM
>  To: "user-zh"  Subject: Setting up a standalone Flink 1.10 service on Windows 10
> 
> 
> 
>  Hi, I set up a standalone Flink 1.10 on Windows 10 (well, I just unpacked it and started it),
>  then ran start-cluster.bat to start the service. Two DOS windows pop up, one for the JM and one for the TM (my guess),
>  but after a few seconds the TM's DOS window flashes and closes, so the program cannot allocate resources. I don't know whether this is a Flink problem or a Windows 10 problem, but Flink 1.9 works fine.


define WATERMARKS in queries/views?

2020-04-23 Thread Őrhidi Mátyás
Dear Community,

is it possible to define WATERMARKS in SQL queries/views? We have a read
only catalog implementation and we would like to assign WMs to the tables
somehow.

Thanks,
Matyas


Re: Setting up a standalone Flink 1.10 service on Windows 10

2020-04-23 Thread 蒋佳成(Jiacheng Jiang)
Caused by: org.apache.flink.configuration.IllegalConfigurationException:
The network memory min (64mb) and max (1gb) mismatch, the network memory
has to be resolved and set to a fixed value before task executor starts


Network memory error. The memory model changed a lot in 1.10; take a look at the documentation first.




---- Original message ----
From: "宇张"

Re: Writing data to Elasticsearch fails every day at midnight and Kafka messages pile up

2020-04-23 Thread zhisheng


oliver yunchang  于2020年4月23日周四 上午12:32写道:

> 非常感谢Leonard Xu和zhisheng的回复
>
> > es index 的 mapping 是否提前设置好了?
> 提前设置好了,提前创建索引的mapping如下:
>   {
>   "xxx-2020.04.23": {
> "mappings": {
>   "doc": {
> "dynamic_templates": [
>   {
> "string_fields": {
>   "match": "*",
>   "match_mapping_type": "string",
>   "mapping": {
> "type": "keyword"
>   }
> }
>   }
> ],
> "properties": {
>   "cost": {
> "type": "long"
>   },
>   "result": {
> "type": "keyword"
>   }
> }
>   }
> }
>   }
> }
> 而待写入数据的字段远不止cost和result
> 查看ES官方文档对dynamic_templates的介绍:When putting new dynamic templates through
> the put mapping <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html>
> API, all existing templates are overwritten.[1]
> 个人猜测是:已经设置的mapping未覆盖全数据字段、写入ES时依旧会调用put mapping API做修改,导致异常
>
> 重新调整了新索引的mapping为全字段,failed to process cluster event (put-mapping) within
> 30s异常消失了
>
> [1]
> https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#dynamic-templates
> <
> https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#dynamic-templates
> >
> Best,
> Oliver yunchang
>
> > 2020年4月22日 下午4:47,zhisheng  写道:
> >
> > hi,
> >
> > es index 的 mapping 是否提前设置好了?
> >
> > 我看到异常 :
> >
> >> failed to process cluster event (put-mapping) within 30s
> >
> > 像是自动建 mapping 超时了
> >
> > Leonard Xu  于2020年4月22日周三 下午4:41写道:
> >
> >> Hi,
> >>
> >> 提前创建的索引的shard配置是怎么样的?集群的reallocation、relocation配置怎样的?
> >> 可以从这方面找思路排查下看看
> >>
> >> 祝好,
> >> Leonard Xu
> >>
> >>
> >>
> >>> 在 2020年4月22日,16:10,Oliver  写道:
> >>>
> >>> hi,
> >>>
> >>>
> >>> 我有一个任务是使用flink将kafka数据写入ES,纯ETL过程,
> >>>
> >>
> 现在遇到的问题是:每天0点之后数据写入ES异常,同时监控发现kafka消息开始堆积,重启任务后,kafka消息堆积现象逐渐恢复,如果不重启则堆积问题一直存在。
> >>>
> >>>
> >>> 想咨询下这种问题应该怎么样排查和处理?
> >>>
> >>>
> >>> flink版本:1.10
> >>> ES版本:6.x
> >>>
> >>>
> >>> 使用jar:flink-sql-connector-elasticsearch6_2.12
> >>>
> >>>
> >>> 补充:数据零点之后00:00-00:01这一分钟之间存在少量写入成功的数据,但大量数据写入失败。其中索引携带日期后缀
> >>> 所以零点涉及索引切换,不过第二天索引是提前创建,0点之后数据写入并不涉及新索引的创建
> >>>
> >>>
> >>> ES异常如下:
> >>>
> >>>
> >>> 2020-04-18 00:01:31,722 ERROR ElasticsearchSinkBase: Failed
> >> Elasticsearch item request: ElasticsearchException[Elasticsearch
> exception
> >> [type=process_cluster_event_timeout_exception, reason=failed to process
> >> cluster event (put-mapping) within 30s]]org.apache.flink.
> >> elasticsearch6.shaded.org.elasticsearch.ElasticsearchException:
> >> Elasticsearch exception [type=process_cluster_event_timeout_exception,
> >> reason=failed to process cluster event (put-mapping) within 30s]
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:510)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:421)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:135)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:198)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:653)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$3(RestHighLevelClient.java:549)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:580)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:621)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >> .elasticsearch.client.RestClient$1.completed(RestClient.java:375)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >> .elasticsearch.client.RestClient$1.completed(RestClient.java:366)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >> .apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> .apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
> >>>   at org.apache.flink.elasticsearch6.shaded.org
> >>
> 

Re: Flink 1.10 standalone setup on Windows 10

2020-04-23 Thread Caizhi Weng
Hi, have you set up the 1.10 configuration files? Could you paste the logs from the log directory so we can take a look?

蒋佳成(Jiacheng Jiang) <920334...@qq.com> 于2020年4月23日周四 下午5:04写道:

> 查看日志估计是内存没有设置
>
>
>
>
> --原始邮件--
> 发件人: "宇张" 发送时间: 2020年4月23日(星期四) 下午5:03
> 收件人: "user-zh" 主题: flink1.10基于win10搭建Standlone服务
>
>
>
> hi,我这面在win10 基于Standlone搭建了一个Flink1.10(好吧,就是解压启动)
> ,然后执行start-cluster.bat启动服务,会弹出两个dos窗口,一个jm、一个tm(猜的),
> 但是几秒后tm 对应的dos窗口闪退导致程序没办法申请资源,这个不知道是flink问题还是win10问题,但是flink1.9是正常的


Re: Flink 1.10 standalone setup on Windows 10

2020-04-23 Thread 宇张
So now even a test setup requires editing the config files. The old beginner-friendly way (just unzip and run) felt friendlier, haha.

On Thu, Apr 23, 2020 at 5:23 PM 宇张  wrote:

> 呃,是的,某些设置的默认值都变为null了,所以tm启动报错,依次让设置这三个值,但这三个设置后报错变为下面的了,请问这个要怎么搞
> taskmanager.cpu.cores: 3
> taskmanager.memory.task.heap.size: 256mb
> taskmanager.memory.managed.size: 256mb
>
> org.apache.flink.configuration.IllegalConfigurationException: Failed to
> create TaskExecutorResourceSpec
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.(TaskManagerRunner.java:152)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
> at
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
> at
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
> Caused by: org.apache.flink.configuration.IllegalConfigurationException:
> The network memory min (64 mb) and max (1 gb) mismatch, the network memory
> has to be resolved and set to a fixed value before task executor starts
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorNetworkConfigSet(TaskExecutorResourceUtils.java:100)
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:85)
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
> ... 7 more
>
> On Thu, Apr 23, 2020 at 5:04 PM 蒋佳成(Jiacheng Jiang) <920334...@qq.com>
> wrote:
>
>> 查看日志估计是内存没有设置
>>
>>
>>
>>
>> --原始邮件--
>> 发件人: "宇张"> 发送时间: 2020年4月23日(星期四) 下午5:03
>> 收件人: "user-zh"> 主题: flink1.10基于win10搭建Standlone服务
>>
>>
>>
>> hi,我这面在win10 基于Standlone搭建了一个Flink1.10(好吧,就是解压启动)
>> ,然后执行start-cluster.bat启动服务,会弹出两个dos窗口,一个jm、一个tm(猜的),
>> 但是几秒后tm 对应的dos窗口闪退导致程序没办法申请资源,这个不知道是flink问题还是win10问题,但是flink1.9是正常的
>
>


Re: Flink 1.10 standalone setup on Windows 10

2020-04-23 Thread 宇张
Er, yes, the defaults for some settings have become null, so the TM fails on startup and asks me to set these three values one by one. After setting them, the error changes to the one below. How should I handle this?
taskmanager.cpu.cores: 3
taskmanager.memory.task.heap.size: 256mb
taskmanager.memory.managed.size: 256mb

org.apache.flink.configuration.IllegalConfigurationException: Failed to
create TaskExecutorResourceSpec
at
org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:72)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.startTaskManager(TaskManagerRunner.java:356)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.(TaskManagerRunner.java:152)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:308)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerSecurely$2(TaskManagerRunner.java:322)
at
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerSecurely(TaskManagerRunner.java:321)
at
org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:287)
Caused by: org.apache.flink.configuration.IllegalConfigurationException:
The network memory min (64 mb) and max (1 gb) mismatch, the network memory
has to be resolved and set to a fixed value before task executor starts
at
org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorNetworkConfigSet(TaskExecutorResourceUtils.java:100)
at
org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.checkTaskExecutorResourceConfigSet(TaskExecutorResourceUtils.java:85)
at
org.apache.flink.runtime.taskexecutor.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:70)
... 7 more
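For reference, a minimal flink-conf.yaml sketch of two common ways to satisfy this check in 1.10 (the values are examples only, not recommendations):

# Option 1: set one total size and let Flink derive the sub-pools, including network memory
taskmanager.memory.flink.size: 1024m

# Option 2: keep the fine-grained settings above, but pin the network memory to a fixed value
taskmanager.memory.network.min: 64mb
taskmanager.memory.network.max: 64mb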

On Thu, Apr 23, 2020 at 5:04 PM 蒋佳成(Jiacheng Jiang) <920334...@qq.com>
wrote:

> 查看日志估计是内存没有设置
>
>
>
>
> --原始邮件--
> 发件人: "宇张" 发送时间: 2020年4月23日(星期四) 下午5:03
> 收件人: "user-zh" 主题: flink1.10基于win10搭建Standlone服务
>
>
>
> hi,我这面在win10 基于Standlone搭建了一个Flink1.10(好吧,就是解压启动)
> ,然后执行start-cluster.bat启动服务,会弹出两个dos窗口,一个jm、一个tm(猜的),
> 但是几秒后tm 对应的dos窗口闪退导致程序没办法申请资源,这个不知道是flink问题还是win10问题,但是flink1.9是正常的


Re: Questions about retract

2020-04-23 Thread lec ssmi
Got it, thank you.

Benchao Li  于2020年4月23日周四 下午5:08写道:

> 不是这个意思。是说scalar function不需要额外处理,所以它一套逻辑就可以处理两种类型的消息了。
> 它不需要区分消息类型,只需要处理消息本身(消息类型是在header里)。
>
> lec ssmi  于2020年4月23日周四 下午5:00写道:
>
> > 那也就是说UDF这种ScalarFunction,是没有办法处理Retract的了?因为它把DELETE记录和INSERT记录都做了相同的操作。
> >
> > Benchao Li  于2020年4月23日周四 下午4:54写道:
> >
> > > Hi Jingsong,
> > > 我建了一个jira[1] 来跟踪这个事情。
> > >
> > > Hi lec,
> > > sum函数不属于scalar函数。sum的内置实现是有retract的版本的,参考:IntSumWithRetractAggFunction
> > > scalar function不需要这样子处理,因为它本身没有状态。scalar
> > function对于消息的类型是不需要判断的,处理过程都是一样的。
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-17343
> > >
> > > lec ssmi  于2020年4月23日周四 下午4:41写道:
> > >
> > > > 其实我想说,如果说sql内置的算子,包括UDF这种ScalarFunction默认都是能够处理retract的话,
> > > > 我们举一个最简单的例子:sum函数,那内部实现是否需要具有一个类似于
> > > >  if( type='DELETE'){
> > > >  sum=sum-value
> > > > } else if(type='INSERT'){
> > > > sum=sum+value
> > > >}
> > > >  的逻辑呢?
> > > >  但是在ScalarFunction中,只实现了eval方法,也就是只有
> INSERT的那部分相加的逻辑,没有DELETE那部分相减的逻辑。
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午4:33写道:
> > > >
> > > > > 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午4:29写道:
> > > > >
> > > > > > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > > > > > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> > > > > >
> > > > > > Benchao Li  于2020年4月23日周四 下午4:26写道:
> > > > > >
> > > > > > > time interval join不允许输入是非append的。
> > > > > > >
> > > > > > >
> > > > > > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > > > > > >
> > > > > > > > 那如果是两个retract算子后的流进行time interval join,
> > > > > > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > > > > > >
> > > > > > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > > > > > >
> > > > > > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > > > > > window的确是会需要处理retract,除此之外,regular
> > > > > > > > > group by也需要。
> > > > > > > > >
> > > > > > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > > > > > >
> > > > > > > > > > 谢谢。
> > > > > > > > > >
> > > > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > > > > > 但是对于Table
> > > > > > > > > >
> > > > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> > > > > implemented
> > > > > > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > > > > > window的时候才有retract?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > > > > > >
> > > > > > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > > > > > >
> > > > > > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > > > > > >
> > > > > > > > > > > lec ssmi  于2020年4月23日周四
> > 下午3:45写道:
> > > > > > > > > > >
> > > > > > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Benchao Li  于2020年4月23日周四
> > 下午3:39写道:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi lec,
> > > > > > > > > > > > >
> > > > > > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > > > > > fire或者late
> > > > > > > > > > fire的时候。
> > > > > > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > 2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > > > >
> > > > > > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > > > > >
> retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > >
> > > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > > > >
> > > > > > > > > > > > > 是的。
> > > > > > > > > > > > >
> > > > > > > > > > > > > lec ssmi  于2020年4月23日周四
> > > > 下午3:25写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi:
> > > > > > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > > > > > >
> >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > > > > >
> > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > >
> > > > > > > > > > > > > Benchao Li
> > > > > > > > > > > > > School of Electronics Engineering and Computer
> 

Re: A Strategy for Capacity Testing

2020-04-23 Thread Xintong Song
Hi Morgan,

If I understand correctly, you want to measure the max throughput that your
Flink application can handle given a certain resource setup? I think forcing
Flink to catch up on the data should help with that.

Please be aware that Flink may need a warm-up period for the performance to
stabilize. Depending on your workload, this could take up to tens of minutes.

Please also be careful with aggregations over large windows. Emitting windows
can introduce large processing spikes, causing the measured throughput to
fluctuate.

Thank you~

Xintong Song



On Thu, Apr 23, 2020 at 4:34 PM Morgan Geldenhuys <
morgan.geldenh...@tu-berlin.de> wrote:

> Community,
>
> I am interested in knowing what is the recommended way of capacity
> planning a particular Flink application with current resource
> allocation. Taking a look at the Flink documentation
> (
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning),
>
> extra resources need to be allocated on top of what has already been
> assigned for normal operations for when failures occur. The amount of
> extra resources will determine how quickly the application can catch-up
> to the head of the input stream, e.g. kafka, considering event time
> processing.
>
> So, as far as i know the recommended way of testing the maximum capacity
> of the system is to slowly increase the ingestion rate to find the point
> just before backpressure would kick in.
>
> Would a strategy of starting the job at an earlier timestamp far enough
> in the past so that the system is forced to catch-up for a few minutes,
> and then take an average measurement of the ingress rate over this time
> be a sufficient strategy for determining the maximum number of messages
> that can be processed?
>
> Thank you in advance! Have a great day!
>
> Regards,
> M.
>


Re: Questions about retract

2020-04-23 Thread Benchao Li
That is not what I meant. A scalar function needs no extra handling, so a single piece of logic can process both kinds of messages.
It does not need to distinguish the message type; it only processes the message payload (the type is carried in the header).
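For illustration, a minimal ScalarFunction sketch (hypothetical class name): the same eval() is applied to the row payload no matter whether the surrounding message is an INSERT or a DELETE, so nothing retract-specific is needed.

import org.apache.flink.table.functions.ScalarFunction;

// stateless: the runtime applies eval() to the payload of both INSERT and DELETE messages
public class ToUpperCase extends ScalarFunction {
    public String eval(String s) {
        return s == null ? null : s.toUpperCase();
    }
}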

lec ssmi  于2020年4月23日周四 下午5:00写道:

> 那也就是说UDF这种ScalarFunction,是没有办法处理Retract的了?因为它把DELETE记录和INSERT记录都做了相同的操作。
>
> Benchao Li  于2020年4月23日周四 下午4:54写道:
>
> > Hi Jingsong,
> > 我建了一个jira[1] 来跟踪这个事情。
> >
> > Hi lec,
> > sum函数不属于scalar函数。sum的内置实现是有retract的版本的,参考:IntSumWithRetractAggFunction
> > scalar function不需要这样子处理,因为它本身没有状态。scalar
> function对于消息的类型是不需要判断的,处理过程都是一样的。
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-17343
> >
> > lec ssmi  于2020年4月23日周四 下午4:41写道:
> >
> > > 其实我想说,如果说sql内置的算子,包括UDF这种ScalarFunction默认都是能够处理retract的话,
> > > 我们举一个最简单的例子:sum函数,那内部实现是否需要具有一个类似于
> > >  if( type='DELETE'){
> > >  sum=sum-value
> > > } else if(type='INSERT'){
> > > sum=sum+value
> > >}
> > >  的逻辑呢?
> > >  但是在ScalarFunction中,只实现了eval方法,也就是只有 INSERT的那部分相加的逻辑,没有DELETE那部分相减的逻辑。
> > >
> > > Benchao Li  于2020年4月23日周四 下午4:33写道:
> > >
> > > > 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
> > > >
> > > > lec ssmi  于2020年4月23日周四 下午4:29写道:
> > > >
> > > > > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > > > > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> > > > >
> > > > > Benchao Li  于2020年4月23日周四 下午4:26写道:
> > > > >
> > > > > > time interval join不允许输入是非append的。
> > > > > >
> > > > > >
> > > > > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > > > > >
> > > > > > > 那如果是两个retract算子后的流进行time interval join,
> > > > > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > > > > >
> > > > > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > > > > >
> > > > > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > > > > window的确是会需要处理retract,除此之外,regular
> > > > > > > > group by也需要。
> > > > > > > >
> > > > > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > > > > >
> > > > > > > > > 谢谢。
> > > > > > > > >
> > > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > > > > 但是对于Table
> > > > > > > > >
> > > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> > > > implemented
> > > > > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > > > > window的时候才有retract?
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > > > > >
> > > > > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > > > > >
> > > > > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > > > > >
> > > > > > > > > > lec ssmi  于2020年4月23日周四
> 下午3:45写道:
> > > > > > > > > >
> > > > > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Benchao Li  于2020年4月23日周四
> 下午3:39写道:
> > > > > > > > > > >
> > > > > > > > > > > > Hi lec,
> > > > > > > > > > > >
> > > > > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > > >
> > > > > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > > > > fire或者late
> > > > > > > > > fire的时候。
> > > > > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > > > > >
> > > > > > > > > > > >
> 2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > > >
> > > > > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > > >
> > > > > > > > > > > > 是的。
> > > > > > > > > > > >
> > > > > > > > > > > > lec ssmi  于2020年4月23日周四
> > > 下午3:25写道:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi:
> > > > > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > > > > >
>  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > > > >
> > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > >
> > > > > > > > > > > > Benchao Li
> > > > > > > > > > > > School of Electronics Engineering and Computer
> Science,
> > > > > Peking
> > > > > > > > > > University
> > > > > > > > > > > > Tel:+86-15650713730
> > > > > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Benchao 

Re: Flink 1.10 standalone setup on Windows 10

2020-04-23 Thread 蒋佳成(Jiacheng Jiang)
Judging from the log, it looks like the memory settings were not configured.




------------------ Original message ------------------
From: "宇张"

Flink 1.10 standalone setup on Windows 10

2020-04-23 Thread 宇张
Hi, I set up a Flink 1.10 standalone cluster on Windows 10 (well, just unzipped it and started it).
After running start-cluster.bat, two DOS windows pop up, one for the JM and one for the TM (I assume).
A few seconds later the TM window closes by itself, so the job cannot acquire resources. I am not sure whether this is a Flink issue or a Windows 10 issue, but Flink 1.9 works fine.


Re: Task Assignment

2020-04-23 Thread Marta Paes Moreira
Hi, Navneeth.

If you *key* your stream using stream.keyBy(…), this will logically split
your input, and all records with the same key will be processed in the same
operator instance. This is the default behavior for keyed streams in Flink
and is handled transparently.

You can read more about it in the documentation [1].

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#keyed-state-and-operator-state
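
For illustration, a minimal runnable sketch of this behavior with the DataStream API (made-up data and class name, Flink 1.10 style):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("deviceA", "deviceB", "deviceA")
            // turn each key into a (key, 1) pair; the anonymous class keeps type information intact
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String key) {
                    return Tuple2.of(key, 1);
                }
            })
            // all records with the same key are routed to the same parallel subtask,
            // so the keyed state for that key lives in exactly one task slot
            .keyBy(0)
            // running count per key, backed by keyed state
            .sum(1)
            .print();

        env.execute("keyBy example");
    }
}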

On Thu, Apr 23, 2020 at 7:44 AM Navneeth Krishnan 
wrote:

> Hi All,
>
> Is there a way for an upstream operator to know how the downstream
> operator tasks are assigned? Basically I want to group my messages to be
> processed on slots in the same node based on some key.
>
> Thanks
>


Re: Questions about retract

2020-04-23 Thread lec ssmi
So does that mean a UDF of the ScalarFunction kind has no way to handle retraction, since it performs the same operation on both DELETE and INSERT records?

Benchao Li  于2020年4月23日周四 下午4:54写道:

> Hi Jingsong,
> 我建了一个jira[1] 来跟踪这个事情。
>
> Hi lec,
> sum函数不属于scalar函数。sum的内置实现是有retract的版本的,参考:IntSumWithRetractAggFunction
> scalar function不需要这样子处理,因为它本身没有状态。scalar function对于消息的类型是不需要判断的,处理过程都是一样的。
>
> [1] https://issues.apache.org/jira/browse/FLINK-17343
>
> lec ssmi  于2020年4月23日周四 下午4:41写道:
>
> > 其实我想说,如果说sql内置的算子,包括UDF这种ScalarFunction默认都是能够处理retract的话,
> > 我们举一个最简单的例子:sum函数,那内部实现是否需要具有一个类似于
> >  if( type='DELETE'){
> >  sum=sum-value
> > } else if(type='INSERT'){
> > sum=sum+value
> >}
> >  的逻辑呢?
> >  但是在ScalarFunction中,只实现了eval方法,也就是只有 INSERT的那部分相加的逻辑,没有DELETE那部分相减的逻辑。
> >
> > Benchao Li  于2020年4月23日周四 下午4:33写道:
> >
> > > 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
> > >
> > > lec ssmi  于2020年4月23日周四 下午4:29写道:
> > >
> > > > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > > > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午4:26写道:
> > > >
> > > > > time interval join不允许输入是非append的。
> > > > >
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > > > >
> > > > > > 那如果是两个retract算子后的流进行time interval join,
> > > > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > > > >
> > > > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > > > >
> > > > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > > > window的确是会需要处理retract,除此之外,regular
> > > > > > > group by也需要。
> > > > > > >
> > > > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > > > >
> > > > > > > > 谢谢。
> > > > > > > >
> > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > > > 但是对于Table
> > > > > > > >
> > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> > > implemented
> > > > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > > > window的时候才有retract?
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > > > >
> > > > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > > > >
> > > > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > > > >
> > > > > > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > > > > > >
> > > > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > > > > > >
> > > > > > > > > > > Hi lec,
> > > > > > > > > > >
> > > > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > >
> > > > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > > > fire或者late
> > > > > > > > fire的时候。
> > > > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > > > >
> > > > > > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > >
> > > > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > >
> > > > > > > > > > > 是的。
> > > > > > > > > > >
> > > > > > > > > > > lec ssmi  于2020年4月23日周四
> > 下午3:25写道:
> > > > > > > > > > >
> > > > > > > > > > > > Hi:
> > > > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > > >
> >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Benchao Li
> > > > > > > > > > > School of Electronics Engineering and Computer Science,
> > > > Peking
> > > > > > > > > University
> > > > > > > > > > > Tel:+86-15650713730
> > > > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Benchao Li
> > > > > > > > > School of Electronics Engineering and Computer Science,
> > Peking
> > > > > > > University
> > > > > > > > > Tel:+86-15650713730
> > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Benchao Li
> > > > > > > School of Electronics Engineering and Computer Science, Peking
> > > 

Re: Questions about retract

2020-04-23 Thread Benchao Li
Hi Jingsong,
I created a JIRA [1] to track this.

Hi lec,
sum is not a scalar function. The built-in sum implementation does have a retracting version, see IntSumWithRetractAggFunction.
A scalar function does not need this kind of handling because it has no state of its own; it does not need to inspect the message type, and the processing is the same either way.

[1] https://issues.apache.org/jira/browse/FLINK-17343
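
For illustration, a minimal user-defined AggregateFunction sketch (hypothetical class names) showing the accumulate/retract pair mentioned above; retract() is what lets the function undo a previously accumulated row when a DELETE message arrives:

import org.apache.flink.table.functions.AggregateFunction;

public class RetractableSum extends AggregateFunction<Long, RetractableSum.SumAcc> {

    // accumulator: running sum plus a row count, so a fully retracted group can emit null
    public static class SumAcc {
        public long sum = 0L;
        public long count = 0L;
    }

    @Override
    public SumAcc createAccumulator() {
        return new SumAcc();
    }

    @Override
    public Long getValue(SumAcc acc) {
        return acc.count == 0 ? null : acc.sum;
    }

    // called for INSERT (accumulate) messages
    public void accumulate(SumAcc acc, Long value) {
        if (value != null) {
            acc.sum += value;
            acc.count += 1;
        }
    }

    // called for DELETE (retract) messages when the input is a retract stream
    public void retract(SumAcc acc, Long value) {
        if (value != null) {
            acc.sum -= value;
            acc.count -= 1;
        }
    }
}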

lec ssmi  于2020年4月23日周四 下午4:41写道:

> 其实我想说,如果说sql内置的算子,包括UDF这种ScalarFunction默认都是能够处理retract的话,
> 我们举一个最简单的例子:sum函数,那内部实现是否需要具有一个类似于
>  if( type='DELETE'){
>  sum=sum-value
> } else if(type='INSERT'){
> sum=sum+value
>}
>  的逻辑呢?
>  但是在ScalarFunction中,只实现了eval方法,也就是只有 INSERT的那部分相加的逻辑,没有DELETE那部分相减的逻辑。
>
> Benchao Li  于2020年4月23日周四 下午4:33写道:
>
> > 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
> >
> > lec ssmi  于2020年4月23日周四 下午4:29写道:
> >
> > > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> > >
> > > Benchao Li  于2020年4月23日周四 下午4:26写道:
> > >
> > > > time interval join不允许输入是非append的。
> > > >
> > > >
> > > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > > >
> > > > > 那如果是两个retract算子后的流进行time interval join,
> > > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > > >
> > > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > > >
> > > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > > window的确是会需要处理retract,除此之外,regular
> > > > > > group by也需要。
> > > > > >
> > > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > > >
> > > > > > > 谢谢。
> > > > > > >
> > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > > 但是对于Table
> > > > > > >
> > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> > implemented
> > > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > > window的时候才有retract?
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > > >
> > > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > > >
> > > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > > >
> > > > > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > > > > >
> > > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > > > > >
> > > > > > > > > > Hi lec,
> > > > > > > > > >
> > > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > >
> > > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > > fire或者late
> > > > > > > fire的时候。
> > > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > > >
> > > > > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > >
> > > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > >
> > > > > > > > > > 是的。
> > > > > > > > > >
> > > > > > > > > > lec ssmi  于2020年4月23日周四
> 下午3:25写道:
> > > > > > > > > >
> > > > > > > > > > > Hi:
> > > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > > >
>  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > Benchao Li
> > > > > > > > > > School of Electronics Engineering and Computer Science,
> > > Peking
> > > > > > > > University
> > > > > > > > > > Tel:+86-15650713730
> > > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Benchao Li
> > > > > > > > School of Electronics Engineering and Computer Science,
> Peking
> > > > > > University
> > > > > > > > Tel:+86-15650713730
> > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Benchao Li
> > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > University
> > > > > > Tel:+86-15650713730
> > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > University
> > > > Tel:+86-15650713730
> > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> 

Re: Submitting jobs to a Flink 1.10 standalone cluster

2020-04-23 Thread 宇张
OK, after changing the blink dependencies to provided scope the program now runs normally. Having to add the entries below by hand every time the project is packaged feels a bit cumbersome, but as long as it runs, that is fine.

Thanks for the answers.

org.apache.flink:flink-connector-kafka-base_2.11


On Thu, Apr 23, 2020 at 4:36 PM Jingsong Li  wrote:

> 不能把lib下有的jar打进去。
>
> 比如flink-table-planner-blink,lib下也有一份flink-table-planner-blink
>
> 把这一堆去掉吧:
> org.apache.flink:flink-table-common
> org.apache.flink:flink-table-api-java
>
> org.apache.flink:flink-table-api-java-bridge_2.11
> org.apache.flink:flink-table-planner-blink_2.11
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 23, 2020 at 4:24 PM 宇张  wrote:
>
> > 》》加上  >
> >
> combine.children="append">这部分配置之后对应的TableFactory文件里面有对应的KafkaFactory信息了,虽说程序还是无法运行,但是错误变为jar包冲突了,也就不是先前加载不到的错误;
> > 但是感觉每次都配置这些貌似对用户不太友好。
> >
> > org.codehaus.janino.CompilerFactory cannot be cast to
> > org.codehaus.commons.compiler.ICompilerFactory
> >
> >
> > 
> > 
> > 
> > org.apache.flink:flink-table-common
> > org.apache.flink:flink-table-api-java
> >
> > org.apache.flink:flink-table-api-java-bridge_2.11
> >
>  org.apache.flink:flink-table-planner-blink_2.11
> >
>  org.apache.flink:flink-connector-kafka-0.11_2.11
> >
>  org.apache.flink:flink-connector-kafka-0.9_2.11
> >
>  org.apache.flink:flink-connector-kafka-0.10_2.11
> >
>  org.apache.flink:flink-connector-kafka-base_2.11
> > org.apache.flink:flink-jdbc_2.11
> > org.apache.flink:flink-json
> > 
> > 
> >
> > 
> > 
> >  >
> >
> implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
> > 
> >  >
> >
> implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
> > Apache Flink
> > UTF-8
> > 
> >  >
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > com.akulaku.data.main.StreamMain
> > 
> > 
> >
> >
> > On Thu, Apr 23, 2020 at 4:07 PM Jingsong Li 
> > wrote:
> >
> > > Hi 张,
> > >
> > > 加上这个[1]试试:
> > >
> > > 
> > >   
> > >> >
> >
> implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
> > >   
> > >> >
> >
> implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
> > > Apache Flink
> > > UTF-8
> > >   
> > > 
> > >
> > >
> > > [1]https://github.com/apache/flink/blob/master/pom.xml#L1654
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Apr 23, 2020 at 3:56 PM 宇张  wrote:
> > >
> > > > 下面配置中,如果不加flink-json模块是可以打出kafkafactory的,加了flink-json模块就只剩下
> > > > JsonRowFormatFactory一个类,kafka的Factory就打印不出来了,所以是不是某一部分导致冲突了,
> > > > 但我看我先前flink1.9的工程,里面也无法打印kafkaFactory类,只有一个
> > > > GenericInMemoryCatalogFactory类,但flink1.9和1.10对比,看发布文档类加载策略有过改动
> > > >
> > > > org.apache.flink:flink-connector-kafka-0.11_2.11
> > > > org.apache.flink:flink-connector-kafka-base_2.11
> > > > org.apache.flink:flink-json
> > > >
> > > >
> > > > On Thu, Apr 23, 2020 at 3:43 PM Jingsong Li 
> > > > wrote:
> > > >
> > > > > > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > > > > 有问题。之前FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > > >
> > > > > @tison 不管怎么样,也得保证jar里的SPI文件包含Kafka的类,不然SPI没法找
> > > > >
> > > > > @宇张 建议你仔细看下[1],这个pom是能打出正确的SPI文件的
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Thu, Apr 23, 2020 at 3:35 PM tison 
> wrote:
> > > > >
> > > > > > 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > tison  于2020年4月23日周四 下午3:34写道:
> > > > > >
> > > > > > > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session
> > 模式,在
> > > > > Client
> > > > > > > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > 宇张  于2020年4月23日周四 上午11:53写道:
> > > > > > >
> > > > > > >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > > > >> KafkaTableSourceSinkFactory
> > > > > > >> 吗?(同时 class loading 为 child-first)
> > > > > > >> 》》是的
> > > > > > >>
> > > > > > >> On Thu, Apr 23, 2020 at 11:42 AM tison 
> > > > wrote:
> > > > > > >>
> > > > > > >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > > > > > >> > >这个能拿到
> > > > > > >> >
> > > > > > >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > > > >> > KafkaTableSourceSinkFactory
> > > > > > >> > 吗?(同时 class loading 为 child-first)
> > > > > > >> >
> > > > > > >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的
> > ClassLoader
> > > > > > 有问题。之前
> > > > > > >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > tison.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > 宇张  于2020年4月23日周四 上午11:36写道:
> > > > > > >> >
> > > > > > >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > > > > > >> > >
> > > > > > >> > > 
> > > > > 

Re: Questions about retract

2020-04-23 Thread lec ssmi
What I actually meant is: if the built-in SQL operators, including UDFs of the ScalarFunction kind, can all handle retraction by default,
take the simplest example, the sum function. Does its internal implementation need logic along the lines of
  if (type == 'DELETE') {
      sum = sum - value;
  } else if (type == 'INSERT') {
      sum = sum + value;
  }
?
But a ScalarFunction only implements the eval method, i.e. only the addition logic for the INSERT part, with no subtraction logic for the DELETE part.

Benchao Li  于2020年4月23日周四 下午4:33写道:

> 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
>
> lec ssmi  于2020年4月23日周四 下午4:29写道:
>
> > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> >
> > Benchao Li  于2020年4月23日周四 下午4:26写道:
> >
> > > time interval join不允许输入是非append的。
> > >
> > >
> > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > >
> > > > 那如果是两个retract算子后的流进行time interval join,
> > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > >
> > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > window的确是会需要处理retract,除此之外,regular
> > > > > group by也需要。
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > >
> > > > > > 谢谢。
> > > > > >
> 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > 但是对于Table
> > > > > >
> API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> implemented
> > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > window的时候才有retract?
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > >
> > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > >
> > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > >
> > > > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > > > >
> > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > >
> > > > > > > >
> > > > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > > > >
> > > > > > > > > Hi lec,
> > > > > > > > >
> > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > >
> > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > fire或者late
> > > > > > fire的时候。
> > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > >
> > > > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > >
> > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > >
> > > > > > > > > 是的。
> > > > > > > > >
> > > > > > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > > > > > >
> > > > > > > > > > Hi:
> > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Benchao Li
> > > > > > > > > School of Electronics Engineering and Computer Science,
> > Peking
> > > > > > > University
> > > > > > > > > Tel:+86-15650713730
> > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Benchao Li
> > > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > > University
> > > > > > > Tel:+86-15650713730
> > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Benchao Li
> > > > > School of Electronics Engineering and Computer Science, Peking
> > > University
> > > > > Tel:+86-15650713730
> > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Benchao Li
> > > School of Electronics Engineering and Computer Science, Peking
> University
> > > Tel:+86-15650713730
> > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: Questions about retract

2020-04-23 Thread Jingsong Li
Could you open a JIRA to update the documentation? The current docs on retraction are indeed a bit confusing.

Best,
Jingsong Lee

On Thu, Apr 23, 2020 at 4:33 PM Benchao Li  wrote:

> 阿里云上提供的Blink应该是内部版本,跟社区版本有些不一样。我刚才说的都是基于社区版本的。
>
> lec ssmi  于2020年4月23日周四 下午4:29写道:
>
> > 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> > window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
> >
> > Benchao Li  于2020年4月23日周四 下午4:26写道:
> >
> > > time interval join不允许输入是非append的。
> > >
> > >
> > > lec ssmi  于2020年4月23日周四 下午4:18写道:
> > >
> > > > 那如果是两个retract算子后的流进行time interval join,
> > > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > > >
> > > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > > window的确是会需要处理retract,除此之外,regular
> > > > > group by也需要。
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > > >
> > > > > > 谢谢。
> > > > > >
> 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > > 但是对于Table
> > > > > >
> API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be
> implemented
> > > > > > for  datastream bounded over aggregate  。 是否说只有over
> > > window的时候才有retract?
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > > >
> > > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > > >
> > > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > > >
> > > > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > > > >
> > > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > > >
> > > > > > > >
> > > > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > > > >
> > > > > > > > > Hi lec,
> > > > > > > > >
> > > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > >
> > > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> > fire或者late
> > > > > > fire的时候。
> > > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > > >
> > > > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > >
> > > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > >
> > > > > > > > > 是的。
> > > > > > > > >
> > > > > > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > > > > > >
> > > > > > > > > > Hi:
> > > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Benchao Li
> > > > > > > > > School of Electronics Engineering and Computer Science,
> > Peking
> > > > > > > University
> > > > > > > > > Tel:+86-15650713730
> > > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Benchao Li
> > > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > > University
> > > > > > > Tel:+86-15650713730
> > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Benchao Li
> > > > > School of Electronics Engineering and Computer Science, Peking
> > > University
> > > > > Tel:+86-15650713730
> > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Benchao Li
> > > School of Electronics Engineering and Computer Science, Peking
> University
> > > Tel:+86-15650713730
> > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


-- 
Best, Jingsong Lee


Re: Submitting jobs to a Flink 1.10 standalone cluster

2020-04-23 Thread Jingsong Li
Do not bundle jars that already exist under lib/ into your uber jar.

For example flink-table-planner-blink: there is already a copy of flink-table-planner-blink under lib/.

Remove this group of includes:
org.apache.flink:flink-table-common
org.apache.flink:flink-table-api-java
org.apache.flink:flink-table-api-java-bridge_2.11
org.apache.flink:flink-table-planner-blink_2.11

Best,
Jingsong Lee
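
For illustration, a hedged sketch of what this can look like in the user pom after the change (artifact names are the ones from this thread; the version is only an example): the planner/API modules are marked provided so they compile against the project but are not shaded, and only connectors/formats that are not under lib/ stay in the uber jar.

<!-- provided: already shipped in the cluster's lib/ directory -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.11</artifactId>
    <version>1.10.0</version>
    <scope>provided</scope>
</dependency>

<!-- shade plugin artifactSet: keep only what is NOT in lib/ -->
<artifactSet>
    <includes combine.children="append">
        <include>org.apache.flink:flink-connector-kafka-0.11_2.11</include>
        <include>org.apache.flink:flink-connector-kafka-base_2.11</include>
        <include>org.apache.flink:flink-json</include>
    </includes>
</artifactSet>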

On Thu, Apr 23, 2020 at 4:24 PM 宇张  wrote:

> 》》加上 
> combine.children="append">这部分配置之后对应的TableFactory文件里面有对应的KafkaFactory信息了,虽说程序还是无法运行,但是错误变为jar包冲突了,也就不是先前加载不到的错误;
> 但是感觉每次都配置这些貌似对用户不太友好。
>
> org.codehaus.janino.CompilerFactory cannot be cast to
> org.codehaus.commons.compiler.ICompilerFactory
>
>
> 
> 
> 
> org.apache.flink:flink-table-common
> org.apache.flink:flink-table-api-java
>
> org.apache.flink:flink-table-api-java-bridge_2.11
> org.apache.flink:flink-table-planner-blink_2.11
> org.apache.flink:flink-connector-kafka-0.11_2.11
> org.apache.flink:flink-connector-kafka-0.9_2.11
> org.apache.flink:flink-connector-kafka-0.10_2.11
> org.apache.flink:flink-connector-kafka-base_2.11
> org.apache.flink:flink-jdbc_2.11
> org.apache.flink:flink-json
> 
> 
>
> 
> 
> 
> implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
> 
> 
> implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
> Apache Flink
> UTF-8
> 
> 
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> com.akulaku.data.main.StreamMain
> 
> 
>
>
> On Thu, Apr 23, 2020 at 4:07 PM Jingsong Li 
> wrote:
>
> > Hi 张,
> >
> > 加上这个[1]试试:
> >
> > 
> >   
> >>
> implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
> >   
> >>
> implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
> > Apache Flink
> > UTF-8
> >   
> > 
> >
> >
> > [1]https://github.com/apache/flink/blob/master/pom.xml#L1654
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Apr 23, 2020 at 3:56 PM 宇张  wrote:
> >
> > > 下面配置中,如果不加flink-json模块是可以打出kafkafactory的,加了flink-json模块就只剩下
> > > JsonRowFormatFactory一个类,kafka的Factory就打印不出来了,所以是不是某一部分导致冲突了,
> > > 但我看我先前flink1.9的工程,里面也无法打印kafkaFactory类,只有一个
> > > GenericInMemoryCatalogFactory类,但flink1.9和1.10对比,看发布文档类加载策略有过改动
> > >
> > > org.apache.flink:flink-connector-kafka-0.11_2.11
> > > org.apache.flink:flink-connector-kafka-base_2.11
> > > org.apache.flink:flink-json
> > >
> > >
> > > On Thu, Apr 23, 2020 at 3:43 PM Jingsong Li 
> > > wrote:
> > >
> > > > > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > > > 有问题。之前FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > >
> > > > @tison 不管怎么样,也得保证jar里的SPI文件包含Kafka的类,不然SPI没法找
> > > >
> > > > @宇张 建议你仔细看下[1],这个pom是能打出正确的SPI文件的
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Thu, Apr 23, 2020 at 3:35 PM tison  wrote:
> > > >
> > > > > 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > tison  于2020年4月23日周四 下午3:34写道:
> > > > >
> > > > > > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session
> 模式,在
> > > > Client
> > > > > > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > 宇张  于2020年4月23日周四 上午11:53写道:
> > > > > >
> > > > > >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > > >> KafkaTableSourceSinkFactory
> > > > > >> 吗?(同时 class loading 为 child-first)
> > > > > >> 》》是的
> > > > > >>
> > > > > >> On Thu, Apr 23, 2020 at 11:42 AM tison 
> > > wrote:
> > > > > >>
> > > > > >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > > > > >> > >这个能拿到
> > > > > >> >
> > > > > >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > > >> > KafkaTableSourceSinkFactory
> > > > > >> > 吗?(同时 class loading 为 child-first)
> > > > > >> >
> > > > > >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的
> ClassLoader
> > > > > 有问题。之前
> > > > > >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > tison.
> > > > > >> >
> > > > > >> >
> > > > > >> > 宇张  于2020年4月23日周四 上午11:36写道:
> > > > > >> >
> > > > > >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > > > > >> > >
> > > > > >> > > 
> > > > > >> > > org.apache.maven.plugins
> > > > > >> > > maven-shade-plugin
> > > > > >> > > 
> > > > > >> > > 
> > > > > >> > > 
> > > > > >> > > package
> > > > > >> > > 
> > > > > >> > > shade
> > > > > >> > > 
> > > > > >> > > 
> > > > > >> > > 
> > > > > >> > >  > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> 

A Strategy for Capacity Testing

2020-04-23 Thread Morgan Geldenhuys

Community,

I am interested in knowing the recommended way of capacity planning a
particular Flink application with its current resource allocation. Taking a
look at the Flink documentation 
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#capacity-planning), 
extra resources need to be allocated on top of what has already been 
assigned for normal operations for when failures occur. The amount of 
extra resources will determine how quickly the application can catch-up 
to the head of the input stream, e.g. kafka, considering event time 
processing.


So, as far as I know, the recommended way of testing the maximum capacity
of the system is to slowly increase the ingestion rate to find the point
just before backpressure kicks in.


Would starting the job at an earlier timestamp, far enough in the past that
the system is forced to catch up for a few minutes, and then taking an
average measurement of the ingress rate over this period, be a sufficient
strategy for determining the maximum number of messages that can be
processed?
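
As a rough back-of-envelope reading of such a run (my own framing, not from the documentation): if the job drains N records of backlog over a catch-up window of T seconds, the maximum sustainable throughput is roughly N / T records per second, provided the measurement excludes the initial warm-up phase and is not dominated by large window emissions.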


Thank you in advance! Have a great day!

Regards,
M.


Re: RetractStream written to MySQL throws java.sql.SQLException: No value specified for parameter 1

2020-04-23 Thread Leonard Xu
Hi,

This looks like a small bug introduced when you modified the code earlier, because the problem cannot be reproduced from the code path or in tests.
Also, if you have modified the source code, please mention it in the email; otherwise it is really hard to help.

Best,
Leonard Xu

> 在 2020年4月23日,16:26,1101300123  写道:
> 
> 我重新在源码里打了一些日志编译后,之前的问题不见了,试了好多次没有复现了,之前因为集成clickhouse 
> 改过源码的delete代码,不知道是不是这个引起的
> 在2020年4月23日 16:23,Leonard Xu 写道:
> Hi,
> 我本地复现了下,用1.10.0发现的你的sql是ok的,结果也符合预期☺️,如下[1]:
> 看到你建了JIRA,我们在issue里继续跟进吧
> 
> 祝好,
> Leonard Xu
> 
> [1]
> mysql> select * from order_state_cnt;
> ++--+--+
> | order_date | product_code | cnt  |
> ++--+--+
> | 2020-04-01 | product1 |3 |
> | 2020-04-01 | product2 |5 |
> | 2020-04-01 | product1 |5 |
> | 2020-04-01 | product2 |9 |
> ++--+--+
> 4 rows in set (0.00 sec)
> 
> mysql> select * from order_state_cnt;
> ++--+--+
> | order_date | product_code | cnt  |
> ++--+--+
> | 2020-04-01 | product1 |3 |
> | 2020-04-01 | product2 |5 |
> | 2020-04-01 | product1 |5 |
> | 2020-04-01 | product2 |9 |
> | 2020-04-01 | product1 |2 |
> | 2020-04-01 | product2 |4 |
> ++--+--+
> 6 rows in set (0.00 sec)
> 
> 
> 
> 在 2020年4月23日,10:48,1101300123  写道:
> 
> 
> 
> 我给你一些数据和代码吧!和我真实场景错误一样
> 订单主表:orders
> 13点两条记录;order_state是状态 0取消 1待支付
> {"order_no":"order1","order_state":1,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","order_state":1,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:00:00"}
> 
> 
> 13:15
> 来了一条新的记录 取消订单
> {"order_no":"order1","order_state":0,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:15:00"}
> 
> 
> 订单明细表:order_detail
> 4条记录
> {"order_no":"order1","product_code":"product1","quantity":3,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order1","product_code":"product2","quantity":5,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","product_code":"product1","quantity":2,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","product_code":"product2","quantity":4,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> 
> 
> 需求的要求是当订单创建后我们就要统计该订单对应的商品数量,而当订单状态变为取消时我们要减掉该订单对应的商品数量。
> 
> 
> 代码
> package Learn.kafkasql;
> 
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.Table;
> import org.apache.flink.table.api.java.StreamTableEnvironment;
> import org.apache.flink.types.Row;
> 
> public class SqlCount {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env 
> =StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
> StreamTableEnvironment tenv = StreamTableEnvironment.create(env,settings);
> 
> tenv.sqlUpdate("CREATE TABLE orders " +
> "(" +
> "   order_no string," +
> "   order_state  int," +
> "   pay_time string," +
> "   create_time  string," +
> "   update_time  string" +
> " ) " +
> "   WITH (" +
> "   'connector.type' = 'kafka',   " +
> "   'connector.version' = 'universal', " +//--kafka版本
> "   'connector.topic' = 'tp_orders'," +//--kafkatopic
> "   'connector.properties.zookeeper.connect' = 
> '192.168.179.120:2181', " +
> "   'connector.properties.bootstrap.servers' = 
> '192.168.179.120:9092'," +
> "   'connector.properties.group.id' = 'testGroup'," +
> "   'connector.startup-mode' = 'latest-offset'," +
> "   'format.type' = 'json'  " +//--数据为json格式
> " )");
> tenv.sqlUpdate("CREATE TABLE order_detail " +
> "(" +
> "   order_no string," +
> "   product_code string," +
> "   quantity int," +
> "   create_time  string," +
> "   update_time  string" +
> " ) " +
> "   WITH (" +
> "   'connector.type' = 'kafka', " +
> "   'connector.version' = 'universal',  " +//--kafka版本
> "   'connector.topic' = 'tp_order_detail'," +//--kafkatopic
> "   'connector.properties.zookeeper.connect' = 
> '192.168.179.120:2181', " +
> "   'connector.properties.bootstrap.servers' = 
> '192.168.179.120:9092'," +
> "   'connector.properties.group.id' = 'testGroup'," +
> "   'connector.startup-mode' = 'latest-offset'," +
> "   'format.type' = 'json'  " +//--数据为json格式
> " )");
> 
> tenv.sqlUpdate("CREATE TABLE product_sale" +
> " (" +
> 

Re: retract的问题

2020-04-23 Thread Benchao Li
The Blink offered on Alibaba Cloud should be an internal build, which differs somewhat from the community version. What I said just now was all based on the community version.

lec ssmi  于2020年4月23日周四 下午4:29写道:

> 奇怪,目前我们使用阿里云的Blink,使用了join前的两个流,都是通过last_value 加上over
> window做的,然后再做的join,然后将join的结果进行tumble window 聚合。
>
> Benchao Li  于2020年4月23日周四 下午4:26写道:
>
> > time interval join不允许输入是非append的。
> >
> >
> > lec ssmi  于2020年4月23日周四 下午4:18写道:
> >
> > > 那如果是两个retract算子后的流进行time interval join,
> > > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> > >
> > > Benchao Li  于2020年4月23日周四 下午4:11写道:
> > >
> > > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > > window的确是会需要处理retract,除此之外,regular
> > > > group by也需要。
> > > >
> > > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > > >
> > > > > 谢谢。
> > > > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > > 但是对于Table
> > > > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be implemented
> > > > > for  datastream bounded over aggregate  。 是否说只有over
> > window的时候才有retract?
> > > > >
> > > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > > >
> > > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > > >
> > > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > > >
> > > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > > >
> > > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > > >
> > > > > > >
> > > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > > >
> > > > > > > > Hi lec,
> > > > > > > >
> > > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > >
> > > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early
> fire或者late
> > > > > fire的时候。
> > > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > > >
> > > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > >
> > > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > > >
> > > > > > > >
> > > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > >
> > > > > > > > 是的。
> > > > > > > >
> > > > > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > > > > >
> > > > > > > > > Hi:
> > > > > > > > >有几个问题想咨询下大佬:
> > > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > > >
> > > > > > >
> > > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Benchao Li
> > > > > > > > School of Electronics Engineering and Computer Science,
> Peking
> > > > > > University
> > > > > > > > Tel:+86-15650713730
> > > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Benchao Li
> > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > University
> > > > > > Tel:+86-15650713730
> > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > University
> > > > Tel:+86-15650713730
> > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: retract的问题

2020-04-23 Thread lec ssmi
Strange. We currently use Alibaba Cloud's Blink, and the two streams feeding the join are both built with last_value plus an over
window; we then join them and aggregate the join result with a tumble window.

Benchao Li  于2020年4月23日周四 下午4:26写道:

> time interval join不允许输入是非append的。
>
>
> lec ssmi  于2020年4月23日周四 下午4:18写道:
>
> > 那如果是两个retract算子后的流进行time interval join,
> > 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
> >
> > Benchao Li  于2020年4月23日周四 下午4:11写道:
> >
> > > 内置的*聚合函数*应该是都能处理retract消息的。
> > > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> > window的确是会需要处理retract,除此之外,regular
> > > group by也需要。
> > >
> > > lec ssmi  于2020年4月23日周四 下午4:05写道:
> > >
> > > > 谢谢。
> > > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > > 但是对于Table
> > > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be implemented
> > > > for  datastream bounded over aggregate  。 是否说只有over
> window的时候才有retract?
> > > >
> > > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > > >
> > > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > > >
> > > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > > >
> > > > > >
> > > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > > >
> > > > > > > Hi lec,
> > > > > > >
> > > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > >
> > > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late
> > > > fire的时候。
> > > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > > >
> > > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > >
> > > > > > > 这个也不绝对。大部分时候是。
> > > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > > >
> > > > > > >
> > > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > >
> > > > > > > 是的。
> > > > > > >
> > > > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > > > >
> > > > > > > > Hi:
> > > > > > > >有几个问题想咨询下大佬:
> > > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > > >
> > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Benchao Li
> > > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > > University
> > > > > > > Tel:+86-15650713730
> > > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Benchao Li
> > > > > School of Electronics Engineering and Computer Science, Peking
> > > University
> > > > > Tel:+86-15650713730
> > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Benchao Li
> > > School of Electronics Engineering and Computer Science, Peking
> University
> > > Tel:+86-15650713730
> > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


回复: 关于RetractStream流写mysql出现java.sql.SQLException: No value specified for parameter 1问题

2020-04-23 Thread 1101300123
After adding some extra logging to the source code and recompiling, the earlier problem is gone; I tried many times and could not reproduce it. I had previously modified the delete code in the source when integrating ClickHouse, so I am not sure whether that caused it.
在2020年4月23日 16:23,Leonard Xu 写道:
Hi,
我本地复现了下,用1.10.0发现的你的sql是ok的,结果也符合预期☺️,如下[1]:
看到你建了JIRA,我们在issue里继续跟进吧

祝好,
Leonard Xu

[1]
mysql> select * from order_state_cnt;
++--+--+
| order_date | product_code | cnt  |
++--+--+
| 2020-04-01 | product1 |3 |
| 2020-04-01 | product2 |5 |
| 2020-04-01 | product1 |5 |
| 2020-04-01 | product2 |9 |
++--+--+
4 rows in set (0.00 sec)

mysql> select * from order_state_cnt;
++--+--+
| order_date | product_code | cnt  |
++--+--+
| 2020-04-01 | product1 |3 |
| 2020-04-01 | product2 |5 |
| 2020-04-01 | product1 |5 |
| 2020-04-01 | product2 |9 |
| 2020-04-01 | product1 |2 |
| 2020-04-01 | product2 |4 |
++--+--+
6 rows in set (0.00 sec)



在 2020年4月23日,10:48,1101300123  写道:



我给你一些数据和代码吧!和我真实场景错误一样
订单主表:orders
13点两条记录;order_state是状态 0取消 1待支付
{"order_no":"order1","order_state":1,"pay_time":"","create_time":"2020-04-01 
13:00:00","update_time":"2020-04-01 13:00:00"}
{"order_no":"order2","order_state":1,"pay_time":"","create_time":"2020-04-01 
13:00:00","update_time":"2020-04-01 13:00:00"}


13:15
来了一条新的记录 取消订单
{"order_no":"order1","order_state":0,"pay_time":"","create_time":"2020-04-01 
13:00:00","update_time":"2020-04-01 13:15:00"}


订单明细表:order_detail
4条记录
{"order_no":"order1","product_code":"product1","quantity":3,"create_time":"2020-04-01
 13:00:00","update_time":"2020-04-01 13:00:00"}
{"order_no":"order1","product_code":"product2","quantity":5,"create_time":"2020-04-01
 13:00:00","update_time":"2020-04-01 13:00:00"}
{"order_no":"order2","product_code":"product1","quantity":2,"create_time":"2020-04-01
 13:00:00","update_time":"2020-04-01 13:00:00"}
{"order_no":"order2","product_code":"product2","quantity":4,"create_time":"2020-04-01
 13:00:00","update_time":"2020-04-01 13:00:00"}


需求的要求是当订单创建后我们就要统计该订单对应的商品数量,而当订单状态变为取消时我们要减掉该订单对应的商品数量。


代码
package Learn.kafkasql;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class SqlCount {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env 
=StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
EnvironmentSettings settings = 
EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamTableEnvironment tenv = StreamTableEnvironment.create(env,settings);

tenv.sqlUpdate("CREATE TABLE orders " +
"(" +
"   order_no string," +
"   order_state  int," +
"   pay_time string," +
"   create_time  string," +
"   update_time  string" +
" ) " +
"   WITH (" +
"   'connector.type' = 'kafka',   " +
"   'connector.version' = 'universal', " +//--kafka版本
"   'connector.topic' = 'tp_orders'," +//--kafkatopic
"   'connector.properties.zookeeper.connect' = 
'192.168.179.120:2181', " +
"   'connector.properties.bootstrap.servers' = 
'192.168.179.120:9092'," +
"   'connector.properties.group.id' = 'testGroup'," +
"   'connector.startup-mode' = 'latest-offset'," +
"   'format.type' = 'json'  " +//--数据为json格式
" )");
tenv.sqlUpdate("CREATE TABLE order_detail " +
"(" +
"   order_no string," +
"   product_code string," +
"   quantity int," +
"   create_time  string," +
"   update_time  string" +
" ) " +
"   WITH (" +
"   'connector.type' = 'kafka', " +
"   'connector.version' = 'universal',  " +//--kafka版本
"   'connector.topic' = 'tp_order_detail'," +//--kafkatopic
"   'connector.properties.zookeeper.connect' = 
'192.168.179.120:2181', " +
"   'connector.properties.bootstrap.servers' = 
'192.168.179.120:9092'," +
"   'connector.properties.group.id' = 'testGroup'," +
"   'connector.startup-mode' = 'latest-offset'," +
"   'format.type' = 'json'  " +//--数据为json格式
" )");

tenv.sqlUpdate("CREATE TABLE product_sale" +
" (" +
"  order_date string," +
"  product_code string," +
"  cnt int" +
"  ) " +
" WITH (" +
"   'connector.type' = 'jdbc', " +
"   'connector.url' = 
'jdbc:mysql://192.168.179.120:3306/flink?serverTimezone=UTC=true', " +
"   'connector.table' = 'order_state_cnt', " +
"   'connector.driver' = 'com.mysql.jdbc.Driver', 

Re: retract的问题

2020-04-23 Thread Benchao Li
A time interval join does not allow its inputs to be non-append streams.


lec ssmi  于2020年4月23日周四 下午4:18写道:

> 那如果是两个retract算子后的流进行time interval join,
> 已经join成功并且发送出去的记录,也会先DELETE掉,再INSERT,然后将这两条记录发送下游?
>
> Benchao Li  于2020年4月23日周四 下午4:11写道:
>
> > 内置的*聚合函数*应该是都能处理retract消息的。
> > 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> > 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over
> window的确是会需要处理retract,除此之外,regular
> > group by也需要。
> >
> > lec ssmi  于2020年4月23日周四 下午4:05写道:
> >
> > > 谢谢。
> > > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > > 但是对于Table
> > > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be implemented
> > > for  datastream bounded over aggregate  。 是否说只有over window的时候才有retract?
> > >
> > >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> > >
> > > Benchao Li  于2020年4月23日周四 下午3:59写道:
> > >
> > > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > > >
> > > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > > >
> > > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > > >
> > > > >
> > > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > > >
> > > > > > Hi lec,
> > > > > >
> > > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > >
> > > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late
> > > fire的时候。
> > > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > > >
> > > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > >
> > > > > > 这个也不绝对。大部分时候是。
> > > > > > 这个取决于这个算子本身是不是会consume
> > > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > > >
> > > > > >
> > > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > >
> > > > > > 是的。
> > > > > >
> > > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > > >
> > > > > > > Hi:
> > > > > > >有几个问题想咨询下大佬:
> > > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > > >
> > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Benchao Li
> > > > > > School of Electronics Engineering and Computer Science, Peking
> > > > University
> > > > > > Tel:+86-15650713730
> > > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > University
> > > > Tel:+86-15650713730
> > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: 关于RetractStream流写mysql出现java.sql.SQLException: No value specified for parameter 1问题

2020-04-23 Thread Leonard Xu
Hi,
I reproduced this locally: with 1.10.0 your SQL works fine and the result matches expectations ☺️, see [1] below.
I saw you already opened a JIRA, so let's follow up in that issue.

Best,
Leonard Xu

[1]
mysql> select * from order_state_cnt;
+------------+--------------+------+
| order_date | product_code | cnt  |
+------------+--------------+------+
| 2020-04-01 | product1     |    3 |
| 2020-04-01 | product2     |    5 |
| 2020-04-01 | product1     |    5 |
| 2020-04-01 | product2     |    9 |
+------------+--------------+------+
4 rows in set (0.00 sec)

mysql> select * from order_state_cnt;
+------------+--------------+------+
| order_date | product_code | cnt  |
+------------+--------------+------+
| 2020-04-01 | product1     |    3 |
| 2020-04-01 | product2     |    5 |
| 2020-04-01 | product1     |    5 |
| 2020-04-01 | product2     |    9 |
| 2020-04-01 | product1     |    2 |
| 2020-04-01 | product2     |    4 |
+------------+--------------+------+
6 rows in set (0.00 sec)



> 在 2020年4月23日,10:48,1101300123  写道:
> 
> 
> 
> 我给你一些数据和代码吧!和我真实场景错误一样
> 订单主表:orders
> 13点两条记录;order_state是状态 0取消 1待支付
> {"order_no":"order1","order_state":1,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","order_state":1,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:00:00"}
> 
> 
> 13:15
> 来了一条新的记录 取消订单
> {"order_no":"order1","order_state":0,"pay_time":"","create_time":"2020-04-01 
> 13:00:00","update_time":"2020-04-01 13:15:00"}
> 
> 
> 订单明细表:order_detail
> 4条记录
> {"order_no":"order1","product_code":"product1","quantity":3,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order1","product_code":"product2","quantity":5,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","product_code":"product1","quantity":2,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> {"order_no":"order2","product_code":"product2","quantity":4,"create_time":"2020-04-01
>  13:00:00","update_time":"2020-04-01 13:00:00"}
> 
> 
> 需求的要求是当订单创建后我们就要统计该订单对应的商品数量,而当订单状态变为取消时我们要减掉该订单对应的商品数量。
> 
> 
> 代码
> package Learn.kafkasql;
> 
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.Table;
> import org.apache.flink.table.api.java.StreamTableEnvironment;
> import org.apache.flink.types.Row;
> 
> public class SqlCount {
> public static void main(String[] args) throws Exception {
>StreamExecutionEnvironment env 
> =StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> EnvironmentSettings settings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
> StreamTableEnvironment tenv = StreamTableEnvironment.create(env,settings);
> 
> tenv.sqlUpdate("CREATE TABLE orders " +
> "(" +
> "   order_no string," +
> "   order_state  int," +
> "   pay_time string," +
> "   create_time  string," +
> "   update_time  string" +
> " ) " +
> "   WITH (" +
> "   'connector.type' = 'kafka',   " +
> "   'connector.version' = 'universal', " +//--kafka版本
> "   'connector.topic' = 'tp_orders'," +//--kafkatopic
> "   'connector.properties.zookeeper.connect' = 
> '192.168.179.120:2181', " +
> "   'connector.properties.bootstrap.servers' = 
> '192.168.179.120:9092'," +
> "   'connector.properties.group.id' = 'testGroup'," +
> "   'connector.startup-mode' = 'latest-offset'," +
> "   'format.type' = 'json'  " +//--数据为json格式
> " )");
> tenv.sqlUpdate("CREATE TABLE order_detail " +
> "(" +
> "   order_no string," +
> "   product_code string," +
> "   quantity int," +
> "   create_time  string," +
> "   update_time  string" +
> " ) " +
> "   WITH (" +
> "   'connector.type' = 'kafka', " +
> "   'connector.version' = 'universal',  " +//--kafka版本
> "   'connector.topic' = 'tp_order_detail'," +//--kafkatopic
> "   'connector.properties.zookeeper.connect' = 
> '192.168.179.120:2181', " +
> "   'connector.properties.bootstrap.servers' = 
> '192.168.179.120:9092'," +
> "   'connector.properties.group.id' = 'testGroup'," +
> "   'connector.startup-mode' = 'latest-offset'," +
> "   'format.type' = 'json'  " +//--数据为json格式
> " )");
> 
> tenv.sqlUpdate("CREATE TABLE product_sale" +
> " (" +
> "  order_date string," +
> "  product_code string," +
> "  cnt int" +
> "  ) " +
> " WITH (" +
> "   'connector.type' = 'jdbc', " +
> "   'connector.url' = 
> 'jdbc:mysql://192.168.179.120:3306/flink?serverTimezone=UTC=true', " +
> "   

Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread 宇张
>> After adding that configuration, the TableFactory services file does contain the Kafka factory entries. The program still does not run, but the error has changed to a jar conflict, so it is no longer the earlier "factory cannot be found" error;
that said, having to configure all of this every time does not feel very user-friendly.

org.codehaus.janino.CompilerFactory cannot be cast to
org.codehaus.commons.compiler.ICompilerFactory





org.apache.flink:flink-table-common
org.apache.flink:flink-table-api-java
org.apache.flink:flink-table-api-java-bridge_2.11
org.apache.flink:flink-table-planner-blink_2.11
org.apache.flink:flink-connector-kafka-0.11_2.11
org.apache.flink:flink-connector-kafka-0.9_2.11
org.apache.flink:flink-connector-kafka-0.10_2.11
org.apache.flink:flink-connector-kafka-base_2.11
org.apache.flink:flink-jdbc_2.11
org.apache.flink:flink-json








Apache Flink
UTF-8


com.akulaku.data.main.StreamMain




On Thu, Apr 23, 2020 at 4:07 PM Jingsong Li  wrote:

> Hi 张,
>
> 加上这个[1]试试:
>
> 
>   
>implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
>   
>implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
> Apache Flink
> UTF-8
>   
> 
>
>
> [1]https://github.com/apache/flink/blob/master/pom.xml#L1654
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 23, 2020 at 3:56 PM 宇张  wrote:
>
> > 下面配置中,如果不加flink-json模块是可以打出kafkafactory的,加了flink-json模块就只剩下
> > JsonRowFormatFactory一个类,kafka的Factory就打印不出来了,所以是不是某一部分导致冲突了,
> > 但我看我先前flink1.9的工程,里面也无法打印kafkaFactory类,只有一个
> > GenericInMemoryCatalogFactory类,但flink1.9和1.10对比,看发布文档类加载策略有过改动
> >
> > org.apache.flink:flink-connector-kafka-0.11_2.11
> > org.apache.flink:flink-connector-kafka-base_2.11
> > org.apache.flink:flink-json
> >
> >
> > On Thu, Apr 23, 2020 at 3:43 PM Jingsong Li 
> > wrote:
> >
> > > > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > > 有问题。之前FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > >
> > > @tison 不管怎么样,也得保证jar里的SPI文件包含Kafka的类,不然SPI没法找
> > >
> > > @宇张 建议你仔细看下[1],这个pom是能打出正确的SPI文件的
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Apr 23, 2020 at 3:35 PM tison  wrote:
> > >
> > > > 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > tison  于2020年4月23日周四 下午3:34写道:
> > > >
> > > > > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session 模式,在
> > > Client
> > > > > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > 宇张  于2020年4月23日周四 上午11:53写道:
> > > > >
> > > > >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > >> KafkaTableSourceSinkFactory
> > > > >> 吗?(同时 class loading 为 child-first)
> > > > >> 》》是的
> > > > >>
> > > > >> On Thu, Apr 23, 2020 at 11:42 AM tison 
> > wrote:
> > > > >>
> > > > >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > > > >> > >这个能拿到
> > > > >> >
> > > > >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > > >> > KafkaTableSourceSinkFactory
> > > > >> > 吗?(同时 class loading 为 child-first)
> > > > >> >
> > > > >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > > > 有问题。之前
> > > > >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > > >> >
> > > > >> > Best,
> > > > >> > tison.
> > > > >> >
> > > > >> >
> > > > >> > 宇张  于2020年4月23日周四 上午11:36写道:
> > > > >> >
> > > > >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > > > >> > >
> > > > >> > > 
> > > > >> > > org.apache.maven.plugins
> > > > >> > > maven-shade-plugin
> > > > >> > > 
> > > > >> > > 
> > > > >> > > 
> > > > >> > > package
> > > > >> > > 
> > > > >> > > shade
> > > > >> > > 
> > > > >> > > 
> > > > >> > > 
> > > > >> > >  > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > > > >> > >
> > > > >> > > com.akulaku.data.main.StreamMain
> > > > >> > > 
> > > > >> > > 
> > > > >> > >
> > > > >> > > 
> > > > >> > > 
> > > > >> > > *:*
> > > > >> > > 
> > > > >> > > META-INF/*.SF
> > > > >> > > META-INF/*.DSA
> > > > >> > > META-INF/*.RSA
> > > > >> > > 
> > > > >> > > 
> > > > >> > > 
> > > > >> > >
> > > > >> > > 
> > > > >> > > 
> > > > >> > > 
> > > > >> > >
> > > > >> > > org.apache.flink:flink-table-common
> > > > >> > >
> > > > >> > > org.apache.flink:flink-table-api-java
> > > > >> > >
> > > > >> > >
> > > 

Re: retract的问题

2020-04-23 Thread lec ssmi
So if two streams coming out of retract-producing operators are joined with a time interval join,
will a record that already joined successfully and was emitted also be DELETEd first and then re-INSERTed, with both records sent downstream?

Benchao Li  于2020年4月23日周四 下午4:11写道:

> 内置的*聚合函数*应该是都能处理retract消息的。
> 普通的*scalar函数*不需要特殊处理,retract和append消息对它来说都是一样的。
> 我理解应该主要是UDAF可能需要注意一下是否需要处理retract消息,over window的确是会需要处理retract,除此之外,regular
> group by也需要。
>
> lec ssmi  于2020年4月23日周四 下午4:05写道:
>
> > 谢谢。
> > 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> > 但是对于Table
> > API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> > 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be implemented
> > for  datastream bounded over aggregate  。 是否说只有over window的时候才有retract?
> >
> >
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
> >
> > Benchao Li  于2020年4月23日周四 下午3:59写道:
> >
> > > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> > >
> > > lec ssmi  于2020年4月23日周四 下午3:45写道:
> > >
> > > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > > >
> > > >
> > > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > > >
> > > > > Hi lec,
> > > > >
> > > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > >
> > > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late
> > fire的时候。
> > > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > > >
> > > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > >
> > > > > 这个也不绝对。大部分时候是。
> > > > > 这个取决于这个算子本身是不是会consume
> > > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > > >
> > > > >
> > > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > >
> > > > > 是的。
> > > > >
> > > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > > >
> > > > > > Hi:
> > > > > >有几个问题想咨询下大佬:
> > > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > > >
> > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Benchao Li
> > > > > School of Electronics Engineering and Computer Science, Peking
> > > University
> > > > > Tel:+86-15650713730
> > > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Benchao Li
> > > School of Electronics Engineering and Computer Science, Peking
> University
> > > Tel:+86-15650713730
> > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: retract的问题

2020-04-23 Thread Leonard Xu
Hi

> Isn't there a list, or some configuration switch, for this? Do we really have to try them one by one? With several operators chained together it is even harder to tell.

There is indeed no documentation on this yet. A simple approach is to look at the class org.apache.flink.table.plan.nodes.datastream.DataStreamRel

and, if you want to know more, check which operators implement these methods:
def needsUpdatesAsRetraction: Boolean = false
def producesUpdates: Boolean = false
def consumesRetractions: Boolean = false
def producesRetractions: Boolean = false

Best,
Leonard Xu



Re: retract的问题

2020-04-23 Thread Benchao Li
The built-in *aggregate functions* should all be able to handle retract messages.
Ordinary *scalar functions* need no special handling; retract and append messages look the same to them.
My understanding is that it is mainly UDAFs that need to consider whether to handle retract messages. An over window does indeed need to handle retract, and besides that, a regular
group by needs it as well.

lec ssmi  于2020年4月23日周四 下午4:05写道:

> 谢谢。
> 其实,如果从DataStream编程的角度上来说,下游是能够收到一个Tuple2类型的数据,也就是能够硬编码处理retract的结果。
> 但是对于Table
> API来说,特别是SQL,内置函数本身并没有一个增加处理Retract的逻辑(当然,可能内置算子已经包含了,我没有去看而已)。
> 我在编写UDAF的时候,里面有个retract方法,注释写的是: This function must be implemented
> for  datastream bounded over aggregate  。 是否说只有over window的时候才有retract?
>
> 另外,对于我们写的UDF,UDTF,其实也没有提供retract的方式,毕竟传入的参数只是字段值,而没有DataStream中的Tuple2中的Boolean值。其他的内置方法也一样,好像对于retract的处理,sql中只有UDAF里面有所提及。
>
> Benchao Li  于2020年4月23日周四 下午3:59写道:
>
> > 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
> >
> > lec ssmi  于2020年4月23日周四 下午3:45写道:
> >
> > > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> > >
> > >
> > > Benchao Li  于2020年4月23日周四 下午3:39写道:
> > >
> > > > Hi lec,
> > > >
> > > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > >
> > > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late
> fire的时候。
> > > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > > >
> > > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > >
> > > > 这个也不绝对。大部分时候是。
> > > > 这个取决于这个算子本身是不是会consume
> > > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > > >
> > > >
> > 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > >
> > > > 是的。
> > > >
> > > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > > >
> > > > > Hi:
> > > > >有几个问题想咨询下大佬:
> > > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > > >
> > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > University
> > > > Tel:+86-15650713730
> > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread Jingsong Li
Hi 张,

Try adding this [1]:

<transformers combine.children="append">
  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
  <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
    <projectName>Apache Flink</projectName>
    <encoding>UTF-8</encoding>
  </transformer>
</transformers>


[1]https://github.com/apache/flink/blob/master/pom.xml#L1654

Best,
Jingsong Lee
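
For context, the ServicesResourceTransformer above merges the META-INF/services descriptors of all
bundled jars instead of letting one overwrite another. As a rough illustration only (the class names
below are taken from the 1.10 flink-json and Kafka 0.11 connector jars and are listed purely as an
example), the merged descriptor inside the uber jar should then contain entries along these lines:

# META-INF/services/org.apache.flink.table.factories.TableFactory
org.apache.flink.formats.json.JsonRowFormatFactory
org.apache.flink.streaming.connectors.kafka.Kafka011TableSourceSinkFactory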

On Thu, Apr 23, 2020 at 3:56 PM 宇张  wrote:

> 下面配置中,如果不加flink-json模块是可以打出kafkafactory的,加了flink-json模块就只剩下
> JsonRowFormatFactory一个类,kafka的Factory就打印不出来了,所以是不是某一部分导致冲突了,
> 但我看我先前flink1.9的工程,里面也无法打印kafkaFactory类,只有一个
> GenericInMemoryCatalogFactory类,但flink1.9和1.10对比,看发布文档类加载策略有过改动
>
> org.apache.flink:flink-connector-kafka-0.11_2.11
> org.apache.flink:flink-connector-kafka-base_2.11
> org.apache.flink:flink-json
>
>
> On Thu, Apr 23, 2020 at 3:43 PM Jingsong Li 
> wrote:
>
> > > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > 有问题。之前FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> >
> > @tison 不管怎么样,也得保证jar里的SPI文件包含Kafka的类,不然SPI没法找
> >
> > @宇张 建议你仔细看下[1],这个pom是能打出正确的SPI文件的
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Apr 23, 2020 at 3:35 PM tison  wrote:
> >
> > > 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > tison  于2020年4月23日周四 下午3:34写道:
> > >
> > > > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session 模式,在
> > Client
> > > > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > 宇张  于2020年4月23日周四 上午11:53写道:
> > > >
> > > >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > >> KafkaTableSourceSinkFactory
> > > >> 吗?(同时 class loading 为 child-first)
> > > >> 》》是的
> > > >>
> > > >> On Thu, Apr 23, 2020 at 11:42 AM tison 
> wrote:
> > > >>
> > > >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > > >> > >这个能拿到
> > > >> >
> > > >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > > >> > KafkaTableSourceSinkFactory
> > > >> > 吗?(同时 class loading 为 child-first)
> > > >> >
> > > >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > > 有问题。之前
> > > >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > > >> >
> > > >> > Best,
> > > >> > tison.
> > > >> >
> > > >> >
> > > >> > 宇张  于2020年4月23日周四 上午11:36写道:
> > > >> >
> > > >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > > >> > >
> > > >> > > 
> > > >> > > org.apache.maven.plugins
> > > >> > > maven-shade-plugin
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > package
> > > >> > > 
> > > >> > > shade
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > >  > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > > >> > >
> > > >> > > com.akulaku.data.main.StreamMain
> > > >> > > 
> > > >> > > 
> > > >> > >
> > > >> > > 
> > > >> > > 
> > > >> > > *:*
> > > >> > > 
> > > >> > > META-INF/*.SF
> > > >> > > META-INF/*.DSA
> > > >> > > META-INF/*.RSA
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > >
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > >
> > > >> > > org.apache.flink:flink-table-common
> > > >> > >
> > > >> > > org.apache.flink:flink-table-api-java
> > > >> > >
> > > >> > >
> > org.apache.flink:flink-table-api-java-bridge_2.11
> > > >> > >
> > > >> > >
> org.apache.flink:flink-table-planner-blink_2.11
> > > >> > >
> > > >> > >
> > org.apache.flink:flink-connector-kafka-0.11_2.11
> > > >> > >
> > > >> > >
> > org.apache.flink:flink-connector-kafka-base_2.11
> > > >> > >
> > >  org.apache.flink:flink-json
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > com.ibm.icu
> > > >> > >
> > > >> > >
> > > >>
> > org.apache.flink.table.shaded.com.ibm.icu
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > > 
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Apr 23, 2020 at 10:53 AM Jingsong Li <
> > > jingsongl...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > Flink的connector发现机制是通过java
> > > >> > spi服务发现机制的,所以你的services下文件不包含Kafka相关的内容就不会加载到。
> > > >> > > >
> > > >> > > > > 而且两种打包方式运行时是都能加载到KafkaFactory类文件的
> > > >> > > >
> > > >> > > > 只是类文件是没有用的,没地方引用到它。
> > > >> > > >
> > > >> > > > 你试试[1]中的方法?添加combine.children
> > > >> > > >
> > > >> > > > [1]
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> 

Re: retract的问题

2020-04-23 Thread lec ssmi
Thanks.
Actually, from the DataStream programming perspective, the downstream can receive Tuple2-typed data, so it can handle retract results in hand-written code.
But for the Table
API, and SQL in particular, the built-in functions themselves do not expose any extra logic for handling retraction (of course, the built-in operators may already include it; I simply have not looked).
When I write a UDAF there is a retract method whose comment says: This function must be implemented
for datastream bounded over aggregate. Does that mean retract only exists for over windows?
Also, for the UDFs and UDTFs we write there is no retract hook either; after all, only the field values are passed in, not the Boolean from the Tuple2 in the DataStream. The same goes for the other built-in methods. It seems that, on the SQL side, retract handling is only mentioned for UDAFs.
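
As a side note on the retract hook mentioned above: it is an extra method on AggregateFunction that
the framework looks up by name when the operator has to consume retractions. A minimal sketch, with
invented names, of a counting UDAF that implements it:

import org.apache.flink.table.functions.AggregateFunction;

// Hypothetical counting UDAF; the accumulator is a plain mutable holder.
public class RetractableCount extends AggregateFunction<Long, RetractableCount.CountAcc> {

    public static class CountAcc {
        public long count;
    }

    @Override
    public CountAcc createAccumulator() {
        return new CountAcc();
    }

    @Override
    public Long getValue(CountAcc acc) {
        return acc.count;
    }

    // Called for append (insert) records.
    public void accumulate(CountAcc acc, String value) {
        acc.count += 1;
    }

    // Called when an upstream retraction arrives, e.g. under a bounded OVER window
    // or a regular GROUP BY fed by an updating input.
    public void retract(CountAcc acc, String value) {
        acc.count -= 1;
    }
}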

Benchao Li  于2020年4月23日周四 下午3:59写道:

> 这个暂时还没有一篇文档来介绍这部分内容。如果你要了解全部细节,可能只能从源码的角度来了解了。
>
> lec ssmi  于2020年4月23日周四 下午3:45写道:
>
> > 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
> >
> >
> > Benchao Li  于2020年4月23日周四 下午3:39写道:
> >
> > > Hi lec,
> > >
> > >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > >
> > > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late fire的时候。
> > > 还有些算子本身不会产生,但是会传递,比如calc算子
> > >
> > >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > >
> > > 这个也不绝对。大部分时候是。
> > > 这个取决于这个算子本身是不是会consume
> > > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> > >
> > >
> 3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > >
> > > 是的。
> > >
> > > lec ssmi  于2020年4月23日周四 下午3:25写道:
> > >
> > > > Hi:
> > > >有几个问题想咨询下大佬:
> > > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > > >
> >  3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > > >
> > >
> > >
> > > --
> > >
> > > Benchao Li
> > > School of Electronics Engineering and Computer Science, Peking
> University
> > > Tel:+86-15650713730
> > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: retract的问题

2020-04-23 Thread Benchao Li
There is no document that covers this part yet. If you want all the details, reading the source code is probably the only way.

lec ssmi  于2020年4月23日周四 下午3:45写道:

> 这个难道没有一个列表,或者是配置开关之类的吗?难道只能一个一个地尝试?各种算子连接在一起,更难判断了。
>
>
> Benchao Li  于2020年4月23日周四 下午3:39写道:
>
> > Hi lec,
> >
> >  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> >
> > 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> > 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late fire的时候。
> > 还有些算子本身不会产生,但是会传递,比如calc算子
> >
> >  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> >
> > 这个也不绝对。大部分时候是。
> > 这个取决于这个算子本身是不是会consume
> > retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
> >
> >  3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> >
> > 是的。
> >
> > lec ssmi  于2020年4月23日周四 下午3:25写道:
> >
> > > Hi:
> > >有几个问题想咨询下大佬:
> > >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> > >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> > >
>  3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread 宇张
With the configuration below, the KafkaFactory entry is written out as long as the flink-json module is not included; once flink-json is added, only the single class
JsonRowFormatFactory is left and the Kafka factory is no longer printed, so perhaps some part of it causes a conflict.
Looking at my earlier Flink 1.9 project, though, it could not print the KafkaFactory class either, only a single
GenericInMemoryCatalogFactory class. Comparing 1.9 and 1.10, the release notes do mention a change in the class loading strategy.

org.apache.flink:flink-connector-kafka-0.11_2.11
org.apache.flink:flink-connector-kafka-base_2.11
org.apache.flink:flink-json
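
For anyone debugging the same symptom, here is a small self-contained check (class name invented)
that can be run with the uber jar on the classpath to list which TableFactory implementations the
SPI machinery actually discovers:

import java.util.ServiceLoader;
import org.apache.flink.table.factories.TableFactory;

public class SpiDebug {
    public static void main(String[] args) {
        // Lists every factory registered via
        // META-INF/services/org.apache.flink.table.factories.TableFactory
        // that is visible to the current thread's context class loader.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        for (TableFactory factory : ServiceLoader.load(TableFactory.class, cl)) {
            System.out.println(factory.getClass().getName());
        }
    }
}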


On Thu, Apr 23, 2020 at 3:43 PM Jingsong Li  wrote:

> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> 有问题。之前FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
>
> @tison 不管怎么样,也得保证jar里的SPI文件包含Kafka的类,不然SPI没法找
>
> @宇张 建议你仔细看下[1],这个pom是能打出正确的SPI文件的
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 23, 2020 at 3:35 PM tison  wrote:
>
> > 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
> >
> > Best,
> > tison.
> >
> >
> > tison  于2020年4月23日周四 下午3:34写道:
> >
> > > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session 模式,在
> Client
> > > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > 宇张  于2020年4月23日周四 上午11:53写道:
> > >
> > >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > >> KafkaTableSourceSinkFactory
> > >> 吗?(同时 class loading 为 child-first)
> > >> 》》是的
> > >>
> > >> On Thu, Apr 23, 2020 at 11:42 AM tison  wrote:
> > >>
> > >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > >> > >这个能拿到
> > >> >
> > >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > >> > KafkaTableSourceSinkFactory
> > >> > 吗?(同时 class loading 为 child-first)
> > >> >
> > >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> > 有问题。之前
> > >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> > >> >
> > >> > Best,
> > >> > tison.
> > >> >
> > >> >
> > >> > 宇张  于2020年4月23日周四 上午11:36写道:
> > >> >
> > >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > >> > >
> > >> > > 
> > >> > > org.apache.maven.plugins
> > >> > > maven-shade-plugin
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > package
> > >> > > 
> > >> > > shade
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >  > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > >> > >
> > >> > > com.akulaku.data.main.StreamMain
> > >> > > 
> > >> > > 
> > >> > >
> > >> > > 
> > >> > > 
> > >> > > *:*
> > >> > > 
> > >> > > META-INF/*.SF
> > >> > > META-INF/*.DSA
> > >> > > META-INF/*.RSA
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >
> > >> > > org.apache.flink:flink-table-common
> > >> > >
> > >> > > org.apache.flink:flink-table-api-java
> > >> > >
> > >> > >
> org.apache.flink:flink-table-api-java-bridge_2.11
> > >> > >
> > >> > > org.apache.flink:flink-table-planner-blink_2.11
> > >> > >
> > >> > >
> org.apache.flink:flink-connector-kafka-0.11_2.11
> > >> > >
> > >> > >
> org.apache.flink:flink-connector-kafka-base_2.11
> > >> > >
> >  org.apache.flink:flink-json
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > com.ibm.icu
> > >> > >
> > >> > >
> > >>
> org.apache.flink.table.shaded.com.ibm.icu
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > > 
> > >> > >
> > >> > >
> > >> > > On Thu, Apr 23, 2020 at 10:53 AM Jingsong Li <
> > jingsongl...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > Flink的connector发现机制是通过java
> > >> > spi服务发现机制的,所以你的services下文件不包含Kafka相关的内容就不会加载到。
> > >> > > >
> > >> > > > > 而且两种打包方式运行时是都能加载到KafkaFactory类文件的
> > >> > > >
> > >> > > > 只是类文件是没有用的,没地方引用到它。
> > >> > > >
> > >> > > > 你试试[1]中的方法?添加combine.children
> > >> > > >
> > >> > > > [1]
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> > >> > > >
> > >> > > > Best,
> > >> > > > Jingsong Lee
> > >> > > >
> > >> > > > On Thu, Apr 23, 2020 at 10:37 AM 宇张 
> wrote:
> > >> > > >
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> 我这面采用shade打包方式进行了尝试,发现依然运行出错,运行错误日志与assembly打包产生的错误日志一致,就是上面提到的错误,而且shade和assembly打包产生的
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> META-INF/services/org.apache.flink.table.factories.TableFactory文件及里面的内容一致,而且两种打包方式运行时是都能加载到KafkaFactory类文件的,所以貌似不是打包导致的问题,而更像是bug
> > 

Re: retract的问题

2020-04-23 Thread lec ssmi
Isn't there a list, or some configuration switch, for this? Do we really have to try them one by one? With several operators chained together it is even harder to tell.


Benchao Li  于2020年4月23日周四 下午3:39写道:

> Hi lec,
>
>  1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
>
> 这个是某些算子会有这个行为,比如普通的group by,就会发送retract消息。
> 另外有一些算子是在某些特定配置下才会有这个行为,比如window operator,在配置了early fire或者late fire的时候。
> 还有些算子本身不会产生,但是会传递,比如calc算子
>
>  2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
>
> 这个也不绝对。大部分时候是。
> 这个取决于这个算子本身是不是会consume
> retraction,目前我好想没见到有算子会消费retraction,但是不产生retraction的。
>
>  3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
>
> 是的。
>
> lec ssmi  于2020年4月23日周四 下午3:25写道:
>
> > Hi:
> >有几个问题想咨询下大佬:
> >   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
> >   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
> >   3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread Jingsong Li
> If that is the case, the client's class loading strategy sounds fine, and it seems the problem is with the ClassLoader
used on the SPI loading side. The FileSystem-related resolution had a similar ClassLoader bug before.

@tison Either way, the SPI file inside the jar still has to contain the Kafka classes; otherwise SPI has no way to find them.

@宇张 I suggest you take a careful look at [1]; that pom does produce the correct SPI file.

[1]
https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104

Best,
Jingsong Lee

On Thu, Apr 23, 2020 at 3:35 PM tison  wrote:

> 另外你 shaded 里面去 shaded com.ibm.icu 也意义不明...
>
> Best,
> tison.
>
>
> tison  于2020年4月23日周四 下午3:34写道:
>
> > 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session 模式,在 Client
> > 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
> >
> > Best,
> > tison.
> >
> >
> > 宇张  于2020年4月23日周四 上午11:53写道:
> >
> >> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> >> KafkaTableSourceSinkFactory
> >> 吗?(同时 class loading 为 child-first)
> >> 》》是的
> >>
> >> On Thu, Apr 23, 2020 at 11:42 AM tison  wrote:
> >>
> >> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> >> > >这个能拿到
> >> >
> >> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> >> > KafkaTableSourceSinkFactory
> >> > 吗?(同时 class loading 为 child-first)
> >> >
> >> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader
> 有问题。之前
> >> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> >> >
> >> > Best,
> >> > tison.
> >> >
> >> >
> >> > 宇张  于2020年4月23日周四 上午11:36写道:
> >> >
> >> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> >> > >
> >> > > 
> >> > > org.apache.maven.plugins
> >> > > maven-shade-plugin
> >> > > 
> >> > > 
> >> > > 
> >> > > package
> >> > > 
> >> > > shade
> >> > > 
> >> > > 
> >> > > 
> >> > >  >> > >
> >> > >
> >> > >
> >> >
> >>
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> >> > >
> >> > > com.akulaku.data.main.StreamMain
> >> > > 
> >> > > 
> >> > >
> >> > > 
> >> > > 
> >> > > *:*
> >> > > 
> >> > > META-INF/*.SF
> >> > > META-INF/*.DSA
> >> > > META-INF/*.RSA
> >> > > 
> >> > > 
> >> > > 
> >> > >
> >> > > 
> >> > > 
> >> > > 
> >> > >
> >> > > org.apache.flink:flink-table-common
> >> > >
> >> > > org.apache.flink:flink-table-api-java
> >> > >
> >> > > org.apache.flink:flink-table-api-java-bridge_2.11
> >> > >
> >> > > org.apache.flink:flink-table-planner-blink_2.11
> >> > >
> >> > > org.apache.flink:flink-connector-kafka-0.11_2.11
> >> > >
> >> > > org.apache.flink:flink-connector-kafka-base_2.11
> >> > >
>  org.apache.flink:flink-json
> >> > > 
> >> > > 
> >> > > 
> >> > > 
> >> > > 
> >> > > com.ibm.icu
> >> > >
> >> > >
> >> org.apache.flink.table.shaded.com.ibm.icu
> >> > > 
> >> > > 
> >> > > 
> >> > > 
> >> > > 
> >> > > 
> >> > >
> >> > >
> >> > > On Thu, Apr 23, 2020 at 10:53 AM Jingsong Li <
> jingsongl...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > Flink的connector发现机制是通过java
> >> > spi服务发现机制的,所以你的services下文件不包含Kafka相关的内容就不会加载到。
> >> > > >
> >> > > > > 而且两种打包方式运行时是都能加载到KafkaFactory类文件的
> >> > > >
> >> > > > 只是类文件是没有用的,没地方引用到它。
> >> > > >
> >> > > > 你试试[1]中的方法?添加combine.children
> >> > > >
> >> > > > [1]
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> >> > > >
> >> > > > Best,
> >> > > > Jingsong Lee
> >> > > >
> >> > > > On Thu, Apr 23, 2020 at 10:37 AM 宇张  wrote:
> >> > > >
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> 我这面采用shade打包方式进行了尝试,发现依然运行出错,运行错误日志与assembly打包产生的错误日志一致,就是上面提到的错误,而且shade和assembly打包产生的
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> META-INF/services/org.apache.flink.table.factories.TableFactory文件及里面的内容一致,而且两种打包方式运行时是都能加载到KafkaFactory类文件的,所以貌似不是打包导致的问题,而更像是bug
> >> > > > > 下面是我maven插件配置:
> >> > > > >
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > org.apache.maven.plugins
> >> > > > > maven-shade-plugin
> >> > > > > 
> >> > > > > 
> >> > > > > 
> >> > > > > package
> >> > > > > 
> >> > > > > shade
> >> > > > > 
> >> > > 

Re: retract的问题

2020-04-23 Thread Benchao Li
Hi lec,

 1. When is retract triggered? Is it the default whenever there is a group by or a window, or does it need to be configured?

Some operators have this behavior by themselves; for example, a plain group by will send retract messages.
Other operators only behave this way under specific configuration, e.g. the window operator when early fire or late fire is configured.
And some operators do not produce retractions themselves but do pass them through, e.g. the calc operator.

 2. If an upstream operation produces retractions, don't all downstream operators take on the retract nature as well? Otherwise the downstream results would be wrong.

Not always, but most of the time, yes.
It depends on whether the operator itself consumes retractions; so far I don't think I have seen an operator that consumes retractions without also producing them.

 3. For SQL, if the upstream produces retractions, will a downstream select followed by print output both the DELETE and the INSERT record?

Yes.
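
To make point 3 concrete, here is a minimal, self-contained sketch (table and field names invented)
that prints a retract stream; the Boolean flag is false for the DELETE and true for the INSERT:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractPrint {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        // Invented sample input.
        DataStream<Tuple2<String, Integer>> clicks = env.fromElements(
                Tuple2.of("user1", 1), Tuple2.of("user1", 1), Tuple2.of("user2", 1));
        tEnv.createTemporaryView("clicks", clicks, "user_id, cnt");

        // A plain (non-windowed) GROUP BY produces an updating result.
        Table counts = tEnv.sqlQuery(
                "SELECT user_id, COUNT(*) AS c FROM clicks GROUP BY user_id");

        // Each update arrives as (false, oldRow) followed by (true, newRow):
        // the DELETE and the INSERT discussed above.
        tEnv.toRetractStream(counts, Row.class).print();

        env.execute("retract-print");
    }
}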

lec ssmi  于2020年4月23日周四 下午3:25写道:

> Hi:
>有几个问题想咨询下大佬:
>   1.retract在什么时候触发呢?是有groupby或者窗口就默认retract吗,还是需要配置?
>   2.如果上游操作有retract,那么不是所有的下游都带有retract性质了?不然下游计算的数据就不准了。
>   3.sql操作的话,如果上游是有retract的,那下游select然后print,会把DELETE和INSERT这两条记录都print出来?
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread tison
Also, it's not clear why your shade config relocates com.ibm.icu; that doesn't seem to serve any purpose...

Best,
tison.


tison  于2020年4月23日周四 下午3:34写道:

> 这个问题我建议你记一个 JIRA 然后提供一个可复现的程序。因为你如果是 Flink Standalone Session 模式,在 Client
> 端编译失败抛出如上异常,不应该跟放不放在 lib 下有什么关系。这边听你说感觉也很奇怪,可能需要本地复现一下比较好判断。
>
> Best,
> tison.
>
>
> 宇张  于2020年4月23日周四 上午11:53写道:
>
>> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
>> KafkaTableSourceSinkFactory
>> 吗?(同时 class loading 为 child-first)
>> 》》是的
>>
>> On Thu, Apr 23, 2020 at 11:42 AM tison  wrote:
>>
>> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
>> > >这个能拿到
>> >
>> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
>> > KafkaTableSourceSinkFactory
>> > 吗?(同时 class loading 为 child-first)
>> >
>> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader 有问题。之前
>> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > 宇张  于2020年4月23日周四 上午11:36写道:
>> >
>> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
>> > >
>> > > 
>> > > org.apache.maven.plugins
>> > > maven-shade-plugin
>> > > 
>> > > 
>> > > 
>> > > package
>> > > 
>> > > shade
>> > > 
>> > > 
>> > > 
>> > > > > >
>> > >
>> > >
>> >
>> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>> > >
>> > > com.akulaku.data.main.StreamMain
>> > > 
>> > > 
>> > >
>> > > 
>> > > 
>> > > *:*
>> > > 
>> > > META-INF/*.SF
>> > > META-INF/*.DSA
>> > > META-INF/*.RSA
>> > > 
>> > > 
>> > > 
>> > >
>> > > 
>> > > 
>> > > 
>> > >
>> > > org.apache.flink:flink-table-common
>> > >
>> > > org.apache.flink:flink-table-api-java
>> > >
>> > > org.apache.flink:flink-table-api-java-bridge_2.11
>> > >
>> > > org.apache.flink:flink-table-planner-blink_2.11
>> > >
>> > > org.apache.flink:flink-connector-kafka-0.11_2.11
>> > >
>> > > org.apache.flink:flink-connector-kafka-base_2.11
>> > > org.apache.flink:flink-json
>> > > 
>> > > 
>> > > 
>> > > 
>> > > 
>> > > com.ibm.icu
>> > >
>> > >
>> org.apache.flink.table.shaded.com.ibm.icu
>> > > 
>> > > 
>> > > 
>> > > 
>> > > 
>> > > 
>> > >
>> > >
>> > > On Thu, Apr 23, 2020 at 10:53 AM Jingsong Li 
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > Flink的connector发现机制是通过java
>> > spi服务发现机制的,所以你的services下文件不包含Kafka相关的内容就不会加载到。
>> > > >
>> > > > > 而且两种打包方式运行时是都能加载到KafkaFactory类文件的
>> > > >
>> > > > 只是类文件是没有用的,没地方引用到它。
>> > > >
>> > > > 你试试[1]中的方法?添加combine.children
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
>> > > >
>> > > > Best,
>> > > > Jingsong Lee
>> > > >
>> > > > On Thu, Apr 23, 2020 at 10:37 AM 宇张  wrote:
>> > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> 我这面采用shade打包方式进行了尝试,发现依然运行出错,运行错误日志与assembly打包产生的错误日志一致,就是上面提到的错误,而且shade和assembly打包产生的
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> META-INF/services/org.apache.flink.table.factories.TableFactory文件及里面的内容一致,而且两种打包方式运行时是都能加载到KafkaFactory类文件的,所以貌似不是打包导致的问题,而更像是bug
>> > > > > 下面是我maven插件配置:
>> > > > >
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > org.apache.maven.plugins
>> > > > > maven-shade-plugin
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > package
>> > > > > 
>> > > > > shade
>> > > > > 
>> > > > > 
>> > > > > 
>> > > > > > > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>> > > > >
>> > > > > com.akulaku.data.main.StreamMain
>> > > > > 
>> > > > > 
>> > > > >
>> > > > > 
>> > > > > 
>> > > > > *:*
>> > > > > 
>> > > > > META-INF/*.SF
>> > > > > META-INF/*.DSA
>> > > > > META-INF/*.RSA
>> > > > > 
>> > > > > 
>> > > > >   

Re: 关于Flink1.10 Standalone 模式任务提交

2020-04-23 Thread tison
I suggest you file a JIRA for this and provide a reproducible program. If you are running Flink Standalone Session mode and the compilation fails on the client
side with the exception above, it should not matter whether the jar is placed under lib or not. From what you describe it sounds odd; it probably needs a local reproduction to judge properly.

Best,
tison.


宇张  于2020年4月23日周四 上午11:53写道:

> 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> KafkaTableSourceSinkFactory
> 吗?(同时 class loading 为 child-first)
> 》》是的
>
> On Thu, Apr 23, 2020 at 11:42 AM tison  wrote:
>
> > >》拿到ClassLoader后看下能不能取到KafkaTableSourceSinkFactory的class
> > >这个能拿到
> >
> > 你的意思是,UberJar 不放在 lib 里,在用户程序里通过线程上下文 ClassLoader 能加载到
> > KafkaTableSourceSinkFactory
> > 吗?(同时 class loading 为 child-first)
> >
> > 如果是这样,听起来 client 的 classloading 策略没啥问题,似乎是 SPI 加载那边的 ClassLoader 有问题。之前
> > FileSystem 相关解析就出过类似的 ClassLoader 的 BUG
> >
> > Best,
> > tison.
> >
> >
> > 宇张  于2020年4月23日周四 上午11:36写道:
> >
> > > 我尝试进行了添加,程序依然无法运行,异常信息和上面一致,下面是我的shade配置:
> > >
> > > 
> > > org.apache.maven.plugins
> > > maven-shade-plugin
> > > 
> > > 
> > > 
> > > package
> > > 
> > > shade
> > > 
> > > 
> > > 
> > >  > >
> > >
> > >
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > >
> > > com.akulaku.data.main.StreamMain
> > > 
> > > 
> > >
> > > 
> > > 
> > > *:*
> > > 
> > > META-INF/*.SF
> > > META-INF/*.DSA
> > > META-INF/*.RSA
> > > 
> > > 
> > > 
> > >
> > > 
> > > 
> > > 
> > >
> > > org.apache.flink:flink-table-common
> > >
> > > org.apache.flink:flink-table-api-java
> > >
> > > org.apache.flink:flink-table-api-java-bridge_2.11
> > >
> > > org.apache.flink:flink-table-planner-blink_2.11
> > >
> > > org.apache.flink:flink-connector-kafka-0.11_2.11
> > >
> > > org.apache.flink:flink-connector-kafka-base_2.11
> > > org.apache.flink:flink-json
> > > 
> > > 
> > > 
> > > 
> > > 
> > > com.ibm.icu
> > >
> > >
> org.apache.flink.table.shaded.com.ibm.icu
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > >
> > > On Thu, Apr 23, 2020 at 10:53 AM Jingsong Li 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Flink的connector发现机制是通过java
> > spi服务发现机制的,所以你的services下文件不包含Kafka相关的内容就不会加载到。
> > > >
> > > > > 而且两种打包方式运行时是都能加载到KafkaFactory类文件的
> > > >
> > > > 只是类文件是没有用的,没地方引用到它。
> > > >
> > > > 你试试[1]中的方法?添加combine.children
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-uber-blink/pom.xml#L104
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Thu, Apr 23, 2020 at 10:37 AM 宇张  wrote:
> > > >
> > > > >
> > > > >
> > > >
> > >
> >
> 我这面采用shade打包方式进行了尝试,发现依然运行出错,运行错误日志与assembly打包产生的错误日志一致,就是上面提到的错误,而且shade和assembly打包产生的
> > > > >
> > > > >
> > > >
> > >
> >
> META-INF/services/org.apache.flink.table.factories.TableFactory文件及里面的内容一致,而且两种打包方式运行时是都能加载到KafkaFactory类文件的,所以貌似不是打包导致的问题,而更像是bug
> > > > > 下面是我maven插件配置:
> > > > >
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > org.apache.maven.plugins
> > > > > maven-shade-plugin
> > > > > 
> > > > > 
> > > > > 
> > > > > package
> > > > > 
> > > > > shade
> > > > > 
> > > > > 
> > > > > 
> > > > >  > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
> > > > >
> > > > > com.akulaku.data.main.StreamMain
> > > > > 
> > > > > 
> > > > >
> > > > > 
> > > > > 
> > > > > *:*
> > > > > 
> > > > > META-INF/*.SF
> > > > > META-INF/*.DSA
> > > > > META-INF/*.RSA
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > >
> > > > >
> > > > > On Wed, Apr 22, 2020 at 8:00 PM Jingsong Li <
> jingsongl...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

retract的问题

2020-04-23 Thread lec ssmi
Hi:
   I have a few questions for the experts:
  1. When is retract triggered? Is it the default whenever there is a group by or a window, or does it need to be configured?
  2. If an upstream operation produces retractions, don't all downstream operators take on the retract nature as well? Otherwise the downstream results would be wrong.
  3. For SQL, if the upstream produces retractions, will a downstream select followed by print output both the DELETE and the INSERT record?


how to enable retract?

2020-04-23 Thread lec ssmi
Hi:
  Is there an aggregation operation or window operation whose result has retract characteristics?


Re: batch range sort support

2020-04-23 Thread Benchao Li
Hi Jingsong,

Thanks for your quick response. I've CC'ed Chongchen who understands the
scenario much better.
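
For readers following along, the flag discussed in the quoted message below is a TableConfig option
of the Blink batch planner. A rough sketch of how it would be toggled, assuming the key stays as
named in this thread (note that, per the discussion, the range path is not fully implemented in 1.10
and currently ends in an UnsupportedOperationException from BatchExecExchange):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RangeSortToggle {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inBatchMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Intended to switch the sort distribution from SINGLETON to range in BatchExecSortRule.
        tEnv.getConfig().getConfiguration()
                .setBoolean("table.exec.range-sort.enabled", true);
    }
}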


Jingsong Li  于2020年4月23日周四 下午12:34写道:

> Hi, Benchao,
>
> Glad to see your requirement about range partition.
> I have a branch to support range partition: [1]
>
> Can you describe your scene in more detail? What sink did you use for your
> jobs? A simple and complete business scenario? This can help the community
> judge the importance of the range partition.
>
> [1]https://github.com/JingsongLi/flink/commits/range
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 23, 2020 at 12:15 PM Benchao Li  wrote:
>
>> Hi,
>>
>> Currently the sort operator in blink planner is global, which has
>> bottleneck if we sort a lot of data.
>>
>> And I found 'table.exec.range-sort.enabled' config in BatchExecSortRule,
>> which makes me very exciting.
>> After enabling this config, I found that it's not implemented completely
>> now. This config changes the distribution
>>  from SINGLETON to range for sort operator, however in BatchExecExchange
>> we do not deal with range
>> distribution, and will throw UnsupportedOperationException.
>>
>> My question is,
>> 1. Is this config just a mistake when we merge blink into flink, and we
>> actually didn't plan to implement this?
>> 2. If this is in the plan, then which version may we expect it to be
>> ready?
>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>
>
> --
> Best, Jingsong Lee
>


-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: RocksDB default logging configuration

2020-04-23 Thread Chesnay Schepler
AFAIK this is not possible; the client doesn't know anything about the 
cluster configuration.


FLINK-15747 proposes to add an additional config option for controlling 
the logging behavior.


The only workaround I can think of would be to create a custom Flink 
distribution with a modified RocksDBStateBackend which always sets these 
options by default.
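
For reference, a minimal sketch of the per-job workaround mentioned above, using the RocksDB state
backend's own hooks; the checkpoint URI and log limits are placeholders, and this is job code rather
than the cluster-wide flink-conf.yaml option the thread is asking for:

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;
import org.rocksdb.InfoLogLevel;

public class QuietRocksDbJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder checkpoint location.
        RocksDBStateBackend backend = new RocksDBStateBackend("file:///tmp/checkpoints");
        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions currentOptions) {
                // Keep RocksDB's info log quiet and bounded instead of unbounded.
                return currentOptions
                        .setInfoLogLevel(InfoLogLevel.HEADER_LEVEL)
                        .setMaxLogFileSize(1024 * 1024)
                        .setKeepLogFileNum(2);
            }

            @Override
            public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
                return currentOptions;
            }
        });

        env.setStateBackend(backend);
        // ... build the job graph and call env.execute(...) as usual ...
    }
}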



On 23/04/2020 03:24, Bajaj, Abhinav wrote:


Bumping this one again to catch some attention.

*From: *"Bajaj, Abhinav" 
*Date: *Monday, April 20, 2020 at 3:23 PM
*To: *"user@flink.apache.org" 
*Subject: *RocksDB default logging configuration

Hi,

Some of our teams ran into the disk space issues because of RocksDB
default logging configuration - FLINK-15068.


It seems the workaround suggested uses the OptionsFactory to set some 
of the parameters from inside the job.


Since we provision the Flink cluster(version 1.7.1) for the teams, we 
control the RocksDB statebackend configuration from flink-conf.yaml.


And it seems there isn’t any related RocksDB configuration to set in
flink-conf.yaml.


Is there a way for the job developer to retrieve the default 
statebackend information from the cluster in the job and set the 
DBOptions on top of it?


Appreciate the help!

~ Abhinav Bajaj

PS: Sharing below snippet as desired option if possible -

StreamExecutionEnvironment streamExecEnv = StreamExecutionEnvironment.getExecutionEnvironment();

StateBackend stateBackend = streamExecEnv.getDefaultStateBackend();

stateBackend.setOptions(new OptionsFactory() {

  @Override
  public DBOptions createDBOptions(DBOptions dbOptions) {
    dbOptions.setInfoLogLevel(InfoLogLevel.WARN_LEVEL);
    dbOptions.setMaxLogFileSize(1024 * 1024);
    return dbOptions;
  }

  @Override
  public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions columnFamilyOptions) {
    return columnFamilyOptions;
  }
});