Re: Flink 中类似 Struct 的数据结构

2019-07-01 Thread Luan Cooper
Hi huangjia 看起来是真的没有,我看看 1.9 的 Struct 实现版本 orc/parquet/avro 都有 struct 啊 感谢 On Mon, Jul 1, 2019 at 5:36 PM yelun <986463...@qq.com> wrote: > > > > 在 2019年7月1日,下午4:20,Luan Cooper 写道: > > > > 补充 > > 标题纠正:Flink SQL 中类似 Struct 的数据结构 > > > > On Mon, Jul 1, 2019 at 4:00 PM Luan Cooper wrote: > > >

Re: flink tasks在taskmanager上分布不均衡

2019-07-01 Thread Xintong Song
你好, 社区此前已经发现你所遇到的问题了,会在后续版本中修复,目前规划的是在1.7.3, 1.8.2, 1.9.0几个版本中修复。详情可以参考: https://issues.apache.org/jira/browse/FLINK-12122 Thank you~ Xintong Song On Tue, Jul 2, 2019 at 11:27 AM Ever <439674...@qq.com> wrote: > 我们测试环境的flink集群(flink 1.8),taskmanager有3个,每个有10个slot。 > > 然后我有一个job,

flink tasks??taskmanager????????????

2019-07-01 Thread Ever
??flink(flink 1.8)??taskmanager??3??10??slot?? job?? ??3taskmanager??slot??(??taskmanager??slot?? cpu). job??tasktaskmanager??

Re:File Naming Pattern from HadoopOutputFormat

2019-07-01 Thread Haibo Sun
Hi, Andreas I think the following things may be what you want. 1. For writing Avro, I think you can extend AvroOutputFormat and override the getDirectoryFileName() method to customize a file name, as shown below. The javadoc of AvroOutputFormat:

Re: Flink的Slot是如何做到平均划分TM内存的?

2019-07-01 Thread Xintong Song
Hi 徐涛, Flink并不能保证TM的资源是严格平分给所有slot的。正如你所言,JVM中不同线程的资源并无严格隔离。所谓的平均划分更多的是调度上的考虑,可以理解为在调度时认为一个slot的资源相当于TM资源的1/n (n为slot数)。 有一个特例,对于DataSet作业使用到的managed memory,Flink目前是保证了TM的managed memory平均划分给所有slot的。Managed memory由TM上的MemoryManager管理,task在运行期间向MemoryManager申请内存,因此可以控制每个slot中task申请的内存上限。 Thank

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Ken Krugler
Hi Stephan, Thanks for responding, comments inline below… Regards, — Ken > On Jun 26, 2019, at 7:50 AM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that the DataSet API has stability issues at scale. >

Random errors reading binary files in batch workflow

2019-07-01 Thread Ken Krugler
Hi all, My new latest issue is that regularly (but not always) I get a java.io.UTFDataFormatException when trying to read in serialized records. I can re-run the exact same workflow, on the same cluster, with the same input data, and sometimes it works. It seems like the higher the

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Ken Krugler
Hi Till, Thanks for following up. I’ve got answers to other emails on this thread pending, but wanted to respond to this one now. > On Jul 1, 2019, at 7:20 AM, Till Rohrmann wrote: > > Quick addition for problem (1): The AkkaRpcActor should serialize the > response if it is a remote RPC and

File Naming Pattern from HadoopOutputFormat

2019-07-01 Thread Hailu, Andreas
Hello Flink team, I'm writing Avro and Parquet files to HDFS, and I've would like to include a UUID as a part of the file name. Our files in HDFS currently follow this pattern: tmp-r-1.snappy.parquet tmp-r-2.snappy.parquet ... I'm using a custom output format which extends a

How does Flink recovers uncommited Kafka offset in AsyncIO?

2019-07-01 Thread wang xuchen
Hi Flink experts, I am prototyping a real time system that reads from Kafka source with Flink and calls out to an external system as part of the event processing. One of the most important requirements are read from Kafka should NEVER stall, even in face of some async external calls slowness

[no subject]

2019-07-01 Thread wang xuchen
Hi Flink experts, I am prototyping a real time system that reads from Kafka source with Flink and calls out to an external system as part of the event processing. One of the most important requirements are read from Kafka should NEVER stall, even in face of some async external calls slowness

Re: [FLINK-10868] job cannot be exited immediately if job manager is timed out for some reason

2019-07-01 Thread Peter Huang
Hi Anyang, Thanks for rising the question. I didn't test the PR in batch mode, the observation helps me to have better implementation. From my understanding, if rm to a job manager heartbeat timeout, the job manager connection will be closed, so it will not be reconnected. Are you running batch

Re: [FLINK-10868] job cannot be exited immediately if job manager is timed out for some reason

2019-07-01 Thread Till Rohrmann
Hi Anyang, as far as I can tell, FLINK-10868 has not been merged into Flink yet. Thus, I cannot tell much about how well it works. The case you are describing should be properly handled in a version which get's merged though. I guess what needs to happen is that once the JM reconnects to the RM

Re: [FLINK-10868] the job cannot be exited immediately if job manager is timed out

2019-07-01 Thread Till Rohrmann
Hi Young, as far as I can tell, FLINK-10868 has not been merged into Flink yet. Thus, I cannot tell much about how well it works. The case you are describing should be properly handled in a version which get's merged though. I guess what needs to happen is that once the JM reconnects to the RM it

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
Quick addition for problem (1): The AkkaRpcActor should serialize the response if it is a remote RPC and send an AkkaRpcException if the response's size exceeds the maximum frame size. This should be visible on the call site since the future should be completed with this exception. I'm wondering

Re: LookupableTableSource question

2019-07-01 Thread Flavio Pompermaier
I probably messed up with the meaning of eval()..thus it is called once for every distinct key (that could be composed by a combination of fields)? So, the other question is..how do I enable Blink planner support? Since when is LATERAL TABLE available in Flink? Is it equivalent to using temporal

Re: Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-07-01 Thread M Singh
Thanks Jeff and Xintong for your pointers. On Friday, June 28, 2019, 10:44:35 AM EDT, Jeff Zhang wrote: This is due to flink doesn't unify the execution in different enviroments. The community has discuss it before about how to enhance the flink client api. The initial proposal is

Apache Flink - Multiple Kinesis stream consumers

2019-07-01 Thread M Singh
Hi: I am trying to understand how does Flink coordinate multiple kinesis consumers and am trying to find out: 1. Is it possible to read same kinesis stream independently multiple times within a single application ? How does Flink coordinate consuming same kinesis multiple times in a single

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
Hi Ken, in order to further debug your problems it would be helpful if you could share the log files on DEBUG level with us. For problem (2), I suspect that it has been caused by Flink releasing TMs too early. This should be fixed with FLINK-10941 which is part of Flink 1.8.1. The 1.8.1 release

Fw:Re:Re: About "Flink 1.7.0 HA based on zookeepers "

2019-07-01 Thread Alex.Hu
Hi,All: I found some problems about on kubernates flink of 1.6.0 mentioned by Till in "HA for 1.6.0 job cluster with docker-compose" in the email list, but I found that Jira of flink-10291 in the email has been shut down in 1.7.0, and I also found similar errors in on kubernates flink of

Re: Flink 中类似 Struct 的数据结构

2019-07-01 Thread yelun
> 在 2019年7月1日,下午4:20,Luan Cooper 写道: > > 补充 > 标题纠正:Flink SQL 中类似 Struct 的数据结构 > > On Mon, Jul 1, 2019 at 4:00 PM Luan Cooper wrote: > >> Hi >> >> 我在做 from Kafka to Hive by Flink 的数据持久化操作,操作方式是 Flink SQL >> 需要把某一些列,持久化为 Hive Struct 类型 >> 但是发现 Flink Data Type

回复: Flink 中类似 Struct 的数据结构

2019-07-01 Thread huangjia
同问 : 这个类型如果不存在,可以使用自定义的TypeInformation。 | | huangjia | | 15928798...@163.com | 签名由网易邮箱大师定制 在2019年7月1日 16:20,Luan Cooper 写道: 补充 标题纠正:Flink SQL 中类似 Struct 的数据结构 On Mon, Jul 1, 2019 at 4:00 PM Luan Cooper wrote: Hi 我在做 from Kafka to Hive by Flink 的数据持久化操作,操作方式是 Flink SQL 需要把某一些列,持久化为 Hive

Re: Flink 1.9 进度跟踪方法

2019-07-01 Thread Luan Cooper
好的 感谢~ On Mon, Jul 1, 2019 at 4:31 PM Yun Tang wrote: > hi Luan > > 你可以在 Flink的邮件列表里面去追踪 > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-9-0-td28701.html > ,据我所知,在JIRA上似乎没有一个总览的issue去追踪进度。 > > 祝好 > 唐云 > >

Re: Flink 1.9 进度跟踪方法

2019-07-01 Thread Yun Tang
hi Luan 你可以在 Flink的邮件列表里面去追踪 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-9-0-td28701.html,据我所知,在JIRA上似乎没有一个总览的issue去追踪进度。 祝好 唐云 From: Luan Cooper Sent: Monday, July 1, 2019 16:17 To:

Re: Flink 中类似 Struct 的数据结构

2019-07-01 Thread Luan Cooper
补充 标题纠正:Flink SQL 中类似 Struct 的数据结构 On Mon, Jul 1, 2019 at 4:00 PM Luan Cooper wrote: > Hi > > 我在做 from Kafka to Hive by Flink 的数据持久化操作,操作方式是 Flink SQL > 需要把某一些列,持久化为 Hive Struct 类型 > 但是发现 Flink Data Type 中没有对应的数据类型,想在社区确认下这个情况是这样的吗?有相关的 JIRA Issue 可以关注吗? > > 感谢 >

Flink 1.9 进度跟踪方法

2019-07-01 Thread Luan Cooper
Hi 我看了 Flink 1.9 特性介绍,想用到现有流计算项目中 但是我找了官方 Apache eJIRA Issue 没有找到 1.9 Feature List 对应的 JIRA Ticket 比如有一个 Flink 1.9 的 Issue,里面有所有 1.9 计划中的 Feature List 同时子 issue 中有 feature 的进度 请问有这样的 Release 跟踪 Issue 吗? 感谢

Flink 中类似 Struct 的数据结构

2019-07-01 Thread Luan Cooper
Hi 我在做 from Kafka to Hive by Flink 的数据持久化操作,操作方式是 Flink SQL 需要把某一些列,持久化为 Hive Struct 类型 但是发现 Flink Data Type 中没有对应的数据类型,想在社区确认下这个情况是这样的吗?有相关的 JIRA Issue 可以关注吗? 感谢

Re:Re: Re:Flink batch job memory/disk leak when invoking set method on a static Configuration object.

2019-07-01 Thread Haibo Sun
Hi, Vadim I tried many times with the master branch code and failed to reproduce this issue. Which version of Flink did you use? For the Configuration class in your code, I use `org. apache. hadoop. conf. Configuration`. The configurations I enabled in flink-conf.yaml are as follows

Re: Calculate a 4 hour time window for sum,count with sum and count output every 5 mins

2019-07-01 Thread Felipe Gutierrez
No, there is no specific reason. I am using it because I am computing the HyperLogLog over a window. *--* *-- Felipe Gutierrez* *-- skype: felipe.o.gutierrez* *--* *https://felipeogutierrez.blogspot.com * On Mon, Jul 1, 2019 at 12:34 AM Vijay Balakrishnan