Re: [DISCUSS] Flink's supported APIs and Hive query syntax

2022-03-08 Thread Zou Dan
Hi Martijn,
Thanks for bringing this up.
Hive SQL (used in both Hive and Spark) plays an important role in batch processing;
it has almost become the de facto standard there. In our company, hundreds of
thousands of Spark jobs run each day.
IMO, if we want to promote Flink batch, Hive syntax compatibility is a crucial
part of it.
Thanks to this feature, we have migrated 800+ Spark jobs to Flink smoothly.

So, I quite agree with putting more effort into Hive syntax compatibility.

Best,
Dan Zou

> On Mar 7, 2022, at 19:23, Martijn Visser wrote:
> 
> query



Re: Flink Overwrite parameters in ExecutorUtils

2022-02-18 Thread Zou Dan
Hi Austin,
Thanks for your reply. For example, we want to test whether we can improve
throughput by changing the value of `execution.buffer-timeout`.
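
To make this concrete, here is roughly what we are trying; a minimal sketch of a hypothetical job, assuming the Flink 1.11 blink planner in batch mode:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BufferTimeoutExperiment {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance()
                        .useBlinkPlanner()
                        .inBatchMode()
                        .build());

        // Try to raise the buffer timeout to trade latency for throughput.
        // In 1.11 batch mode this is the setting that ExecutorUtils overwrites,
        // which is what this thread is about.
        tEnv.getConfig().getConfiguration()
                .setString("execution.buffer-timeout", "100 ms");

        // ... register tables and run the batch query here ...
    }
}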

Best,
Dan Zou

> 2022年2月19日 00:33,Austin Cawley-Edwards  写道:
> 
> Hi Dan,
> 
> I'm not exactly sure why, but could you help me understand the use case for 
> changing these parameters in Flink SQL? 
> 
> Thanks,
> Austin
> 
> On Fri, Feb 18, 2022 at 8:01 AM Zou Dan wrote:
> Hi,
> I am using Flink Batch SQL in version 1.11. I find that Flink overwrites
> some configurations in ExecutorUtils, which means these parameters become
> un-configurable, such as `pipeline.object-reuse` and
> `execution.buffer-timeout`, and the default values for these parameters will
> not align with the docs:
> https://nightlies.apache.org/flink/flink-docs-release-1.11/ops/config.html
> I wonder if there is any special reason for this?
> 
> Best,
> Dan Zou
> 
> 



Flink Overwrite parameters in ExecutorUtils

2022-02-18 Thread Zou Dan
Hi,
I am using Flink Batch SQL in version 1.11. I find that Flink overwrites
some configurations in ExecutorUtils, which means these parameters become
un-configurable, such as `pipeline.object-reuse` and
`execution.buffer-timeout`, and the default values for these parameters will
not align with the docs:
https://nightlies.apache.org/flink/flink-docs-release-1.11/ops/config.html
I wonder if there is any special reason for this?
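
For reference, a minimal sketch of how these two settings are normally configured at the DataStream level (a hypothetical job; the point of this question is that batch SQL appears to override them anyway):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataStreamLevelSettings {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Equivalent of `pipeline.object-reuse: true`.
        env.getConfig().enableObjectReuse();

        // Equivalent of `execution.buffer-timeout` (milliseconds).
        env.setBufferTimeout(100);

        // ... build and execute the job ...
    }
}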

Best,
Dan Zou



Re: Flink 1.11: how to set process_time and event_time when reading Hive as a stream?

2020-08-30 Thread Zou Dan
Event time is set via the WATERMARK clause in the DDL; see the documentation for details [1].

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table
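
For illustration, a minimal sketch (hypothetical table and field names, using the datagen connector) that declares both an event-time attribute via WATERMARK and a processing-time attribute via PROCTIME():

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TimeAttributesExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().build());

        tableEnv.executeSql(
                "CREATE TABLE src (" +
                "  f_random_str STRING," +
                // Event-time attribute: a timestamp column plus a WATERMARK clause.
                "  ts AS LOCALTIMESTAMP," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND," +
                // Processing-time attribute: a computed column using PROCTIME().
                "  proc AS PROCTIME()" +
                ") WITH ('connector' = 'datagen')");
    }
}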
 


Best,
Dan Zou

> On Aug 30, 2020, at 9:42 PM, me wrote:
> 
> In Flink 1.11 you can explicitly specify streaming read when using a select statement. Once the data is read as a stream, I want to use window functions from stream processing and specify the time semantics (event time and processing time). But Flink SQL requires the time field in the data to be explicitly defined before it can be recognized as event_time. How do I set this up?



Re: Flink 1.11: how to set the job name when using TableEnvironment.executeSql("insert into ...")

2020-08-23 Thread Zou Dan
As far as I know, there is currently no way to set the jobName with this way of executing the job.
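
As a side note for readers on newer versions: later releases (around 1.13 and above, please verify against your version) respect the `pipeline.name` option for Table API jobs. A minimal sketch:

import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JobNameExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().build());

        // Respected by INSERT INTO jobs submitted via executeSql
        // in newer Flink versions (around 1.13+).
        tEnv.getConfig().getConfiguration()
                .set(PipelineOptions.NAME, "my-insert-job");

        // tEnv.executeSql("insert into print_table select * from datagen");
    }
}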

> On Aug 21, 2020, at 11:11 AM, Asahi Lee <978466...@qq.com> wrote:
> 
> Hello!
> I submit a job by executing an insert into statement through the table environment. How can I set my job name?
> 
> 
> Program:
> EnvironmentSettings bbSettings = 
> EnvironmentSettings.newInstance().useBlinkPlanner().build();
> TableEnvironment bsTableEnv = TableEnvironment.create(bbSettings);
> 
> String sourceDDL = "CREATE TABLE datagen (  " +
>     " f_random INT,  " +
>     " f_random_str STRING,  " +
>     " ts AS localtimestamp,  " +
>     " WATERMARK FOR ts AS ts  " +
>     ") WITH (  " +
>     " 'connector' = 'datagen',  " +
>     " 'rows-per-second'='10',  " +
>     " 'fields.f_random.min'='1',  " +
>     " 'fields.f_random.max'='5',  " +
>     " 'fields.f_random_str.length'='10'  " +
>     ")";
> 
> bsTableEnv.executeSql(sourceDDL);
> Table datagen = bsTableEnv.from("datagen");
> 
> System.out.println(datagen.getSchema());
> 
> String sinkDDL = "CREATE TABLE print_table (" +
>     " f_random int," +
>     " c_val bigint, " +
>     " wStart TIMESTAMP(3) " +
>     ") WITH ('connector' = 'print') ";
> bsTableEnv.executeSql(sinkDDL);
> 
> System.out.println(bsTableEnv.from("print_table").getSchema());
> 
> Table table = bsTableEnv.sqlQuery("select f_random, count(f_random_str), 
> TUMBLE_START(ts, INTERVAL '5' second) as wStart from datagen group by 
> TUMBLE(ts, INTERVAL '5' second), f_random");
> bsTableEnv.executeSql("insert into print_table select * from " + table);




Re: Question about taskmanagers sharing a common JDBC connection pool

2020-08-23 Thread Zou Dan
Each taskmanager corresponds to an independent JVM. If your implementation is a single connection pool per JVM, then it is 50 * 2.
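
To make this concrete, a minimal sketch of the per-JVM singleton pattern in question (hypothetical names, assuming HikariCP as the pool implementation):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import javax.sql.DataSource;

// One pool per JVM (i.e., per taskmanager), shared by all tasks in it.
public final class PoolHolder {
    private static volatile HikariDataSource pool;

    private PoolHolder() {}

    public static DataSource get() {
        if (pool == null) {
            synchronized (PoolHolder.class) {
                if (pool == null) {
                    HikariConfig cfg = new HikariConfig();
                    cfg.setJdbcUrl("jdbc:mysql://db-host:3306/app"); // hypothetical URL
                    cfg.setMaximumPoolSize(10);
                    pool = new HikariDataSource(cfg);
                }
            }
        }
        return pool;
    }
}

Each task would obtain connections via PoolHolder.get() (e.g., in a RichFunction's open()), so the number of pools equals the number of taskmanager JVMs.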

> On Aug 21, 2020, at 11:09 AM, Bruce wrote:
> 
> M



Re: Flink on YARN configuration question

2020-08-23 Thread Zou Dan
Hi 一旦, the root cause should be this log message:
The number of requested virtual cores for application master 1 exceeds the
maximum number of virtual cores 0 available in the Yarn Cluster.

I took a quick look at the code; it appears there are no resources available on your YARN nodes (numYarnMaxVcores = 0).

> On Aug 21, 2020, at 11:11 PM, 赵一旦 <hinobl...@gmail.com> wrote:
> 
> The number of requested virtual cores for application master 1 exceeds the
> maximum number of virtual cores 0 available in the Yarn Cluster.



Re: Dynamically defining the return type of a UDTF in Flink 1.11

2020-08-12 Thread Zou Dan
By "dynamically defined", do you mean that neither the field types nor the field count is fixed? That should not be possible. Besides, the 1.10 example you gave is not dynamic either.
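
For reference, a minimal 1.11-style sketch (hypothetical names) that shows why the output type is fixed: it is declared statically through annotations:

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

// The output row type is declared statically by the annotation,
// so it cannot change based on runtime input.
@FunctionHint(output = @DataTypeHint("ROW<word STRING, source STRING>"))
public class SplitFunction extends TableFunction<Row> {
    public void eval(String input) {
        for (String word : input.split(",")) {
            collect(Row.of(word, input));
        }
    }
}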

> On Aug 12, 2020, at 5:32 PM, 李强 wrote:
> 
> Can a UDTF in flink 1.11 declare a custom return type the way it could in 1.10? I would like to write it as in flink 1.10:
> 
> @Override
> public TypeInformation<Row> getResultType() {
>     return new RowTypeInfo(Types.STRING, Types.STRING);
> }
> 
> rather than as in flink 1.11:
> 
> @FunctionHint(output = @DataTypeHint("ROW<f0 STRING, f1 STRING>"))
> 
> We would like the number and types of the fields returned by the UDTF to be definable dynamically, as in flink 1.10.