Re: Schema with TypeInformation or DataType

2020-04-16 Thread Jark Wu
Hi Tison, Migration from TypeInformation to DataType is a large work and will across many releases. As far as I can tell, we will finalize the work in 1.11. As godfrey said above, Flink SQL & Table API should always use DataType, DataStream uses TypeInformation. Schema already supports DataType t

Re: Schema with TypeInformation or DataType

2020-04-16 Thread godfrey he
Hi tison, >1. Will TypeInformation be deprecated and we use DataType as type system everywhere? AFAIK, runtime will still supports TypeInformation, while table module supports DataType > 2. Schema in Table API currently support only TypeInformation to register a field, shall we support the DataTy

Schema with TypeInformation or DataType

2020-04-16 Thread tison
Hi, I notice that our type system has two branches. One is TypeInformation while the other is DataType. It is said that Table API will use DataType but there are several questions about this statement: 1. Will TypeInformation be deprecated and we use DataType as type system everywhere? 2. Schema

Re: Flink SQL Gateway

2020-04-16 Thread godfrey he
Hi Flavio, Thanks for the detailed explanation. I think we should let Catalog know this concept first, then TableEnvironment or SQL Gateway can do more stuff based on that. But "trigger" is Database domain concept, I think it need more discuss whether Flink should support this. also cc @bowenl...

Re: instance number of user defined function

2020-04-16 Thread lec ssmi
appreciating our reply.

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Yun Tang
Hi Stephen I think the state name [1] which would be changed every time might the root cause. I am not familiar with Beam code, would it be possible to create so many operator states? Did you configure some parameters wrongly? [1] https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
I posted to the beam mailing list: https://lists.apache.org/thread.html/rb2ebfad16d85bcf668978b3defd442feda0903c20db29c323497a672%40%3Cuser.beam.apache.org%3E I think this is related to a Beam feature called RequiresStableInput (which my pipeline is using). It will create a new operator (or keyed

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Diogo Santos
Hi Till, definitely seems to be a strange issue. The first time the job is loaded (with a clean instance of the Cluster) the job goes well, but if it is canceled or started again the issue came. I built an example here https://github.com/congd123/flink-s3-example You can generate the artifact o

Unsubscribe

2020-04-16 Thread Jose Cisneros
Unsubscribe

UNSUBSCRIBE

2020-04-16 Thread JOHN MILLER
Greetings Please unsubscribe me from your mailing list JOhn M

Re: Flink SQL Gateway

2020-04-16 Thread Jeff Zhang
Hi Flavio, If you would like to use have a UI to register data sources, run flink sql and preview the sql result, then you can use zeppelin directly. You can check the tutorial here, 1) Get started https://link.medium.com/oppqD6dIg5 2) Batch https://link.medium.com/

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
Correction. I've actually found a place where it potentially might be creating a new operator state per checkpoint: https://github.com/apache/beam/blob/4fc924a8193bb9495c6b7ba755ced576bb8a35d5/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/stableinput/Buff

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
I can't say that I ever call that directly. The beam library that I'm using does call it in a couple places: https://github.com/apache/beam/blob/v2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L422-L429 But it seems t

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-16 Thread Yun Gao
Hi Kaan, For the first issue, I think the two implementation should have difference and the first should be slower, but I think which one to use should be depend on your algorithm if it could compute incrementally only with the changed edges. However, as far as I know I think most graph algo

Re: How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-16 Thread Theo Diefenthal
Hi, I think you could utilize AsyncIO in your case with just using a local thread pool [1]. Best regards Theo [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html Von: "Elkhan Dadashov" An: "user" Gesendet: Donnerstag, 16. April 2020 10:37:55

Re: Flink SQL Gateway

2020-04-16 Thread Flavio Pompermaier
Basically we want to give a UI to the user to register its data sources (i.e. catalogs in the Flink world), preview them (SELECT * LIMIT 100 for example) but, in the case of JDBC catalogs, also to see relationships and triggers. We don't want to reimplement the wheel so we would like to reuse and c

Re: instance number of user defined function

2020-04-16 Thread godfrey he
Hi, An UDTF will be wrapped into an operator, an operator instance will be executed by a slot (or parallelism/thread) , About operator, task, slot, you can refer to [1] for more details. A TM (a JVM process) may has multiple slots, that means a JVM process may has multiple UDTF instances. It's bet

Re: Flink SQL Gateway

2020-04-16 Thread godfrey he
Hi Flavio, Since 1.11(master), Flink supports "CREATE CATALOG ..." [1], we can use this statement create catalog dynamically. Currently, Catalog[2] dose not supports any operations on TRIGGER. Flink can't also use such info now. What's your user scenario? [1] https://issues.apache.org/jira/brows

Re: Re: FlinkSQL query error when specify json-schema.

2020-04-16 Thread Benchao Li
Hi wanglei, Yes, your observation is correct. Currently the type derivation relies on legacy types, which only support (38, 18) as decimal precisions. wangl...@geekplus.com.cn 于2020年4月16日周四 下午6:54写道: > > Thanks, I have tried. > > 'format.derive-schema' = 'true' will work. > > But if i insist t

Re: Re: FlinkSQL query error when specify json-schema.

2020-04-16 Thread wangl...@geekplus.com.cn
Thanks, I have tried. 'format.derive-schema' = 'true' will work. But if i insist to use format.json-schema, the CREATE TABLE must be writtten as: `id` DECIMAL(38,18), `timestamp` DECIMAL(38,18) wangl...@geekplus.com.cn From: Benchao Li Date: 2020-04-16 16:56 To: wangl...@gee

instance number of user defined function

2020-04-16 Thread lec ssmi
Hi: I always wonder how much instance has been initialized in the whole flink application. Suppose there is such a scenario: I have a UDTF called '*mongo_join'* through which the flink table can join with external different mongo table according to the parameters passed in.

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-16 Thread Kaan Sancak
If the vertex type is POJO what happens during the union of the graph? Is there a persistent approach, or can we define a function handle such occasions? Would there be a performance difference between two cases: 1) Graph graph = … // From edges list graph = graph.runScatterGath

Re: Flink SQL Gateway

2020-04-16 Thread godfrey he
Hi Flavio, that's great~ Best, Godfrey Flavio Pompermaier 于2020年4月16日周四 下午5:01写道: > Great, I'm very interested in trying it out! > Maybe we can also help with the development because we need something like > that. > Thanks a lot for the pointers > > On Thu, Apr 16, 2020 at 10:55 AM godfrey he

Re: FlinkSQL query error when specify json-schema.

2020-04-16 Thread Benchao Li
Hi wanglei, You don't need to specify 'format.json-schema', the format can derive schema from the DDL. Your exception above means the schema in 'format.json-schema' and DDL are not match. wangl...@geekplus.com.cn 于2020年4月16日周四 下午4:21写道: > > CREATE TABLE user_log( > `id` INT, > `timestam

Flink upgrade to 1.10: function

2020-04-16 Thread seeksst
Hi, All Recently, I try to upgrade flink from 1.8.2 to 1.10, but i meet some problem about function. In 1.8.2, there are just Built-In function and User-defined Functions, but in 1.10, there are 4 categories of funtions. I defined a function which named JSON_VALUE in my system, it doesn’t exist

Re: Flink SQL Gateway

2020-04-16 Thread Flavio Pompermaier
Great, I'm very interested in trying it out! Maybe we can also help with the development because we need something like that. Thanks a lot for the pointers On Thu, Apr 16, 2020 at 10:55 AM godfrey he wrote: > Hi Flavio, > > We prose FLIP-91[1] to support SQL Gateway at the beginning of this year

Re: Flink SQL Gateway

2020-04-16 Thread godfrey he
Hi Flavio, We prose FLIP-91[1] to support SQL Gateway at the beginning of this year. After a long discussion, we reached an agreement that SQL Gateway is an eco-system under ververia as first step.[2] Which could help SQL Gateway move forward faster. Now we almost finish first version development,

Flink SQL Gateway

2020-04-16 Thread Flavio Pompermaier
Hi Jeff, FLIP-24 [1] proposed to develop a SQL gateway to query Flink via SQL but since then no progress has been made on that point. Do you think that Zeppelin could be used somehow as a SQL Gateway towards Flink for the moment? Any chance that a Flink SQL Gateway could ever be developed? Is there

How to scale a streaming Flink pipeline without abusing parallelism for long computation tasks?

2020-04-16 Thread Elkhan Dadashov
Hi Flink users, I have a basic Flnk pipeline, doing flatmap. inside flatmap, I get the input, path it to the client library to compute some result. That library execution takes around 30 seconds to 2 minutes (depending on the input ) for producing the output from the given input ( it is time-ser

FlinkSQL query error when specify json-schema.

2020-04-16 Thread wangl...@geekplus.com.cn
CREATE TABLE user_log( `id` INT, `timestamp` BIGINT ) WITH ( 'connector.type' = 'kafka', 'connector.version' = 'universal', 'connector.topic' = 'wanglei_jsontest', 'connector.startup-mode' = 'latest-offset', 'connector.properties.0.key' = 'zookeeper.connect', 'conn

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-16 Thread Kaan Sancak
Thanks for the reply. Turns out that my serializer was writing one of the fields wrong. I fixed it and everything seems to be working correctly for now. Best Kaan On Apr 16, 2020, at 3:05 AM, Till Rohrmann wrote: Hi Kaan, I'm not entirely sure what's going wrong w/o having a minimal code exa

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
For future reference, here is the stack trace in an easier to read format: Caused by: java.lang.NoClassDefFoundError: org/joda/time/format/DateTimeParserBucket at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:825 at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtil

Re: AvroParquetWriter issues writing to S3

2020-04-16 Thread Till Rohrmann
Hi Diogo, thanks for reporting this issue. It looks quite strange to be honest. flink-s3-fs-hadoop-1.10.0.jar contains the DateTimeParserBucket class. So either this class wasn't loaded when starting the application from scratch or there could be a problem with the plugin mechanism on restarts. I'

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-16 Thread Till Rohrmann
Hi Kaan, I'm not entirely sure what's going wrong w/o having a minimal code example which is able to reproduce the problem. So if you could provide us with this, that would allow us to look into it. Cheers, Till On Wed, Apr 15, 2020 at 6:59 PM Kaan Sancak wrote: > Thanks that is working now! >