Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-10 Thread Timo Walther
I also think that a real-time channel is long overdue. The Flink community in China has shown that such a platform can be useful for improving the collaboration within the community. The DingTalk channel of 10k+ users collectively helping each other is great to see. It could also reduce the

[REMINDER] Final Call for Presentations for Flink Forward San Francisco 2022

2022-05-06 Thread Timo Walther
Hi everyone, I would like to send out a final reminder. We have already received some great submissions for Flink Forward San Francisco 2022. Nevertheless, we decided to extend the deadline by another week to give people a second chance to work on their abstracts and presentation ideas. This

Re: How to return JSON Object from UDF

2022-05-06 Thread Timo Walther
'net.minidev.json.JSONObject' is neither publicly accessible nor does it have a corresponding getter method. Thanks and Regards, Surendra Lalwani On Fri, May 6, 2022 at 2:43 PM Timo Walther wrote: Hi Surendra, in general we would like to encourage users to use the SQL type system

Re: How to return JSON Object from UDF

2022-05-06 Thread Timo Walther
Hi Surendra, in general we would like to encourage users to use the SQL type system classes instead of RAW types. Otherwise they are simply black boxes in the SQL engine. A STRING or ROW type might be more appropriate. You can use @DataTypeHint(value = "RAW")  // defaults to Object.class
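A minimal sketch of the recommended direction, returning a SQL STRING instead of a RAW JSONObject (the function name and JSON handling are illustrative):

    import org.apache.flink.table.functions.ScalarFunction;

    // Exposes the JSON payload as a SQL STRING so the engine can reason about it.
    // A RAW type (via @DataTypeHint("RAW")) would remain an opaque black box instead.
    public class JsonStringFunction extends ScalarFunction {
        public String eval(String key, String value) {
            return "{\"" + key + "\": \"" + value + "\"}";
        }
    }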

[ANNOUNCE] Call for Presentations is open for Flink Forward San Francisco 2022 in-person!

2022-03-22 Thread Timo Walther
pm PDT! See you there! Timo Walther Program Committee Chair PS: Regarding Covid-19 regulations, we are following the CDC guidelines closely. As we get closer to the event, we will update our policy accordingly.

Re: Resolving a CatalogTable

2022-01-28 Thread Timo Walther
Hi Balazs, you are right, the new APIs only allow the serialization of resolved instances. This ensures that only validated, correct instances are put into the persistent storage such as a database. The framework will always provide resolved instances and call the corresponding methods with

Re: Uploading jar on multiple flink job managers

2021-12-31 Thread Timo Walther
Hi Puneet, are we talking about the `web.upload.dir` [1]? Maybe others have a better solution for your problem, but have you thought about configuring an NFS or some other distributed file system as the JAR directory? In this case it should be available to all JobManagers. Regards, Timo

Re: TypeInformation | Flink

2021-12-31 Thread Timo Walther
Hi Siddhesh, how to use a ProcessFunction is documented here: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/process_function/ .process() is similar to .map() but with more Flink specific methods available. Anyway, a simple map() should also do the job. But

Re: flink-table-api-scala-bridge missing source files

2021-12-31 Thread Timo Walther
Hi Yuval, feel free to open an issue for this. Looks like a bug in our release artifacts. We should definitely investigate how to solve this as the ScalaDocs are crucial for the development experience. Regards, Timo On 27.12.21 03:22, Zhipeng Zhang wrote: Hi Yuval, It seems that scala

Re: org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat class missing from Flink 1.14 ?

2021-12-21 Thread Timo Walther
Hi Tuomas, are you sure that all dependencies have been upgraded to Flink 1.14? Connector dependencies that still reference Flink 1.13 might cause issues. JdbcBatchingOutputFormat has been refactored in this PR: https://github.com/apache/flink/pull/16528 I hope this helps. Regards, Timo

Re: Svar: WindowOperator TestHarness

2021-12-17 Thread Timo Walther
ow1.getTransformation val operator = transform.getOperator However the **.getTransformation** method seems to not be exposed for the windowed and aggregated DataStream. We're using Flink 1.13.2 so far, could it be due to public test API exposition? Kind regards, Pierre and Lars

Re: Passing arbitrary Hadoop s3a properties from FileSystem SQL Connector options

2021-12-13 Thread Timo Walther
Hi Timothy, unfortunately, this is not supported yet. However, the effort will be tracked under the following ticket: https://issues.apache.org/jira/browse/FLINK-19589 I will loop in Arvid (in CC) who might help you in contributing the missing functionality. Regards, Timo On 10.12.21

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
It turned out this was a bug and will be fixed in the next (non-log4j) patch version: https://issues.apache.org/jira/browse/FLINK-23704 Regards, Timo On 13.12.21 14:11, Timo Walther wrote: Hi Morgan, I was assuming that it is caused by some invalid metrics configuration. But I wasn't aware

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
, with 1.14 and the new KafkaSource/KafkaSink these metrics are just not being generated. Can we confirm that it has been implemented? Regards, Morgan.

Re: Latency monitoring in Flink 1.14.0

2021-12-13 Thread Timo Walther
Hi Morgan, did you see this: https://stackguides.com/questions/68917956/read-flink-latency-tracking-metric-in-datadog Also `metrics.latency.granularity` must be set in the Flink configuration. Not sure if `-D` forwards this properly. Timo On 10.12.21 18:31, Geldenhuys, Morgan Karl
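For reference, the options mentioned above are set in flink-conf.yaml (values illustrative):

    metrics.latency.interval: 1000         # emit latency markers every 1000 ms
    metrics.latency.granularity: operator  # one of: single, operator, subtask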

Re: CVE-2021-44228 - Log4j2 vulnerability

2021-12-13 Thread Timo Walther
While we are working to upgrade the affected dependencies of all components, we recommend users follow the advisory of the Apache Log4j Community. Also, the Ververica Platform can be patched with a similar approach: To configure the JVMs used by Ververica Platform, you can pass custom Java options

Re: WindowOperator TestHarness

2021-12-12 Thread Timo Walther
Hi Lars, you can take a look at how org.apache.flink.streaming.api.datastream.WindowedStream#WindowedStream constructs the graph under the hood. In particular, it uses org.apache.flink.streaming.runtime.operators.windowing.WindowOperatorBuilder which constructs the InternalWindowFunction you

Re: CoGroupedStreams and disableAutoGeneratedUIDs

2021-12-12 Thread Timo Walther
Hi Dan, if there is no way of setting a uid(), then it sounds like a bug in the API that should be fixed. Feel free to open an issue for it. Regards, Timo On 13.12.21 08:19, Schwalbe Matthias wrote: Hi Dan, When I run into such a problem I consider using the not so @public api levels: *

Re: Regarding the size of Flink cluster

2021-12-10 Thread Timo Walther
Hi Jessy, let me try to answer some of your questions. > 16 Task Managers with 1 task slot and 1 CPU each Every additional task manager also involves management overhead. So I would suggest option 1. But in the end you need to perform some benchmarks yourself. I could also imagine that a

Re: Table DataStream Conversion Lost Watermark

2021-12-06 Thread Timo Walther
ndows get() { return INSTANCE; } @Override public Collection<TimeWindow> assignWindows( Object element, long timestamp, WindowAssignerContext context) { return Collections.singletonList(new TimeWindow(Long.MIN_VALUE, Long.MAX_VALUE)); } @Override public Trigger<Object, TimeWindow> getDefaultTrigger(

Re: How to express the datatype of sparksql collect_list(named_struct(...))inflinksql?

2021-11-10 Thread Timo Walther
There are multiple ways of having a more generic UDF. I will use pseudo code here: // supports any input def eval(@DataTypeHint(inputGroup = ANY) Object o): String = { } // or you use no annotations at all and simply define a strategy // default input strategy is wildcard def eval(Map[Row,
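A Java version of this pseudo code could look as follows (a sketch using the type inference annotations):

    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.annotation.InputGroup;
    import org.apache.flink.table.functions.ScalarFunction;

    public class AnyToStringFunction extends ScalarFunction {
        // InputGroup.ANY lets the function accept any input type as an opaque Object
        public String eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
            return String.valueOf(o);
        }
    }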

Re: to join or not to join, that is the question...

2021-11-05 Thread Timo Walther
Union can be an option if you want to unify the streams first and then apply a key by on the common stream. Otherwise connect() is the way to go. See an example for joining here: https://github.com/twalthr/flink-api-examples/blob/main/src/main/java/com/ververica/Example_06_DataStream_Join.java

Re: Table DataStream Conversion Lost Watermark

2021-11-05 Thread Timo Walther
Hi Yunfeng, by default the fromDataStream does not propagate watermarks into Table API, because Table API needs a time attribute in the schema that corresponds to the watermarking. A time attribute will also be put back into the stream record during toDataStream. Please take a look at:
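A sketch of declaring the time attribute during conversion, assuming the Flink 1.13 fromDataStream variant that takes a Schema (tableEnv and stream are assumed to exist):

    import org.apache.flink.table.api.Schema;
    import org.apache.flink.table.api.Table;

    // expose the stream's timestamps/watermarks as a rowtime attribute
    Table table = tableEnv.fromDataStream(
        stream,
        Schema.newBuilder()
            .columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
            .watermark("rowtime", "SOURCE_WATERMARK()")
            .build());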

Re: Kryo Serialization issues in Flink Jobs.

2021-11-02 Thread Timo Walther
Hi Prasanna, it could be a bug where the ExecutionConfig is not forwarded properly to all locations where the KryoSerializer is used. As a first step for debugging, I would recommend to create a custom TypeInformation (most methods are not relevant except for createTypeSerializer and

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-22 Thread Timo Walther
in near future. :) Best, Peter On Wed, Oct 20, 2021 at 5:55 PM Timo Walther wrote: Hi Peter, as a temporary workaround I would simply implement a UDF like: public class EverythingToString extends ScalarFunction {     public String eval(@Dat

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Timo Walther
, which are not handy to perform SQL statements on. It is already discussed here: https://www.mail-archive.com/user@flink.apache.org/msg9.html Best, Peter On Wed, Oct 20, 2021 at 5:21 PM Timo Walther

Re: Huge backpressure when using AggregateFunction with Session Window

2021-10-20 Thread Timo Walther
Hi Ori, this sounds indeed strange. Can you also reproduce this behavior locally with a faker source? We should definitely add a profiler and see where the bottleneck lies. Which Flink version and state backend are you using? Regards, Timo On 20.10.21 16:17, Ori Popowski wrote: I have a

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Timo Walther
A current workaround is to use DataStream API to read the data and provide your custom Avro schema to configure the format. Then switch to Table API. StreamTableEnvironment.fromDataStream(...) accepts all data types. Enum classes will be represented as RAW types but you can forward them as

Re: Usecase for flink

2021-09-10 Thread Timo Walther
If your graphs fit in memory (at least after an initial partitioning), you could use any external library for graph processing within a single node in a Flink ProcessFunction. Flink is a general data processor that allows to have arbitrary logic where user code is allowed. Regards, Timo On

Re: Streaming Patterns and Best Practices - featuring Apache Flink

2021-09-10 Thread Timo Walther
Thanks for sharing this with us Devin. If you haven't considered it already, maybe this could also be something for next Flink Forward? Regards, Timo On 02.09.21 21:02, Devin Bost wrote: I just released a new video that features Apache Flink in several design patterns: Streaming Patterns

Re: Job crashing with RowSerializer EOF exception

2021-09-10 Thread Timo Walther
I assume you are still using toAppendStream or toRetractStream? Otherwise I'm wondering where the RowSerializer is actually coming from. The new planner doesn't use a row serializer. Debugging serializer issues is difficult. We need more information about the pipeline. Regards, Timo On

Re: Questions regarding broadcast join in Flink

2021-09-10 Thread Timo Walther
Hi Gerald, actually, this is a typical issue when performing a streaming join. An ideal solution would be to block the main stream until the broadcast stream is ready. However, this is currently not supported in the API. In any case, a user needs to handle this in a use case specific way to

Re: Usecase for flink

2021-09-10 Thread Timo Walther
Hi Dipanjan, Gelly is built on top of the DataSet API which is a batch-only API that is slowly phasing out. It is not possible to connect a DataStream API program with a DataSet API program unless you go through a connector such as CSV in between. Regards, Timo On 10.09.21 09:09,

Re: Issue while creating Hive table from Kafka topic

2021-09-10 Thread Timo Walther
It seems that your Kafka clients dependency is not in your JAR file. ByteArrayDeserializer is a symptom that seems to occur often. At least, I can find a similar question on Stackoverflow:

Re: Required built-in function [plus] could not be found in any catalog.

2021-09-08 Thread Timo Walther
Hi, did you try to use a different order? Core module first and then Hive module? The compatibility layer should work sufficiently for regular Hive UDFs that don't aggregate data. Hive aggregation functions should work well in batch scenarios. However, for streaming pipelines the aggregate

Re: Kafka connector depending on Table API

2021-08-31 Thread Timo Walther
Hi Maciek, thanks for testing the RC! You are absolutely right. This is a bug. I will create an issue for it. Thanks again, Timo On 31.08.21 16:33, Maciek Próchniak wrote: Hello, we are testing 1.14 RC0 and we discovered that we need to include table-api as dependency when using kafka

Re: Flink sql CLI parsing avro format data error (failed to parse Avro data)

2021-08-30 Thread Timo Walther
"type":["null","string"],"default":null},{"name":"ccc","type":["null","string"],"default":null},{"name":"ddd","type":"string"}]} At 2021-08-30 15:03:49,

Re: Flink sql CLI parsing avro format data error (failed to parse Avro data)

2021-08-30 Thread Timo Walther
Hi, could it be that there is some corrupt record in your Kafka topic? Maybe you can read from a different offset to verify that. In general, I cannot spot an obvious mistake in your schema. Regards, Timo On 28.08.21 14:32, Wayne wrote: I have an Apache Avro schema; my Avro schema is as follows: |{

Re: Issue with Flink SQL using RocksDB backend

2021-07-28 Thread Timo Walther
Hi Yuval, having a locally reproducible result would be great, as well as more information about the data types used, because this could be a serializer issue that messes up the binary format. Regards, Timo On 27.07.21 07:37, Yuval Itzchakov wrote: Hi Jing, Yes, FIRST is a UDAF. I've been

Re: foreach exec sql

2021-07-28 Thread Timo Walther
Btw you are executing a lot of Flink jobs in parallel with this because the submission is async. Maybe the concept of a StatementSet via TableEnvironment.createStatementSet() helps. Regards, Timo On 27.07.21 10:56, Caizhi Weng wrote: Hi! Try this: sql.zipWithIndex.foreach { case (sql, idx)
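A sketch of how a StatementSet bundles all INSERTs into a single job (variable names illustrative; tableEnv is assumed to exist):

    import org.apache.flink.table.api.StatementSet;

    StatementSet statementSet = tableEnv.createStatementSet();
    for (String insertSql : sqlStatements) {
        statementSet.addInsertSql(insertSql);  // collect all INSERT INTO statements
    }
    statementSet.execute();  // submit everything together as one Flink job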

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
ckhole it immediately works without errors. I have redacted the names and paths. Thanks, Natu On Thu, Jul 22, 2021 at 2:24 PM Timo Walther wrote: Maybe you can share also which connector/format you are using? What is the DDL? Regards, Timo

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
1.13.1 as the supported flink version. No custom code all through Flink SQL on UI no jars. Thanks, Natu On Thu, Jul 22, 2021 at 2:08 PM Timo Walther wrote: Hi Natu, Ververica Platform 2.5 has updated the bundled Hadoop version but this should n

Re: Issue with Flink jobs after upgrading to Flink 1.13.1/Ververica 2.5 - java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2021-07-22 Thread Timo Walther
Hi Natu, Ververica Platform 2.5 has updated the bundled Hadoop version but this should not result in a NoClassDefFoundError exception. How are you submitting your SQL jobs? You don't use Ververica's SQL service but have built a regular JAR file, right? If this is the case, can you share your

Re: Question about POJO rules - why fields should be public or have public setter/getter?

2021-07-14 Thread Timo Walther
Hi Naehee, the serializer for case classes is generated using the Scala macro that is also responsible for extracting the TypeInformation implicitly from your DataStream API program. It should be possible to use POJO serializer with case classes. But wouldn't it be easier to just use regular

Re: [External] NullPointerException on accumulator after Checkpointing

2021-07-14 Thread Timo Walther
Hi Clemens, first of all, can you try to use the MapView within an accumulator POJO class? This might solve your exception. I'm not sure if we support the views as top-level accumulators. In any case this seems to be a bug. I will open an issue once I get your feedback. We might simply throw
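A sketch of the suggested structure, nesting the MapView inside an accumulator POJO (names illustrative):

    import org.apache.flink.table.api.dataview.MapView;

    // the MapView is a field of the accumulator POJO, not the top-level accumulator
    public static class MyAccumulator {
        public MapView<String, Integer> counts = new MapView<>();
        public long total = 0L;
    }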

Re: Upsert Kafka SQL Connector used as a sink does not generate an upsert stream

2021-07-14 Thread Timo Walther
Hi Carlos, currently, the changelog output might not always be optimal. We are continuously improving this. For the upsert Kafka connector, we have added a reducing buffer to avoid those tombstone messages: https://issues.apache.org/jira/browse/FLINK-21191 Unfortunately, this is only

Re: High DirectByteBuffer Usage

2021-07-14 Thread Timo Walther
Hi Hemant, did you checkout the dedicated page for memory configuration and troubleshooting: https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-direct-buffer-memory

Re: Flink 1.13.1 PartitionNotFoundException

2021-07-14 Thread Timo Walther
Hi Debraj, I could find quite a few older emails that were suggesting to play around with the `taskmanager.network.request-backoff.max` option. This was also recommended in the link that you shared. Have you tried it? Here is some background:

Re: Process finite stream and notify upon completion

2021-07-14 Thread Timo Walther
Hi Tamir, a nice property of watermarks is that they are kind of synchronized across input operators and their partitions (i.e. parallel instances). Bounded sources will emit a final MAX_WATERMARK once they have processed all data. When you receive a MAX_WATERMARK in your current operator,
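One way to react to that final watermark is an event-time timer at Long.MAX_VALUE (a sketch; the key and element types are illustrative):

    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class EndOfInputNotifier extends KeyedProcessFunction<String, String, String> {

        @Override
        public void processElement(String value, Context ctx, Collector<String> out) {
            // this timer only fires once the final MAX_WATERMARK arrives
            ctx.timerService().registerEventTimeTimer(Long.MAX_VALUE);
            out.collect(value);
        }

        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
            out.collect("bounded input for key " + ctx.getCurrentKey() + " is complete");
        }
    }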

Re: Handling Large Broadcast States

2021-06-18 Thread Timo Walther
Hi Rion, as far as I know we also don't support broadcast streaming joins in Table API/SQL. Are you sure that you need a broadcast pattern? Or would a regular hash join using connect() with a CoProcessFunction also work for you? Maybe with an artificial key to spread the load more evenly?

Re: Please advise bootstrapping large state

2021-06-18 Thread Timo Walther
bleEnvImpl.translate(BatchTableEnvImpl.scala:555) at org.apache.flink.table.api.internal.BatchTableEnvImpl.translate(BatchTableEnvImpl.scala:537) at org.apache.flink.table.api.bridge.java.internal.BatchTableEnvironmentImpl.toDataSet(BatchTableEnvironmentImpl.scala:101) On Thu, Jun 17, 2021 at 12:51 AM Timo

Re: Please advise bootstrapping large state

2021-06-17 Thread Timo Walther
Hi Marco, which operations do you want to execute in the bootstrap pipeline? Maybe you don't need to use SQL and the old planner. At least this would reduce the friction of going through another API layer. The JDBC connector can be used directly in the DataSet API as well. Regards, Timo On

Re: Alternatives to TypeConversions fromDataTypeToLegacyInfo / fromLegacyInfoToDataType

2021-06-04 Thread Timo Walther
eling says yes unless some form of backwards compatibility is going to be written specifically for the use case. On Fri, Jun 4, 2021, 16:33 Timo Walther wrote: Hi Yuval, TypeConversions.fromDataTypeToLegacyInfo was only a utility to bridge betw

Re: Error with extracted type from custom partitioner key

2021-06-04 Thread Timo Walther
Hi Ken, non-POJOs are serialized with Kryo. This might not give you optimal performance. You can register a custom Kryo serializer in ExecutionConfig to speed up the serialization. Alternatively, you can implement `ResultTypeQueryable` to provide a custom type information with a custom
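A sketch of the first suggestion, registering a custom Kryo serializer (MyKey and MyKeySerializer are hypothetical; the serializer would extend com.esotericsoftware.kryo.Serializer<MyKey>):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // avoid the generic Kryo fallback by registering a dedicated serializer for the key type
    env.getConfig().registerTypeWithKryoSerializer(MyKey.class, MyKeySerializer.class);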

Re: Alternatives to TypeConversions fromDataTypeToLegacyInfo / fromLegacyInfoToDataType

2021-06-04 Thread Timo Walther
order to do that I need the TypeInformation[Row] produced in order to pass into the various state functions. @Timo Walther I would love your help on this. -- Best Regards, Yuval Itzchakov.

Re: When to prefer toDataStream over toAppendStream or toRetractStream?

2021-05-25 Thread Timo Walther
Hi Yik San, `toDataStream` and `toChangelogStream` are the new APIs for a smooth integration of Table API and DataStream API. You can find the full documentation here: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/data_stream_api/ Since `toDataStream` and
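In short (a sketch following the linked documentation; tableEnv and table are assumed to exist):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    // insert-only result: every emitted Row is an insertion
    DataStream<Row> appendStream = tableEnv.toDataStream(table);

    // updating result: each Row carries a RowKind (insert, update, delete)
    DataStream<Row> changelog = tableEnv.toChangelogStream(table);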

Re: Writing ARRAY type through JDBC:PostgreSQL

2021-05-21 Thread Timo Walther
Hi Federico, if ARRAY doesn't work, this is definitely a bug. Either in the documentation or in the implementation. I will loop in Jingsong Li who can help. In any case, feel free to open a JIRA ticket already. Regards, Timo On 30.04.21 14:44, fgahan wrote: Hi Timo, I'm attaching the

Re: Timestamp type mismatch between Flink, Iceberg, and Avro

2021-05-21 Thread Timo Walther
Hi Xingcan, we had a couple of discussions around the timestamp topic in Flink and have a clear picture nowadays. Some background: https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit# So whenever an instant or epoch time is required, TIMESTAMP_LTZ is the way

Re: Kafka dynamic topic for Sink in SQL

2021-05-21 Thread Timo Walther
Hi Ben, if I remember correctly, this topic came up a couple of times. But we haven't implemented it yet; however, the existing implementation can easily be adapted for that. The "target topic" would be an additional persisted metadata column in SQL terms. All you need to do is to adapt

Re: Useful methods getting deprecated

2021-05-17 Thread Timo Walther
Hi, I agree that both `connect` and `registerTableSource` are useful for generating Table API pipelines. It is likely that both API methods will get a replacement in the near future. Let me explain the current status briefly: connect(): The CREATE TABLE DDL evolved faster than connect().

Re: Flink SQL on Yarn For Help

2021-05-17 Thread Timo Walther
You can check if there is a configuration option listed here: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/ If it is, you can add it to conf/flink-conf.yaml. Maybe others have other pointers. Otherwise you will need to use Table API instead of SQL

Re: Convert DataStream to Table with the same columns in Row

2021-05-14 Thread Timo Walther
Hi John, please check the type that is coming in from the DataStream API via dataStream.getType(). It should be an instance of RowTypeInfo otherwise the Table API cannot extract the columns correctly. Usually, you can overwrite the type of the last DataStream operation using the
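The truncated sentence presumably refers to `returns()`; a sketch (field names illustrative, dataStream assumed to be a DataStream<Row>):

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    // force a RowTypeInfo with named fields so the Table API can derive columns
    DataStream<Row> rows = dataStream
        .map(row -> row)
        .returns(Types.ROW_NAMED(new String[] {"id", "name"}, Types.LONG, Types.STRING));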

Re: Flink SQL on Yarn For Help

2021-05-14 Thread Timo Walther
Hi Yunhui, officially we don't support YARN in the SQL Client yet. This is mostly because it is not tested. However, it could work due to the fact that we are using regular Flink submission APIs under the hood. Are you submitting to a job or session cluster? Maybe you can also share the

Re: Data type serialization and testing

2021-04-30 Thread Timo Walther
Hi Dave, maybe it would be better to execute your tests against a local cluster instead of the mini cluster. Also, object reuse and chaining should be disabled to force serialization. Maybe others have better ideas. Regards, Timo On 30.04.21 10:25, Dave Maughan wrote:

Re: TypeSerializer Example

2021-04-30 Thread Timo Walther
, Thanks! I will take a look at the links. Can you please share if you have any simple (or complex) example of Avro state data structures? Thanks, Sandeep On 30-Apr-2021, at 4:46 PM, Timo Walther wrote: Hi Sandeep, did you have a chance to look at this documentation page? https

Re: Writing ARRAY type through JDBC:PostgreSQL

2021-04-30 Thread Timo Walther
Hi Federico, could you also share the full stack trace with us? According to the docs, the ARRAY type should be supported: https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/jdbc.html#data-type-mapping Can you also try to use `cities ARRAY` in your CREATE TABLE,

Re: TypeSerializer Example

2021-04-30 Thread Timo Walther
Hi Sandeep, did you have a chance to look at this documentation page? https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/custom_serialization.html The interfaces might not be easy to implement but are very powerful to address compatibility issues. You can also look into

Re: Guaranteeing event order in a KeyedProcessFunction

2021-04-30 Thread Timo Walther
Hi Miguel, your initial idea doesn't sound too bad, but why do you want to key by timestamp? Usually, you can simply key your stream by a custom key and store the events in a ListState until a watermark comes in. But if you really want to have some kind of global event-time order, you have two
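A sketch of that buffering pattern (the Event type and its public long timestamp field are hypothetical):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class OrderedEmitter extends KeyedProcessFunction<String, Event, Event> {

        private ListState<Event> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffer", Event.class));
        }

        @Override
        public void processElement(Event e, Context ctx, Collector<Event> out) throws Exception {
            buffer.add(e);  // hold events back until the watermark passes them
            ctx.timerService().registerEventTimeTimer(e.timestamp);
        }

        @Override
        public void onTimer(long ts, OnTimerContext ctx, Collector<Event> out) throws Exception {
            List<Event> ready = new ArrayList<>();
            List<Event> pending = new ArrayList<>();
            for (Event e : buffer.get()) {
                (e.timestamp <= ts ? ready : pending).add(e);
            }
            // emit everything the watermark has passed, in event-time order
            ready.sort(Comparator.comparingLong((Event e) -> e.timestamp));
            ready.forEach(out::collect);
            buffer.update(pending);
        }
    }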

Re: How to create a java unit test for Flink SQL with nested field and watermark?

2021-04-30 Thread Timo Walther
Hi, there are multiple ways to create a table for testing: - use the datagen connector - use the filesystem connector with CSV data - and beginning from Flink 1.13 your code snippets become much simpler Regards, Timo On 29.04.21 20:35, Svend wrote: I found an answer to my own question! For
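A sketch of the datagen option (schema illustrative; tableEnv assumed to exist):

    // a small bounded table for tests; no external system required
    tableEnv.executeSql(
        "CREATE TABLE test_data (" +
        "  id BIGINT," +
        "  name STRING" +
        ") WITH (" +
        "  'connector' = 'datagen'," +
        "  'number-of-rows' = '10'" +
        ")");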

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Timo Walther
Hi Dylan, streamEnv.setRuntimeMode(RuntimeExecutionMode.BATCH); is currently not supported by the Table & SQL API. For now, val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build() determines the mode. Thus, I would remove the line again. If you want to
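A Java equivalent of that setup (a sketch; streamEnv is the existing StreamExecutionEnvironment):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()  // or inBatchMode() to run the pipeline in batch mode
        .build();
    StreamTableEnvironment tableEnv = StreamTableEnvironment.create(streamEnv, settings);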

Re: SingleValueAggFunction received more than one element error with LISTAGG

2021-04-08 Thread Timo Walther
Hi, which Flink version are you using? Could you also share the resulting plan with us using `TableEnvironment.explainSql()`? Thanks, Timo On 07.04.21 17:29, soumoks123 wrote: I receive the following error when trying to use the LISTAGG function in Table API. java.lang.RuntimeException:

Re: Compression with rocksdb backed state

2021-04-08 Thread Timo Walther
Hi Deepthi, 1. Correct 2. Correct 3. Incremental snapshots simply manage references to RocksDB's sstables. You can find a full explanation here [1]. Thus, the payload is a black box for Flink and Flink's compression flag has no impact. So we fully rely on what RocksDB offers. 4. Correct I hope

Re: Flink 1.12.2 sql api use parquet format error

2021-04-08 Thread Timo Walther
Hi, can you check the content of the JAR file that you are submitting? There should be a `META-INF/services` directory with a `org.apache.flink.table.factories.Factory` file that should list the Parquet format. See also here:
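The service file would look roughly like this (the exact factory class name depends on the Flink version):

    # META-INF/services/org.apache.flink.table.factories.Factory
    org.apache.flink.formats.parquet.ParquetFileFormatFactory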

Re: OrcTableSource in flink 1.12

2021-03-23 Thread Timo Walther
would like to read from orc files, run a query and transform the result. I do not necessarily need it to be with the DataSet API. Regards, Nikola On Mon, Mar 22, 2021 at 6:49 PM Timo Walther wrote: Hi Nikola, the OrcTableSource has not been upd

Re: OrcTableSource in flink 1.12

2021-03-22 Thread Timo Walther
Hi Nikola, the OrcTableSource has not been updated to be used in a SQL DDL. You can define your own table factory [1] that translates properties into an object to create instances or use `org.apache.flink.table.api.TableEnvironment#fromTableSource`. I recommend the latter option. Please

Re: [Flink SQL] Leniency of JSON parsing

2021-03-16 Thread Timo Walther
Hi Roman! Seems like that option is no longer available. Best Regards, Sebastian

Re: Time Temporal Join

2021-03-16 Thread Timo Walther
Hi Satyam, first of all your initial join query can also work, you just need to make sure that no time attribute is in the SELECT clause. As the exception indicates, you need to cast all time attributes to TIMESTAMP. The reason for this is some major design issue that is also explained here

Re: Find many strange measurements in metrics database of influxdb

2021-03-16 Thread Timo Walther
Hi Tim, "from table1" might be the operator that reads "table1" also known as the table scan operator. Could you share more of the metrics and their values? Most of them should be explained in https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/metrics/#system-metrics Regards,

Re: Editing job graph at runtime

2021-03-16 Thread Timo Walther
Hi Jessy, to be precise, the JobGraph is not used at runtime. It is translated into an ExecutionGraph. But nevertheless such patterns are possible but require a bit of manual implementation. Option 1) You stop the job with a savepoint and restart the application with slightly different

Re: Handle late message with flink SQL

2021-03-16 Thread Timo Walther
Hi, your explanation makes sense but I'm wondering how the implementation would look like. This would mean bigger changes in a Flink fork, right? Late data handling in SQL is a frequently asked question. Currently, we don't have a good way of supporting it. Usually, we recommend to use

Re: LocalWatermarkAssigner causes predicate pushdown to be skipped

2021-03-05 Thread Timo Walther
Hi Yuval, sorry that nobody replied earlier. Somehow your email fell through the cracks. If I understand you correctly, you would like to implement a table source that implements both `SupportsWatermarkPushDown` and `SupportsFilterPushDown`? The current behavior might be on purpose.

Re: Convert BIGINT to TIMESTAMP in pyflink when using datastream api

2021-03-05 Thread Timo Walther
Hi Shilpa, Shuiqiang is right. Currently, we recommend to use SQL DDL until the connect API is updated. See here: https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/create/#create-table Especially the WATERMARK section shows how to declare a rowtime attribute.
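A sketch of such a DDL, including a BIGINT-to-timestamp conversion (requires Flink 1.13+ for TO_TIMESTAMP_LTZ; connector and names illustrative):

    tableEnv.executeSql(
        "CREATE TABLE events (" +
        "  user_id STRING," +
        "  epoch_millis BIGINT," +
        "  ts AS TO_TIMESTAMP_LTZ(epoch_millis, 3)," +      // computed rowtime column
        "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +  // declares ts as event time
        ") WITH (" +
        "  'connector' = 'datagen'" +
        ")");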

Re: Need information on latency metrics

2021-03-05 Thread Timo Walther
Hi Suchithra, did you see this section in the docs? https://ci.apache.org/projects/flink/flink-docs-stable/ops/metrics.html#latency-tracking Regards, Timo On 05.03.21 15:31, V N, Suchithra (Nokia - IN/Bangalore) wrote: Hi, I am using flink 1.12.1 version and trying to explore latency

Re: How to emit after a merge?

2021-03-05 Thread Timo Walther
wrote: Hi Timo, If I understand correctly, the UDF only simplifies the query but does not do anything functionally different. Please correct me if I am wrong, thank you! Best, Yik San On Thu, Mar 4, 2021 at 8:34 PM Timo Walther wrote: Yes, implement

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Timo Walther
s better for us to be more focused on a single planner. Your proposed roadmap looks good to me, +1 from my side and thanks again for all your efforts! Best, Kurt On Thu, Feb 25, 2021 at 5:01 PM Timo Walther wrote: Hi everyone, since Flink 1.9 we have supported two SQL planners. Most of the origi

Re: How to emit after a merge?

2021-03-04 Thread Timo Walther
e case? Thank you. On Thu, Mar 4, 2021 at 4:41 PM Timo Walther wrote: Hi Yik, if I understand you correctly you would like to avoid the deletions in your stream? You could filter the deletions manually in DataStream API before writing

Re: How to emit after a merge?

2021-03-04 Thread Timo Walther
Hi Yik, if I understand you correctly you would like to avoid the deletions in your stream? You could filter the deletions manually in DataStream API before writing them to Kafka. Semantically the deletions are required to produce a correct result because the runtime is not aware of a key

Re: [Flink-SQL] FLOORing OR CEILing a DATE or TIMESTAMP to WEEK uses Thursdays as week start

2021-03-02 Thread Timo Walther
Hi Sebastián, it might be the case that some time functions are not correct due to the underlying refactoring of data structures. I will loop in Leonard in CC that currently works on improving this situation as part of FLIP-162 [1]. @Leonard: Is this wrong behavior on your list? Regards,

Re: How to pass PROCTIME through an aggregate

2021-02-26 Thread Timo Walther
Hi Rex, as far as I know, we recently allowed PROCTIME() also at arbitrary locations in the query. So you don't have to pass it through the aggregate but you can call it afterwards again. Does that work in your use case? Something like: SELECT i, COUNT(*) FROM customers GROUP BY i,
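A sketch of that idea, calling PROCTIME() on top of the aggregation (completing the truncated query would be guesswork, so this is only illustrative; tableEnv assumed to exist):

    import org.apache.flink.table.api.Table;

    // proctime is requested after the aggregate instead of being carried through it
    Table result = tableEnv.sqlQuery(
        "SELECT i, cnt, PROCTIME() AS proc_time " +
        "FROM (SELECT i, COUNT(*) AS cnt FROM customers GROUP BY i)");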

Re: BackPressure in RowTime Task of FlinkSql Job

2021-02-26 Thread Timo Walther
Hi Aeden, the rowtime task is actually just a simple map function that extracts the event-time timestamp into a field of the row for the next operator. It should not be the problem. Can you share a screenshot of your pipeline? What is your watermarking strategy? Is it possible that you are

Re: Is it possible to specify max process memory in flink 1.8.2, similar to what is possible in flink 1.11

2021-02-26 Thread Timo Walther
Hi Barisa, by looking at the 1.8 documentation [1] it was possible to configure the off heap memory as well. Also other memory options were already present. So I don't think that you need an upgrade to 1.11 immediately. Please let us know if you could fix your problem, otherwise we can try to

Re: Best way to implemented non-windowed join

2021-02-26 Thread Timo Walther
Hi Yaroslav, I think your approach is correct. Union is perfect to implement multiway joins if you normalize the type of all streams before. It can simply be a composite type with the key and a member variable for each stream where only one of those variables is not null. A keyed process

Re: Handling Data Separation / Watermarking from Kafka in Flink

2021-02-26 Thread Timo Walther
Hi Rion, I think what David was referring to is that you do the entire time handling yourself in the process function. That means not using the `context.timerService()` or `onTimer()` that Flink provides but calling your own logic based on the timestamps that enter your process function and the

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-26 Thread Timo Walther
Until we have more information, maybe this is also helpful: https://ci.apache.org/projects/flink/flink-docs-stable/ops/debugging/debugging_classloading.html#inverted-class-loading-and-classloader-resolution-order On 26.02.21 09:20, Timo Walther wrote: If this problem affects multiple people

Re: org.codehaus.janino.CompilerFactory cannot be cast ....

2021-02-26 Thread Timo Walther
If this problem affects multiple people, feel free to open an issue that explains how to easily reproduce the problem. This helps us or contributors to provide a fix. Regards, Timo On 26.02.21 05:08, sofya wrote: What was the actual solution? Did you have to modify pom? -- Sent from:

Re: Compile time checking of SQL

2021-02-22 Thread Timo Walther
dated when I do `mvn compile` or any target that runs that so that basic syntax checking is performed without having to submit the job to the cluster. On Thu, 18 Feb 2021 at 16:17, Timo Walther wrote: Hi Sebastián, what do you consider as compil

Re: How is proctime represented?

2021-02-19 Thread Timo Walther
Chesnay is right. The PROCTIME() is lazily evaluated and executed when its result is needed as an argument for another expression or function. So within the pipeline the column is NULL, but when you want to compute something, e.g. CAST(proctime AS TIMESTAMP(3)), it will be materialized into the

Re: [Flink SQL] FLOOR(timestamp TO WEEK) not working

2021-02-18 Thread Timo Walther
Hi Sebastián, which Flink version are you using? And which precision do the timestamps have? This looks clearly like a bug to me. We should open an issue in JIRA. Regards, Timo On 18.02.21 16:17, Sebastián Magrí wrote: While using said function in a query I'm getting a query compilation

Re: Compile time checking of SQL

2021-02-18 Thread Timo Walther
Hi Sebastián, what do you consider as compile time? If you mean some kind of SQL editor, you could take a look at Ververica platform (the community edition is free): https://www.ververica.com/blog/data-pipelines-with-flink-sql-on-ververica-platform Otherwise Flink SQL is always validated at
