[jira] [Created] (FLINK-14849) Can not submit job when use hive connector
Jingsong Lee created FLINK-14849: Summary: Can not submit job when use hive connector Key: FLINK-14849 URL: https://issues.apache.org/jira/browse/FLINK-14849 Project: Flink Issue Type: Bug Components: Connectors / Hive Reporter: Jingsong Lee
{code:java}
With:
org.apache.hive
hive-exec
3.1.1

Caused by: java.lang.ClassCastException: org.codehaus.janino.CompilerFactory cannot be cast to org.codehaus.commons.compiler.ICompilerFactory
	at org.codehaus.commons.compiler.CompilerFactoryFactory.getCompilerFactory(CompilerFactoryFactory.java:129)
	at org.codehaus.commons.compiler.CompilerFactoryFactory.getDefaultCompilerFactory(CompilerFactoryFactory.java:79)
	at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile(JaninoRelMetadataProvider.java:432)
	... 68 more
{code}
After https://issues.apache.org/jira/browse/FLINK-13749 , flink-client uses the default child-first resolve order. If the user jar contains conflicting dependencies, errors like the one above can occur. -- This message was sent by Atlassian Jira (v8.3.4#803005)
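A common workaround for this kind of conflict is to fall back to parent-first class loading on the client, so that the janino/commons-compiler classes shipped with Flink win over the copies bundled in hive-exec. This is a sketch of the relevant flink-conf.yaml setting, not necessarily the final fix for this ticket:

```yaml
# flink-conf.yaml: revert the child-first default introduced by FLINK-13749
classloader.resolve-order: parent-first
```

Relocating (shading) janino inside the user jar would avoid the conflict as well.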
[jira] [Created] (FLINK-14848) BaseRowSerializer.toBinaryRow wrongly process null for objects with variable-length part
Zhenghua Gao created FLINK-14848: Summary: BaseRowSerializer.toBinaryRow wrongly process null for objects with variable-length part Key: FLINK-14848 URL: https://issues.apache.org/jira/browse/FLINK-14848 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.10.0 Reporter: Zhenghua Gao For fixed-length objects, the writer calls setNullAt() to update the fixed-length part (which sets the null bit and initializes the fixed-length slot with 0). For variable-length objects, calling setNullAt() only updates the fixed-length part; the writer also needs to assign and initialize the variable-length part.
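The difference can be sketched with a simplified binary layout: a null bitmap followed by one 8-byte slot per field, where variable-length fields store an (offset, length) pointer in their slot. The layout and names below are hypothetical (this is NOT Flink's actual BinaryRowWriter), but they illustrate why setting only the null bit is not enough for variable-length fields:

```java
import java.nio.ByteBuffer;

// Sketch of a binary row's fixed-length part: [8-byte null bitmap][8-byte slot per field].
// Fixed-length values live directly in the slot; variable-length values store
// (offset << 32 | length) pointing into a separate variable-length region.
// If setNullAt() only set the null bit, a reused row could keep a stale
// (offset, length) pointer, so equal rows would not be binary-equal.
public class NullSlotSketch {
    final byte[] fixedPart;

    NullSlotSketch(int fieldCount) {
        this.fixedPart = new byte[8 + 8 * fieldCount];
    }

    void writeVarLenPointer(int pos, int offset, int length) {
        ByteBuffer.wrap(fixedPart).putLong(8 + 8 * pos, ((long) offset << 32) | length);
    }

    void setNullAt(int pos) {
        fixedPart[pos >>> 3] |= (byte) (1 << (pos & 7));      // set the null bit
        ByteBuffer.wrap(fixedPart).putLong(8 + 8 * pos, 0L);  // and zero the slot
    }

    boolean isNullAt(int pos) {
        return (fixedPart[pos >>> 3] & (1 << (pos & 7))) != 0;
    }

    long slot(int pos) {
        return ByteBuffer.wrap(fixedPart).getLong(8 + 8 * pos);
    }
}
```

Zeroing the slot in setNullAt() guarantees that a field which previously held a variable-length value and is then set to null leaves no stale pointer behind.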
[jira] [Created] (FLINK-14847) Support retrieving Hive PK constraints
Rui Li created FLINK-14847: -- Summary: Support retrieving Hive PK constraints Key: FLINK-14847 URL: https://issues.apache.org/jira/browse/FLINK-14847 Project: Flink Issue Type: Task Components: Connectors / Hive Reporter: Rui Li
[jira] [Created] (FLINK-14846) Correct the default writerbuffer size documentation of RocksDB
Yun Tang created FLINK-14846: Summary: Correct the default writerbuffer size documentation of RocksDB Key: FLINK-14846 URL: https://issues.apache.org/jira/browse/FLINK-14846 Project: Flink Issue Type: Bug Components: Runtime / State Backends Reporter: Yun Tang Fix For: 1.10.0 When {{RocksDBConfigurableOptions}} was introduced, the default write buffer size was taken from RocksDB's javadoc. Unfortunately, RocksDB's official javadoc stated it incorrectly as {{4MB}} for a long time, until I created a [PR|https://github.com/facebook/rocksdb/pull/5670] to correct it. As a result, [our description|https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#state-backend-rocksdb-writebuffer-size] of the default write-buffer size is also incorrect; we should fix it to avoid misleading users.
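For reference, RocksDB's actual default write buffer size is 64 MB, not the 4 MB the javadoc used to claim. Until the documentation is fixed, users can make the intended size explicit via the option from {{RocksDBConfigurableOptions}} (a flink-conf.yaml sketch):

```yaml
# Set the per-column-family write buffer size explicitly rather than relying
# on the (mis-documented) default; RocksDB's real default is 64 MB.
state.backend.rocksdb.writebuffer.size: 64mb
```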
Re: [DISCUSS] Support configure remote flink jar
There is a related use case (not specific to HDFS) that I came across: It would be nice if the jar upload endpoint could accept the URL of a jar file as alternative to the jar file itself. Such URL could point to an artifactory or distributed file system. Thomas On Mon, Nov 18, 2019 at 7:40 PM Yang Wang wrote: > Hi tison, > > Thanks for your starting this discussion. > * For user customized flink-dist jar, it is an useful feature. Since it > could avoid to upload the flink-dist jar > every time. Especially in production environment, it could accelerate the > submission process. > * For the standard flink-dist jar, FLINK-13938[1] could solve > the problem.Upload a official flink release > binary to distributed storage(hdfs) first, and then all the submission > could benefit from it. Users could > also upload the customized flink-dist jar to accelerate their submission. > > If the flink-dist jar could be specified to a remote path, maybe the user > jar have the same situation. > > [1]. https://issues.apache.org/jira/browse/FLINK-13938 > > tison 于2019年11月19日周二 上午11:17写道: > > > Hi forks, > > > > Recently, our customers ask for a feature configuring remote flink jar. > > I'd like to reach to you guys > > to see whether or not it is a general need. > > > > ATM Flink only supports configures local file as flink jar via `-yj` > > option. If we pass a HDFS file > > path, due to implementation detail it will fail with > > IllegalArgumentException. In the story we support > > configure remote flink jar, this limitation is eliminated. We also make > > use of YARN locality so that > > reducing uploading overhead, instead, asking YARN to localize the jar on > > AM container started. > > > > Besides, it possibly has overlap with FLINK-13938. I'd like to put the > > discussion on our > > mailing list first. > > > > Are you looking forward to such a feature? 
> > > > @Yang Wang: this feature is different from that we discussed offline, it > > only focuses on flink jar, not > > all ship files. > > > > Best, > > tison. > > >
Re: [DISCUSS] Support configure remote flink jar
Hi tison, Thanks for starting this discussion. * For a user-customized flink-dist jar, it is a useful feature, since it avoids uploading the flink-dist jar every time; especially in a production environment, it could accelerate the submission process. * For the standard flink-dist jar, FLINK-13938[1] could solve the problem. Upload an official Flink release binary to distributed storage (HDFS) first, and then all submissions can benefit from it. Users could also upload a customized flink-dist jar to accelerate their submissions. If the flink-dist jar can be specified as a remote path, the same may apply to the user jar. [1]. https://issues.apache.org/jira/browse/FLINK-13938 tison wrote on Tue, Nov 19, 2019 at 11:17 AM: > Hi forks, > > Recently, our customers ask for a feature configuring remote flink jar. > I'd like to reach to you guys > to see whether or not it is a general need. > > ATM Flink only supports configures local file as flink jar via `-yj` > option. If we pass a HDFS file > path, due to implementation detail it will fail with > IllegalArgumentException. In the story we support > configure remote flink jar, this limitation is eliminated. We also make > use of YARN locality so that > reducing uploading overhead, instead, asking YARN to localize the jar on > AM container started. > > Besides, it possibly has overlap with FLINK-13938. I'd like to put the > discussion on our > mailing list first. > > Are you looking forward to such a feature? > > @Yang Wang: this feature is different from that we discussed offline, it > only focuses on flink jar, not > all ship files. > > Best, > tison. >
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (non-binding) It is great to have a new end-to-end test framework, even if it is only for performance tests now. Best, Yang Jingsong Li wrote on Tue, Nov 19, 2019 at 9:54 AM: > +1 (non-binding) > > Best, > Jingsong Lee > > On Mon, Nov 18, 2019 at 7:59 PM Becket Qin wrote: > > > +1 (binding) on having the test suite. > > > > BTW, it would be good to have a few more details about the performance > > tests. For example: > > 1. How do the testing records look like? The size and key distributions. > > 2. The resources for each task. > > 3. The intended configuration for the jobs. > > 4. What exact source and sink it would use. > > > > Thanks, > > > > Jiangjie (Becket) Qin > > > > On Mon, Nov 18, 2019 at 7:25 PM Zhijiang > .invalid> > > wrote: > > > > > +1 (binding)! > > > > > > It is a good thing to enhance our testing work. > > > > > > Best, > > > Zhijiang > > > > > > > > > -- > > > From:Hequn Cheng > > > Send Time:2019 Nov. 18 (Mon.) 18:22 > > > To:dev > > > Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing > > Framework > > > > > > +1 (binding)! > > > I think this would be very helpful to detect regression problems. > > > > > > Best, Hequn > > > > > > On Mon, Nov 18, 2019 at 4:28 PM vino yang > wrote: > > > > > > > +1 (non-binding) > > > > > > > > Best, > > > > Vino > > > > > > > > jincheng sun wrote on Mon, Nov 18, 2019 at 2:31 PM: > > > > > > > > > +1 (binding) > > > > > > > > > > OpenInx wrote on Mon, Nov 18, 2019 at 12:09 PM: > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li > > > > > wrote: > > > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > > > Thanks Yu Li for driving on this. > > > > > > > > > > > > > > > On Nov 15, 2019, at 8:10 PM, Yu Li wrote: > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > I would like to start the vote for FLIP-83 [1] which is > > discussed > > > > and > > > > > > > > reached consensus in the discussion thread [2]. 
> > > > > > > > > > > > > > > > The vote will be open for at least 72 hours (excluding > > weekend). > > > > I'll > > > > > > try > > > > > > > > to close it by 2019-11-20 21:00 CST, unless there is an > > objection > > > > or > > > > > > not > > > > > > > > enough votes. > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > > > > > [2] https://s.apache.org/7fqrz > > > > > > > > > > > > > > > > Best Regards, > > > > > > > > Yu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Best, Jingsong Lee >
[jira] [Created] (FLINK-14845) Introduce data compression to blocking shuffle.
Yingjie Cao created FLINK-14845: --- Summary: Introduce data compression to blocking shuffle. Key: FLINK-14845 URL: https://issues.apache.org/jira/browse/FLINK-14845 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Yingjie Cao Currently, the blocking shuffle writer writes raw output data to disk without compression. For IO-bound scenarios, this can be optimized by compressing the output data. It is better to introduce a compression mechanism and offer a config option to let the user decide whether to compress the shuffle data. Actually, we have implemented compression in our internal Flink version, and here are some key points:
1. Where to compress/decompress? Compress at the upstream side and decompress at the downstream side.
2. Which threads do the compression/decompression? The task threads.
3. Data compression granularity? Per buffer.
4. How to handle data that becomes even bigger after compression? Give up compression in that case and introduce an extra flag to identify whether the data was compressed, i.e. the output may be a mixture of compressed and uncompressed data.
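The per-buffer framing with a give-up-on-expansion flag can be sketched as follows. This is a simplified, hypothetical illustration using java.util.zip, not the actual code from the internal version mentioned above — the first byte of each serialized buffer tells the reader whether the payload is compressed:

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Per-buffer compression with a one-byte flag. If the compressed form would be
// as large as (or larger than) the original, we give up and store the raw bytes,
// so the output stream may mix compressed and uncompressed buffers.
public class BufferCompressor {
    static final byte UNCOMPRESSED = 0;
    static final byte COMPRESSED = 1;

    static byte[] compress(byte[] buffer) {
        Deflater deflater = new Deflater();
        deflater.setInput(buffer);
        deflater.finish();
        byte[] out = new byte[buffer.length];   // never allow the output to grow
        int n = deflater.deflate(out);
        boolean gaveUp = !deflater.finished();  // compressed size >= original size
        deflater.end();
        byte[] payload = gaveUp ? buffer : Arrays.copyOf(out, n);
        byte[] framed = new byte[payload.length + 1];
        framed[0] = gaveUp ? UNCOMPRESSED : COMPRESSED;
        System.arraycopy(payload, 0, framed, 1, payload.length);
        return framed;
    }

    static byte[] decompress(byte[] framed, int originalLength) {
        if (framed[0] == UNCOMPRESSED) {
            return Arrays.copyOfRange(framed, 1, framed.length);
        }
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(framed, 1, framed.length - 1);
            byte[] out = new byte[originalLength];
            inflater.inflate(out);
            inflater.end();
            return out;
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt compressed buffer", e);
        }
    }
}
```

In a real shuffle the original buffer length would come from the buffer header; it is passed explicitly here for brevity.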
[DISCUSS] Support configure remote flink jar
Hi folks, Recently, our customers asked for a feature to configure a remote flink jar. I'd like to reach out to you to see whether it is a general need. At the moment, Flink only supports configuring a local file as the flink jar via the `-yj` option. If we pass an HDFS file path, it fails with an IllegalArgumentException due to an implementation detail. In the story where we support configuring a remote flink jar, this limitation is eliminated. We also make use of YARN locality to reduce uploading overhead: instead of uploading, we ask YARN to localize the jar when the AM container starts. Besides, this possibly overlaps with FLINK-13938, so I'd like to put the discussion on our mailing list first. Are you looking forward to such a feature? @Yang Wang: this feature is different from what we discussed offline; it only focuses on the flink jar, not all ship files. Best, tison.
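To make the limitation concrete, here is what the two cases could look like on the CLI (the paths and jar names below are hypothetical; today the second command fails with IllegalArgumentException):

```
# Today: the flink jar passed via -yj must be a local file
flink run -m yarn-cluster -yj /opt/flink/lib/flink-dist.jar WordCount.jar

# With this proposal: a remote path is accepted and YARN localizes the jar
# on the AM container instead of the client uploading it
flink run -m yarn-cluster -yj hdfs:///flink/flink-dist.jar WordCount.jar
```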
[jira] [Created] (FLINK-14844) ConnectorCatalogTable should not patch the row type with TimeAttributes for batch mode
Danny Chen created FLINK-14844: -- Summary: ConnectorCatalogTable should not patch the row type with TimeAttributes for batch mode Key: FLINK-14844 URL: https://issues.apache.org/jira/browse/FLINK-14844 Project: Flink Issue Type: Improvement Components: Table SQL / API Affects Versions: 1.9.1 Reporter: Danny Chen Fix For: 1.10.0 In the current code, ConnectorCatalogTable patches up the row type with time attributes for both batch and stream mode. This is wrong, because batch mode does not need that.
Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector
Hi everyone, I've put the catalog part design in a separate doc with more details for easier communication. https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing I would love to hear your thoughts on this. Best, Yijie On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen wrote: > Hi everyone, > > Glad to receive your valuable feedbacks. > > I'd first separate the Pulsar catalog as another doc and show more design > and implementation details there. > > For the current FLIP-72, I would separate it into the sink part for > current work and keep the source part as future works until we reach > FLIP-27 finals. > > I also reply to some of the comments in the design doc. I will rewrite the > catalog part in regarding to Bowen's advice in both email and comments. > > Thanks for the help again. > > Best, > Yijie > > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong wrote: > >> Hi Yijie, >> >> I also agree with Jark on separating the Catalog part into another FLIP. >> >> With FLIP-27[1] also in the air, it is also probably great to split and >> unblock the sink implementation contribution. >> I would suggest either putting in a detail implementation plan section in >> the doc, or (maybe too much separation?) splitting them into different >> FLIPs. What do you guys think? >> >> -- >> Rong >> >> [1] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface >> >> On Wed, Oct 16, 2019 at 9:00 PM Jark Wu wrote: >> >> > Hi Yijie, >> > >> > Thanks for the design document. I agree with Bowen that the catalog part >> > needs more details. >> > And I would suggest to separate Pulsar Catalog as another FLIP. IMO, it >> has >> > little to do with source/sink. >> > Having a separate FLIP can unblock the contribution for sink (or source) >> > and keep the discussion more focus. >> > I also left some comments in the documentation. 
>> > >> > Thanks, >> > Jark >> > >> > On Thu, 17 Oct 2019 at 11:24, Yijie Shen >> > wrote: >> > >> > > Hi Bowen, >> > > >> > > Thanks for your comments. I'll add catalog details as you suggested. >> > > >> > > One more question: since we decide to not implement source part of the >> > > connector at the moment. >> > > What can users do with a Pulsar catalog? >> > > Create a table backed by Pulsar and check existing pulsar tables to >> see >> > > their schemas? Drop tables maybe? >> > > >> > > Best, >> > > Yijie >> > > >> > > On Thu, Oct 17, 2019 at 1:04 AM Bowen Li wrote: >> > > >> > > > Hi Yijie, >> > > > >> > > > Per the discussion, maybe you can move pulsar source to 'future >> work' >> > > > section in the FLIP for now? >> > > > >> > > > Besides, the FLIP seems to be quite rough at the moment, and I'd >> > > recommend >> > > > to add more details . >> > > > >> > > > A few questions mainly regarding the proposed pulsar catalog. >> > > > >> > > >- Can you provide some background of pulsar schema registry and >> how >> > it >> > > >works? >> > > >- The proposed design of pulsar catalog is very vague now, can >> you >> > > >share some details of how a pulsar catalog would work internally? >> > E.g. >> > > > - which APIs does it support exactly? E.g. I see from your >> > > > prototype that table creation is supported but not alteration. >> > > > - is it going to connect to a pulsar schema registry via a >> http >> > > > client or a pulsar client, etc >> > > > - will it be able to handle multiple versions of pulsar, or >> just >> > > > one? How is compatibility handles between different >> Flink-Pulsar >> > > versions? >> > > > - will it support only reading from pulsar schema registry , >> or >> > > > both read/write? Will it work end-to-end in Flink SQL for >> users >> > to >> > > create >> > > > and manipulate a pulsar table such as "CREATE TABLE t WITH >> > > > PROPERTIES(type=pulsar)" and "DROP TABLE t"? 
>> > > > - Is a pulsar topic always gonna be a non-partitioned table? >> How >> > is >> > > > a partitioned topic mapped to a Flink table? >> > > >- How to map Flink's catalog/database namespace to pulsar's >> > > >multi-tenant namespaces? I'm not very familiar with how multi >> > tenancy >> > > works >> > > >in pulsar, and some background context/use cases may help here >> too. >> > > E.g. >> > > > - can a pulsar client/consumer/producer be multiple-tenant at >> the >> > > > same time? >> > > > - how does authentication work in pulsar's multi-tenancy and >> the >> > > > catalog? asking since I didn't see the proposed pulsar catalog >> > has >> > > > username/password configs >> > > > - the FLIP seems propose mapping a pulsar cluster and >> > > > 'tenant/namespace' respectively to Flink's 'catalog' and >> > > 'database'. I >> > > > wonder whether it totally makes sense, or should we actually >> map >> > > "tenant" >> > > > to "catalog", and "namespace" to "database"? >> > > > >>
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (non-binding) Best, Jingsong Lee On Mon, Nov 18, 2019 at 7:59 PM Becket Qin wrote: > +1 (binding) on having the test suite. > > BTW, it would be good to have a few more details about the performance > tests. For example: > 1. How do the testing records look like? The size and key distributions. > 2. The resources for each task. > 3. The intended configuration for the jobs. > 4. What exact source and sink it would use. > > Thanks, > > Jiangjie (Becket) Qin > > On Mon, Nov 18, 2019 at 7:25 PM Zhijiang .invalid> > wrote: > > > +1 (binding)! > > > > It is a good thing to enhance our testing work. > > > > Best, > > Zhijiang > > > > > > -- > > From:Hequn Cheng > > Send Time:2019 Nov. 18 (Mon.) 18:22 > > To:dev > > Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing > Framework > > > > +1 (binding)! > > I think this would be very helpful to detect regression problems. > > > > Best, Hequn > > > > On Mon, Nov 18, 2019 at 4:28 PM vino yang wrote: > > > > > +1 (non-binding) > > > > > > Best, > > > Vino > > > > > > jincheng sun 于2019年11月18日周一 下午2:31写道: > > > > > > > +1 (binding) > > > > > > > > OpenInx 于2019年11月18日周一 下午12:09写道: > > > > > > > > > +1 (non-binding) > > > > > > > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li > > > wrote: > > > > > > > > > > > +1 (non-binding) > > > > > > > > > > > > Thanks Yu Li for driving on this. > > > > > > > > > > > > > 在 2019年11月15日,下午8:10,Yu Li 写道: > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > I would like to start the vote for FLIP-83 [1] which is > discussed > > > and > > > > > > > reached consensus in the discussion thread [2]. > > > > > > > > > > > > > > The vote will be open for at least 72 hours (excluding > weekend). > > > I'll > > > > > try > > > > > > > to close it by 2019-11-20 21:00 CST, unless there is an > objection > > > or > > > > > not > > > > > > > enough votes. 
> > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > > > > [2] https://s.apache.org/7fqrz > > > > > > > > > > > > > > Best Regards, > > > > > > > Yu > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Best, Jingsong Lee
Re: [VOTE] FLIP-79: Flink Function DDL Support (1.10 Release Feature Only)
Hi Dawid, As the title of this thread says, only features related to the 1.10 release (MVP) are accepted; the other parts are still under discussion. I updated the FLIP's status field to reflect that. Also added it to https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals I will help to shepherd this effort in the 1.10 release. Thanks, Bowen On Mon, Nov 18, 2019 at 8:59 AM Peter Huang wrote: > Hi David, > > Sorry for my misbehavior. FLINK-7151 should be the parent tickets of the > function DDL effort. I recently created several tickets of it, currently > including > https://issues.apache.org/jira/browse/FLINK-14055 > https://issues.apache.org/jira/browse/FLINK-14711 > https://issues.apache.org/jira/browse/FLINK-14837 > https://issues.apache.org/jira/browse/FLINK-14841 > > For the FLIP status, I think probably Bowen can add more context here. > > Best Regards > Peter Huang > > > > > > > > > On Mon, Nov 18, 2019 at 5:04 AM Dawid Wysakowicz > wrote: > >> Hey, >> >> I've just realized this FLIP was accepted but it was not reflected in >> the wiki page. >> >> Moreover I could not find a single jira issue that would link to the >> FLIP. I've seen though a few other issues and PRs being opened regarding >> this FLIP (FLINK-7151?, FLINK-14837, FLINK-14055). I was also a bit >> confused with the relationship between these issues and e.g. >> FLINK-10232. I tried to clarify a bit the way I understood it, but maybe >> somebody more familiar with the whole story could improve it further. >> >> Could some of the committers that voted on this proposal help with a bit >> of shepherding? >> >> Best, >> >> Dawid >> >> On 12/11/2019 07:59, Yu Li wrote: >> > Thanks Bowen and Peter! Great to know. >> > >> > Best Regards, >> > Yu >> > >> > >> > On Tue, 12 Nov 2019 at 13:38, Peter Huang >> > wrote: >> > >> >> Thank you in advance. Once the PR is done from my side, I will @ you >> and >> >> yu for review. 
>> >> >> >> On Mon, Nov 11, 2019 at 9:15 PM Bowen Li wrote: >> >> >> >>> Thanks Peter! I assigned the ticket to you, and I can help with >> reviewing >> >>> and merging PRs of this FLIP. >> >>> >> >>> Ccing 1.10 release manager Yu that we are on track of FLIP-79. >> >>> >> >>> Cheers, >> >>> Bowen >> >>> >> >>> On Mon, Nov 11, 2019 at 9:03 PM Peter Huang < >> huangzhenqiu0...@gmail.com> >> >>> wrote: >> >>> >> Thanks, everyone for joining the discussion and giving feedback!. The >> voting time for FLIP-79 has passed. I'm closing the vote now. >> >> There were seven +1 votes, 3 of which are binding: >> - Bowen Li (binding) >> - Kurt Yong (binding) >> - Shuyi Chen (binding) >> >> - Terry Wang (non-binding) >> - Xuefu Zhang (non-binding) >> - Vino Yang (non-binding) >> - Jingsong Lee (non-binding) >> >> There were no disapproving votes. Thus, FLIP-79 has been accepted. >> Now, >> >>> we >> agreed on the function DDL syntax as listed on the FLIP >> and also the initial execution plan for release 1.10. Currently, I >> will >> mainly work on these two Jira tickets. 
>> >> 1) https://issues.apache.org/jira/browse/FLINK-7151 >> 2) https://issues.apache.org/jira/browse/FLINK-14711 >> >> >> >> Best Regards >> Peter Huang >> >> >> Best Regards >> Peter Huang >> >> On Mon, Nov 11, 2019 at 7:42 PM Jingsong Li >> wrote: >> >> > +1 (non-binding) >> > >> > Best, >> > Jingsong Lee >> > >> > On Tue, Nov 12, 2019 at 9:49 AM vino yang >> >>> wrote: >> >> +1 (non-binding) >> >> >> >> Best, >> >> Vino >> >> >> >> Xuefu Z 于2019年11月12日周二 上午3:27写道: >> >> >> >>> +1 (non-binding) >> >>> >> >>> On Mon, Nov 11, 2019 at 9:54 AM Shuyi Chen >> wrote: >> +1 (binding) >> >> On Sat, Nov 9, 2019 at 11:17 PM Kurt Young >> wrote: >> > +1 (binding) >> > >> > Best, >> > Kurt >> > >> > >> > On Sun, Nov 10, 2019 at 12:25 PM Peter Huang < >> >>> huangzhenqiu0...@gmail.com >> > wrote: >> > >> >> Hi Yu, >> >> >> >> Thanks for your reminder about the timeline of delivering >> >>> the >> > basic >> >> function DDL in release 1.10. >> >> As I replied to Xuefu, the "CREATE FUNCTION" and "DROP >> FUNCTION" >> >> can >> >> relatively easy achieve by revising the existing PR. >> >> Definitely, I probably need to start to work on a basic >> >>> version >> > of >> >> PR >> for >> >> "ALTER FUNCTION" and "SHOW FUNCTIONS". >> >> Please let me know if you have any suggestion to better >> >>> align >> the >> > timeline >> >> of the ongoing catalog related efforts. >> >> >> >> Best Regards >> >> Peter Huang >> >> >> >> >> >> On Sat, Nov 9, 2019 at 7:26 PM
[jira] [Created] (FLINK-14843) Streaming bucketing end-to-end test can fail with Output hash mismatch
Gary Yao created FLINK-14843: Summary: Streaming bucketing end-to-end test can fail with Output hash mismatch Key: FLINK-14843 URL: https://issues.apache.org/jira/browse/FLINK-14843 Project: Flink Issue Type: Bug Components: Connectors / FileSystem, Tests Affects Versions: 1.10.0 Environment: rev: dcc1330375826b779e4902176bb2473704dabb11 Reporter: Gary Yao *Description* Streaming bucketing end-to-end test ({{test_streaming_bucketing.sh}}) can fail with Output hash mismatch. {noformat} Number of running task managers has reached 4. Job (67212178694f8b2a9bc9d9572567a53f) is running. Waiting until all values have been produced Truncating buckets Number of produced values 26325/6 Truncating buckets Number of produced values 31315/6 Truncating buckets Number of produced values 36735/6 Truncating buckets Number of produced values 40705/6 Truncating buckets Number of produced values 46125/6 Truncating buckets Number of produced values 51135/6 Truncating buckets Number of produced values 56555/6 Truncating buckets Number of produced values 61935/6 Cancelling job 67212178694f8b2a9bc9d9572567a53f. Cancelled job 67212178694f8b2a9bc9d9572567a53f. Waiting for job (67212178694f8b2a9bc9d9572567a53f) to reach terminal state CANCELED ... Job (67212178694f8b2a9bc9d9572567a53f) reached terminal state CANCELED Job 67212178694f8b2a9bc9d9572567a53f was cancelled, time to verify FAIL Bucketing Sink: Output hash mismatch. Got 4e2d1859e41184a38e5bc95090fe9941, expected 01aba5ff77a0ef5e5cf6a727c248bdc3. head hexdump of actual: 000 ( 2 , 1 0 , 0 , S o m e p a y 010 l o a d . . . ) \n ( 2 , 1 0 , 1 020 , S o m e p a y l o a d . . . 030 ) \n ( 2 , 1 0 , 2 , S o m e p 040 a y l o a d . . . ) \n ( 2 , 1 0 050 , 3 , S o m e p a y l o a d . 060 . . ) \n ( 2 , 1 0 , 4 , S o m e 070 p a y l o a d . . . ) \n ( 2 , 080 1 0 , 5 , S o m e p a y l o a 090 d . . . ) \n ( 2 , 1 0 , 6 , S o 0a0 m e p a y l o a d . . . ) \n ( 0b0 2 , 1 0 , 7 , S o m e p a y l 0c0 o a d . . . 
) \n ( 2 , 1 0 , 8 , 0d0 S o m e p a y l o a d . . . ) 0e0 \n ( 2 , 1 0 , 9 , S o m e p a 0f0 y l o a d . . . ) \n 0fa
Stopping taskexecutor daemon (pid: 654547) on host gyao-desktop.
Stopping standalonesession daemon (pid: 650368) on host gyao-desktop.
Stopping taskexecutor daemon (pid: 650812) on host gyao-desktop.
Skipping taskexecutor daemon (pid: 651347), because it is not running anymore on gyao-desktop.
Skipping taskexecutor daemon (pid: 651795), because it is not running anymore on gyao-desktop.
Skipping taskexecutor daemon (pid: 652249), because it is not running anymore on gyao-desktop.
Stopping taskexecutor daemon (pid: 653481) on host gyao-desktop.
Stopping taskexecutor daemon (pid: 654099) on host gyao-desktop.
[FAIL] Test script contains errors.
Checking of logs skipped.
[FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' failed after 2 minutes and 3 seconds! Test exited with exit code 1
{noformat}
*How to reproduce*
Comment out the delay of 10s after the 1st TM is restarted to provoke the issue:
{code:bash}
echo "Restarting 1 TM"
$FLINK_DIR/bin/taskmanager.sh start
wait_for_number_of_running_tms 4
#sleep 10

echo "Killing 2 TMs"
kill_random_taskmanager
kill_random_taskmanager
wait_for_number_of_running_tms 2
{code}
Command to run the test:
{noformat}
FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
{noformat}
RE: SQL for Avro GenericRecords on Parquet
Hi Peter. Thanks. This is my code. I used one of the parquet / avro tests as a reference. The code will fail on:

Test testScan(ParquetTestCase) failed with:
java.lang.UnsupportedOperationException
	at org.apache.parquet.filter2.recordlevel.IncrementallyUpdatedFilterPredicate$ValueInspector.update(IncrementallyUpdatedFilterPredicate.java:71)
	at org.apache.parquet.filter2.recordlevel.FilteringPrimitiveConverter.addLong(FilteringPrimitiveConverter.java:105)
	at org.apache.parquet.column.impl.ColumnReaderImpl$2$4.writeValue(ColumnReaderImpl.java:268)

CODE:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.specific.SpecificRecord;
import org.apache.avro.specific.SpecificRecordBuilderBase;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.ParallelIteratorInputFormat;
import org.apache.flink.api.java.io.TupleCsvInputFormat;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetTableSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.PrintSinkFunction;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;
import org.apache.flink.table.sinks.TableSink;
import org.apache.flink.test.util.MultipleProgramsTestBase;
import org.apache.flink.types.Row;
import org.apache.avro.generic.IndexedRecord;
import org.apache.parquet.avro.AvroSchemaConverter;
import org.apache.parquet.schema.MessageType;
import org.junit.BeforeClass;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import static org.junit.Assert.assertEquals;

import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetTestCase extends MultipleProgramsTestBase {

    private static String avroSchema = "{\n" +
            " \"name\": \"SimpleRecord\",\n" +
            " \"type\": \"record\",\n" +
            " \"fields\": [\n" +
            "{ \"default\": null, \"name\": \"timestamp_edr\", \"type\": [ \"null\", \"long\" ]},\n" +
            "{ \"default\": null, \"name\": \"id\", \"type\": [ \"null\", \"long\" ]},\n" +
            "{ \"default\": null, \"name\": \"recordType_\", \"type\": [ \"null\", \"string\"]}\n" +
            " ],\n" +
            " \"schema_id\": 1,\n" +
            " \"type\": \"record\"\n" +
            "}";

    private static final AvroSchemaConverter SCHEMA_CONVERTER = new AvroSchemaConverter();
    private static Schema schm = new Schema.Parser().parse(avroSchema);
    private static Path testPath;

    public ParquetTestCase() {
        super(TestExecutionMode.COLLECTION);
    }

    @BeforeClass
    public static void setup() throws Exception {
        GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(schm);
        List recs = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            GenericRecord gr = genericRecordBuilder.set("timestamp_edr", System.currentTimeMillis() / 1000).set("id", 333L).set("recordType_", "Type1").build();
            recs.add(gr);
            GenericRecord gr2 = genericRecordBuilder.set("timestamp_edr", System.currentTimeMillis() / 1000).set("id", 22L).set("recordType_", "Type2").build();
            recs.add(gr2);
        }
        testPath = new Path("/tmp", UUID.randomUUID().toString());
        ParquetWriter writer = AvroParquetWriter.builder(
                new org.apache.hadoop.fs.Path(testPath.toUri())).withSchema(schm).build();
        for (IndexedRecord record : recs) {
            writer.write(record);
        }
        writer.close();
    }

    private ParquetTableSource createParquetTableSource(Path path) throws IOException {
        MessageType nestedSchema = SCHEMA_CONVERTER.convert(schm);
        ParquetTableSource parquetTableSource = ParquetTableSource.builder()
                .path(path.getPath())
                .forParquetSchema(nestedSchema)
                .build();
        return parquetTableSource;
    }

    @Test
    public void testScan() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment batchTableEnvironment = BatchTableEnvironment.create(env);
Re: [DISCUSSION] Kafka Metrics Reporter
Hi all, Glad to see this topic in the community. We at Alibaba also implemented a Kafka metrics reporter about half a year ago and extended it to other message queues, such as Alibaba Cloud Log Service [1]. The reason we did not start a similar discussion is that we previously thought we would only be providing a way to report metrics to Kafka. Unlike the currently supported metrics reporters, e.g. InfluxDB and Graphite, which all have an easy-to-use Grafana data source for visualizing metrics, a Kafka metrics reporter still needs another component to consume the data and act as a data source for an observability platform, and that component would differ from company to company. I think this is the main concern about including it in a popular open-source main repo, and I largely agree with Becket's suggestion to contribute this as a flink-package, so that we could offer an end-to-end solution including how to visualize the metrics data. [1] https://www.alibabacloud.com/help/doc-detail/29003.htm Best Yun Tang On 11/18/19, 8:19 AM, "Becket Qin" wrote: Hi Gyula, Thanks for bringing this up. It is a useful addition to have a Kafka metrics reporter. I understand that we already have Prometheus and DataDog reporters in the Flink main repo. However, personally speaking, I would slightly prefer to have the Kafka metrics reporter as an ecosystem project instead of in the main repo, for the following reasons: 1. To keep core Flink more focused. In general, if a component is more relevant to an external system than to Flink itself, it is good to keep it as an ecosystem project, and a metrics reporter seems a good example of that. 2. It helps encourage more contributions to the Flink ecosystem, instead of giving the impression that anything in the Flink ecosystem must be in the Flink main repo. 3. To support our ecosystem project authors, we have launched a website [1] to help the community keep track of and advertise ecosystem projects. It looks like a good place to put the Kafka metrics reporter. 
Regarding the message format, while I think using JSON by default is fine as it does not introduce many external dependencies, I wonder if we should make the message format pluggable. Many companies probably already have their own serde format for all their Kafka messages. For example, maybe they would like to just use an Avro record for their metrics instead of introducing a new JSON format. Also, in many cases there could be a lot of metric messages sent by the Flink jobs; JSON is less efficient and might have too much overhead in that case. Thanks, Jiangjie (Becket) Qin [1] https://flink-packages.org/ On Mon, Nov 18, 2019 at 3:30 AM Konstantin Knauf wrote: > Hi Gyula, > > thank you for proposing this. +1 for adding a KafkaMetricsReporter. In > terms of the dependency we could go a similar route as for the "universal" > Flink Kafka Connector which to my knowledge always tracks the latest Kafka > version as of the Flink release and relies on compatibility of the > underlying KafkaClient. JSON sounds good to me. > > Cheers, > > Konstantin > > > > > > On Sun, Nov 17, 2019 at 1:46 PM Gyula Fóra wrote: > > > Hi all! > > > > Several users have asked in the past about a Kafka based metrics reporter > > which can serve as a natural connector between arbitrary metric storage > > systems and a straightforward way to process Flink metrics downstream. > > > > I think this would be an extremely useful addition but I would like to > hear > > what others in the dev community think about it before submitting a > proper > > proposal. > > > > There are at least 3 questions to discuss here: > > > > > > *1. Do we want the Kafka metrics reporter in the Flink repo?* As it is > > much more generic than other metrics reporters already included, I would > > say yes. Also as almost everyone uses Flink with Kafka it would be a > > natural reporter choice for a lot of users. > > *2. 
How should we handle the Kafka dependency of the connector?* > > I think it would be overkill to add different Kafka versions here, > > so I would use Kafka 2.+, which has the best compatibility and is > > future-proof. > > *3. What message format should we use?* > > I would go with JSON for readability and compatibility. > > > > There is a relevant JIRA open for this already: > > https://issues.apache.org/jira/browse/FLINK-14531 > > > > We at Cloudera also promote this as a scalable way of pushing metrics to > > other systems so we are very happy to contribute an implementation or > > cooperate with others on building it. > > > > Please let me know what you think! > > > > Cheers, > > Gyula > > > > >
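[Editorial note] Gyula's proposal of JSON for readability and Becket's point about pluggable serde can be made concrete with a plain-Java sketch of what a JSON-encoded metric message might look like. All field names below (host, jobName, metricName, value, timestamp) are illustrative assumptions; no actual schema was agreed in this thread.

```java
// A minimal sketch of a JSON-encoded metric message for a hypothetical
// Kafka metrics reporter. The payload layout here is an assumption for
// illustration only -- the discussion left the exact schema open.
public class MetricMessageSketch {

    // Builds one metric message as a JSON string using only the JDK.
    static String toJson(String host, String jobName, String metricName,
                         double value, long timestamp) {
        return String.format(
            "{\"host\":\"%s\",\"jobName\":\"%s\",\"metricName\":\"%s\","
                + "\"value\":%s,\"timestamp\":%d}",
            host, jobName, metricName, value, timestamp);
    }

    public static void main(String[] args) {
        // Example: a gauge reading serialized for one reporting interval.
        String msg = toJson("host-1", "my-job", "numRecordsIn", 42.0, 1574064000000L);
        System.out.println(msg);
    }
}
```

A pluggable format, as Becket suggests, would replace `toJson` with an interface that an Avro or Protobuf implementation could also satisfy.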
Re: SQL for Avro GenericRecords on Parquet
Hi Hanan, Thanks for reporting the issue. Would you please attach your test code here? I may help to investigate. Best Regards Peter Huang On Mon, Nov 18, 2019 at 2:51 AM Hanan Yehudai wrote: > I have tried to persist Generic Avro records in a parquet file and then > read it via ParquetTablesource – using SQL. > Seems that the SQL I not executed properly ! > > The persisted records are : > Id , type > 333,Type1 > 22,Type2 > 333,Type1 > 22,Type2 > 333,Type1 > 22,Type2 > 333,Type1 > 22,Type2 > 333,Type1 > 22,Type2 > 333,Type1 > 22,Type2 > > While SQL of SELECT id ,recordType_ FROM ParquetTable - return the > above ( which is correct) > Running : "SELECT id ,recordType_ FROM ParquetTable where > recordType_='Type1' " > Will result in : > 333,Type1 > 22,Type1 > 333,Type1 > 22,Type1 > 333,Type1 > 22,Type1 > 333,Type1 > 22,Type1 > 333,Type1 > 22,Type1 > 333,Type1 > 22,Type1 > > As if the equal sign is assignment and not equal … > > am I doing something wrong ? is it an issue of Generic record vs > SpecificRecords ? > > >
[jira] [Created] (FLINK-14842) add logging for functions loaded from modules
Bowen Li created FLINK-14842: Summary: add logging for functions loaded from modules Key: FLINK-14842 URL: https://issues.apache.org/jira/browse/FLINK-14842 Project: Flink Issue Type: Improvement Components: Table SQL / API Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.10.0 to help users identify which module provides that function -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] FLIP-79: Flink Function DDL Support (1.10 Release Feature Only)
Hi Dawid, Sorry for the oversight. FLINK-7151 should be the parent ticket of the function DDL effort. I recently created several sub-tickets under it, currently including https://issues.apache.org/jira/browse/FLINK-14055 https://issues.apache.org/jira/browse/FLINK-14711 https://issues.apache.org/jira/browse/FLINK-14837 https://issues.apache.org/jira/browse/FLINK-14841 For the FLIP status, I think Bowen can probably add more context here. Best Regards Peter Huang On Mon, Nov 18, 2019 at 5:04 AM Dawid Wysakowicz wrote: > Hey, > > I've just realized this FLIP was accepted but it was not reflected in > the wiki page. > > Moreover I could not find a single jira issue that would link to the > FLIP. I've seen though a few other issues and PRs being opened regarding > this FLIP (FLINK-7151?, FLINK-14837, FLINK-14055). I was also a bit > confused with the relationship between these issues and e.g. > FLINK-10232. I tried to clarify a bit the way I understood it, but maybe > somebody more familiar with the whole story could improve it further. > > Could some of the committers that voted on this proposal help with a bit > of shepherding? > > Best, > > Dawid > > On 12/11/2019 07:59, Yu Li wrote: > > Thanks Bowen and Peter! Great to know. > > > > Best Regards, > > Yu > > > > > > On Tue, 12 Nov 2019 at 13:38, Peter Huang > > wrote: > > > >> Thank you in advance. Once the PR is done from my side, I will @ you and > >> yu for review. > >> > >> On Mon, Nov 11, 2019 at 9:15 PM Bowen Li wrote: > >> > >>> Thanks Peter! I assigned the ticket to you, and I can help with > reviewing > >>> and merging PRs of this FLIP. > >>> > >>> Ccing 1.10 release manager Yu that we are on track of FLIP-79. > >>> > >>> Cheers, > >>> Bowen > >>> > >>> On Mon, Nov 11, 2019 at 9:03 PM Peter Huang < > huangzhenqiu0...@gmail.com> > >>> wrote: > >>> > Thanks, everyone for joining the discussion and giving feedback!. The > voting time for FLIP-79 has passed. I'm closing the vote now. 
> > There were seven +1 votes, 3 of which are binding: > - Bowen Li (binding) > - Kurt Yong (binding) > - Shuyi Chen (binding) > > - Terry Wang (non-binding) > - Xuefu Zhang (non-binding) > - Vino Yang (non-binding) > - Jingsong Lee (non-binding) > > There were no disapproving votes. Thus, FLIP-79 has been accepted. > Now, > >>> we > agreed on the function DDL syntax as listed on the FLIP > and also the initial execution plan for release 1.10. Currently, I > will > mainly work on these two Jira tickets. > > 1) https://issues.apache.org/jira/browse/FLINK-7151 > 2) https://issues.apache.org/jira/browse/FLINK-14711 > > > > Best Regards > Peter Huang > > > Best Regards > Peter Huang > > On Mon, Nov 11, 2019 at 7:42 PM Jingsong Li > wrote: > > > +1 (non-binding) > > > > Best, > > Jingsong Lee > > > > On Tue, Nov 12, 2019 at 9:49 AM vino yang > >>> wrote: > >> +1 (non-binding) > >> > >> Best, > >> Vino > >> > >> Xuefu Z 于2019年11月12日周二 上午3:27写道: > >> > >>> +1 (non-binding) > >>> > >>> On Mon, Nov 11, 2019 at 9:54 AM Shuyi Chen > wrote: > +1 (binding) > > On Sat, Nov 9, 2019 at 11:17 PM Kurt Young > wrote: > > +1 (binding) > > > > Best, > > Kurt > > > > > > On Sun, Nov 10, 2019 at 12:25 PM Peter Huang < > >>> huangzhenqiu0...@gmail.com > > wrote: > > > >> Hi Yu, > >> > >> Thanks for your reminder about the timeline of delivering > >>> the > > basic > >> function DDL in release 1.10. > >> As I replied to Xuefu, the "CREATE FUNCTION" and "DROP > FUNCTION" > >> can > >> relatively easy achieve by revising the existing PR. > >> Definitely, I probably need to start to work on a basic > >>> version > > of > >> PR > for > >> "ALTER FUNCTION" and "SHOW FUNCTIONS". > >> Please let me know if you have any suggestion to better > >>> align > the > > timeline > >> of the ongoing catalog related efforts. > >> > >> Best Regards > >> Peter Huang > >> > >> > >> On Sat, Nov 9, 2019 at 7:26 PM Yu Li > >>> wrote: > >>> Thanks for driving this Peter! 
> >>> > >>> I agree it would be great if we could include this > >>> feature in > >> 1.10. > >>> However, FWIW, since we are following the time-based > >>> release > >> policy > [1] > >> and > >>> 1.10 release is approaching its feature freeze (planned > >>> to be > > at > >>> the > > end > >> of > >>> November) [2], I'm a little bit concerned about the > >>> schedule. > >>> [1] >
[jira] [Created] (FLINK-14841) Add create and drop function DDL in parser
Zhenqiu Huang created FLINK-14841: - Summary: Add create and drop function DDL in parser Key: FLINK-14841 URL: https://issues.apache.org/jira/browse/FLINK-14841 Project: Flink Issue Type: Sub-task Components: Table SQL / API Affects Versions: 1.10.0 Reporter: Zhenqiu Huang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[ANNOUNCE] Launch of flink-packages.org: A website to foster the Flink Ecosystem
Hi all, I would like to announce that Ververica, with the permission of the Flink PMC, is launching a website called flink-packages.org. This goes back to an effort proposed earlier in 2019 [1]. The idea of the site is to help developers building extensions / connectors / APIs etc. for Flink to get attention for their projects. At the same time, we want to help Flink users find those ecosystem projects, so that they can benefit from the work. A voting and commenting functionality allows users to rate and discuss individual packages. You can find the website here: https://flink-packages.org/ The full announcement is available here: https://www.ververica.com/blog/announcing-flink-community-packages I'm happy to hear any feedback about the site. Best, Robert [1] https://lists.apache.org/thread.html/c306b8b6d5d2ca071071b634d647f47769760e1e91cd758f52a62c93@%3Cdev.flink.apache.org%3E
Re: [VOTE] FLIP-79: Flink Function DDL Support (1.10 Release Feature Only)
Hey, I've just realized this FLIP was accepted but it was not reflected in the wiki page. Moreover I could not find a single jira issue that would link to the FLIP. I've seen though a few other issues and PRs being opened regarding this FLIP (FLINK-7151?, FLINK-14837, FLINK-14055). I was also a bit confused with the relationship between these issues and e.g. FLINK-10232. I tried to clarify a bit the way I understood it, but maybe somebody more familiar with the whole story could improve it further. Could some of the committers that voted on this proposal help with a bit of shepherding? Best, Dawid On 12/11/2019 07:59, Yu Li wrote: > Thanks Bowen and Peter! Great to know. > > Best Regards, > Yu > > > On Tue, 12 Nov 2019 at 13:38, Peter Huang > wrote: > >> Thank you in advance. Once the PR is done from my side, I will @ you and >> yu for review. >> >> On Mon, Nov 11, 2019 at 9:15 PM Bowen Li wrote: >> >>> Thanks Peter! I assigned the ticket to you, and I can help with reviewing >>> and merging PRs of this FLIP. >>> >>> Ccing 1.10 release manager Yu that we are on track of FLIP-79. >>> >>> Cheers, >>> Bowen >>> >>> On Mon, Nov 11, 2019 at 9:03 PM Peter Huang >>> wrote: >>> Thanks, everyone for joining the discussion and giving feedback!. The voting time for FLIP-79 has passed. I'm closing the vote now. There were seven +1 votes, 3 of which are binding: - Bowen Li (binding) - Kurt Yong (binding) - Shuyi Chen (binding) - Terry Wang (non-binding) - Xuefu Zhang (non-binding) - Vino Yang (non-binding) - Jingsong Lee (non-binding) There were no disapproving votes. Thus, FLIP-79 has been accepted. Now, >>> we agreed on the function DDL syntax as listed on the FLIP and also the initial execution plan for release 1.10. Currently, I will mainly work on these two Jira tickets. 
1) https://issues.apache.org/jira/browse/FLINK-7151 2) https://issues.apache.org/jira/browse/FLINK-14711 Best Regards Peter Huang Best Regards Peter Huang On Mon, Nov 11, 2019 at 7:42 PM Jingsong Li wrote: > +1 (non-binding) > > Best, > Jingsong Lee > > On Tue, Nov 12, 2019 at 9:49 AM vino yang >>> wrote: >> +1 (non-binding) >> >> Best, >> Vino >> >> Xuefu Z 于2019年11月12日周二 上午3:27写道: >> >>> +1 (non-binding) >>> >>> On Mon, Nov 11, 2019 at 9:54 AM Shuyi Chen wrote: +1 (binding) On Sat, Nov 9, 2019 at 11:17 PM Kurt Young wrote: > +1 (binding) > > Best, > Kurt > > > On Sun, Nov 10, 2019 at 12:25 PM Peter Huang < >>> huangzhenqiu0...@gmail.com > wrote: > >> Hi Yu, >> >> Thanks for your reminder about the timeline of delivering >>> the > basic >> function DDL in release 1.10. >> As I replied to Xuefu, the "CREATE FUNCTION" and "DROP FUNCTION" >> can >> relatively easy achieve by revising the existing PR. >> Definitely, I probably need to start to work on a basic >>> version > of >> PR for >> "ALTER FUNCTION" and "SHOW FUNCTIONS". >> Please let me know if you have any suggestion to better >>> align the > timeline >> of the ongoing catalog related efforts. >> >> Best Regards >> Peter Huang >> >> >> On Sat, Nov 9, 2019 at 7:26 PM Yu Li >>> wrote: >>> Thanks for driving this Peter! >>> >>> I agree it would be great if we could include this >>> feature in >> 1.10. >>> However, FWIW, since we are following the time-based >>> release >> policy [1] >> and >>> 1.10 release is approaching its feature freeze (planned >>> to be > at >>> the > end >> of >>> November) [2], I'm a little bit concerned about the >>> schedule. >>> [1] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases >>> [2] >>> >>> >>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-10-td32824.html >>> Best Regards, >>> Yu >>> >>> >>> On Sat, 9 Nov 2019 at 04:12, Xuefu Z > wrote: Hi Peter, Thanks for driving this. I'm all-in for this. 
However, >>> as I >> read the >>> latest FLIP, I have a couple of questions/comments: 1. It seems that "JVM" is proposed as a language type in >> parallel to python. I'm not sure that's very intuitive. JVM stands >>> for >> "java >> virtual machine", so the language is really "JAVA", correct? I >>> know >>> "scala" > is >>> also
Re: [DISCUSS] Release flink-shaded 9.0
@Chesnay: I know you said that you are pretty busy these days. If we can't find anybody else to work on this, when would you be available to create the first RC? On Sun, Nov 17, 2019 at 6:48 AM Hequn Cheng wrote: > Hi, > > Big +1 to release 9.0. > It would be good if we can solve these security vulnerabilities. > > Thanks a lot for your nice work and kick off the release so quickly. > > > On Fri, Nov 15, 2019 at 11:50 PM Ufuk Celebi wrote: > > > From what I can see, the Jackson version bump fixes quite a few > > vulnerabilities. Therefore, I'd be +1 to release flink-shaded 9.0. > > > > Thanks for all the work to verify this on master already. > > > > – Ufuk > > > > > > On Fri, Nov 15, 2019 at 2:26 PM Chesnay Schepler > > wrote: > > > > > Hello, > > > > > > I'd like to kick off the next release for flink-shaded. Background is > > > that we recently bumped jackson to 2.10.1 to fix a variety of security > > > vulnerabilities, and it would be good to include them in the upcoming > > > 1.8.3/1.9.2 releases. > > > > > > The release would contain few changes beyond the jackson changes; > > > flink-shaded can now be compiled on Java 11 and an encoding issue for > > > the NOTICE files was fixed. > > > > > > So overall this should be very little overhead. > > > > > > I have already verified that the master would work with this version > > > (this being a reasonable indicator for it also working in previous > > > version). > > > > > > I'd also appreciate it if someone would volunteer to handle the > release; > > > I'm quite bogged down at the moment :( > > > > > > Regards, > > > > > > Chesnay > > > > > > > > >
[jira] [Created] (FLINK-14840) Use new Executor interface in SQL cli
Aljoscha Krettek created FLINK-14840: Summary: Use new Executor interface in SQL cli Key: FLINK-14840 URL: https://issues.apache.org/jira/browse/FLINK-14840 Project: Flink Issue Type: Sub-task Components: Table SQL / Client Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Currently, the SQL cli has custom code for job deployment in {{ProgramDeployer}}. We should replace this by using the newly introduced {{Executor}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (binding) on having the test suite. BTW, it would be good to have a few more details about the performance tests. For example: 1. What do the testing records look like? The size and key distributions. 2. The resources for each task. 3. The intended configuration for the jobs. 4. What exact source and sink would it use? Thanks, Jiangjie (Becket) Qin On Mon, Nov 18, 2019 at 7:25 PM Zhijiang wrote: > +1 (binding)! > > It is a good thing to enhance our testing work. > > Best, > Zhijiang > > > -- > From:Hequn Cheng > Send Time:2019 Nov. 18 (Mon.) 18:22 > To:dev > Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework > > +1 (binding)! > I think this would be very helpful to detect regression problems. > > Best, Hequn > > On Mon, Nov 18, 2019 at 4:28 PM vino yang wrote: > > > +1 (non-binding) > > > > Best, > > Vino > > > > jincheng sun 于2019年11月18日周一 下午2:31写道: > > > > > +1 (binding) > > > > > > OpenInx 于2019年11月18日周一 下午12:09写道: > > > > > > > +1 (non-binding) > > > > > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li > > wrote: > > > > > > > > > +1 (non-binding) > > > > > > > > > > Thanks Yu Li for driving on this. > > > > > > > > > > > 在 2019年11月15日,下午8:10,Yu Li 写道: > > > > > > > > > > > > Hi All, > > > > > > > > > > > > I would like to start the vote for FLIP-83 [1] which is discussed > > and > > > > > > reached consensus in the discussion thread [2]. > > > > > > > > > > > > The vote will be open for at least 72 hours (excluding weekend). > > I'll > > > > try > > > > > > to close it by 2019-11-20 21:00 CST, unless there is an objection > > or > > > > not > > > > > > enough votes. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > > > [2] https://s.apache.org/7fqrz > > > > > > > > > > > > Best Regards, > > > > > > Yu > > > > > > > > > > > > > > > > > > > > >
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (binding)! It is a good thing to enhance our testing work. Best, Zhijiang -- From:Hequn Cheng Send Time:2019 Nov. 18 (Mon.) 18:22 To:dev Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework +1 (binding)! I think this would be very helpful to detect regression problems. Best, Hequn On Mon, Nov 18, 2019 at 4:28 PM vino yang wrote: > +1 (non-binding) > > Best, > Vino > > jincheng sun 于2019年11月18日周一 下午2:31写道: > > > +1 (binding) > > > > OpenInx 于2019年11月18日周一 下午12:09写道: > > > > > +1 (non-binding) > > > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li > wrote: > > > > > > > +1 (non-binding) > > > > > > > > Thanks Yu Li for driving on this. > > > > > > > > > 在 2019年11月15日,下午8:10,Yu Li 写道: > > > > > > > > > > Hi All, > > > > > > > > > > I would like to start the vote for FLIP-83 [1] which is discussed > and > > > > > reached consensus in the discussion thread [2]. > > > > > > > > > > The vote will be open for at least 72 hours (excluding weekend). > I'll > > > try > > > > > to close it by 2019-11-20 21:00 CST, unless there is an objection > or > > > not > > > > > enough votes. > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > > [2] https://s.apache.org/7fqrz > > > > > > > > > > Best Regards, > > > > > Yu > > > > > > > > > > > > > >
SQL for Avro GenericRecords on Parquet
I have tried to persist generic Avro records in a parquet file and then read them back via ParquetTableSource using SQL. It seems that the SQL is not executed properly! The persisted records are: Id , type 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 22,Type2 333,Type1 22,Type2 The SQL "SELECT id ,recordType_ FROM ParquetTable" returns the above (which is correct). Running "SELECT id ,recordType_ FROM ParquetTable where recordType_='Type1'" will result in: 333,Type1 22,Type1 333,Type1 22,Type1 333,Type1 22,Type1 333,Type1 22,Type1 333,Type1 22,Type1 333,Type1 22,Type1 As if the equals sign were an assignment rather than a comparison… Am I doing something wrong? Is it an issue of GenericRecord vs SpecificRecord?
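[Editorial note] The expected WHERE-clause semantics described above can be sketched with plain Java as a reference point: filtering should only drop non-matching rows, never rewrite their values. The `Row` class below is a hypothetical stand-in for the (id, recordType_) tuples in the mail, not a Flink class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of what "WHERE recordType_ = 'Type1'" should do, using plain
// Java streams. Row is an illustrative record type for this example only.
public class WhereClauseSketch {

    public static final class Row {
        public final long id;
        public final String recordType;

        public Row(long id, String recordType) {
            this.id = id;
            this.recordType = recordType;
        }
    }

    // A WHERE clause is a predicate: keep matching rows, leave all
    // column values untouched.
    public static List<Row> whereTypeEquals(List<Row> rows, String type) {
        return rows.stream()
            .filter(r -> r.recordType.equals(type))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(333, "Type1"), new Row(22, "Type2"));
        List<Row> out = whereTypeEquals(rows, "Type1");
        // Only (333, Type1) survives; the reported behavior instead kept
        // all rows and rewrote recordType_, as if '=' were assignment.
        System.out.println(out.size() + " " + out.get(0).id); // prints "1 333"
    }
}
```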
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (binding)! I think this would be very helpful to detect regression problems. Best, Hequn On Mon, Nov 18, 2019 at 4:28 PM vino yang wrote: > +1 (non-binding) > > Best, > Vino > > jincheng sun 于2019年11月18日周一 下午2:31写道: > > > +1 (binding) > > > > OpenInx 于2019年11月18日周一 下午12:09写道: > > > > > +1 (non-binding) > > > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li > wrote: > > > > > > > +1 (non-binding) > > > > > > > > Thanks Yu Li for driving on this. > > > > > > > > > 在 2019年11月15日,下午8:10,Yu Li 写道: > > > > > > > > > > Hi All, > > > > > > > > > > I would like to start the vote for FLIP-83 [1] which is discussed > and > > > > > reached consensus in the discussion thread [2]. > > > > > > > > > > The vote will be open for at least 72 hours (excluding weekend). > I'll > > > try > > > > > to close it by 2019-11-20 21:00 CST, unless there is an objection > or > > > not > > > > > enough votes. > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > > [2] https://s.apache.org/7fqrz > > > > > > > > > > Best Regards, > > > > > Yu > > > > > > > > > > > > > >
[jira] [Created] (FLINK-14839) JobGraph#userJars & JobGraph#classpaths are non-null
Zili Chen created FLINK-14839: - Summary: JobGraph#userJars & JobGraph#classpaths are non-null Key: FLINK-14839 URL: https://issues.apache.org/jira/browse/FLINK-14839 Project: Flink Issue Type: Improvement Reporter: Zili Chen Assignee: Zili Chen Fix For: 1.10.0 So we can improve our code a bit. -- This message was sent by Atlassian Jira (v8.3.4#803005)
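[Editorial note] The simplification this ticket enables — guaranteeing the jar/classpath collections are never null so call sites can drop null checks — can be sketched with a small holder class. The class and method names below are illustrative only, not the actual JobGraph code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Sketch of the invariant FLINK-14839 proposes: the list fields are
// always non-null (possibly empty), so callers never null-check them.
// NonNullListsSketch is a hypothetical holder for illustration.
public class NonNullListsSketch {

    // Initialized eagerly, never reassigned to null.
    private final List<String> userJars = new ArrayList<>();

    public void addUserJar(String jar) {
        // Reject null elements at the boundary instead of at every reader.
        userJars.add(Objects.requireNonNull(jar, "jar must not be null"));
    }

    // Always returns a non-null (unmodifiable) view, possibly empty.
    public List<String> getUserJars() {
        return Collections.unmodifiableList(userJars);
    }
}
```

With this invariant, consumers can iterate `getUserJars()` directly instead of guarding with `if (jars != null)`.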
Re: [DISCUSS] PyFlink User-Defined Function Resource Management
Thanks for driving this discussion, Dian! +1 for this proposal. It will help to reduce container failure due to the memory overuse. Some comments left in the design doc. Best, Yangze Guo On Mon, Nov 18, 2019 at 4:06 PM Xintong Song wrote: > > Sorry for the late reply. > > +1 for the general proposal. > > And one remainder, to use UNKNOWN resource requirement, we need to make > sure optimizer knowns which operators use off-heap managed memory, and > compute and set a fraction to the operators. See FLIP-53[1] for more > details, and I would suggest you to double check with @Zhu Zhu who works on > this part. > > Thank you~ > > Xintong Song > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management > > On Tue, Nov 12, 2019 at 11:53 AM Dian Fu wrote: > > > Hi Jincheng, > > > > Thanks for the reply and also looking forward to the feedback from the > > community. > > > > Thanks, > > Dian > > > > > 在 2019年11月11日,下午2:34,jincheng sun 写道: > > > > > > Hi all, > > > > > > +1, Thanks for bring up this discussion Dian! > > > > > > The Resource Management is very important for PyFlink UDF. So, It's great > > > if anyone can add more comments or inputs in the design doc or feedback > > in > > > ML. :) > > > > > > Best, > > > Jincheng > > > > > > Dian Fu 于2019年11月5日周二 上午11:32写道: > > > > > >> Hi everyone, > > >> > > >> In FLIP-58[1] it will add the support of Python user-defined stateless > > >> function for Python Table API. It will launch a separate Python process > > for > > >> Python user-defined function execution. The resources used by the Python > > >> process should be managed properly by Flink’s resource management > > >> framework. FLIP-49[2] has proposed a unified memory management framework > > >> and PyFlink user-defined function resource management should be based on > > >> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about > > this. 
I > > >> draft a design doc[3] and want to start a discussion about PyFlink > > >> user-defined function resource management. > > >> > > >> Welcome any comments on the design doc or giving us feedback on the ML > > >> directly. > > >> > > >> Regards, > > >> Dian > > >> > > >> [1] > > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table > > >> [2] > > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors > > >> [3] > > >> > > https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m > > > >
Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework
+1 (non-binding) Best, Vino jincheng sun 于2019年11月18日周一 下午2:31写道: > +1 (binding) > > OpenInx 于2019年11月18日周一 下午12:09写道: > > > +1 (non-binding) > > > > On Mon, Nov 18, 2019 at 11:54 AM aihua li wrote: > > > > > +1 (non-binding) > > > > > > Thanks Yu Li for driving on this. > > > > > > > 在 2019年11月15日,下午8:10,Yu Li 写道: > > > > > > > > Hi All, > > > > > > > > I would like to start the vote for FLIP-83 [1] which is discussed and > > > > reached consensus in the discussion thread [2]. > > > > > > > > The vote will be open for at least 72 hours (excluding weekend). I'll > > try > > > > to close it by 2019-11-20 21:00 CST, unless there is an objection or > > not > > > > enough votes. > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework > > > > [2] https://s.apache.org/7fqrz > > > > > > > > Best Regards, > > > > Yu > > > > > > > > >
[jira] [Created] (FLINK-14838) Cleanup the description about container number config option in Scala and python shell doc
vinoyang created FLINK-14838: Summary: Cleanup the description about container number config option in Scala and python shell doc Key: FLINK-14838 URL: https://issues.apache.org/jira/browse/FLINK-14838 Project: Flink Issue Type: Improvement Components: Documentation Reporter: vinoyang Currently, the config option {{-n}} for Flink on Yarn has not been supported since Flink 1.8. FLINK-12362 did the cleanup job for this config option. However, the Scala shell and Python docs still contain some description of {{-n}}, which may confuse users. This issue is used to track the cleanup work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14837) Support Function DDL in TableEnvironment
Zhenqiu Huang created FLINK-14837: - Summary: Support Function DDL in TableEnvironment Key: FLINK-14837 URL: https://issues.apache.org/jira/browse/FLINK-14837 Project: Flink Issue Type: Sub-task Components: Table SQL / API Affects Versions: 1.10.0 Reporter: Zhenqiu Huang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14836) Unable to set yarn container number for scala shell in yarn mode
Jeff Zhang created FLINK-14836: -- Summary: Unable to set yarn container number for scala shell in yarn mode Key: FLINK-14836 URL: https://issues.apache.org/jira/browse/FLINK-14836 Project: Flink Issue Type: Bug Affects Versions: 1.10.0 Reporter: Jeff Zhang In FLINK-12362, "-n" was removed, which makes it impossible to set the container number for the scala shell in yarn mode. The root cause is that start-scala-shell doesn't use `flink run`, which means it doesn't support all the parameters of `flink run` (including -p). Either we need to revert FLINK-12362, or make start-scala-shell use `flink run`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] PyFlink User-Defined Function Resource Management
Sorry for the late reply. +1 for the general proposal. And one reminder: to use UNKNOWN resource requirements, we need to make sure the optimizer knows which operators use off-heap managed memory, and computes and sets a fraction for those operators. See FLIP-53 [1] for more details, and I would suggest double-checking with @Zhu Zhu, who works on this part. Thank you~ Xintong Song [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management On Tue, Nov 12, 2019 at 11:53 AM Dian Fu wrote: > Hi Jincheng, > > Thanks for the reply and also looking forward to the feedback from the > community. > > Thanks, > Dian > > > 在 2019年11月11日,下午2:34,jincheng sun 写道: > > > > Hi all, > > > > +1, Thanks for bring up this discussion Dian! > > > > The Resource Management is very important for PyFlink UDF. So, It's great > > if anyone can add more comments or inputs in the design doc or feedback > in > > ML. :) > > > > Best, > > Jincheng > > > > Dian Fu 于2019年11月5日周二 上午11:32写道: > > > >> Hi everyone, > >> > >> In FLIP-58[1] it will add the support of Python user-defined stateless > >> function for Python Table API. It will launch a separate Python process > for > >> Python user-defined function execution. The resources used by the Python > >> process should be managed properly by Flink’s resource management > >> framework. FLIP-49[2] has proposed a unified memory management framework > >> and PyFlink user-defined function resource management should be based on > >> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about > this. 
> >> > >> Regards, > >> Dian > >> > >> [1] > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table > >> [2] > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors > >> [3] > >> > https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m > >
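[Editorial note] The fraction computation Xintong mentions — the optimizer assigning each managed-memory-using operator a share of the slot's managed memory when requirements are UNKNOWN — can be sketched as follows. The equal-split policy below is an illustrative assumption for this sketch, not the exact FLIP-53 algorithm, and the operator names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of assigning managed-memory fractions to operators that declared
// UNKNOWN resource requirements. Here every managed-memory user in a slot
// gets an equal fraction; FLIP-53 describes the real, weighted scheme.
public class ManagedMemoryFractionSketch {

    public static Map<String, Double> assignFractions(List<String> managedMemoryUsers) {
        Map<String, Double> fractions = new HashMap<>();
        if (managedMemoryUsers.isEmpty()) {
            return fractions; // no managed-memory users, nothing to assign
        }
        double share = 1.0 / managedMemoryUsers.size();
        for (String op : managedMemoryUsers) {
            fractions.put(op, share); // each operator gets an equal slice
        }
        return fractions;
    }
}
```

A Python UDF operator would then size its external Python process from its fraction of the slot's total off-heap managed memory.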