[jira] [Created] (FLINK-15240) is_generic key is missing for Flink table stored in HiveCatalog
Bowen Li created FLINK-15240: Summary: is_generic key is missing for Flink table stored in HiveCatalog Key: FLINK-15240 URL: https://issues.apache.org/jira/browse/FLINK-15240 Project: Flink Issue Type: Bug Components: Connectors / Hive Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.10.0, 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15239) TM Metaspace memory leak
Rui Li created FLINK-15239: -- Summary: TM Metaspace memory leak Key: FLINK-15239 URL: https://issues.apache.org/jira/browse/FLINK-15239 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Reporter: Rui Li Fix For: 1.10.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15238) A sql can't generate a valid execution plan
xiaojin.wy created FLINK-15238: -- Summary: A sql can't generate a valid execution plan Key: FLINK-15238 URL: https://issues.apache.org/jira/browse/FLINK-15238 Project: Flink Issue Type: Bug Components: Table SQL / Client Affects Versions: 1.10.0 Reporter: xiaojin.wy The table and the query is like this: After execution the sql, the exception will appear: [ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query: LogicalProject(deptno=[$0], x=[$3]) LogicalJoin(condition=[true], joinType=[left]) LogicalTableScan(table=[[default_catalog, default_database, scott_dept]]) LogicalSort(sort0=[$0], dir0=[ASC], fetch=[1]) LogicalProject(empno=[$0]) LogicalTableScan(table=[[default_catalog, default_database, scott_emp]]) This exception indicates that the query uses an unsupported SQL feature. Please check the documentation for the set of currently supported SQL features. The whole exception is: Caused by: org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query:Caused by: org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query: LogicalProject(deptno=[$0], x=[$3]) LogicalJoin(condition=[true], joinType=[left]) LogicalTableScan(table=[[default_catalog, default_database, scott_dept]]) LogicalSort(sort0=[$0], dir0=[ASC], fetch=[1]) LogicalProject(empno=[$0]) LogicalTableScan(table=[[default_catalog, default_database, scott_emp]]) This exception indicates that the query uses an unsupported SQL feature.Please check the documentation for the set of currently supported SQL features. at org.apache.flink.table.plan.Optimizer.runVolcanoPlanner(Optimizer.scala:284) at org.apache.flink.table.plan.Optimizer.optimizeLogicalPlan(Optimizer.scala:199) at org.apache.flink.table.plan.StreamOptimizer.optimize(StreamOptimizer.scala:66) at org.apache.flink.table.planner.StreamPlanner.translateToType(StreamPlanner.scala:389) at org.apache.flink.table.planner.StreamPlanner.writeToRetractSink(StreamPlanner.scala:308) at org.apache.flink.table.planner.StreamPlanner.org$apache$flink$table$planner$StreamPlanner$$writeToSink(StreamPlanner.scala:272) at org.apache.flink.table.planner.StreamPlanner$$anonfun$2.apply(StreamPlanner.scala:166) at org.apache.flink.table.planner.StreamPlanner$$anonfun$2.apply(StreamPlanner.scala:145) at scala.Option.map(Option.scala:146) at org.apache.flink.table.planner.StreamPlanner.org$apache$flink$table$planner$StreamPlanner$$translate(StreamPlanner.scala:145) at org.apache.flink.table.planner.StreamPlanner$$anonfun$translate$1.apply(StreamPlanner.scala:117) at org.apache.flink.table.planner.StreamPlanner$$anonfun$translate$1.apply(StreamPlanner.scala:117) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.StreamPlanner.translate(StreamPlanner.scala:117) at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:680) at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.insertIntoInternal(TableEnvironmentImpl.java:353) at org.apache.flink.table.api.internal.TableEnvironmentImpl.insertInto(TableEnvironmentImpl.java:341) at org.apache.flink.table.api.internal.TableImpl.insertInto(TableImpl.java:428) at org.apache.flink.table.client.gateway.local.LocalExecutor.lambda$executeQueryInternal$12(LocalExecutor.java:640) at org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:227) at org.apache.flink.table.client.gateway.local.LocalExecutor.executeQueryInternal(LocalExecutor.java:638) ... 8 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15237) CsvTableSource and ref class better to Extract into a module
xiaodao created FLINK-15237: --- Summary: CsvTableSource and ref class better to Extract into a module Key: FLINK-15237 URL: https://issues.apache.org/jira/browse/FLINK-15237 Project: Flink Issue Type: Improvement Components: Table SQL / API Affects Versions: 1.10.0 Reporter: xiaodao Currently the CsvTable-related classes, e.g. CsvTableSource, are contained in flink-table-api-java-bridge; I think it's better to extract them into a new connector module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15236) Add a safety net for concurrent checkpoints on TM side
Congxian Qiu(klion26) created FLINK-15236: - Summary: Add a safety net for concurrent checkpoints on TM side Key: FLINK-15236 URL: https://issues.apache.org/jira/browse/FLINK-15236 Project: Flink Issue Type: Improvement Components: Runtime / Checkpointing Reporter: Congxian Qiu(klion26) As discussed in FLINK-13808, we can add an additional config option {{taskmanager.checkpoints.max-concurrent}} that limits the number of concurrent checkpoints on the TM as a safety net. * If maxConcurrentCheckpoints = 1, the default value of {{taskmanager.checkpoints.max-concurrent}} is 1. * If maxConcurrentCheckpoints > 1, the default value of {{taskmanager.checkpoints.max-concurrent}} is unlimited. The limit should not take manually triggered checkpoints/savepoints into account. -- This message was sent by Atlassian Jira (v8.3.4#803005)
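A minimal sketch of the proposed default resolution under the assumptions above. The option key follows the ticket, but the class and helper method are purely illustrative and not actual Flink code:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

/** Illustrative only: resolves the proposed TM-side concurrent checkpoint limit. */
public class CheckpointSafetyNet {

    // Proposed option from the ticket; -1 stands for "not explicitly set".
    public static final ConfigOption<Integer> MAX_CONCURRENT =
            ConfigOptions.key("taskmanager.checkpoints.max-concurrent").defaultValue(-1);

    /**
     * Default: 1 if the job uses maxConcurrentCheckpoints = 1, otherwise unlimited.
     * Manually triggered checkpoints/savepoints would not be counted against the limit.
     */
    public static int resolveLimit(Configuration conf, int maxConcurrentCheckpoints) {
        int configured = conf.getInteger(MAX_CONCURRENT);
        if (configured > 0) {
            return configured;
        }
        return maxConcurrentCheckpoints == 1 ? 1 : Integer.MAX_VALUE; // unlimited
    }
}
{code}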
[jira] [Created] (FLINK-15235) create a Flink distribution for hive that includes all Hive dependencies
Bowen Li created FLINK-15235: Summary: create a Flink distribution for hive that includes all Hive dependencies Key: FLINK-15235 URL: https://issues.apache.org/jira/browse/FLINK-15235 Project: Flink Issue Type: Task Components: Connectors / Hive, Release System Reporter: Bowen Li Consider creating a Flink distribution for Hive that includes all Hive dependencies, in addition to the existing Flink-only distribution, to improve the quickstart experience for Hive users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15234) hive table created from flink catalog table cannot have null properties in parameters
Bowen Li created FLINK-15234: Summary: hive table created from flink catalog table cannot have null properties in parameters Key: FLINK-15234 URL: https://issues.apache.org/jira/browse/FLINK-15234 Project: Flink Issue Type: Bug Components: Connectors / Hive Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.10.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15233) Improve Kafka connector properties make append update-mode as default
Jark Wu created FLINK-15233: --- Summary: Improve Kafka connector properties make append update-mode as default Key: FLINK-15233 URL: https://issues.apache.org/jira/browse/FLINK-15233 Project: Flink Issue Type: Bug Components: Connectors / Kafka Reporter: Jark Wu Currently, {{update-mode}} is a required property of the Kafka connector. However, Kafka only supports the append {{update-mode}}. It is confusing and user-unfriendly to make such a property mandatory. We can make this property optional and default it to {{append}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
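A sketch of how the property could be treated as optional with an {{append}} default during connector property validation. The helper below is illustrative and is not the actual KafkaTableSourceSinkFactory code:

{code:java}
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: defaulting the Kafka connector's update-mode to "append". */
public class UpdateModeDefaulting {

    static final String UPDATE_MODE = "update-mode"; // property key from the connector descriptor

    /** Returns the effective update mode, falling back to "append" when the key is absent. */
    static String effectiveUpdateMode(Map<String, String> properties) {
        String mode = properties.getOrDefault(UPDATE_MODE, "append");
        if (!"append".equals(mode)) {
            throw new IllegalArgumentException(
                    "Kafka only supports update-mode 'append', got: " + mode);
        }
        return mode;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>(); // user omitted update-mode entirely
        System.out.println(effectiveUpdateMode(props)); // prints "append"
    }
}
{code}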
[jira] [Created] (FLINK-15232) Print match candidates to improve NoMatchingTableFactoryException
Jingsong Lee created FLINK-15232: Summary: Print match candidates to improve NoMatchingTableFactoryException Key: FLINK-15232 URL: https://issues.apache.org/jira/browse/FLINK-15232 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Jingsong Lee Fix For: 1.10.0 Currently, all the required properties must exist and match, otherwise {{NoMatchingTableFactoryException}} will be thrown. We can pick the best candidate and print what is wrong, printing requiredContext and supportedProperties in different formats. In this way, users can know which property in requiredContext is missing, and which property is not allowed by supportedProperties. -- This message was sent by Atlassian Jira (v8.3.4#803005)
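One way the "best candidate" could be chosen is by scoring each factory on how many requested properties its requiredContext matches and then reporting the closest one. This is an illustrative sketch, not the actual TableFactoryService logic:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative only: pick the factory whose requiredContext matches the most requested properties. */
public class BestCandidatePicker {

    interface FactoryDescriptor {
        String name();
        Map<String, String> requiredContext();
        List<String> supportedProperties();
    }

    static String describeBestCandidate(Map<String, String> requested, List<FactoryDescriptor> factories) {
        FactoryDescriptor best = factories.stream()
                .max(Comparator.comparingLong((FactoryDescriptor f) -> f.requiredContext().entrySet().stream()
                        .filter(e -> e.getValue().equals(requested.get(e.getKey())))
                        .count()))
                .orElseThrow(() -> new IllegalStateException("no factories found"));

        StringBuilder msg = new StringBuilder("Closest factory: " + best.name() + "\n");
        // Which required properties are missing or mismatched.
        best.requiredContext().forEach((k, v) -> {
            if (!v.equals(requested.get(k))) {
                msg.append("  missing/mismatched required property: ").append(k).append('=').append(v).append('\n');
            }
        });
        // Which requested properties the candidate does not support at all.
        requested.keySet().stream()
                .filter(k -> !best.requiredContext().containsKey(k) && !best.supportedProperties().contains(k))
                .forEach(k -> msg.append("  unsupported property: ").append(k).append('\n'));
        return msg.toString();
    }
}
{code}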
[jira] [Created] (FLINK-15231) Wrong HeapVector in AbstractHeapVector.createHeapColumn
Zhenghua Gao created FLINK-15231: Summary: Wrong HeapVector in AbstractHeapVector.createHeapColumn Key: FLINK-15231 URL: https://issues.apache.org/jira/browse/FLINK-15231 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.10.0 Reporter: Zhenghua Gao Fix For: 1.10.0 For TIMESTAMP WITHOUT TIME ZONE/TIMESTAMP WITH LOCAL TIME ZONE/DECIMAL types, AbstractHeapVector.createHeapColumn generates wrong HeapVectors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] FLIP policy for introducing config option keys
Hi all, It seems that this discussion is idle, but I want to resume it because I think this will make our community run faster and keep us in the safe side. I would like to summarize the discussions so far (please correct me if I'm wrong): 1. we all agree to have a VOTE in mailing list for such small API changes 2. the voting period can be shorter (48h?) -- Dawid, Jark, Timo 3. requires a formal FLIP process is too heavy -- Zili Chen, Jark, Dawid, Xingtong - FLIP number will explode - difficult to track major design changes - we can loosen some formal requirements for such changes 4. introduce a new kind of process -- Hequn, Jark, Xingtong -- My proposal is introducing a new process similar to Xingtong pointed. 1. For small API changes, we don't need a formal FLIP process. 2. The discussion can happen in JIRA issue, a [DISCUSS] thread is not mandatory. 3. The JIRA issue should describe API changes clearly in the description. 4. Once the proposal is finalized call a [VOTE] to have the proposal adopted. The vote requires 3 binding votes (Committers), and at least 2 days. This is a new kind of voting actions which should be added to Flink Bylaws. - Further discussion: 1. We need to define **what is small API changes**. 2. Do we need a place (wiki?) to track all such proposals/JIRA issues and how? Best, Jark On Wed, 16 Oct 2019 at 17:56, Timo Walther wrote: > Hi all, > > I agree with Jark. Having a voting with at least 3 binding votes makes > sense for API changes. It also forces people to question the > introduction of another config option that might make the configuration > of Flink more complicated. A FLIP is usually a bigger effort with long > term impacts on the general usability. A shorter voting period of 48 > hours for just a little config option sounds reasonable to me. > > Regards, > Timo > > On 16.10.19 10:36, Xintong Song wrote: > > Hi all, > > > > I think config option changes deserves a voting process, but not > > necessarily a FLIP. > > > > My concern for always having FLIPs on config option changes is that, we > > might result in too many FLIPs, which makes it difficult for people who > > wants to track the major design changes other than pure config option > > changes. > > > > > > My two cents, > > > > - For config option changes introduced by FLIPs, they need to be > explicitly > > described in the FLIP document and voted. We do have a section 'Public > > Interface' for that in the FLIP template. > > > > - For config option changes introduced by other JIRA tickets / PRs, they > > need to be voted in the ML. We can add a statement 'whether the PR > > introduce any config option changes' in the PR template, and if the > answer > > is yes, a link to the ML vote thread is required to be attached before > > getting the PR merged. > > > > > > What do you think? > > > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Wed, Oct 16, 2019 at 2:50 PM Zili Chen wrote: > > > >> Hi Jark & Hequn, > >> > >> Do you stick to introduce a looser FLIP? We possibly introduce a > redundant > >> extra type > >> of community consensus if we are able to just reuse the process of > current > >> FLIP. Given > >> the activity of our community I don't think it costs too much for a > config > >> option keys > >> change with 3 days at least voting required >3 committer votes. > >> > >> Best, > >> tison. > >> > >> > >> Hequn Cheng 于2019年10月16日周三 下午2:29写道: > >> > >>> Hi all, > >>> > >>> +1 to have a looser FLIP policy for these API changes. > >>> > >>> I think the concerns raised above are all valid. 
Besides the feedbacks > >> from > >>> @Jark, if we want to keep track of these changes, maybe we can create a > >> new > >>> kind of FLIP that is dedicated to these minor API changes? For example, > >> we > >>> can add a single wiki page and list all related JIRAs in it. The design > >>> details can be described in the JIRA. > >>> Another option is to simply add a new JIRA label to track these > changes. > >>> > >>> What do you think? > >>> > >>> Best, Hequn > >>> > >>> On Tue, Oct 15, 2019 at 8:43 PM Zili Chen > wrote: > >>> > The naming concern above can be a separated issue since it looks also > affect FLIP-54 and isn't limited for config option changes FLIP. > > Best, > tison. > > > Aljoscha Krettek 于2019年10月15日周二 下午8:37写道: > > > Another PR that introduces new config options: > > https://github.com/apache/flink/pull/9759 > > > >> On 15. Oct 2019, at 14:31, Zili Chen wrote: > >> > >> Hi Aljoscha & Dawid & Kostas, > >> > >> I agree that changes on config option keys deserve a FLIP and it is > >> reasonable > >> we commit the changes with a standard FLIP process so that ensure > >> the > > change > >> given proper visibility. > >> > >> My concern is about naming. Give
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi Timo, I understand we need further discussion about syntax/dialect for 1.11. But as Jark has pointed out, the current implementation violates the accepted design of FLIP-63, which IMO qualifies as a bug. Given that it's a bug and has great impact on the usability of our Hive integration, do you think we can fix it in 1.10? On Fri, Dec 13, 2019 at 12:24 AM Jingsong Li wrote: > Hi Timo, > > I am OK if you think they are not bug and they should not be included in > 1.10. > > I think they have been accepted in FLIP-63. And there is no objection. It > has been more than three months since the discussion of FLIP-63. It's been > six months since Flink added these two syntaxs. > > But I can also start discussion and vote thread for FLIP-63 again, to make > sure once again that everyone is happy. > > Best, > Jingsong Lee > > On Thu, Dec 12, 2019 at 11:35 PM Timo Walther wrote: > > > Hi Jingsong, > > > > I will also add my opinion here for future discussions: > > > > We had long discussions around SQL syntax in the past (e.g. for > > WATERMARK or the concept of SYSTEM/TEMPORARY catalog objects) but in the > > end all parties were happy and we came up with a good long-term solution > > that is unlikely to be changed in the near future. IMHO we can differ > > from the standard esp. for DDL statements (every SQL vendors has custom > > syntax there) but still we should have time to hear many opinions. > > However, for DQL and DML statements we need to be even more cautious > > because here we enter SQL standard areas. > > > > Happy to discuss a partion syntax for 1.11 and go with the solution that > > Jark proposed using a config option. > > > > Thanks, > > Timo > > > > > > On 12.12.19 09:40, Jark Wu wrote: > > > Hi Jingsong, > > > > > > Thanks for the explanation, I think I misunderstood your point at the > > > begining. > > > As FLIP-63 proposed, INSERT OVERWRITE and INSERT PARTITION syntax are > > added > > > to Flink's > > > SQL syntax, but CREATE PARTITION TABLE should be limited under Hive > > > dialect. > > > However, the current implementation is opposite, INSERT OVERWRITE > > > and INSERT PARTITION are > > > under the dialect limitation, but CREATE PARTITION TABLE is not. > > > > > > So it is indeed a bug which should be fixed in 1.10. > > > > > > Best, > > > Jark > > > > > > On Thu, 12 Dec 2019 at 16:35, Jingsong Li > > wrote: > > > > > >> Hi Jark, > > >> > > >> Let's recall FLIP-63, > > >> We supported these syntax in hive dialect at 1.9. All of my reasons > for > > >> launching FLIP-63 are to bring partition support to Flink itself. > > >> Not only batch, but also we have the need to stream jobs to write > > partition > > >> files today, which is also one of our very important application > > scenarios. > > >> > > >> The original intention of FLIP-63 is to bring all partition syntax to > > >> Flink, but in the end you and I have some different opinion in > creating > > >> partition table, so our consensus is to leave it in hive dialect, only > > it. > > >> > > >> [1] > > >> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > > >> > > >> Best, > > >> Jingsong Lee > > >> > > >> On Thu, Dec 12, 2019 at 4:05 PM Jark Wu wrote: > > >> > > >>> Hi jingsong, > > >>> > > >>> Watermark is not a standard syntax, that's why we had a FLIP and long > > >>> discussion to add it to > > >>> Flink's SQL syntax. 
I think if we want to add INSERT OVERWRITE and > > >>> PARTITION syntax to > > >>> Flink's own syntax, we also need a FLIP or a VOTE, and this may > > can't > > >>> happen soon (we should > > >>> hear more people's opinions on this). > > >>> > > >>> Regarding to the sql-dialect configuration, I was not saying to > involve > > >> the > > >>> whole FLIP-89. I mean we can just > > >>> start a VOTE to expose it as `table.planner.sql-dialect` and include > it > > >> in > > >>> 1.10. The change can be very small, by > > >>> adding a ConfigOption and changing the implementation of > > >>> TableConfig#getSqlDialect/setSqlDialect. I believe > > >>> it is smaller and safer than changing the parser. > > >>> > > >>> Btw, I cc'ed Yu Li and Gary into the discussion, because release > > managers > > >>> should be aware of this. > > >>> > > >>> Best, > > >>> Jark > > >>> > > >>> > > >>> > > >>> On Thu, 12 Dec 2019 at 11:47, Danny Chan > wrote: > > >>> > > Thanks Jingsong for bring up this discussion ~ > > > > After reviewing FLIP-63, it seems that we have made a conclusion for > > >> the > > syntax > > > > - INSERT OVERWRITE ... > > - INSERT INTO … PARTITION > > > > Which means that they should not have the Hive dialect limitation, > so > > >> I’m > > inclined that the behaviors for SQL-CLI is unexpected, or a “bug” > that > > >>> need > > to fix. > > > > We did not make a conclusion for the syntax: > > > > - CREATE TABLE … PARTITIONED BY ... > > > > Which means that the behavior of i
[jira] [Created] (FLINK-15230) flink1.9.1 table API JSON schema array type exception
kevin created FLINK-15230: - Summary: flink1.9.1 table API JSON schema array type exception Key: FLINK-15230 URL: https://issues.apache.org/jira/browse/FLINK-15230 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Planner Affects Versions: 1.9.1, 1.9.0 Environment: flink1.9.1 Reporter: kevin strings: \{ type: 'array', items: { type: 'string' } } .field("strings", Types.OBJECT_ARRAY(Types.STRING)) Exception in thread "main" org.apache.flink.table.api.ValidationException: Type LEGACY(BasicArrayTypeInfo) of table field 'strings' does not match with type BasicArrayTypeInfo of the field 'strings' of the TableSource return type.Exception in thread "main" org.apache.flink.table.api.ValidationException: Type LEGACY(BasicArrayTypeInfo) of table field 'strings' does not match with type BasicArrayTypeInfo of the field 'strings' of the TableSource return type. at org.apache.flink.table.planner.sources.TableSourceUtil$$anonfun$4.apply(TableSourceUtil.scala:121) at org.apache.flink.table.planner.sources.TableSourceUtil$$anonfun$4.apply(TableSourceUtil.scala:92) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.flink.table.planner.sources.TableSourceUtil$.computeIndexMapping(TableSourceUtil.scala:92) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:100) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:55) at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:54) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlan(StreamExecTableSourceScan.scala:55) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:86) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:46) at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:54) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlan(StreamExecCalc.scala:46) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToTransformation(StreamExecSink.scala:185) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:154) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:50) at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:54) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:50) at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:61) at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:60) at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:60) at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:149) at org.apache.flink.table.api.java.internal.StreamTableEnvironmentImpl.toDataStream(StreamTableEnvironmentImpl.java:319) at org.apache.flink.table.api.java.internal.StreamTableEnvironmentImpl.toAppendStream(StreamTableEnvironmentImpl.java:227) at org.apache.flink.table.api.java.internal.StreamTableEnvironmentImpl.toAppendStream(StreamTableEnvironmentImpl.java:218) at com.wf.flink.sql.TableSqlJson.main(TableSqlJson.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.Nati
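For reference, the schema combination that triggers the mismatch above can be sketched as follows. The descriptor calls exist in the 1.9 connect API, while the JSON schema string and field name are taken from the report; the surrounding connector/registration code is omitted:

{code:java}
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Schema;

/** Illustrative only: JSON format and table schema combination reported in the ticket. */
public class JsonArrayFieldExample {
    public static void main(String[] args) {
        // JSON schema declaring "strings" as an array of strings.
        Json format = new Json().jsonSchema(
                "{ \"type\": \"object\", \"properties\": { \"strings\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } } } }");

        // Table schema declaring the same field as an object array of STRING;
        // in 1.9.x the blink planner reports a LEGACY(BasicArrayTypeInfo) vs BasicArrayTypeInfo mismatch.
        Schema schema = new Schema().field("strings", Types.OBJECT_ARRAY(Types.STRING));

        // These descriptors would then be passed to tableEnv.connect(...).withFormat(format).withSchema(schema)...
        System.out.println(format);
        System.out.println(schema);
    }
}
{code}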
[jira] [Created] (FLINK-15229) DDL for kafka connector following documentation doesn't work
Bowen Li created FLINK-15229: Summary: DDL for kafka connector following documentation doesn't work Key: FLINK-15229 URL: https://issues.apache.org/jira/browse/FLINK-15229 Project: Flink Issue Type: Bug Components: Connectors / Kafka, Documentation Reporter: Bowen Li Fix For: 1.10.0 https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connect.html#kafka-connector -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15228) Drop vendor specific deployment documentation
Seth Wiesman created FLINK-15228: Summary: Drop vendor specific deployment documentation Key: FLINK-15228 URL: https://issues.apache.org/jira/browse/FLINK-15228 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Seth Wiesman Fix For: 1.10.0, 1.11.0 Based on a mailing list discussion we want to drop vendor specific deployment documentation ml discussion: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-vendor-specific-deployment-documentation-td35457.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15227) Fix broken documentation build
Seth Wiesman created FLINK-15227: Summary: Fix broken documentation build Key: FLINK-15227 URL: https://issues.apache.org/jira/browse/FLINK-15227 Project: Flink Issue Type: Bug Reporter: Seth Wiesman Fix For: 1.10.0, 1.11.0 Incremental build: enabled Generating... Liquid Exception: Liquid syntax error (line 421): 'highlight' tag was never closed in dev/table/catalogs.zh.md Liquid syntax error (line 421): 'highlight' tag was never closed /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/block.rb:63:in `block in parse_body' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-c-4.0.0/lib/liquid/c.rb:30:in `block in parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-c-4.0.0/lib/liquid/c.rb:30:in `c_parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-c-4.0.0/lib/liquid/c.rb:30:in `parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/block.rb:58:in `parse_body' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/block.rb:12:in `parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/tag.rb:10:in `parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-c-4.0.0/lib/liquid/c.rb:30:in `c_parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-c-4.0.0/lib/liquid/c.rb:30:in `parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/document.rb:10:in `parse' /Users/sjwiesman/flink/docs/.rubydeps/ruby/2.5.0/gems/liquid-4.0.3/lib/liquid/document.rb:5:in `parse' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15226) Running job with application parameters fails on a job cluster
Biju Nair created FLINK-15226: - Summary: Running job with application parameters fails on a job cluster Key: FLINK-15226 URL: https://issues.apache.org/jira/browse/FLINK-15226 Project: Flink Issue Type: Bug Components: Runtime / Task Reporter: Biju Nair Trying to move a job which takes application parameters from a session cluster to a job cluster, and it fails with the following error in the task manager {noformat} 2019-11-23 01:29:16,498 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Could not parse the command line options. org.apache.flink.runtime.entrypoint.FlinkParseException: Failed to parse the command line arguments. at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:52) at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.loadConfiguration(TaskManagerRunner.java:315) at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:284) Caused by: org.apache.commons.cli.MissingOptionException: Missing required option: c at org.apache.commons.cli.DefaultParser.checkRequiredOptions(DefaultParser.java:199) at org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:130) at org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:81) at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:50) ... 2 more usage: TaskManagerRunner -c [-D ] -c,--configDir Directory which contains the configuration file flink-conf.yml. -D use value for given property Exception in thread "main" org.apache.flink.runtime.entrypoint.FlinkParseException: Failed to parse the command line arguments. at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:52) at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.loadConfiguration(TaskManagerRunner.java:315) at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:284) Caused by: org.apache.commons.cli.MissingOptionException: Missing required option: c at org.apache.commons.cli.DefaultParser.checkRequiredOptions(DefaultParser.java:199) at org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:130) at org.apache.commons.cli.DefaultParser.parse(DefaultParser.java:81) at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:50) ... 2 more{noformat} Looking at the code, this may be due to adding the {{configDir}} parameter at the end in the {{taskmanager.sh}} script [here|https://github.com/apache/flink/blob/d43a1dd397dbc7c0559a07b83380fed164114241/flink-dist/src/main/flink-bin/bin/taskmanager.sh#L73]. Based on the command line parser [logic|https://github.com/apache/flink/blob/6258a4c333ce9dba914621b13eac2f7d91f5cb72/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/parser/CommandLineParser.java#L50], parsing stops once a parameter without {{-D}} or {{--configDir/-c}} is encountered, which is the case here. If this diagnosis is correct, can the {{configDir}} parameter be added at the beginning instead of the end of the args list in the task manager shell script? Thanks for your input in advance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
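To illustrate why the argument order matters here: a simplified, standalone sketch of a parser that, like commons-cli's DefaultParser with stopAtNonOption enabled, stops consuming options at the first unrecognized token, so a trailing --configDir is never seen. This is not the actual CommandLineParser code:

{code:java}
/** Illustrative only: options after the first non-option token are ignored. */
public class StopAtNonOptionDemo {
    public static void main(String[] args) {
        // Simulates taskmanager.sh passing the user's application args first and --configDir last.
        String[] argv = {"--myAppParam", "foo", "--configDir", "/opt/flink/conf"};

        String configDir = null;
        for (int i = 0; i < argv.length; i++) {
            if ("--configDir".equals(argv[i]) || "-c".equals(argv[i])) {
                configDir = argv[++i];
            } else if (argv[i].startsWith("-D")) {
                // dynamic property, keep scanning
            } else {
                break; // first unknown token: parsing stops, --configDir is never reached
            }
        }
        System.out.println("configDir = " + configDir); // prints "configDir = null"
    }
}
{code}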
[DISCUSS] Improve documentation / tooling around security of Flink
Hi all, There was recently a private report to the Flink PMC, as well as publicly [1] about Flink's ability to execute arbitrary code. In scenarios where Flink is accessible by somebody unauthorized, this can lead to issues. The PMC received a similar report in November 2018. I believe it would be good to warn our users a bit more prominently about the risks of accidentally opening up Flink to the public internet, or other unauthorized entities. I have collected the following potential solutions discussed so far: a) Add a check-security.sh script, or a check into the frontend if the JobManager can be reached on the public internet b) Add a prominent warning to the download page c) add an opt-out warning to the Flink logs / UI that can be disabled via the config. d) Bind the REST endpoint to localhost only, by default I'm curious to hear if others have other ideas what to do. I personally like to kick things off with b). Best, Robert [1] https://twitter.com/pyn3rd/status/1197397475897692160
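For option d), a sketch of what the default could look like programmatically, using the existing rest.bind-address option. This only illustrates the idea; whether and how the default changes is exactly what is under discussion:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

/** Illustrative only: restricting the REST endpoint to the loopback interface. */
public class LocalhostOnlyRest {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Equivalent to setting `rest.bind-address: localhost` in flink-conf.yaml.
        conf.setString(RestOptions.BIND_ADDRESS, "localhost");
        System.out.println(conf.getString(RestOptions.BIND_ADDRESS));
    }
}
{code}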
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi Timo, I am OK if you think they are not bug and they should not be included in 1.10. I think they have been accepted in FLIP-63. And there is no objection. It has been more than three months since the discussion of FLIP-63. It's been six months since Flink added these two syntaxs. But I can also start discussion and vote thread for FLIP-63 again, to make sure once again that everyone is happy. Best, Jingsong Lee On Thu, Dec 12, 2019 at 11:35 PM Timo Walther wrote: > Hi Jingsong, > > I will also add my opinion here for future discussions: > > We had long discussions around SQL syntax in the past (e.g. for > WATERMARK or the concept of SYSTEM/TEMPORARY catalog objects) but in the > end all parties were happy and we came up with a good long-term solution > that is unlikely to be changed in the near future. IMHO we can differ > from the standard esp. for DDL statements (every SQL vendors has custom > syntax there) but still we should have time to hear many opinions. > However, for DQL and DML statements we need to be even more cautious > because here we enter SQL standard areas. > > Happy to discuss a partion syntax for 1.11 and go with the solution that > Jark proposed using a config option. > > Thanks, > Timo > > > On 12.12.19 09:40, Jark Wu wrote: > > Hi Jingsong, > > > > Thanks for the explanation, I think I misunderstood your point at the > > begining. > > As FLIP-63 proposed, INSERT OVERWRITE and INSERT PARTITION syntax are > added > > to Flink's > > SQL syntax, but CREATE PARTITION TABLE should be limited under Hive > > dialect. > > However, the current implementation is opposite, INSERT OVERWRITE > > and INSERT PARTITION are > > under the dialect limitation, but CREATE PARTITION TABLE is not. > > > > So it is indeed a bug which should be fixed in 1.10. > > > > Best, > > Jark > > > > On Thu, 12 Dec 2019 at 16:35, Jingsong Li > wrote: > > > >> Hi Jark, > >> > >> Let's recall FLIP-63, > >> We supported these syntax in hive dialect at 1.9. All of my reasons for > >> launching FLIP-63 are to bring partition support to Flink itself. > >> Not only batch, but also we have the need to stream jobs to write > partition > >> files today, which is also one of our very important application > scenarios. > >> > >> The original intention of FLIP-63 is to bring all partition syntax to > >> Flink, but in the end you and I have some different opinion in creating > >> partition table, so our consensus is to leave it in hive dialect, only > it. > >> > >> [1] > >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > >> > >> Best, > >> Jingsong Lee > >> > >> On Thu, Dec 12, 2019 at 4:05 PM Jark Wu wrote: > >> > >>> Hi jingsong, > >>> > >>> Watermark is not a standard syntax, that's why we had a FLIP and long > >>> discussion to add it to > >>> Flink's SQL syntax. I think if we want to add INSERT OVERWRITE and > >>> PARTITION syntax to > >>> Flink's own syntax, we also need a FLIP or a VOTE, and this may > can't > >>> happen soon (we should > >>> hear more people's opinions on this). > >>> > >>> Regarding to the sql-dialect configuration, I was not saying to involve > >> the > >>> whole FLIP-89. I mean we can just > >>> start a VOTE to expose it as `table.planner.sql-dialect` and include it > >> in > >>> 1.10. The change can be very small, by > >>> adding a ConfigOption and changing the implementation of > >>> TableConfig#getSqlDialect/setSqlDialect. I believe > >>> it is smaller and safer than changing the parser. 
> >>> > >>> Btw, I cc'ed Yu Li and Gary into the discussion, because release > managers > >>> should be aware of this. > >>> > >>> Best, > >>> Jark > >>> > >>> > >>> > >>> On Thu, 12 Dec 2019 at 11:47, Danny Chan wrote: > >>> > Thanks Jingsong for bring up this discussion ~ > > After reviewing FLIP-63, it seems that we have made a conclusion for > >> the > syntax > > - INSERT OVERWRITE ... > - INSERT INTO … PARTITION > > Which means that they should not have the Hive dialect limitation, so > >> I’m > inclined that the behaviors for SQL-CLI is unexpected, or a “bug” that > >>> need > to fix. > > We did not make a conclusion for the syntax: > > - CREATE TABLE … PARTITIONED BY ... > > Which means that the behavior of it is under-discussion, so it is okey > >> to > be without the HIVE dialect limitation, we do not actually have any > >> table > sources/sinks that support such a DDL so for current code base, users > should not be influenced by the behaviors change. > > So I’m > > +1 to remove the hive dialect limitations for INSERT OVERWRITE and > >> INSERT > PARTITION > +0 to add yaml dialect conf to SQL-CLI because FLIP-89 is not finished > yet, we better do this until FLIP-89 is resolved. > > Best, > Danny Chan > 在 2019年12月11日 +0800 PM5:29,Jingsong Li ,写道: > > Hi Dev, > > > > After cutting
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi Jingsong, I will also add my opinion here for future discussions: We had long discussions around SQL syntax in the past (e.g. for WATERMARK or the concept of SYSTEM/TEMPORARY catalog objects) but in the end all parties were happy and we came up with a good long-term solution that is unlikely to be changed in the near future. IMHO we can differ from the standard esp. for DDL statements (every SQL vendors has custom syntax there) but still we should have time to hear many opinions. However, for DQL and DML statements we need to be even more cautious because here we enter SQL standard areas. Happy to discuss a partion syntax for 1.11 and go with the solution that Jark proposed using a config option. Thanks, Timo On 12.12.19 09:40, Jark Wu wrote: Hi Jingsong, Thanks for the explanation, I think I misunderstood your point at the begining. As FLIP-63 proposed, INSERT OVERWRITE and INSERT PARTITION syntax are added to Flink's SQL syntax, but CREATE PARTITION TABLE should be limited under Hive dialect. However, the current implementation is opposite, INSERT OVERWRITE and INSERT PARTITION are under the dialect limitation, but CREATE PARTITION TABLE is not. So it is indeed a bug which should be fixed in 1.10. Best, Jark On Thu, 12 Dec 2019 at 16:35, Jingsong Li wrote: Hi Jark, Let's recall FLIP-63, We supported these syntax in hive dialect at 1.9. All of my reasons for launching FLIP-63 are to bring partition support to Flink itself. Not only batch, but also we have the need to stream jobs to write partition files today, which is also one of our very important application scenarios. The original intention of FLIP-63 is to bring all partition syntax to Flink, but in the end you and I have some different opinion in creating partition table, so our consensus is to leave it in hive dialect, only it. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support Best, Jingsong Lee On Thu, Dec 12, 2019 at 4:05 PM Jark Wu wrote: Hi jingsong, Watermark is not a standard syntax, that's why we had a FLIP and long discussion to add it to Flink's SQL syntax. I think if we want to add INSERT OVERWRITE and PARTITION syntax to Flink's own syntax, we also need a FLIP or a VOTE, and this may can't happen soon (we should hear more people's opinions on this). Regarding to the sql-dialect configuration, I was not saying to involve the whole FLIP-89. I mean we can just start a VOTE to expose it as `table.planner.sql-dialect` and include it in 1.10. The change can be very small, by adding a ConfigOption and changing the implementation of TableConfig#getSqlDialect/setSqlDialect. I believe it is smaller and safer than changing the parser. Btw, I cc'ed Yu Li and Gary into the discussion, because release managers should be aware of this. Best, Jark On Thu, 12 Dec 2019 at 11:47, Danny Chan wrote: Thanks Jingsong for bring up this discussion ~ After reviewing FLIP-63, it seems that we have made a conclusion for the syntax - INSERT OVERWRITE ... - INSERT INTO … PARTITION Which means that they should not have the Hive dialect limitation, so I’m inclined that the behaviors for SQL-CLI is unexpected, or a “bug” that need to fix. We did not make a conclusion for the syntax: - CREATE TABLE … PARTITIONED BY ... Which means that the behavior of it is under-discussion, so it is okey to be without the HIVE dialect limitation, we do not actually have any table sources/sinks that support such a DDL so for current code base, users should not be influenced by the behaviors change. 
So I’m +1 to remove the hive dialect limitations for INSERT OVERWRITE and INSERT PARTITION +0 to add yaml dialect conf to SQL-CLI because FLIP-89 is not finished yet, we better do this until FLIP-89 is resolved. Best, Danny Chan 在 2019年12月11日 +0800 PM5:29,Jingsong Li ,写道: Hi Dev, After cutting out the branch of 1.10, I tried the following functions of SQL-CLI and found that it does not support: - insert overwrite - PARTITION (partcol1=val1, partcol2=val2 ...) The SQL pattern is: INSERT { INTO | OVERWRITE } TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) select_statement1 FROM from_statement; It is a surprise to me. The reason is that we only allow these two grammars in hive dialect. And SQL-CLI does not have an interface to switch dialects. Because it directly hinders the SQL-CLI's insert syntax in hive integration and seriously hinders the practicability of SQL-CLI. And we have introduced these two grammars in FLIP-63 [1] to Flink. Here are my question: 1.Should we remove hive dialect limitation for these two grammars? 2.Should we fix this in 1.10? [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support Best, Jingsong Lee -- Best, Jingsong Lee
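For reference, the syntax under discussion as it would be issued from a Table program today (only accepted under the Hive dialect, which is the limitation the thread argues should be removed). Table names are placeholders that would need to be registered, e.g. via a HiveCatalog, and `sqlUpdate` is the 1.10-era API:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

/** Illustrative only: the INSERT OVERWRITE / INSERT ... PARTITION statements discussed in FLIP-63. */
public class PartitionInsertExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Currently these statements are only accepted under the Hive dialect;
        // the thread argues they should also work under the default dialect.
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

        // target_table and source_table are placeholders and must already exist in the catalog.
        tEnv.sqlUpdate(
                "INSERT OVERWRITE TABLE target_table PARTITION (dt='2019-12-12') "
                        + "SELECT col1, col2 FROM source_table");
    }
}
{code}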
[jira] [Created] (FLINK-15225) LeaderChangeClusterComponentsTest#testReelectionOfDispatcher occasionally requires 30 seconds
Chesnay Schepler created FLINK-15225: Summary: LeaderChangeClusterComponentsTest#testReelectionOfDispatcher occasionally requires 30 seconds Key: FLINK-15225 URL: https://issues.apache.org/jira/browse/FLINK-15225 Project: Flink Issue Type: Improvement Components: Runtime / Coordination, Tests Affects Versions: 1.10.0 Reporter: Chesnay Schepler {code:java} 20845 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl - Starting the SlotManager. 20845 [mini-cluster-io-thread-1] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess - Recover all persisted job graphs. 20845 [mini-cluster-io-thread-1] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess - Successfully recovered 0 persisted job graphs. 20845 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request. 20845 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger heartbeat request. 20845 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService - Received confirmation of leadership for leader akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d , session=e73f31a7-c45f-4328-addd-3d7aa17fd083 20845 [mini-cluster-io-thread-1] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcherd5f8446f-b44d-4bcd-88b3-65105c1c6ca0 . 20845 [pool-1-thread-1] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway. 20845 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d(addd3d7aa17fd083e73f31a7c45f4328). 20845 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway. 20845 [pool-1-thread-1] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway. 20846 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Resolved ResourceManager address, beginning registration 20846 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d(addd3d7aa17fd083e73f31a7c45f4328). 
20846 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager attempt 1 (timeout=100ms) 20846 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/resourcemanager9fa41f3a-7bd4-4f72-ae07-76b75499c88d. Returning a org.apache.flink.runtime.resourcemanager.ResourceManagerGateway gateway. 20846 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Resolved ResourceManager address, beginning registration 20846 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager attempt 1 (timeout=100ms) 20846 [flink-akka.actor.default-dispatcher-3] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/taskmanager_285. Returning a org.apache.flink.runtime.taskexecutor.TaskExecutorGateway gateway. 20846 [mini-cluster-io-thread-1] INFO org.apache.flink.runtime.highavailability.nonha.embedded.EmbeddedLeaderService - Received confirmation of leadership for leader akka://flink/user/dispatcherd5f8446f-b44d-4bcd-88b3-65105c1c6ca0 , session=d20872e6-0b96-43ea-9da2-7590a255e906 20846 [pool-1-thread-1] DEBUG org.apache.flink.runtime.rpc.akka.AkkaRpcService - Try to connect to remote RPC endpoint with address akka://flink/user/dispatcherd5f8446f-b44d-4bcd-88b3-65105c1c6ca0. Returning a org.apache.flink.runtime.dispatcher.Dispa
[jira] [Created] (FLINK-15224) Resource requirements are not respected when fulfilling a slot request with unresolvedRootSlots from a SlotSharingManager
Zhu Zhu created FLINK-15224: --- Summary: Resource requirements are not respected when fulfilling a slot request with unresolvedRootSlots from a SlotSharingManager Key: FLINK-15224 URL: https://issues.apache.org/jira/browse/FLINK-15224 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.11.0 Reporter: Zhu Zhu In {{SchedulerImpl#allocateMultiTaskSlot}}, if a slot request cannot be fulfilled immediately with a resolved root slot (a MultiTaskSlot that is fulfilled by an allocated slot) or with available slots, it will be assigned to a random unresolved root slot. No resource requirement check is done in this case, so a large task slot can be assigned to a small shared slot (unresolved root slot); when the shared slot receives its physical slot offer, it will be recognized as oversubscribed, the slot will be released, and the related tasks will fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
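A standalone sketch of the missing check the ticket implies: before assigning a request to an unresolved root slot, verify that the slot's resource profile can cover the request. The classes below are simplified stand-ins, not the actual SlotSharingManager/ResourceProfile code:

{code:java}
/** Illustrative only: reject an unresolved root slot whose profile cannot cover the request. */
public class SlotRequestCheck {

    static class Profile {
        final double cpu;
        final int memoryMb;
        Profile(double cpu, int memoryMb) { this.cpu = cpu; this.memoryMb = memoryMb; }
        boolean covers(Profile required) {
            return cpu >= required.cpu && memoryMb >= required.memoryMb;
        }
    }

    /** Simplified form of the check SchedulerImpl#allocateMultiTaskSlot is missing. */
    static boolean canAssignToUnresolvedRootSlot(Profile rootSlotProfile, Profile requested) {
        return rootSlotProfile.covers(requested);
    }

    public static void main(String[] args) {
        Profile smallSharedSlot = new Profile(1.0, 1024);
        Profile largeRequest = new Profile(2.0, 4096);
        // Without this check the large request is accepted and the physical slot offer
        // is later recognized as oversubscribed, failing the tasks.
        System.out.println(canAssignToUnresolvedRootSlot(smallSharedSlot, largeRequest)); // false
    }
}
{code}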
[jira] [Created] (FLINK-15223) Csv connector should unescape delimiter parameter character
Jark Wu created FLINK-15223: --- Summary: Csv connector should unescape delimiter parameter character Key: FLINK-15223 URL: https://issues.apache.org/jira/browse/FLINK-15223 Project: Flink Issue Type: Bug Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Reporter: Jark Wu Fix For: 1.10.0 As described in the documentation[1], a csv format can use {{'format.line-delimiter' = '\n'}} to specify the line delimiter. However, the property value is parsed as the two characters "\n", which causes reading to fail. There is no workaround for now other than fixing it. The delimiter should be unescaped, e.g. using {{StringEscapeUtils.unescapeJava}}. Note that both the old csv and the new csv format have the same problem, and it affects both {{field-delimiter}} and {{line-delimiter}}. [1]: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connect.html#old-csv-format -- This message was sent by Atlassian Jira (v8.3.4#803005)
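A small sketch of the suggested fix using commons-lang3's StringEscapeUtils, which the ticket names; where exactly the call would be placed inside the CSV descriptors is not shown here:

{code:java}
import org.apache.commons.lang3.StringEscapeUtils;

/** Illustrative only: turning the literal two-character "\n" into a real line feed. */
public class DelimiterUnescape {
    public static void main(String[] args) {
        String raw = "\\n"; // what currently reaches the format: backslash + 'n'
        String unescaped = StringEscapeUtils.unescapeJava(raw);

        System.out.println(raw.length());           // 2
        System.out.println(unescaped.length());     // 1
        System.out.println(unescaped.equals("\n")); // true
    }
}
{code}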
[jira] [Created] (FLINK-15222) Move state benchmark utils into core repository
Yu Li created FLINK-15222: - Summary: Move state benchmark utils into core repository Key: FLINK-15222 URL: https://issues.apache.org/jira/browse/FLINK-15222 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Reporter: Yu Li Currently we're maintaining the state benchmark utils in the flink-benchmark repository instead of the core repository, which not only makes it hard to find compatibility issues when state backend code is refactored and causes problems like FLINK-15199, but also disobeys the instructions of the flink-benchmark project: {quote} Recommended code structure is to define all benchmarks in Apache Flink and only wrap them here, in this repository, into executor classes. Such code structured is due to using GPL2 licensed jmh library for the actual execution of the benchmarks. Ideally we would prefer to have all of the code moved to Apache Flink {quote} We will improve this and prevent future incompatibility problems in this JIRA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [RESULT] [VOTE] Release 1.8.3, release candidate #3
The upstream pull request[1] is open—usually they merge within a day and the images are available shortly thereafter. [1] https://github.com/docker-library/official-images/pull/7112 -- Patrick Lucas On Thu, Dec 12, 2019 at 8:11 AM Hequn Cheng wrote: > Hi Patrick, > > The release has been announced. > > +1 to integrate the publication of Docker images into the Flink release > process. Thus we can leverage the current release procedure for the docker > image. > Looking forward to the proposal. > > Best, Hequn > > On Thu, Dec 12, 2019 at 1:52 PM Yang Wang wrote: > >> Hi Lucas, >> >> That's great if we could integrate the publication of Flink official >> docker >> images into >> the Flink release process. Since many users are using or starting to use >> Flink in >> container environments. >> >> >> Best, >> Yang >> >> Patrick Lucas 于2019年12月11日周三 下午11:44写道: >> >> > Thanks, Hequn! >> > >> > The Dockerfiles for the Flink images on Docker Hub for the 1.8.3 release >> > are prepared[1] and I'll open a pull request upstream[2] once the >> release >> > announcement has gone out. >> > >> > And stay tuned: I'm working on a proposal for integrating the >> publication >> > of these Docker images into the Flink release process and will send it >> out >> > to the dev list before or shortly after the holidays. >> > >> > [1] >> > >> > >> https://github.com/docker-flink/docker-flink/commit/4d85b71b7cf9fa4a38f8682ed13aa0f55445e32e >> > [2] https://github.com/docker-library/official-images/ >> > >> > On Wed, Dec 11, 2019 at 3:30 AM Hequn Cheng >> wrote: >> > >> > > Hi everyone, >> > > >> > > I'm happy to announce that we have unanimously approved this release. >> > > >> > > There are 10 approving votes, 3 of which are binding: >> > > * Jincheng (binding) >> > > * Ufuk (binding) >> > > * Till (binding) >> > > * Jingsong >> > > * Fabian >> > > * Danny >> > > * Yang Wang >> > > * Dian >> > > * Wei >> > > * Hequn >> > > >> > > There are no disapproving votes. >> > > >> > > Thanks everyone! >> > > >> > >> >
[jira] [Created] (FLINK-15221) supporting Exactly-once for table APi
chaiyongqiang created FLINK-15221: - Summary: supporting Exactly-once for table APi Key: FLINK-15221 URL: https://issues.apache.org/jira/browse/FLINK-15221 Project: Flink Issue Type: Wish Components: Table SQL / API Affects Versions: 1.9.0 Reporter: chaiyongqiang The Table API doesn't support end-to-end exactly-once semantics like the DataStream API does. Does Flink have a plan for this? -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration
Thanks for your explanation. I think the proposal is reasonable. On Thu, Dec 12, 2019 at 3:32 AM Yangze Guo wrote: > Thanks for the feedback, Gary. > > Regarding the WordCount test: > - True. There is no test coverage increment compared to others. > However, I think each test case better not have multiple purposes so > that we could find out the root cause quickly. As discussed in > FLINK-15135[1], I prefer only including WordCount test as the first > step. If the time overhead of E2E tests become severe in the future, I > agree to remove it. WDYT? > - I think the main overhead comes from building the image. The > subsequent tests will run fast since they will not build it again. > > Regarding the Rocks test, I think it is a typical scenario using > off-heap memory. The main purpose is to verify the memory usage and > memory configuration in Mesos mode. Two typical use cases are off-heap > and on-heap. Thus, I think the following two test cases are valuable > to be included: > - A streaming task using heap backend. It should explicitly set the > “taskmanager.memory.managed.size” to zero to check the potential > unexpected usage of off-heap memory. > - A streaming task using rocks backend. It covers the scenario using > off-heap memory. > > Look forward to your kind feedback. > > [1]https://issues.apache.org/jira/browse/FLINK-15135 > > Best, > Yangze Guo > > > > On Wed, Dec 11, 2019 at 6:14 PM Gary Yao wrote: > > > > Thanks for driving this effort. Also +1 from my side. I have left a few > > questions below. > > > > > - Wordcount end-to-end test. For verifying the basic process of Mesos > > > deployment. > > > > Would this add additional test coverage compared to the > > "multiple submissions" test case? I am asking because the E2E tests are > > already > > expensive to run, and adding new tests should be carefully considered. > > > > > - State TTL RocksDb backend end-to-end test. For verifying memory > > > configuration behaviors, since Mesos has it’s own config options and > > > logics. > > > > Can you elaborate more on this? Which config options are relevant here? > > > > On Wed, Dec 11, 2019 at 9:58 AM Till Rohrmann > wrote: > > > > > +1 for building the image locally. If need should arise, then we could > > > change it always later. > > > > > > Cheers, > > > Till > > > > > > On Wed, Dec 11, 2019 at 4:05 AM Xintong Song > > > wrote: > > > > > > > Thanks, Yangtze. > > > > > > > > +1 for building the image locally. > > > > The time consumption for both building image locally and pulling it > from > > > > DockerHub sounds reasonable and affordable. Therefore, I'm also in > favor > > > of > > > > avoiding the cost maintaining a custom image. > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > On Wed, Dec 11, 2019 at 10:11 AM Yangze Guo > wrote: > > > > > > > > > Thanks for the feedback, Yang. > > > > > > > > > > Some updates I want to share in this thread. > > > > > I have built a PoC version of Meos e2e test with WordCount > > > > > workflow.[1] Then, I ran it in the testing environment. As the > result > > > > > shown here[2]: > > > > > - For pulling image from DockerHub, it took 1 minute and 21 seconds > > > > > - For building it locally, it took 2 minutes and 54 seconds. > > > > > > > > > > I prefer building it locally. Although it is slower, I think the > time > > > > > overhead, comparing to the cost of maintaining the image in > DockerHub > > > > > and the whole test process, is trivial for building or pulling the > > > > > image. 
> > > > > > > > > > I look forward to hearing from you. ;) > > > > > > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > [1] > > > > > > > > > > > > > https://github.com/KarmaGYZ/flink/commit/0406d942446a1b17f81d93235b21a829bf88ccf0 > > > > > [2]https://travis-ci.org/KarmaGYZ/flink/jobs/623207957 > > > > > Best, > > > > > Yangze Guo > > > > > > > > > > On Mon, Dec 9, 2019 at 2:39 PM Yang Wang > > > wrote: > > > > > > > > > > > > Thanks Yangze for starting this discussion. > > > > > > > > > > > > Just share my thoughts. > > > > > > > > > > > > If the mesos official docker image could not meet our > requirement, i > > > > > suggest to build the image locally. > > > > > > We have done the same things for yarn e2e tests. This way is more > > > > > flexible and easy to maintain. However, > > > > > > i have no idea how long building the mesos image locally will > take. > > > > > Based on previous experience of yarn, i > > > > > > think it may not take too much time. > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > Yang > > > > > > > > > > > > Yangze Guo 于2019年12月7日周六 下午4:25写道: > > > > > >> > > > > > >> Thanks for your feedback! > > > > > >> > > > > > >> @Till > > > > > >> Regarding the time overhead, I think it mainly come from the > network > > > > > >> transmission. For building the image locally, it will totally > > > download > > > > > >> 260MB files including the base image and packages. For pulling > from >
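A sketch of the two configurations the proposed streaming test cases would exercise; the keys are the 1.10 memory/state-backend options mentioned in the thread, while the Mesos test scaffolding is omitted:

{code:java}
import org.apache.flink.configuration.Configuration;

/** Illustrative only: the two memory setups for the proposed Mesos e2e streaming tests. */
public class MesosMemoryTestConfigs {

    /** Heap state backend case: managed (off-heap) memory explicitly set to zero. */
    static Configuration heapBackendConfig() {
        Configuration conf = new Configuration();
        conf.setString("taskmanager.memory.managed.size", "0m");
        return conf;
    }

    /** RocksDB case: managed memory left at its default so the backend can use it off-heap. */
    static Configuration rocksDbBackendConfig() {
        Configuration conf = new Configuration();
        conf.setString("state.backend", "rocksdb");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(heapBackendConfig());
        System.out.println(rocksDbBackendConfig());
    }
}
{code}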
[jira] [Created] (FLINK-15220) Add startFromTimestamp in KafkaTableSource
Paul Lin created FLINK-15220: Summary: Add startFromTimestamp in KafkaTableSource Key: FLINK-15220 URL: https://issues.apache.org/jira/browse/FLINK-15220 Project: Flink Issue Type: Improvement Components: Connectors / Kafka Reporter: Paul Lin KafkaTableSource supports all startup modes in DataStream API except `startFromTimestamp`, but it's a common and valid use case.
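For reference, this is the existing DataStream API capability the ticket refers to: FlinkKafkaConsumer already exposes setStartFromTimestamp, and the proposal is to surface the same startup mode through KafkaTableSource. A minimal sketch; the topic name, broker address, and timestamp are placeholders.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaStartFromTimestampExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address

            FlinkKafkaConsumer<String> consumer =
                    new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
            // Start reading from the first record whose timestamp is at or after the given
            // epoch milliseconds -- the startup mode that is missing from KafkaTableSource.
            consumer.setStartFromTimestamp(1576108800000L);

            DataStream<String> stream = env.addSource(consumer);
            stream.print();
            env.execute("start-from-timestamp example");
        }
    }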
[jira] [Created] (FLINK-15219) LocalEnvironment is not initializing plugins
Arvid Heise created FLINK-15219: --- Summary: LocalEnvironment is not initializing plugins Key: FLINK-15219 URL: https://issues.apache.org/jira/browse/FLINK-15219 Project: Flink Issue Type: Improvement Reporter: Arvid Heise
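For context, "initializing plugins" refers to building a PluginManager from the plugins/ directory and handing it to FileSystem.initialize, which cluster entrypoints do before user code runs but (per this report) local execution does not. A rough sketch of that step, assuming Flink's internal PluginUtils and FileSystem entry points; exact class and method signatures may differ between versions.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.plugin.PluginManager;
    import org.apache.flink.core.plugin.PluginUtils;

    public class PluginInitSketch {
        // Roughly what cluster entrypoints do at startup: discover plugins
        // (e.g. plugin-packaged filesystems) and register them with the FileSystem stack.
        public static void initializePlugins(Configuration configuration) {
            PluginManager pluginManager =
                    PluginUtils.createPluginManagerFromRootFolder(configuration);
            FileSystem.initialize(configuration, pluginManager);
        }
    }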
Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation
A quick idea is that we separate the deployment from user program that it has always been done outside the program. On user program executed there is always a ClusterClient that communicates with an existing cluster, remote or local. It will be another thread so just for your information. Best, tison. tison 于2019年12月12日周四 下午4:40写道: > Hi Peter, > > Another concern I realized recently is that with current Executors > abstraction(FLIP-73) > I'm afraid that user program is designed to ALWAYS run on the client side. > Specifically, > we deploy the job in executor when env.execute called. This abstraction > possibly prevents > Flink runs user program on the cluster side. > > For your proposal, in this case we already compiled the program and run on > the client side, > even we deploy a cluster and retrieve job graph from program metadata, it > doesn't make > many sense. > > cc Aljoscha & Kostas what do you think about this constraint? > > Best, > tison. > > > Peter Huang 于2019年12月10日周二 下午12:45写道: > >> Hi Tison, >> >> Yes, you are right. I think I made the wrong argument in the doc. >> Basically, the packaging jar problem is only for platform users. In our >> internal deploy service, >> we further optimized the deployment latency by letting users to packaging >> flink-runtime together with the uber jar, so that we don't need to >> consider >> multiple flink version >> support for now. In the session client mode, as Flink libs will be shipped >> anyway as local resources of yarn. Users actually don't need to package >> those libs into job jar. >> >> >> >> Best Regards >> Peter Huang >> >> On Mon, Dec 9, 2019 at 8:35 PM tison wrote: >> >> > > 3. What do you mean about the package? Do users need to compile their >> > jars >> > inlcuding flink-clients, flink-optimizer, flink-table codes? >> > >> > The answer should be no because they exist in system classpath. >> > >> > Best, >> > tison. >> > >> > >> > Yang Wang 于2019年12月10日周二 下午12:18写道: >> > >> > > Hi Peter, >> > > >> > > Thanks a lot for starting this discussion. I think this is a very >> useful >> > > feature. >> > > >> > > Not only for Yarn, i am focused on flink on Kubernetes integration and >> > come >> > > across the same >> > > problem. I do not want the job graph generated on client side. >> Instead, >> > the >> > > user jars are built in >> > > a user-defined image. When the job manager launched, we just need to >> > > generate the job graph >> > > based on local user jars. >> > > >> > > I have some small suggestion about this. >> > > >> > > 1. `ProgramJobGraphRetriever` is very similar to >> > > `ClasspathJobGraphRetriever`, the differences >> > > are the former needs `ProgramMetadata` and the latter needs some >> > arguments. >> > > Is it possible to >> > > have an unified `JobGraphRetriever` to support both? >> > > 2. Is it possible to not use a local user jar to start a per-job >> cluster? >> > > In your case, the user jars has >> > > existed on hdfs already and we do need to download the jars to >> deployer >> > > service. Currently, we >> > > always need a local user jar to start a flink cluster. It is be great >> if >> > we >> > > could support remote user jars. >> > > >> In the implementation, we assume users package flink-clients, >> > > flink-optimizer, flink-table together within the job jar. Otherwise, >> the >> > > job graph generation within JobClusterEntryPoint will fail. >> > > 3. What do you mean about the package? Do users need to compile their >> > jars >> > > inlcuding flink-clients, flink-optimizer, flink-table codes? 
>> > > >> > > >> > > >> > > Best, >> > > Yang >> > > >> > > Peter Huang 于2019年12月10日周二 上午2:37写道: >> > > >> > > > Dear All, >> > > > >> > > > Recently, the Flink community starts to improve the yarn cluster >> > > descriptor >> > > > to make job jar and config files configurable from CLI. It improves >> the >> > > > flexibility of Flink deployment Yarn Per Job Mode. For platform >> users >> > > who >> > > > manage tens of hundreds of streaming pipelines for the whole org or >> > > > company, we found the job graph generation in client-side is another >> > > > pinpoint. Thus, we want to propose a configurable feature for >> > > > FlinkYarnSessionCli. The feature can allow users to choose the job >> > graph >> > > > generation in Flink ClusterEntryPoint so that the job jar doesn't >> need >> > to >> > > > be locally for the job graph generation. The proposal is organized >> as a >> > > > FLIP >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation >> > > > . >> > > > >> > > > Any questions and suggestions are welcomed. Thank you in advance. >> > > > >> > > > >> > > > Best Regards >> > > > Peter Huang >> > > > >> > > >> > >> >
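To illustrate the shape of what this thread proposes, here is a purely illustrative skeleton of a program-based job graph retriever: instead of deserializing a JobGraph shipped from the client, it would run the packaged user program inside the JobClusterEntrypoint and compile the graph on the cluster side. Everything here (the class, the ProgramMetadata holder, and the interception step left as a comment) is hypothetical; the eventual FLIP-85 design may differ.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.jobgraph.JobGraph;
    import org.apache.flink.util.FlinkException;

    public class ProgramJobGraphRetrieverSketch {

        /** Hypothetical program metadata: entry class and program arguments. */
        public static final class ProgramMetadata {
            final String entryClassName;
            final String[] programArgs;

            public ProgramMetadata(String entryClassName, String[] programArgs) {
                this.entryClassName = entryClassName;
                this.programArgs = programArgs;
            }
        }

        private final ProgramMetadata metadata;

        public ProgramJobGraphRetrieverSketch(ProgramMetadata metadata) {
            this.metadata = metadata;
        }

        // Mirrors the shape of JobGraphRetriever#retrieveJobGraph, but compiles the graph on
        // the cluster side: only metadata, not a pre-built JobGraph, has to be shipped.
        public JobGraph retrieveJobGraph(Configuration configuration) throws FlinkException {
            try {
                // Running main() here is the essential difference from the classpath-based
                // retriever; how the compiled JobGraph is intercepted (e.g. via a dedicated
                // execution environment) is deliberately left out of this sketch.
                Class.forName(metadata.entryClassName)
                        .getMethod("main", String[].class)
                        .invoke(null, (Object) metadata.programArgs);
            } catch (Exception e) {
                throw new FlinkException("Could not run the user program to create the JobGraph.", e);
            }
            throw new FlinkException("Sketch only: JobGraph interception is not implemented here.");
        }
    }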
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi Jingsong, Thanks for the explanation, I think I misunderstood your point at the begining. As FLIP-63 proposed, INSERT OVERWRITE and INSERT PARTITION syntax are added to Flink's SQL syntax, but CREATE PARTITION TABLE should be limited under Hive dialect. However, the current implementation is opposite, INSERT OVERWRITE and INSERT PARTITION are under the dialect limitation, but CREATE PARTITION TABLE is not. So it is indeed a bug which should be fixed in 1.10. Best, Jark On Thu, 12 Dec 2019 at 16:35, Jingsong Li wrote: > Hi Jark, > > Let's recall FLIP-63, > We supported these syntax in hive dialect at 1.9. All of my reasons for > launching FLIP-63 are to bring partition support to Flink itself. > Not only batch, but also we have the need to stream jobs to write partition > files today, which is also one of our very important application scenarios. > > The original intention of FLIP-63 is to bring all partition syntax to > Flink, but in the end you and I have some different opinion in creating > partition table, so our consensus is to leave it in hive dialect, only it. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > > Best, > Jingsong Lee > > On Thu, Dec 12, 2019 at 4:05 PM Jark Wu wrote: > > > Hi jingsong, > > > > Watermark is not a standard syntax, that's why we had a FLIP and long > > discussion to add it to > > Flink's SQL syntax. I think if we want to add INSERT OVERWRITE and > > PARTITION syntax to > > Flink's own syntax, we also need a FLIP or a VOTE, and this may can't > > happen soon (we should > > hear more people's opinions on this). > > > > Regarding to the sql-dialect configuration, I was not saying to involve > the > > whole FLIP-89. I mean we can just > > start a VOTE to expose it as `table.planner.sql-dialect` and include it > in > > 1.10. The change can be very small, by > > adding a ConfigOption and changing the implementation of > > TableConfig#getSqlDialect/setSqlDialect. I believe > > it is smaller and safer than changing the parser. > > > > Btw, I cc'ed Yu Li and Gary into the discussion, because release managers > > should be aware of this. > > > > Best, > > Jark > > > > > > > > On Thu, 12 Dec 2019 at 11:47, Danny Chan wrote: > > > > > Thanks Jingsong for bring up this discussion ~ > > > > > > After reviewing FLIP-63, it seems that we have made a conclusion for > the > > > syntax > > > > > > - INSERT OVERWRITE ... > > > - INSERT INTO … PARTITION > > > > > > Which means that they should not have the Hive dialect limitation, so > I’m > > > inclined that the behaviors for SQL-CLI is unexpected, or a “bug” that > > need > > > to fix. > > > > > > We did not make a conclusion for the syntax: > > > > > > - CREATE TABLE … PARTITIONED BY ... > > > > > > Which means that the behavior of it is under-discussion, so it is okey > to > > > be without the HIVE dialect limitation, we do not actually have any > table > > > sources/sinks that support such a DDL so for current code base, users > > > should not be influenced by the behaviors change. > > > > > > So I’m > > > > > > +1 to remove the hive dialect limitations for INSERT OVERWRITE and > INSERT > > > PARTITION > > > +0 to add yaml dialect conf to SQL-CLI because FLIP-89 is not finished > > > yet, we better do this until FLIP-89 is resolved. 
> > > > > > Best, > > > Danny Chan > > > 在 2019年12月11日 +0800 PM5:29,Jingsong Li ,写道: > > > > Hi Dev, > > > > > > > > After cutting out the branch of 1.10, I tried the following functions > > of > > > > SQL-CLI and found that it does not support: > > > > - insert overwrite > > > > - PARTITION (partcol1=val1, partcol2=val2 ...) > > > > The SQL pattern is: > > > > INSERT { INTO | OVERWRITE } TABLE tablename1 [PARTITION > (partcol1=val1, > > > > partcol2=val2 ...) select_statement1 FROM from_statement; > > > > It is a surprise to me. > > > > The reason is that we only allow these two grammars in hive dialect. > > And > > > > SQL-CLI does not have an interface to switch dialects. > > > > > > > > Because it directly hinders the SQL-CLI's insert syntax in hive > > > integration > > > > and seriously hinders the practicability of SQL-CLI. > > > > And we have introduced these two grammars in FLIP-63 [1] to Flink. > > > > Here are my question: > > > > 1.Should we remove hive dialect limitation for these two grammars? > > > > 2.Should we fix this in 1.10? > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > > > > > > > > Best, > > > > Jingsong Lee > > > > > > > > -- > Best, Jingsong Lee >
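For readers following along, a minimal sketch of the statements under discussion when the Hive dialect is switched on programmatically via TableConfig. The table names and the surrounding catalog/table setup are assumptions, and the point of this thread is precisely that the SQL CLI currently offers no equivalent dialect switch.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.SqlDialect;
    import org.apache.flink.table.api.TableEnvironment;

    public class InsertOverwritePartitionExample {
        public static void main(String[] args) {
            EnvironmentSettings settings =
                    EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
            TableEnvironment tEnv = TableEnvironment.create(settings);

            // Switching to the HIVE dialect is what currently unlocks
            // INSERT OVERWRITE / INSERT ... PARTITION in the parser.
            tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

            // Assumes partitioned tables 'sales' and 'staging_sales' already exist
            // in a catalog whose sink supports overwrite (e.g. a Hive catalog).
            tEnv.sqlUpdate(
                    "INSERT OVERWRITE TABLE sales PARTITION (dt='2019-12-12') "
                            + "SELECT product, amount FROM staging_sales");
            tEnv.execute("insert overwrite example");
        }
    }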
Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation
Hi Peter, Another concern I realized recently is that with current Executors abstraction(FLIP-73) I'm afraid that user program is designed to ALWAYS run on the client side. Specifically, we deploy the job in executor when env.execute called. This abstraction possibly prevents Flink runs user program on the cluster side. For your proposal, in this case we already compiled the program and run on the client side, even we deploy a cluster and retrieve job graph from program metadata, it doesn't make many sense. cc Aljoscha & Kostas what do you think about this constraint? Best, tison. Peter Huang 于2019年12月10日周二 下午12:45写道: > Hi Tison, > > Yes, you are right. I think I made the wrong argument in the doc. > Basically, the packaging jar problem is only for platform users. In our > internal deploy service, > we further optimized the deployment latency by letting users to packaging > flink-runtime together with the uber jar, so that we don't need to consider > multiple flink version > support for now. In the session client mode, as Flink libs will be shipped > anyway as local resources of yarn. Users actually don't need to package > those libs into job jar. > > > > Best Regards > Peter Huang > > On Mon, Dec 9, 2019 at 8:35 PM tison wrote: > > > > 3. What do you mean about the package? Do users need to compile their > > jars > > inlcuding flink-clients, flink-optimizer, flink-table codes? > > > > The answer should be no because they exist in system classpath. > > > > Best, > > tison. > > > > > > Yang Wang 于2019年12月10日周二 下午12:18写道: > > > > > Hi Peter, > > > > > > Thanks a lot for starting this discussion. I think this is a very > useful > > > feature. > > > > > > Not only for Yarn, i am focused on flink on Kubernetes integration and > > come > > > across the same > > > problem. I do not want the job graph generated on client side. Instead, > > the > > > user jars are built in > > > a user-defined image. When the job manager launched, we just need to > > > generate the job graph > > > based on local user jars. > > > > > > I have some small suggestion about this. > > > > > > 1. `ProgramJobGraphRetriever` is very similar to > > > `ClasspathJobGraphRetriever`, the differences > > > are the former needs `ProgramMetadata` and the latter needs some > > arguments. > > > Is it possible to > > > have an unified `JobGraphRetriever` to support both? > > > 2. Is it possible to not use a local user jar to start a per-job > cluster? > > > In your case, the user jars has > > > existed on hdfs already and we do need to download the jars to deployer > > > service. Currently, we > > > always need a local user jar to start a flink cluster. It is be great > if > > we > > > could support remote user jars. > > > >> In the implementation, we assume users package flink-clients, > > > flink-optimizer, flink-table together within the job jar. Otherwise, > the > > > job graph generation within JobClusterEntryPoint will fail. > > > 3. What do you mean about the package? Do users need to compile their > > jars > > > inlcuding flink-clients, flink-optimizer, flink-table codes? > > > > > > > > > > > > Best, > > > Yang > > > > > > Peter Huang 于2019年12月10日周二 上午2:37写道: > > > > > > > Dear All, > > > > > > > > Recently, the Flink community starts to improve the yarn cluster > > > descriptor > > > > to make job jar and config files configurable from CLI. It improves > the > > > > flexibility of Flink deployment Yarn Per Job Mode. 
For platform > users > > > who > > > > manage tens of hundreds of streaming pipelines for the whole org or > > > > company, we found that the job graph generation on the client side is another > > > > pain point. Thus, we want to propose a configurable feature for > > > > FlinkYarnSessionCli. The feature allows users to choose job > > graph > > > > generation in the Flink ClusterEntryPoint so that the job jar doesn't need > > to > > > > be available locally for the job graph generation. The proposal is organized as a > > > > FLIP: > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation > > > > > > > > Any questions and suggestions are welcome. Thank you in advance. > > > > > > > > > > > > Best Regards > > > > Peter Huang > > > >
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi Jark, Let's recall FLIP-63, We supported these syntax in hive dialect at 1.9. All of my reasons for launching FLIP-63 are to bring partition support to Flink itself. Not only batch, but also we have the need to stream jobs to write partition files today, which is also one of our very important application scenarios. The original intention of FLIP-63 is to bring all partition syntax to Flink, but in the end you and I have some different opinion in creating partition table, so our consensus is to leave it in hive dialect, only it. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support Best, Jingsong Lee On Thu, Dec 12, 2019 at 4:05 PM Jark Wu wrote: > Hi jingsong, > > Watermark is not a standard syntax, that's why we had a FLIP and long > discussion to add it to > Flink's SQL syntax. I think if we want to add INSERT OVERWRITE and > PARTITION syntax to > Flink's own syntax, we also need a FLIP or a VOTE, and this may can't > happen soon (we should > hear more people's opinions on this). > > Regarding to the sql-dialect configuration, I was not saying to involve the > whole FLIP-89. I mean we can just > start a VOTE to expose it as `table.planner.sql-dialect` and include it in > 1.10. The change can be very small, by > adding a ConfigOption and changing the implementation of > TableConfig#getSqlDialect/setSqlDialect. I believe > it is smaller and safer than changing the parser. > > Btw, I cc'ed Yu Li and Gary into the discussion, because release managers > should be aware of this. > > Best, > Jark > > > > On Thu, 12 Dec 2019 at 11:47, Danny Chan wrote: > > > Thanks Jingsong for bring up this discussion ~ > > > > After reviewing FLIP-63, it seems that we have made a conclusion for the > > syntax > > > > - INSERT OVERWRITE ... > > - INSERT INTO … PARTITION > > > > Which means that they should not have the Hive dialect limitation, so I’m > > inclined that the behaviors for SQL-CLI is unexpected, or a “bug” that > need > > to fix. > > > > We did not make a conclusion for the syntax: > > > > - CREATE TABLE … PARTITIONED BY ... > > > > Which means that the behavior of it is under-discussion, so it is okey to > > be without the HIVE dialect limitation, we do not actually have any table > > sources/sinks that support such a DDL so for current code base, users > > should not be influenced by the behaviors change. > > > > So I’m > > > > +1 to remove the hive dialect limitations for INSERT OVERWRITE and INSERT > > PARTITION > > +0 to add yaml dialect conf to SQL-CLI because FLIP-89 is not finished > > yet, we better do this until FLIP-89 is resolved. > > > > Best, > > Danny Chan > > 在 2019年12月11日 +0800 PM5:29,Jingsong Li ,写道: > > > Hi Dev, > > > > > > After cutting out the branch of 1.10, I tried the following functions > of > > > SQL-CLI and found that it does not support: > > > - insert overwrite > > > - PARTITION (partcol1=val1, partcol2=val2 ...) > > > The SQL pattern is: > > > INSERT { INTO | OVERWRITE } TABLE tablename1 [PARTITION (partcol1=val1, > > > partcol2=val2 ...) select_statement1 FROM from_statement; > > > It is a surprise to me. > > > The reason is that we only allow these two grammars in hive dialect. > And > > > SQL-CLI does not have an interface to switch dialects. > > > > > > Because it directly hinders the SQL-CLI's insert syntax in hive > > integration > > > and seriously hinders the practicability of SQL-CLI. > > > And we have introduced these two grammars in FLIP-63 [1] to Flink. 
> > > Here are my questions: > > > 1. Should we remove the hive dialect limitation for these two grammars? > > > 2. Should we fix this in 1.10? > > > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > > > > > > Best, > > > Jingsong Lee > > > -- Best, Jingsong Lee
Re: [DISCUSS] Overwrite and partition inserting support in 1.10
Hi jingsong, Watermark is not a standard syntax, that's why we had a FLIP and long discussion to add it to Flink's SQL syntax. I think if we want to add INSERT OVERWRITE and PARTITION syntax to Flink's own syntax, we also need a FLIP or a VOTE, and this may can't happen soon (we should hear more people's opinions on this). Regarding to the sql-dialect configuration, I was not saying to involve the whole FLIP-89. I mean we can just start a VOTE to expose it as `table.planner.sql-dialect` and include it in 1.10. The change can be very small, by adding a ConfigOption and changing the implementation of TableConfig#getSqlDialect/setSqlDialect. I believe it is smaller and safer than changing the parser. Btw, I cc'ed Yu Li and Gary into the discussion, because release managers should be aware of this. Best, Jark On Thu, 12 Dec 2019 at 11:47, Danny Chan wrote: > Thanks Jingsong for bring up this discussion ~ > > After reviewing FLIP-63, it seems that we have made a conclusion for the > syntax > > - INSERT OVERWRITE ... > - INSERT INTO … PARTITION > > Which means that they should not have the Hive dialect limitation, so I’m > inclined that the behaviors for SQL-CLI is unexpected, or a “bug” that need > to fix. > > We did not make a conclusion for the syntax: > > - CREATE TABLE … PARTITIONED BY ... > > Which means that the behavior of it is under-discussion, so it is okey to > be without the HIVE dialect limitation, we do not actually have any table > sources/sinks that support such a DDL so for current code base, users > should not be influenced by the behaviors change. > > So I’m > > +1 to remove the hive dialect limitations for INSERT OVERWRITE and INSERT > PARTITION > +0 to add yaml dialect conf to SQL-CLI because FLIP-89 is not finished > yet, we better do this until FLIP-89 is resolved. > > Best, > Danny Chan > 在 2019年12月11日 +0800 PM5:29,Jingsong Li ,写道: > > Hi Dev, > > > > After cutting out the branch of 1.10, I tried the following functions of > > SQL-CLI and found that it does not support: > > - insert overwrite > > - PARTITION (partcol1=val1, partcol2=val2 ...) > > The SQL pattern is: > > INSERT { INTO | OVERWRITE } TABLE tablename1 [PARTITION (partcol1=val1, > > partcol2=val2 ...) select_statement1 FROM from_statement; > > It is a surprise to me. > > The reason is that we only allow these two grammars in hive dialect. And > > SQL-CLI does not have an interface to switch dialects. > > > > Because it directly hinders the SQL-CLI's insert syntax in hive > integration > > and seriously hinders the practicability of SQL-CLI. > > And we have introduced these two grammars in FLIP-63 [1] to Flink. > > Here are my question: > > 1.Should we remove hive dialect limitation for these two grammars? > > 2.Should we fix this in 1.10? > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support > > > > Best, > > Jingsong Lee >
Re: [ANNOUNCE] Apache Flink 1.8.3 released
Thanks Hequn for driving the release and everyone who makes this release possible! Thanks, Zhu Zhu Wei Zhong 于2019年12月12日周四 下午3:45写道: > Thanks Hequn for being the release manager. Great work! > > Best, > Wei > > 在 2019年12月12日,15:27,Jingsong Li 写道: > > Thanks Hequn for your driving, 1.8.3 fixed a lot of issues and it is very > useful to users. > Great work! > > Best, > Jingsong Lee > > On Thu, Dec 12, 2019 at 3:25 PM jincheng sun > wrote: > >> Thanks for being the release manager and the great work Hequn :) >> Also thanks to the community making this release possible! >> >> Best, >> Jincheng >> >> Jark Wu 于2019年12月12日周四 下午3:23写道: >> >>> Thanks Hequn for helping out this release and being the release manager. >>> Great work! >>> >>> Best, >>> Jark >>> >>> On Thu, 12 Dec 2019 at 15:02, Jeff Zhang wrote: >>> >>> > Great work, Hequn >>> > >>> > Dian Fu 于2019年12月12日周四 下午2:32写道: >>> > >>> >> Thanks Hequn for being the release manager and everyone who >>> contributed >>> >> to this release. >>> >> >>> >> Regards, >>> >> Dian >>> >> >>> >> 在 2019年12月12日,下午2:24,Hequn Cheng 写道: >>> >> >>> >> Hi, >>> >> >>> >> The Apache Flink community is very happy to announce the release of >>> >> Apache Flink 1.8.3, which is the third bugfix release for the Apache >>> Flink >>> >> 1.8 series. >>> >> >>> >> Apache Flink® is an open-source stream processing framework for >>> >> distributed, high-performing, always-available, and accurate data >>> streaming >>> >> applications. >>> >> >>> >> The release is available for download at: >>> >> https://flink.apache.org/downloads.html >>> >> >>> >> Please check out the release blog post for an overview of the >>> >> improvements for this bugfix release: >>> >> https://flink.apache.org/news/2019/12/11/release-1.8.3.html >>> >> >>> >> The full release notes are available in Jira: >>> >> https://issues.apache.org/jira/projects/FLINK/versions/12346112 >>> >> >>> >> We would like to thank all contributors of the Apache Flink community >>> who >>> >> made this release possible! >>> >> Great thanks to @Jincheng as a mentor during this release. >>> >> >>> >> Regards, >>> >> Hequn >>> >> >>> >> >>> >> >>> > >>> > -- >>> > Best Regards >>> > >>> > Jeff Zhang >>> > >>> >> > > -- > Best, Jingsong Lee > > >