Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs
Just sharing some insights from operating SparkML at scale: map-reduce may not be the best way to iteratively synchronize partitioned workers, and native hardware acceleration is key to adopting the rapid pace of ML improvements for the foreseeable future. Chen On Apr 29, 2019, at 11:02, jincheng sun wrote: > > Hi Shaoxuan, > > Thanks for doing more efforts for the enhances of the scalability and the > ease of use of Flink ML and make it one step further. Thank you for sharing > a lot of context information. > > big +1 for this proposal! > > Here only one suggestion, that is, It has been a short time until the > release of flink-1.9, so I recommend It's better to add a detailed > implementation plan to FLIP and google doc. > > What do you think? > > Best, > Jincheng > > Shaoxuan Wang wrote on Mon, Apr 29, 2019, 10:34 AM: > >> Hi everyone, >> >> Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI several >> months ago in this mail thread: >> >> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html >> >> Luogen, Becket, Xu, Weihua and I have been working on this proposal >> offline in >> the past a few months. Now we want to share the first phase of the entire >> proposal with a FLIP. In this FLIP-39, we want to achieve several things >> (and hope those can be accomplished and released in Flink-1.9): >> >> - >> >> Provide a new set of ML core interface (on top of Flink TableAPI) >> - >> >> Provide a ML pipeline interface (on top of Flink TableAPI) >> - >> >> Provide the interfaces for parameters management and pipeline/mode >> persistence >> - >> >> All the above interfaces should facilitate any new ML algorithm. We will >> gradually add various standard ML algorithms on top of these new >> proposed >> interfaces to ensure their feasibility and scalability. >> >> >> Part of this FLIP has been present in Flink Forward 2019 @ San Francisco by >> Xu and Me. 
>> >> >> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api >> >> >> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink >> >> You can find the videos & slides at >> https://www.ververica.com/flink-forward-san-francisco-2019 >> >> The design document for FLIP-39 can be found here: >> >> >> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo >> >> >> I am looking forward to your feedback. >> >> Regards, >> >> Shaoxuan >>
[jira] [Created] (FLINK-12361) Remove useless expression from runtime scheduler
Liya Fan created FLINK-12361: Summary: Remove useless expression from runtime scheduler Key: FLINK-12361 URL: https://issues.apache.org/jira/browse/FLINK-12361 Project: Flink Issue Type: Improvement Components: Runtime / Operators Reporter: Liya Fan Assignee: Liya Fan Attachments: image-2019-04-29-11-16-13-492.png In the scheduleTask method of the Scheduler class, the expression forceExternalLocation is useless, since it always evaluates to false: !image-2019-04-29-11-16-13-492.png! So it can be removed. Moreover, removing this expression makes the code structure much simpler, because several branches rely on it and can be removed as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
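The simplification described in the ticket can be illustrated with a toy sketch (hypothetical names, not the actual Flink `Scheduler.scheduleTask` code): once a condition is known to be constant-false, the branch it guards is dead and can be deleted without changing behavior.

```python
# Toy illustration of removing an always-false condition
# (hypothetical names; not the actual Flink Scheduler code).

def schedule_task_before(preferred_locations, force_external_location=False):
    # force_external_location always evaluates to False in practice,
    # so this first branch is dead code.
    if force_external_location:
        return "external:" + preferred_locations[0]
    if preferred_locations:
        return "preferred:" + preferred_locations[0]
    return "any"

def schedule_task_after(preferred_locations):
    # Same behavior with the dead branch (and its parameter) removed.
    if preferred_locations:
        return "preferred:" + preferred_locations[0]
    return "any"

# The two versions agree on every input reachable in practice.
for locs in ([], ["tm-1"], ["tm-2", "tm-3"]):
    assert schedule_task_before(locs) == schedule_task_after(locs)
```

The equivalence check is the important part: dead-code removal is safe exactly because no reachable input can distinguish the two versions.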
Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs
Hi Shaoxuan, Thanks for putting more effort into enhancing the scalability and ease of use of Flink ML and taking it one step further. Thank you for sharing a lot of context information. A big +1 for this proposal! Just one suggestion: there is only a short time until the release of Flink 1.9, so I recommend adding a detailed implementation plan to the FLIP and the Google doc. What do you think? Best, Jincheng Shaoxuan Wang wrote on Mon, Apr 29, 2019, 10:34 AM: > Hi everyone, > > Weihua has proposed to rebuild Flink ML pipeline on top of TableAPI several > months ago in this mail thread: > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html > > Luogen, Becket, Xu, Weihua and I have been working on this proposal > offline in > the past a few months. Now we want to share the first phase of the entire > proposal with a FLIP. In this FLIP-39, we want to achieve several things > (and hope those can be accomplished and released in Flink-1.9): > >- > >Provide a new set of ML core interface (on top of Flink TableAPI) >- > >Provide a ML pipeline interface (on top of Flink TableAPI) >- > >Provide the interfaces for parameters management and pipeline/mode >persistence >- > >All the above interfaces should facilitate any new ML algorithm. We will >gradually add various standard ML algorithms on top of these new > proposed >interfaces to ensure their feasibility and scalability. > > > Part of this FLIP has been present in Flink Forward 2019 @ San Francisco by > Xu and Me. 
> > > https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api > > > https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink > > You can find the videos & slides at > https://www.ververica.com/flink-forward-san-francisco-2019 > > The design document for FLIP-39 can be found here: > > > https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo > > > I am looking forward to your feedback. > > Regards, > > Shaoxuan >
[DISCUSS] FLIP-39: Flink ML pipeline and ML libs
Hi everyone, Weihua proposed rebuilding the Flink ML pipeline on top of the Table API several months ago in this mail thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html Luogen, Becket, Xu, Weihua and I have been working on this proposal offline in the past few months. Now we want to share the first phase of the entire proposal as a FLIP. In FLIP-39, we want to achieve several things (and hope they can be accomplished and released in Flink 1.9): - Provide a new set of ML core interfaces (on top of the Flink Table API) - Provide an ML pipeline interface (on top of the Flink Table API) - Provide interfaces for parameter management and pipeline/model persistence - All of the above interfaces should facilitate any new ML algorithm. We will gradually add various standard ML algorithms on top of these newly proposed interfaces to ensure their feasibility and scalability. Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by Xu and me. https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink You can find the videos & slides at https://www.ververica.com/flink-forward-san-francisco-2019 The design document for FLIP-39 can be found here: https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo I am looking forward to your feedback. Regards, Shaoxuan
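For readers unfamiliar with the Estimator/Transformer/Pipeline pattern the FLIP builds on, here is a minimal scikit-learn-style sketch in Python. This is an analogy only: the actual FLIP-39 interfaces are Java and operate on Flink Tables, and all names below are illustrative.

```python
# Minimal sketch of the Estimator/Transformer/Pipeline pattern
# (illustrative only; FLIP-39 defines Java interfaces over Flink Tables).

class Transformer:
    def transform(self, data):
        raise NotImplementedError

class Estimator:
    def fit(self, data):
        """Returns a fitted Transformer (a 'Model')."""
        raise NotImplementedError

class Scaler(Estimator):
    """Toy estimator: learns the max value, produces a scaling model."""
    def fit(self, data):
        m = max(data) if data else 1.0
        class ScalerModel(Transformer):
            def transform(self, d, _m=m):
                return [x / _m for x in d]
        return ScalerModel()

class Pipeline:
    def __init__(self, stages):
        self.stages = stages  # a mix of Estimators and Transformers

    def fit(self, data):
        fitted = []
        for stage in self.stages:
            model = stage.fit(data) if isinstance(stage, Estimator) else stage
            fitted.append(model)
            data = model.transform(data)
        # A fitted pipeline contains only Transformers.
        return Pipeline(fitted)

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

model = Pipeline([Scaler()]).fit([1.0, 2.0, 4.0])
print(model.transform([2.0, 4.0]))  # scaled by the max seen at fit time
```

The key property, which FLIP-39 shares, is that `fit` turns a pipeline containing estimators into a pipeline of pure transformers, which can then be persisted and applied to new data.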
Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment
Sorry, I didn't see that the original message had already mentioned these configurations. But it does work for me. I put them into flink-conf.yaml; I did not try the way you did. On Mon, Apr 29, 2019 at 9:45 AM Yaoting Gong wrote: > Hi, @胡逸才, I've met the same problem. Add some configs as blow will help > you . > > env.java.home: /usr/jdk1.8.0_51 > containerized.master.env.JAVA_HOME: /usr/jdk1.8.0_51 > containerized.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51 > yarn.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51 > > > > On Sun, Apr 28, 2019 at 1:58 PM 张军 wrote: > >> Thank you for your reminder, I will pay attention to this issue in the >> future. >> >> I read some flink source code and saw that there are a lot of new >> features of java8, such as CompletableFuture and Lambda expressions, which >> causes flink do not run in jdk 1.7 environment, so you may need to upgrade >> your jdk to 1.8. >> >> >> >> >> > On Apr 28, 2019, at 8:24 AM, 126 wrote: >> > >> > The Flink source code uses many Java 1.8 features, so JDK 1.7 won't work. >> > >> > Sent from my iPhone >> > >> >> On Apr 26, 2019, at 17:48, 胡逸才 wrote: >> >> >> >> At present, all YARN clusters adopt JAVA 7 environment. >> >> >> >> While trying to use FLINK to handle the deployment of flow processing >> business scenarios, it was found that FLINK ON YARN mode always failed to >> perform a session task. The application log of YARN shows Unsupported >> major.minor version 52.0. >> >> >> >> I tried to add env.java.home: <JDK1.8 PATH> in flink-conf.yaml of >> the mailing list solution. And the startup command adds -yD yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>. Flink session cluster in YARN can not run Application in JAVA >> 8 environment. >> >> >> >> So can I use Flink1.7.X submit Flink session cluster application in >> YARN under JAVA 7 environment? >> >> >> >> >> >>
Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment
Hi, @胡逸才, I've met the same problem. Adding the configs below will help you: env.java.home: /usr/jdk1.8.0_51 containerized.master.env.JAVA_HOME: /usr/jdk1.8.0_51 containerized.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51 yarn.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51 On Sun, Apr 28, 2019 at 1:58 PM 张军 wrote: > Thank you for your reminder, I will pay attention to this issue in the > future. > > I read some flink source code and saw that there are a lot of new features > of java8, such as CompletableFuture and Lambda expressions, which causes > flink do not run in jdk 1.7 environment, so you may need to upgrade your > jdk to 1.8. > > > > > > On Apr 28, 2019, at 8:24 AM, 126 wrote: > > > > The Flink source code uses many Java 1.8 features, so JDK 1.7 won't work. > > > > Sent from my iPhone > > > >> On Apr 26, 2019, at 17:48, 胡逸才 wrote: > >> > >> At present, all YARN clusters adopt JAVA 7 environment. > >> > >> While trying to use FLINK to handle the deployment of flow processing > business scenarios, it was found that FLINK ON YARN mode always failed to > perform a session task. The application log of YARN shows Unsupported > major.minor version 52.0. > >> > >> I tried to add env.java.home: <JDK1.8 PATH> in flink-conf.yaml of > the mailing list solution. And the startup command adds -yD yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>. Flink session cluster in YARN can not run Application in JAVA > 8 environment. > >> > >> So can I use Flink1.7.X submit Flink session cluster application in > YARN under JAVA 7 environment? > >> > >> > >
[jira] [Created] (FLINK-12360) Translate "Jobs and Scheduling" Page to Chinese
Armstrong Nova created FLINK-12360: Summary: Translate "Jobs and Scheduling" Page to Chinese Key: FLINK-12360 URL: https://issues.apache.org/jira/browse/FLINK-12360 Project: Flink Issue Type: Task Components: chinese-translation, Documentation Reporter: Armstrong Nova Assignee: Armstrong Nova Translate the internals page https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html to Chinese.
[jira] [Created] (FLINK-12359) SystemResourcesMetricsITCase unstable
Chesnay Schepler created FLINK-12359: Summary: SystemResourcesMetricsITCase unstable Key: FLINK-12359 URL: https://issues.apache.org/jira/browse/FLINK-12359 Project: Flink Issue Type: Bug Components: Runtime / Metrics, Tests Affects Versions: 1.9.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.9.0 The {{SystemResourcesMetricsITCase}} checks that task managers register a specific set of metrics when configured to do so. The test assumes that the TM has already started completely when the test begins, but this may not be the case.
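A common fix for this class of flakiness is to poll for the precondition (here, the task manager being fully started) with a timeout instead of assuming it. The sketch below is a generic wait helper in Python, not the actual fix in Flink's Java test utilities.

```python
import time

def wait_until(condition, timeout=30.0, interval=0.1):
    """Polls `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition became true in time, False otherwise.
    Test setups use this pattern to wait for e.g. task manager
    registration before asserting on metrics.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Example: a "registration" that only completes after a few polls.
state = {"polls": 0}
def tm_registered():
    state["polls"] += 1
    return state["polls"] >= 3

assert wait_until(tm_registered, timeout=5.0, interval=0.01)
```

Polling with a bounded timeout keeps the test deterministic on slow CI machines while still failing fast when the precondition can never hold.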
[jira] [Created] (FLINK-12358) Verify whether REST documentation needs to be updated when building pull request
Yun Tang created FLINK-12358: Summary: Verify whether REST documentation needs to be updated when building pull request Key: FLINK-12358 URL: https://issues.apache.org/jira/browse/FLINK-12358 Project: Flink Issue Type: Improvement Components: Build System Reporter: Yun Tang Assignee: Yun Tang Currently, unlike the configuration docs, the REST API docs have no mechanism to check whether they are up to date with the latest code. This is really annoying and hard to track if it is only checked by developers. I plan to add a check in Travis that verifies whether any generated files have been modified, using `git status`.
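The proposed Travis check boils down to: regenerate the docs, then fail the build if `git status` reports modified files. A minimal sketch of the decision logic (the regeneration command itself is Flink-specific and omitted; the file path in the example is made up):

```python
def docs_up_to_date(git_status_porcelain: str) -> bool:
    """True if `git status --porcelain`, run after regenerating the
    REST docs, reports no modified or untracked files."""
    return git_status_porcelain.strip() == ""

# Clean tree: the regenerated docs matched what is committed.
assert docs_up_to_date("")
# Dirty tree: the REST API changed without the docs being regenerated,
# so CI should fail. (Example path is hypothetical.)
assert not docs_up_to_date(" M docs/generated/rest_api.html\n")
```

In CI this would be wired as: run the doc generator, capture `git status --porcelain`, and exit non-zero when the output is non-empty.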
Re: Warning from dev@flink.apache.org
It's a bot. It says this before appending ".INVALID" to your email, Aitozi. Thank you. /Sree Sent from Yahoo Mail on Android On Sat, Apr 27, 2019 at 1:21, aitozi wrote: Hi, dev-owner: I don't quite understand the mail; can you tell me what it points to? Thanks, Aitozi On 2019/4/27, 1:40 AM, "dev-h...@flink.apache.org" wrote: Hi! This is the ezmlm program. I'm managing the dev@flink.apache.org mailing list. I'm working for my owner, who can be reached at dev-ow...@flink.apache.org. Messages to you from the dev mailing list seem to have been bouncing. I've attached a copy of the first bounce message I received. If this message bounces too, I will send you a probe. If the probe bounces, I will remove your address from the dev mailing list, without further notice. I've kept a list of which messages from the dev mailing list have bounced from your address. Copies of these messages may be in the archive. To retrieve a set of messages 123-145 (a maximum of 100 per request), send a short message to: To receive a subject and author list for the last 100 or so messages, send a short message to: Here are the message numbers: 28560 --- Enclosed is a copy of the bounce message I received. Return-Path: <> Received: (qmail 52822 invoked for bounce); 16 Apr 2019 10:34:54 - Date: 16 Apr 2019 10:34:54 - From: mailer-dae...@apache.org To: dev-return-285...@flink.apache.org Subject: failure notice
[jira] [Created] (FLINK-12357) Remove useless code in TableConfig
Hequn Cheng created FLINK-12357: Summary: Remove useless code in TableConfig Key: FLINK-12357 URL: https://issues.apache.org/jira/browse/FLINK-12357 Project: Flink Issue Type: Bug Components: Table SQL / API Reporter: Hequn Cheng Assignee: Hequn Cheng
Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
Hi Yu, OK, now I understand your comments more clearly. To answer your two questions: 1. The value of this work: as I mentioned in my last reply to you: "we found the queryable state is hard to use and it may cause few users to use this function. We may think the reason and the result affect each other. And IMO, currently, the queryable state's architecture caused this problem. So I opened a thread to see how to improve them." We are trying to improve this issue and break that cycle of cause and effect. As for the value of queryable state itself, I think it needs no further clarification; the earlier replies from others have confirmed it. We did not use this feature in critical scenarios, but many common scenarios suit it, e.g.: - calculations whose period is very long but which need a more fine-grained real-time result, for example getting the current measure value for real-time OLAP, or the consume offset of a message system; - debugging an application; - and, if queryable state offered a better user experience, IMO more and more users would adopt it. 2. About duplicated work, I do not know; for now, the ledger project has not been merged into Flink's repository. But I can ping @Stephan, who may want to answer this question. About a whole and global plan and view, I totally agree with you. I did not give more thought and detail because, as I replied earlier, I did not know the community's opinion or whether it could be added to Flink's roadmap. All right, we can discuss more details. IMO, a more complete solution may contain these steps: - refactor the query client's API; with a meta-service, we may provide more useful APIs, e.g. 
scan all keys or scan a key range, and so on; obviously, the client API needs to be adjusted to provide new information for queries; - introduce a query proxy server, which contains a request router, metadata management/sync, ACL, SLA, and further plugins (I think a plugin architecture is a good choice) or sub-components; - interact with the JobManager; - interact with the TaskManager; - the plugins' loading strategy; - refactor the actual querier that runs on each TaskManager; it needs to interact with the query proxy server. Obviously, each step can also be split into several steps. I hope for your suggestions and guidance. Any questions, please let me know. Best, Vino Yu Li wrote on Sun, Apr 28, 2019, 3:40 PM: > TL;DR: IMO a more complete solution is to cover both query and meta request > serving in a new component. We could use the proposal here as step one but > we should have a global plan. And before improving a seemingly not widely > used feature, we'd better weigh the gain and efforts. > > Let me clarify the purpose of my previous questions, that before we start > detailed design and code development, it's better to get consensus on: > 1. What's the value of the work? > - As noticed, the queryable state feature has been implemented for some > while but not widely used in production (AFAIK), why? If it did been used > in critical scenarios, what those scenarios are? > - I think it's a good time discussing about this (since raised in this > thread by others) and confirm the value of efforts improving this feature. > 2. Would there be duplicated work? > - This is the main reason I asked about the relationship between ledger > and queryable-state. > > And some answers to the inline comments: > > bq. About the relationship between ledger and Queryable State, I also think > it is out of this thread > True, that's why I suggested to open another thread. But as mentioned > above, the question is relative if we think about the whole. > > bq. Yes, the QueryableState's isolation level is *Read Uncommitted*... 
> However, I think it would not affect we discuss how to improve the > queryable state's architecture, right? > Correct, but my real question here is what kind of application could bear > the changing query result. > > bq. The intermediate data is also valuable, for example, we just need a > partitioned data stream's real-time measure value. > In this case there must be some complicated operation in the pipeline which > causes long latency at sink? Could you talk more about the real-world case? > Thanks. > > bq. Your worry is reasonable. > Then I suggest to think it as a whole. We could split the implementation > into steps, but better to have a global plan, to make it really applicable > in production (under heavy load). > > Best Regards, > Yu > > > On Sun, 28 Apr 2019 at 14:48, vino yang wrote: > > > Hi yu, > > > > Thanks for your reply. I have some inline comment. > > > > Yu Li 于2019年4月28日周日 下午12:24写道: > > > > > Glad to see discussions around QueryableState in mailing list, and it > > seems > > > we have included a bigger scope in the discussion, that what's the data > > > model in Flink and how to (or is it possible to) use Flink as a > > database. I > > > suggest to open another thread for this bigger topic and personally I
[jira] [Created] (FLINK-12356) Optimise version expression of flink-shaded-hadoop2(-uber)
Paul Lin created FLINK-12356: Summary: Optimise version expression of flink-shaded-hadoop2(-uber) Key: FLINK-12356 URL: https://issues.apache.org/jira/browse/FLINK-12356 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.8.1 Reporter: Paul Lin Assignee: Paul Lin With the new version scheme for Hadoop-based modules, we use version literals in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; these can be replaced by the `${parent.version}` variable for better management.
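A sketch of what the change might look like in a module POM (element layout abbreviated and version numbers illustrative; the actual Flink POMs may differ):

```xml
<!-- Before: the parent version literal is repeated by hand -->
<parent>
    <artifactId>flink-shaded-hadoop</artifactId>
    <version>1.8.1</version>
</parent>
<artifactId>flink-shaded-hadoop2</artifactId>
<version>2.8.3-1.8.1</version>

<!-- After: derive the suffix from the parent instead of hard-coding it,
     so a parent version bump no longer requires editing each module -->
<version>2.8.3-${parent.version}</version>
```

Note that Maven historically warns about property references inside `<version>`, which is worth verifying before adopting this across modules.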
[jira] [Created] (FLINK-12355) KafkaITCase.testTimestamps is unstable
Yu Li created FLINK-12355: Summary: KafkaITCase.testTimestamps is unstable Key: FLINK-12355 URL: https://issues.apache.org/jira/browse/FLINK-12355 Project: Flink Issue Type: Test Components: Tests Reporter: Yu Li The {{KafkaITCase.testTimestamps}} failed on Travis because it timed out. https://api.travis-ci.org/v3/job/525503117/log.txt
Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
TL;DR: IMO a more complete solution is to cover both query and meta request serving in a new component. We could use the proposal here as step one, but we should have a global plan. And before improving a seemingly not widely used feature, we'd better weigh the gains against the effort. Let me clarify the purpose of my previous questions: before we start detailed design and code development, it's better to reach consensus on: 1. What's the value of the work? - As noticed, the queryable state feature has been implemented for some while but is not widely used in production (AFAIK); why? If it has been used in critical scenarios, what are those scenarios? - I think it's a good time to discuss this (since it was raised in this thread by others) and confirm the value of the effort to improve this feature. 2. Would there be duplicated work? - This is the main reason I asked about the relationship between ledger and queryable-state. And some answers to the inline comments: bq. About the relationship between ledger and Queryable State, I also think it is out of this thread True, that's why I suggested opening another thread. But as mentioned above, the question is relevant if we think about the whole. bq. Yes, the QueryableState's isolation level is *Read Uncommitted*... However, I think it would not affect we discuss how to improve the queryable state's architecture, right? Correct, but my real question here is what kind of application could bear the changing query result. bq. The intermediate data is also valuable, for example, we just need a partitioned data stream's real-time measure value. In this case there must be some complicated operation in the pipeline which causes long latency at the sink? Could you talk more about the real-world case? Thanks. bq. Your worry is reasonable. Then I suggest thinking of it as a whole. We could split the implementation into steps, but it's better to have a global plan, to make it really applicable in production (under heavy load). 
Best Regards, Yu On Sun, 28 Apr 2019 at 14:48, vino yang wrote: > Hi yu, > > Thanks for your reply. I have some inline comment. > > Yu Li 于2019年4月28日周日 下午12:24写道: > > > Glad to see discussions around QueryableState in mailing list, and it > seems > > we have included a bigger scope in the discussion, that what's the data > > model in Flink and how to (or is it possible to) use Flink as a > database. I > > suggest to open another thread for this bigger topic and personally I > think > > the first question should be answered is what's the relationship between > > Flink ledger and QueryableState. > > > > *About the scope, yes, it seems it's big. Actually, I think the questions > you provided make it bigger than I have done.* > *Here I think we don't need to answer the two questions(we can discuss in > another thread, or answer it later).* > > *My original thought is that we found the queryable state is hard to use > and it may cause few users to use this function. We may think the reason > and the result affect each other. And IMO, currently, the queryable state's > architecture caused this problem. So I opened a thread to see how to > improve them. * > > *We mentioned these keywords e.g. "state、database" is to emphasize the > queryable state is very important. The data model and use Flink as a > database is not this thread's main topic (as Elias's reply said, many > issues cause the road to this goal is so long). This thread I assume we do > not change the state's core design and the goal is to bring a better query > solution.* > > *About the relationship between ledger and Queryable State, I also think it > is out of this thread.* > > > > > > Back to the user scenario itself, I'd like to post two open questions > about > > QueryableState for ad-hoc query: > > 1. Currently the isolation level of QueryableState is *Read Uncommitted* > > since failover might happen and cause data rollback. 
Although the > > "uncommitted" data will be replayed again and get final consistency, > > application will see unstable query result. Probably some kind of > > applications could bare such drawback but what exactly? > > > > *Yes, the QueryableState's isolation level is *Read Uncommitted*. I think > if we need a higher isolation level, may need other mechanisms to guarantee > this. I am sorry, I can not give the solution.* > *However, I think it would not affect we discuss how to improve the > queryable state's architecture, right?* > > > > > > 2. Currently in Flink sink is more commonly regarded as the "result > > partition" and state of operators in the pipeline more like "intermediate > > data". Used for debugging purpose is easy to understand but not for > ad-hoc > > query. Or in another word, what makes user prefer querying the state data > > instead of sink? Or why we need to query the intermediate data instead of > > the result? > > > > > *About the opinion that state of operators in the pipeline more like > "intermediate data". Yes, you are right. It's
Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
Hi Yu, thanks for your reply. I have some inline comments. Yu Li wrote on Sun, Apr 28, 2019, 12:24 PM: > Glad to see discussions around QueryableState in mailing list, and it seems > we have included a bigger scope in the discussion, that what's the data > model in Flink and how to (or is it possible to) use Flink as a database. I > suggest to open another thread for this bigger topic and personally I think > the first question should be answered is what's the relationship between > Flink ledger and QueryableState. > *About the scope: yes, it seems big. Actually, I think the questions you raised make it even bigger than I did.* *Here I think we don't need to answer those two questions (we can discuss them in another thread, or answer them later).* *My original thought is that we found queryable state hard to use, which may be why few users use this feature. The cause and the effect reinforce each other, and IMO the queryable state's current architecture is the root of this problem. So I opened a thread to discuss how to improve it.* *We mentioned keywords like "state, database" to emphasize that queryable state is very important. The data model, and using Flink as a database, are not this thread's main topic (as Elias's reply said, many issues make the road to that goal a long one). In this thread I assume we do not change the state's core design; the goal is to bring a better query solution.* *About the relationship between ledger and Queryable State, I also think it is outside the scope of this thread.* > > Back to the user scenario itself, I'd like to post two open questions about > QueryableState for ad-hoc query: > 1. Currently the isolation level of QueryableState is *Read Uncommitted* > since failover might happen and cause data rollback. Although the > "uncommitted" data will be replayed again and get final consistency, > application will see unstable query result. Probably some kind of > applications could bare such drawback but what exactly? 
> *Yes, the QueryableState's isolation level is *Read Uncommitted*. I think if we need a higher isolation level, we may need other mechanisms to guarantee it; I am sorry, I cannot offer a solution.* *However, I think that does not affect our discussion of how to improve the queryable state's architecture, right?* > > 2. Currently in Flink sink is more commonly regarded as the "result > partition" and state of operators in the pipeline more like "intermediate > data". Used for debugging purpose is easy to understand but not for ad-hoc > query. Or in another word, what makes user prefer querying the state data > instead of sink? Or why we need to query the intermediate data instead of > the result? > > *About the opinion that state of operators in the pipeline is more like "intermediate data": yes, you are right. It is intermediate data, and we need it in some scenarios.* *What is valuable is that it represents "real time": when querying a state, we need its current value; we cannot wait for the sink. The intermediate data is also valuable; for example, we may just need a partitioned data stream's real-time measure value.* > Further back to the original topic proposed in this thread about > introducing a QueryableStateProxy, I could see some careful consideration > on query load on the proxy. However, under heavy load the pressure is not > only on query serving but also on meta requesting, which is handled by JM > for now. So to release JM pressure, we should also extract the meta serving > task out, and my suggestion is to introduce a new component like > *StateMetaServer* and take over both query and meta serving > responsibilities. > *I think the points about metadata pressure and a *StateMetaServer* are good; we need to take care of them in the design.* *I mentioned the meta info (registry) in both options' simple architecture pictures, although I only emphasized the query proxy server, because it is the main component.* *Your worry is reasonable. 
The proxy server's architecture is well suited to handling this, via mechanisms such as request flow control and transferring pressure to a single entry point (for opt2 and opt3, we can serve meta queries in a single process).* *Anyway, I just opened a discussion to listen to the community's opinion.* > > Best Regards, > Yu > > > On Sat, 27 Apr 2019 at 11:58, vino yang wrote: > > > Hi Elias, > > > > I agree with your opinion that "*Flink jobs don't sufficiently meet these > > requirements to work as a replacement for a data store.*". Actually, I > > think it's obviously not Flink's goal. If we think that the database > > contains the main two parts(inexactitude): data query and data store. What > > I and Paul mean is the former. > > > > Yes, you have mentioned it's major value: ad hoc and debugging(IMO, > > especially for the former). To give a real-time calculation result is > very > > import for some scene(such as real-time measure for real-time OLAP) in a > > long-term (no-window or large window). > > > > So, my opinion: Queryable
[jira] [Created] (FLINK-12354) Add Reverse function supported in Table API and SQL
Zhanchun Zhang created FLINK-12354: Summary: Add Reverse function supported in Table API and SQL Key: FLINK-12354 URL: https://issues.apache.org/jira/browse/FLINK-12354 Project: Flink Issue Type: Sub-task Components: Table SQL / API Reporter: Zhanchun Zhang Assignee: Zhanchun Zhang Returns the string _{{str}}_ with the order of the characters reversed, e.g. SELECT REVERSE('abc'); -> 'cba'
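The expected semantics can be pinned down with a tiny reference implementation (Python here for brevity; the actual function would live in Flink's Table/SQL runtime):

```python
def reverse(s: str) -> str:
    """Reference semantics for SQL REVERSE: the characters of the
    input string in reverse order."""
    return s[::-1]

# Matches the example from the ticket: REVERSE('abc') -> 'cba'
assert reverse("abc") == "cba"
# Edge case: the empty string reverses to itself.
assert reverse("") == ""
```

Applying the function twice yields the original string, which makes a convenient property test for the SQL implementation.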
[jira] [Created] (FLINK-12353) Add missing module to collect_japicmp_reports.sh
Yun Tang created FLINK-12353: Summary: Add missing module to collect_japicmp_reports.sh Key: FLINK-12353 URL: https://issues.apache.org/jira/browse/FLINK-12353 Project: Flink Issue Type: Bug Components: Build System Reporter: Yun Tang Assignee: Yun Tang Currently, there are eight modules using the japicmp plugin, but only four of them have their japicmp reports collected in {{collect_japicmp_reports.sh}}. I had to modify the shell script to collect all the reports, and therefore plan to contribute this change.
Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment
Thank you for your reminder; I will pay attention to this issue in the future. I read some Flink source code and saw that it uses a lot of Java 8 features, such as CompletableFuture and lambda expressions, which means Flink cannot run in a JDK 1.7 environment, so you may need to upgrade your JDK to 1.8. > On Apr 28, 2019, at 8:24 AM, 126 wrote: > > The Flink source code uses many Java 1.8 features, so JDK 1.7 won't work. > > Sent from my iPhone > >> On Apr 26, 2019, at 17:48, 胡逸才 wrote: >> >> At present, all our YARN clusters use a Java 7 environment. >> >> While trying to use Flink to deploy stream processing business scenarios, I found that Flink on YARN mode always failed to run a session task; the YARN application log shows "Unsupported major.minor version 52.0". >> >> I tried adding env.java.home: <JDK1.8 PATH> to flink-conf.yaml, as suggested on the mailing list, and adding -yD yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, -yD containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH> to the startup command. The Flink session cluster on YARN still could not run the application in a Java 8 environment. >> >> So, can I use Flink 1.7.X to submit a Flink session cluster application on YARN under a default Java 7 environment?
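For reference, the JAVA_HOME overrides discussed in this thread, written out as separate flink-conf.yaml entries (the JDK path is an example; adjust to your environment):

```yaml
# flink-conf.yaml — point Flink's YARN containers at a Java 8 installation
# (path is an example; replace with your local JDK 8 install)
env.java.home: /usr/jdk1.8.0_51
containerized.master.env.JAVA_HOME: /usr/jdk1.8.0_51
containerized.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51
yarn.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51
```

The same keys can alternatively be passed on the command line with `-yD key=value`, as tried in the quoted message above.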