Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Chen Qin
Just sharing some insights from operating SparkML at scale:
- MapReduce may not be the best way to iteratively synchronize partitioned workers.
- Native hardware acceleration is key to adopting the rapid pace of ML
improvements in the foreseeable future.

Chen

On Apr 29, 2019, at 11:02, jincheng sun  wrote:
> 
> Hi Shaoxuan,
> 
> Thanks for putting more effort into improving the scalability and ease of
> use of Flink ML and taking it one step further. Thank you for sharing a lot
> of context information.
> 
> big +1 for this proposal!
> 
> Just one suggestion: since there is only a short time left until the
> release of flink-1.9, I recommend adding a detailed implementation plan to
> the FLIP and the Google doc.
> 
> What do you think?
> 
> Best,
> Jincheng
> 
> Shaoxuan Wang  wrote on Mon, Apr 29, 2019 at 10:34 AM:
> 
>> Hi everyone,
>> 
>> Weihua proposed rebuilding the Flink ML pipeline on top of TableAPI several
>> months ago in this mail thread:
>> 
>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>> 
>> Luogen, Becket, Xu, Weihua and I have been working on this proposal
>> offline in
>> the past few months. Now we want to share the first phase of the entire
>> proposal with a FLIP. In this FLIP-39, we want to achieve several things
>> (and hope those can be accomplished and released in Flink-1.9):
>> 
>>   - Provide a new set of ML core interfaces (on top of Flink TableAPI)
>>   - Provide an ML pipeline interface (on top of Flink TableAPI)
>>   - Provide interfaces for parameter management and pipeline/model
>>   persistence
>>   - All the above interfaces should facilitate any new ML algorithm. We will
>>   gradually add various standard ML algorithms on top of these newly
>>   proposed interfaces to ensure their feasibility and scalability.
>> 
>> 
>> Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
>> Xu and me.
>> 
>> 
>> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>> 
>> 
>> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>> 
>> You can find the videos & slides at
>> https://www.ververica.com/flink-forward-san-francisco-2019
>> 
>> The design document for FLIP-39 can be found here:
>> 
>> 
>> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>> 
>> 
>> I am looking forward to your feedback.
>> 
>> Regards,
>> 
>> Shaoxuan
>> 


[jira] [Created] (FLINK-12361) Remove useless expression from runtime scheduler

2019-04-28 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12361:


 Summary: Remove useless expression from runtime scheduler
 Key: FLINK-12361
 URL: https://issues.apache.org/jira/browse/FLINK-12361
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Operators
Reporter: Liya Fan
Assignee: Liya Fan
 Attachments: image-2019-04-29-11-16-13-492.png

In the scheduleTask method of the Scheduler class, the expression forceExternalLocation
is useless, since it always evaluates to false:

 !image-2019-04-29-11-16-13-492.png! 

So it can be removed. Moreover, removing this expression makes the code structure
much simpler, because the branches relying on this expression can also be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread jincheng sun
Hi Shaoxuan,

Thanks for putting more effort into improving the scalability and ease of
use of Flink ML and taking it one step further. Thank you for sharing a lot
of context information.

big +1 for this proposal!

Just one suggestion: since there is only a short time left until the
release of flink-1.9, I recommend adding a detailed implementation plan to
the FLIP and the Google doc.

What do you think?

Best,
Jincheng

Shaoxuan Wang  wrote on Mon, Apr 29, 2019 at 10:34 AM:

> Hi everyone,
>
> Weihua proposed rebuilding the Flink ML pipeline on top of TableAPI several
> months ago in this mail thread:
>
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>
> Luogen, Becket, Xu, Weihua and I have been working on this proposal
> offline in
> the past few months. Now we want to share the first phase of the entire
> proposal with a FLIP. In this FLIP-39, we want to achieve several things
> (and hope those can be accomplished and released in Flink-1.9):
>
>   - Provide a new set of ML core interfaces (on top of Flink TableAPI)
>   - Provide an ML pipeline interface (on top of Flink TableAPI)
>   - Provide interfaces for parameter management and pipeline/model
>   persistence
>   - All the above interfaces should facilitate any new ML algorithm. We will
>   gradually add various standard ML algorithms on top of these newly
>   proposed interfaces to ensure their feasibility and scalability.
>
>
> Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
> Xu and me.
>
>
> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>
>
> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>
> You can find the videos & slides at
> https://www.ververica.com/flink-forward-san-francisco-2019
>
> The design document for FLIP-39 can be found here:
>
>
> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>
>
> I am looking forward to your feedback.
>
> Regards,
>
> Shaoxuan
>


[DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-04-28 Thread Shaoxuan Wang
Hi everyone,

Weihua proposed rebuilding the Flink ML pipeline on top of TableAPI several
months ago in this mail thread:

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html

Luogen, Becket, Xu, Weihua and I have been working on this proposal offline in
the past few months. Now we want to share the first phase of the entire
proposal with a FLIP. In this FLIP-39, we want to achieve several things
(and hope those can be accomplished and released in Flink-1.9):

   - Provide a new set of ML core interfaces (on top of Flink TableAPI)
   - Provide an ML pipeline interface (on top of Flink TableAPI)
   - Provide interfaces for parameter management and pipeline/model
   persistence
   - All the above interfaces should facilitate any new ML algorithm. We will
   gradually add various standard ML algorithms on top of these newly
   proposed interfaces to ensure their feasibility and scalability.
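
For readers skimming the digest, a minimal, purely illustrative sketch of what
ML pipeline interfaces layered on TableAPI could look like is included below.
The names (Params, Transformer, Estimator) are placeholders and not the actual
FLIP-39 API; the authoritative definitions live in the design document linked
further down.

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

/** Illustrative parameter container; persistence could serialize it to JSON. */
interface Params extends java.io.Serializable {
    <V> Params set(String key, V value);
    <V> V get(String key);
}

/** A pipeline stage that maps one Table to another (e.g. feature scaling). */
interface Transformer {
    Table transform(TableEnvironment tEnv, Table input, Params params);
}

/** A stage fitted on a training Table, producing a model usable as a Transformer. */
interface Estimator {
    Transformer fit(TableEnvironment tEnv, Table training, Params params);
}

The point the sketch tries to capture is that every stage consumes and produces
a Table, so pipelines compose through the Table API rather than through
DataSet- or DataStream-specific types.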


Part of this FLIP was presented at Flink Forward 2019 @ San Francisco by
Xu and me.

https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api

https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink

You can find the videos & slides at
https://www.ververica.com/flink-forward-san-francisco-2019

The design document for FLIP-39 can be found here:

https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo


I am looking forward to your feedback.

Regards,

Shaoxuan


Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment

2019-04-28 Thread Yaoting Gong
Sorry, I didn't notice that the original message already mentioned these
configurations. But they do work for me. I put them into flink-conf.yaml and
did not try the way you did.

On Mon, Apr 29, 2019 at 9:45 AM Yaoting Gong 
wrote:

> Hi, @胡逸才, I've met the same problem. Adding the configs below should help
> you.
>
> env.java.home: /usr/jdk1.8.0_51
> containerized.master.env.JAVA_HOME: /usr/jdk1.8.0_51
> containerized.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51
> yarn.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51
>
>
>
> On Sun, Apr 28, 2019 at 1:58 PM 张军  wrote:
>
>> Thank you for your reminder, I will pay attention to this issue in the
>> future.
>>
>> I read some Flink source code and saw that it uses a lot of Java 8
>> features, such as CompletableFuture and lambda expressions, which is why
>> Flink does not run in a JDK 1.7 environment, so you may need to upgrade
>> your JDK to 1.8.
>>
>>
>>
>>
>> > On Apr 28, 2019, at 08:24, 126  wrote:
>> >
>> > The Flink source code uses many Java 1.8 features, so JDK 1.7 will not work.
>> >
>> > Sent from my iPhone
>> >
>> >> On Apr 26, 2019, at 17:48, 胡逸才  wrote:
>> >>
>> >> At present, all our YARN clusters use a Java 7 environment.
>> >>
>> >> While trying to use Flink for stream processing deployments, we found
>> >> that Flink on YARN mode always failed to start a session task. The YARN
>> >> application log shows "Unsupported major.minor version 52.0".
>> >>
>> >> I tried adding env.java.home: <JDK1.8 PATH> to flink-conf.yaml as the
>> >> mailing list solution suggests, and adding -yD
>> >> yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD
>> >> containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, and -yD
>> >> containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH> to the startup
>> >> command. The Flink session cluster on YARN still cannot run the
>> >> application in a Java 8 environment.
>> >>
>> >> So can I use Flink 1.7.x to submit a Flink session cluster application
>> >> on YARN under a Java 7 environment?
>> >>
>> >>
>> >>
>>
>>


Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment

2019-04-28 Thread Yaoting Gong
Hi, @胡逸才, I've met the same problem. Adding the configs below should help
you.

env.java.home: /usr/jdk1.8.0_51
containerized.master.env.JAVA_HOME: /usr/jdk1.8.0_51
containerized.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51
yarn.taskmanager.env.JAVA_HOME: /usr/jdk1.8.0_51



On Sun, Apr 28, 2019 at 1:58 PM 张军  wrote:

> Thank you for your reminder, I will pay attention to this issue in the
> future.
>
> I read some Flink source code and saw that it uses a lot of Java 8
> features, such as CompletableFuture and lambda expressions, which is why
> Flink does not run in a JDK 1.7 environment, so you may need to upgrade
> your JDK to 1.8.
>
>
>
>
> > On Apr 28, 2019, at 08:24, 126  wrote:
> >
> > The Flink source code uses many Java 1.8 features, so JDK 1.7 will not work.
> >
> > Sent from my iPhone
> >
> >> On Apr 26, 2019, at 17:48, 胡逸才  wrote:
> >>
> >> At present, all our YARN clusters use a Java 7 environment.
> >>
> >> While trying to use Flink for stream processing deployments, we found
> >> that Flink on YARN mode always failed to start a session task. The YARN
> >> application log shows "Unsupported major.minor version 52.0".
> >>
> >> I tried adding env.java.home: <JDK1.8 PATH> to flink-conf.yaml as the
> >> mailing list solution suggests, and adding -yD
> >> yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD
> >> containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, and -yD
> >> containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH> to the startup
> >> command. The Flink session cluster on YARN still cannot run the
> >> application in a Java 8 environment.
> >>
> >> So can I use Flink 1.7.x to submit a Flink session cluster application
> >> on YARN under a Java 7 environment?
> >>
> >>
> >>
>
>


[jira] [Created] (FLINK-12360) Translate "Jobs and Scheduling" Page to Chinese

2019-04-28 Thread Armstrong Nova (JIRA)
Armstrong Nova created FLINK-12360:
--

 Summary: Translate "Jobs and Scheduling" Page to Chinese
 Key: FLINK-12360
 URL: https://issues.apache.org/jira/browse/FLINK-12360
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Reporter: Armstrong Nova
Assignee: Armstrong Nova


Translate the internal page
https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html
to Chinese.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12359) SystemResourcesMetricsITCase unstable

2019-04-28 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-12359:


 Summary: SystemResourcesMetricsITCase unstable
 Key: FLINK-12359
 URL: https://issues.apache.org/jira/browse/FLINK-12359
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Tests
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.0


The {{SystemResourcesMetricsITCase}} checks that task managers register a
specific set of metrics if configured to do so. The test assumes that the TM is
already fully started when the test begins, but this may not be the case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12358) Verify whether rest documentation needs to be updated when building pull request

2019-04-28 Thread Yun Tang (JIRA)
Yun Tang created FLINK-12358:


 Summary: Verify whether rest documentation needs to be updated when 
building pull request
 Key: FLINK-12358
 URL: https://issues.apache.org/jira/browse/FLINK-12358
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Yun Tang
Assignee: Yun Tang


Currently, unlike the configuration docs, the REST API docs have no way to
check whether they are up to date with the latest code. This is really annoying
and hard to track if it is only checked by developers.

I plan to add a check in Travis that verifies whether any files have changed by
using `git status`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Warning from dev@flink.apache.org

2019-04-28 Thread Sree V
It's a bot. It says this before appending ".INVALID" to your email, Aitozi.
Thank you.
/Sree

Sent from Yahoo Mail on Android

On Sat, Apr 27, 2019 at 1:21, aitozi wrote:
Hi, dev-owner:

I don't quite understand the mail, can you tell me what it points to?

Thanks,
Aitozi


On 2019/4/27, 1:40 AM, "dev-h...@flink.apache.org"  
wrote:

    Hi! This is the ezmlm program. I'm managing the
    dev@flink.apache.org mailing list.
    
    I'm working for my owner, who can be reached
    at dev-ow...@flink.apache.org.
    
    
    Messages to you from the dev mailing list seem to
    have been bouncing. I've attached a copy of the first bounce
    message I received.
    
    If this message bounces too, I will send you a probe. If the probe bounces,
    I will remove your address from the dev mailing list,
    without further notice.
    
    
    I've kept a list of which messages from the dev mailing list have 
    bounced from your address.
    
    Copies of these messages may be in the archive.
    To retrieve a set of messages 123-145 (a maximum of 100 per request),
    send a short message to:
      
    
    To receive a subject and author list for the last 100 or so messages,
    send a short message to:
      
    
    Here are the message numbers:
    
      28560
    
    --- Enclosed is a copy of the bounce message I received.
    
    Return-Path: <>
    Received: (qmail 52822 invoked for bounce); 16 Apr 2019 10:34:54 -
    Date: 16 Apr 2019 10:34:54 -
    From: mailer-dae...@apache.org
    To: dev-return-285...@flink.apache.org
    Subject: failure notice
    
    

  


[jira] [Created] (FLINK-12357) Remove useless code in TableConfig

2019-04-28 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-12357:
---

 Summary: Remove useless code in TableConfig
 Key: FLINK-12357
 URL: https://issues.apache.org/jira/browse/FLINK-12357
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Hequn Cheng
Assignee: Hequn Cheng






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

2019-04-28 Thread vino yang
Hi Yu,

OK, now I understand your comments more clearly.

Now, to answer your two questions:

1. The value of this work:

As I mentioned in my last reply to you: "we found the queryable state is
hard to use, and that may be why few users use this function. The reason and
the result affect each other, and IMO the queryable state's current
architecture caused this problem. So I opened a thread to discuss how to
improve it." We are trying to improve this situation and break that cycle of
cause and effect.

About the value of the queryable state itself, I think it does not need
further clarification; the earlier replies from others have confirmed it.

We did not use this feature in critical scenarios, but there are many common
scenarios that suit this feature, e.g.:


   - the calculation period is very long, but a more fine-grained real-time
   result is needed, for example, getting the current measure value for
   real-time OLAP, getting the consumer offset of a message system, and so on;
   - debugging an application.

If the queryable state had a better user experience, IMO more and more
users would use this feature.
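
(For context, the lookup path a user has to follow today looks roughly like
the snippet below. It targets the existing QueryableStateClient shipped with
Flink 1.7/1.8; the host, port, job id, state name and key are placeholders.)

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryStateExample {
    public static void main(String[] args) throws Exception {
        // The client must be pointed at one TaskManager's queryable-state proxy.
        QueryableStateClient client = new QueryableStateClient("tm-host", 9069);

        // The JobID has to be obtained out of band (e.g. from the REST API / web UI).
        JobID jobId = JobID.fromHexString("d3a2e5e9702a1f6c3d8f5b7a9c0e1d2f");
        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("count", Types.LONG);

        // Point lookup of a single key; no scans, no server-side discovery.
        CompletableFuture<ValueState<Long>> result =
                client.getKvState(jobId, "query-name", "some-key", Types.STRING, descriptor);
        System.out.println(result.get().value());
    }
}

Everything above (finding the right TaskManager host, the proxy port, the
JobID) is the caller's burden today, which is exactly the friction a proxy and
meta-service would remove.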

2. About duplicated work, I do not know. For now, the ledger project has not
been merged into Flink's repository. But I can ping @Stephan; maybe he wants
to answer this question.

About having a whole, global plan and view, I totally agree with you. I did
not give more thought and details yet; as I replied to you, the reason is
that I did not know the community's opinion and whether it can be added to
Flink's roadmap.

All right, we can discuss more details. IMO, a more complete solution may
contain these steps:


   - refactor the query client's API: with a meta-service we could provide a
   more useful API, e.g. scanning all keys or scanning a key range; obviously
   the client API needs to be adjusted to provide new information for queries;
   - introduce a query proxy server, which contains a request router,
   metadata management/sync, ACL, SLA, and more plugins (I think a plugin
   architecture is a good choice) or sub-components:
  - interact with the JobManager
  - interact with the TaskManager
  - the plugins' loading strategy
   - refactor the actual querier that runs on each TaskManager; it needs to
   interact with the query proxy server.

Obviously, each step can also be split into several steps.
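
As a purely illustrative sketch (the names and methods below are placeholders,
not a concrete design), the client-facing surface of such a query proxy server
could look like this:

import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical client-facing API of a QueryServerProxy: one entry point that
 *  hides TaskManager locations and offers richer queries than point lookup. */
interface StateQueryProxy {
    /** Point lookup, routed by the proxy to the TaskManager owning the key. */
    <K, V> CompletableFuture<V> get(String jobName, String stateName, K key);

    /** Key-range scan across key groups, served with the help of state metadata. */
    <K, V> CompletableFuture<Map<K, V>> scan(String jobName, String stateName,
                                             K startKey, K endKey);

    /** Metadata lookup: queryable state names and their key/value types. */
    CompletableFuture<Map<String, String>> describe(String jobName);
}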

I hope for your suggestions and guidance. If there are any questions, please let me know.

Best,
Vino


Yu Li  wrote on Sun, Apr 28, 2019 at 3:40 PM:

> TL;DR: IMO a more complete solution is to cover both query and meta request
> serving in a new component. We could use the proposal here as step one but
> we should have a global plan. And before improving a seemingly not widely
> used feature, we'd better weigh the gain and efforts.
>
> Let me clarify the purpose of my previous questions: before we start
> detailed design and code development, it's better to get consensus on:
> 1. What's the value of the work?
> - As noticed, the queryable state feature has been implemented for a while
> but is not widely used in production (AFAIK). Why? If it has indeed been
> used in critical scenarios, what are those scenarios?
> - I think it's a good time to discuss this (since it was raised in this
> thread by others) and confirm the value of the effort to improve this feature.
> 2. Would there be duplicated work?
> - This is the main reason I asked about the relationship between ledger
> and queryable-state.
>
> And some answers to the inline comments:
>
> bq. About the relationship between ledger and Queryable State, I also think
> it is out of this thread
> True, that's why I suggested opening another thread. But as mentioned
> above, the question is relevant if we think about the whole picture.
>
> bq. Yes, the QueryableState's isolation level is *Read Uncommitted*...
> However, I think it would not affect we discuss how to improve the
> queryable state's architecture, right?
> Correct, but my real question here is what kind of application could bear
> the changing query result.
>
> bq. The intermediate data is also valuable, for example, we just need a
> partitioned data stream's real-time measure value.
> In this case there must be some complicated operation in the pipeline which
> causes long latency at sink? Could you talk more about the real-world case?
> Thanks.
>
> bq. Your worry is reasonable.
> Then I suggest to think it as a whole. We could split the implementation
> into steps, but better to have a global plan, to make it really applicable
> in production (under heavy load).
>
> Best Regards,
> Yu
>
>
> On Sun, 28 Apr 2019 at 14:48, vino yang  wrote:
>
> > Hi yu,
> >
> > Thanks for your reply. I have some inline comment.
> >
> > Yu Li  wrote on Sun, Apr 28, 2019 at 12:24 PM:
> >
> > > Glad to see discussions around QueryableState in mailing list, and it
> > seems
> > > we have included a bigger scope in the discussion, that what's the data
> > > model in Flink and how to (or is it possible to) use Flink as a
> > database. I
> > > suggest to open another thread for this bigger topic and personally I

[jira] [Created] (FLINK-12356) Optimise version expression of flink-shaded-hadoop2(-uber)

2019-04-28 Thread Paul Lin (JIRA)
Paul Lin created FLINK-12356:


 Summary: Optimise version expression of flink-shaded-hadoop2(-uber)
 Key: FLINK-12356
 URL: https://issues.apache.org/jira/browse/FLINK-12356
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.8.1
Reporter: Paul Lin
Assignee: Paul Lin


Since the new version scheme for Hadoop-based modules, we use version literals
in `flink-shaded-hadoop` and `flink-shaded-hadoop2`; they can be replaced by
the `${parent.version}` variable for better management.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12355) KafkaITCase.testTimestamps is unstable

2019-04-28 Thread Yu Li (JIRA)
Yu Li created FLINK-12355:
-

 Summary: KafkaITCase.testTimestamps is unstable
 Key: FLINK-12355
 URL: https://issues.apache.org/jira/browse/FLINK-12355
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Yu Li


The {{KafkaITCase.testTimestamps}} failed on Travis because it timed out.
https://api.travis-ci.org/v3/job/525503117/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

2019-04-28 Thread Yu Li
TL;DR: IMO a more complete solution is to cover both query and meta request
serving in a new component. We could use the proposal here as step one but
we should have a global plan. And before improving a seemingly not widely
used feature, we'd better weigh the gain and efforts.

Let me clarify the purpose of my previous questions: before we start
detailed design and code development, it's better to get consensus on:
1. What's the value of the work?
- As noticed, the queryable state feature has been implemented for a while
but is not widely used in production (AFAIK). Why? If it has indeed been
used in critical scenarios, what are those scenarios?
- I think it's a good time to discuss this (since it was raised in this
thread by others) and confirm the value of the effort to improve this feature.
2. Would there be duplicated work?
- This is the main reason I asked about the relationship between ledger
and queryable-state.

And some answers to the inline comments:

bq. About the relationship between ledger and Queryable State, I also think
it is out of this thread
True, that's why I suggested opening another thread. But as mentioned
above, the question is relevant if we think about the whole picture.

bq. Yes, the QueryableState's isolation level is *Read Uncommitted*...
However, I think it would not affect we discuss how to improve the
queryable state's architecture, right?
Correct, but my real question here is what kind of application could bear
the changing query result.

bq. The intermediate data is also valuable, for example, we just need a
partitioned data stream's real-time measure value.
In this case there must be some complicated operation in the pipeline which
causes long latency at sink? Could you talk more about the real-world case?
Thanks.

bq. Your worry is reasonable.
Then I suggest thinking of it as a whole. We could split the implementation
into steps, but it is better to have a global plan, to make it really applicable
in production (under heavy load).

Best Regards,
Yu


On Sun, 28 Apr 2019 at 14:48, vino yang  wrote:

> Hi yu,
>
> Thanks for your reply. I have some inline comment.
>
> Yu Li  wrote on Sun, Apr 28, 2019 at 12:24 PM:
>
> > Glad to see discussions around QueryableState in mailing list, and it
> seems
> > we have included a bigger scope in the discussion, that what's the data
> > model in Flink and how to (or is it possible to) use Flink as a
> database. I
> > suggest to open another thread for this bigger topic and personally I
> think
> > the first question should be answered is what's the relationship between
> > Flink ledger and QueryableState.
> >
>
> *About the scope, yes, it seems it's big. Actually, I think the questions
> you provided make it bigger than I have done.*
> *Here I think we don't need to answer the two questions(we can discuss in
> another thread, or answer it later).*
>
> *My original thought is that we found the queryable state is hard to use
> and it may cause few users to use this function. We may think the reason
> and the result affect each other. And IMO, currently, the queryable state's
> architecture caused this problem. So I opened a thread to see how to
> improve them. *
>
> *We mentioned these keywords e.g. "state、database" is to emphasize the
> queryable state is very important. The data model and use Flink as a
> database is not this thread's main topic (as Elias's reply said, many
> issues cause the road to this goal is so long). This thread I assume we do
> not change the state's core design and the goal is to bring a better query
> solution.*
>
> *About the relationship between ledger and Queryable State, I also think it
> is out of this thread.*
>
>
> >
> > Back to the user scenario itself, I'd like to post two open questions
> about
> > QueryableState for ad-hoc query:
> > 1. Currently the isolation level of QueryableState is *Read Uncommitted*
> > since failover might happen and cause data rollback. Although the
> > "uncommitted" data will be replayed again and get final consistency,
> > application will see unstable query result. Probably some kind of
> > applications could bare such drawback but what exactly?
> >
>
> *Yes, the QueryableState's isolation level is *Read Uncommitted*. I think
> if we need a higher isolation level, may need other mechanisms to guarantee
> this. I am sorry, I can not give the solution.*
> *However, I think it would not affect we discuss how to improve the
> queryable state's architecture, right?*
>
>
> >
> > 2. Currently in Flink sink is more commonly regarded as the "result
> > partition" and state of operators in the pipeline more like "intermediate
> > data". Used for debugging purpose is easy to understand but not for
> ad-hoc
> > query. Or in another word, what makes user prefer querying the state data
> > instead of sink? Or why we need to query the intermediate data instead of
> > the result?
> >
> >
> *About the opinion that state of operators in the pipeline more like
> "intermediate data". Yes, you are right. It's 

Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

2019-04-28 Thread vino yang
Hi Yu,

Thanks for your reply. I have some inline comments.

Yu Li  wrote on Sun, Apr 28, 2019 at 12:24 PM:

> Glad to see discussions around QueryableState in mailing list, and it seems
> we have included a bigger scope in the discussion, that what's the data
> model in Flink and how to (or is it possible to) use Flink as a database. I
> suggest to open another thread for this bigger topic and personally I think
> the first question should be answered is what's the relationship between
> Flink ledger and QueryableState.
>

*About the scope, yes, it seems big. Actually, I think the questions you
raised make it bigger than what I proposed.*
*Here I think we don't need to answer those two questions (we can discuss
them in another thread, or answer them later).*

*My original thought is that we found the queryable state is hard to use,
and that may be why few users use this function. The reason and the result
affect each other, and IMO the queryable state's current architecture caused
this problem. So I opened a thread to discuss how to improve it.*

*We mentioned keywords such as "state" and "database" to emphasize that the
queryable state is very important. The data model and using Flink as a
database are not this thread's main topic (as Elias's reply said, many
issues make the road to that goal very long). In this thread I assume we do
not change the state's core design, and the goal is to bring a better query
solution.*

*About the relationship between ledger and Queryable State, I also think it
is out of this thread.*


>
> Back to the user scenario itself, I'd like to post two open questions about
> QueryableState for ad-hoc query:
> 1. Currently the isolation level of QueryableState is *Read Uncommitted*
> since failover might happen and cause data rollback. Although the
> "uncommitted" data will be replayed again and get final consistency,
> application will see unstable query result. Probably some kind of
> applications could bare such drawback but what exactly?
>

*Yes, the QueryableState's isolation level is *Read Uncommitted*. I think
if we need a higher isolation level, may need other mechanisms to guarantee
this. I am sorry, I can not give the solution.*
*However, I think it would not affect we discuss how to improve the
queryable state's architecture, right?*


>
> 2. Currently in Flink sink is more commonly regarded as the "result
> partition" and state of operators in the pipeline more like "intermediate
> data". Used for debugging purpose is easy to understand but not for ad-hoc
> query. Or in another word, what makes user prefer querying the state data
> instead of sink? Or why we need to query the intermediate data instead of
> the result?
>
>
*About the opinion that state of operators in the pipeline more like
"intermediate data". Yes, you are right. It's intermediate data, and we
need it in some scene.*
*The valuable is that it represents "real-time". When querying a state, we
need its current value, we can not wait for sink. The intermediate data is
also valuable, for example, we just need a partitioned data stream's
real-time measure value.*


> Further back to the original topic proposed in this thread about
> introducing a QueryableStateProxy, I could see some careful consideration
> on query load on the proxy. However, under heavy load the pressure is not
> only on query serving but also on meta requesting, which is handled by JM
> for now. So to release JM pressure, we should also extract the meta serving
> task out, and my suggestion is to introduce a new component like
> *StateMetaServer* and take over both query and meta serving
> responsibilities.
>

*I think the opinion of metadata's pressure and *StateMetaServer* are good.
We need to care about them when we design.*
*I mentioned the meta info(registry) in the two option's simple
architecture picture. Although, I just emphasized the query proxy server,
because it is the main component.*

*Your worry is reasonable. The proxy server's architecture is good for
processing this, such as the mechanisms of request flow control, pressure
transfer to a single entry point(for opt2 and opt3, we can serve meta-query
in a single process).*

*Anyway, it just opened a discussion to listen to the community's opinion.*


>
> Best Regards,
> Yu
>
>
> On Sat, 27 Apr 2019 at 11:58, vino yang  wrote:
>
> > Hi Elias,
> >
> > I agree with your opinion that "*Flink jobs don't sufficiently meet these
> > requirements to work as a replacement for a data store.*".  Actually, I
> > think it's obviously not Flink's goal. If we think that the database
> > contains the main two parts(inexactitude): data query and data store.
> What
> > I and Paul mean is the former.
> >
> > Yes, you have mentioned it's major value: ad hoc and debugging(IMO,
> > especially for the former). To give a real-time calculation result is
> very
> > import for some scene(such as real-time measure for real-time OLAP) in a
> > long-term (no-window or large window).
> >
> > So, my opinion: Queryable 

[jira] [Created] (FLINK-12354) Add Reverse function supported in Table API and SQL

2019-04-28 Thread Zhanchun Zhang (JIRA)
Zhanchun Zhang created FLINK-12354:
--

 Summary: Add Reverse function supported in Table API and SQL
 Key: FLINK-12354
 URL: https://issues.apache.org/jira/browse/FLINK-12354
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Zhanchun Zhang
Assignee: Zhanchun Zhang


Returns the string {{str}} with the order of the characters reversed.

e.g. SELECT REVERSE('abc');  -> 'cba'
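
Until the built-in function lands, the same behavior can be sketched as a
user-defined scalar function on the Table API (a minimal, illustrative example;
the class name is hypothetical):

import org.apache.flink.table.functions.ScalarFunction;

// Register with tableEnv.registerFunction("REVERSE", new ReverseFunction())
// and it can then be used in SQL as SELECT REVERSE('abc').
public class ReverseFunction extends ScalarFunction {
    public String eval(String str) {
        // Reverse the character order; propagate null input as null output.
        return str == null ? null : new StringBuilder(str).reverse().toString();
    }
}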



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12353) Add missing module to collect_japicmp_reports.sh

2019-04-28 Thread Yun Tang (JIRA)
Yun Tang created FLINK-12353:


 Summary: Add missing module to collect_japicmp_reports.sh
 Key: FLINK-12353
 URL: https://issues.apache.org/jira/browse/FLINK-12353
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Yun Tang
Assignee: Yun Tang


Currently, there are eight modules using the japicmp plugin. However, only four
of them collect japicmp reports in {{collect_japicmp_reports.sh}}. I have to
modify the shell script to collect all reports, and therefore I plan to
contribute this change.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: How to let Flink 1.7.X run Flink session cluster on YARN in Java 7 default environment

2019-04-28 Thread 张军
Thank you for your reminder, I will pay attention to this issue in the future.

I read some Flink source code and saw that it uses a lot of Java 8 features,
such as CompletableFuture and lambda expressions, which is why Flink does not
run in a JDK 1.7 environment, so you may need to upgrade your JDK to 1.8.




> On Apr 28, 2019, at 08:24, 126  wrote:
> 
> The Flink source code uses many Java 1.8 features, so JDK 1.7 will not work.
> 
> Sent from my iPhone
> 
>> On Apr 26, 2019, at 17:48, 胡逸才  wrote:
>> 
>> At present, all our YARN clusters use a Java 7 environment.
>> 
>> While trying to use Flink for stream processing deployments, we found that
>> Flink on YARN mode always failed to start a session task. The YARN
>> application log shows "Unsupported major.minor version 52.0".
>> 
>> I tried adding env.java.home: <JDK1.8 PATH> to flink-conf.yaml as the
>> mailing list solution suggests, and adding -yD
>> yarn.taskmanager.env.JAVA_HOME=<JDK1.8 PATH>, -yD
>> containerized.master.env.JAVA_HOME=<JDK1.8 PATH>, and -yD
>> containerized.taskmanager.env.JAVA_HOME=<JDK1.8 PATH> to the startup
>> command. The Flink session cluster on YARN still cannot run the application
>> in a Java 8 environment.
>> 
>> So can I use Flink 1.7.x to submit a Flink session cluster application on
>> YARN under a Java 7 environment?
>> 
>> 
>>