Re: [DISCUSS] flink connector for sink to apache/incubator-druid

2019-11-04 Thread Qi Luo
Hi Xiao Dao,

Is it possible to ingest into Druid with exactly-once guarantee?

Thanks,
Qi
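For context on the semantics in question: a retry-on-failure sink (like the Tranquility-based one described in the quoted proposal below) is at-least-once because a lost acknowledgement forces a resend, so the receiver may see duplicates. A minimal, self-contained Java sketch; the FakeReceiver and retry loop are illustrative stand-ins, not Tranquility's or Flink's actual APIs:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Tranquility's real API): why retry-based
 * delivery gives at-least-once semantics. If the acknowledgement of a
 * successful send is lost, the sender retries and the receiver sees the
 * same event twice.
 */
public class AtLeastOnceDemo {
    /** A "Druid" stand-in that just records every event it receives. */
    static class FakeReceiver {
        final List<String> received = new ArrayList<>();
        boolean dropNextAck = false; // simulate a lost acknowledgement

        /** Returns true iff the sender observed a successful ack. */
        boolean ingest(String event) {
            received.add(event);          // the event itself always lands
            if (dropNextAck) {            // ...but the ack may get lost
                dropNextAck = false;
                return false;
            }
            return true;
        }
    }

    /** Retry until an ack is observed: at-least-once, never at-most-once. */
    static void sendAtLeastOnce(FakeReceiver receiver, String event) {
        while (!receiver.ingest(event)) {
            // ack lost -> cannot tell "delivered" from "dropped", so retry
        }
    }

    public static void main(String[] args) {
        FakeReceiver druid = new FakeReceiver();
        sendAtLeastOnce(druid, "e1");
        druid.dropNextAck = true;      // lose one ack mid-stream
        sendAtLeastOnce(druid, "e2");  // will be retried -> duplicate
        sendAtLeastOnce(druid, "e3");
        System.out.println(druid.received); // [e1, e2, e2, e3]
    }
}
```

Exactly-once on top of such a sink would require either deduplication on the Druid side or a transactional/two-phase-commit protocol, which is what Qi's question is probing.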

On Mon, Nov 4, 2019 at 4:18 PM xiao dao  wrote:

> Dear Flink Community!
> I would like to open a discussion about contributing an incubator-druid Flink
> connector to Flink.
>
> ## A brief introduction to Apache/incubator-druid
>
> Apache Druid[1] (incubating) is a real-time analytics database designed for
> fast slice-and-dice analytics ("OLAP" queries) on large data sets. Druid is
> most often used as a database for powering use cases where real-time
> ingest,
> fast query performance, and high uptime are important. As such, Druid is
> commonly used for powering GUIs of analytical applications, or as a backend
> for highly-concurrent APIs that need fast aggregations. Nowadays, Druid has
> been adopted by more and more companies [2].
>
> ## The status of the Druid Flink connector
> The Druid Flink connector we are planning to contribute is built upon Flink
> 1.9.1 and Druid 0.16.0. The main features are:
> - Sink streaming results to Druid with at-least-once semantics, using data
> ingestion via Tranquility, which is supported by Druid.
>
> ## Reference
> [1]  https://druid.apache.org/ 
> [2]  https://druid.apache.org/druid-powered
> 
>
> Best,
> xiao dao
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-09-17 Thread Qi Luo
Fantastic! We're also very interested in this feature.

+Boxiu

On Tue, Sep 17, 2019 at 11:31 AM Danny Chan  wrote:

> In umbrella task FLINK-10232 we have introduced the CREATE TABLE grammar in
> our new module flink-sql-parser. And we proposed to use a computed column to
> describe the processing-time attribute in the design doc FLINK SQL
> DDL, so users may create a table with a processing-time attribute as follows:
> create table T1(
>   a int,
>   b bigint,
>   c varchar,
>   d as PROCTIME
> ) with (
>   'k1' = 'v1',
>   'k2' = 'v2'
> );
>
> The column d would be a processing-time attribute for table T1.
>
> Besides that, computed columns have several other use cases, such as
> these [2]:
>
>
> • Virtual generated columns can be used as a way to simplify and unify
> queries. A complicated condition can be defined as a generated column and
> referred to from multiple queries on the table to ensure that all of them
> use exactly the same condition.
> • Stored generated columns can be used as a materialized cache for
> complicated conditions that are costly to calculate on the fly.
> • Generated columns can simulate functional indexes: Use a generated
> column to define a functional expression and index it. This can be useful
> for working with columns of types that cannot be indexed directly, such as
> JSON columns.
> • For stored generated columns, the disadvantage of this approach is that
> values are stored twice; once as the value of the generated column and once
> in the index.
> • If a generated column is indexed, the optimizer recognizes query
> expressions that match the column definition and uses indexes from the
> column as appropriate during query execution (not supported yet).
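The virtual vs. stored distinction in the bullets above can be mirrored in a few lines of plain Java (an illustrative sketch, not Flink or MySQL internals): a virtual column is recomputed on every read, while a stored column is materialized at write time and goes stale if the base columns change without a rewrite:

```java
/**
 * Plain-Java sketch (hypothetical, not Flink/MySQL internals) of the two
 * generated-column flavors discussed above.
 */
public class GeneratedColumns {
    static class Row {
        long priceCents;
        long quantity;

        // "Stored" generated column: materialized once when the row is
        // written, costing storage but no per-read work.
        final long storedTotal;

        Row(long priceCents, long quantity) {
            this.priceCents = priceCents;
            this.quantity = quantity;
            this.storedTotal = priceCents * quantity;
        }

        // "Virtual" generated column: evaluated on the fly at read time,
        // so it always reflects the current base columns.
        long virtualTotal() {
            return priceCents * quantity;
        }
    }

    public static void main(String[] args) {
        Row r = new Row(250, 4);
        System.out.println(r.storedTotal);    // 1000
        System.out.println(r.virtualTotal()); // 1000
        r.quantity = 5;                       // base column changes...
        System.out.println(r.storedTotal);    // 1000 (stale until rewritten)
        System.out.println(r.virtualTotal()); // 1250 (always current)
    }
}
```

This is also why the email notes that stored generated columns pay with double storage when indexed, while virtual ones pay with per-read evaluation cost.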
>
>
>
> Computed columns are available in SQL Server 2016 [1], MySQL 5.7 [2], and
> Oracle 11g [3].
> ORACLE-11g [3].
>
> This is the design doc:
>
> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
>
> Any suggestions are appreciated, thanks.
>
> [1]
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016
> [2]
> https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
> [3] https://oracle-base.com/articles/11g/virtual-columns-11gr1
>
> Best,
> Danny Chan
>


Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-04 Thread Qi Luo
We also see strong demand from our SQL users for JSON/date-related
functions.

Also +Anyang Hu 

On Wed, Sep 4, 2019 at 9:51 PM Jark Wu  wrote:

> Hi Forward,
>
> Thanks for bringing this discussion and preparing the nice design.
> I think it's nice to have the JSON functions in the next release.
> We have received some requirements for this feature.
>
> I can help shepherd this JSON functions effort and will leave comments
> in the design doc in the coming days.
>
> Hi Danny,
>
> The newly introduced JSON functions are from SQL:2016, not from MySQL.
> So no JSON type is needed. According to SQL:2016, the
> representation of JSON data can be a "character string", which is also
> the current implementation in Calcite [1].
>
> Best,
> Jark
>
>
> [1]: https://calcite.apache.org/docs/reference.html#json-functions
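Because SQL:2016 (and Calcite) represent JSON data as character strings, a JSON function is essentially a string-in/string-out function. The following toy Java sketch (hand-rolled, flat-object-only lookup, purely illustrative; real engines use a proper JSON parser) shows that shape for a JSON_VALUE-like function:

```java
/**
 * Toy sketch of the "JSON as character string" representation: a
 * JSON_VALUE-like function that takes a JSON string and returns a string.
 * Only handles flat objects with string values; illustration only.
 */
public class JsonAsString {
    /** Returns the string value for {@code key}, or null if absent. */
    static String jsonValue(String json, String key) {
        String needle = "\"" + key + "\":\"";
        int start = json.indexOf(needle);
        if (start < 0) {
            return null; // SQL:2016's default ON EMPTY behavior is NULL
        }
        start += needle.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        String row = "{\"name\":\"flink\",\"type\":\"stream\"}";
        System.out.println(jsonValue(row, "type"));    // stream
        System.out.println(jsonValue(row, "missing")); // null
    }
}
```

The point Jark makes above is exactly this: since input and output are plain character strings, no dedicated JSON column type is required in the type system.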
>
>
> On Wed, 4 Sep 2019 at 21:22, Xu Forward  wrote:
>
> > Hi Danny Chan, thank you very much for your reply. Your feedback helps me
> > further improve this discussion.
> > Best,
> > Forward
> >
> > Danny Chan wrote on Wed, Sep 4, 2019 at 8:50 PM:
> >
> > > Thanks Xu Forward for bringing up this topic. I think the JSON functions
> > > are very useful, especially for those MySQL users.
> > >
> > > I saw that you have done some work within Apache Calcite; that’s a
> > > good start, but there is one concern from me: Flink doesn’t support a
> > > JSON type internally, so how to represent a JSON object in Flink may be
> > > a key point we need to resolve. In Calcite, we use the ANY type to
> > > represent JSON, but I don’t think it is the right way to go; maybe we
> > > can have a discussion here.
> > >
> > > Best,
> > > Danny Chan
> > > On Sep 4, 2019, 8:34 PM +0800, Xu Forward wrote:
> > > > Hi everybody,
> > > >
> > > > I'd like to kick off a discussion on supporting JSON functions in
> > > > Flink SQL.
> > > >
> > > > The entire plan is divided into two steps:
> > > > 1. Implement the JSON functions defined in SQL:2016 in Flink SQL [1].
> > > > 2. Implement JSON functions beyond SQL:2016, such as MySQL's
> > > > JSON_TYPE, JSON_LENGTH, etc., which are very useful JSON functions.
> > > >
> > > > Would love to hear your thoughts.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > > >
> > > > Best,
> > > > ForwardXu
> > >
> >
>


Re: Flink SQL - Support Computed Columns in DDL?

2019-09-03 Thread Qi Luo
Hi Jark and Danny,

Glad to hear your plan on this!

One of our use cases is to define some column as rowtime (which is not of
type timestamp). Computed column seems to be a natural fit for that.

Thanks,
Qi

On Tue, Sep 3, 2019 at 7:46 PM Jark Wu  wrote:

> Hi Qi,
>
> The computed column is not fully supported in 1.9. We will start a design
> discussion in the dev mailing list soon. Please stay tuned!
>
> Btw, could you share with us the use case in which you want to use
> computed columns?
>
> Best,
> Jark
>
> On Tue, 3 Sep 2019 at 19:25, Danny Chan  wrote:
>
> > Yeah, we are planning to implement this feature in release-1.10, wait for
> > our good news!
> >
> > Best,
> > Danny Chan
> > On Sep 3, 2019, 6:19 PM +0800, Qi Luo wrote:
> > > Hi folks,
> > >
> > > Computed columns in Flink SQL DDL are currently disabled in both the old
> > > planner and the Blink planner (a "Computed columns for DDL is not
> > > supported yet!" exception is thrown in SqlToOperationConverter).
> > >
> > > I searched through the JIRA but found no relevant issues. Do we have
> any
> > > plans to support this nice feature?
> > >
> > > Thanks,
> > > Qi
> >
>


Flink SQL - Support Computed Columns in DDL?

2019-09-03 Thread Qi Luo
Hi folks,

Computed columns in Flink SQL DDL are currently disabled in both the old
planner and the Blink planner (a "Computed columns for DDL is not supported
yet!" exception is thrown in SqlToOperationConverter).

I searched through the JIRA but found no relevant issues. Do we have any
plans to support this nice feature?

Thanks,
Qi


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread qi luo
Congratulations and thanks for the hard work!

Qi

> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai  wrote:
> 
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.9.0, which is the latest major release.
> 
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
> 
> Please check out the release blog post for an overview of the improvements 
> for this new major release:
> https://flink.apache.org/news/2019/08/22/release-1.9.0.html 
> 
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>  
> 
> 
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> 
> Cheers,
> Gordon



Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-01 Thread qi luo
Agree that using Optional will improve code robustness. However, we're
hesitant to use Optional in data-intensive operations.

For example, SingleInputGate is already creating an Optional for every
BufferOrEvent in getNextBufferOrEvent(). How much performance gain would we
get if it were replaced by a null check?

Regards,
Qi

> On Aug 1, 2019, at 11:00 PM, Andrey Zagrebin  wrote:
> 
> Hi all,
> 
> This is the next follow up discussion about suggestions for the recent
> thread about code style guide in Flink [1].
> 
> In general, one could argue that any nullable variable can be
> replaced by wrapping it with Optional to explicitly show that it can be
> null. Examples are:
> 
>   - returned values, to force users to check for null
>   - optional function arguments, e.g. with implicit default values
>   - even class fields as e.g. optional config options with implicit
>   default values
> 
> 
> At the same time, we also have @Nullable annotation to express this
> intention.
> 
> Also, when the class Optional was introduced, Oracle posted a guideline
> about its usage [2]. Basically, it suggests using it mostly in APIs for
> returned values, to inform and force users to check the returned value
> instead of returning null, and to avoid NullPointerException.
> 
> Wrapping with Optional also comes with a performance overhead.
> 
> Following Oracle's guide in general, the suggestion is:
> 
>   - Avoid using Optional in any performance critical code
>   - Use Optional only to return nullable values in API/public methods;
>   if it is performance critical, rather use @Nullable
>   - Passing an Optional argument to a method can be allowed if it is
>   within a private helper method and simplifies the code, example is in [3]
>   - Optional should not be used for class fields
> 
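The first two guidelines can be illustrated with a short Java sketch (a hypothetical lookup class, not Flink code): Optional on the public API return value, and a plain nullable reference with an explicit null check on the hot path:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of the guideline: Optional for public API return values, plain
 * null checks on performance-critical paths. Hypothetical class, not
 * Flink code.
 */
public class OptionalGuideline {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String value) {
        store.put(key, value);
    }

    /** Public API: Optional forces callers to handle the missing case. */
    public Optional<String> lookup(String key) {
        return Optional.ofNullable(store.get(key));
    }

    /** Hot path: avoid the per-call Optional allocation; document the
     *  nullable return (as with @Nullable) and check explicitly. */
    String lookupFast(String key) { // effectively @Nullable
        return store.get(key);
    }

    public static void main(String[] args) {
        OptionalGuideline cache = new OptionalGuideline();
        cache.put("k", "v");
        // A caller of the public API cannot forget the empty case:
        System.out.println(cache.lookup("missing").orElse("default")); // default
        // A hot-path caller does the null check manually:
        String fast = cache.lookupFast("k");
        if (fast != null) {
            System.out.println(fast); // v
        }
    }
}
```

The Optional wrapper costs one allocation per call, which is exactly the overhead Qi's question about SingleInputGate.getNextBufferOrEvent() is concerned with on per-record paths.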
> 
> Please, feel free to share your thoughts.
> 
> Best,
> Andrey
> 
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> [2]
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> [3]
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95



Re: [ANNOUNCE] Flink 1.9 release branch has been created

2019-07-12 Thread qi luo
Great work!

> On Jul 12, 2019, at 2:58 PM, Hequn Cheng  wrote:
> 
> Hi Kurt,
> 
> Great work and thanks for the update.
> 
> FYI, I found a bug [1] in the Table API accidentally. The PR has already been
> opened. I think it would be good if the fix could be included in 1.9.
> 
> Best,
> Hequn
> 
> [1] https://issues.apache.org/jira/browse/FLINK-13196
> 
> On Fri, Jul 12, 2019 at 2:40 PM Kurt Young  wrote:
> 
>> Hi devs,
>> 
>> I just created the branch for the Flink 1.9 release [1] and updated the
>> version on master to 1.10-SNAPSHOT. This unblocks the master from
>> merging new features into it.
>> 
>> If you are working on a 1.9-relevant bug fix, then it is important to merge
>> it into both the release-1.9 and master branches.
>> 
>> I’ll create a first release candidate shortly, stay tuned!
>> 
>> Best,
>> Kurt
>> 
>> [1] https://github.com/apache/flink/tree/release-1.9
>> 



Re: [ANNOUNCE] Feature freeze for Apache Flink 1.9.0 release

2019-07-11 Thread qi luo
No worries. Glad to know the progress :-)

> On Jul 12, 2019, at 11:53 AM, Kurt Young  wrote:
> 
> Hi qi,
> 
> We are about to cut the branch; the announcement mail will be sent after that.
> Sorry for the slight delay.
> 
> Best,
> Kurt
> 
> 
> On Fri, Jul 12, 2019 at 11:43 AM qi luo  wrote:
> 
>> Do we have a new timeline for 1.9 branch cut?
>> 
>> Thanks,
>> Qi
>> 
>>> On Jul 12, 2019, at 11:42 AM, qi luo  wrote:
>>> 
>>> Any news on this?
>>> 
>>> Thanks,
>>> Qi
>>> 
>>>> On Jul 11, 2019, at 11:13 PM, Stephan Ewen  wrote:
>>>> 
>>>> Number (6) is not a feature but a bug fix, so no need to block on
>> that...
>>>> 
>>>> On Thu, Jul 11, 2019 at 4:27 PM Kurt Young  wrote:
>>>> 
>>>>> Hi Chesnay,
>>>>> 
>>>>> Here is the JIRA list I have collected, all of them are under
>> reviewing:
>>>>> 
>>>>> 1. Hive UDF support (FLINK-13024, FLINK-13225)
>>>>> 2. Partition prune support (FLINK-13115)
>>>>> 3. Set StreamGraph properties in blink planner (FLINK-13121)
>>>>> 4. Support HBase upsert sink (FLINK-10245)
>>>>> 5. Support JDBC TableFactory (FLINK-13118)
>>>>> 6. Fix the bug of throwing IOException while
>> FileBufferReader#nextBuffer
>>>>> (FLINK-13110)
>>>>> 7. Bookkeeping of available resources of allocated slots in SlotPool
>>>>> (FLINK-12765)
>>>>> 8. Introduce ScheduleMode#LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST
>>>>> (FLINK-13187)
>>>>> 9. Add support for batch slot requests to SlotPoolImpl (FLINK-13166)
>>>>> 10. Complete slot requests in request order (FLINK-13165)
>>>>> 
>>>>> Best,
>>>>> Kurt
>>>>> 
>>>>> 
>>>>> On Thu, Jul 11, 2019 at 6:12 PM Chesnay Schepler 
>>>>> wrote:
>>>>> 
>>>>>> Can we get JIRAs attached to these items so people out of the loop
>> can
>>>>>> track the progress?
>>>>>> 
>>>>>> On 05/07/2019 16:06, Kurt Young wrote:
>>>>>>> Here are the features I collected which are under active development
>>>>> and
>>>>>>> close
>>>>>>> to merge:
>>>>>>> 
>>>>>>> 1. Bridge blink planner to unified table environment and remove
>>>>>> TableConfig
>>>>>>> from blink planner
>>>>>>> 2. Support timestamp with local time zone and partition pruning in
>>>>> blink
>>>>>>> planner
>>>>>>> 3. Support JDBC & HBase lookup function and upsert sink
>>>>>>> 4. StreamExecutionEnvironment supports executing job with
>> StreamGraph,
>>>>>> and
>>>>>>> blink planner should set proper properties to StreamGraph
>>>>>>> 5. Set resource profiles to task and enable managed memory as
>> resource
>>>>>>> profile
>>>>>>> 
>>>>>>> Best,
>>>>>>> Kurt
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jul 5, 2019 at 9:37 PM Kurt Young  wrote:
>>>>>>> 
>>>>>>>> Hi devs,
>>>>>>>> 
>>>>>>>> It's July 5 now and we should announce feature freeze and cut the
>>>>> branch
>>>>>>>> as planned. However, some components seem still not ready yet and
>>>>>>>> various features are still under development or review.
>>>>>>>> 
>>>>>>>> But we also cannot extend the freeze day again, which will further
>>>>> delay
>>>>>>>> the
>>>>>>>> release date. I think freezing new features today and having another
>>>>> couple
>>>>>>>> of buffer days, letting features which are almost ready have a
>> chance
>>>>> to
>>>>>>>> get in is a reasonable solution.
>>>>>>>> 
>>>>>>>> I hereby announce that the features of Flink 1.9.0 are frozen; *July 11*
>> will
>>>>> be
>>>>>>>> the
>>>>>>>> day for cutting the branch. Since the feature freeze has effectively
>> taken
>>

Re: [ANNOUNCE] Feature freeze for Apache Flink 1.9.0 release

2019-07-11 Thread qi luo
Do we have a new timeline for 1.9 branch cut?

Thanks,
Qi

> On Jul 12, 2019, at 11:42 AM, qi luo  wrote:
> 
> Any news on this?
> 
> Thanks,
> Qi
> 
>> On Jul 11, 2019, at 11:13 PM, Stephan Ewen  wrote:
>> 
>> Number (6) is not a feature but a bug fix, so no need to block on that...
>> 
>> On Thu, Jul 11, 2019 at 4:27 PM Kurt Young  wrote:
>> 
>>> Hi Chesnay,
>>> 
>>> Here is the JIRA list I have collected, all of them are under reviewing:
>>> 
>>> 1. Hive UDF support (FLINK-13024, FLINK-13225)
>>> 2. Partition prune support (FLINK-13115)
>>> 3. Set StreamGraph properties in blink planner (FLINK-13121)
>>> 4. Support HBase upsert sink (FLINK-10245)
>>> 5. Support JDBC TableFactory (FLINK-13118)
>>> 6. Fix the bug of throwing IOException while FileBufferReader#nextBuffer
>>> (FLINK-13110)
>>> 7. Bookkeeping of available resources of allocated slots in SlotPool
>>> (FLINK-12765)
>>> 8. Introduce ScheduleMode#LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST
>>> (FLINK-13187)
>>> 9. Add support for batch slot requests to SlotPoolImpl (FLINK-13166)
>>> 10. Complete slot requests in request order (FLINK-13165)
>>> 
>>> Best,
>>> Kurt
>>> 
>>> 
>>> On Thu, Jul 11, 2019 at 6:12 PM Chesnay Schepler 
>>> wrote:
>>> 
>>> Can we get JIRAs attached to these items so people out of the loop can
>>>> track the progress?
>>>> 
>>>> On 05/07/2019 16:06, Kurt Young wrote:
>>>>> Here are the features I collected which are under active development
>>> and
>>>>> close
>>>>> to merge:
>>>>> 
>>>>> 1. Bridge blink planner to unified table environment and remove
>>>> TableConfig
>>>>> from blink planner
>>>>> 2. Support timestamp with local time zone and partition pruning in
>>> blink
>>>>> planner
>>>>> 3. Support JDBC & HBase lookup function and upsert sink
>>>>> 4. StreamExecutionEnvironment supports executing job with StreamGraph,
>>>> and
>>>>> blink planner should set proper properties to StreamGraph
>>>>> 5. Set resource profiles to task and enable managed memory as resource
>>>>> profile
>>>>> 
>>>>> Best,
>>>>> Kurt
>>>>> 
>>>>> 
>>>>> On Fri, Jul 5, 2019 at 9:37 PM Kurt Young  wrote:
>>>>> 
>>>>>> Hi devs,
>>>>>> 
>>>>>> It's July 5 now and we should announce feature freeze and cut the
>>> branch
>>>>>> as planned. However, some components seem still not ready yet and
>>>>>> various features are still under development or review.
>>>>>> 
>>>>>> But we also cannot extend the freeze day again, which will further
>>> delay
>>>>>> the
>>>>>> release date. I think freezing new features today and having another
>>> couple
>>>>>> of buffer days, letting features which are almost ready have a chance
>>> to
>>>>>> get in is a reasonable solution.
>>>>>> 
>>>>>> I hereby announce that the features of Flink 1.9.0 are frozen; *July 11* will
>>> be
>>>>>> the
>>>>>> day for cutting the branch. Since the feature freeze has effectively taken
>>>>>> place,
>>>>>> I kindly ask committers to refrain from merging features that are
>>>> planned
>>>>>> for
>>>>>> future releases into the master branch for the time being before the
>>> 1.9
>>>>>> branch
>>>>>> is cut. We understand this might be a bit inconvenient, thanks for the
>>>>>> cooperation here.
>>>>>> 
>>>>>> Best,
>>>>>> Kurt
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 5, 2019 at 5:19 PM 罗齐  wrote:
>>>>>> 
>>>>>>> Hi Gordon,
>>>>>>> 
>>>>>>> Will branch 1.9 be cut out today? We're really looking forward to the
>>>>>>> blink features in 1.9.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Qi
>>>>>>> 
>>>>>>> On Wed, Jun 26, 2019 at 7:18 PM Tzu-Li 

Re: [ANNOUNCE] Feature freeze for Apache Flink 1.9.0 release

2019-07-11 Thread qi luo
Any news on this?

Thanks,
Qi

> On Jul 11, 2019, at 11:13 PM, Stephan Ewen  wrote:
> 
> Number (6) is not a feature but a bug fix, so no need to block on that...
> 
> On Thu, Jul 11, 2019 at 4:27 PM Kurt Young  wrote:
> 
>> Hi Chesnay,
>> 
>> Here is the JIRA list I have collected, all of them are under reviewing:
>> 
>> 1. Hive UDF support (FLINK-13024, FLINK-13225)
>> 2. Partition prune support (FLINK-13115)
>> 3. Set StreamGraph properties in blink planner (FLINK-13121)
>> 4. Support HBase upsert sink (FLINK-10245)
>> 5. Support JDBC TableFactory (FLINK-13118)
>> 6. Fix the bug of throwing IOException while FileBufferReader#nextBuffer
>> (FLINK-13110)
>> 7. Bookkeeping of available resources of allocated slots in SlotPool
>> (FLINK-12765)
>> 8. Introduce ScheduleMode#LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST
>> (FLINK-13187)
>> 9. Add support for batch slot requests to SlotPoolImpl (FLINK-13166)
>> 10. Complete slot requests in request order (FLINK-13165)
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Thu, Jul 11, 2019 at 6:12 PM Chesnay Schepler 
>> wrote:
>> 
>>> Can we get JIRAs attached to these items so people out of the loop can
>>> track the progress?
>>> 
>>> On 05/07/2019 16:06, Kurt Young wrote:
 Here are the features I collected which are under active development
>> and
 close
 to merge:
 
 1. Bridge blink planner to unified table environment and remove
>>> TableConfig
 from blink planner
 2. Support timestamp with local time zone and partition pruning in
>> blink
 planner
 3. Support JDBC & HBase lookup function and upsert sink
 4. StreamExecutionEnvironment supports executing job with StreamGraph,
>>> and
 blink planner should set proper properties to StreamGraph
 5. Set resource profiles to task and enable managed memory as resource
 profile
 
 Best,
 Kurt
 
 
 On Fri, Jul 5, 2019 at 9:37 PM Kurt Young  wrote:
 
> Hi devs,
> 
> It's July 5 now and we should announce feature freeze and cut the
>> branch
> as planned. However, some components seem still not ready yet and
> various features are still under development or review.
> 
> But we also cannot extend the freeze day again, which will further
>> delay
> the
> release date. I think freezing new features today and having another
>> couple
> of buffer days, letting features which are almost ready have a chance
>> to
> get in is a reasonable solution.
> 
> I hereby announce that the features of Flink 1.9.0 are frozen; *July 11* will
>> be
> the
> day for cutting the branch. Since the feature freeze has effectively taken
> place,
> I kindly ask committers to refrain from merging features that are
>>> planned
> for
> future releases into the master branch for the time being before the
>> 1.9
> branch
> is cut. We understand this might be a bit inconvenient, thanks for the
> cooperation here.
> 
> Best,
> Kurt
> 
> 
> On Fri, Jul 5, 2019 at 5:19 PM 罗齐  wrote:
> 
>> Hi Gordon,
>> 
>> Will branch 1.9 be cut out today? We're really looking forward to the
>> blink features in 1.9.
>> 
>> Thanks,
>> Qi
>> 
>> On Wed, Jun 26, 2019 at 7:18 PM Tzu-Li (Gordon) Tai <
>>> tzuli...@apache.org>
>> wrote:
>> 
>>> Thanks for the updates so far everyone!
>>> 
>>> Since support for the new Blink-based Table / SQL runner and
>>> fine-grained
>>> recovery are quite prominent features for 1.9.0,
>>> and developers involved in these topics have already expressed that
>>> these
>>> could make good use for another week,
>>> I think it definitely makes sense to postpone the feature freeze.
>>> 
>>> The new date for feature freeze and feature branch cut for 1.9.0
>> will
>>> be
>>> *July
>>> 5*.
>>> 
>>> Please update on this thread if there are any further concerns!
>>> 
>>> Cheers,
>>> Gordon
>>> 
>>> On Tue, Jun 25, 2019 at 9:05 PM Chesnay Schepler <
>> ches...@apache.org>
>>> wrote:
>>> 
 On the fine-grained recovery / batch scheduling side we could make
>>> good
 use of another week.
 Currently we are on track to have the _feature_ merged, but without
 having done a great deal of end-to-end testing.
 
 On 25/06/2019 15:01, Kurt Young wrote:
> Hi Aljoscha,
> 
> I also feel an additional week can make the remaining work more
>>> easy. At
> least
> we don't have to check in lots of commits in both branches
>> (master &
> release-1.9).
> 
> Best,
> Kurt
> 
> 
> On Tue, Jun 25, 2019 at 8:27 PM Aljoscha Krettek <
>>> aljos...@apache.org>
> wrote:
> 
>> A few threads are converging around supporting the new
>> Blink-based
>>> Table
>> API Runner/Planner. I think hitting the currently proposed
>

Re: [DISCUSS] Proposal of external shuffle service

2019-01-29 Thread qi luo
Very clear. Thanks!

> On Jan 28, 2019, at 10:29 PM, zhijiang  wrote:
> 
> Hi Qi,
> 
> Thanks for the concerns about this proposal. In Blink we implemented the
> YarnShuffleService, which has mainly been used for batch jobs in production
> and for some benchmarks before. This YarnShuffleService is not within the
> currently proposed
> ShuffleManager interface and there is also no ShuffleMaster component in JM 
> side. You can regard that as a simple and special implementation version. And 
> the YarnShuffleService can further be refactored within this proposed shuffle 
> manager architecture. 
> 
> Best,
> Zhijiang
> 
> --
> From:qi luo 
> Send Time: Monday, Jan 28, 2019, 20:55
> To:dev ; zhijiang 
> Cc:Till Rohrmann ; Andrey Zagrebin 
> 
> Subject:Re: [DISCUSS] Proposal of external shuffle service
> 
> Hi Zhijiang,
> 
> I see there’s a YarnShuffleService in the newly released Blink branch. Is
> there any relationship between that YarnShuffleService and your external
> shuffle service?
> 
> Regards,
> Qi
> 
> > On Jan 28, 2019, at 8:07 PM, zhijiang  
> > wrote:
> > 
> > Hi till,
> > 
> > Very glad to receive your feedback; it is actually very helpful.
> > 
> > The proposed ShuffleMaster in the JM would be involved in many existing
> > processes, such as task deployment, task failover, and TM release, so it
> > might interact with the corresponding Scheduler, FailoverStrategy, and
> > SlotPool components. In the first version we try to focus on the deploying
> > process, which is described in detail in the FLIP. Concerning the other
> > improvements based on the proposed architecture, we just mentioned the
> > basic ideas and have not given the whole detailed process. But I think it
> > is reasonable and natural to solve these issues based on that. And we
> > would further give more details for other future steps.
> > 
> > I totally agree with your thought on handling TM release. Currently, once
> > the task is finished, the corresponding slot is regarded as free no matter
> > whether the produced partition is consumed or not. Actually we could
> > consider that both the task and its partitions occupy resources in the
> > slot, so the slot can only be regarded as free once the internal
> > partitions are consumed and released. Then the TM release logic is also
> > improved meanwhile. I think your suggestion below already gives the
> > detailed and specific process for this improvement.
> > 
> > I am in favor of launching a separate thread for this discussion again, 
> > thanks for the advice!
> > 
> > Best,
> > Zhijiang
> > 
> > 
> > --
> > From:Till Rohrmann 
> > Send Time: Monday, Jan 28, 2019, 19:14
> > To:dev ; zhijiang 
> > Cc:Andrey Zagrebin 
> > Subject:Re: [DISCUSS] Proposal of external shuffle service
> > 
> > Thanks for creating the FLIP-31 for the external shuffle service Zhijiang. 
> > It looks good to me. 
> > 
> > One thing which is not fully clear to me yet is how the lifecycle 
> > management of the partitions integrates with the slot management. At the 
> > moment, conceptually we consider the partition data being owned by the TM 
> > if I understood it correctly. This means the ShuffleMaster is asked whether 
> > a TM can be freed. However, the JobMaster only thinks in terms of slots and 
> > not TMs. Thus, the logic would be that the JM asks the ShuffleMaster 
> > whether it can return a certain slot. Atm the freeing of slots is done by 
> > the `SlotPool` and, thus this would couple the `SlotPool` and the 
> > `ShuffleMaster`. Maybe we need to introduce some mechanism to signal when a 
> > slot has still some occupied resources. In the shared slot case, one could 
> > think of allocating a dummy slot in the shared slot which we only release 
> > after the partition data has been consumed.
> > 
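The idea above, that a slot with no running task may still be occupied by unconsumed partition data, can be sketched as a small occupancy tracker (hypothetical names, not the actual SlotPool or ShuffleMaster code):

```java
/**
 * Sketch (hypothetical, not Flink's SlotPool/ShuffleMaster) of tracking
 * slot occupancy by both the running task and its produced partitions:
 * the slot is only releasable once the task has finished AND every
 * partition it produced has been consumed downstream.
 */
public class SlotOccupancy {
    private boolean taskRunning = true;
    private int unconsumedPartitions = 0;

    void partitionProduced() {
        unconsumedPartitions++;
    }

    void partitionConsumed() {
        unconsumedPartitions--;
    }

    void taskFinished() {
        taskRunning = false;
    }

    /** The JM may only return this slot when nothing occupies it. */
    boolean isReleasable() {
        return !taskRunning && unconsumedPartitions == 0;
    }

    public static void main(String[] args) {
        SlotOccupancy slot = new SlotOccupancy();
        slot.partitionProduced();
        slot.taskFinished();
        // Task done, but partition data still lives in the slot's TM:
        System.out.println(slot.isReleasable()); // false
        slot.partitionConsumed();
        System.out.println(slot.isReleasable()); // true
    }
}
```

Till's "dummy slot" suggestion for the shared-slot case plays the same role as the partition counter here: it keeps the slot marked occupied until the partition data has been consumed.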
> > In order to give this design document a little bit more visibility, I would 
> > suggest posting it again on the dev mailing list in a separate thread under
> > the title "[DISCUSS] Flip-31: Pluggable Shuffle Manager" or something like 
> > this.
> > 
> > Cheers,
> > Till
> > On Mon, Jan 21, 2019 at 7:05 AM zhijiang 
> >  wrote:
> > Hi all,
> > 
> > FYI, I created the FLIP-31 under [1] for this proposal and created some 
> > subtasks under the umbrella JIRA [2].
> > Any concerns are welcome in the previous Google doc or the specific JIRAs.
> > 
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
> > [2] https://issues.apache.org/jira/browse/FLINK-10653
> > 
> > Best,
> > Zhijiang
> > --
> > From:zhijiang 
> > Send Time: Tuesday, Jan 15, 2019, 17:55
> > To:Andrey Zagrebin 
> > Cc:dev 
> > Subject:Re: [DISCUSS] Proposal of external shuffle service
> > 
> > Hi all,
> > 
> > After continuous discussion with Andrey offline, we have already reached an
> > agreement on this proposal and co-authored the latest Google doc under [1].
> > 
> > We plan to creat

Re: [DISCUSS] Proposal of external shuffle service

2019-01-28 Thread qi luo
Hi Zhijiang,

I see there’s a YarnShuffleService in the newly released Blink branch. Is there any
relationship between that YarnShuffleService and your external shuffle service?

Regards,
Qi

> On Jan 28, 2019, at 8:07 PM, zhijiang  
> wrote:
> 
> Hi till,
> 
> Very glad to receive your feedback; it is actually very helpful.
> 
> The proposed ShuffleMaster in the JM would be involved in many existing
> processes, such as task deployment, task failover, and TM release, so it
> might interact with the corresponding Scheduler, FailoverStrategy, and
> SlotPool components. In the first version we try to focus on the deploying
> process, which is described in detail in the FLIP. Concerning the other
> improvements based on the proposed architecture, we just mentioned the
> basic ideas and have not given the whole detailed process. But I think it
> is reasonable and natural to solve these issues based on that. And we
> would further give more details for other future steps.
> 
> I totally agree with your thought on handling TM release. Currently, once
> the task is finished, the corresponding slot is regarded as free no matter
> whether the produced partition is consumed or not. Actually we could
> consider that both the task and its partitions occupy resources in the
> slot, so the slot can only be regarded as free once the internal
> partitions are consumed and released. Then the TM release logic is also
> improved meanwhile. I think your suggestion below already gives the
> detailed and specific process for this improvement.
> 
> I am in favor of launching a separate thread for this discussion again, 
> thanks for the advice!
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Till Rohrmann 
> Send Time: Monday, Jan 28, 2019, 19:14
> To:dev ; zhijiang 
> Cc:Andrey Zagrebin 
> Subject:Re: [DISCUSS] Proposal of external shuffle service
> 
> Thanks for creating the FLIP-31 for the external shuffle service Zhijiang. It 
> looks good to me. 
> 
> One thing which is not fully clear to me yet is how the lifecycle management 
> of the partitions integrates with the slot management. At the moment, 
> conceptually we consider the partition data being owned by the TM if I 
> understood it correctly. This means the ShuffleMaster is asked whether a TM 
> can be freed. However, the JobMaster only thinks in terms of slots and not 
> TMs. Thus, the logic would be that the JM asks the ShuffleMaster whether it 
> can return a certain slot. At the moment the freeing of slots is done by the
> `SlotPool`, and thus this would couple the `SlotPool` and the `ShuffleMaster`. Maybe we
> need to introduce some mechanism to signal when a slot has still some 
> occupied resources. In the shared slot case, one could think of allocating a 
> dummy slot in the shared slot which we only release after the partition data 
> has been consumed.
> 
> In order to give this design document a little bit more visibility, I would 
> suggest posting it again on the dev mailing list in a separate thread under
> the title "[DISCUSS] Flip-31: Pluggable Shuffle Manager" or something like 
> this.
> 
> Cheers,
> Till
> On Mon, Jan 21, 2019 at 7:05 AM zhijiang  
> wrote:
> Hi all,
> 
> FYI, I created the FLIP-31 under [1] for this proposal and created some 
> subtasks under the umbrella JIRA [2].
> Any concerns are welcome in the previous Google doc or the specific JIRAs.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
> [2] https://issues.apache.org/jira/browse/FLINK-10653
> 
> Best,
> Zhijiang
> --
> From:zhijiang 
> Send Time:2019年1月15日(星期二) 17:55
> To:Andrey Zagrebin 
> Cc:dev 
> Subject:Re: [DISCUSS] Proposal of external shuffle service
> 
> Hi all,
> 
> After continuous offline discussion with Andrey, we have reached an 
> agreement on this proposal and co-authored the latest Google doc under [1].
> 
> We plan to create the FLIP and sub-tasks by the end of this week, and we 
> hope the first MVP can be covered in Flink 1.8.
> 
> Any feedback and suggestions are welcome! :)
> 
> [1] 
> https://docs.google.com/document/d/1l7yIVNH3HATP4BnjEOZFkO2CaHf1sVn_DSxS2llmkd8/edit?usp=sharing
> 
> Best,
> Zhijiang
> 
> 
> --
> From:zhijiang 
> Send Time:2018年12月25日(星期二) 15:33
> To:Andrey Zagrebin 
> Cc:dev 
> Subject:Re: [DISCUSS] Proposal of external shuffle service
> 
> Hi Andrey,
> 
> Thanks for the efficient response on the UnknownShuffleDeploymentDescriptor 
> issue.
> 
> It is reasonable to consider this special case on both the ShuffleMaster and 
> ShuffleService sides.
> On the upstream ShuffleService side, the created ResultPartitionWriter decides 
> whether to notify the ShuffleMaster of a consumable partition when it outputs 
> the first buffer or finishes.
> On the ShuffleMaster side, it might define a method in the ShuffleMaster 
> interface for handling this notification message from the upstream side, and 
> then interna

Re: [DISCUSS] Bot for stale PRs on GitHub

2019-01-11 Thread qi luo
+1 for the stale bot, as it will help bring valuable PRs out to be reviewed.

> On Jan 11, 2019, at 6:26 PM, Driesprong, Fokko  wrote:
> 
> +1 I'm in favor of the Stale bot.
> 
> We use the Stale bot at Apache Airflow as well, and it really helps smooth
> the reviewing process. Keep in mind that the number of PRs processed by
> the Stale bot is limited at each run. So you won't get a gazillion
> notifications, but just a few every couple of days. Just enough to prune
> the list of PRs.
> Most of the really old PRs are not relevant anymore, so it's good practice
> to close these. If someone still thinks a PR is relevant, it will be
> revisited and can still be considered for merging. Otherwise, the PR will be
> closed by the bot. There is no value in having old PRs hanging around.
> Having 500 open PRs doesn't look really good for the project, in my opinion.
> My suggestion would be to give it a try.
> 
> Cheers, Fokko
> 
> Op do 10 jan. 2019 om 12:45 schreef Chesnay Schepler :
> 
>>> The bot will remind both reviewers and contributors that they have to
>> be active on a PR, I found that useful on some PRs that I had open at Beam
>> 
>> I don't think we really want every contributor bumping their PR
>> regularly. This will create unbearable noise and, if they actually
>> update it, will lead to them wasting a lot of time since we won't
>> suddenly start reviewing it.
>> 
>> On 10.01.2019 12:06, Aljoscha Krettek wrote:
>>> For reference, this is the older staleness discussion:
>> https://lists.apache.org/thread.html/d53bee8431776f38ebaf8f5678b1ffd9513cd65ce15d821bbdca95aa@%3Cdev.flink.apache.org%3E
>> <
>> https://lists.apache.org/thread.html/d53bee8431776f38ebaf8f5678b1ffd9513cd65ce15d821bbdca95aa@%3Cdev.flink.apache.org%3E
>>> 
>>> 
>>> My main arguments for automatic closing of PRs are:
>>> 
>>>  - This will eventually close out old, stale PRs, making the number we
>> see in Github better reflect the actual state
>>>  - The bot will remind both reviewers and contributors that they have
>> to be active on a PR, I found that useful on some PRs that I had open at
>> Beam
>>> 
>>> Aljoscha
>>> 
 On 10. Jan 2019, at 11:21, Chesnay Schepler  wrote:
 
 Without any new argument for doing so, I'm still against it.
 
 On 10.01.2019 09:54, Aljoscha Krettek wrote:
> Hi,
> 
> I know we had similar discussions in the past but I’d like to bring up
>> this topic again.
> 
> What do you think about adding a stale bot (
>> https://probot.github.io/apps/stale/ )
>> to our Github Repo? This would automatically nag about stale PRs and close
>> them after a (configurable) time of inactivity. This would do two things:
> 
> (1) Clean up old PRs that truly are outdated and stale
> (2) Remind both contributor and reviewers about PRs that are still
>> good and are on the verge of getting stale, thus potentially speeding up
>> review or facilitating it in the first place
> 
> Best,
> Aljoscha
 
>>> 
>> 
>> 

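For reference, the probot stale app discussed in this thread is configured via a `.github/stale.yml` file in the repository. A minimal configuration along the lines described might look like this (the field names follow the probot/stale app; the values here are only illustrative, not a recommendation):

```yaml
# Number of days of inactivity before a PR is marked stale
daysUntilStale: 90
# Number of days of further inactivity before a stale PR is closed
daysUntilClose: 14
# Label applied to stale PRs
staleLabel: stale
# Labels that exempt a PR from being marked stale
exemptLabels:
  - blocked
# Comment posted when marking a PR as stale
markComment: >
  This pull request has been automatically marked as stale because it has not
  had recent activity. It will be closed if no further activity occurs.
# Comment posted when closing a stale PR (false closes without a comment)
closeComment: false
```

The `daysUntilStale`/`daysUntilClose` pair implements the "(configurable) time of inactivity" mentioned above, and `exemptLabels` gives reviewers a way to keep a PR alive explicitly.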


Re: [DISCUSS] Creating last bug fix release for 1.5 branch

2018-12-08 Thread qi luo
Hi Till,

Does Flink have an agreement on how long a major version will be supported? Some 
companies may need a long time to upgrade Flink major versions in production. 
If Flink terminates support for a major version too quickly, it may be a 
concern for those companies.

Best,
Qi

> On Dec 8, 2018, at 10:57 AM, vino yang  wrote:
> 
> Hi Till,
> 
> I think it makes sense to release a bug fix version (especially one with
> some serious bug fixes) for Flink 1.5, considering that some companies'
> production environments are more cautious about upgrading major versions.
> I think some organizations are still using 1.5.x or even 1.4.x.
> 
> Best,
> Vino
> 
> Till Rohrmann  于2018年12月7日周五 下午11:39写道:
> 
>> Dear community,
>> 
>> I wanted to reach out to you and discuss whether we should release a last
>> bug fix release for the 1.5 branch.
>> 
>> Since we have already released Flink 1.7.0, we only need to support the
>> 1.6.x and 1.7.x branches (last two major releases). However, the current
>> release-1.5 branch contains 45 unreleased fixes. Some of the fixes address
>> serializer duplication problems (FLINK-10839, FLINK-10693), fixing
>> retractions (FLINK-10674) or prevent a deadlock in the
>> SpillableSubpartition (FLINK-10491). I think it would be nice for our users
>> if we officially terminated the Flink 1.5.x support with a last 1.5.6
>> release. What do you think?
>> 
>> Cheers,
>> Till
>> 



Apply for flink contributor permission

2018-11-27 Thread qi luo
Hi Flink team,

Could you please give me contributor permission, as I’m working on JIRA issue 
[1]. My JIRA ID is *Qi* (luoqi...@gmail.com)

[1] https://issues.apache.org/jira/browse/FLINK-10941 


Thanks,
Qi