Batch reading from Cassandra

2020-02-23 Thread Lasse Nedergaard
Hi.

We would like to do some batch analytics on our data set stored in Cassandra 
and are looking for an efficient way to load data from a single table. Not by 
key, but random 15%, 50% or 100% 
Data bricks has create an efficient way to load Cassandra data into Apache 
Spark and they are doing it by reading from the underlying SS tables to load in 
parallel. 
Do we have something similarly in Flink, or how is the most efficient way to 
load all, or many random data from a single Cassandra table into Flink? 

Any suggestions and/or recommendations is highly appreciated.

Thanks in advance

Lasse Nedergaard

CfP: Workshop on Large Scale RDF Analytics (LASCAR-20) at ESWC'20

2020-02-23 Thread Hajira Jabeen

We apologize for cross-postings.
We appreciate your great help in forwarding this CFP to your
colleagues and friends.


Call for Papers & Posters for the 2nd Workshop on Large Scale RDF
Analytics (LASCAR-20), collocated with the Extended Semantic Web
Conference (ESWC) 2020

Venue & Dates:
Heraklion, Crete, Greece, May 31st, 2020

Workshop Website:http://lascar.sda.tech

[Topics of interest]

LASCAR-20 seeks original articles and posters describing theoretical
and practical methods as well as techniques for performing scalable
analytics on knowledge graphs. All papers must be original and not
simultaneously submitted to another journal or conference. The
following paper and poster categories are welcome:

  * Decentralized KG data management including parsing, compression,
partitioning and smart indexing
  * Large scale KG enrichment using link prediction, entity
resolution, entity disambiguation or similarity estimation
  * Machine Learning e.g. clustering, blocking, or anomaly detection
  * Complex analytics with distributed KG embeddings
  * Connecting property Graphs with RDF and reasoning
  * Use-cases presenting semantic technologies at industry scale

[Important Dates]

Electronic submission of full papers:   February 28th, 2020
Notification of paper acceptance:   March 27th, 2020
Camera-ready of accepted papers:April 10th, 2020
Workshop day:   May 31st, 2020

[Submission Guidelines]

All papers should be formatted according to the standard LNCS Style.
All papers will be peer reviewed using the single-blind approach.
Authors of the accepted papers will be asked to register for the
workshop and will have the opportunity to present and participate in
the workshop. Long papers should not be longer than 10 pages including
the references, and short papers should not exceed 6 pages, including
all references. The accepted papers will be published online in CEUR
Workshop Proceedings (CEUR-WS.org). Proceedings will be available for
download after the conference. The pre-print will be made available
during the conference. The authors of the accepted posters will be
invited to present their posters at the workshop.

Please submit your work via EasyChair
(https://www.easychair.org/conferences/?conf=lascar20)

Dr.  Hajira Jabeen
Senior researcher,
SDA, Universität Bonn.

http://sda.cs.uni-bonn.de/people/dr-hajira-jabeen/


[ANNOUNCE] Weekly Community Update 2020/07

2020-02-23 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with updates on the next
release cycle, a set of proposal for Flink's web user interface, a couple
of discussions around our development process and a bit more.

Flink Development
==

* [releases] Stephan proposes an "anticipated feature freeze date" for
Flink 1.11 around the end of April, and hence a release in May. This would
make the next release a short one, which seems to be generally well
received. Piotr and Zhijiang will be our release managers for Flink 1.11.
[1]

* [releases] flink-shaded 10.0 has been released. [2]

* [web ui] Yadong has split up the improvement proposal to Flink's web user
interface (FLIP-75) into multiple votes. Each thread contains a live demo
of the proposed feature.
  * [FLIP-98]: Highlight backpressure task in job graph [3]
  * [FLIP-99]: Make maximally shown exceptions configurable [4]
  * [FLIP-100] Show the attempt history of tasks [5]
  * [FLIP-101] Expose Information on Pending Slots in the Web User
Interface [6]
  * [FLIP-102] More Taskmanager metrics, in particular memory resources [7]
  * [FLIP-103] List all log files and make them downloadable [8]
  * [FLIP-104] More Flink Master metrics, in particular memory resources [9]

* [development process] Hequn Cheng has started discussion on improving the
FLIP process, in particular the way we deal with discussion on the wiki,
mailing list and Google Docs. The discussion is ongoing, but there seems to
be consensus not to use Google Docs more than right now, but to maybe even
eliminate it from the process all together. In the discussion David raised
a related point, that the FLIP document is often not updated after its
implementation and changes between the original plan and the final
implementation are not documented anywhere. [10]

* [development process] Xingtong proposes to update the PR description
template as it is quite outdated. The discussion shows that the template is
generally considered useful, but it indeed needs an update. [11]

* [deployment] Aljoscha has started a discussion to either drop or extend
the support for Flink's windows scripts. [12]

* [connectors] The connector for ElasticSearch 2.x will be dropped in Flink
1.11. Elastic Search 5.x will still be supported in Flink 1.11 and
revisited for Flink 1.12. [13]

* [state] Stephan proposes to drop savepoint compatibility with Flink 1.2.
A stateful upgrade from Flink 1.2 to Flink 1.11 would still be possible by
first upgrading to an intermediate Flink version. [14]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Kicking-off-the-1-11-release-cycle-tp37817.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-shaded-10-0-released-tp37815.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-98-Better-Back-Pressure-Detection-tp37893.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-99-Make-Max-Exception-Configurable-tp37895.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-100-Add-Attempt-Information-tp37896.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-101-Add-Pending-Slots-Detail-tp37897.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-tp37898.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-103-Better-TM-JM-Log-Display-tp37899.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-104-Add-More-Metrics-to-Jobmanager-tp37901.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvements-on-FLIP-Process-tp37785.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Update-the-pull-request-description-template-tp37755.html
[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Extend-or-maintain-shell-script-support-for-Windows-tp37868.html
[13]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-connectors-for-Elasticsearch-2-x-and-5-x-tp37471.html
[14]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Savepoint-Compatibility-with-Flink-1-2-tp37908.html

Notable Bugs
==

* [FLINK-16111] [1.10] Flink's Kubernetes deployment does not respect the
"taskmanager.cpu.cores" configuration [15]
* [FLINK-16115] [1.10] The OSS (Alibaba Cloud's Object Storage Service)
filesystem does not work as a plugin. [16]

[15] https://issues.apache.org/jira/browse/FLINK-16111
[16] https://issues.apache.org/jira/browse/FLINK-16115

Events, Blog Posts, Misc
===

* Jingsong Lee is now an Apache Flink committer. Congratulations! [17]

* Seth has published a post on the Apache Flink blog on Flink SQL DDL: "No
Java Required: Configured Sources and Sinks in SQL" [18]

* Upcoming Meetups
* On February 26th, Prateep Kumar will host an online eve

Re: yarn session: one JVM per task

2020-02-23 Thread Xintong Song
Hi David,

In general, I don't think you can control all parallel subtasks of a
certain task run in the same JVM process with the current Flink.

If you job scale is very small, one thing you might try is to have only one
task manager in the Flink session cluster. You need to make sure the task
manager has enough cpu/memory resources and slots for running your job.

Thank you~

Xintong Song



On Sun, Feb 23, 2020 at 1:11 AM David Morin 
wrote:

> Hi,
> My app is based on a lib that is not thread safe (yet...).
> In waiting of the patch has been pushed, how can I be sure that my Sink
> that uses this lib is in one JVM ?
> Context: I use one Yarn session and send my Flink jobs to this session
>
> Regards,
> David
>


Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-23 Thread Yang Wang
Hi Singh,

Glad to hear that you are looking to run Flink on the Kubernetes. I am
trying to answer your question based on my limited knowledge and
others could correct me and add some more supplements.

I think the biggest difference between session cluster and per-job cluster
on Kubernetesis the isolation. Since for per-job, a dedicated Flink cluster
will be started for the only one job and no any other jobs could be
submitted.
Once the job is finished, then the Flink cluster will be
destroyed immediately.
The second point is one-step submission. You do not need to start a Flink
cluster first and then submit a job to the existing session.

> Are there any benefits with regards to
1. Configuring the jobs
No matter you are using the per-job cluster or submitting to the existing
session cluster, they share the configuration mechanism. You do not have
to change any codes and configurations.

2. Scaling the taskmanager
Since you are using the Standalone cluster on Kubernetes, it do not provide
an active resourcemanager. You need to use external tools to monitor and
scale up the taskmanagers. The active integration is still evolving and you
could have a taste[1].

3. Restarting jobs
For the session cluster, you could directly cancel the job and re-submit.
And
for per-job cluster, when the job is canceled, you need to start a new
per-job
cluster from the latest savepoint.

4. Managing the flink jobs
The rest api and flink command line could be used to managing the jobs(e.g.
flink cancel, etc.). I think there is no difference for session and per-job
here.

5. Passing credentials (in case of AWS, etc)
I am not sure how do you provide your credentials. If you put them in the
config map and then mount into the jobmanager/taskmanager pod, then both
session and per-job could support this way.

6. Fault tolerence and recovery of jobs from failure
For session cluster, if one taskmanager crashed, then all the jobs which
have tasks
on this taskmanager will failed.
Both session and per-job could be configured with high availability and
recover
from the latest checkpoint.

> Is there any need for specifying volume for the pods?
No, you do not need to specify a volume for pod. All the data in the pod
local directory is temporary. When a pod crashed and relaunched, the
taskmanager will retrieve the checkpoint from zookeeper + S3 and resume
from the latest checkpoint.


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html

M Singh  于2020年2月23日周日 上午2:28写道:

> Hey Folks:
>
> I am trying to figure out the options for running Flink on Kubernetes and
> am trying to find out the pros and cons of running in Flink Session vs
> Flink Cluster mode (
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes
> ).
>
> I understand that in job mode there is no need to submit the job since it
> is part of the job image.  But what are other the pros and cons of this
> approach vs session mode where a job manager is deployed and flink jobs can
> be submitted it ?  Are there any benefits with regards to:
>
> 1. Configuring the jobs
> 2. Scaling the taskmanager
> 3. Restarting jobs
> 4. Managing the flink jobs
> 5. Passing credentials (in case of AWS, etc)
> 6. Fault tolerence and recovery of jobs from failure
>
> Also, we will be keeping the checkpoints for the jobs on S3.  Is there any
> need for specifying volume for the pods ?  If volume is required do we need
> provisioned volume and what are the recommended alternatives/considerations
> especially with AWS.
>
> If there are any other considerations, please let me know.
>
> Thanks for your advice.
>
>
>
>
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Zhu Zhu
Congratulations Jingsong!

Thanks,
Zhu Zhu

Fabian Hueske  于2020年2月22日周六 上午1:30写道:

> Congrats Jingsong!
>
> Cheers, Fabian
>
> Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong  >:
>
> > Congratulations Jingsong!!
> >
> > Cheers,
> > Rong
> >
> > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
> >
> > > Congrats, Jingsong!
> > >
> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann 
> > > wrote:
> > >
> > >> Congratulations Jingsong!
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao  wrote:
> > >>
> > >>>   Congratulations Jingsong!
> > >>>
> > >>>Best,
> > >>>Yun
> > >>>
> > >>> --
> > >>> From:Jingsong Li 
> > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
> > >>> To:Hequn Cheng 
> > >>> Cc:Yang Wang ; Zhijiang <
> > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> godfrey
> > >>> he ; dev ; user <
> > >>> user@flink.apache.org>
> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >>>
> > >>> Thanks everyone~
> > >>>
> > >>> It's my pleasure to be part of the community. I hope I can make a
> > better
> > >>> contribution in future.
> > >>>
> > >>> Best,
> > >>> Jingsong Lee
> > >>>
> > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> wrote:
> > >>> Congratulations Jingsong! Well deserved.
> > >>>
> > >>> Best,
> > >>> Hequn
> > >>>
> > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> > wrote:
> > >>> Congratulations!Jingsong. Well deserved.
> > >>>
> > >>>
> > >>> Best,
> > >>> Yang
> > >>>
> > >>> Zhijiang  于2020年2月21日周五 下午1:18写道:
> > >>> Congrats Jingsong! Welcome on board!
> > >>>
> > >>> Best,
> > >>> Zhijiang
> > >>>
> > >>> --
> > >>> From:Zhenghua Gao 
> > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
> > >>> To:godfrey he 
> > >>> Cc:dev ; user 
> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >>>
> > >>> Congrats Jingsong!
> > >>>
> > >>>
> > >>> *Best Regards,*
> > >>> *Zhenghua Gao*
> > >>>
> > >>>
> > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> > wrote:
> > >>> Congrats Jingsong! Well deserved.
> > >>>
> > >>> Best,
> > >>> godfrey
> > >>>
> > >>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
> > >>> Congratulations!Jingsong. You deserve it
> > >>>
> > >>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
> > >>> Congrats Jingsong!
> > >>>
> > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu  wrote:
> > >>>
> > >>> > Congrats Jingsong!
> > >>> >
> > >>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
> > >>> > >
> > >>> > > Congratulations Jingsong! Well deserved.
> > >>> > >
> > >>> > > Best,
> > >>> > > Jark
> > >>> > >
> > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> > >>> > >
> > >>> > >> Congratulations! Jingsong
> > >>> > >>
> > >>> > >>
> > >>> > >> Best,
> > >>> > >> Dan Zou
> > >>> > >>
> > >>> >
> > >>> >
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards
> > >>>
> > >>> Jeff Zhang
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best, Jingsong Lee
> > >>>
> > >>>
> > >>>
> >
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread jincheng sun
Congratulations Jingsong!

Best,
Jincheng


Zhu Zhu  于2020年2月24日周一 上午11:55写道:

> Congratulations Jingsong!
>
> Thanks,
> Zhu Zhu
>
> Fabian Hueske  于2020年2月22日周六 上午1:30写道:
>
>> Congrats Jingsong!
>>
>> Cheers, Fabian
>>
>> Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong > >:
>>
>> > Congratulations Jingsong!!
>> >
>> > Cheers,
>> > Rong
>> >
>> > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
>> >
>> > > Congrats, Jingsong!
>> > >
>> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann 
>> > > wrote:
>> > >
>> > >> Congratulations Jingsong!
>> > >>
>> > >> Cheers,
>> > >> Till
>> > >>
>> > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
>> wrote:
>> > >>
>> > >>>   Congratulations Jingsong!
>> > >>>
>> > >>>Best,
>> > >>>Yun
>> > >>>
>> > >>> --
>> > >>> From:Jingsong Li 
>> > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
>> > >>> To:Hequn Cheng 
>> > >>> Cc:Yang Wang ; Zhijiang <
>> > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
>> godfrey
>> > >>> he ; dev ; user <
>> > >>> user@flink.apache.org>
>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>> > >>>
>> > >>> Thanks everyone~
>> > >>>
>> > >>> It's my pleasure to be part of the community. I hope I can make a
>> > better
>> > >>> contribution in future.
>> > >>>
>> > >>> Best,
>> > >>> Jingsong Lee
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
>> wrote:
>> > >>> Congratulations Jingsong! Well deserved.
>> > >>>
>> > >>> Best,
>> > >>> Hequn
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
>> > wrote:
>> > >>> Congratulations!Jingsong. Well deserved.
>> > >>>
>> > >>>
>> > >>> Best,
>> > >>> Yang
>> > >>>
>> > >>> Zhijiang  于2020年2月21日周五 下午1:18写道:
>> > >>> Congrats Jingsong! Welcome on board!
>> > >>>
>> > >>> Best,
>> > >>> Zhijiang
>> > >>>
>> > >>> --
>> > >>> From:Zhenghua Gao 
>> > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
>> > >>> To:godfrey he 
>> > >>> Cc:dev ; user 
>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>> > >>>
>> > >>> Congrats Jingsong!
>> > >>>
>> > >>>
>> > >>> *Best Regards,*
>> > >>> *Zhenghua Gao*
>> > >>>
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
>> > wrote:
>> > >>> Congrats Jingsong! Well deserved.
>> > >>>
>> > >>> Best,
>> > >>> godfrey
>> > >>>
>> > >>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
>> > >>> Congratulations!Jingsong. You deserve it
>> > >>>
>> > >>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
>> > >>> Congrats Jingsong!
>> > >>>
>> > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
>> wrote:
>> > >>>
>> > >>> > Congrats Jingsong!
>> > >>> >
>> > >>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
>> > >>> > >
>> > >>> > > Congratulations Jingsong! Well deserved.
>> > >>> > >
>> > >>> > > Best,
>> > >>> > > Jark
>> > >>> > >
>> > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >>> > >
>> > >>> > >> Congratulations! Jingsong
>> > >>> > >>
>> > >>> > >>
>> > >>> > >> Best,
>> > >>> > >> Dan Zou
>> > >>> > >>
>> > >>> >
>> > >>> >
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best Regards
>> > >>>
>> > >>> Jeff Zhang
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best, Jingsong Lee
>> > >>>
>> > >>>
>> > >>>
>> >
>>
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Congxian Qiu
Congratulations Jingsong!

Best,
Congxian


jincheng sun  于2020年2月24日周一 下午1:38写道:

> Congratulations Jingsong!
>
> Best,
> Jincheng
>
>
> Zhu Zhu  于2020年2月24日周一 上午11:55写道:
>
>> Congratulations Jingsong!
>>
>> Thanks,
>> Zhu Zhu
>>
>> Fabian Hueske  于2020年2月22日周六 上午1:30写道:
>>
>>> Congrats Jingsong!
>>>
>>> Cheers, Fabian
>>>
>>> Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
>>> walter...@gmail.com>:
>>>
>>> > Congratulations Jingsong!!
>>> >
>>> > Cheers,
>>> > Rong
>>> >
>>> > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
>>> >
>>> > > Congrats, Jingsong!
>>> > >
>>> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann 
>>> > > wrote:
>>> > >
>>> > >> Congratulations Jingsong!
>>> > >>
>>> > >> Cheers,
>>> > >> Till
>>> > >>
>>> > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
>>> wrote:
>>> > >>
>>> > >>>   Congratulations Jingsong!
>>> > >>>
>>> > >>>Best,
>>> > >>>Yun
>>> > >>>
>>> > >>> --
>>> > >>> From:Jingsong Li 
>>> > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
>>> > >>> To:Hequn Cheng 
>>> > >>> Cc:Yang Wang ; Zhijiang <
>>> > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
>>> godfrey
>>> > >>> he ; dev ; user <
>>> > >>> user@flink.apache.org>
>>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>> > >>>
>>> > >>> Thanks everyone~
>>> > >>>
>>> > >>> It's my pleasure to be part of the community. I hope I can make a
>>> > better
>>> > >>> contribution in future.
>>> > >>>
>>> > >>> Best,
>>> > >>> Jingsong Lee
>>> > >>>
>>> > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
>>> wrote:
>>> > >>> Congratulations Jingsong! Well deserved.
>>> > >>>
>>> > >>> Best,
>>> > >>> Hequn
>>> > >>>
>>> > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
>>> > wrote:
>>> > >>> Congratulations!Jingsong. Well deserved.
>>> > >>>
>>> > >>>
>>> > >>> Best,
>>> > >>> Yang
>>> > >>>
>>> > >>> Zhijiang  于2020年2月21日周五 下午1:18写道:
>>> > >>> Congrats Jingsong! Welcome on board!
>>> > >>>
>>> > >>> Best,
>>> > >>> Zhijiang
>>> > >>>
>>> > >>> --
>>> > >>> From:Zhenghua Gao 
>>> > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
>>> > >>> To:godfrey he 
>>> > >>> Cc:dev ; user 
>>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>> > >>>
>>> > >>> Congrats Jingsong!
>>> > >>>
>>> > >>>
>>> > >>> *Best Regards,*
>>> > >>> *Zhenghua Gao*
>>> > >>>
>>> > >>>
>>> > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
>>> > wrote:
>>> > >>> Congrats Jingsong! Well deserved.
>>> > >>>
>>> > >>> Best,
>>> > >>> godfrey
>>> > >>>
>>> > >>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
>>> > >>> Congratulations!Jingsong. You deserve it
>>> > >>>
>>> > >>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
>>> > >>> Congrats Jingsong!
>>> > >>>
>>> > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
>>> wrote:
>>> > >>>
>>> > >>> > Congrats Jingsong!
>>> > >>> >
>>> > >>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
>>> > >>> > >
>>> > >>> > > Congratulations Jingsong! Well deserved.
>>> > >>> > >
>>> > >>> > > Best,
>>> > >>> > > Jark
>>> > >>> > >
>>> > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>>> > >>> > >
>>> > >>> > >> Congratulations! Jingsong
>>> > >>> > >>
>>> > >>> > >>
>>> > >>> > >> Best,
>>> > >>> > >> Dan Zou
>>> > >>> > >>
>>> > >>> >
>>> > >>> >
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>> Best Regards
>>> > >>>
>>> > >>> Jeff Zhang
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>> Best, Jingsong Lee
>>> > >>>
>>> > >>>
>>> > >>>
>>> >
>>>
>>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Yu Li
Congratulations Jingsong! Well deserved.

Best Regards,
Yu


On Mon, 24 Feb 2020 at 14:10, Congxian Qiu  wrote:

> Congratulations Jingsong!
>
> Best,
> Congxian
>
>
> jincheng sun  于2020年2月24日周一 下午1:38写道:
>
>> Congratulations Jingsong!
>>
>> Best,
>> Jincheng
>>
>>
>> Zhu Zhu  于2020年2月24日周一 上午11:55写道:
>>
>>> Congratulations Jingsong!
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> Fabian Hueske  于2020年2月22日周六 上午1:30写道:
>>>
 Congrats Jingsong!

 Cheers, Fabian

 Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
 walter...@gmail.com>:

 > Congratulations Jingsong!!
 >
 > Cheers,
 > Rong
 >
 > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
 >
 > > Congrats, Jingsong!
 > >
 > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann >>> >
 > > wrote:
 > >
 > >> Congratulations Jingsong!
 > >>
 > >> Cheers,
 > >> Till
 > >>
 > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
 wrote:
 > >>
 > >>>   Congratulations Jingsong!
 > >>>
 > >>>Best,
 > >>>Yun
 > >>>
 > >>> --
 > >>> From:Jingsong Li 
 > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
 > >>> To:Hequn Cheng 
 > >>> Cc:Yang Wang ; Zhijiang <
 > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
 godfrey
 > >>> he ; dev ; user <
 > >>> user@flink.apache.org>
 > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
 > >>>
 > >>> Thanks everyone~
 > >>>
 > >>> It's my pleasure to be part of the community. I hope I can make a
 > better
 > >>> contribution in future.
 > >>>
 > >>> Best,
 > >>> Jingsong Lee
 > >>>
 > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
 wrote:
 > >>> Congratulations Jingsong! Well deserved.
 > >>>
 > >>> Best,
 > >>> Hequn
 > >>>
 > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
 > wrote:
 > >>> Congratulations!Jingsong. Well deserved.
 > >>>
 > >>>
 > >>> Best,
 > >>> Yang
 > >>>
 > >>> Zhijiang  于2020年2月21日周五 下午1:18写道:
 > >>> Congrats Jingsong! Welcome on board!
 > >>>
 > >>> Best,
 > >>> Zhijiang
 > >>>
 > >>> --
 > >>> From:Zhenghua Gao 
 > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
 > >>> To:godfrey he 
 > >>> Cc:dev ; user 
 > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
 > >>>
 > >>> Congrats Jingsong!
 > >>>
 > >>>
 > >>> *Best Regards,*
 > >>> *Zhenghua Gao*
 > >>>
 > >>>
 > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
 > wrote:
 > >>> Congrats Jingsong! Well deserved.
 > >>>
 > >>> Best,
 > >>> godfrey
 > >>>
 > >>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
 > >>> Congratulations!Jingsong. You deserve it
 > >>>
 > >>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
 > >>> Congrats Jingsong!
 > >>>
 > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
 wrote:
 > >>>
 > >>> > Congrats Jingsong!
 > >>> >
 > >>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
 > >>> > >
 > >>> > > Congratulations Jingsong! Well deserved.
 > >>> > >
 > >>> > > Best,
 > >>> > > Jark
 > >>> > >
 > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
 > >>> > >
 > >>> > >> Congratulations! Jingsong
 > >>> > >>
 > >>> > >>
 > >>> > >> Best,
 > >>> > >> Dan Zou
 > >>> > >>
 > >>> >
 > >>> >
 > >>>
 > >>>
 > >>> --
 > >>> Best Regards
 > >>>
 > >>> Jeff Zhang
 > >>>
 > >>>
 > >>>
 > >>> --
 > >>> Best, Jingsong Lee
 > >>>
 > >>>
 > >>>
 >

>>>


Re: FsStateBackend vs RocksDBStateBackend

2020-02-23 Thread Yu Li
Yes FsStateBackend would be the best fit for state access performance in
this case. Just a reminder that FsStateBackend will upload the full dataset
to DFS during checkpointing, so please watch the network bandwidth usage
and make sure it won't become a new bottleneck.

Best Regards,
Yu


On Fri, 21 Feb 2020 at 20:56, Robert Metzger  wrote:

> I would try the FsStateBackend in this scenario, as you have enough memory
> available.
>
> On Thu, Jan 30, 2020 at 5:26 PM Ran Zhang  wrote:
>
>> Hi Gordon,
>>
>> Thanks for your reply! Regarding state size - we are at 200-300gb but we
>> have 120 parallelism which will make each task handle ~2 - 3 gb state.
>> (when we submit the job we are setting tm memory to 15g.) In this scenario
>> what will be the best fit for statebackend?
>>
>> Thanks,
>> Ran
>>
>> On Wed, Jan 29, 2020 at 6:37 PM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi Ran,
>>>
>>> On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang 
>>> wrote:
>>>
 Hi all,

 We have a Flink app that uses a KeyedProcessFunction, and in the
 function it requires a ValueState(of TreeSet) and the processElement method
 needs to access and update it. We tried to use RocksDB as our stateBackend
 but the performance is not good, and intuitively we think it was because of
 the serialization / deserialization on each processElement call.

>>>
>>> As you have already pointed out, serialization behaviour is a major
>>> difference between the 2 state backends, and will directly impact
>>> performance due to the extra runtime overhead in RocksDB.
>>> If you plan to continue using the RocksDB state backend, make sure to
>>> use MapState instead of ValueState where possible, since every access to
>>> the ValueState in the RocksDB backend requires serializing / deserializing
>>> the whole value.
>>> For MapState, de-/serialization happens per K-V access. Whether or not
>>> this makes sense would of course depend on your state access pattern.
>>>
>>>
 Then we tried to switch to use FsStateBackend (which keeps the
 in-flight data in the TaskManager’s memory according to doc), and it could
 resolve the performance issue. *So we want to understand better what
 are the tradeoffs in choosing between these 2 stateBackend.* Our
 checkpoint size is 200 - 300 GB in stable state. For now we know one
 benefits of RocksDB is it supports incremental checkpoint, but would love
 to know what else we are losing in choosing FsStateBackend.

>>>
>>> As of now, feature-wise both backends support asynchronous snapshotting,
>>> state schema evolution, and access via the State Processor API.
>>> In the end, the major factor for deciding between the two state backends
>>> would be your expected state size.
>>> That being said, it could be possible in the future that savepoint
>>> formats for the backends are changed to be compatible, meaning that you
>>> will be able to switch between different backends upon restore [1].
>>>
>>>

 Thanks a lot!
 Ran Zhang

>>>
>>> Cheers,
>>> Gordon
>>>
>>>  [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State
>>>
>>


Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-23 Thread Yu Li
+1 for dropping savepoint compatibility with Flink 1.2.

Best Regards,
Yu


On Sat, 22 Feb 2020 at 22:05, Ufuk Celebi  wrote:

> Hey Stephan,
>
> +1.
>
> Reading over the linked ticket and your description here, I think it makes
> a lot of sense to go ahead with this. Since it's possible to upgrade via
> intermediate Flink releases as a fail-safe I don't have any concerns.
>
> – Ufuk
>
>
> On Fri, Feb 21, 2020 at 4:34 PM Till Rohrmann 
> wrote:
> >
> > +1 for dropping savepoint compatibility with Flink 1.2.
> >
> > Cheers,
> > Till
> >
> > On Thu, Feb 20, 2020 at 6:55 PM Stephan Ewen  wrote:
> >>
> >> Thank you for the feedback.
> >>
> >> Here is the JIRA issue with some more explanation also about the
> background and implications:
> >> https://jira.apache.org/jira/browse/FLINK-16192
> >>
> >> Best,
> >> Stephan
> >>
> >>
> >> On Thu, Feb 20, 2020 at 2:26 PM vino yang 
> wrote:
> >>>
> >>> +1 for dropping Savepoint compatibility with Flink 1.2
> >>>
> >>> Flink 1.2 is quite far away from the latest 1.10. Especially after the
> release of Flink 1.9 and 1.10, the code and architecture have undergone
> major changes.
> >>>
> >>> Currently, I am updating state migration tests for Flink 1.10. I can
> still see some binary snapshot files of version 1.2. If we agree on this
> topic, we may be able to alleviate some of the burdens(remove those binary
> files) when the migration tests would be updated later.
> >>>
> >>> Best,
> >>> Vino
> >>>
> >>> Theo Diefenthal  于2020年2月20日周四
> 下午9:04写道:
> 
>  +1 for dropping compatibility.
> 
>  I personally think that it is very important for a project to keep a
> good pace in developing that old legacy stuff must be dropped from time to
> time. As long as there is an upgrade routine (via going to another flink
> release) that's fine.
> 
>  
>  Von: "Stephan Ewen" 
>  An: "dev" , "user" 
>  Gesendet: Donnerstag, 20. Februar 2020 11:11:43
>  Betreff: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2
> 
>  Hi all!
>  For some cleanup and simplifications, it would be helpful to drop
> Savepoint compatibility with Flink version 1.2. That version was released
> almost three years ago.
> 
>  I would expect that no one uses that old version any more in a way
> that they actively want to upgrade directly to 1.11.
> 
>  Even if, there is still the way to first upgrade to another version
> (like 1.9) and then upgrade to 1.11 from there.
> 
>  Any concerns to drop that support?
> 
>  Best,
>  Stephan
> 
> 
>  --
>  SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln
>  Theo Diefenthal
> 
>  T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575
>  theo.diefent...@scoop-software.de - www.scoop-software.de
>  Sitz der Gesellschaft: Köln, Handelsregister: Köln,
>  Handelsregisternummer: HRB 36625
>  Geschäftsführung: Dr. Oleg Balovnev, Frank Heinen,
>  Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel
>
>>