Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Stavros Kontopoulos
Congrats!

On Fri, Feb 10, 2017 at 9:11 PM, Matthias J. Sax  wrote:

>
> Congrats!
>
> On 2/10/17 2:00 AM, Ufuk Celebi wrote:
> > Hey everyone,
> >
> > I'm very happy to announce that the Flink PMC has accepted Stefan
> > Richter to become a committer of the Apache Flink project.
> >
> > Stefan is part of the community for almost a year now and worked
> > on major features of the latest 1.2 release, most notably rescaling
> > and backwards compatibility of program state.
> >
> > Please join me in welcoming Stefan. :-)
> >
> > – Ufuk
> >
>


Flink ML - NaN Handling

2017-02-10 Thread Stavros Kontopoulos
Hello guys,

Is there a story for this (might have been discussed earlier)? I see
differences between scikit-learn and numpy. Do we standardize on
scikit-learn?

PS. I am working on the preprocessing stuff.

Best,
Stavros


Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Paris Carbone
Congratz Stefan! Keep up with the neat contributions

> On 10 Feb 2017, at 17:10, Haohui Mai  wrote:
> 
> Congrats!
> On Fri, Feb 10, 2017 at 8:08 AM Henry Saputra 
> wrote:
> 
>> Congrats and welcome!
>> 
>> On Fri, Feb 10, 2017 at 7:45 AM, Tzu-Li (Gordon) Tai 
>> wrote:
>> 
>>> Great news! Welcome Stefan :-D
>>> 
>>> 
>>> On February 10, 2017 at 11:36:14 PM, Aljoscha Krettek (
>> aljos...@apache.org)
>>> wrote:
>>> 
>>> Welcome! :-)
>>> 
>>> On Fri, 10 Feb 2017 at 16:10 Till Rohrmann  wrote:
>>> 
 Great to have you on board as a committer Stefan :-)
 
 On Fri, Feb 10, 2017 at 3:32 PM, Greg Hogan 
>> wrote:
 
> Welcome, Stefan, and thank you for your contributions!
> 
> On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi  wrote:
> 
>> Hey everyone,
>> 
>> I'm very happy to announce that the Flink PMC has accepted Stefan
>> Richter to become a committer of the Apache Flink project.
>> 
>> Stefan is part of the community for almost a year now and worked on
>> major features of the latest 1.2 release, most notably rescaling
>> and
>> backwards compatibility of program state.
>> 
>> Please join me in welcoming Stefan. :-)
>> 
>> – Ufuk
>> 
> 
 
>>> 
>> 



Re: New Flink team member - Kate Eri.

2017-02-10 Thread Katherin Eri
I have created a ticket to discuss GPU-related questions further:
https://issues.apache.org/jira/browse/FLINK-5782

Fri, Feb 10, 2017 at 18:16, Katherin Eri :

> Thank you, Trevor!
>
> You have shared very valuable points; I will consider them.
>
> So I think I should finally create a ticket in Flink's JIRA, at least for
> Flink's GPU support, and move the related discussion there?
>
> I will contact Suneel regarding DL4J, thanks!
>
>
> Fri, Feb 10, 2017 at 17:44, Trevor Grant :
>
> Also RE: DL4J integration.
>
> Suneel had done some work on this a while back, and ran into issues.  You
> might want to chat with him about the pitfalls and 'gotchyas' there.
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant 
> wrote:
>
> > Sorry for chiming in late.
> >
> > GPUs on Flink.  Till raised a good point- you need to be able to fall
> back
> > to non-GPU resources if they aren't available.
> >
> > Fun fact: this has already been developed for Flink vis-a-vis the Apache
> > Mahout project.
> >
> > In short- Mahout exposes a number of tensor functions (vector %*% matrix,
> > matrix %*% matrix, etc).  If compiled for GPU support, those operations
> are
> > completed via GPU- and if no GPUs are in fact available, Mahout math
> falls
> > back to CPUs (and finally back to the JVM).
> >
> > How this should work is Flink takes care of shipping data around the
> > cluster, and when data arrives at the local node- is dumped out to GPU
> for
> > calculation, loaded back up and shipped back around cluster.  In
> practice,
> > the lack of a persist method for intermediate results makes this
> > troublesome (not because of GPUs but for calculating any sort of complex
> > algorithm we expect to be able to cache intermediate results).
> >
> > +1 to FLINK-1730
> >
> > Everything in Mahout is modular- distributed engine
> > (Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA /
> > Write-your-own), algorithms, etc.
> >
> > So to sum up, you're noting the redundancy between ML packages in terms
> of
> > algorithms- I would recommend checking out Mahout before rolling your own
> > GPU integration (else risk redundantly integrating GPUs). If nothing
> else-
> > it should give you some valuable insight regarding design considerations.
> > Also FYI the goal of the Apache Mahout project is to address that problem
> > precisely- implement an algorithm once in a mathematically expressive
> DSL,
> > which is abstracted above the engine so the same code easily ports
> between
> > engines / native solvers (i.e. CPU/GPU).
> >
> > https://github.com/apache/mahout/tree/master/viennacl-omp
> > https://github.com/apache/mahout/tree/master/viennacl
> >
> > Best,
> > tg
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
> >
> > On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri 
> > wrote:
> >
> >> Thank you Felix, for provided information.
> >>
> >> Currently I analyze the provided integration of Flink with SystemML.
> >>
> >> And also gather the information for the ticket  FLINK-1730
> >> , maybe we will take
> it
> >> to work, to unlock SystemML/Flink integration.
> >>
> >>
> >>
> >> Thu, Feb 9, 2017 at 0:17, Felix Neutatz :
> >>
> >> > Hi Kate,
> >> >
> >> > 1) - Broadcast:
> >> >
> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+
> >> Only+send+data+to+each+taskmanager+once+for+broadcasts
> >> >  - Caching: https://issues.apache.org/jira/browse/FLINK-1730
> >> >
> >> > 2) I have no idea about the GPU implementation. The SystemML mailing
> >> list
> >> > will probably help you out there.
> >> >
> >> > Best regards,
> >> > Felix
> >> >
> >> > 2017-02-08 14:33 GMT+01:00 Katherin Eri :
> >> >
> >> > > Thank you Felix, for your point, it is quite interesting.
> >> > >
> >> > > I will take a look at the code, of the provided Flink integration.
> >> > >
> >> > > 1) You have these problems with Flink: >> we realized that the lack of
> >> > > a caching operator and a broadcast issue highly affects the performance.
> >> > > Have you already asked the community about this? If yes, please provide
> >> > > a reference to the ticket or the subject of the email.
> >> > >
> >> > > 2) You said that SystemML provides GPU support. I have seen
> >> > > SystemML's source code and would like to ask: why did you decide to
> >> > > implement your own 

[jira] [Created] (FLINK-5782) Support GPU calculations

2017-02-10 Thread Kate Eri (JIRA)
Kate Eri created FLINK-5782:
---

 Summary: Support GPU calculations
 Key: FLINK-5782
 URL: https://issues.apache.org/jira/browse/FLINK-5782
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.3.0
Reporter: Kate Eri
Priority: Minor


This ticket was initiated as a continuation of the dev discussion thread: [New 
Flink team member - Kate Eri (Integration with DL4J 
topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]  
Recently we proposed the idea to integrate 
[Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink. 
Training DL models is a resource-demanding process, so training on CPUs can 
take much longer to converge than on GPUs.

GPU usage is worth considering not only for DL training but also for optimizing 
graph analytics and other typical data manipulations; a nice overview of 
GPU-related problems is given in [Accelerating Spark workloads using 
GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].

So far the community has pointed out the following issues to consider:
1)  Flink would like to avoid writing its own GPU support yet again, to reduce 
the engineering burden. That's why libraries like [ND4J|http://nd4j.org/userguide] 
should be considered.
2)  Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to optimize 
linear algebra calculations. ND4J can't be integrated as-is because it still 
doesn't support [sparse arrays|http://nd4j.org/userguide#faq]. Perhaps this gap 
simply needs to be closed to enable ND4J usage?
3)  The calculations would have to work whether or not GPUs are available; if 
the system detects GPUs, it should ideally exploit them. GPU resource management 
could thus be incorporated into 
[FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion).
4)  It was mentioned that, since Flink takes care of shipping data around the 
cluster, it would also dump data out to the GPU for calculation and load it back 
up. In practice, the lack of a persist method for intermediate results makes this 
troublesome (not because of GPUs, but because calculating any sort of complex 
algorithm requires the ability to cache intermediate results). That's why 
[FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be implemented 
to solve this problem.
5)  It was also recommended to take a look at Apache Mahout, at least to learn 
from its experience with GPU integration:
https://github.com/apache/mahout/tree/master/viennacl-omp
https://github.com/apache/mahout/tree/master/viennacl 

6)  Netflix's experience with this question could also be considered: 
[Distributed Neural Networks with GPUs in the AWS 
Cloud|http://techblog.netflix.com/search/label/CUDA]   

This is considered the master ticket for GPU-related tickets.
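The capability-detection idea in point 3 (use GPUs when present, fall back otherwise) can be sketched as a simple dispatch chain. This is an illustrative sketch only — the probe methods are stubs standing in for real CUDA/OpenMP checks, and none of these names exist in Flink or Mahout:

```java
// Hypothetical sketch of the discussed fallback chain (GPU -> native CPU -> JVM).
// The probes are stubbed; a real implementation would query CUDA/ViennaCL/OpenMP.
public class GpuFallbackSketch {
    enum Backend { GPU, CPU_NATIVE, JVM }

    static boolean gpuAvailable() { return false; }       // stub: e.g. CUDA device query
    static boolean nativeCpuAvailable() { return false; } // stub: e.g. OpenMP/ViennaCL check

    static Backend chooseBackend() {
        if (gpuAvailable()) return Backend.GPU;
        if (nativeCpuAvailable()) return Backend.CPU_NATIVE;
        return Backend.JVM; // always-available fallback
    }

    public static void main(String[] args) {
        System.out.println("Using backend: " + chooseBackend());
    }
}
```

With both probes stubbed to false, the chain settles on the JVM backend, which is the behavior Mahout math is described as providing when no accelerators are found.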




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5781) Generation HTML from ConfigOption

2017-02-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5781:
--

 Summary: Generation HTML from ConfigOption
 Key: FLINK-5781
 URL: https://issues.apache.org/jira/browse/FLINK-5781
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


Use the ConfigOption instances to generate an HTML page that can be included in 
the docs configuration page.
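A minimal sketch of what such a generator could look like — rendering option keys and defaults into an HTML table. All names here are hypothetical illustrations, not Flink's actual generator:

```java
// Hypothetical sketch: render config options (key -> default) as an HTML table
// suitable for inclusion in a docs page.
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigDocsGeneratorSketch {
    static String toHtmlTable(Map<String, String> optionsToDefaults) {
        StringBuilder sb = new StringBuilder("<table>\n<tr><th>Key</th><th>Default</th></tr>\n");
        for (Map.Entry<String, String> e : optionsToDefaults.entrySet()) {
            sb.append("<tr><td>").append(e.getKey())
              .append("</td><td>").append(e.getValue()).append("</td></tr>\n");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps declaration order
        opts.put("jobmanager.rpc.port", "6123");
        String html = toHtmlTable(opts);
        System.out.println(html.contains("<td>jobmanager.rpc.port</td>"));
    }
}
```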





[jira] [Created] (FLINK-5780) Extend ConfigOption with descriptions

2017-02-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5780:
--

 Summary: Extend ConfigOption with descriptions
 Key: FLINK-5780
 URL: https://issues.apache.org/jira/browse/FLINK-5780
 Project: Flink
  Issue Type: Sub-task
  Components: Core, Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


The {{ConfigOption}} type is meant to replace the flat {{ConfigConstants}}. As 
part of automating the generation of a docs config page we need to extend 
{{ConfigOption}} with description fields.

From the ML discussion, these could be:
{code}
void shortDescription(String description);
void longDescription(String description);
{code}

In practice, the description string should contain HTML/Markdown.
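A self-contained sketch of how such description fields might look with fluent setters, so options remain declarable as single expressions. The class and field names are illustrative assumptions, not Flink's actual {{ConfigOption}} API:

```java
// Hypothetical sketch: a ConfigOption-like type carrying short/long descriptions.
public class ConfigOptionSketch {
    static final class ConfigOption<T> {
        final String key;
        final T defaultValue;
        String shortDescription = "";
        String longDescription = "";

        ConfigOption(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }

        // Fluent setters return 'this' so the option stays a single expression.
        ConfigOption<T> shortDescription(String s) { this.shortDescription = s; return this; }
        ConfigOption<T> longDescription(String s) { this.longDescription = s; return this; }
    }

    public static void main(String[] args) {
        ConfigOption<Integer> port = new ConfigOption<>("jobmanager.rpc.port", 6123)
                .shortDescription("The JobManager RPC port.")
                .longDescription("The port used by the JobManager for RPC. May contain <code>HTML</code>.");
        System.out.println(port.key + " -> " + port.shortDescription);
    }
}
```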






[jira] [Created] (FLINK-5779) Auto generate configuration docs

2017-02-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5779:
--

 Summary: Auto generate configuration docs
 Key: FLINK-5779
 URL: https://issues.apache.org/jira/browse/FLINK-5779
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


As per discussion on the mailing list we need to improve on the configuration 
documentation page 
(http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Organizing-Documentation-for-Configuration-Options-td15773.html).

We decided to try to (semi-)automate this in order to not get out of sync in the 
future.





[jira] [Created] (FLINK-5778) Split FileStateHandle into fileName and basePath

2017-02-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5778:
--

 Summary: Split FileStateHandle into fileName and basePath
 Key: FLINK-5778
 URL: https://issues.apache.org/jira/browse/FLINK-5778
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


Store the statePath as a basePath and a fileName, and allow overwriting the 
basePath. We cannot overwrite the base path as long as the state handle is 
still in flight and not persisted; otherwise we risk a resource leak.

We need this in order to be able to relocate savepoints.

{code}
interface RelativeBaseLocationStreamStateHandle {

   void clearBaseLocation();

   void setBaseLocation(String baseLocation);

}
{code}

FileStateHandle should implement this, and the SavepointSerializer should 
forward the calls when a savepoint is stored or loaded: clear the base location 
before storing and set it after loading.






[jira] [Created] (FLINK-5777) Pass savepoint information to CheckpointingOperation

2017-02-10 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5777:
--

 Summary: Pass savepoint information to CheckpointingOperation
 Key: FLINK-5777
 URL: https://issues.apache.org/jira/browse/FLINK-5777
 Project: Flink
  Issue Type: Sub-task
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


In order to make savepoints self contained in a single directory, we need to 
pass some information to {{StreamTask#CheckpointingOperation}}.

I propose to extend the {{CheckpointMetaData}} for this.

We currently have some overlap with CheckpointMetaData, CheckpointMetrics, and 
manually passed checkpoint ID and checkpoint timestamps. We should restrict 
CheckpointMetaData to the integral meta data that needs to be passed to 
StreamTask#CheckpointingOperation.

This means that we move the CheckpointMetrics out of the CheckpointMetaData and 
the BarrierBuffer/BarrierTracker create CheckpointMetrics separately and send 
it back with the acknowledge message.

CheckpointMetaData should be extended with the following properties:
- boolean isSavepoint
- String targetDirectory

There are two code paths that lead to the CheckpointingOperation:

1. From CheckpointCoordinator via RPC to StreamTask#triggerCheckpoint
- Execution#triggerCheckpoint(long, long) 
=> triggerCheckpoint(CheckpointMetaData)
- TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, long, long) 
=> TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, 
CheckpointMetaData)
- Task#triggerCheckpointBarrier(long, long) =>  
Task#triggerCheckpointBarrier(CheckpointMetaData)

2. From intermediate streams via the CheckpointBarrier to  
StreamTask#triggerCheckpointOnBarrier
- triggerCheckpointOnBarrier(CheckpointMetaData)
=> triggerCheckpointOnBarrier(CheckpointMetaData, CheckpointMetrics)
- CheckpointBarrier(long, long) => CheckpointBarrier(CheckpointMetaData)
- AcknowledgeCheckpoint(CheckpointMetaData)
=> AcknowledgeCheckpoint(long, CheckpointMetrics)

The state backends provide another stream factory that is called in 
CheckpointingOperation when the meta data indicates a savepoint. The state 
backends can choose whether they return the regular checkpoint stream factory 
in that case or a special one for savepoints. That way, backends that don't 
checkpoint to a file system can special-case savepoints easily.

- FsStateBackend: return special FsCheckpointStreamFactory with different 
directory layout
- MemoryStateBackend: return regular checkpoint stream factory 
(MemCheckpointStreamFactory) => The _metadata file will contain all state as 
the state handles are part of it
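The proposed meta-data extension and backend-side factory choice can be sketched as below. This is a minimal illustration of the proposal, not Flink's actual classes; the factory names are assumptions:

```java
// Hypothetical sketch of the proposed CheckpointMetaData extension
// (isSavepoint + targetDirectory) and a backend picking a stream factory from it.
public class SavepointMetaSketch {
    static final class CheckpointMetaData {
        final long checkpointId;
        final long timestamp;
        final boolean isSavepoint;
        final String targetDirectory; // null for regular checkpoints

        CheckpointMetaData(long id, long ts, boolean isSavepoint, String targetDirectory) {
            this.checkpointId = id;
            this.timestamp = ts;
            this.isSavepoint = isSavepoint;
            this.targetDirectory = targetDirectory;
        }
    }

    // A backend returns either its regular factory or a savepoint-specific one.
    static String pickStreamFactory(CheckpointMetaData meta) {
        return meta.isSavepoint
                ? "FsSavepointStreamFactory(" + meta.targetDirectory + ")"
                : "RegularCheckpointStreamFactory";
    }

    public static void main(String[] args) {
        CheckpointMetaData savepoint =
                new CheckpointMetaData(42L, 1000L, true, "hdfs:///savepoints/sp-42");
        System.out.println(pickStreamFactory(savepoint));
    }
}
```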







Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Haohui Mai
Congrats!
On Fri, Feb 10, 2017 at 8:08 AM Henry Saputra 
wrote:

> Congrats and welcome!
>
> On Fri, Feb 10, 2017 at 7:45 AM, Tzu-Li (Gordon) Tai 
> wrote:
>
> > Great news! Welcome Stefan :-D
> >
> >
> > On February 10, 2017 at 11:36:14 PM, Aljoscha Krettek (
> aljos...@apache.org)
> > wrote:
> >
> > Welcome! :-)
> >
> > On Fri, 10 Feb 2017 at 16:10 Till Rohrmann  wrote:
> >
> > > Great to have you on board as a committer Stefan :-)
> > >
> > > On Fri, Feb 10, 2017 at 3:32 PM, Greg Hogan 
> wrote:
> > >
> > > > Welcome, Stefan, and thank you for your contributions!
> > > >
> > > > On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi  wrote:
> > > >
> > > > > Hey everyone,
> > > > >
> > > > > I'm very happy to announce that the Flink PMC has accepted Stefan
> > > > > Richter to become a committer of the Apache Flink project.
> > > > >
> > > > > Stefan is part of the community for almost a year now and worked on
> > > > > major features of the latest 1.2 release, most notably rescaling
> and
> > > > > backwards compatibility of program state.
> > > > >
> > > > > Please join me in welcoming Stefan. :-)
> > > > >
> > > > > – Ufuk
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Henry Saputra
Congrats and welcome!

On Fri, Feb 10, 2017 at 7:45 AM, Tzu-Li (Gordon) Tai 
wrote:

> Great news! Welcome Stefan :-D
>
>
> On February 10, 2017 at 11:36:14 PM, Aljoscha Krettek (aljos...@apache.org)
> wrote:
>
> Welcome! :-)
>
> On Fri, 10 Feb 2017 at 16:10 Till Rohrmann  wrote:
>
> > Great to have you on board as a committer Stefan :-)
> >
> > On Fri, Feb 10, 2017 at 3:32 PM, Greg Hogan  wrote:
> >
> > > Welcome, Stefan, and thank you for your contributions!
> > >
> > > On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi  wrote:
> > >
> > > > Hey everyone,
> > > >
> > > > I'm very happy to announce that the Flink PMC has accepted Stefan
> > > > Richter to become a committer of the Apache Flink project.
> > > >
> > > > Stefan is part of the community for almost a year now and worked on
> > > > major features of the latest 1.2 release, most notably rescaling and
> > > > backwards compatibility of program state.
> > > >
> > > > Please join me in welcoming Stefan. :-)
> > > >
> > > > – Ufuk
> > > >
> > >
> >
>


Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-10 Thread Henry Saputra
Awesome! Congrats and Welcome, Jark and Kostas K.

- Henry

On Tue, Feb 7, 2017 at 12:16 PM, Fabian Hueske  wrote:

> Hi everybody,
>
> I'm very happy to announce that Jark Wu and Kostas Kloudas accepted the
> invitation of the Flink PMC to become committers of the Apache Flink
> project.
>
> Jark and Kostas are longtime members of the Flink community.
> Both are actively driving Flink's development and contributing to its
> community in many ways.
>
> Please join me in welcoming Kostas and Jark as committers.
>
> Fabian
>


Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Aljoscha Krettek
Welcome! :-)

On Fri, 10 Feb 2017 at 16:10 Till Rohrmann  wrote:

> Great to have you on board as a committer Stefan :-)
>
> On Fri, Feb 10, 2017 at 3:32 PM, Greg Hogan  wrote:
>
> > Welcome, Stefan, and thank you for your contributions!
> >
> > On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi  wrote:
> >
> > > Hey everyone,
> > >
> > > I'm very happy to announce that the Flink PMC has accepted Stefan
> > > Richter to become a committer of the Apache Flink project.
> > >
> > > Stefan is part of the community for almost a year now and worked on
> > > major features of the latest 1.2 release, most notably rescaling and
> > > backwards compatibility of program state.
> > >
> > > Please join me in welcoming Stefan. :-)
> > >
> > > – Ufuk
> > >
> >
>


[jira] [Created] (FLINK-5776) Improve XXMapRunner support create instance by carrying constructor parameters

2017-02-10 Thread sunjincheng (JIRA)
sunjincheng created FLINK-5776:
--

 Summary: Improve XXMapRunner support create instance by carrying 
constructor parameters
 Key: FLINK-5776
 URL: https://issues.apache.org/jira/browse/FLINK-5776
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: sunjincheng
Assignee: sunjincheng


At present, MapRunner and FlatMapRunner only support creating instances via 
no-argument constructors, but sometimes we need to pass constructor parameters 
when instantiating. I would therefore like to improve XXMapRunner to support 
creating instances with constructor parameters.
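One common way to do this is reflective instantiation through a parameterized constructor. The sketch below is a self-contained illustration of that pattern under hypothetical names — it is not the actual XXMapRunner code:

```java
// Hypothetical sketch: instantiate a wrapped function through a parameterized
// constructor via reflection, instead of requiring a no-arg constructor.
import java.lang.reflect.Constructor;

public class ParamRunnerSketch {
    static <T> T newInstance(Class<T> clazz, Object... ctorArgs) throws Exception {
        for (Constructor<?> ctor : clazz.getDeclaredConstructors()) {
            if (ctor.getParameterCount() == ctorArgs.length) {
                ctor.setAccessible(true);
                // Constructor.newInstance unwraps boxed args to primitives as needed.
                return clazz.cast(ctor.newInstance(ctorArgs));
            }
        }
        throw new IllegalArgumentException("No matching constructor for " + clazz.getName());
    }

    // Example "map function" that needs a constructor parameter.
    static final class ScalingMapper {
        final int factor;
        ScalingMapper(int factor) { this.factor = factor; }
        int map(int value) { return value * factor; }
    }

    public static void main(String[] args) throws Exception {
        ScalingMapper mapper = newInstance(ScalingMapper.class, 3);
        System.out.println(mapper.map(7));
    }
}
```

Matching by parameter count alone is a simplification; a production version would also check parameter types.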





Re: [ANNOUNCE] Welcome Jark Wu and Kostas Kloudas as committers

2017-02-10 Thread Aljoscha Krettek
Congrats! :-)

On Wed, 8 Feb 2017 at 11:29 Robert Metzger  wrote:

> Welcome on board guys!
>
> If you want to try our your new privileges, you can add yourself here:
> http://flink.apache.org/community.html#people (through the
> apache/flink-web
> repo)
>
> On Wed, Feb 8, 2017 at 10:52 AM, Till Rohrmann 
> wrote:
>
> > Congratulations Jark and Kostas :-)
> >
> > On Wed, Feb 8, 2017 at 10:49 AM, Paris Carbone  wrote:
> >
> > > welcome aboard Kostas and Jark :)
> > >
> > > Paris
> > >
> > > > On 7 Feb 2017, at 21:16, Fabian Hueske  wrote:
> > > >
> > > > Hi everybody,
> > > >
> > > > I'm very happy to announce that Jark Wu and Kostas Kloudas accepted
> the
> > > > invitation of the Flink PMC to become committers of the Apache Flink
> > > > project.
> > > >
> > > > Jark and Kostas are longtime members of the Flink community.
> > > > Both are actively driving Flink's development and contributing to its
> > > > community in many ways.
> > > >
> > > > Please join me in welcoming Kostas and Jark as committers.
> > > >
> > > > Fabian
> > >
> > >
> >
>


Using QueryableState inside Flink jobs (and Parameter Server implementation)

2017-02-10 Thread Gábor Hermann

Hi all,

TL;DR: Is it worth implementing a special QueryableState for querying state 
from another part of a Flink streaming job and aligning it with fault 
tolerance?


I've been thinking about implementing a Parameter Server with/within 
Flink. A Parameter Server is basically a specialized key-value store 
optimized for training distributed machine learning models. So not only 
the training data, but also the model is distributed. Range queries are 
typical, and usually vectors and matrices are stored as values.


More generally, an integrated key-value store might also be useful in 
the Streaming API. Although external key-value stores can be used inside 
UDFs for the same purpose, aligning them with the fault tolerance 
mechanism of Flink could be hard. What if state distributed by a key (in 
the current Streaming API) could be queried from another operator? Much 
like QueryableState, but querying *inside* the Flink job. We could make 
use of the fact that state has been queried from inside to optimize 
communication and integrate fault tolerance.


The question is whether the Flink community would like such a feature, and 
if so, how to do it.


I could elaborate my ideas if needed, and I'm happy to create a design 
doc, but before that, I'd like to know what you all think about this. 
Also, I don't know if I'm missing something, so please correct me. Here 
are some quick notes regarding the integrated KV-store:


Pros
- It could allow easier implementation of more complicated use-cases, e.g. 
updating users' preferences simultaneously based on each other's preferences 
when events happen between them, such as making a connection, liking each 
other's posts, or going to the same concert. User preferences are distributed 
as state; an event about user A liking user B gets sent to A's state, queries 
the state of B, then updates the state of B. There have been questions on the 
user mailing list about similar problems [1].
- Integration with fault tolerance. User does not have to maintain two 
systems consistently.
- Optimization potentials. In the social network example, other users on the 
same partition as user A may need the state of user B, so we don't have to 
ship user B's state around twice.
- Could specialize to a Parameter Server for simple (and efficient) 
implementation of (possibly online) machine learning. E.g. sparse 
logistic regression, LDA, matrix factorization for recommendation systems.


Cons
- Lot of development effort.
- "Client-server" architecture goes against the DAG dataflow model.

Two approaches for the implementation in the streaming API:

1) An easy way to implement this is to use iterations (or the proposed 
loops API). We can achieve two-way communication by two operators in a 
loop: a worker (W) and a Parameter Server (PS), see the diagram [2]. (An 
additional nested loop in the PS could add replication opportunities). 
Then we would get fault tolerance "for free" by the work of Paris [3]. 
It would also be on top of the Streaming API, with no effect on the runtime.


2) A problem with the loop approach is that coordination between PS 
nodes and worker nodes can only be done on the data stream. We could not 
really use e.g. Akka for async coordination. A harder but more flexible 
way is to use lower-level interfaces of Flink and touch the runtime. 
Then we would have to take care of fault tolerance too.


(As a side note: in the batch API generalizing delta iterations could be 
a solution for Parameter Server [4].)
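The parameter-server interface under discussion boils down to pull/push on a key-value model store. A minimal in-memory sketch of that contract (all names hypothetical; nothing here touches the Flink runtime or fault tolerance):

```java
// Minimal in-memory sketch of a parameter-server contract: workers pull the
// current value of a model partition and push additive updates (gradients).
import java.util.HashMap;
import java.util.Map;

public class ParameterServerSketch {
    static final class ParameterServer {
        private final Map<String, double[]> parameters = new HashMap<>();

        double[] pull(String key) {                 // worker queries a model part
            return parameters.getOrDefault(key, new double[]{0.0});
        }

        void push(String key, double[] gradient) {  // worker sends an update
            double[] current = pull(key).clone();
            for (int i = 0; i < current.length && i < gradient.length; i++) {
                current[i] += gradient[i];
            }
            parameters.put(key, current);
        }
    }

    public static void main(String[] args) {
        ParameterServer ps = new ParameterServer();
        ps.push("user-42", new double[]{0.5});
        ps.push("user-42", new double[]{0.25});
        System.out.println(ps.pull("user-42")[0]);
    }
}
```

In the loop-based approach sketched in the mail, pull/push would travel on the feedback edge between the worker operator and the PS operator rather than as direct method calls.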


Thanks for any feedback :)

Cheers,
Gabor

[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sharded-state-2-step-operation-td8631.html

[2] https://i.imgur.com/GsliUIh.png
[3] https://github.com/apache/flink/pull/1668
[4] 
https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran




Re: New Flink team member - Kate Eri.

2017-02-10 Thread Trevor Grant
Also RE: DL4J integration.

Suneel had done some work on this a while back, and ran into issues.  You
might want to chat with him about the pitfalls and 'gotchyas' there.



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant 
wrote:

> Sorry for chiming in late.
>
> GPUs on Flink.  Till raised a good point- you need to be able to fall back
> to non-GPU resources if they aren't available.
>
> Fun fact: this has already been developed for Flink vis-a-vis the Apache
> Mahout project.
>
> In short- Mahout exposes a number of tensor functions (vector %*% matrix,
> matrix %*% matrix, etc).  If compiled for GPU support, those operations are
> completed via GPU- and if no GPUs are in fact available, Mahout math falls
> back to CPUs (and finally back to the JVM).
>
> How this should work is Flink takes care of shipping data around the
> cluster, and when data arrives at the local node- is dumped out to GPU for
> calculation, loaded back up and shipped back around cluster.  In practice,
> the lack of a persist method for intermediate results makes this
> troublesome (not because of GPUs but for calculating any sort of complex
> algorithm we expect to be able to cache intermediate results).
>
> +1 to FLINK-1730
>
> Everything in Mahout is modular- distributed engine
> (Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA /
> Write-your-own), algorithms, etc.
>
> So to sum up, you're noting the redundancy between ML packages in terms of
> algorithms- I would recommend checking out Mahout before rolling your own
> GPU integration (else risk redundantly integrating GPUs). If nothing else-
> it should give you some valuable insight regarding design considerations.
> Also FYI the goal of the Apache Mahout project is to address that problem
> precisely- implement an algorithm once in a mathematically expressive DSL,
> which is abstracted above the engine so the same code easily ports between
> engines / native solvers (i.e. CPU/GPU).
>
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl
>
> Best,
> tg
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri 
> wrote:
>
>> Thank you Felix, for provided information.
>>
>> Currently I analyze the provided integration of Flink with SystemML.
>>
>> And also gather the information for the ticket  FLINK-1730
>> , maybe we will take it
>> to work, to unlock SystemML/Flink integration.
>>
>>
>>
>> Thu, Feb 9, 2017 at 0:17, Felix Neutatz :
>>
>> > Hi Kate,
>> >
>> > 1) - Broadcast:
>> >
>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+
>> Only+send+data+to+each+taskmanager+once+for+broadcasts
>> >  - Caching: https://issues.apache.org/jira/browse/FLINK-1730
>> >
>> > 2) I have no idea about the GPU implementation. The SystemML mailing
>> list
>> > will probably help you out there.
>> >
>> > Best regards,
>> > Felix
>> >
>> > 2017-02-08 14:33 GMT+01:00 Katherin Eri :
>> >
>> > > Thank you Felix, for your point, it is quite interesting.
>> > >
>> > > I will take a look at the code, of the provided Flink integration.
>> > >
>> > > 1) You have these problems with Flink: >> we realized that the lack of
>> > > a caching operator and a broadcast issue highly affects the performance.
>> > > Have you already asked the community about this? If yes, please provide
>> > > a reference to the ticket or the subject of the email.
>> > >
>> > > 2) You said that SystemML provides GPU support. I have seen
>> > > SystemML's source code and would like to ask: why did you decide to
>> > > implement your own integration with CUDA? Did you consider ND4J, or,
>> > > because it is younger, do you maintain your own implementation?
>> > >
>> > > Tue, Feb 7, 2017 at 18:35, Felix Neutatz :
>> > >
>> > > > Hi Katherin,
>> > > >
>> > > > we are also working in a similar direction. We implemented a
>> prototype
>> > to
>> > > > integrate with SystemML:
>> > > > https://github.com/apache/incubator-systemml/pull/119
>> > > > SystemML provides many different matrix formats, operations, GPU
>> > support
>> > > > and a couple of DL algorithms. Unfortunately, we realized that the
>> > > > lack of a caching operator and a broadcast issue highly affects the
>> > > > performance (e.g. compared to Spark). At the moment I am trying to
>> > > > tackle the broadcast
>> > > > 

[jira] [Created] (FLINK-5775) NullReferenceException when running job on local cluster

2017-02-10 Thread Colin Breame (JIRA)
Colin Breame created FLINK-5775:
---

 Summary: NullReferenceException when running job on local cluster
 Key: FLINK-5775
 URL: https://issues.apache.org/jira/browse/FLINK-5775
 Project: Flink
  Issue Type: Bug
  Components: DataStream API
Affects Versions: 1.2.0
Reporter: Colin Breame



When the job is submitted to a local Flink cluster (started using 
'start-local.sh'), the exception below is produced.

It might be worth pointing out that the job reads from a local file (using 
StreamExecutionEnvironment.readFile()).

{code}
Caused by: java.lang.NullPointerException
at org.apache.flink.core.fs.Path.normalizePath(Path.java:258)
at org.apache.flink.core.fs.Path.<init>(Path.java:144)
at 
org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:138)
at 
org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:97)
at org.apache.flink.core.fs.FileSystem.exists(FileSystem.java:464)
at 
org.apache.flink.core.fs.SafetyNetWrapperFileSystem.exists(SafetyNetWrapperFileSystem.java:99)
at 
org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:191)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:78)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
at java.lang.Thread.run(Thread.java:745)
{code}
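The trace shows the NPE coming from Path's constructor while it normalizes a null path string (likely an unset or unresolved file path reaching the monitoring function). The sketch below is a hypothetical simplification of that failure mode plus a fail-fast guard; the names and normalization logic are stand-ins, not Flink's actual implementation:

```java
// Hypothetical, simplified stand-ins -- not Flink's actual Path class.
public class PathNpeSketch {

    // Mimics the failure: normalization dereferences the string, so a null
    // path surfaces as an anonymous NPE deep inside the constructor chain.
    static String normalizePath(String path) {
        return path.replace("//", "/");  // NPE here when path == null
    }

    // A fail-fast guard turns the anonymous NPE into a descriptive error
    // at the point where the bad value enters.
    static String makePathChecked(String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        return normalizePath(path);
    }

    public static void main(String[] args) {
        System.out.println(makePathChecked("/tmp//input"));  // prints /tmp/input
        try {
            makePathChecked(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the guard is that the error then names the bad input at the API boundary instead of an anonymous frame three calls deep.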




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Greg Hogan
Welcome, Stefan, and thank you for your contributions!

On Fri, Feb 10, 2017 at 5:00 AM, Ufuk Celebi  wrote:

> Hey everyone,
>
> I'm very happy to announce that the Flink PMC has accepted Stefan
> Richter to become a committer of the Apache Flink project.
>
> Stefan is part of the community for almost a year now and worked on
> major features of the latest 1.2 release, most notably rescaling and
> backwards compatibility of program state.
>
> Please join me in welcoming Stefan. :-)
>
> – Ufuk
>


Re: [ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Zhuoluo Yang
Congrats Stefan!



--
From: Ufuk Celebi
Date: 2017-02-10 18:00:09
Subject: [ANNOUNCE] Welcome Stefan Richter as a new committer

Hey everyone,

I'm very happy to announce that the Flink PMC has accepted Stefan
Richter to become a committer of the Apache Flink project.

Stefan is part of the community for almost a year now and worked on
major features of the latest 1.2 release, most notably rescaling and
backwards compatibility of program state.

Please join me in welcoming Stefan. :-)

– Ufuk


[jira] [Created] (FLINK-5774) ContinuousFileProcessingTest has test instability

2017-02-10 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-5774:
---

 Summary: ContinuousFileProcessingTest has test instability
 Key: FLINK-5774
 URL: https://issues.apache.org/jira/browse/FLINK-5774
 Project: Flink
  Issue Type: Bug
  Components: Streaming Connectors
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


Inserting a {{Thread.sleep(200)}} in 
{{ContinuousFileMonitoringFunction.monitorDirAndForwardSplits()}} will make the 
tests fail reliably. Normally, it occurs now and then due to "natural"
slowdowns on Travis:
log: https://api.travis-ci.org/jobs/199977242/log.txt?deansi=true

The condition that waits for the file monitoring function to reach a certain 
state needs to be tougher.
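One way to make such a wait condition "tougher" is to poll with a deadline instead of relying on a single fixed sleep, so the test tolerates slow environments up to a generous timeout. This is an illustrative generic helper, not Flink's actual test utility:

```java
import java.util.function.BooleanSupplier;

// Illustrative helper, not Flink's test utility: poll a condition until it
// holds or a deadline passes, instead of relying on a single fixed sleep.
public class AwaitCondition {

    static boolean await(BooleanSupplier condition, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10);  // short poll keeps latency low without busy-waiting
        }
        return condition.getAsBoolean();  // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // The condition becomes true after ~100 ms, well inside the 2 s budget,
        // so an injected delay like the Thread.sleep(200) above would not fail it.
        boolean met = await(() -> System.currentTimeMillis() - start > 100, 2000);
        System.out.println("condition met: " + met);
    }
}
```

The design trade-off: the timeout only bounds the worst case, so a passing test finishes as soon as the state is reached rather than after a worst-case sleep.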



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: New Flink team member - Kate Eri.

2017-02-10 Thread Trevor Grant
Sorry for chiming in late.

GPUs on Flink.  Till raised a good point- you need to be able to fall back
to non-GPU resources if they aren't available.

Fun fact: this has already been developed for Flink vis-a-vis the Apache
Mahout project.

In short- Mahout exposes a number of tensor functions (vector %*% matrix,
matrix %*% matrix, etc).  If compiled for GPU support, those operations are
completed via GPU- and if no GPUs are in fact available, Mahout math falls
back to CPUs (and finally back to the JVM).

How this should work is Flink takes care of shipping data around the
cluster, and when data arrives at the local node- is dumped out to GPU for
calculation, loaded back up and shipped back around cluster.  In practice,
the lack of a persist method for intermediate results makes this
troublesome (not because of GPUs but for calculating any sort of complex
algorithm we expect to be able to cache intermediate results).

+1 to FLINK-1730

Everything in Mahout is modular- distributed engine
(Flink/Spark/Write-your-own), Native Solvers (OpenMP / ViennaCL / CUDA /
Write-your-own), algorithms, etc.

So to sum up, you're noting the redundancy between ML packages in terms of
algorithms- I would recommend checking out Mahout before rolling your own
GPU integration (else risk redundantly integrating GPUs). If nothing else-
it should give you some valuable insight regarding design considerations.
Also FYI the goal of the Apache Mahout project is to address that problem
precisely- implement an algorithm once in a mathematically expressive DSL,
which is abstracted above the engine so the same code easily ports between
engines / native solvers (i.e. CPU/GPU).

https://github.com/apache/mahout/tree/master/viennacl-omp
https://github.com/apache/mahout/tree/master/viennacl

Best,
tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri 
wrote:

> Thank you Felix, for provided information.
>
> Currently I analyze the provided integration of Flink with SystemML.
>
> And also gather the information for the ticket FLINK-1730, maybe we will take it
> to work, to unlock SystemML/Flink integration.
>
>
>
> Thu, 9 Feb 2017 at 0:17, Felix Neutatz :
>
> > Hi Kate,
> >
> > 1) - Broadcast:
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
> >  - Caching: https://issues.apache.org/jira/browse/FLINK-1730
> >
> > 2) I have no idea about the GPU implementation. The SystemML mailing list
> > will probably help you out there.
> >
> > Best regards,
> > Felix
> >
> > 2017-02-08 14:33 GMT+01:00 Katherin Eri :
> >
> > > Thank you Felix, for your point, it is quite interesting.
> > >
> > > I will take a look at the code, of the provided Flink integration.
> > >
> > > 1)You have these problems with Flink: >>we realized that the lack
> of
> > a
> > > caching operator and a broadcast issue highly affects the performance,
> > have
> > > you already asked about this the community? In case yes: please provide
> > the
> > > reference to the ticket or the topic of letter.
> > >
> > > 2)You have said, that SystemML provides GPU support. I have seen
> > > SystemML’s source code and would like to ask: why you have decided to
> > > implement your own integration with cuda? Did you try to consider ND4J,
> > or
> > > because it is younger, you support your own implementation?
> > >
> > > Tue, 7 Feb 2017 at 18:35, Felix Neutatz :
> > >
> > > > Hi Katherin,
> > > >
> > > > we are also working in a similar direction. We implemented a
> prototype
> > to
> > > > integrate with SystemML:
> > > > https://github.com/apache/incubator-systemml/pull/119
> > > > SystemML provides many different matrix formats, operations, GPU
> > support
> > > > and a couple of DL algorithms. Unfortunately, we realized that the
> lack
> > > of
> > > > a caching operator and a broadcast issue highly affects the
> performance
> > > > (e.g. compared to Spark). At the moment I am trying to tackle the
> > > broadcast
> > > > issue. But caching is still a problem for us.
> > > >
> > > > Best regards,
> > > > Felix
> > > >
> > > > 2017-02-07 16:22 GMT+01:00 Katherin Eri :
> > > >
> > > > > Thank you, Till.
> > > > >
> > > > > 1)  Regarding ND4J, I didn’t know about such a pity and
> critical
> > > > > restriction of it -> lack of sparsity optimizations, and you are
> > right:
> > > > > this issue is still actual for them. I saw that Flink uses Breeze,
> > but
> > > I
> > > > > thought its usage was caused by some historical reasons.
> > > > >
> > > > > 2)  Regarding integration with DL4J, I have read the source
> code
> > 

Re: New Flink team member - Kate Eri.

2017-02-10 Thread Katherin Eri
Thank you Felix, for provided information.

Currently I analyze the provided integration of Flink with SystemML.

And also gather the information for the ticket FLINK-1730, maybe we will take it
to work, to unlock SystemML/Flink integration.



Thu, 9 Feb 2017 at 0:17, Felix Neutatz :

> Hi Kate,
>
> 1) - Broadcast:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
>  - Caching: https://issues.apache.org/jira/browse/FLINK-1730
>
> 2) I have no idea about the GPU implementation. The SystemML mailing list
> will probably help you out there.
>
> Best regards,
> Felix
>
> 2017-02-08 14:33 GMT+01:00 Katherin Eri :
>
> > Thank you Felix, for your point, it is quite interesting.
> >
> > I will take a look at the code, of the provided Flink integration.
> >
> > 1)You have these problems with Flink: >>we realized that the lack of
> a
> > caching operator and a broadcast issue highly affects the performance,
> have
> > you already asked about this the community? In case yes: please provide
> the
> > reference to the ticket or the topic of letter.
> >
> > 2)You have said, that SystemML provides GPU support. I have seen
> > SystemML’s source code and would like to ask: why you have decided to
> > implement your own integration with cuda? Did you try to consider ND4J,
> or
> > because it is younger, you support your own implementation?
> >
> > Tue, 7 Feb 2017 at 18:35, Felix Neutatz :
> >
> > > Hi Katherin,
> > >
> > > we are also working in a similar direction. We implemented a prototype
> to
> > > integrate with SystemML:
> > > https://github.com/apache/incubator-systemml/pull/119
> > > SystemML provides many different matrix formats, operations, GPU
> support
> > > and a couple of DL algorithms. Unfortunately, we realized that the lack
> > of
> > > a caching operator and a broadcast issue highly affects the performance
> > > (e.g. compared to Spark). At the moment I am trying to tackle the
> > broadcast
> > > issue. But caching is still a problem for us.
> > >
> > > Best regards,
> > > Felix
> > >
> > > 2017-02-07 16:22 GMT+01:00 Katherin Eri :
> > >
> > > > Thank you, Till.
> > > >
> > > > 1)  Regarding ND4J, I didn’t know about such a pity and critical
> > > > restriction of it -> lack of sparsity optimizations, and you are
> right:
> > > > this issue is still actual for them. I saw that Flink uses Breeze,
> but
> > I
> > > > thought its usage was caused by some historical reasons.
> > > >
> > > > 2)  Regarding integration with DL4J, I have read the source code
> of
> > > > DL4J/Spark integration, that’s why I have declined my idea of reuse
> of
> > > > their word2vec implementation for now, for example. I can perform
> > deeper
> > > > investigation of this topic, if it required.
> > > >
> > > >
> > > >
> > > > So I feel that we have the following picture:
> > > >
> > > > 1)  DL integration investigation, could be part of Apache Bahir.
> I
> > > can
> > > > perform further investigation of this topic, but I think we need some
> > > > separated ticket for this to track this activity.
> > > >
> > > > 2)  GPU support, required for DL is interesting, but requires
> ND4J
> > > for
> > > > example.
> > > >
> > > > 3)  ND4J couldn’t be incorporated because it doesn’t support
> > sparsity
> > > >  [1].
> > > >
> > > > Regarding ND4J is this the single blocker for incorporation of it or
> > may
> > > be
> > > > some others known?
> > > >
> > > >
> > > > [1] https://deeplearning4j.org/roadmap.html
> > > >
> > > > Tue, 7 Feb 2017 at 16:26, Till Rohrmann :
> > > >
> > > > Thanks for initiating this discussion Katherin. I think you're right
> > that
> > > > in general it does not make sense to reinvent the wheel over and over
> > > > again. Especially if you only have limited resources at hand. So if
> we
> > > > could integrate Flink with some existing library that would be great.
> > > >
> > > > In the past, however, we couldn't find a good library which provided
> > > enough
> > > > freedom to integrate it with Flink. Especially if you want to have
> > > > distributed and somewhat high-performance implementations of ML
> > > algorithms
> > > > you would have to take Flink's execution model (capabilities as well
> as
> > > > limitations) into account. That is mainly the reason why we started
> > > > implementing some of the algorithms "natively" on Flink.
> > > >
> > > > If I remember correctly, then the problem with ND4J was and still is
> > that
> > > > it does not support sparse matrices which was a requirement from our
> > > side.
> > > > As far as I know, it is quite common that you have sparse data
> > structures
> > > > when dealing with large scale problems. That's why we built our own
> > > > 

[ANNOUNCE] Welcome Stefan Richter as a new committer

2017-02-10 Thread Ufuk Celebi
Hey everyone,

I'm very happy to announce that the Flink PMC has accepted Stefan
Richter to become a committer of the Apache Flink project.

Stefan is part of the community for almost a year now and worked on
major features of the latest 1.2 release, most notably rescaling and
backwards compatibility of program state.

Please join me in welcoming Stefan. :-)

– Ufuk


[jira] [Created] (FLINK-5771) DelimitedInputFormat does not correctly handle multi-byte delimiters

2017-02-10 Thread Colin Breame (JIRA)
Colin Breame created FLINK-5771:
---

 Summary: DelimitedInputFormat does not correctly handle multi-byte 
delimiters
 Key: FLINK-5771
 URL: https://issues.apache.org/jira/browse/FLINK-5771
 Project: Flink
  Issue Type: Bug
  Components: filesystem-connector
Affects Versions: 1.2.0
Reporter: Colin Breame


The DelimitedInputFormat does not correctly handle multi-byte delimiters.

The reader sometimes misses a delimiter if it is preceded by the first byte 
from the delimiter.  This results in two records (or more) being returned from 
a single call to nextRecord.

See attached test case.
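The described miss is a classic scanner pitfall: when a partial delimiter match fails, the match counter is reset without re-testing the current byte against the delimiter's first byte. The sketch below contrasts that buggy scan with a correct brute-force one; it is illustrative only, not the actual DelimitedInputFormat implementation:

```java
public class DelimiterScan {

    // Buggy scan: on a mismatch it resets the match counter but still
    // advances, never re-testing the current byte against delim[0].
    static int naiveFind(byte[] data, byte[] delim) {
        int j = 0;  // bytes of the delimiter matched so far
        for (int pos = 0; pos < data.length; pos++) {
            if (data[pos] == delim[j]) {
                j++;
                if (j == delim.length) {
                    return pos - delim.length + 1;  // start of the delimiter
                }
            } else {
                j = 0;  // BUG: data[pos] might itself start a new match
            }
        }
        return -1;
    }

    // Correct (brute-force) scan: try a full match at every offset.
    static int correctFind(byte[] data, byte[] delim) {
        outer:
        for (int i = 0; i + delim.length <= data.length; i++) {
            for (int k = 0; k < delim.length; k++) {
                if (data[i + k] != delim[k]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "xaab".getBytes();  // delimiter preceded by its first byte 'a'
        byte[] delim = "ab".getBytes();
        System.out.println(naiveFind(data, delim));    // -1: delimiter missed
        System.out.println(correctFind(data, delim));  // 2
    }
}
```

With delimiter `ab` and input `xaab`, the buggy scan consumes the first `a` as a partial match, fails on the second `a`, and moves past it, so the real delimiter at offset 2 is never seen — the same boundary condition the bug report describes.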






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)