Re: [VOTE] Release 1.8.0, release candidate #5

2019-04-08 Thread Aljoscha Krettek
@Thomas There is a note about this in the documentation release notes:
https://github.com/apache/flink/blob/5d46437fee92a6e1ec8c0653849a6b27733e99ad/docs/release-notes/flink-1.8.md#change-in-the-maven-modules-of-table-api-flink-11064
I’ll add this to the release blog post.

The shaded hadoop2 version number is like this to satisfy our build infra. If
it were the other way round we would have flink-shaded-hadoop2-1.9-SNAPSHOT-2.8.3,
and I think the nightly Maven snapshot repos don’t accept that as a snapshot
version; our nightly releases were not built for a while because of this.
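
As a toy illustration of why the ordering matters (as far as I know, Maven only
treats a version as a snapshot if it literally ends in -SNAPSHOT):

public class SnapshotVersionCheck {
    public static void main(String[] args) {
        String hadoopFirst = "2.8.3-1.9-SNAPSHOT"; // hadoop version first, Flink version last
        String flinkFirst  = "1.9-SNAPSHOT-2.8.3"; // the "other way round"

        // Only the first form is recognised as a snapshot version.
        System.out.println(hadoopFirst.endsWith("-SNAPSHOT")); // true
        System.out.println(flinkFirst.endsWith("-SNAPSHOT"));  // false
    }
}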

Aljoscha

> On 8. Apr 2019, at 08:54, Tzu-Li (Gordon) Tai  wrote:
> 
> +1 (binding)
> 
> Carried over from previous vote on earlier RC, with the following
> additional tests:
> 
> - Verified a few upgrade scenarios for the key / namespace serializer in
> RocksDB state backend
> - Verified checksums and signatures
> - Built source (without Hadoop, Scala 2.11 + Scala 2.12) with success
> - Ran end-to-end tests locally
> 
> Cheers,
> Gordon
> 
> On Mon, Apr 8, 2019 at 5:36 AM Thomas Weise  wrote:
> 
>> +1 (binding)
>> 
>> * verified checksums and signatures
>> * ran release build, built internal packages, ran a few internal tests
>> 
>> Couple observations:
>> 
>> Table API artifact change:
>> https://issues.apache.org/jira/browse/FLINK-11064
>> 
>> JIRA mentions the change, but should we be more explicit and add that info
>> into the release blog/notes?
>> 
>> The version number for shaded hadoop2 surprised me:
>> 
>> Example: flink-shaded-hadoop2-2.8.3-1.8.0.jar
>> 
>> If 2.8.3 is part of the version number (vs. artifactId), then shouldn't the
>> version number be 1.8.0-2.8.3? (Alternatively, the hadoop version could be
>> included into artifactId.)
>> 
>> Thomas
>> 
>> On Sun, Apr 7, 2019 at 12:15 PM Chesnay Schepler 
>> wrote:
>> 
>>> Missed that one :/
>>> 
>>> On 06/04/2019 08:29, Aljoscha Krettek wrote:
 Ah, the first two are not on 1.8.0-rc5, but FLINK-11855 is.
 
> On 6. Apr 2019, at 08:23, Aljoscha Krettek 
>> wrote:
> 
> Thanks, Chesnay! I had that tab open already and was prepared to do it
> but it’s good you also thought about it. :-) What about these three issues,
> though:
> https://issues.apache.org/jira/issues/?filter=12334772&jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.8.1%20and%20resolution%20!%3D%20unresolved%20ORDER%20BY%20updated%20DESC
> 
> Aljoscha
> 
>> On 5. Apr 2019, at 14:05, Chesnay Schepler 
>> wrote:
>> 
>> Went through JIRA and updated the versions.
>> 
>> On 05/04/2019 13:57, Chesnay Schepler wrote:
>>> +1
>>> 
>>> We'll have to comb through JIRA again as I already found a number of
>>> tickets that were marked as fixed for 1.8.1, but are included in this RC.
>>> 
>>> On 05/04/2019 07:10, Rong Rong wrote:
 +1 (non-binding)
 
 * Verified checksums and GPG files match release files
 * Verified that the source archives do not contain any binaries
 * Built the source with Maven to ensure all source files have Apache headers
 * Checked that all POM files point to the same version
 * `mvn clean verify` against scala-2.11 and scala-2.12
  - Also verified rest.port setting is taking effect after YarnClusterEntryUtils change
 * Verified quickstart-scala and quickstart-java are working with the staging repository.
 
 --
 Rong
 
 
 On Wed, Apr 3, 2019 at 11:22 PM Aljoscha Krettek <aljos...@apache.org> wrote:
 
> Hi everyone,
> Please review and vote on the release candidate 5 for Flink 1.8.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.0-rc5" [5],
> * website pull request listing the new release [6],
> * website pull request adding announcement blog post [7].
> 
> The vote will be open for at least 72 hours. It is adopted by majority

Re: SQL CLI and JDBC

2019-04-08 Thread Fabian Hueske
Hi Hanan,

I'm not aware of any plans to add a JDBC Driver.

One issue with the JDBC interface is that it only works well for queries on
batch data and a subset of queries on streaming data.

Many streaming SQL queries are not able to emit final results (or need to
update previously emitted results).
Take for instance a query like

SELECT colA, COUNT(*)
FROM tab
GROUP BY colA;

If tab is a continuously growing table, no row of the query's result will
ever be final, because a new row with any value of colA can be added at any
point in time.
JDBC does not support retracting or updating result rows that were emitted
before.
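
For illustration, this is roughly how such a continuously updating result is
consumed in Flink's Table API instead (a sketch against the 1.8-era Java API;
the table and field names are just examples): every result record carries a
flag telling you whether it is an insert/update or the retraction of an
earlier row, which a JDBC ResultSet cannot express.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class RetractStreamExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // Assumes a table "tab" with a column "colA" has been registered beforehand.
        Table result = tEnv.sqlQuery("SELECT colA, COUNT(*) FROM tab GROUP BY colA");

        // (true, row) marks an insert/update, (false, row) the retraction of a
        // previously emitted row.
        DataStream<Tuple2<Boolean, Row>> updates = tEnv.toRetractStream(result, Row.class);
        updates.print();

        env.execute("retract stream example");
    }
}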

Best, Fabian


On Sun., Apr. 7, 2019 at 11:31, Hanan Yehudai <
hanan.yehu...@radcom.com> wrote:

> I didn’t see any docs on this - is there a JDBC Driver that allows the
> same functionality as the SQL CLI?
> If not, is it on the roadmap?
>
>


[DISCUSS] Adaptive Parallelism of Job Vertex

2019-04-08 Thread Bo WANG
Hi all,
In distributed computing systems, execution parallelism is vital for both
resource efficiency and execution performance. In Flink, execution
parallelism is a pre-specified parameter, which is usually an empirical
value and thus might not be optimal for the varying amount of data processed
by each task.

Furthermore, a fixed parallelism cannot scale with varying data sizes, which
are common in production clusters, since we may not frequently change the
cluster configuration.

Thus, we propose to adaptively determine the execution parallelism of each
vertex at runtime, based on the actual input data size and an ideal data
size processed by each task. The ideal data size is a pre-specified
parameter chosen according to the properties of the operator.
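
As a simple illustration of the idea (a sketch only, with made-up names), the
parallelism of a vertex could be derived from the two sizes roughly like this:

public class AdaptiveParallelismSketch {
    /**
     * Derive the parallelism of a job vertex from the actual input size and the
     * ideal data size per task, bounded by the vertex's max parallelism.
     */
    static int decideParallelism(long actualInputBytes, long idealBytesPerTask, int maxParallelism) {
        int byDataSize = (int) Math.ceil(actualInputBytes / (double) idealBytesPerTask);
        return Math.max(1, Math.min(maxParallelism, byDataSize));
    }

    public static void main(String[] args) {
        // 10 GiB of input with an ideal 128 MiB per task, capped at 512 subtasks -> 80
        System.out.println(decideParallelism(10L << 30, 128L << 20, 512));
    }
}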

The design doc is ready:
https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing,
any comments are highly appreciated.


[jira] [Created] (FLINK-12121) Use composition instead of inheritance for the InternalKeyContext logic in backend

2019-04-08 Thread Yu Li (JIRA)
Yu Li created FLINK-12121:
-

 Summary: Use composition instead of inheritance for the 
InternalKeyContext logic in backend
 Key: FLINK-12121
 URL: https://issues.apache.org/jira/browse/FLINK-12121
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Yu Li
Assignee: Yu Li


Commonly it's 
[recommended|https://stackoverflow.com/questions/2399544/difference-between-inheritance-and-composition]
 to favor composition over inheritance in Java design, but currently in the keyed 
backends we're using inheritance for the {{InternalKeyContext}} logic, and here 
we propose to change to composition.

Another advantage of changing to composition is that we could remove 
the requirement of a heap backend instance when constructing 
{{HeapRestoreOperation}}, and further make sure all fields are final when 
constructing the {{HeapKeyedStateBackend}}.
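
A minimal sketch of the direction (simplified, hypothetical class shapes, not 
the actual backend code):

{code:java}
// The backend *holds* a key context (composition) instead of *being* one
// (inheritance). Restore operations can then share the same context object
// without needing a reference to the backend, and the backend's fields can
// all be final.
interface InternalKeyContext<K> {
    K getCurrentKey();
    void setCurrentKey(K key);
}

final class DefaultKeyContext<K> implements InternalKeyContext<K> {
    private K currentKey;
    public K getCurrentKey() { return currentKey; }
    public void setCurrentKey(K key) { this.currentKey = key; }
}

final class KeyedBackend<K> {
    private final InternalKeyContext<K> keyContext;

    KeyedBackend(InternalKeyContext<K> keyContext) {
        this.keyContext = keyContext;
    }

    void processElement(K key) {
        keyContext.setCurrentKey(key);
        // ... access state for the current key ...
    }
}
{code}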



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
Oh boy, this is an interesting pickle.

For *last-access-timestamp*, I think only *event-time-of-current-record* makes 
sense. I’m looking at this from a GDPR/regulatory compliance perspective. If 
you update a state, by say storing the event you just received in state, you 
want to use the exact timestamp of that event for expiration. Both 
*max-timestamp-of-data-seen-so-far* and *last-watermark* suffer from problems 
in edge cases: if the timestamp of an event you receive is quite a bit earlier 
than other timestamps that we have seen so far (i.e. the event is late) we 
would artificially lengthen the TTL of that event (which is stored in state) 
and would therefore break regulatory requirements. Always using the timestamp 
of an event doesn’t suffer from that problem.

For *expiration-check-time*, both *last-watermark* and 
*current-processing-time* could make sense but I’m leaning towards 
*processing-time*. The reason is again the GDPR/compliance view: if we have an 
old savepoint with data that should have been expired by now but we re-process 
it with *last-watermark* expiration, this means that we will get to “see” that 
state even though we shouldn’t be allowed to. If we use 
*current-processing-time* for expiration, we wouldn’t have that problem because 
that old data (according to their event-time timestamp) would be properly 
cleaned up and access would be prevented.

To sum up:
last-access-timestamp: event-time of event
expiration-check-time: processing-time
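
In pseudo-code terms the combination boils down to something like this (a 
conceptual sketch with made-up names, not the actual StateTtlConfig code): on 
every state access/update we store the event timestamp of the current record, 
and expiration is checked against wall-clock processing time.

final class TtlEntry<V> {
    final V value;
    final long lastAccessEventTime; // event-time of the record that last touched the state
    TtlEntry(V value, long lastAccessEventTime) {
        this.value = value;
        this.lastAccessEventTime = lastAccessEventTime;
    }
}

final class TtlCheck {
    static boolean isExpired(TtlEntry<?> entry, long ttlMillis) {
        long now = System.currentTimeMillis(); // processing time, not the watermark
        return now - entry.lastAccessEventTime > ttlMillis;
    }
}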
 
What do you think?

Aljoscha

> On 6. Apr 2019, at 01:30, Konstantin Knauf  wrote:
> 
> Hi Andrey,
> 
> I agree with Elias. This would be the most natural behavior. I wouldn't add
> additional slightly different notions of time to Flink.
> 
> As I can also see a use case for the combination
> 
> * Timestamp stored: Event timestamp
> * Timestamp to check expiration: Processing Time
> 
> we could (maybe in a second step) add the possibility to mix and match time
> characteristics for both aspects.
> 
> Cheers,
> 
> Konstantin
> 
> On Thu, Apr 4, 2019 at 7:59 PM Elias Levy 
> wrote:
> 
>> My 2c:
>> 
>> Timestamp stored with the state value: Event timestamp
>> Timestamp used to check expiration: Last emitted watermark
>> 
>> That follows the event time processing model used elsewhere in Flink.
>> E.g. events are segregated into windows based on their event time, but the
>> windows do not fire until the watermark advances past the end of the window.
>> 
>> 
>> On Thu, Apr 4, 2019 at 7:55 AM Andrey Zagrebin 
>> wrote:
>> 
>>> Hi All,
>>> 
>>> As you might have already seen there is an effort tracked in FLINK-12005
>>> [1] to support event time scale for state with time-to-live (TTL) [2].
>>> While thinking about design, we realised that there can be multiple options
>>> for semantics of this feature, depending on the use case. There is also
>>> sometimes confusion because of the out-of-order nature of event time in Flink.
>>> I am starting this thread to discuss potential use cases of this feature and
>>> their requirements for interested users and developers. There was already a
>>> discussion thread asking about event time for TTL and it already contains
>>> some thoughts [3].
>>> 
>>> There are two semantical cases where we use time for TTL feature at the
>>> moment. Firstly, we store timestamp of state last access/update. Secondly,
>>> we use this timestamp and current timestamp to check expiration and
>>> garbage
>>> collect state at some point later.
>>> 
>>> At the moment, Flink supports *only processing time* for both timestamps:
>>> state *last access and current timestamp*. It is basically current local
>>> system unix epoch time.
>>> 
>>> When it comes to event time scale, we also need to define what Flink
>>> should
>>> use for these two timestamps. Here I will list some options and their
>>> possible pros&cons for discussion. There might be more depending on use
>>> case.
>>> 
>>> *Last access timestamp (stored in backend with the actual state value):*
>>> 
>>>   - *Event timestamp of the record currently being processed.* This seems to
>>>   be the simplest option and it allows user-defined timestamps in the state
>>>   backend. The problem here might be the instability of event time, which can
>>>   not only increase but also decrease if records come out of order. This can
>>>   lead to rewriting the state timestamp to a smaller value, which is unnatural
>>>   for the notion of time.
>>>   - *Max event timestamp of records seen so far for this record key.* This
>>>   option is similar to the previous one but it tries to fix the notion of
>>>   time to make it always increasing. Maintaining this timestamp also has
>>>   performance implications because the previous timestamp needs to be read
>>>   out to decide whether to rewrite it.
>>>   - *Last emitted watermark*. This is what we usually use for other
>>>   operations to trigger some actions in Flink, like timers and windows, but it
>>>   can be unrelated to the record which a

[jira] [Created] (FLINK-12122) Spread out tasks evenly across all available registered TaskManagers

2019-04-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-12122:
-

 Summary: Spread out tasks evenly across all available registered 
TaskManagers
 Key: FLINK-12122
 URL: https://issues.apache.org/jira/browse/FLINK-12122
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.7.2, 1.6.4, 1.8.0
Reporter: Till Rohrmann
 Fix For: 1.7.3, 1.9.0, 1.8.1


With Flip-6, we changed the default behaviour of how slots are assigned to 
{{TaskManagers}}. Instead of evenly spreading them out over all registered 
{{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a tendency 
to first fill up a TM before using another one. This is a regression wrt the 
pre Flip-6 code.

I suggest changing the behaviour so that we try to evenly distribute slots 
across all available {{TaskManagers}} by considering how many of their slots 
are already allocated.
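
A toy sketch of the intended selection strategy (a hypothetical model, not the 
actual SlotManager code):

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Pick the TaskManager with the lowest ratio of allocated to total slots,
// so that load is spread evenly instead of filling one TM before the next.
public class EvenSlotSpreadSketch {
    static final class Tm {
        final String id;
        final int totalSlots;
        final int allocatedSlots;

        Tm(String id, int totalSlots, int allocatedSlots) {
            this.id = id;
            this.totalSlots = totalSlots;
            this.allocatedSlots = allocatedSlots;
        }

        double utilization() {
            return allocatedSlots / (double) totalSlots;
        }
    }

    static Tm pickLeastLoaded(List<Tm> taskManagers) {
        return taskManagers.stream()
                .filter(tm -> tm.allocatedSlots < tm.totalSlots)   // must still have a free slot
                .min(Comparator.comparingDouble(Tm::utilization))  // least utilised first
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        List<Tm> tms = Arrays.asList(new Tm("tm-1", 4, 3), new Tm("tm-2", 4, 1));
        System.out.println(pickLeastLoaded(tms).id); // prints tm-2
    }
}
{code}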



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12123) Upgrade Jepsen to 0.1.13 in flink-jepsen

2019-04-08 Thread Gary Yao (JIRA)
Gary Yao created FLINK-12123:


 Summary: Upgrade Jepsen to 0.1.13 in flink-jepsen
 Key: FLINK-12123
 URL: https://issues.apache.org/jira/browse/FLINK-12123
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.8.0


Raise the version of the jepsen dependency in {{flink-jepsen/project.clj}} to 
0.1.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Kostas Kloudas
Hi all,

For GDPR: I am not sure about the regulatory requirements of GDPR but I
would assume that the time for deletion starts counting from the time an
organisation received the data (i.e. the wall-clock ingestion time of the
data), and not the "event time" of the data. Otherwise, an organisation
may be violating GDPR just by receiving e.g. 1 year old data of a user
whose deletion policy is "you are allowed to keep them for 6 months".

Now for the discussion in this thread, I think that the scenario:

* Timestamp stored: Event timestamp
* Timestamp to check expiration: Processing Time

has the underlying assumption that there is a relationship between
event-time and processing time, which is not necessarily the case.
Event-time, although we call it "time", is just another user-defined column
or attribute of the data and can be anything. It is not an "objective" and
independently evolving attribute like wall-clock time. I am not sure what
could be the solution, as out-of-orderness can always lead to arbitrary,
non-reproducible and difficult to debug behaviour (e.g. a super-early
element that arrives out-of-order and, as the succeeding elements set the
timestamp to lower values, it gets deleted by the state backend, although
the user-level windowing logic would expect it to be there).

Given the last point made above, and apart from the semantics of the
proposed feature, I think that we should also discuss if it is a good idea
to have event time TTL implemented at the state backend level in the first
place. Personally, I am not so convinced that this is a good idea, as we
introduce another (potentially competing) mechanism for handling event
time, apart from the user program. An example can be the one that I
described above. And this also defeats one of the main advantages of event
time, in my opinion, which is reproducibility of the results.

I may be wrong, but I would appreciate any opinions on this.

Cheers,
Kostas

On Mon, Apr 8, 2019 at 11:12 AM Aljoscha Krettek 
wrote:

> Oh boy, this is an interesting pickle.
>
> For *last-access-timestamp*, I think only *event-time-of-current-record*
> makes sense. I’m looking at this from a GDPR/regulatory compliance
> perspective. If you update a state, by say storing the event you just
> received in state, you want to use the exact timestamp of that event for
> expiration. Both *max-timestamp-of-data-seen-so-far* and *last-watermark*
> suffer from problems in edge cases: if the timestamp of an event you
> receive is quite a bit earlier than other timestamps that we have seen so
> far (i.e. the event is late) we would artificially lengthen the TTL of that
> event (which is stored in state) and would therefore break regulatory
> requirements. Always using the timestamp of an event doesn’t suffer from
> that problem.
>
> For *expiration-check-time*, both *last-watermark* and
> *current-processing-time* could make sense but I’m leaning towards
> *processing-time*. The reason is again the GDPR/compliance view: if we have
> an old savepoint with data that should have been expired by now but we
> re-process it with *last-watermark* expiration, this means that we will get
> to “see” that state even though we shouldn’t be allowed to. If we use
> *current-processing-time* for expiration, we wouldn’t have that problem
> because that old data (according to their event-time timestamp) would be
> properly cleaned up and access would be prevented.
>
> To sum up:
> last-access-timestamp: event-time of event
> expiration-check-time: processing-time
>
> What do you think?
>
> Aljoscha
>
> > On 6. Apr 2019, at 01:30, Konstantin Knauf 
> wrote:
> >
> > Hi Andrey,
> >
> > I agree with Elias. This would be the most natural behavior. I wouldn't
> add
> > additional slightly different notions of time to Flink.
> >
> > As I can also see a use case for the combination
> >
> > * Timestamp stored: Event timestamp
> > * Timestamp to check expiration: Processing Time
> >
> > we could (maybe in a second step) add the possibility to mix and match
> time
> > characteristics for both aspects.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Thu, Apr 4, 2019 at 7:59 PM Elias Levy 
> > wrote:
> >
> >> My 2c:
> >>
> >> Timestamp stored with the state value: Event timestamp
> >> Timestamp used to check expiration: Last emitted watermark
> >>
> >> That follows the event time processing model used elsewhere in Flink.
> >> E.g. events are segregated into windows based on their event time, but
> the
> >> windows do not fire until the watermark advances past the end of the
> window.
> >>
> >>
> >> On Thu, Apr 4, 2019 at 7:55 AM Andrey Zagrebin 
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> As you might have already seen there is an effort tracked in
> FLINK-12005
> >>> [1] to support event time scale for state with time-to-live (TTL) [2].
> >>> While thinking about design, we realised that there can be multiple
> >>> options
> >>> for semantics of this feature, depending on use case. There is also
> >>> som

[jira] [Created] (FLINK-12124) Security is not support dynamicProperties

2019-04-08 Thread zhouqi (JIRA)
zhouqi created FLINK-12124:
--

 Summary: Security is not support dynamicProperties
 Key: FLINK-12124
 URL: https://issues.apache.org/jira/browse/FLINK-12124
 Project: Flink
  Issue Type: Bug
  Components: Command Line Client
Affects Versions: 1.7.2
Reporter: zhouqi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12125) Add OVH to poweredby.zh.md and index.zh.md

2019-04-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-12125:
-

 Summary: Add OVH to poweredby.zh.md and index.zh.md
 Key: FLINK-12125
 URL: https://issues.apache.org/jira/browse/FLINK-12125
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


OVH was added to the {{poweredby.md}} and {{index.md}} pages in commits 
55ae4426d5b91695e1e5629b1d0a16b7a1e010f0 and 
d4a160ab336c5ae1b2f772fbeff7e003478e274b. See also PR 
https://github.com/apache/flink-web/pull/193.

The corresponding Chinese pages should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12126) Make relaxed rescaling behaviour accessible in REST interface

2019-04-08 Thread Joshua Schneider (JIRA)
Joshua Schneider created FLINK-12126:


 Summary: Make relaxed rescaling behaviour accessible in REST 
interface
 Key: FLINK-12126
 URL: https://issues.apache.org/jira/browse/FLINK-12126
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.7.2
Reporter: Joshua Schneider


The REST handler for rescaling uses RescalingBehaviour.STRICT as a hard-coded 
parameter. This has the following consequences when a job is rescaled:

 * All operators are set to the new parallelism. If at least one operator's max 
parallelism is less than the new parallelism, the rescale operation is aborted.
 * Therefore, it is impossible to restrict the parallelism of a subset of 
operators (say, to one) in combination with rescaling.

A different behaviour which fixes these issues, RescalingBehaviour.RELAXED, is 
already provided by the internal API. However, it appears to be unused.

The REST interface should provide a parameter that selects the strategy. 
Alternatively, it could be a setting of the job.
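
To illustrate the difference between the two behaviours (a simplified sketch, 
not the actual RescalingBehaviour implementation):

{code:java}
// STRICT aborts if any operator cannot reach the requested parallelism;
// RELAXED caps each operator at its own max parallelism instead.
public class RescalingBehaviourSketch {
    static int[] rescale(int[] operatorMaxParallelism, int requested, boolean strict) {
        int[] newParallelism = new int[operatorMaxParallelism.length];
        for (int i = 0; i < operatorMaxParallelism.length; i++) {
            if (operatorMaxParallelism[i] < requested) {
                if (strict) {
                    throw new IllegalArgumentException(
                            "operator " + i + " cannot be rescaled to " + requested);
                }
                newParallelism[i] = operatorMaxParallelism[i]; // relaxed: cap at the operator's max
            } else {
                newParallelism[i] = requested;
            }
        }
        return newParallelism;
    }

    public static void main(String[] args) {
        int[] maxPerOperator = {128, 1, 128}; // one operator restricted to parallelism 1
        System.out.println(java.util.Arrays.toString(rescale(maxPerOperator, 16, false))); // [16, 1, 16]
        rescale(maxPerOperator, 16, true);    // throws: the rescale operation is aborted
    }
}
{code}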



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Aljoscha Krettek
I had a discussion with Andrey and now think that the case 
event-time-timestamp/watermark-cleanup is also a valid one, if you don’t need 
this for regulatory compliance but just for cleaning up old state, e.g. in 
cases where you re-process old data.

I think the discussion about whether to have this in the backends is also good 
to have: I’d say it’s good to have it in the backends because this
 (1) decreases state size: for user timers a timer entry is basically a 
(key, namespace, timestamp) tuple, whereas if we use backend TTL it’s only the 
timestamp
 (2) can piggyback on log compaction in RocksDB. A user timer manually has to go 
to state and delete it, which can be costly, while TTL in the backend would 
happen as-we-go

Aljoscha

> On 8. Apr 2019, at 12:03, Kostas Kloudas  wrote:
> 
> Hi all,
> 
> For GDPR: I am not sure about the regulatory requirements of GDPR but I would 
> assume that the time for deletion starts counting from the time an 
> organisation received the data (i.e. the wall-clock ingestion time of the 
> data), and not the "event time" of the data. In other case, an organisaton 
> may be violating GDPR by just receiving e.g. 1 year old data of a user whole 
> deletion policy is "you are allowed to keep them for 6 months".
> 
> Now for the discussion in this thread, I think that the scenario:
> 
> * Timestamp stored: Event timestamp
> * Timestamp to check expiration: Processing Time
> 
> has the underlying assumption that there is a relationship between event-time 
> and processing time, which is not necessarily the case. Event-time, although 
> we call it "time", is just another user-defined column or attribute of the 
> data and can be anything. It is not an "objective" and independently evolving 
> attribute like wall-clock time. I am not sure what could be the solution, as 
> out-of-orderness can always lead to arbitrary, non-reproducible and difficult 
> to debug behaviour (e.g. a super-early element that arrives out-of-order and, 
> as the succeeding elements set the timestamp to lower values, it gets deleted 
> by the state backend, although the user-level windowing logic would expect it 
> to be there).
> 
> Given that last point made above, and apart from the semantics of the 
> proposed feature, I think that we should also discuss if it is a good idea to 
> have event time TTL implemented in state backend level in the first place. 
> Personally, I am not so convinced that this is a good idea, as we introduce 
> another (potentially competing) mechanism for handling event time, apart from 
> the user program. An example can be the one that I described above. And this 
> also defeats one of the main advantages of event time, in my opinion, which 
> is reproducibility of the results.
> 
> I may be wrong, but I would appreciate any opinions on this.
> 
> Cheers,
> Kostas
> 
> On Mon, Apr 8, 2019 at 11:12 AM Aljoscha Krettek wrote:
> Oh boy, this is an interesting pickle.
> 
> For *last-access-timestamp*, I think only *event-time-of-current-record* 
> makes sense. I’m looking at this from a GDPR/regulatory compliance 
> perspective. If you update a state, by say storing the event you just 
> received in state, you want to use the exact timestamp of that event for 
> expiration. Both *max-timestamp-of-data-seen-so-far* and *last-watermark* 
> suffer from problems in edge cases: if the timestamp of an event you receive 
> is quite a bit earlier than other timestamps that we have seen so far (i.e. 
> the event is late) we would artificially lengthen the TTL of that event 
> (which is stored in state) and would therefore break regulatory requirements. 
> Always using the timestamp of an event doesn’t suffer from that problem.
> 
> For *expiration-check-time*, both *last-watermark* and 
> *current-processing-time* could make sense but I’m leaning towards 
> *processing-time*. The reason is again the GDPR/compliance view: if we have 
> an old savepoint with data that should have been expired by now but we 
> re-process it with *last-watermark* expiration, this means that we will get 
> to “see” that state even though we shouldn’t be allowed to. If we use 
> *current-processing-time* for expiration, we wouldn’t have that problem 
> because that old data (according to their event-time timestamp) would be 
> properly cleaned up and access would be prevented.
> 
> To sum up:
> last-access-timestamp: event-time of event
> expiration-check-time: processing-time
> 
> What do you think?
> 
> Aljoscha
> 
> > On 6. Apr 2019, at 01:30, Konstantin Knauf wrote:
> > 
> > Hi Andrey,
> > 
> > I agree with Elias. This would be the most natural behavior. I wouldn't add
> > additional slightly different notions of time to Flink.
> > 
> > As I can also see a use case for the combination
> > 
> > * Timestamp stored: Event timestamp
> > * Timestamp to check expiration: Processing Time
> > 
> > we could (maybe in a second step) add the possibility to mix and match t

[jira] [Created] (FLINK-12127) Move network related options to NetworkEnvironmentOptions

2019-04-08 Thread zhijiang (JIRA)
zhijiang created FLINK-12127:


 Summary: Move network related options to NetworkEnvironmentOptions
 Key: FLINK-12127
 URL: https://issues.apache.org/jira/browse/FLINK-12127
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: zhijiang
Assignee: zhijiang


Some network related options in TaskManagerOptions could be moved into the newly 
introduced `NetworkEnvironmentOptions`, which would be used for different 
shuffle services.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] Semantics of event time for state TTL

2019-04-08 Thread Elias Levy
Hasn't this always been the end goal?  It's certainly what we have been
waiting on for jobs with very large TTLed state.  Beyond timer storage,
timer processing simply to expire stale data that may not otherwise be
accessed is expensive.

On Mon, Apr 8, 2019 at 7:11 AM Aljoscha Krettek  wrote:

> I had a discussion with Andrey and now think that also the case
> event-time-timestamp/watermark-cleanup is a valid case. If you don’t need
> this for regulatory compliance but just for cleaning up old state, in case
> where you have re-processing of old data.
>
> I think the discussion about whether to have this in the backends is also
> good to have: I’d say it’s good to have it in the backends because this
>  (1) decreases state size: for user timers a timer entry is basically a
> (key, namespace, timestamp) tuple, whereas if we use backend TTL it’s only the timestamp
>  (2) can piggyback on log compaction in RocksDB. A user timer manually has
> to go to state and delete it, which can be costly, while TTL in the backend
> would happen as-we-go
>
> Aljoscha
>
> On 8. Apr 2019, at 12:03, Kostas Kloudas  wrote:
>
> Hi all,
>
> For GDPR: I am not sure about the regulatory requirements of GDPR but I
> would assume that the time for deletion starts counting from the time an
> organisation received the data (i.e. the wall-clock ingestion time of the
> data), and not the "event time" of the data. In other case, an organisaton
> may be violating GDPR by just receiving e.g. 1 year old data of a user
> whole deletion policy is "you are allowed to keep them for 6 months".
>
> Now for the discussion in this thread, I think that the scenario:
>
> * Timestamp stored: Event timestamp
> * Timestamp to check expiration: Processing Time
>
> has the underlying assumption that there is a relationship between
> event-time and processing time, which is not necessarily the case.
> Event-time, although we call it "time", is just another user-defined column
> or attribute of the data and can be anything. It is not an "objective" and
> independently evolving attribute like wall-clock time. I am not sure what
> could be the solution, as out-of-orderness can always lead to arbitrary,
> non-reproducible and difficult to debug behaviour (e.g. a super-early
> element that arrives out-of-order and, as the succeeding elements set the
> timestamp to lower values, it gets deleted by the state backend, although
> the user-level windowing logic would expect it to be there).
>
> Given that last point made above, and apart from the semantics of the
> proposed feature, I think that we should also discuss if it is a good idea
> to have event time TTL implemented in state backend level in the first
> place. Personally, I am not so convinced that this is a good idea, as we
> introduce another (potentially competing) mechanism for handling event
> time, apart from the user program. An example can be the one that I
> described above. And this also defeats one of the main advantages of event
> time, in my opinion, which is reproducibility of the results.
>
> I may be wrong, but I would appreciate any opinions on this.
>
> Cheers,
> Kostas
>
> On Mon, Apr 8, 2019 at 11:12 AM Aljoscha Krettek 
> wrote:
>
>> Oh boy, this is an interesting pickle.
>>
>> For *last-access-timestamp*, I think only *event-time-of-current-record*
>> makes sense. I’m looking at this from a GDPR/regulatory compliance
>> perspective. If you update a state, by say storing the event you just
>> received in state, you want to use the exact timestamp of that event for
>> expiration. Both *max-timestamp-of-data-seen-so-far* and *last-watermark*
>> suffer from problems in edge cases: if the timestamp of an event you
>> receive is quite a bit earlier than other timestamps that we have seen so
>> far (i.e. the event is late) we would artificially lengthen the TTL of that
>> event (which is stored in state) and would therefore break regulatory
>> requirements. Always using the timestamp of an event doesn’t suffer from
>> that problem.
>>
>> For *expiration-check-time*, both *last-watermark* and
>> *current-processing-time* could make sense but I’m leaning towards
>> *processing-time*. The reason is again the GDPR/compliance view: if we have
>> an old savepoint with data that should have been expired by now but we
>> re-process it with *last-watermark* expiration, this means that we will get
>> to “see” that state even though we shouldn’t be allowed to. If we use
>> *current-processing-time* for expiration, we wouldn’t have that problem
>> because that old data (according to their event-time timestamp) would be
>> properly cleaned up and access would be prevented.
>>
>> To sum up:
>> last-access-timestamp: event-time of event
>> expiration-check-time: processing-time
>>
>> What do you think?
>>
>> Aljoscha
>>
>> > On 6. Apr 2019, at 01:30, Konstantin Knauf 
>> wrote:
>> >
>> > Hi Andrey,
>> >
>> > I agree with Elias. This would be the most natural behavior. I wouldn't
>> add
>> > additional slightly different notions of time to 

[jira] [Created] (FLINK-12128) There is a typo on the website

2019-04-08 Thread Kenneth Yang (JIRA)
Kenneth Yang created FLINK-12128:


 Summary: There is a typo on the website
 Key: FLINK-12128
 URL: https://issues.apache.org/jira/browse/FLINK-12128
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Kenneth Yang


[https://flink.apache.org/roadmap.html]

"Various of these enhancements can be taken _*form*_ the contributed code from 
the [Blink fork|https://github.com/apache/flink/tree/blink]."

I think this sentence has a typo: *form* should be changed to _from_.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12129) CompressedViews need release heap buffers to reduce memory usage in blink

2019-04-08 Thread Jingsong Lee (JIRA)
Jingsong Lee created FLINK-12129:


 Summary: CompressedViews need release heap buffers to reduce 
memory usage in blink
 Key: FLINK-12129
 URL: https://issues.apache.org/jira/browse/FLINK-12129
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Reporter: Jingsong Lee


In BinaryHashPartition, CompressedViews will be maintained for a long time. The 
heap buffers in a view whose spill has finished are useless and should be 
released.

See CompressedBlockChannelWriter.compressedBuffers etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12130) Apply command line options to configuration before install security modules

2019-04-08 Thread Victor Wong (JIRA)
Victor Wong created FLINK-12130:
---

 Summary: Apply command line options to configuration before 
install security modules
 Key: FLINK-12130
 URL: https://issues.apache.org/jira/browse/FLINK-12130
 Project: Flink
  Issue Type: Improvement
  Components: Command Line Client
Reporter: Victor Wong


Currently, if the user configures Kerberos credentials through the command line, it 
won't work.
{code:java}
// flink run -m yarn-cluster -yD security.kerberos.login.keytab=/path/to/keytab 
-yD security.kerberos.login.principal=xxx /path/to/test.jar
{code}
Maybe we could call 
_org.apache.flink.client.cli.AbstractCustomCommandLine#applyCommandLineOptionsToConfiguration_
 before _SecurityUtils.install(new 
SecurityConfiguration(cli.configuration));_
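
Sketched out, the proposed ordering would look roughly like this (a simplified 
fragment with made-up variable names, not the actual CliFrontend code):

{code:java}
// 1) Merge the command line / -yD options into the configuration first,
// 2) only then build the SecurityConfiguration and install the security modules,
// so that a keytab/principal given on the command line is actually picked up.
Configuration effectiveConfiguration =
        customCommandLine.applyCommandLineOptionsToConfiguration(commandLine);

SecurityUtils.install(new SecurityConfiguration(effectiveConfiguration));
{code}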



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12131) Resetting ExecutionVertex in region failover may cause inconsistency of IntermediateResult status

2019-04-08 Thread Zhu Zhu (JIRA)
Zhu Zhu created FLINK-12131:
---

 Summary: Resetting ExecutionVertex in region failover may cause 
inconsistency of IntermediateResult status
 Key: FLINK-12131
 URL: https://issues.apache.org/jira/browse/FLINK-12131
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Zhu Zhu
Assignee: Zhu Zhu


Currently the *IntermediateResult* status is only reset when its producer 
*ExecutionJobVertex* is reset.

 

When the region failover strategy is enabled, the failed region vertices are reset 
through *ExecutionVertex.resetForNewExecution()*. The *numberOfRunningProducers* 
counter in IntermediateResult, however, is not properly adjusted in this case.

So if a FINISHED vertex is restarted and finishes again, the counter may drop 
below 0.

Besides, the consumable property of the partition is not reset either. This 
may lead to incorrect input state check results for lazy scheduling.

 

I'd propose to invoke *IntermediateResultPartition.resetForNewExecution()* in 
*ExecutionVertex.resetForNewExecution()* and reset the 
*numberOfRunningProducers* counter and *IntermediateResultPartition* there.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12132) The example in /docs/ops/deployment/yarn_setup.md should be updated due to the change FLINK-2021

2019-04-08 Thread Wang Geng (JIRA)
Wang Geng created FLINK-12132:
-

 Summary: The example in /docs/ops/deployment/yarn_setup.md should 
be updated due to the change FLINK-2021
 Key: FLINK-12132
 URL: https://issues.apache.org/jira/browse/FLINK-12132
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Wang Geng






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12133) Support unbounded aggregate in streaming table runtime

2019-04-08 Thread Jark Wu (JIRA)
Jark Wu created FLINK-12133:
---

 Summary: Support unbounded aggregate in streaming table runtime
 Key: FLINK-12133
 URL: https://issues.apache.org/jira/browse/FLINK-12133
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Reporter: Jark Wu
Assignee: Jark Wu


This ticket aims to support unbounded aggregates in the streaming runtime. This 
should include:

1. GroupAggFunction: a function that supports unbounded aggregates without 
optimizations
2. MiniBatchGroupAggFunction: a function that supports unbounded aggregates with 
the minibatch optimization
3. MiniBatchLocalGroupAggFunction & MiniBatchGlobalGroupAggFunction: functions 
that support unbounded aggregates with the local combine optimization
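
For illustration, the simplest of these (a plain unbounded per-key aggregation, 
here a COUNT) conceptually behaves like the sketch below (simplified, not the 
planned GroupAggFunction code): for every input row the accumulator in keyed 
state is updated and an updated result row is emitted downstream.

{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Unbounded per-key COUNT(*): the accumulator lives in keyed state and an
// updated result is emitted for every incoming record.
public class CountAggSketch extends KeyedProcessFunction<String, String, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(Tuple2.of(ctx.getCurrentKey(), updated)); // emit the updated aggregate
    }
}
{code}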



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)