[
https://issues.apache.org/jira/browse/FALCON-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691996#comment-15691996
]
Srikanth Sundarrajan commented on FALCON-1406:
----------------------------------------------
Thanks [~ajayyadava]. I wanted to bring the notes from our discussion back into
the JIRA so that others can also chime in.
+*1. Motivation for the feature*+
There is a need for entity updates that take effect from a time in the past,
to handle code issues and data schema changes that are retroactive. There is
also a need for a clean way to do this without polluting the system with new
temporary entities or other hacks. There is general agreement on the utility
of the feature.
+*2. Versioning of Entities*+
Would versioning of entities be a more generic and cleaner way to handle this?
The points discussed were:
* Would it make sense to track and maintain versions for feeds, and what would
be the challenges for consumers of the data/feed if the feed were versioned?
* If entities are versioned, would all the APIs, and hence the end users, have
to be version aware in all operations?
* Would versioning solve this problem more cleanly, and if so, how?
These are what we felt would be good answers to these questions:
* Versioning of feeds would indeed make things difficult and challenging for
consumers. Having processes depend on the latest definition of the feed at the
time of their execution seemed the right approach (lifecycle action execution
would still benefit from versioning; more on that later).
* Processes, on the other hand, would benefit from versioning since there is
code associated with them. There are a number of ways to look at the versioning
scheme. If time (loosely, effective time) were treated as the equivalent of a
version, then the current feature does allow for a rudimentary versioning
scheme. However, the rest of the system, particularly the config store, has to
be version aware (regardless of the versioning scheme). If any of the
sub-services within Falcon use the entity definition to build further
capabilities (e.g. SLA monitoring, alerting and the like), then those have to
be version aware as well. While the system itself has the ability to track the
version/history of an entity, it didn't seem right to burden the users (or the
APIs) with version awareness; it would be helpful to retain the current
semantics. However, the definition listing, feed instance availability and
dependency APIs would benefit from being version aware.
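To illustrate how treating effective time as a version yields a rudimentary
versioning scheme, here is a minimal sketch of a time-versioned store: each
update records a definition together with the time from which it is effective,
and an instance resolves the definition in force at its nominal time. The class
and method names (VersionedEntityStore, update, definitionAt) are hypothetical
and are not Falcon's config store API; definitions are plain strings for
brevity, and the rule that a past-effective update supersedes any later
versions is an assumption made for this sketch.

```java
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch, not Falcon's ConfigurationStore API.
// Each definition is keyed by the time from which it is effective,
// so the effective time acts as a rudimentary version.
public class VersionedEntityStore {
    private final NavigableMap<Instant, String> versions = new TreeMap<>();

    // Record a definition effective from the given time (which may be in the past).
    // Assumption: versions effective at or after this time are dropped, so the
    // new definition applies to every instance from effectiveTime onwards.
    public void update(Instant effectiveTime, String definition) {
        versions.tailMap(effectiveTime, true).clear();
        versions.put(effectiveTime, definition);
    }

    // Return the definition in force for an instance with the given nominal time,
    // or null if no definition was effective yet at that time.
    public String definitionAt(Instant nominalTime) {
        Map.Entry<Instant, String> entry = versions.floorEntry(nominalTime);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedEntityStore store = new VersionedEntityStore();
        store.update(Instant.parse("2016-11-01T00:00:00Z"), "process-v1");
        store.update(Instant.parse("2016-11-20T00:00:00Z"), "process-v2");
        // Retroactive fix effective in the past: supersedes process-v2.
        store.update(Instant.parse("2016-11-15T00:00:00Z"), "process-v3");

        System.out.println(store.definitionAt(Instant.parse("2016-11-10T00:00:00Z"))); // process-v1
        System.out.println(store.definitionAt(Instant.parse("2016-11-21T00:00:00Z"))); // process-v3
    }
}
```

Whether a past-effective update should override later updates, and how
sub-services such as SLA monitoring would query such a store, are exactly the
kinds of questions the design document would need to settle.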
+*3. Known and unknown gaps*+
* There are many sub-services, particularly on the instance start/finish path,
that may break if this change is not handled correctly.
* Scheduled feeds can have problems similar to those of processes, since an
update to them can also be made retroactive.
+*4. Way forward*+
* Write a design document identifying the gaps in other components affected by
effective time and, in particular, what changes treating entities as versioned
would entail.
* Identify and file the associated JIRAs for these gaps and address them.
* As a community, we can then review the design document to ensure all known
gaps are covered and the issues are tracked.
I request [~ajayyadava] to chime in with any missing details, or if anything
is misrepresented.
> Effective time in Entity updates.
> ---------------------------------
>
> Key: FALCON-1406
> URL: https://issues.apache.org/jira/browse/FALCON-1406
> Project: Falcon
> Issue Type: New Feature
> Reporter: sandeep samudrala
> Assignee: sandeep samudrala
> Attachments: FALCON-1406-initial.patch,
> effective_time_in_entity_updates.pdf
>
>
> Entity updates need to support an effective time in the past as well. An
> effective time capability was provided earlier, but it only allows setting
> the effective time of an entity to the current or a future time
> (now + delay), which does not solve all the issues.
> Following are a few scenarios that require an effective time in the past:
> a) New code deployed for an incompatible input data set, which would leave
> instances with old code and new data.
> b) Bad code being pushed, for which the entity should be able to go back in
> time to replay (rerun) with the new code.
> c) Orchestration-level changes (good or bad), which would need the ability
> to go back in time to start with.
> For reference, linking all the JIRAs that have been worked on around
> effective time:
> https://issues.apache.org/jira/browse/FALCON-374
> https://issues.apache.org/jira/browse/FALCON-297