[ https://issues.apache.org/jira/browse/FALCON-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332635#comment-15332635 ]
Venkatesan Ramachandran commented on FALCON-2030:
-------------------------------------------------
[~ajayyadava] welcome back and thanks for the info.
The reason for enforcing the pattern is that we hit FALCON-2023 when no pattern
is specified in the path.
Also, for snapshot-like data (the use case you are referring to), it is better
to write each snapshot under a subfolder named with either a timestamp pattern
or a version number (such as EPOCH as a number). When reading, workflows can
use the LATEST EL to pick the latest folder and consume it.
This way, dataset versions can be tracked and maintained. Even metadata can
change (append/remove/update), although at a very slow rate. Versioned folders
also ensure that in-flight workflows/pipelines are not affected by the
addition, removal, or update of data.
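As a minimal sketch, the data location from the issue description could carry a time partition pattern like the one below. This assumes an hourly feed. The ${YEAR}/${MONTH}/${DAY}/${HOUR} variables are Falcon's standard feed path variables, but the granularity should match the feed's actual frequency.

```xml
<locations>
    <!-- Assumption: hourly feed; each instance is written to its own dated subfolder -->
    <location type="data"
              path="/tmp/falcon-regression/RetentionTest/testFolders/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
    <location type="stats" path="/projects/falcon/clicksStats"/>
    <location type="meta" path="/projects/falcon/clicksMetaData"/>
</locations>
```

Because every instance resolves to a distinct folder, a new snapshot never overwrites the folder an in-flight workflow is already reading.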
Let me know what you think.
> Enforce time partition pattern in the data location path in feed definition
> ----------------------------------------------------------------------------
>
> Key: FALCON-2030
> URL: https://issues.apache.org/jira/browse/FALCON-2030
> Project: Falcon
> Issue Type: Improvement
> Components: feed
> Reporter: Venkatesan Ramachandran
> Assignee: Venkatesan Ramachandran
>
> In a feed definition, the data location can be specified without a
> time-series pattern, as below:
> <locations>
>     <location type="data"
>               path="/tmp/falcon-regression/RetentionTest/testFolders/"/>
>     <location type="stats" path="/projects/falcon/clicksStats"/>
>     <location type="meta" path="/projects/falcon/clicksMetaData"/>
> </locations>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)