[ https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321077#comment-15321077 ]
Chandni Singh commented on APEXMALHAR-2063:
-------------------------------------------
The majority of these methods came from StorageAgent:
1. long[] getWindowIds(int operatorId)
2. save(Object object, int operatorId, long windowId)
3. load(int operatorId, long windowId)
4. delete(int operatorId, long windowId)
These were added to WindowDataManager to support dynamic partitioning:
1. long[] getWindowIds()
2. load(long windowId)
I will provide a variant of each of the above methods except these two:
1. long[] getWindowIds(int operatorId)
This is used in two places:
-- JMSInput, which should use getLargestRecoveryWindow() instead.
-- ManagedState, which should use a variant of delete, namely
deleteWindowsGreaterThan(...).
2. long[] getWindowIds()
This is used for dynamic partitioning of managed state. I will discuss with
[~timothyfarkas] how we can avoid using it.
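
To make the two replacements concrete, here is a minimal in-memory sketch of
how getLargestRecoveryWindow() and deleteWindowsGreaterThan(...) could stand in
for the getWindowIds calls. It is not the Malhar implementation: the names
WindowDataStore, InMemoryWindowDataStore and NO_WINDOW are hypothetical, and
the single long argument to deleteWindowsGreaterThan is an assumption, since
the argument list is elided above.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical interface for illustration; not the real WindowDataManager.
interface WindowDataStore {
  void save(Object object, long windowId);

  Object load(long windowId);

  void delete(long windowId);

  // Would replace the getWindowIds(operatorId) call in JMSInput.
  long getLargestRecoveryWindow();

  // Would replace the getWindowIds(operatorId) call in ManagedState.
  // The parameter list is an assumption.
  void deleteWindowsGreaterThan(long windowId);
}

class InMemoryWindowDataStore implements WindowDataStore {
  // Assumed sentinel returned when nothing has been saved yet.
  static final long NO_WINDOW = -1L;

  // Sorted by window id so the largest id and "greater than" ranges are cheap.
  private final NavigableMap<Long, Object> windows = new TreeMap<>();

  @Override
  public void save(Object object, long windowId) {
    windows.put(windowId, object);
  }

  @Override
  public Object load(long windowId) {
    return windows.get(windowId);
  }

  @Override
  public void delete(long windowId) {
    windows.remove(windowId);
  }

  @Override
  public long getLargestRecoveryWindow() {
    return windows.isEmpty() ? NO_WINDOW : windows.lastKey();
  }

  @Override
  public void deleteWindowsGreaterThan(long windowId) {
    // tailMap(..., false) is a live view of all keys strictly greater than
    // windowId; clearing the view removes those entries from the map.
    windows.tailMap(windowId, false).clear();
  }
}
```

With these two methods, callers would no longer need the full array of window
ids just to find the latest window or to discard windows past a given point.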
> Integrate WAL to FS WindowDataManager
> -------------------------------------
>
> Key: APEXMALHAR-2063
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
> Project: Apache Apex Malhar
> Issue Type: Improvement
> Reporter: Chandni Singh
> Assignee: Chandni Singh
>
> FS Window Data Manager saves meta-data that is used to replay the tuples of
> every completed application window after a failure. To do this, it saves the
> meta-data in one file per window. Having many small files on HDFS causes
> issues, as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead, FS Window Data Manager can use the WAL to write data and maintain
> a mapping of how much data was flushed to the WAL in each window.
> In order to use FileSystemWAL for replaying the data of a finished window, a
> few changes are made to FileSystemWAL, for the following reasons:
> 1. WindowDataManager needs to replay the data of every finished window. This
> window may not be checkpointed.
> FileSystemWAL truncates the WAL file to the checkpointed point after recovery,
> which poses a problem.
> WindowDataManager should be able to control recovery of FileSystemWAL.
> 2. FileSystemWAL writes to temporary files. The mapping of temp files to
> actual files is part of its state, which is checkpointed. Since
> WindowDataManager replays data of a window not yet checkpointed, it needs to
> know the actual temporary file the data is being persisted to.
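
The per-window mapping described in the issue can be sketched in memory as
follows. This is only an illustration of the idea, not FileSystemWAL: the
class WindowedLogSketch and its methods are hypothetical, a byte buffer stands
in for the WAL file, and windows are assumed to end in increasing id order.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a single append-only log shared by all windows.
class WindowedLogSketch {
  // Stands in for the WAL file; every window appends to the same log.
  private final ByteArrayOutputStream log = new ByteArrayOutputStream();

  // Window id -> log size at the moment that window ended.
  private final TreeMap<Long, Integer> endOffsets = new TreeMap<>();

  // Appends one entry to the log for the window currently in progress.
  void append(byte[] data) {
    log.write(data, 0, data.length);
  }

  // Records how far the log had been written when the window finished.
  void endWindow(long windowId) {
    endOffsets.put(windowId, log.size());
  }

  // Returns the bytes written during the given window, or null if that
  // window was never ended.
  byte[] replay(long windowId) {
    Integer end = endOffsets.get(windowId);
    if (end == null) {
      return null;
    }
    // The window starts where the previous finished window ended.
    Map.Entry<Long, Integer> previous = endOffsets.lowerEntry(windowId);
    int start = previous == null ? 0 : previous.getValue();
    return Arrays.copyOfRange(log.toByteArray(), start, end);
  }
}
```

Because only the offsets are tracked per window, the number of files no longer
grows with the number of windows, which is the point of moving away from one
file per window.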
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)