[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Chen updated KYLIN-1506:
----------------------------
    Affects Version/s: v1.4.0
                       v1.5.0
                       v1.3.0

> Refactor resource interface for timeseries-based data like jobs to much better performance
> ------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1506
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.0, v1.4.0, v1.3.0
>            Reporter: Hao Chen
>            Assignee: Hao Chen
>              Labels: patch
>
> h1. Problem
> Currently, operations such as getJobOutputs/getJobs each scan the store twice to build the response:
> 1. List all keys under the resource path with "store.listResources(resourcePath)", which returns them sorted, and take the first and last key (in effect this is just a prefix filter).
> 2. Re-scan the key range between those two keys with a timestamp filter:
> "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         // Scan 1: list every key under the job-output root (a prefix scan).
>         NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // The NavigableSet is already sorted, so first()/last() bound the key range.
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         // Scan 2: re-read the same key range, this time filtered by timestamp.
>         return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis,
>                 ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
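> The two scans can be combined into one: a single prefix scan over the resource path that applies the timestamp filter as it goes. This needs new overloads that take the resource path instead of a key range:
> {code}
> store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
> store.getAllResources(resourcePath, Class, Serializer)
> {code}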
> For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis)" can be refactored as follows:
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         // One prefix scan; the timestamp filter is applied during that scan.
>         return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT,
>                 timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
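To make the cost of the current approach concrete, here is a minimal Java sketch of the two-scan pattern described under "Problem". It models the resource store as an in-memory sorted map. The names TwoPassScanSketch, InMemoryStore, Resource and scanCount are hypothetical stand-ins, not Kylin's ResourceStore API. The inclusive key range and the half-open [timeStart, timeEnd) time window are also assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the two-scan pattern; not Kylin's ResourceStore API.
public class TwoPassScanSketch {

    // A stored resource: last-modified timestamp plus payload.
    static final class Resource {
        final long timestamp;
        final String content;

        Resource(long timestamp, String content) {
            this.timestamp = timestamp;
            this.content = content;
        }
    }

    static final class InMemoryStore {
        final TreeMap<String, Resource> rows = new TreeMap<>();
        int scanCount = 0; // number of passes over the backing map

        // Scan 1: collect every key that starts with the given path.
        NavigableSet<String> listResources(String path) {
            scanCount++;
            NavigableSet<String> keys = new TreeSet<>();
            for (String key : rows.tailMap(path, true).keySet()) {
                if (!key.startsWith(path)) {
                    break; // sorted order: the first non-matching key ends the prefix range
                }
                keys.add(key);
            }
            return keys;
        }

        // Scan 2: re-read [rangeStart, rangeEnd] (assumed inclusive) and keep
        // rows whose timestamp lies in [timeStart, timeEnd) (assumed half-open).
        List<String> getAllResources(String rangeStart, String rangeEnd, long timeStart, long timeEnd) {
            scanCount++;
            List<String> result = new ArrayList<>();
            for (Map.Entry<String, Resource> e : rows.subMap(rangeStart, true, rangeEnd, true).entrySet()) {
                long ts = e.getValue().timestamp;
                if (ts >= timeStart && ts < timeEnd) {
                    result.add(e.getValue().content);
                }
            }
            return result;
        }
    }

    // Mirrors the original getJobOutputs: list keys, take first/last, then re-scan.
    static List<String> getJobOutputs(InMemoryStore store, String root, long start, long end) {
        NavigableSet<String> keys = store.listResources(root);
        if (keys.isEmpty()) {
            return Collections.emptyList();
        }
        return store.getAllResources(keys.first(), keys.last(), start, end);
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.rows.put("/execute_output/job-1", new Resource(100L, "out-1"));
        store.rows.put("/execute_output/job-2", new Resource(200L, "out-2"));
        store.rows.put("/execute_output/job-3", new Resource(300L, "out-3"));
        store.rows.put("/cube/other", new Resource(150L, "unrelated"));

        // Trailing slash keeps sibling paths such as "/execute_output_x" out of the prefix.
        List<String> outputs = getJobOutputs(store, "/execute_output/", 150L, 350L);
        System.out.println(outputs + " scans=" + store.scanCount); // [out-2, out-3] scans=2
    }
}
```

A single query touches the store twice. The first pass exists only to discover range bounds that a prefix scan already implies.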

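The single-scan overloads proposed under "Solution" can be sketched in Java, again using an in-memory sorted map in place of the real store. OnePassScanSketch, OnePassStore, Row and scanPrefix are hypothetical names. The sketch only illustrates the idea of seeking to a key prefix once and applying the timestamp filter during that same pass, and it assumes a half-open [timeStart, timeEnd) window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongPredicate;

// Hypothetical in-memory model of the single-scan overloads; not Kylin's ResourceStore API.
public class OnePassScanSketch {

    // A stored row: last-modified timestamp plus payload.
    static final class Row {
        final long timestamp;
        final String content;

        Row(long timestamp, String content) {
            this.timestamp = timestamp;
            this.content = content;
        }
    }

    static final class OnePassStore {
        final TreeMap<String, Row> rows = new TreeMap<>();
        int scanCount = 0; // number of passes over the backing map

        // Analogue of getAllResources(resourcePath, startTime, endTime, ...);
        // the time window is assumed half-open, [timeStart, timeEnd).
        List<String> getAllResources(String resourcePath, long timeStart, long timeEnd) {
            return scanPrefix(resourcePath, ts -> ts >= timeStart && ts < timeEnd);
        }

        // Analogue of getAllResources(resourcePath, ...): same scan, no time filter.
        List<String> getAllResources(String resourcePath) {
            return scanPrefix(resourcePath, ts -> true);
        }

        // One pass: seek to the prefix, filter by timestamp while iterating,
        // and stop at the first key outside the prefix.
        private List<String> scanPrefix(String prefix, LongPredicate keepTimestamp) {
            scanCount++;
            List<String> result = new ArrayList<>();
            for (Map.Entry<String, Row> e : rows.tailMap(prefix, true).entrySet()) {
                if (!e.getKey().startsWith(prefix)) {
                    break;
                }
                if (keepTimestamp.test(e.getValue().timestamp)) {
                    result.add(e.getValue().content);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        OnePassStore store = new OnePassStore();
        store.rows.put("/execute_output/job-1", new Row(100L, "out-1"));
        store.rows.put("/execute_output/job-2", new Row(200L, "out-2"));
        store.rows.put("/execute_output/job-3", new Row(300L, "out-3"));
        store.rows.put("/cube/other", new Row(150L, "unrelated"));

        List<String> windowed = store.getAllResources("/execute_output/", 150L, 350L);
        System.out.println(windowed + " scans=" + store.scanCount); // [out-2, out-3] scans=1
        System.out.println(store.getAllResources("/execute_output/")); // [out-1, out-2, out-3]
    }
}
```

Because the keys are sorted, the scan can begin at the prefix and end at the first key outside it. No separate key listing is needed to find the range bounds, so each query costs one pass.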


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
