[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Chen updated KYLIN-1506:
----------------------------
    Affects Version/s: v1.4.0
                       v1.5.0
                       v1.3.0

> Refactor resource interface for timeseries-based data like jobs to much better performance
> ------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1506
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.0, v1.4.0, v1.3.0
>            Reporter: Hao Chen
>            Assignee: Hao Chen
>              Labels: patch
>
> h1. Problem
> Currently, operations such as getJobOutputs and getJobs scan the store twice
> to build a response. For example, the current scan always:
> 1. Lists the keys under the resource path with
> "store.listResources(resourcePath)" and takes the first and last key of the
> sorted set (in effect, this is just a lookup by prefix filter).
> 2. Re-scans that key range with a timestamp filter:
> "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // Collections.sort(resources);
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
> The two scans can be combined into a single one by passing the resource path
> directly:
> {code}
> store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
> store.getAllResources(resourcePath, Class, Serializer)
> {code}
> For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis,
> long timeEndInMillis)" is refactored as follows:
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
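
To make the difference between the two access patterns concrete, here is a minimal, self-contained sketch. It is not Kylin's ResourceStore: InMemoryStore, Entry, the scans counter, and the half-open time range [timeStart, timeEnd) are all assumptions made for illustration, and a sorted in-memory map stands in for the backing table. It only shows that a prefix scan with a time filter returns the same rows as the list-then-rescan approach while touching the store once instead of twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

class SingleScanSketch {

    /** Hypothetical stored row: a payload plus a last-modified timestamp. */
    static final class Entry {
        final String payload;
        final long timestamp;

        Entry(String payload, long timestamp) {
            this.payload = payload;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical in-memory stand-in for a store with sorted row keys. */
    static final class InMemoryStore {
        private final NavigableMap<String, Entry> rows = new TreeMap<>();
        int scans = 0; // counts how many times the "table" is scanned

        void put(String path, String payload, long timestamp) {
            rows.put(path, new Entry(payload, timestamp));
        }

        /** View of all rows whose key starts with the given prefix. */
        private NavigableMap<String, Entry> prefixView(String prefix) {
            return rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        }

        /** Old step 1: list every key under a folder (first scan). */
        NavigableSet<String> listResources(String folder) {
            scans++;
            return new TreeSet<>(prefixView(folder + "/").keySet());
        }

        /** Old step 2: re-scan an explicit key range with a time filter. */
        List<String> getAllResources(String startKey, String endKey, long timeStart, long timeEnd) {
            scans++;
            return filter(rows.subMap(startKey, true, endKey, true), timeStart, timeEnd);
        }

        /** Proposed shape: one prefix scan with the time filter applied directly. */
        List<String> getAllResources(String folder, long timeStart, long timeEnd) {
            scans++;
            return filter(prefixView(folder + "/"), timeStart, timeEnd);
        }

        // Assumption: the time range is half-open, [timeStart, timeEnd).
        private static List<String> filter(Map<String, Entry> view, long timeStart, long timeEnd) {
            List<String> out = new ArrayList<>();
            for (Entry e : view.values()) {
                if (e.timestamp >= timeStart && e.timestamp < timeEnd) {
                    out.add(e.payload);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("/execute_output/job-a", "a", 100L);
        store.put("/execute_output/job-b", "b", 200L);
        store.put("/execute/job-a", "x", 150L); // different folder, must not match

        // Two-pass pattern: list keys, then re-scan the key range.
        NavigableSet<String> keys = store.listResources("/execute_output");
        List<String> twoPass = store.getAllResources(keys.first(), keys.last(), 50L, 150L);
        int scansForTwoPass = store.scans;

        // Single-pass pattern: prefix and time filter in one scan.
        List<String> onePass = store.getAllResources("/execute_output", 50L, 150L);
        int scansForOnePass = store.scans - scansForTwoPass;

        System.out.println("two-pass result " + twoPass + " in " + scansForTwoPass + " scans");
        System.out.println("one-pass result " + onePass + " in " + scansForOnePass + " scans");
    }
}
```

In this sketch both patterns return the same rows, and the saving is the listing pass, which reads every key under the folder regardless of the requested time window. How large that saving is against a real store depends on the backend and is not measured here.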