[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao Chen updated KYLIN-1506:
----------------------------
    Description: 
h1. Problem

Currently, operations such as getJobOutputs and getJobs scan the store twice to build a response. The scan always proceeds as follows:

1. Get the keys, sort them, and take the first and last key with "store.listResources(resourcePath)" (in effect this is just a prefix-filter scan).
2. Re-scan that key range with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"

{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        // First scan: list every key under the output root.
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
        if (resources == null || resources.isEmpty()) {
            return Collections.emptyList();
        }
        // Collections.sort(resources);
        String rangeStart = resources.first();
        String rangeEnd = resources.last();
        // Second scan: read the same key range again with the timestamp filter.
        return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}

h2. Solution

The two scans can be combined into a single one:

{code}
store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
store.getAllResources(resourcePath, Class, Serializer)
{code}

For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis)" can be refactored as follows:

{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        // Single scan: prefix and timestamp filter in one call.
        return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}

  was:
h1. Problem

Currently, operations such as getJobOutputs and getJobs scan the store twice to build a response. The scan always proceeds as follows:

1. Get the keys, sort them, and take the first and last key with "store.listResources(resourcePath)" (in effect this is just a prefix-filter scan).
2. Re-scan that key range with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"

{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
        if (resources == null || resources.isEmpty()) {
            return Collections.emptyList();
        }
        // Collections.sort(resources);
        String rangeStart = resources.first();
        String rangeEnd = resources.last();
        return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}

h2. Solution

The two scans can be combined into a single one:

{code}
store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
{code}

> Refactor resource interface for timeseries-based data like jobs to much better performance
> ------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1506
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1506
>             Project: Kylin
>          Issue Type: Sub-task
>            Reporter: Hao Chen
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
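
To make the difference between the two-scan and one-scan access patterns concrete, here is a minimal, self-contained sketch. It is not Kylin's ResourceStore: the class names (SingleScanSketch, InMemoryStore, Entry), the scans counter, the "/execute_output" path value, and the half-open time range [startTime, endTime) are all assumptions made for illustration, and a TreeMap stands in for the sorted key-value store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SingleScanSketch {

    /** Hypothetical stored record: a payload plus its last-modified timestamp. */
    static final class Entry {
        final String payload;
        final long timestamp;

        Entry(String payload, long timestamp) {
            this.payload = payload;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical in-memory stand-in for a store whose keys are kept sorted. */
    static final class InMemoryStore {
        private final NavigableMap<String, Entry> rows = new TreeMap<>();
        int scans = 0; // counts how many passes over the data were requested

        void put(String path, String payload, long timestamp) {
            rows.put(path, new Entry(payload, timestamp));
        }

        /** Old pattern, step 1: list all keys under a folder (a prefix scan). */
        NavigableSet<String> listResources(String folder) {
            scans++;
            return new TreeSet<>(prefixRange(folder).keySet());
        }

        /** Old pattern, step 2: scan an explicit key range with a time filter. */
        List<String> getAllResources(String startKey, String endKey, long startTime, long endTime) {
            scans++;
            return filterByTime(rows.subMap(startKey, true, endKey, true), startTime, endTime);
        }

        /** Proposed pattern: prefix and time filter applied in one pass. */
        List<String> getAllResources(String folder, long startTime, long endTime) {
            scans++;
            return filterByTime(prefixRange(folder), startTime, endTime);
        }

        /** View of all rows whose key starts with "folder/". */
        private NavigableMap<String, Entry> prefixRange(String folder) {
            String prefix = folder.endsWith("/") ? folder : folder + "/";
            // Replacing the trailing '/' with the next character ('0') gives an
            // exclusive upper bound for every key that starts with the prefix.
            String upper = prefix.substring(0, prefix.length() - 1) + (char) ('/' + 1);
            return rows.subMap(prefix, true, upper, false);
        }

        /** Keeps payloads with startTime <= timestamp < endTime (assumed semantics). */
        private static List<String> filterByTime(NavigableMap<String, Entry> range, long startTime, long endTime) {
            List<String> out = new ArrayList<>();
            for (Entry e : range.values()) {
                if (e.timestamp >= startTime && e.timestamp < endTime) {
                    out.add(e.payload);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("/execute_output/job-1", "out-1", 100L);
        store.put("/execute_output/job-2", "out-2", 200L);
        store.put("/execute_output/job-3", "out-3", 300L);
        store.put("/execute/job-1", "job-1", 150L); // different folder, must be ignored

        // Old pattern: list keys, then re-scan between the first and last key.
        NavigableSet<String> keys = store.listResources("/execute_output");
        List<String> twoStep = store.getAllResources(keys.first(), keys.last(), 150L, 350L);
        int twoStepScans = store.scans;

        // Proposed pattern: one call, one pass.
        List<String> oneStep = store.getAllResources("/execute_output", 150L, 350L);
        int oneStepScans = store.scans - twoStepScans;

        System.out.println(twoStep + " via " + twoStepScans + " scans");
        System.out.println(oneStep + " via " + oneStepScans + " scan");
    }
}
```

Both access patterns return the same payloads here; the only difference the sketch measures is the number of passes requested from the store, which is the cost the proposal aims to remove.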