[jira] [Commented] (KYLIN-1441) Display time column as partition column
[ https://issues.apache.org/jira/browse/KYLIN-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203810#comment-15203810 ] qianqiaoneng commented on KYLIN-1441: - #1. What I mean is: on the cube build confirmation dialog there is an "End Date (Exclude)" field that the user needs to specify when triggering a build from the GUI. This input should be changed to "End Date Time (Exclude)" once Kylin supports date and time as the partition column(s). #2. Yes, you are right. > Display time column as partition column > --- > > Key: KYLIN-1441 > URL: https://issues.apache.org/jira/browse/KYLIN-1441 > Project: Kylin > Issue Type: Task > Components: REST Service, Web >Reporter: Dipesh >Assignee: Dipesh > Fix For: Backlog > > Attachments: > 0001-KYLIN-1441-Display-time-column-as-partition-column.patch > > > There is a requirement to support a time column as the partition column when creating a cube in the cube designer: display the time column, if present, as a possible choice for the partition column. > The backend changes for using a time column as a partition column are covered here: > https://issues.apache.org/jira/browse/KYLIN-1427 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1431) Define stream config at table level, instead of on cube level
[ https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI resolved KYLIN-1431. - Resolution: Fixed > Define stream config at table level, instead of on cube level > - > > Key: KYLIN-1431 > URL: https://issues.apache.org/jira/browse/KYLIN-1431 > Project: Kylin > Issue Type: Improvement > Components: Metadata, streaming, Web >Affects Versions: v1.5.0, v1.4.0 >Reporter: Shaofeng SHI >Assignee: Zhong,Jason > Fix For: v1.5.0 > > > In 2.0 streaming, the user needs to enter the Kafka information (the topic, the broker list, etc.) when creating the cube; this info should be independent of the cube, and reusable across cubes that share the same table. > The expected design is to define the Kafka config when adding the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (KYLIN-1431) Define stream config at table level, instead of on cube level
[ https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI closed KYLIN-1431. --- > Define stream config at table level, instead of on cube level > - > > Key: KYLIN-1431 > URL: https://issues.apache.org/jira/browse/KYLIN-1431 > Project: Kylin > Issue Type: Improvement > Components: Metadata, streaming, Web >Affects Versions: v1.5.0, v1.4.0 >Reporter: Shaofeng SHI >Assignee: Zhong,Jason > Fix For: v1.5.0 > > > In 2.0 streaming, the user needs to enter the Kafka information (the topic, the broker list, etc.) when creating the cube; this info should be independent of the cube, and reusable across cubes that share the same table. > The expected design is to define the Kafka config when adding the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1431) Define stream config at table level, instead of on cube level
[ https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI updated KYLIN-1431: Fix Version/s: v1.5.0 > Define stream config at table level, instead of on cube level > - > > Key: KYLIN-1431 > URL: https://issues.apache.org/jira/browse/KYLIN-1431 > Project: Kylin > Issue Type: Improvement > Components: Metadata, streaming, Web >Affects Versions: v1.5.0, v1.4.0 >Reporter: Shaofeng SHI >Assignee: Zhong,Jason > Fix For: v1.5.0 > > > In 2.0 streaming, the user needs to enter the Kafka information (the topic, the broker list, etc.) when creating the cube; this info should be independent of the cube, and reusable across cubes that share the same table. > The expected design is to define the Kafka config when adding the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1511) model SelectedColumnMeta cannot be parsed by Jackson
Lola Liu created KYLIN-1511: --- Summary: model SelectedColumnMeta cannot be parsed by Jackson Key: KYLIN-1511 URL: https://issues.apache.org/jira/browse/KYLIN-1511 Project: Kylin Issue Type: Bug Components: REST Service Affects Versions: v1.4.0 Reporter: Lola Liu Assignee: liyang The class SelectedColumnMeta is immutable and doesn't have a default constructor. When trying to deserialize a JSON string into SelectedColumnMeta, an exception "JsonMappingException: No suitable constructor found" is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
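A common remedy for this class of Jackson error is to annotate a constructor with {{@JsonCreator}} so Jackson binds JSON fields to constructor parameters instead of requiring a no-arg constructor. The sketch below illustrates the pattern only; the class and field names are hypothetical stand-ins, not the actual SelectedColumnMeta fields or the committed Kylin fix.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical immutable DTO illustrating the pattern; the real
// SelectedColumnMeta has many more fields.
public class ColumnMeta {
    private final String name;
    private final int columnType;

    // @JsonCreator tells Jackson to use this constructor for
    // deserialization, so no default constructor is needed.
    @JsonCreator
    public ColumnMeta(@JsonProperty("name") String name,
                      @JsonProperty("columnType") int columnType) {
        this.name = name;
        this.columnType = columnType;
    }

    public String getName() { return name; }
    public int getColumnType() { return columnType; }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Without @JsonCreator this readValue call would throw
        // "JsonMappingException: No suitable constructor found".
        ColumnMeta m = mapper.readValue(
                "{\"name\":\"PRICE\",\"columnType\":3}", ColumnMeta.class);
        System.out.println(m.getName() + " " + m.getColumnType());
    }
}
```

An alternative, when the class cannot be modified, is to register a mix-in class carrying the same annotations.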
[jira] [Commented] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View
[ https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203739#comment-15203739 ] hongbin ma commented on KYLIN-1510: --- my mistake, I'll close this issue. thanks [~sunyerui] > Need to build a cube which has LOOKUP table referring Hive View > --- > > Key: KYLIN-1510 > URL: https://issues.apache.org/jira/browse/KYLIN-1510 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > Labels: newbie > > as this issue is raised more than once, I think we should treat it as a > common issue and fix it soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View
[ https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma resolved KYLIN-1510. --- Resolution: Duplicate > Need to build a cube which has LOOKUP table referring Hive View > --- > > Key: KYLIN-1510 > URL: https://issues.apache.org/jira/browse/KYLIN-1510 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > Labels: newbie > > as this issue is raised more than once, I think we should treat it as a > common issue and fix it soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View
[ https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203685#comment-15203685 ] Yerui Sun commented on KYLIN-1510: -- This seems to duplicate KYLIN-1077; [~mahongbin], please take a look. > Need to build a cube which has LOOKUP table referring Hive View > --- > > Key: KYLIN-1510 > URL: https://issues.apache.org/jira/browse/KYLIN-1510 > Project: Kylin > Issue Type: Bug >Reporter: hongbin ma >Assignee: hongbin ma > Labels: newbie > > as this issue is raised more than once, I think we should treat it as a > common issue and fix it soon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1238) Docker version for Kylin 1.0 or latest stable release
[ https://issues.apache.org/jira/browse/KYLIN-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203307#comment-15203307 ] Gabor Liptak commented on KYLIN-1238: - Kylin 1.2 was published at https://hub.docker.com/r/sequenceiq/kylin/tags/ > Docker version for Kylin 1.0 or latest stable release > - > > Key: KYLIN-1238 > URL: https://issues.apache.org/jira/browse/KYLIN-1238 > Project: Kylin > Issue Type: Improvement >Reporter: Santosh >Assignee: Shaofeng SHI > > Current Docker container is Kylin 0.7 which is an old version. Latest stable > Kylin release should be included in Docker container and made available on > Kylin website. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1506) Refactor time-based filter on resource
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen updated KYLIN-1506:
Description:
h1. Problem
Currently, operations such as getJobOutputs/getJobs use a two-pass scan to build the response. The scan always:
1. Gets the keys, sorts them, and takes the first and last key (in effect just a prefix filter) with "store.listResources(resourcePath)"
2. Re-scans the keys with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
        if (resources == null || resources.isEmpty()) {
            return Collections.emptyList();
        }
        // Collections.sort(resources);
        String rangeStart = resources.first();
        String rangeEnd = resources.last();
        return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}
h2. Solution
The two scans can simply be combined into one direct call:
{code}
store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
{code}
was: (the same description, with "H1."/"H2." heading markers instead of "h1."/"h2.")
> Refactor time-based filter on resource
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
> Issue Type: Sub-task
> Reporter: Hao Chen
>
> h1. Problem
> Currently, operations such as getJobOutputs/getJobs use a two-pass scan to build the response. The scan always:
> 1. Gets the keys, sorts them, and takes the first and last key (in effect just a prefix filter) with "store.listResources(resourcePath)"
> 2. Re-scans the keys with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // Collections.sort(resources);
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
> The two scans can simply be combined into one direct call:
> {code}
> store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
> {code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
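The single-pass idea above — push the timestamp filter into the folder scan instead of listing keys first and re-scanning a key range — can be sketched as a self-contained toy. The store layout and class names below are illustrative stand-ins, not Kylin's actual ResourceStore API:

```java
import java.util.*;
import java.util.stream.*;

// Toy model of the refactoring: filter a folder's entries by timestamp
// in one pass, rather than listing keys, taking first/last, and then
// re-scanning that key range with a time filter.
public class SingleScanDemo {
    // resource path -> modification timestamp (TreeMap keeps key order)
    static Map<String, Long> store = new TreeMap<>();

    // One-pass equivalent of listResources + getAllResources(range, time):
    static List<String> getAllResources(String folder, long timeStart, long timeEndExclusive) {
        return store.entrySet().stream()
                .filter(e -> e.getKey().startsWith(folder))                        // prefix filter
                .filter(e -> e.getValue() >= timeStart && e.getValue() < timeEndExclusive) // time filter
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        store.put("/execute_output/job1", 100L);
        store.put("/execute_output/job2", 200L);
        store.put("/execute_output/job3", 300L);
        System.out.println(getAllResources("/execute_output/", 100, 250));
        // prints [/execute_output/job1, /execute_output/job2]
    }
}
```

In a real store backed by a range scan (e.g. HBase), the prefix filter maps onto the scan's start/stop row, so the second pass disappears entirely.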
[jira] [Reopened] (KYLIN-1498) cube desc signature not calculated correctly
[ https://issues.apache.org/jira/browse/KYLIN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hongbin ma reopened KYLIN-1498: --- The current cube desc calculateSignature fails to capture new key fields like engineType, storageType, etc. > cube desc signature not calculated correctly > > > Key: KYLIN-1498 > URL: https://issues.apache.org/jira/browse/KYLIN-1498 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0 >Reporter: hongbin ma >Assignee: hongbin ma > > Currently the cube desc's signature does not take the model's signature into account (only the model's name), so even when the model is changed the cube side is unaware. > I'd suggest adding a signature for each model; the cube desc's signature calculation would take that as a parameter, along with the other fields in the cube desc itself. When the model's signature changes, the cube desc's changes too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1122) Kylin support detail data query from fact table
[ https://issues.apache.org/jira/browse/KYLIN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203282#comment-15203282 ] liyang commented on KYLIN-1122: --- Agree! > Kylin support detail data query from fact table > --- > > Key: KYLIN-1122 > URL: https://issues.apache.org/jira/browse/KYLIN-1122 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v1.2 >Reporter: Xiaoyu Wang >Assignee: liyang > Fix For: Backlog > > Attachments: > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(2.x-staging).patch, > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(update-v2-1.x-staging).patch, > > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact-new-impl-under-refactoring-2.x-staging.patch > > > Kylin currently cannot correctly query detail rows from the fact table, like: > select column1,column2,column3 from fact_table > KYLIN-1075 added the "SUM" function on measure columns if defined, but only numeric column types are supported. > I changed some code to support this: > Added a "VALUE" measure function: the value and datatype are the same in the input and output of this function. > To query detail data from the fact table, the *requirements* are: > 1. Configure each non-dimension column with the "VALUE" or "SUM" measure. (Columns without a configured measure function will return NULL.) > 2. The source table must have a unique-value column, configured as a dimension. > If you have a better solution please comment here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KYLIN-1323) Improve performance of converting data to hfile
[ https://issues.apache.org/jira/browse/KYLIN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyang reassigned KYLIN-1323: - Assignee: liyang (was: Yerui Sun) > Improve performance of converting data to hfile > --- > > Key: KYLIN-1323 > URL: https://issues.apache.org/jira/browse/KYLIN-1323 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.2 >Reporter: Yerui Sun >Assignee: liyang > Fix For: v1.4.0, v1.3.0 > > Attachments: KYLIN-1323-1.x-staging.2.patch, > KYLIN-1323-1.x-staging.patch, KYLIN-1323-2.x-staging.2.patch > > > Suppose we get 100GB of data after cuboid building, with a setting of 10GB per region. Currently, 10 split keys are calculated, 10 regions created, and 10 reducers used in the 'convert to hfile' step. > With the optimization, we could calculate 100 (or more) split keys and use all of them in the 'convert to hfile' step, but sample only 10 of them to create regions. The result is still 10 regions, but 100 reducers in the 'convert to hfile' step. Of course, 100 hfiles are then created, with 10 files loaded per region. That should be fine and doesn't affect query performance dramatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
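The sampling step described above — compute many split keys for reducer parallelism, then take every k-th key for region creation — can be sketched as follows. This is an illustrative sketch of the idea only, with hypothetical names, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: 100 reducer split keys give 100 reducers (and 100 hfiles) in
// the 'convert to hfile' step, but sampling every k-th key keeps the
// region count low (e.g. 10 regions, 10 hfiles loaded per region).
public class SplitKeySampling {
    static List<String> sampleRegionKeys(List<String> reducerSplitKeys, int regionsWanted) {
        int step = reducerSplitKeys.size() / regionsWanted;  // assumes size >= regionsWanted
        List<String> regionKeys = new ArrayList<>();
        for (int i = 0; i < reducerSplitKeys.size(); i += step) {
            regionKeys.add(reducerSplitKeys.get(i));
            if (regionKeys.size() == regionsWanted) break;   // cap at the desired region count
        }
        return regionKeys;
    }

    public static void main(String[] args) {
        List<String> reducerKeys = new ArrayList<>();
        for (int i = 0; i < 100; i++) reducerKeys.add(String.format("key%03d", i));
        List<String> regionKeys = sampleRegionKeys(reducerKeys, 10);
        System.out.println(regionKeys.size() + " region keys sampled from "
                + reducerKeys.size() + " reducer split keys");
    }
}
```

The trade-off is the one the ticket notes: more, smaller hfiles per region after bulk load, which HBase tolerates without a dramatic query-performance hit.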
[jira] [Resolved] (KYLIN-1472) Export csv get error when there is a plus sign in the sql
[ https://issues.apache.org/jira/browse/KYLIN-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhong,Jason resolved KYLIN-1472. Resolution: Fixed Fix Version/s: (was: Backlog) v1.5.1 v1.3.1 > Export csv get error when there is a plus sign in the sql > - > > Key: KYLIN-1472 > URL: https://issues.apache.org/jira/browse/KYLIN-1472 > Project: Kylin > Issue Type: Bug > Components: Web >Affects Versions: v1.4.0, v1.2 >Reporter: nichunen >Assignee: Zhong,Jason > Fix For: v1.3.1, v1.5.1 > > Attachments: KYLIN-1472-FOR1X.patch, KYLIN-1472-FOR2X.patch > > > For example, query the sample cube with "select max(price)+min(price) from > KYLIN_SALES" and the result displays in the web window, but clicking the "export" button produces an error message "Encountered \"min\" at line 1, column 19. Was expecting one of...". > This is because the export button visits the API URL directly; in the URL, the plus sign is treated as a blank, so the Kylin server receives the SQL "select max(price) min(price) from KYLIN_SALES", which is invalid. > I will submit two patches, for 1.x and 2.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
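The root cause described above can be demonstrated directly: in URL-encoded form data, a literal '+' decodes to a space, so raw SQL placed in a URL loses its plus sign, while percent-encoding it first preserves it. A minimal sketch of the failure and the standard remedy (not the actual patch, which changes the web client's export call):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

// Shows why "select max(price)+min(price) ..." breaks when pasted into a
// URL unencoded, and why percent-encoding fixes it.
public class PlusSignDemo {
    public static void main(String[] args) throws Exception {
        String sql = "select max(price)+min(price) from KYLIN_SALES";

        // What the server sees if the raw SQL is put in the URL:
        // '+' is decoded as a space, producing invalid SQL.
        System.out.println(URLDecoder.decode(sql, "UTF-8"));
        // -> select max(price) min(price) from KYLIN_SALES

        // Encoding first turns '+' into %2B, so the round-trip is lossless.
        String encoded = URLEncoder.encode(sql, "UTF-8");
        System.out.println(URLDecoder.decode(encoded, "UTF-8"));
        // -> select max(price)+min(price) from KYLIN_SALES
    }
}
```

In a browser client, `encodeURIComponent` plays the same role as `URLEncoder.encode` here.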
[jira] [Reopened] (KYLIN-1323) Improve performance of converting data to hfile
[ https://issues.apache.org/jira/browse/KYLIN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyang reopened KYLIN-1323: --- Reopen for v1.5 MR engine V2 > Improve performance of converting data to hfile > --- > > Key: KYLIN-1323 > URL: https://issues.apache.org/jira/browse/KYLIN-1323 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.2 >Reporter: Yerui Sun >Assignee: Yerui Sun > Fix For: v1.4.0, v1.3.0 > > Attachments: KYLIN-1323-1.x-staging.2.patch, > KYLIN-1323-1.x-staging.patch, KYLIN-1323-2.x-staging.2.patch > > > Suppose we get 100GB of data after cuboid building, with a setting of 10GB per region. Currently, 10 split keys are calculated, 10 regions created, and 10 reducers used in the 'convert to hfile' step. > With the optimization, we could calculate 100 (or more) split keys and use all of them in the 'convert to hfile' step, but sample only 10 of them to create regions. The result is still 10 regions, but 100 reducers in the 'convert to hfile' step. Of course, 100 hfiles are then created, with 10 files loaded per region. That should be fine and doesn't affect query performance dramatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and
[ https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203275#comment-15203275 ] Hao Chen commented on KYLIN-1505: - Cool, thanks Yang! > Combine guava filters with Predicates.and > -- > > Key: KYLIN-1505 > URL: https://issues.apache.org/jira/browse/KYLIN-1505 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > - Combine guava filters with Predicates.and(filters) > - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
[ https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen updated KYLIN-1434: Fix Version/s: v1.5.1 > Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment > > > Key: KYLIN-1434 > URL: https://issues.apache.org/jira/browse/KYLIN-1434 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > The API request for Job Monitor page like: > {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for > fetching only 15 row records (25.1 KB), which is too slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
[ https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen updated KYLIN-1434: Affects Version/s: v1.5.0 v1.3.0 > Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment > > > Key: KYLIN-1434 > URL: https://issues.apache.org/jira/browse/KYLIN-1434 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > The API request for Job Monitor page like: > {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for > fetching only 15 row records (25.1 KB), which is too slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
[ https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen resolved KYLIN-1434. - Resolution: Resolved The response should be sped up by about 3x with the changes in KYLIN-1504, KYLIN-1505, KYLIN-1506: Original: {code} Response time: 6+ = 2.5 (scan job keys + key resorting + scan job values) + 2.5 (scan output keys + key resorting + scan output values) + 1 (duplicated filtering) {code} Now: {code} Response time: 2+ = 1 (scan jobs) + 1 (scan outputs) {code} So treating this ticket as resolved. > Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment > > > Key: KYLIN-1434 > URL: https://issues.apache.org/jira/browse/KYLIN-1434 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.4.0 >Reporter: Hao Chen >Assignee: Hao Chen > > The API request for Job Monitor page like: > {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for > fetching only 15 row records (25.1 KB), which is too slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
[ https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203269#comment-15203269 ] Hao Chen commented on KYLIN-1434: - [~liyang.g...@gmail.com] cool, thanks for the rapid action. > Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment > > > Key: KYLIN-1434 > URL: https://issues.apache.org/jira/browse/KYLIN-1434 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.4.0 >Reporter: Hao Chen >Assignee: Hao Chen > > The API request for Job Monitor page like: > {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for > fetching only 15 row records (25.1 KB), which is too slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH
[ https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyang reassigned KYLIN-1507: - Assignee: liyang > Couldn't find hive dependency jar on some platform like CDH > --- > > Key: KYLIN-1507 > URL: https://issues.apache.org/jira/browse/KYLIN-1507 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v1.5.0 >Reporter: Shaofeng SHI >Assignee: liyang > Fix For: v1.5.1 > > > Reported by user ianzeng in u...@kylin.apache.org mailing list: > I has installed kylin 1.5 on redhead 6.3. I try build sample cube. But > got error msg as follow: > 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; > Ignoring. > 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: > mapreduce.job.end-notification.max.attempts; Ignoring. > 2016-03-18 18:18:43,509 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id > 2016-03-18 18:18:43,921 INFO [main] > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output > Committer Algorithm version is 1 > 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task: Using > ResourceCalculatorProcessTree : [ ] > 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: > java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047) > at > org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Caused by: java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953) > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045) > ... 8 more > And -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH
[ https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203265#comment-15203265 ] liyang commented on KYLIN-1507: --- Attempted a fix, commit e4c795cf7e7e7bd62d7d615a7c327fe244c1dbcc Didn't verify on CDH however. Please reopen the JIRA if the problem still exists. > Couldn't find hive dependency jar on some platform like CDH > --- > > Key: KYLIN-1507 > URL: https://issues.apache.org/jira/browse/KYLIN-1507 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v1.5.0 >Reporter: Shaofeng SHI > Fix For: v1.5.1 > > > Reported by user ianzeng in u...@kylin.apache.org mailing list: > I has installed kylin 1.5 on redhead 6.3. I try build sample cube. But > got error msg as follow: > 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; > Ignoring. > 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: > mapreduce.job.end-notification.max.attempts; Ignoring. > 2016-03-18 18:18:43,509 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id > 2016-03-18 18:18:43,921 INFO [main] > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output > Committer Algorithm version is 1 > 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task: Using > ResourceCalculatorProcessTree : [ ] > 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: > java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047) > at > org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Caused by: java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953) > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045) > ... 8 more > And -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH
[ https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyang resolved KYLIN-1507. --- Resolution: Fixed Fix Version/s: v1.5.1 > Couldn't find hive dependency jar on some platform like CDH > --- > > Key: KYLIN-1507 > URL: https://issues.apache.org/jira/browse/KYLIN-1507 > Project: Kylin > Issue Type: Bug > Components: General >Affects Versions: v1.5.0 >Reporter: Shaofeng SHI >Assignee: liyang > Fix For: v1.5.1 > > > Reported by user ianzeng in u...@kylin.apache.org mailing list: > I has installed kylin 1.5 on redhead 6.3. I try build sample cube. But > got error msg as follow: > 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: hadoop.ssl.server.conf; > Ignoring. > 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: > job.xml:an attempt to override final parameter: > mapreduce.job.end-notification.max.attempts; Ignoring. > 2016-03-18 18:18:43,509 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id > 2016-03-18 18:18:43,921 INFO [main] > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output > Committer Algorithm version is 1 > 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task: Using > ResourceCalculatorProcessTree : [ ] > 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.RuntimeException: > java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047) > at > org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Caused by: java.lang.ClassNotFoundException: Class > org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found > at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953) > at > org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045) > ... 8 more > And -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203254#comment-15203254 ] liyang edited comment on KYLIN-1506 at 3/20/16 11:52 AM: - I refactored the interface to {code}getAllResources(String folderPath, long timeStart, long timeEndExclusive, Class clazz, Serializer serializer){code} and dropped the {{rangeStart}} / {{rangeEnd}} interface. ResourceStore has a directory model, not a k-v model, so the {{rangeStart}} / {{rangeEnd}} interface is not appropriate and is awkward to implement on {{FileResourceStore}}. The modification is based on Hao's work. Thanks! was (Author: liyang.g...@gmail.com): I refactored the interface to {code}getAllResources(String folderPath, Class clazz, Serializer serializer){code} and dropped the {{rangeStart}} / {{rangeEnd}} interface. ResourceStore has a directory model, not a k-v model, so the {{rangeStart}} / {{rangeEnd}} interface is not appropriate and is awkward to implement on {{FileResourceStore}}. The modification is based on Hao's work. Thanks! > Refactor resource interface for timeseries-based data like jobs to much > better performance > -- > > Key: KYLIN-1506 > URL: https://issues.apache.org/jira/browse/KYLIN-1506 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: patch > > h1. Problem > Currently, operations such as getJobOutputs/getJobs use a two-pass scan to build the response. The scan always: > 1. Gets all keys, sorts them, and takes the first and last key (in effect just a prefix filter) with "store.listResources(resourcePath)" > 2. Re-scans the keys with a timestamp filter: > "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)" > {code} > public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException { > try { > NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT); > if (resources == null || resources.isEmpty()) { > return Collections.emptyList(); > } > // Collections.sort(resources); > String rangeStart = resources.first(); > String rangeEnd = resources.last(); > return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER); > } catch (IOException e) { > logger.error("error get all Jobs:", e); > throw new PersistentException(e); > } > } > {code} > h2. Solution > In fact we can simply combine the two passes into one: > {code} > store.getAllResources(resourcePath, startTime, endTime, Class, Serializer) > store.getAllResources(resourcePath, Class, Serializer) > {code} > For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis)" refactored as follows: > {code} > public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException { > try { > return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER); > } catch (IOException e) { > logger.error("error get all Jobs:", e); > throw new PersistentException(e); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
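The one-pass scan described in the solution above can be sketched over a plain sorted map. This is an illustrative stand-in for a ResourceStore folder, not Kylin's actual implementation; the class and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a ResourceStore folder: resource path -> last-modified timestamp.
class OnePassScanSketch {
    static final TreeMap<String, Long> STORE = new TreeMap<>();

    // Walk the folder once, applying the timestamp filter inline, instead of
    // first listing all keys and then re-scanning a [first, last] key range.
    static List<String> getAllResources(String folderPath, long timeStart, long timeEndExclusive) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : STORE.tailMap(folderPath).entrySet()) {
            if (!e.getKey().startsWith(folderPath))
                break; // keys are sorted: once past the prefix, nothing else matches
            long ts = e.getValue();
            if (ts >= timeStart && ts < timeEndExclusive)
                result.add(e.getKey());
        }
        return result;
    }
}
```

Because the backing map is sorted, the prefix match and the timestamp filter happen in a single traversal, which is the essence of the refactoring.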
[jira] [Commented] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
[ https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203256#comment-15203256 ] liyang commented on KYLIN-1434: --- KYLIN-1504, KYLIN-1505, KYLIN-1506 are processed. > Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment > > > Key: KYLIN-1434 > URL: https://issues.apache.org/jira/browse/KYLIN-1434 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.4.0 >Reporter: Hao Chen >Assignee: Hao Chen > > The API request for Job Monitor page like: > {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for > fetching only 15 row records (25.1 KB), which is too slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203254#comment-15203254 ] liyang commented on KYLIN-1506: --- I refactored the interface to {code}getAllResources(String folderPath, Class clazz, Serializer serializer){code} and dropped the {{rangeStart}} / {{rangeEnd}} interface. ResourceStore has a directory model, not a k-v model, so the {{rangeStart}} / {{rangeEnd}} interface is not appropriate and is awkward to implement on {{FileResourceStore}}. The modification is based on Hao's work. Thanks! > Refactor resource interface for timeseries-based data like jobs to much > better performance > -- > > Key: KYLIN-1506 > URL: https://issues.apache.org/jira/browse/KYLIN-1506 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and
[ https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203255#comment-15203255 ] liyang commented on KYLIN-1505: --- Merged and committed. Thanks Hao! > Combine guava filters with Predicates.and > -- > > Key: KYLIN-1505 > URL: https://issues.apache.org/jira/browse/KYLIN-1505 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > - Combine guava filters with Predicates.and(filters) > - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
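The first bullet of KYLIN-1505 — folding a chain of filters into a single combined predicate — can be sketched with the JDK's java.util.function.Predicate, used here as a stdlib stand-in for Guava's Predicates.and; the class and method names are illustrative, not Kylin's:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CombinedFilterSketch {
    // Fold several filters into one predicate so the data is traversed once,
    // rather than filtering the collection repeatedly for each condition.
    static <T> Predicate<T> and(List<Predicate<T>> filters) {
        Predicate<T> combined = t -> true;
        for (Predicate<T> f : filters)
            combined = combined.and(f);
        return combined;
    }

    static List<String> filter(List<String> items, List<Predicate<String>> filters) {
        return items.stream().filter(and(filters)).collect(Collectors.toList());
    }
}
```

With Guava the equivalent call would be Predicates.and(filters) passed to a single Iterables.filter pass; the design point is identical.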
[jira] [Commented] (KYLIN-1122) Kylin support detail data query from fact table
[ https://issues.apache.org/jira/browse/KYLIN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203245#comment-15203245 ] Xiaoyu Wang commented on KYLIN-1122: For the {{SQLDigest.isRawQuery}} property, I think deriving it from both {{groupbyColumns}} and {{metricColumns}} would be correct. If it is derived from {{groupbyColumns}} only, the sql "select sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact" is identified as a "RawQuery", and for a "RawQuery" it adds a RAW agg function on every column that has a RAW measure defined and removes the SUM agg on the column hacked from {{OLAPEnumerator}}. That is not expected! [~liyang.g...@gmail.com] > Kylin support detail data query from fact table > --- > > Key: KYLIN-1122 > URL: https://issues.apache.org/jira/browse/KYLIN-1122 > Project: Kylin > Issue Type: New Feature > Components: Query Engine >Affects Versions: v1.2 >Reporter: Xiaoyu Wang >Assignee: liyang > Fix For: Backlog > > Attachments: > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(2.x-staging).patch, > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(update-v2-1.x-staging).patch, > > 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact-new-impl-under-refactoring-2.x-staging.patch > > > Kylin currently does not support querying correct detail rows from the fact table, such as: > select column1,column2,column3 from fact_table > The jira KYLIN-1075 adds the "SUM" function on a measure column if defined, but only numeric column types are supported. > I changed some code to support this issue: > Added a "VALUE" measure function: the value and datatype are the same in the input and output of this function. > If you want to query detail data from the fact table > *require*: > 1. Configure each column that is not a dimension as a "VALUE" or "SUM" measure. (A column with no measure function configured will return NULL values) > 2. The source table must have a unique-value column, configured as a dimension. > If you have a better solution, please comment here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
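The comment's point about deriving {{isRawQuery}} from both column sets can be reduced to a one-line rule. This is a minimal illustrative sketch, not Kylin's actual SQLDigest code; the class and parameter names are hypothetical:

```java
import java.util.Collection;

class SqlDigestSketch {
    // A query is a "raw" (detail) query only when it has neither GROUP BY
    // columns nor aggregation metric columns. Deriving the flag from
    // groupbyColumns alone would misclassify
    //   select sum(price) as GMV, count(1) as TRANS_CNT from test_kylin_fact
    // (no GROUP BY, but clearly an aggregate query) as a raw query.
    static boolean isRawQuery(Collection<String> groupbyColumns, Collection<String> metricColumns) {
        return groupbyColumns.isEmpty() && metricColumns.isEmpty();
    }
}
```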
[jira] [Resolved] (KYLIN-1505) Combine guava filters with Predicates.and
[ https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen resolved KYLIN-1505. - Resolution: Resolved Patches are accepted and merged together with https://issues.apache.org/jira/browse/KYLIN-1506 into codebase at https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761 > Combine guava filters with Predicates.and > -- > > Key: KYLIN-1505 > URL: https://issues.apache.org/jira/browse/KYLIN-1505 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > - Combine guava filters with Predicates.and(filters) > - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen resolved KYLIN-1506. - Resolution: Resolved Patches are accepted and merged into code base at https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761 > Refactor resource interface for timeseries-based data like jobs to much > better performance > -- > > Key: KYLIN-1506 > URL: https://issues.apache.org/jira/browse/KYLIN-1506 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen reopened KYLIN-1506: - > Refactor resource interface for timeseries-based data like jobs to much > better performance > -- > > Key: KYLIN-1506 > URL: https://issues.apache.org/jira/browse/KYLIN-1506 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen resolved KYLIN-1506. - Resolution: Fixed Patches are accepted and merged into code base at https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761 > Refactor resource interface for timeseries-based data like jobs to much > better performance > -- > > Key: KYLIN-1506 > URL: https://issues.apache.org/jira/browse/KYLIN-1506 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1505) Combine guava filters with Predicates.and
[ https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen updated KYLIN-1505: Summary: Combine guava filters with Predicates.and (was: Combine guava filters with Predicates.and and combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists) > Combine guava filters with Predicates.and > -- > > Key: KYLIN-1505 > URL: https://issues.apache.org/jira/browse/KYLIN-1505 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > - Combine guava filters with Predicates.and(filters) > - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and and combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists
[ https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203152#comment-15203152 ] Hao Chen commented on KYLIN-1505: - Will not combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists; see http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase > Combine guava filters with Predicates.and and combine hbase RowKeyOnlyFilter > and PrefixFilter with FilterLists > -- > > Key: KYLIN-1505 > URL: https://issues.apache.org/jira/browse/KYLIN-1505 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Fix For: v1.5.1 > > > - Combine guava filters with Predicates.and(filters) > - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
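The linked discussion's point — that a rowkey range scan over sorted keys returns the same rows as a PrefixFilter while letting the scan stop early — can be sketched over a plain sorted map. This is a stand-in for an hbase table, not HBase API code, and the stop-key trick assumes the prefix's last character is below the maximum byte value:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PrefixScanSketch {
    // Range scan: jump to the first key >= prefix and stop at the smallest
    // key that is no longer under the prefix. Equivalent in result to a
    // PrefixFilter, but never visits rows outside the [start, stop) range.
    static List<String> rangeScan(TreeMap<String, String> table, String prefix) {
        // The stop key is the prefix with its last char incremented,
        // e.g. "/jobs/" -> "/jobs0" (assumes the last char is not at max value).
        String stop = prefix.substring(0, prefix.length() - 1)
                + (char) (prefix.charAt(prefix.length() - 1) + 1);
        return new ArrayList<>(table.subMap(prefix, stop).keySet());
    }
}
```

In HBase terms this corresponds to setting a start and stop row on the Scan instead of attaching a PrefixFilter, which is why the commit was dropped without losing correctness.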
[jira] [Resolved] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side
[ https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hao Chen resolved KYLIN-1504. - Resolution: Resolved Resolved the ticket; the pull request https://github.com/apache/kylin/pull/29 has been merged into the master branch at https://github.com/apache/kylin/commit/801fb83b22e6a737ca9c43155a4860951bf370a2 > Use NavigableSet to store rowkey and use prefix filter to check resource path > prefix instead String comparison on tomcat side > - > > Key: KYLIN-1504 > URL: https://issues.apache.org/jira/browse/KYLIN-1504 > Project: Kylin > Issue Type: Improvement > Components: Metadata, REST Service >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: jobs, metadata > Fix For: v1.5.1 > > > - Use a NavigableSet instead of an ArrayList to hold row keys in naturally ordered, de-duplicated form, instead of repeatedly calling `Collections.sort` or doing existence checks in the business-logic layer; because row keys are already sorted in hbase, the change adds no extra computational cost. > - Verify the prefix at the hbase region level using a prefix filter instead of comparing Strings on the client side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
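The first bullet of KYLIN-1504 can be sketched as follows: a TreeSet keeps row keys unique and in natural order as they are inserted, so no Collections.sort pass or duplicate check is needed afterwards. This is an illustrative stand-in, not the actual Kylin code:

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class SortedKeysSketch {
    // Row keys arrive from hbase already sorted; a TreeSet preserves that
    // order and de-duplicates on add, so first()/last() are cheap lookups
    // with no explicit sort step in the business-logic layer.
    static NavigableSet<String> collect(String... rowKeys) {
        NavigableSet<String> keys = new TreeSet<>();
        for (String k : rowKeys)
            keys.add(k); // duplicates are silently dropped
        return keys;
    }
}
```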
[jira] [Commented] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side
[ https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203139#comment-15203139 ] Hao Chen commented on KYLIN-1504: - Thanks [~liyang.g...@gmail.com] > Use NavigableSet to store rowkey and use prefix filter to check resource path > prefix instead String comparison on tomcat side > - > > Key: KYLIN-1504 > URL: https://issues.apache.org/jira/browse/KYLIN-1504 > Project: Kylin > Issue Type: Improvement > Components: Metadata, REST Service >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: jobs, metadata > Fix For: v1.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side
[ https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203137#comment-15203137 ] liyang edited comment on KYLIN-1504 at 3/20/16 7:23 AM: Thanks Hao! Merged #29 with some revision. Didn't include the PrefixFilter commit because a) a range scan performs equally well[1]; b) we want to keep the KeyOnlyFilter for reduced traffic. [1] http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase was (Author: liyang.g...@gmail.com): Thanks Hao! Merged #29 with some revision. Didn't include the PrefixFilter commit because a) a range scan performs equally well[1]; b) we want to keep the KeyOnlyFilter for reduced traffic. > Use NavigableSet to store rowkey and use prefix filter to check resource path > prefix instead String comparison on tomcat side > - > > Key: KYLIN-1504 > URL: https://issues.apache.org/jira/browse/KYLIN-1504 > Project: Kylin > Issue Type: Improvement > Components: Metadata, REST Service >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: jobs, metadata > Fix For: v1.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side
[ https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203137#comment-15203137 ] liyang commented on KYLIN-1504: --- Thanks Hao! Merged #29 with some revision. Didn't include the PrefixFilter commit because a) a range scan performs equally well[1]; b) we want to keep the KeyOnlyFilter for reduced traffic. > Use NavigableSet to store rowkey and use prefix filter to check resource path > prefix instead String comparison on tomcat side > - > > Key: KYLIN-1504 > URL: https://issues.apache.org/jira/browse/KYLIN-1504 > Project: Kylin > Issue Type: Improvement > Components: Metadata, REST Service >Affects Versions: v1.5.0, v1.4.0, v1.3.0 >Reporter: Hao Chen >Assignee: Hao Chen > Labels: jobs, metadata > Fix For: v1.5.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1249) A client library to help automatic cube
[ https://issues.apache.org/jira/browse/KYLIN-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nichunen updated KYLIN-1249: Attachment: (was: KYLIN-1472-Package.patch) > A client library to help automatic cube > --- > > Key: KYLIN-1249 > URL: https://issues.apache.org/jira/browse/KYLIN-1249 > Project: Kylin > Issue Type: New Feature > Components: Tools, Build and Test >Affects Versions: v1.2 >Reporter: nichunen >Assignee: hongbin ma > Fix For: v1.3.1 > > Attachments: KYLIN-1249-DOC.patch, KYLIN-1249.patch > > > As there is a strong demand for a client library to help automatic cube > building/refreshing, we will contribute our kylin client tool to kylin. > The tool is based on kylin rest apis, and is developed with python. As > discussed with Hongbin, we will do some simplification work. The main > function of the tool will be job creation, job status check, job kill, job > scheduling, job failover. Also, we will reserve some other features like > simple cube definition and cube batch create for your choice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)