[jira] [Commented] (KYLIN-1441) Display time column as partition column

2016-03-20 Thread qianqiaoneng (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203810#comment-15203810
 ] 

qianqiaoneng commented on KYLIN-1441:
-

#1. What I mean is that the cube build confirmation dialog has an "End Date(Exclude)" 
field that the user needs to specify when triggering a build from the GUI. I think this 
input needs to be changed to "End Date Time (Exclude)" when Kylin supports date and 
time as the partition column(s).

#2. Yes, you are right.

> Display time column as partition column
> ---
>
> Key: KYLIN-1441
> URL: https://issues.apache.org/jira/browse/KYLIN-1441
> Project: Kylin
>  Issue Type: Task
>  Components: REST Service, Web 
>Reporter: Dipesh
>Assignee: Dipesh
> Fix For: Backlog
>
> Attachments: 
> 0001-KYLIN-1441-Display-time-column-as-partition-column.patch
>
>
> There is a requirement to support a time column as the partition column when 
> creating a cube in the cube designer. Display the time column, if present, as a 
> possible choice for the partition column.
> Backend changes for using a time column as a partition column are covered in 
> https://issues.apache.org/jira/browse/KYLIN-1427



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KYLIN-1431) Define stream config at table level, instead of on cube level

2016-03-20 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-1431.
-
Resolution: Fixed

> Define stream config at table level, instead of on cube level
> -
>
> Key: KYLIN-1431
> URL: https://issues.apache.org/jira/browse/KYLIN-1431
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, streaming, Web 
>Affects Versions: v1.5.0, v1.4.0
>Reporter: Shaofeng SHI
>Assignee: Zhong,Jason
> Fix For: v1.5.0
>
>
> In 2.0 streaming, the user needs to enter the Kafka information, such as the 
> topic and the broker list, when creating the cube. This information should be 
> independent of the cube so that it can be reused across cubes that share the 
> same table.
> The expected design is to define the Kafka config when adding the table.





[jira] [Closed] (KYLIN-1431) Define stream config at table level, instead of on cube level

2016-03-20 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI closed KYLIN-1431.
---

> Define stream config at table level, instead of on cube level
> -
>
> Key: KYLIN-1431
> URL: https://issues.apache.org/jira/browse/KYLIN-1431
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, streaming, Web 
>Affects Versions: v1.5.0, v1.4.0
>Reporter: Shaofeng SHI
>Assignee: Zhong,Jason
> Fix For: v1.5.0
>
>
> In 2.0 streaming, the user needs to enter the Kafka information, such as the 
> topic and the broker list, when creating the cube. This information should be 
> independent of the cube so that it can be reused across cubes that share the 
> same table.
> The expected design is to define the Kafka config when adding the table.





[jira] [Updated] (KYLIN-1431) Define stream config at table level, instead of on cube level

2016-03-20 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-1431:

Fix Version/s: v1.5.0

> Define stream config at table level, instead of on cube level
> -
>
> Key: KYLIN-1431
> URL: https://issues.apache.org/jira/browse/KYLIN-1431
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, streaming, Web 
>Affects Versions: v1.5.0, v1.4.0
>Reporter: Shaofeng SHI
>Assignee: Zhong,Jason
> Fix For: v1.5.0
>
>
> In 2.0 streaming, the user needs to enter the Kafka information, such as the 
> topic and the broker list, when creating the cube. This information should be 
> independent of the cube so that it can be reused across cubes that share the 
> same table.
> The expected design is to define the Kafka config when adding the table.





[jira] [Created] (KYLIN-1511) model SelectedColumnMeta cannot be parsed by Jackson

2016-03-20 Thread Lola Liu (JIRA)
Lola Liu created KYLIN-1511:
---

 Summary: model SelectedColumnMeta cannot be parsed by Jackson
 Key: KYLIN-1511
 URL: https://issues.apache.org/jira/browse/KYLIN-1511
 Project: Kylin
  Issue Type: Bug
  Components: REST Service
Affects Versions: v1.4.0
Reporter: Lola Liu
Assignee: liyang


The class SelectedColumnMeta is immutable and doesn’t have a default constructor.

When deserializing a JSON string to SelectedColumnMeta, the exception 
“JsonMappingException: No suitable constructor found” is thrown.






[jira] [Commented] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View

2016-03-20 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203739#comment-15203739
 ] 

hongbin ma commented on KYLIN-1510:
---

My mistake, I'll close this issue. Thanks, [~sunyerui].

> Need to build a cube which has LOOKUP table referring Hive View
> ---
>
> Key: KYLIN-1510
> URL: https://issues.apache.org/jira/browse/KYLIN-1510
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>  Labels: newbie
>
> As this issue has been raised more than once, I think we should treat it as a 
> common issue and fix it soon.





[jira] [Resolved] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View

2016-03-20 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma resolved KYLIN-1510.
---
Resolution: Duplicate

> Need to build a cube which has LOOKUP table referring Hive View
> ---
>
> Key: KYLIN-1510
> URL: https://issues.apache.org/jira/browse/KYLIN-1510
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>  Labels: newbie
>
> As this issue has been raised more than once, I think we should treat it as a 
> common issue and fix it soon.





[jira] [Commented] (KYLIN-1510) Need to build a cube which has LOOKUP table referring Hive View

2016-03-20 Thread Yerui Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203685#comment-15203685
 ] 

Yerui Sun commented on KYLIN-1510:
--

This seems to duplicate KYLIN-1077; [~mahongbin], please take note.

> Need to build a cube which has LOOKUP table referring Hive View
> ---
>
> Key: KYLIN-1510
> URL: https://issues.apache.org/jira/browse/KYLIN-1510
> Project: Kylin
>  Issue Type: Bug
>Reporter: hongbin ma
>Assignee: hongbin ma
>  Labels: newbie
>
> As this issue has been raised more than once, I think we should treat it as a 
> common issue and fix it soon.





[jira] [Commented] (KYLIN-1238) Docker version for Kylin 1.0 or latest stable release

2016-03-20 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203307#comment-15203307
 ] 

Gabor Liptak commented on KYLIN-1238:
-

Kylin 1.2 was published at https://hub.docker.com/r/sequenceiq/kylin/tags/

> Docker version for Kylin 1.0 or latest stable release
> -
>
> Key: KYLIN-1238
> URL: https://issues.apache.org/jira/browse/KYLIN-1238
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Santosh
>Assignee: Shaofeng SHI
>
> The current Docker container is Kylin 0.7, which is an old version. The latest 
> stable Kylin release should be included in the Docker container and made 
> available on the Kylin website.





[jira] [Updated] (KYLIN-1506) Refactor time-based filter on resource

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen updated KYLIN-1506:

Description: 
h1. Problem

Currently, operations such as getJobOutputs/getJobs scan the store twice to 
build the response. For example, the scan always:
1. Gets the keys, sorts them, and takes the first and last key (which is in fact 
just a prefix-filter lookup) with "store.listResources(resourcePath)"
2. Re-scans the keys with a timestamp filter: 
"store.getAllResources(startKey,endKey,startTime, endTime, Class, Serializer)"

{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        // First scan: list every key under the job-output root.
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
        if (resources == null || resources.isEmpty()) {
            return Collections.emptyList();
        }
        // Collections.sort(resources);
        String rangeStart = resources.first();
        String rangeEnd = resources.last();
        // Second scan: re-read the same key range, this time with the timestamp filter.
        return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}

h2. Solution
In fact, we could simply combine the two scans into one:
{code}
store.getAllResources(resourcePath,startTime, endTime, Class, Serializer)
{code}
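
To illustrate the idea, here is a minimal in-memory sketch comparing the two-pass lookup with a single filtered scan. ToyResourceStore, its TreeMap layout, the scans counter, and the exclusive end bound are all assumptions made for illustration; this is not Kylin's ResourceStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a resource store; not Kylin's real ResourceStore.
final class ToyResourceStore {
    // Maps resource path -> last-modified timestamp in milliseconds.
    private final TreeMap<String, Long> resources = new TreeMap<>();
    // Counts passes over the data so the two approaches can be compared.
    int scans = 0;

    void put(String path, long timestamp) {
        resources.put(path, timestamp);
    }

    // Pass 1 of the old approach: list every key under a folder prefix.
    NavigableSet<String> listResources(String folderPath) {
        scans++;
        NavigableSet<String> keys = new TreeSet<>();
        for (String key : resources.keySet()) {
            if (key.startsWith(folderPath)) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Pass 2 of the old approach: scan a key range with a timestamp filter.
    List<String> getAllResources(String startKey, String endKey, long timeStart, long timeEndExclusive) {
        scans++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : resources.subMap(startKey, true, endKey, true).entrySet()) {
            if (e.getValue() >= timeStart && e.getValue() < timeEndExclusive) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Proposed approach: one scan applying the prefix and time filters together.
    List<String> getAllResources(String folderPath, long timeStart, long timeEndExclusive) {
        scans++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : resources.entrySet()) {
            long ts = e.getValue();
            if (e.getKey().startsWith(folderPath) && ts >= timeStart && ts < timeEndExclusive) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyResourceStore store = new ToyResourceStore();
        store.put("/execute_output/job1", 100L);
        store.put("/execute_output/job2", 200L);
        NavigableSet<String> keys = store.listResources("/execute_output/");
        System.out.println(store.getAllResources(keys.first(), keys.last(), 150L, 300L));
        System.out.println(store.getAllResources("/execute_output/", 150L, 300L));
        System.out.println("scans: " + store.scans);
    }
}
```

Both paths return the same resources, but the combined call touches the store once instead of twice. In a real store backed by HBase or files, the cost of each pass is what matters.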

  was:
H1. Problem

Currently, operations such as getJobOutputs/getJobs scan the store twice to 
build the response. For example, the scan always:
1. Gets the keys, sorts them, and takes the first and last key (which is in fact 
just a prefix-filter lookup) with "store.listResources(resourcePath)"
2. Re-scans the keys with a timestamp filter: 
"store.getAllResources(startKey,endKey,startTime, endTime, Class, Serializer)"

{code}
public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
    try {
        // First scan: list every key under the job-output root.
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
        if (resources == null || resources.isEmpty()) {
            return Collections.emptyList();
        }
        // Collections.sort(resources);
        String rangeStart = resources.first();
        String rangeEnd = resources.last();
        // Second scan: re-read the same key range, this time with the timestamp filter.
        return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
{code}

H2. Solution
In fact, we could simply combine the two scans into one:
{code}
store.getAllResources(resourcePath,startTime, endTime, Class, Serializer)
{code}


> Refactor time-based filter on resource
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: Hao Chen
>





[jira] [Reopened] (KYLIN-1498) cube desc signature not calculated correctly

2016-03-20 Thread hongbin ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hongbin ma reopened KYLIN-1498:
---

The current cube desc calculateSignature fails to capture new key fields such as 
engineType and storage type.

> cube desc signature not calculated correctly
> 
>
> Key: KYLIN-1498
> URL: https://issues.apache.org/jira/browse/KYLIN-1498
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> Currently the cube desc's signature does not take the model's signature into 
> account (it only takes the model's name), so even when the model is changed, 
> the cube side is unaware.
> I'd suggest adding a signature for each model; the cube desc's signature 
> calculation will then take it as a parameter along with the other fields in the 
> cube desc itself. When the model's signature changes, the cube desc's changes too.
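
The suggestion above can be sketched as follows. This is only an illustration: the class name, the fields fed into each signature, and the choice of SHA-256 are assumptions, not Kylin's actual calculateSignature.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; field names and the hash algorithm are assumptions.
final class SignatureSketch {
    // Hashes the given parts, each followed by a '|' separator, into a hex string.
    static String sign(String... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '|');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Model signature: derived from the model's own content.
    static String modelSignature(String factTable, String partitionColumn) {
        return sign(factTable, partitionColumn);
    }

    // Cube desc signature: takes the model signature as an input, not only the model name.
    static String cubeSignature(String modelName, String modelSignature, String dimensions, String measures) {
        return sign(modelName, modelSignature, dimensions, measures);
    }

    public static void main(String[] args) {
        String before = modelSignature("KYLIN_SALES", "PART_DT");
        String after = modelSignature("KYLIN_SALES", "OTHER_DT");
        // Same model name and cube fields, different model content.
        System.out.println(cubeSignature("model", before, "dims", "measures"));
        System.out.println(cubeSignature("model", after, "dims", "measures"));
    }
}
```

Because the model signature is one of the hashed inputs, a change to the model changes the cube desc signature even when the model name and the cube's own fields stay the same.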





[jira] [Commented] (KYLIN-1122) Kylin support detail data query from fact table

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203282#comment-15203282
 ] 

liyang commented on KYLIN-1122:
---

Agree!

> Kylin support detail data query from fact table
> ---
>
> Key: KYLIN-1122
> URL: https://issues.apache.org/jira/browse/KYLIN-1122
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v1.2
>Reporter: Xiaoyu Wang
>Assignee: liyang
> Fix For: Backlog
>
> Attachments: 
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(2.x-staging).patch, 
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(update-v2-1.x-staging).patch,
>  
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact-new-impl-under-refactoring-2.x-staging.patch
>
>
> Kylin currently does not return correct detail rows for queries against the 
> fact table such as:
> select column1,column2,column3 from fact_table
> KYLIN-1075 added the "SUM" function on the measure column, if defined,
> but only numeric column types are supported.
> I changed some code to address this:
> Add a "VALUE" measure function: its output has the same value and datatype as 
> its input.
> To query detail data from the fact table, the following is *required*:
> 1. Configure each column that is not a dimension as a "VALUE" or "SUM" measure. 
> (A column with no measure function configured returns NULL.)
> 2. The source table must have a column with unique values, configured as a 
> dimension.
> If you have a better solution, please comment here.





[jira] [Assigned] (KYLIN-1323) Improve performance of converting data to hfile

2016-03-20 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang reassigned KYLIN-1323:
-

Assignee: liyang  (was: Yerui Sun)

> Improve performance of converting data to hfile
> ---
>
> Key: KYLIN-1323
> URL: https://issues.apache.org/jira/browse/KYLIN-1323
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.2
>Reporter: Yerui Sun
>Assignee: liyang
> Fix For: v1.4.0, v1.3.0
>
> Attachments: KYLIN-1323-1.x-staging.2.patch, 
> KYLIN-1323-1.x-staging.patch, KYLIN-1323-2.x-staging.2.patch
>
>
> Suppose we get 100GB of data after cuboid building, with a setting of 10GB per 
> region. Currently, 10 split keys are calculated, 10 regions are created, and 10 
> reducers are used in the ‘convert to hfile’ step. 
> With the optimization, we could calculate 100 (or more) split keys and use all 
> of them in the ‘convert to hfile’ step, but sample 10 of those keys to create 
> regions. The result is still 10 regions created, but 100 reducers used in the 
> ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 10 
> files are loaded per region. That should be fine and does not affect query 
> performance dramatically.
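
The sampling step described above can be sketched like this. SplitKeySampler and its fixed-step selection are assumptions for illustration; the attached patches may pick the region keys differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sampling idea; not Kylin's actual job code.
final class SplitKeySampler {
    // Keeps every step-th key (the step-th, 2*step-th, ...) from the fine-grained list.
    static <T> List<T> sample(List<T> fineKeys, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<T> sampled = new ArrayList<>();
        for (int i = step - 1; i < fineKeys.size(); i += step) {
            sampled.add(fineKeys.get(i));
        }
        return sampled;
    }

    public static void main(String[] args) {
        // 100 fine-grained split keys drive the reducers.
        List<Integer> reducerKeys = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            reducerKeys.add(i);
        }
        // Every tenth key is kept for region creation.
        List<Integer> regionKeys = sample(reducerKeys, 10);
        System.out.println(reducerKeys.size() + " reducer keys, " + regionKeys.size() + " region keys");
    }
}
```

Since the region keys are a subset of the reducer keys, every hfile falls entirely inside one region, which is what lets several files be loaded per region.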





[jira] [Resolved] (KYLIN-1472) Export csv get error when there is a plus sign in the sql

2016-03-20 Thread Zhong,Jason (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhong,Jason resolved KYLIN-1472.

   Resolution: Fixed
Fix Version/s: (was: Backlog)
   v1.5.1
   v1.3.1

> Export csv get error when there is a plus sign in the sql
> -
>
> Key: KYLIN-1472
> URL: https://issues.apache.org/jira/browse/KYLIN-1472
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v1.4.0, v1.2
>Reporter: nichunen
>Assignee: Zhong,Jason
> Fix For: v1.3.1, v1.5.1
>
> Attachments: KYLIN-1472-FOR1X.patch, KYLIN-1472-FOR2X.patch
>
>
> For example, querying the sample cube with "select max(price)+min(price) from 
> KYLIN_SALES" returns the result in the web window. But clicking the "export" 
> button gives the error message "Encountered \"min\" at line 1, column 19. Was 
> expecting one of...".
> This is because the export button visits the API URL directly. In the URL, the 
> plus sign is treated as a blank, so the Kylin server receives the SQL "select 
> max(price) min(price) from KYLIN_SALES", which is invalid.
> I will submit two patches, for 1.x and 2.x.
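
The behaviour described above can be reproduced with the JDK's own URL codec. This is a sketch of the general mechanism only; PlusSignDemo is a hypothetical class, and the attached patches (which fix the web side) are not shown here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; shows form-style URL decoding, not Kylin's code.
final class PlusSignDemo {
    // What a server typically does with a query-string value: '+' decodes to a space.
    static String decode(String rawQueryValue) {
        try {
            return URLDecoder.decode(rawQueryValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encoding the value before putting it in the URL turns '+' into %2B.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sql = "select max(price)+min(price) from KYLIN_SALES";
        // Unencoded: the plus sign is lost on the server side.
        System.out.println(decode(sql));
        // Encoded first: the round trip preserves the plus sign.
        System.out.println(decode(encode(sql)));
    }
}
```

Encoding the SQL before it is placed in the export URL is one common way to avoid the problem, since the decoder then restores the plus sign instead of turning it into a blank.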





[jira] [Reopened] (KYLIN-1323) Improve performance of converting data to hfile

2016-03-20 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang reopened KYLIN-1323:
---

Reopen for v1.5 MR engine V2

> Improve performance of converting data to hfile
> ---
>
> Key: KYLIN-1323
> URL: https://issues.apache.org/jira/browse/KYLIN-1323
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.2
>Reporter: Yerui Sun
>Assignee: Yerui Sun
> Fix For: v1.4.0, v1.3.0
>
> Attachments: KYLIN-1323-1.x-staging.2.patch, 
> KYLIN-1323-1.x-staging.patch, KYLIN-1323-2.x-staging.2.patch
>
>
> Suppose we get 100GB of data after cuboid building, with a setting of 10GB per 
> region. Currently, 10 split keys are calculated, 10 regions are created, and 10 
> reducers are used in the ‘convert to hfile’ step. 
> With the optimization, we could calculate 100 (or more) split keys and use all 
> of them in the ‘convert to hfile’ step, but sample 10 of those keys to create 
> regions. The result is still 10 regions created, but 100 reducers used in the 
> ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 10 
> files are loaded per region. That should be fine and does not affect query 
> performance dramatically.





[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and

2016-03-20 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203275#comment-15203275
 ] 

Hao Chen commented on KYLIN-1505:
-

Cool, thanks Yang!

> Combine guava filters with Predicates.and 
> --
>
> Key: KYLIN-1505
> URL: https://issues.apache.org/jira/browse/KYLIN-1505
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> - Combine guava filters with Predicates.and(filters)
> - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists
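
The first item can be sketched as below. Note the swap: Guava is not used here; java.util.function.Predicate (Java 8 or later) stands in for Guava's Predicate so the sketch runs on a plain JDK. CombinedFilterSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch using the JDK's Predicate as a stand-in for Guava's.
final class CombinedFilterSketch {
    // Folds all filters into one predicate that accepts an item only if every filter does.
    static <T> Predicate<T> and(List<Predicate<T>> filters) {
        Predicate<T> combined = item -> true;
        for (Predicate<T> f : filters) {
            combined = combined.and(f);
        }
        return combined;
    }

    // Applies the combined predicate in a single pass over the input.
    static <T> List<T> filter(List<T> items, List<Predicate<T>> filters) {
        Predicate<T> combined = and(filters);
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (combined.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(name -> name.startsWith("job"));
        filters.add(name -> name.length() > 4);
        System.out.println(filter(Arrays.asList("job1", "job12", "cube12"), filters));
    }
}
```

Combining the filters first means the collection is traversed once, rather than once per filter with an intermediate collection each time.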





[jira] [Updated] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen updated KYLIN-1434:

Fix Version/s: v1.5.1

> Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
> 
>
> Key: KYLIN-1434
> URL: https://issues.apache.org/jira/browse/KYLIN-1434
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> The API request for Job Monitor page like: 
> {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for 
> fetching only 15 row records (25.1 KB), which is too slow.





[jira] [Updated] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen updated KYLIN-1434:

Affects Version/s: v1.5.0
   v1.3.0

> Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
> 
>
> Key: KYLIN-1434
> URL: https://issues.apache.org/jira/browse/KYLIN-1434
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> The API request for Job Monitor page like: 
> {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for 
> fetching only 15 row records (25.1 KB), which is too slow.





[jira] [Resolved] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen resolved KYLIN-1434.
-
Resolution: Resolved


The response should be about 3 times faster with the changes in 
KYLIN-1504, KYLIN-1505, KYLIN-1506:

Original: 
{code}
Response time: 6+ = 2.5: {scan jobs keys  + keys resorting time + scan job 
values} + 2.5:{scan outputs keys + keys resorting time + scan outputs values}  
+ 1: {duplicated filtering times}
{code}

Now:
{code}
Response time: 2+ = 1: {scan job time} + 1:{scan outputs time}
{code}

So treat this ticket as resolved.


> Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
> 
>
> Key: KYLIN-1434
> URL: https://issues.apache.org/jira/browse/KYLIN-1434
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>
> The API request for Job Monitor page like: 
> {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for 
> fetching only 15 row records (25.1 KB), which is too slow.





[jira] [Commented] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment

2016-03-20 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203269#comment-15203269
 ] 

Hao Chen commented on KYLIN-1434:
-

[~liyang.g...@gmail.com] cool, thanks for the rapid action.

> Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
> 
>
> Key: KYLIN-1434
> URL: https://issues.apache.org/jira/browse/KYLIN-1434
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>
> The API request for Job Monitor page like: 
> {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds for 
> fetching only 15 row records (25.1 KB), which is too slow.





[jira] [Assigned] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH

2016-03-20 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang reassigned KYLIN-1507:
-

Assignee: liyang

> Couldn't find hive dependency jar on some platform like CDH
> ---
>
> Key: KYLIN-1507
> URL: https://issues.apache.org/jira/browse/KYLIN-1507
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v1.5.0
>Reporter: Shaofeng SHI
>Assignee: liyang
> Fix For: v1.5.1
>
>
> Reported by user ianzeng on the u...@kylin.apache.org mailing list:
> I have installed Kylin 1.5 on Red Hat 6.3. I tried to build the sample cube, 
> but got the following error message:
> 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> Ignoring.
> 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2016-03-18 18:18:43,509 INFO [main] 
> org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 2016-03-18 18:18:43,921 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
> Committer Algorithm version is 1
> 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task:  Using 
> ResourceCalculatorProcessTree : [ ]
> 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: 
> java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
>   at 
> org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
>   ... 8 more
> And 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203265#comment-15203265
 ] 

liyang commented on KYLIN-1507:
---

Attempted a fix, commit e4c795cf7e7e7bd62d7d615a7c327fe244c1dbcc

I didn't verify it on CDH, however. Please reopen the JIRA if the problem still 
exists.

> Couldn't find hive dependency jar on some platform like CDH
> ---
>
> Key: KYLIN-1507
> URL: https://issues.apache.org/jira/browse/KYLIN-1507
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v1.5.0
>Reporter: Shaofeng SHI
> Fix For: v1.5.1
>
>
> Reported by user ianzeng on the u...@kylin.apache.org mailing list:
> I have installed Kylin 1.5 on Red Hat 6.3. I tried to build the sample cube, 
> but got the following error message:
> 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> Ignoring.
> 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2016-03-18 18:18:43,509 INFO [main] 
> org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 2016-03-18 18:18:43,921 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
> Committer Algorithm version is 1
> 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task:  Using 
> ResourceCalculatorProcessTree : [ ]
> 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: 
> java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
>   at 
> org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
>   ... 8 more
> And 





[jira] [Resolved] (KYLIN-1507) Couldn't find hive dependency jar on some platform like CDH

2016-03-20 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-1507.
---
   Resolution: Fixed
Fix Version/s: v1.5.1

> Couldn't find hive dependency jar on some platform like CDH
> ---
>
> Key: KYLIN-1507
> URL: https://issues.apache.org/jira/browse/KYLIN-1507
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v1.5.0
>Reporter: Shaofeng SHI
>Assignee: liyang
> Fix For: v1.5.1
>
>
> Reported by user ianzeng on the u...@kylin.apache.org mailing list:
> I have installed Kylin 1.5 on Red Hat 6.3. I tried to build the sample cube, 
> but got the following error message:
> 2016-03-18 18:18:43,084 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> Ignoring.
> 2016-03-18 18:18:43,093 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2016-03-18 18:18:43,509 INFO [main] 
> org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. 
> Instead, use dfs.metrics.session-id
> 2016-03-18 18:18:43,921 INFO [main] 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output 
> Committer Algorithm version is 1
> 2016-03-18 18:18:43,933 INFO [main] org.apache.hadoop.mapred.Task:  Using 
> ResourceCalculatorProcessTree : [ ]
> 2016-03-18 18:18:44,120 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: 
> java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
>   at 
> org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.ClassNotFoundException: Class 
> org.apache.hive.hcatalog.mapreduce.HCatInputFormat not found
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
>   at 
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
>   ... 8 more
> And 





[jira] [Comment Edited] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203254#comment-15203254
 ] 

liyang edited comment on KYLIN-1506 at 3/20/16 11:52 AM:
-

I refactored the interface to be
{code}getAllResources(String folderPath, long timeStart, long timeEndExclusive, 
Class clazz, Serializer serializer){code}
and dropped the {{rangeStart}} / {{rangeEnd}} interface.

ResourceStore has a directory model, not a k-v model. The {{rangeStart}} / 
{{rangeEnd}} interface is not very appropriate and is odd to implement on 
{{FileResourceStore}}.
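
To illustrate the directory model with a time window, here is a minimal sketch. It is not Kylin's ResourceStore: the class {{FolderStoreSketch}}, its {{put}} method, and the in-memory map are hypothetical stand-ins, and only the parameter names {{folderPath}}, {{timeStart}} and {{timeEndExclusive}} come from the signature above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a directory-model resource store.
final class FolderStoreSketch {
    // Resource path -> last-modified timestamp in milliseconds, kept sorted by path.
    private final TreeMap<String, Long> timestamps = new TreeMap<>();

    void put(String path, long lastModifiedMillis) {
        timestamps.put(path, lastModifiedMillis);
    }

    // Returns the paths under folderPath whose timestamp lies in
    // [timeStart, timeEndExclusive), in sorted order, using a single pass.
    List<String> getAllResources(String folderPath, long timeStart, long timeEndExclusive) {
        String prefix = folderPath.endsWith("/") ? folderPath : folderPath + "/";
        List<String> result = new ArrayList<>();
        // tailMap jumps to the first key >= prefix; keys sharing the prefix are contiguous.
        for (Map.Entry<String, Long> e : timestamps.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the folder
            }
            long ts = e.getValue();
            if (ts >= timeStart && ts < timeEndExclusive) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The caller names only a folder and a time window; no start or end key is needed, which is why the same signature is straightforward to implement on a file-based store as well.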

Modification is based on Hao's work. Thanks!




was (Author: liyang.g...@gmail.com):
I refactor the interface to be
{code}getAllResources(String folderPath, Class clazz, Serializer 
serializer){code}
and dropped the {{rangeStart}} / {{rangeEnd}} interface.

ResourceStore has a directory model, not a k-v model. The {{rangeStart}} / 
{{rangeEnd}} interface is not very appropriate and is odd to implement on 
{{FileResourceStore}}.

Modification is based on Hao's work. Thanks!



> Refactor resource interface for timeseries-based data like jobs to much 
> better performance
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: patch
>
> h1. Problem
> Currently, operations such as getJobOutputs/getJobs scan the store twice to 
> build a response:
> 1. Get the keys, sort them, and take the first and last key (in effect a 
> prefix filter) with "store.listResources(resourcePath)"
> 2. Re-scan the keys with a timestamp filter: 
> "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis,
>         long timeEndInMillis) throws PersistentException {
>     try {
>         // First scan: list every key under the job output root.
>         NavigableSet<String> resources =
>                 store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // Collections.sort(resources);
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         // Second scan: re-read the same key range, filtered by timestamp.
>         return store.getAllResources(rangeStart, rangeEnd,
>                 timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class,
>                 JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
> The two scans can be combined into a single call, using one of:
> {code}
> store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
> store.getAllResources(resourcePath, Class, Serializer)
> {code}
> For example, "getJobOutputs(long timeStartInMillis, long timeEndInMillis)" 
> can be refactored as follows:
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis,
>         long timeEndInMillis) throws PersistentException {
>     try {
>         // Single scan: the store filters by folder and timestamp together.
>         return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT,
>                 timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class,
>                 JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}





[jira] [Commented] (KYLIN-1434) Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203256#comment-15203256
 ] 

liyang commented on KYLIN-1434:
---

KYLIN-1504, KYLIN-1505, KYLIN-1506 are processed.

> Kylin Job Monitor API: /kylin/api/jobs is too slow in large kylin deployment
> 
>
> Key: KYLIN-1434
> URL: https://issues.apache.org/jira/browse/KYLIN-1434
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.4.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>
> An API request for the Job Monitor page such as 
> {code}/kylin/api/jobs?limit=15&offset=15{code} takes more than 11 seconds to 
> fetch only 15 rows (25.1 KB), which is too slow.





[jira] [Commented] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203254#comment-15203254
 ] 

liyang commented on KYLIN-1506:
---

I refactored the interface to be
{code}getAllResources(String folderPath, Class clazz, Serializer 
serializer){code}
and dropped the {{rangeStart}} / {{rangeEnd}} interface.

ResourceStore has a directory model, not a k-v model. The {{rangeStart}} / 
{{rangeEnd}} interface is not very appropriate and is odd to implement on 
{{FileResourceStore}}.

Modification is based on Hao's work. Thanks!



> Refactor resource interface for timeseries-based data like jobs to much 
> better performance
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: patch
>





[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203255#comment-15203255
 ] 

liyang commented on KYLIN-1505:
---

Merged and committed.

Thanks Hao!

> Combine guava filters with Predicates.and 
> --
>
> Key: KYLIN-1505
> URL: https://issues.apache.org/jira/browse/KYLIN-1505
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> - Combine guava filters with Predicates.and(filters)
> - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists
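
The first item above, folding several filters into one, can be sketched as follows. The ticket names Guava's Predicates.and(filters); this sketch swaps in the JDK's java.util.function.Predicate so that it runs without Guava, and the class name {{CombinedFilterSketch}} is hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; a JDK-only analogue of combining filters with a logical AND.
final class CombinedFilterSketch {
    // Folds a list of filters into one predicate that passes only when all of them pass.
    static <T> Predicate<T> and(List<Predicate<T>> filters) {
        Predicate<T> combined = x -> true; // identity element for logical AND
        for (Predicate<T> f : filters) {
            combined = combined.and(f); // short-circuits on the first failing filter
        }
        return combined;
    }
}
```

Applying the combined predicate once per element avoids making a separate filtering pass over the collection for each filter.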





[jira] [Commented] (KYLIN-1122) Kylin support detail data query from fact table

2016-03-20 Thread Xiaoyu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203245#comment-15203245
 ] 

Xiaoyu Wang commented on KYLIN-1122:


For the {{SQLDigest.isRawQuery}} property, I think deriving it from both 
{{groupbyColumns}} and {{metricColumns}} is correct.
If it is derived only from {{groupbyColumns}}, the SQL "select sum(price) as 
GMV, count(1) as TRANS_CNT from test_kylin_fact" is identified as a "RawQuery". 
For a "RawQuery", the RAW agg function is added on every column that defines a 
RAW measure, and the SUM agg on that column (the hack from {{OLAPEnumerator}}) 
is removed. That is not expected! [~liyang.g...@gmail.com]
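
A minimal sketch of the check proposed here, assuming both column collections are available as lists. The class name {{RawQuerySketch}} and the use of plain strings for columns are hypothetical; only the names {{groupbyColumns}} and {{metricColumns}} come from the comment.

```java
import java.util.List;

// Hypothetical sketch: a query is "raw" only when it neither groups nor aggregates.
final class RawQuerySketch {
    static boolean isRawQuery(List<String> groupbyColumns, List<String> metricColumns) {
        // Checking groupbyColumns alone would misclassify "select sum(price) from t",
        // which has no GROUP BY but does aggregate a metric column.
        return groupbyColumns.isEmpty() && metricColumns.isEmpty();
    }
}
```

Under this check, the aggregate query quoted above has a non-empty metric list and is therefore not treated as a raw query.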

> Kylin support detail data query from fact table
> ---
>
> Key: KYLIN-1122
> URL: https://issues.apache.org/jira/browse/KYLIN-1122
> Project: Kylin
>  Issue Type: New Feature
>  Components: Query Engine
>Affects Versions: v1.2
>Reporter: Xiaoyu Wang
>Assignee: liyang
> Fix For: Backlog
>
> Attachments: 
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(2.x-staging).patch, 
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact(update-v2-1.x-staging).patch,
>  
> 0001-KYLIN-1122-Kylin-support-detail-data-query-from-fact-new-impl-under-refactoring-2.x-staging.patch
>
>
> Kylin currently cannot return correct detail rows from the fact table for a 
> query like:
> select column1,column2,column3 from fact_table
> KYLIN-1075 adds the "SUM" function on the measure column if one is defined, 
> but only numeric columns are supported.
> I changed some code to address this:
> Add a "VALUE" measure function, whose output has the same value and datatype 
> as its input.
> To query detail data from the fact table, the following is *required*:
> 1. Configure each column that is not a dimension as a "VALUE" or "SUM" 
> measure. (A column with no measure function configured returns NULL.)
> 2. The source table must have a column of unique values, configured as a 
> dimension.
> If you have a better solution, please comment here.





[jira] [Resolved] (KYLIN-1505) Combine guava filters with Predicates.and

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen resolved KYLIN-1505.
-
Resolution: Resolved

Patches are accepted and merged together with 
https://issues.apache.org/jira/browse/KYLIN-1506 into codebase at 
https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761

> Combine guava filters with Predicates.and 
> --
>
> Key: KYLIN-1505
> URL: https://issues.apache.org/jira/browse/KYLIN-1505
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> - Combine guava filters with Predicates.and(filters)
> - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists





[jira] [Resolved] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen resolved KYLIN-1506.
-
Resolution: Resolved

Patches are accepted and merged into code base at 
https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761

> Refactor resource interface for timeseries-based data like jobs to much 
> better performance
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: patch
>





[jira] [Reopened] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen reopened KYLIN-1506:
-

> Refactor resource interface for timeseries-based data like jobs to much 
> better performance
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: patch
>





[jira] [Resolved] (KYLIN-1506) Refactor resource interface for timeseries-based data like jobs to much better performance

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen resolved KYLIN-1506.
-
Resolution: Fixed

Patches are accepted and merged into code base at 
https://github.com/apache/kylin/commit/6df837fa7abbeba0edd13e099150dc1590e31761

> Refactor resource interface for timeseries-based data like jobs to much 
> better performance
> --
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: patch
>





[jira] [Updated] (KYLIN-1505) Combine guava filters with Predicates.and

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen updated KYLIN-1505:

Summary: Combine guava filters with Predicates.and   (was: Combine guava 
filters with Predicates.and and combine hbase RowKeyOnlyFilter and PrefixFilter 
with FilterLists)

> Combine guava filters with Predicates.and 
> --
>
> Key: KYLIN-1505
> URL: https://issues.apache.org/jira/browse/KYLIN-1505
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> - Combine guava filters with Predicates.and(filters)
> - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists





[jira] [Commented] (KYLIN-1505) Combine guava filters with Predicates.and and combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists

2016-03-20 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203152#comment-15203152
 ] 

Hao Chen commented on KYLIN-1505:
-

Will not combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists, 
for the reason discussed at 
http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase

> Combine guava filters with Predicates.and and combine hbase RowKeyOnlyFilter 
> and PrefixFilter with FilterLists
> --
>
> Key: KYLIN-1505
> URL: https://issues.apache.org/jira/browse/KYLIN-1505
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
> Fix For: v1.5.1
>
>
> - Combine guava filters with Predicates.and(filters)
> - Combine hbase RowKeyOnlyFilter and PrefixFilter with FilterLists





[jira] [Resolved] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side

2016-03-20 Thread Hao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Chen resolved KYLIN-1504.
-
Resolution: Resolved

Resolving the ticket: the pull request https://github.com/apache/kylin/pull/29 
has been merged into the master branch at 
https://github.com/apache/kylin/commit/801fb83b22e6a737ca9c43155a4860951bf370a2

> Use NavigableSet to store rowkey and use prefix filter to check resource path 
> prefix instead String comparison on tomcat side
> -
>
> Key: KYLIN-1504
> URL: https://issues.apache.org/jira/browse/KYLIN-1504
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, REST Service
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: jobs, metadata
> Fix For: v1.5.1
>
>
> - Use NavigableSet<String> instead of ArrayList<String> to store the natively 
> ordered and unique row keys, rather than repeatedly calling 
> `Collections.sort` or checking for existence in the business logic layer. 
> Because the row keys are already sorted in hbase, the change adds no 
> computational cost.
> - Verify the prefix at the hbase region level using a prefix filter instead 
> of comparing Strings on the client side.
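
The first point can be sketched with the JDK's TreeSet. This is an illustration only: the class {{RowKeySetSketch}} and its method names are hypothetical, and the upper bound used for the prefix view assumes resource paths never contain the character '\uffff'.

```java
import java.util.Collection;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper showing why a sorted set removes the need for explicit sorting.
final class RowKeySetSketch {
    // Copies raw keys into a set that stays sorted and free of duplicates.
    static NavigableSet<String> sortedUnique(Collection<String> rawKeys) {
        return new TreeSet<>(rawKeys);
    }

    // Returns a live view of the keys that start with prefix.
    // Assumption: no key contains '\uffff', so prefix + '\uffff' bounds the range.
    static NavigableSet<String> withPrefix(NavigableSet<String> keys, String prefix) {
        return keys.subSet(prefix, true, prefix + '\uffff', false);
    }
}
```

With a NavigableSet, first() and last() give the key range directly, and membership checks take logarithmic time instead of a linear search through a list.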





[jira] [Commented] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side

2016-03-20 Thread Hao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203139#comment-15203139
 ] 

Hao Chen commented on KYLIN-1504:
-

Thanks [~liyang.g...@gmail.com]

> Use NavigableSet to store rowkey and use prefix filter to check resource path 
> prefix instead String comparison on tomcat side
> -
>
> Key: KYLIN-1504
> URL: https://issues.apache.org/jira/browse/KYLIN-1504
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, REST Service
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: jobs, metadata
> Fix For: v1.5.1
>
>





[jira] [Comment Edited] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203137#comment-15203137
 ] 

liyang edited comment on KYLIN-1504 at 3/20/16 7:23 AM:


Thanks Hao! Merged #29 with some revision. I didn't include the PrefixFilter 
commit because a) a range scan performs equally well [1]; b) I want to keep 
the KeyOnlyFilter to reduce traffic.

[1] 
http://stackoverflow.com/questions/10942638/should-i-user-prefixfilter-or-rowkey-range-scan-in-hbase
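
For reference, a prefix check can be expressed as a range scan by deriving an exclusive stop row from the prefix. The sketch below shows that derivation in plain Java; the class and method names are hypothetical and no HBase API is called.

```java
import java.util.Arrays;

// Hypothetical helper: turns a row-key prefix into the exclusive end of a range scan.
final class PrefixRangeSketch {
    // Returns the smallest key that sorts after every key starting with prefix,
    // or an empty array when no such key exists (the prefix is empty or all 0xFF).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // Trailing 0xFF bytes cannot be incremented, so skip over them.
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // bump the last incrementable byte
        return stop;
    }
}
```

A scan from the prefix (inclusive) to this stop row (exclusive) visits exactly the rows that share the prefix, so no separate prefix filter is needed and a key-only filter can still be applied to the results.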



was (Author: liyang.g...@gmail.com):
Thanks Hao! Merged #29 with some revision. Didn't include the PrefixFilter 
commit because a) range scan is equally good performance[1]; b) want to keep 
the KeyOnlyFilter for reduced traffic.

> Use NavigableSet to store rowkey and use prefix filter to check resource path 
> prefix instead String comparison on tomcat side
> -
>
> Key: KYLIN-1504
> URL: https://issues.apache.org/jira/browse/KYLIN-1504
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, REST Service
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: jobs, metadata
> Fix For: v1.5.1
>
>





[jira] [Commented] (KYLIN-1504) Use NavigableSet to store rowkey and use prefix filter to check resource path prefix instead String comparison on tomcat side

2016-03-20 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203137#comment-15203137
 ] 

liyang commented on KYLIN-1504:
---

Thanks Hao! Merged #29 with some revision. I didn't include the PrefixFilter 
commit because a) a range scan performs equally well[1]; b) I want to keep 
the KeyOnlyFilter to reduce traffic.

> Use NavigableSet to store rowkey and use prefix filter to check resource path 
> prefix instead String comparison on tomcat side
> -
>
> Key: KYLIN-1504
> URL: https://issues.apache.org/jira/browse/KYLIN-1504
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, REST Service
>Affects Versions: v1.5.0, v1.4.0, v1.3.0
>Reporter: Hao Chen
>Assignee: Hao Chen
>  Labels: jobs, metadata
> Fix For: v1.5.1
>
>





[jira] [Updated] (KYLIN-1249) A client library to help automatic cube

2016-03-20 Thread nichunen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nichunen updated KYLIN-1249:

Attachment: (was: KYLIN-1472-Package.patch)

> A client library to help automatic cube
> ---
>
> Key: KYLIN-1249
> URL: https://issues.apache.org/jira/browse/KYLIN-1249
> Project: Kylin
>  Issue Type: New Feature
>  Components: Tools, Build and Test
>Affects Versions: v1.2
>Reporter: nichunen
>Assignee: hongbin ma
> Fix For: v1.3.1
>
> Attachments: KYLIN-1249-DOC.patch, KYLIN-1249.patch
>
>
> As there is strong demand for a client library to help automate cube 
> building/refreshing, we will contribute our Kylin client tool to Kylin.
> The tool is based on the Kylin REST APIs and is developed in Python. As 
> discussed with Hongbin, we will do some simplification work. The main 
> functions of the tool will be job creation, job status check, job kill, job 
> scheduling, and job failover. We will also keep some other features, such as 
> simple cube definition and batch cube creation, for you to choose from.


