[jira] [Updated] (KYLIN-3368) "/kylin/kylin_metadata/metadata/" has many garbage for spark cubing

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3368:

Summary: "/kylin/kylin_metadata/metadata/" has many garbage for spark 
cubing  (was: "/kylin/kylin_metadata/metadata/" has many gargage for spark 
cubing)

> "/kylin/kylin_metadata/metadata/" has many garbage for spark cubing
> ---
>
> Key: KYLIN-3368
> URL: https://issues.apache.org/jira/browse/KYLIN-3368
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.2.0, v2.3.0
>Reporter: Shaofeng SHI
>Priority: Major
>
> When Spark is used as the cube engine, Kylin dumps metadata to HDFS under 
> "/kylin/kylin_metadata/metadata/". As time goes on, many files are left 
> there.
>  
> They should be deleted when the job is finished.
>  
> hadoop fs -ls /kylin/kylin_metadata/metadata
> Found 583 items
> drwxr-xr-x - root hdfs 0 2018-02-11 16:20 
> /kylin/kylin_metadata/metadata/007d8b6c-5db1-478f-a373-7fd57231494b
> drwxr-xr-x - root hdfs 0 2018-02-21 08:24 
> /kylin/kylin_metadata/metadata/00f93335-0d85-4c19-953c-5e4660f4d58e
> drwxr-xr-x - root hdfs 0 2018-03-21 10:17 
> /kylin/kylin_metadata/metadata/011d3c7b-e707-42ca-b355-a0037820159b
> drwxr-xr-x - root hdfs 0 2018-04-07 00:41 
> /kylin/kylin_metadata/metadata/0263789f-969a-485f-aa6c-df4394a45df8
> drwxr-xr-x - root hdfs 0 2018-02-23 00:24 
> /kylin/kylin_metadata/metadata/02977825-92cd-475d-8a3f-0f75255a9ba9
> drwxr-xr-x - root hdfs 0 2018-02-15 08:24 
> /kylin/kylin_metadata/metadata/02e058fd-ac6a-451f-99b4-f0b03ad6e3a5
> drwxr-xr-x - root hdfs 0 2018-03-15 10:21 
> /kylin/kylin_metadata/metadata/0337c6c9-aebb-4ae0-8485-9905283fc818
> drwxr-xr-x - root hdfs 0 2018-02-23 12:22 
> /kylin/kylin_metadata/metadata/03d7f34f-10dc-4329-9dde-f646cc040ba9
> drwxr-xr-x - root hdfs 0 2018-04-10 12:20 
> /kylin/kylin_metadata/metadata/03fa7654-7b14-4bda-aed8-64ac9d08081a
> drwxr-xr-x - root hdfs 0 2018-03-30 00:23 
> /kylin/kylin_metadata/metadata/0406e5bb-112f-4c54-bdf8-7b1e34652522
> drwxr-xr-x - root hdfs 0 2018-03-24 00:33 
> /kylin/kylin_metadata/metadata/0426907d-541f-4f23-9db7-9a965f5b6739
> drwxr-xr-x - root hdfs 0 2018-02-07 15:03 
> /kylin/kylin_metadata/metadata/04771302-2090-4793-956d-bf5ba9360d9b
> drwxr-xr-x - root hdfs 0 2018-03-30 00:20 
> /kylin/kylin_metadata/metadata/0482b928-6b09-48d1-909f-68c66f71a827
> drwxr-xr-x - root hdfs 0 2018-03-04 16:36 
> /kylin/kylin_metadata/metadata/04943bbd-8425-4304-b9c8-cc13a55e8ac1
> drwxr-xr-x - root hdfs 0 2018-03-05 08:35 
> /kylin/kylin_metadata/metadata/04ee2c3d-bd07-4e7a-90db-3d87d088ff7d
> drwxr-xr-x - root hdfs 0 2018-03-02 16:24 
> /kylin/kylin_metadata/metadata/05808a9d-2e55-4dc3-8c28-5b662068dda8
> drwxr-xr-x - root hdfs 0 2018-03-04 08:35 
> /kylin/kylin_metadata/metadata/05c63d7c-4ea5-48fc-897d-6db9d78ffaf7
> drwxr-xr-x - root hdfs 0 2018-03-01 01:02 
> /kylin/kylin_metadata/metadata/05ea3568-e85f-452a-b44d-4657145a724a



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KYLIN-3368) "/kylin/kylin_metadata/metadata/" has many garbage for spark cubing

2018-05-04 Thread Shaofeng SHI (JIRA)
Shaofeng SHI created KYLIN-3368:
---

 Summary: "/kylin/kylin_metadata/metadata/" has many garbage for 
spark cubing
 Key: KYLIN-3368
 URL: https://issues.apache.org/jira/browse/KYLIN-3368
 Project: Kylin
  Issue Type: Bug
  Components: Spark Engine
Affects Versions: v2.3.0, v2.2.0
Reporter: Shaofeng SHI


When Spark is used as the cube engine, Kylin dumps metadata to HDFS under 
"/kylin/kylin_metadata/metadata/". As time goes on, many files are left there.

 

They should be deleted when the job is finished.





[jira] [Updated] (KYLIN-3368) "/kylin/kylin_metadata/metadata/" has many garbage for spark cubing

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3368:

Description: 
When Spark is used as the cube engine, Kylin dumps metadata to HDFS under 
"/kylin/kylin_metadata/metadata/". As time goes on, many files are left there.

 

They should be deleted when the job is finished.

 

hadoop fs -ls /kylin/kylin_metadata/metadata
Found 583 items

  was:
When Spark is used as the cube engine, Kylin dumps metadata to HDFS under 
"/kylin/kylin_metadata/metadata/". As time goes on, many files are left there.

 

They should be deleted when the job is finished.


> "/kylin/kylin_metadata/metadata/" has many garbage for spark cubing
> ---
>
> Key: KYLIN-3368
> URL: https://issues.apache.org/jira/browse/KYLIN-3368
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.2.0, v2.3.0
>Reporter: Shaofeng SHI
>Priority: Major
>
> When Spark is used as the cube engine, Kylin dumps metadata to HDFS under 
> "/kylin/kylin_metadata/metadata/". As time goes on, many files are left 
> there.
>  
> They should be deleted when the job is finished.
>  
> hadoop fs -ls /kylin/kylin_metadata/metadata
> Found 583 items

[jira] [Commented] (KYLIN-3366) Configure automatic enabling of cubes after a build process

2018-05-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KYLIN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463614#comment-16463614
 ] 

Roberto Tardío Olmos commented on KYLIN-3366:
-

Many thanks Julian, and good work! This patch fits my requirement.

> Configure automatic enabling of cubes after a build process
> ---
>
> Key: KYLIN-3366
> URL: https://issues.apache.org/jira/browse/KYLIN-3366
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.3.1
> Environment: Kylin 2.3.1 and Hadoop EMR 5.7
>Reporter: Roberto Tardío Olmos
>Assignee: Pan, Julian
>Priority: Minor
>  Labels: features
> Attachments: KYLIN-3366.patch
>
>
> Kylin automatically enables the disabled cubes after a construction process. 
> This behavior forces us to constantly disable a new cube that is under 
> development to replace an existing and enabled cube. If we do not disable it, 
> we could have problems with the routing of the queries.





[jira] [Updated] (KYLIN-3367) Add compatibility for the new version of HBase

2018-05-04 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated KYLIN-3367:
---
Description: 
The HBase version is 1.4.3. The new HBase version adds some new methods to the 
*{{Table}}* and *{{ResultScanner}}* interfaces.
 So we should add implementations of these methods.

org.apache.hadoop.hbase.client.ResultScanner.java

 
{code:java}
/**
 * Allow the client to renew the scanner's lease on the server.
 * @return true if the lease was successfully renewed, false otherwise.
 */
boolean renewLease();

/**
 * @return the scan metrics, or {@code null} if we do not enable metrics.
 */
ScanMetrics getScanMetrics();
{code}
{code:java}
org.apache.hadoop.hbase.client.Table.java

/**
 * Set timeout (millisecond) of each operation in this Table instance, will override the value
 * of hbase.client.operation.timeout in configuration.
 * Operation timeout is a top-level restriction that makes sure a blocking method will not be
 * blocked more than this. In each operation, if rpc request fails because of timeout or
 * other reason, it will retry until success or throw a RetriesExhaustedException. But if the
 * total time being blocking reach the operation timeout before retries exhausted, it will break
 * early and throw SocketTimeoutException.
 * @param operationTimeout the total timeout of each operation in millisecond.
 */
public void setOperationTimeout(int operationTimeout);

/**
 * Get timeout (millisecond) of each operation for in Table instance.
 */
public int getOperationTimeout();

/**
 * Get timeout (millisecond) of each rpc request in this Table instance.
 *
 * @returns Currently configured read timeout
 * @deprecated Use getReadRpcTimeout or getWriteRpcTimeout instead
 */
@Deprecated
int getRpcTimeout();

/**
 * Set timeout (millisecond) of each rpc request in operations of this Table instance, will
 * override the value of hbase.rpc.timeout in configuration.
 * If a rpc request waiting too long, it will stop waiting and send a new request to retry until
 * retries exhausted or operation timeout reached.
 *
 * NOTE: This will set both the read and write timeout settings to the provided value.
 *
 * @param rpcTimeout the timeout of each rpc request in millisecond.
 *
 * @deprecated Use setReadRpcTimeout or setWriteRpcTimeout instead
 */
@Deprecated
void setRpcTimeout(int rpcTimeout);

/**
 * Get timeout (millisecond) of each rpc read request in this Table instance.
 */
int getReadRpcTimeout();

/**
 * Set timeout (millisecond) of each rpc read request in operations of this Table instance, will
 * override the value of hbase.rpc.read.timeout in configuration.
 * If a rpc read request waiting too long, it will stop waiting and send a new request to retry
 * until retries exhausted or operation timeout reached.
 *
 * @param readRpcTimeout
 */
void setReadRpcTimeout(int readRpcTimeout);

/**
 * Get timeout (millisecond) of each rpc write request in this Table instance.
 */
int getWriteRpcTimeout();

/**
 * Set timeout (millisecond) of each rpc write request in operations of this Table instance, will
 * override the value of hbase.rpc.write.timeout in configuration.
 * If a rpc write request waiting too long, it will stop waiting and send a new request to retry
 * until retries exhausted or operation timeout reached.
 *
 * @param writeRpcTimeout
 */
void setWriteRpcTimeout(int writeRpcTimeout);
{code}
 

[jira] [Created] (KYLIN-3367) Add compatibility for the new version of HBase

2018-05-04 Thread wan kun (JIRA)
wan kun created KYLIN-3367:
--

 Summary: Add compatibility for the new version of HBase
 Key: KYLIN-3367
 URL: https://issues.apache.org/jira/browse/KYLIN-3367
 Project: Kylin
  Issue Type: Bug
  Components: REST Service
Affects Versions: v2.4.0
Reporter: wan kun


The HBase version is 1.4.3. The new HBase version adds some new methods to the 
*{{Table}}* and *{{ResultScanner}}* interfaces.
So we should add implementations of these methods.

ResultScanner.java

 
{code:java}
/**
 * Allow the client to renew the scanner's lease on the server.
 * @return true if the lease was successfully renewed, false otherwise.
 */
boolean renewLease();

/**
 * @return the scan metrics, or {@code null} if we do not enable metrics.
 */
ScanMetrics getScanMetrics();
{code}
 





[jira] [Commented] (KYLIN-3366) Configure automatic enabling of cubes after a build process

2018-05-04 Thread Pan, Julian (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463569#comment-16463569
 ] 

Pan, Julian commented on KYLIN-3366:


We met the same issue, so I have attached a patch with our solution.

It adds the property kylin.job.cube-auto-ready-enabled to control whether a 
cube is enabled automatically; the default value is true.

Please add this property to the cube's configuration overwrites and set it to 
false if you do not want the cube enabled automatically when the build finishes.

Does it fit your requirement?
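As a sketch, the configuration described above might look like this (the property name comes from this comment; its placement and exact semantics are assumptions pending review of the patch):

```properties
# kylin.properties - instance-wide default
# (assumed semantics: true = enable the cube automatically after a build)
kylin.job.cube-auto-ready-enabled=true

# Per-cube "Configuration Overwrites" entry for a cube under development,
# so it stays disabled after each build finishes:
kylin.job.cube-auto-ready-enabled=false
```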

> Configure automatic enabling of cubes after a build process
> ---
>
> Key: KYLIN-3366
> URL: https://issues.apache.org/jira/browse/KYLIN-3366
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.3.1
> Environment: Kylin 2.3.1 and Hadoop EMR 5.7
>Reporter: Roberto Tardío Olmos
>Assignee: Pan, Julian
>Priority: Minor
>  Labels: features
> Attachments: KYLIN-3366.patch
>
>
> Kylin automatically enables the disabled cubes after a construction process. 
> This behavior forces us to constantly disable a new cube that is under 
> development to replace an existing and enabled cube. If we do not disable it, 
> we could have problems with the routing of the queries.





[jira] [Updated] (KYLIN-3366) Configure automatic enabling of cubes after a build process

2018-05-04 Thread Pan, Julian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pan, Julian updated KYLIN-3366:
---
Attachment: KYLIN-3366.patch

> Configure automatic enabling of cubes after a build process
> ---
>
> Key: KYLIN-3366
> URL: https://issues.apache.org/jira/browse/KYLIN-3366
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v2.3.1
> Environment: Kylin 2.3.1 and Hadoop EMR 5.7
>Reporter: Roberto Tardío Olmos
>Assignee: Pan, Julian
>Priority: Minor
>  Labels: features
> Attachments: KYLIN-3366.patch
>
>
> Kylin automatically enables the disabled cubes after a construction process. 
> This behavior forces us to constantly disable a new cube that is under 
> development to replace an existing and enabled cube. If we do not disable it, 
> we could have problems with the routing of the queries.





[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3115:

Fix Version/s: v2.4.0

> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Fix For: v2.4.0
>
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
> this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> } 
> which creates a byte array of length 256 to hold the rowkey column bytes.
> However, in class MergeCuboidMapper it is initialized with length 255:
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the max length is set to 
> 256, the cube building job will succeed while the merge job will always 
> fail, since in class MergeCuboidMapper method doMap:
> public void doMap(Text key, Text value, Context context) throws 
> IOException, InterruptedException {
> long cuboidID = rowKeySplitter.split(key.getBytes());
> Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
> splitOffsets[i] = offset;
> TblColRef col = cuboid.getColumns().get(i);
> int colLength = colIO.getColumnLength(col);
> SplittedBytes split = this.splitBuffers[this.bufferSize++];
> split.length = colLength;
> System.arraycopy(bytes, offset, split.value, 0, colLength);
> offset += colLength;
> }
> System.arraycopy will throw an IndexOutOfBoundsException if a column value 
> is 256 bytes long and is copied into a byte array of length 255.
> The same incompatibility also occurs in class 
> FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as: 
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256.
> Dimensions encoded with a fixed length of 256 are actually pretty common in 
> our production: the Hive type varchar(256) is widespread, and users without 
> much Kylin knowledge will prefer fixed-length encoding on such dimensions 
> and set the max length to 256.
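A minimal, self-contained sketch of the failure mode described above (the class and method names here are illustrative, not Kylin's actual code):

```java
// Demonstrates the build/merge buffer mismatch: copying a 256-byte
// column value into a 255-byte split buffer throws, while the 256-byte
// buffer used at build time succeeds.
public class SplitBufferDemo {
    static boolean copySucceeds(int valueLen, int bufferLen) {
        byte[] value = new byte[valueLen];
        byte[] buffer = new byte[bufferLen];
        try {
            // Mirrors the System.arraycopy call in RowKeySplitter.split
            System.arraycopy(value, 0, buffer, 0, valueLen);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(copySucceeds(256, 256)); // build-time buffer: succeeds
        System.out.println(copySucceeds(256, 255)); // merge-time buffer: fails
    }
}
```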





[jira] [Commented] (KYLIN-3186) Add support for partitioning columns that combine date and time (e.g. YYYYMMDDHHMISS)

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463530#comment-16463530
 ] 

Shaofeng SHI commented on KYLIN-3186:
-

+1; Vsevolod, would you like to contribute a patch? 

> Add support for partitioning columns that combine date and time (e.g. 
> YYYYMMDDHHMISS)
> -
>
> Key: KYLIN-3186
> URL: https://issues.apache.org/jira/browse/KYLIN-3186
> Project: Kylin
>  Issue Type: Improvement
>  Components: Others
>Affects Versions: v2.2.0
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> In a multitude of existing enterprise applications, partitioning is done on a 
> single column that fuses date and time into a single value (string, integer, 
> or big integer). Typical formats are YYYYMMDDHHMM or YYYYMMDDHHMMSS (e.g. 
> 201801181621 and 20180118154734).
> Such a representation is human readable and provides natural sorting of the 
> date/time values.
> Lack of support for such a date/time representation requires ugly 
> workarounds, like creating views that split date and time into separate 
> columns, or copying data into tables with a different partitioning scheme, 
> none of which is a particularly good solution.
> Moreover, the view approach on Hive causes severe performance issues, due to 
> the inability of the Hive optimizer to correctly analyze the filtering 
> conditions auto-generated by Kylin during the flat table build step.
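For illustration only (this is not Kylin code), a fused date-time partition value can be parsed with a plain format pattern, which also shows why string order matches chronological order:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Parses a fused date-time partition value such as 20180118154734
// (YYYYMMDDHHMMSS) into epoch milliseconds.
public class FusedPartitionValue {
    static long toEpochMillis(String value) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // reject malformed values instead of rolling over
        try {
            return fmt.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad partition value: " + value, e);
        }
    }

    public static void main(String[] args) {
        // Lexicographic order of the strings matches chronological order,
        // which is why this format sorts naturally.
        System.out.println(toEpochMillis("20180118154734") > toEpochMillis("20180118000000"));
    }
}
```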





[jira] [Commented] (KYLIN-3351) Cube Planner not working in Apache Kylin 2.3.0 (open source)

2018-05-04 Thread praveenece (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463508#comment-16463508
 ] 

praveenece commented on KYLIN-3351:
---

Hi Zhong,

I built the System Cubes, but the source record count is zero.
Do I need to do anything at the data level?

Can you please guide me?

Regards
Praveen. G









> Cube Planner not working in Apache Kylin 2.3.0 (open source)
> --
>
> Key: KYLIN-3351
> URL: https://issues.apache.org/jira/browse/KYLIN-3351
> Project: Kylin
>  Issue Type: Task
>Reporter: praveenece
>Priority: Major
>
> Hi Team,
>    I want to test the Cube Planner in Apache Kylin 2.3.0, so I created a cube 
> with a segment and queried the cube more than a thousand times, but the Cube 
> Planner shows no change at the cuboid level (such as color changes), and the 
> row counts and other items are not exact. Can you please guide me?
> Note:
> Could you tell me whether the Cube Planner works only for old cubes (e.g., 
> built 3 months ago) or for new cubes as well?
> Configuration (kylin.properties):
> kylin.cube.cubeplanner.enabled=true
> Kylin version: 2.3.0 (single node)
> My cube was created 10 days ago.
> I referred: http://kylin.apache.org/docs23/howto/howto_use_cube_planner.html
>  





[jira] [Updated] (KYLIN-3095) Use ArrayDeque instead of LinkedList for queue implementation

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3095:

Fix Version/s: v2.4.0

> Use ArrayDeque instead of LinkedList for queue implementation
> -
>
> Key: KYLIN-3095
> URL: https://issues.apache.org/jira/browse/KYLIN-3095
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee:  Kaige Liu
>Priority: Minor
>  Labels: parallel
> Fix For: v2.4.0
>
>
> Use ArrayDeque instead of LinkedList for queue implementation where thread 
> safety is not needed.
> From https://docs.oracle.com/javase/7/docs/api/java/util/ArrayDeque.html
> {quote}
> Resizable-array implementation of the Deque interface. Array deques have no 
> capacity restrictions; they grow as necessary to support usage. They are not 
> thread-safe; in the absence of external synchronization, they do not support 
> concurrent access by multiple threads. Null elements are prohibited. This 
> class is likely to be faster than Stack when used as a stack, and *faster 
> than LinkedList when used as a queue.*
> {quote}
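A minimal sketch of the substitution (the call site is illustrative, not a specific Kylin class):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Single-threaded FIFO queue: ArrayDeque exposes the same Deque/Queue
// API as LinkedList, with better memory locality and no per-node
// allocation, so it is a drop-in replacement where thread safety is
// not required.
public class QueueSwapDemo {
    static String drain() {
        Deque<String> queue = new ArrayDeque<>(); // was: new LinkedList<>()
        queue.addLast("a");
        queue.addLast("b");
        queue.addLast("c");
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.pollFirst()); // FIFO order is preserved
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain()); // abc
    }
}
```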





[jira] [Updated] (KYLIN-3295) Unused method SQLDigestUtil#appendTsFilterToExecute

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3295:

Fix Version/s: v2.4.0

> Unused method SQLDigestUtil#appendTsFilterToExecute
> ---
>
> Key: KYLIN-3295
> URL: https://issues.apache.org/jira/browse/KYLIN-3295
> Project: Kylin
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: jiatao.tao
>Priority: Minor
> Fix For: v2.4.0
>
>
> SQLDigestUtil#appendTsFilterToExecute is not called anywhere.
> {code}
>T ret = action.apply(null);
> {code}
> Passing null to {{apply}} seems incorrect.
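As a generic illustration of why passing null to {{apply}} is suspect (this is not the Kylin method itself, just a plain java.util.function.Function):

```java
import java.util.function.Function;

// Passing null into Function.apply compiles fine, but most
// implementations dereference their argument, so the call fails
// only at runtime.
public class NullApplyDemo {
    static Integer lengthOf(Function<String, Integer> action, String arg) {
        return action.apply(arg);
    }

    public static void main(String[] args) {
        Function<String, Integer> len = String::length;
        System.out.println(lengthOf(len, "abc")); // 3
        try {
            lengthOf(len, null);
        } catch (NullPointerException e) {
            System.out.println("NPE on null argument");
        }
    }
}
```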





[jira] [Updated] (KYLIN-1948) IntegerDimEnc, does not encode -1 correctly

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-1948:

Fix Version/s: v2.4.0

> IntegerDimEnc, does not encode -1 correctly
> ---
>
> Key: KYLIN-1948
> URL: https://issues.apache.org/jira/browse/KYLIN-1948
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: jiatao.tao
>Priority: Major
> Fix For: v2.4.0
>
>
> The code for -1 is all 0xff, which is the code for NULL. Need a fix, since -1 
> is a common value.
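The collision can be seen directly from two's-complement bytes (a sketch of the general problem; IntegerDimEnc's actual serialization may differ in detail):

```java
// In two's complement, -1 is all 0xff bytes at any width, so if an
// encoder also reserves the all-0xff pattern as its NULL marker,
// -1 and NULL become indistinguishable.
public class AllOnesDemo {
    static boolean allBytesFF(long v, int width) {
        for (int i = 0; i < width; i++) {
            if (((v >> (8 * i)) & 0xFF) != 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allBytesFF(-1L, 4)); // true: collides with a 4-byte NULL marker
        System.out.println(allBytesFF(-2L, 4)); // false: low byte is 0xFE
    }
}
```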





[jira] [Updated] (KYLIN-3187) JDK APIs using the default locale, time zone or character set should be avoided

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3187:

Fix Version/s: v2.4.0

> JDK APIs using the default locale, time zone or character set should be 
> avoided
> ---
>
> Key: KYLIN-3187
> URL: https://issues.apache.org/jira/browse/KYLIN-3187
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Reporter: Ted Yu
>Assignee:  Kaige Liu
>Priority: Major
>  Labels: usability
> Fix For: v2.4.0
>
>
> Here are a few examples:
> {code}
> server-base/src/main/java/org/apache/kylin/rest/service/JobService.java:  
>   Calendar calendar = Calendar.getInstance();
> storage-hbase/src/main/java/org/apache/kylin/storage/hbase/util/HbaseStreamingInput.java:
> Calendar cal = Calendar.getInstance();
> {code}
> Locale should be specified.
> See CALCITE-1667 for related information.
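The fix the ticket asks for is to pass the locale and time zone explicitly instead of relying on JVM defaults. A minimal sketch with plain JDK APIs (not Kylin's actual code):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class ExplicitLocale {
    public static void main(String[] args) {
        // Instead of Calendar.getInstance(), which silently picks up the
        // JVM's default locale and time zone:
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);

        // The same applies to formatters: pass the locale explicitly so the
        // output does not change with the server's environment.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(fmt.format(cal.getTime()));
    }
}
```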





[jira] [Updated] (KYLIN-3168) CubeHFileJob should use currentHBaseConfiguration but not new create hbase configuration

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3168:

Affects Version/s: (was: v2.4.0)
   v2.2.0
   v2.3.0
Fix Version/s: v2.4.0

> CubeHFileJob should use currentHBaseConfiguration but not new create hbase 
> configuration
> 
>
> Key: KYLIN-3168
> URL: https://issues.apache.org/jira/browse/KYLIN-3168
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.2.0, v2.3.0
>Reporter: wuyingjun
>Assignee: wuyingjun
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: AfterModified.png, CubeHFileJob_Exception.png, 
> KYLIN-3168.patch, 飞信截图20180124232443.png
>
>
> It is not correct for CubeHFileJob to create a new HBase configuration, because 
> when the ZooKeeper quorum is not localhost (on the job worker), Kylin may hit a 
> ZooKeeper ConnectionLossException.
>  





[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump con

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463495#comment-16463495
 ] 

Shaofeng SHI commented on KYLIN-3122:
-

Hi Vsevolod,

Recently we also observed this issue; another JIRA was created for it: KYLIN-3352. 
Going through the JIRAs today, I found you had already reported it months ago.

Usually the partition column is a "Date" or "Timestamp" type, for which we do not 
use the trie dictionary, so this problem went undiscovered. 

We will upload the patch for KYLIN-3352 soon; please keep an eye on it and see 
whether it solves your problem.

 

Thanks for the report!

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Fix For: v2.4.0
>
> Attachments: partition_elimination_bug_single_column_test.log
>
>
> Current algorithm of cube segment elimination seems to be rather inefficient.
>  We are using a model where cubes are partitioned by date and time:
>  "partition_desc":
> { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
> "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
> "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
> "partition_condition_builder": 
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>  }
> ,
> Cubes contain partitions for multiple days and 24 hours for each day. Each 
> cube segment corresponds to just one hour.
> When a query is issued where both date and hour are specified using equality 
> conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially 
> iterates over all the cube segments (hundreds of them) only to skip all 
> except the one that needs to be scanned (which can be observed by looking 
> in the logs).
>  The expectation is that Kylin would use the existing info on the partitioning 
> columns (date and time) and the known hierarchical relation between date and 
> time to locate the required partition much more efficiently than a linear scan 
> through all the cube partitions.
> Now, if the filtering condition is on a range of hours, the behavior of the 
> partition pruning and scanning becomes illogical, which suggests bugs 
> in the logic.
> If the filtering condition is on a specific date and a closed-open range of 
> hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in 
> addition to sequentially scanning all the cube partitions (as described 
> above), Kylin will scan the HBase tables for all the hours from the specified 
> starting hour till the last hour of the day (e.g. from hour 10 to 24, 
> instead of just hour 10).
>  As a result, the query will run much longer than necessary, and might run out 
> of memory, causing a JVM heap dump and a Kylin server crash.
> If the filtering condition is on a specific date but the hour interval is 
> specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
> thehour <= '10'), Kylin will scan the HBase tables for all the later dates and 
> hours (e.g. from hour 10 till the most recent hour on the most recent day, 
> which can be hundreds of tables and thousands of regions).
>  As a result, query execution time will increase dramatically, and in most 
> cases the Kylin server will be terminated with an OOM error and a JVM heap dump.
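The expected pruning behavior can be sketched by treating (date, hour) as a single lexicographic key, so a closed-open filter selects exactly the matching hourly segment instead of scanning the tail of the day. The segment naming below is hypothetical, not Kylin's implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPruneSketch {
    public static void main(String[] args) {
        // Hypothetical hourly segments keyed by "yyyyMMddHH".
        List<String> segments = new ArrayList<>();
        for (int h = 0; h < 24; h++) {
            segments.add(String.format("20171011%02d", h));
        }
        // Filter: thedate = '20171011' AND thehour >= '10' AND thehour < '11'
        String lo = "2017101110", hi = "2017101111"; // closed-open range
        List<String> kept = new ArrayList<>();
        for (String seg : segments) {
            // Lexicographic comparison on the combined date+hour key.
            if (seg.compareTo(lo) >= 0 && seg.compareTo(hi) < 0) {
                kept.add(seg);
            }
        }
        System.out.println(kept); // [2017101110] -- exactly one segment survives
    }
}
```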





[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3122:

Fix Version/s: v2.4.0

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Fix For: v2.4.0
>
> Attachments: partition_elimination_bug_single_column_test.log
>
>
> Current algorithm of cube segment elimination seems to be rather inefficient.
>  We are using a model where cubes are partitioned by date and time:
>  "partition_desc":
> { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
> "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
> "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
> "partition_condition_builder": 
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>  }
> ,
> Cubes contain partitions for multiple days and 24 hours for each day. Each 
> cube segment corresponds to just one hour.
> When a query is issued where both date and hour are specified using equality 
> conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially 
> iterates over all the cube segments (hundreds of them) only to skip all 
> except the one that needs to be scanned (which can be observed by looking 
> in the logs).
>  The expectation is that Kylin would use the existing info on the partitioning 
> columns (date and time) and the known hierarchical relation between date and 
> time to locate the required partition much more efficiently than a linear scan 
> through all the cube partitions.
> Now, if the filtering condition is on a range of hours, the behavior of the 
> partition pruning and scanning becomes illogical, which suggests bugs 
> in the logic.
> If the filtering condition is on a specific date and a closed-open range of 
> hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in 
> addition to sequentially scanning all the cube partitions (as described 
> above), Kylin will scan the HBase tables for all the hours from the specified 
> starting hour till the last hour of the day (e.g. from hour 10 to 24, 
> instead of just hour 10).
>  As a result, the query will run much longer than necessary, and might run out 
> of memory, causing a JVM heap dump and a Kylin server crash.
> If the filtering condition is on a specific date but the hour interval is 
> specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
> thehour <= '10'), Kylin will scan the HBase tables for all the later dates and 
> hours (e.g. from hour 10 till the most recent hour on the most recent day, 
> which can be hundreds of tables and thousands of regions).
>  As a result, query execution time will increase dramatically, and in most 
> cases the Kylin server will be terminated with an OOM error and a JVM heap dump.





[jira] [Closed] (KYLIN-3309) Kylin ODBC driver login :Username and password unknown

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI closed KYLIN-3309.
---

> Kylin ODBC driver login :Username and password unknown
> --
>
> Key: KYLIN-3309
> URL: https://issues.apache.org/jira/browse/KYLIN-3309
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - ODBC
>Affects Versions: v2.3.0
>Reporter: ram
>Priority: Major
>
> We have installed Kylin on Cloudera CDH on an AWS EC2 cluster. Kylin is 
> working perfectly with this setup and cubes are created in Kylin.
> We need to access Kylin via the ODBC driver, but the ODBC connection window asks 
> for a username and password. We are not sure what credentials (username and 
> password) should be used here. We are using username ADMIN and password 
> KYLIN to log in, but the login fails. 
> Request to investigate and fix the issue.





[jira] [Resolved] (KYLIN-3309) Kylin ODBC driver login :Username and password unknown

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-3309.
-
Resolution: Not A Problem

> Kylin ODBC driver login :Username and password unknown
> --
>
> Key: KYLIN-3309
> URL: https://issues.apache.org/jira/browse/KYLIN-3309
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - ODBC
>Affects Versions: v2.3.0
>Reporter: ram
>Priority: Major
>
> We have installed Kylin on Cloudera CDH on an AWS EC2 cluster. Kylin is 
> working perfectly with this setup and cubes are created in Kylin.
> We need to access Kylin via the ODBC driver, but the ODBC connection window asks 
> for a username and password. We are not sure what credentials (username and 
> password) should be used here. We are using username ADMIN and password 
> KYLIN to log in, but the login fails. 
> Request to investigate and fix the issue.





[jira] [Commented] (KYLIN-3309) Kylin ODBC driver login :Username and password unknown

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463491#comment-16463491
 ] 

Shaofeng SHI commented on KYLIN-3309:
-

For questions, please send them to u...@kylin.apache.org; you need to subscribe to 
the mailing list first: user-subscr...@kylin.apache.org

> Kylin ODBC driver login :Username and password unknown
> --
>
> Key: KYLIN-3309
> URL: https://issues.apache.org/jira/browse/KYLIN-3309
> Project: Kylin
>  Issue Type: Bug
>  Components: Driver - ODBC
>Affects Versions: v2.3.0
>Reporter: ram
>Priority: Major
>
> We have installed Kylin on Cloudera CDH on an AWS EC2 cluster. Kylin is 
> working perfectly with this setup and cubes are created in Kylin.
> We need to access Kylin via the ODBC driver, but the ODBC connection window asks 
> for a username and password. We are not sure what credentials (username and 
> password) should be used here. We are using username ADMIN and password 
> KYLIN to log in, but the login fails. 
> Request to investigate and fix the issue.





[jira] [Updated] (KYLIN-3352) Segment pruning bug, e.g. date_col > "max_date+1"

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3352:

Affects Version/s: v2.1.0
   v2.2.0
   v2.3.0
Fix Version/s: v2.4.0

> Segment pruning bug, e.g. date_col > "max_date+1"
> -
>
> Key: KYLIN-3352
> URL: https://issues.apache.org/jira/browse/KYLIN-3352
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.1.0, v2.2.0, v2.3.0
>Reporter: liyang
>Assignee: liyang
>Priority: Major
> Fix For: v2.4.0
>
>
> Currently {{date_col > "max_date+1"}} is rounded down to {{date_col > 
> "max_date"}} during encoding, and further evaluated as {{date_col >= 
> "max_date"}} during segment pruning. This causes a segment that could be 
> pruned to not be pruned.
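The off-by-one described above can be sketched with plain day numbers; the variables below are illustrative only, since Kylin's actual logic works on encoded dictionary values:

```java
public class RoundingSketch {
    public static void main(String[] args) {
        int maxDate = 100;            // last day covered by the segment
        int filterDate = maxDate + 1; // query filter: date_col > max_date+1

        // Buggy behavior per the issue: the out-of-range literal is rounded
        // DOWN to max_date, and '>' is then relaxed to '>='...
        int rounded = Math.min(filterDate, maxDate);
        boolean segmentKeptBuggy = maxDate >= rounded;
        // ...so the segment is scanned even though no row in it can match:
        System.out.println(segmentKeptBuggy);   // true

        // Correct: date_col > max_date+1 can never hold inside this segment,
        // so the segment should be pruned.
        boolean segmentKeptCorrect = maxDate > filterDate;
        System.out.println(segmentKeptCorrect); // false
    }
}
```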





[jira] [Commented] (KYLIN-3168) CubeHFileJob should use currentHBaseConfiguration but not new create hbase configuration

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463483#comment-16463483
 ] 

Shaofeng SHI commented on KYLIN-3168:
-

I see it, thanks for the information. I will merge it.

> CubeHFileJob should use currentHBaseConfiguration but not new create hbase 
> configuration
> 
>
> Key: KYLIN-3168
> URL: https://issues.apache.org/jira/browse/KYLIN-3168
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.4.0
>Reporter: wuyingjun
>Assignee: wuyingjun
>Priority: Major
> Attachments: AfterModified.png, CubeHFileJob_Exception.png, 
> KYLIN-3168.patch, 飞信截图20180124232443.png
>
>
> It is not correct for CubeHFileJob to create a new HBase configuration, because 
> when the ZooKeeper quorum is not localhost (on the job worker), Kylin may hit a 
> ZooKeeper ConnectionLossException.
>  





[jira] [Updated] (KYLIN-3168) CubeHFileJob should use currentHBaseConfiguration but not new create hbase configuration

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3168:

Affects Version/s: (was: v2.2.0)
   v2.4.0

> CubeHFileJob should use currentHBaseConfiguration but not new create hbase 
> configuration
> 
>
> Key: KYLIN-3168
> URL: https://issues.apache.org/jira/browse/KYLIN-3168
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.4.0
>Reporter: wuyingjun
>Assignee: wuyingjun
>Priority: Major
> Attachments: AfterModified.png, CubeHFileJob_Exception.png, 
> KYLIN-3168.patch, 飞信截图20180124232443.png
>
>
> It is not correct for CubeHFileJob to create a new HBase configuration, because 
> when the ZooKeeper quorum is not localhost (on the job worker), Kylin may hit a 
> ZooKeeper ConnectionLossException.
>  





[jira] [Commented] (KYLIN-2683) Support reloading kerberos token of BeelineHiveClient

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463479#comment-16463479
 ] 

Shaofeng SHI commented on KYLIN-2683:
-

Hello Kanta, do you have any update on this? Does this patch work well in your 
environment?

> Support reloading kerberos token of BeelineHiveClient
> -
>
> Key: KYLIN-2683
> URL: https://issues.apache.org/jira/browse/KYLIN-2683
> Project: Kylin
>  Issue Type: Bug
>  Components: Security
>Reporter: Kanta Kuramoto
>Assignee: Kanta Kuramoto
>Priority: Minor
>  Labels: scope, security
> Attachments: KYLIN-2683.patch, kerberos_auth.png
>
>
> When the datasource is kerberized, the behavior of reloading kerberos token 
> is different between "Cube Build" and "Load Hive Table".
> I summarized the detail of this behavior in the attached image.
>  
> I think BeelineHiveClient#init should be implemented like the following:
> http://appcrawler.com/wordpress/2015/06/18/examples-of-connecting-to-kerberos-hive-in-jdbc/





[jira] [Commented] (KYLIN-2932) Simplify the thread model for in-memory cubing

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463476#comment-16463476
 ] 

Shaofeng SHI commented on KYLIN-2932:
-

Hello Ken, I believe you have already adopted this method in production? Could you 
share some performance and stability data? Thanks!

> Simplify the thread model for in-memory cubing
> --
>
> Key: KYLIN-2932
> URL: https://issues.apache.org/jira/browse/KYLIN-2932
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang Ken
>Assignee: Wang Ken
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: APACHE-KYLIN-2932.patch
>
>
> The current implementation uses split threads, task threads, and the main thread 
> to do the cube building; there is complex join and error-handling logic.
> The new implementation leverages the ForkJoinPool from the JDK: the event split 
> logic is handled in the main thread, cuboid tasks and sub-tasks are handled in 
> the fork-join pool, and cube results are collected asynchronously and can be 
> written to the output earlier.
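The model described above can be sketched with stock JDK fork/join primitives; the class below is a generic recursive-sum example, not Kylin's cuboid code:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class CuboidSketch extends RecursiveTask<Long> {
    private final long[] rows;
    private final int lo, hi;

    CuboidSketch(long[] rows, int lo, int hi) {
        this.rows = rows; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= 1024) {           // small enough: aggregate directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += rows[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;       // otherwise fork sub-tasks
        CuboidSketch left = new CuboidSketch(rows, lo, mid);
        CuboidSketch right = new CuboidSketch(rows, mid, hi);
        left.fork();                     // run the left half asynchronously
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        long[] rows = new long[10000];
        java.util.Arrays.fill(rows, 1L);
        long total = new ForkJoinPool().invoke(new CuboidSketch(rows, 0, rows.length));
        System.out.println(total); // 10000
    }
}
```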





[jira] [Updated] (KYLIN-2932) Simplify the thread model for in-memory cubing

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-2932:

Fix Version/s: (was: Future)
   v2.4.0

> Simplify the thread model for in-memory cubing
> --
>
> Key: KYLIN-2932
> URL: https://issues.apache.org/jira/browse/KYLIN-2932
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang Ken
>Assignee: Wang Ken
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: APACHE-KYLIN-2932.patch
>
>
> The current implementation uses split threads, task threads, and the main thread 
> to do the cube building; there is complex join and error-handling logic.
> The new implementation leverages the ForkJoinPool from the JDK: the event split 
> logic is handled in the main thread, cuboid tasks and sub-tasks are handled in 
> the fork-join pool, and cube results are collected asynchronously and can be 
> written to the output earlier.





[jira] [Assigned] (KYLIN-3357) Sum of small int measure may be negative after KYLIN-2982

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3357:
---

Assignee: hongbin ma

> Sum of small int measure may be negative after KYLIN-2982
> -
>
> Key: KYLIN-3357
> URL: https://issues.apache.org/jira/browse/KYLIN-3357
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.3.0
>Reporter: Liu Shaohui
>Assignee: hongbin ma
>Priority: Critical
>
> After KYLIN-2982, the sum of a small int measure may be negative.
> The same problem is reported on the Kylin user mailing list under the title 
> "negative result in kylin 2.3.0".
>  
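The symptom is consistent with ordinary integer truncation: if a sum is carried in, or cast back to, a 16-bit type, the high bits are dropped and the sign bit can flip. A plain-Java illustration (not Kylin's measure code):

```java
public class NarrowSumDemo {
    public static void main(String[] args) {
        // Summing many smallint-sized values; the true total is 40000.
        int total = 0;
        for (int i = 0; i < 20000; i++) total += 2;

        // Correct when kept in a wide enough type:
        System.out.println(total);     // 40000

        // Wrong when truncated back to a 16-bit type, as a smallint sum
        // column would be: 40000 does not fit in a signed short.
        short truncated = (short) total;
        System.out.println(truncated); // -25536
    }
}
```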





[jira] [Updated] (KYLIN-3356) Constant in SecretKeySpec

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3356:

Fix Version/s: v2.4.0

> Constant in SecretKeySpec
> -
>
> Key: KYLIN-3356
> URL: https://issues.apache.org/jira/browse/KYLIN-3356
> Project: Kylin
>  Issue Type: Improvement
>Reporter: liyang
>Priority: Major
> Fix For: v2.4.0
>
>
> Reported by Rumen Paletov : 
>  As part of some research about the common crypto mistakes that developers
>  make <[https://cs.ucsb.edu/~chris/research/doc/ccs13_cryptolint.pdf]>, I
>  noticed that your application has one of them.
>  
>  In particular, there's a violation of Rule 3 in
>  org.apache.kylin.common.util.EncryptUtil
>  
> <[https://github.com/apache/kylin/blob/5552164ba09eba989b9ddccdf3f1e4f83ed0b799/core-common/src/main/java/org/apache/kylin/common/util/EncryptUtil.java#L36]>.
>  That is, SecretKeySpec is being initialized with a constant key
>  
> <[https://github.com/apache/kylin/blob/5552164ba09eba989b9ddccdf3f1e4f83ed0b799/core-common/src/main/java/org/apache/kylin/common/util/EncryptUtil.java#L30]>
>  instead of a randomly generated one.
>  
>  One solution would be to generate a key using SecureRandom:
>  
>  > byte[] key = new byte[16];
>  > new SecureRandom().nextBytes(key);
>  
>  
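A self-contained sketch of the suggested fix, covering key generation only; persisting and distributing the randomly generated key is a separate problem this snippet does not address:

```java
import java.security.SecureRandom;
import javax.crypto.spec.SecretKeySpec;

public class RandomKeyDemo {
    public static void main(String[] args) {
        byte[] key = new byte[16];                // 128-bit AES key
        new SecureRandom().nextBytes(key);        // random bytes, not a constant
        SecretKeySpec spec = new SecretKeySpec(key, "AES");
        System.out.println(spec.getAlgorithm());  // AES
    }
}
```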





[jira] [Assigned] (KYLIN-3348) "missing LastBuildJobID" error when building new cube segment

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3348:
---

Assignee: liyang

> "missing LastBuildJobID" error when building new cube segment
> -
>
> Key: KYLIN-3348
> URL: https://issues.apache.org/jira/browse/KYLIN-3348
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.0
>Reporter: liyang
>Assignee: liyang
>Priority: Major
> Fix For: v2.4.0
>
>
> An unstable exception. Likely to happen when there are multiple concurrent 
> builds.
> {{2018-04-18 20:11:16,856 ERROR [pool-33-thread-11] 
> threadpool.DefaultScheduler : ExecuteException 
> job:cc08da19-f53e-4344-a6c5-05e764834924}}
>  {{ org.apache.kylin.job.exception.ExecuteException: 
> org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: For cube CUBE[name=cube2], segment 
> cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:140)}}
>  \{{ at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:307)}}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
>  \{{ at java.lang.Thread.run(Thread.java:748)}}
>  {{ Caused by: org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: For cube CUBE[name=cube2], segment 
> cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:140)}}
>  \{{ at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:129)}}
>  \{{ ... 4 more}}
>  {{ Caused by: java.lang.IllegalStateException: For cube CUBE[name=cube2], 
> segment cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.cube.CubeManager$SegmentAssist.promoteNewlyBuiltSegments(CubeManager.java:810)}}
>  \{{ at 
> org.apache.kylin.cube.CubeManager.promoteNewlyBuiltSegments(CubeManager.java:535)}}
>  \{{ at 
> org.apache.kylin.engine.mr.steps.UpdateCubeInfoAfterBuildStep.doWork(UpdateCubeInfoAfterBuildStep.java:78)}}
>  \{{ at 
> io.kyligence.kap.engine.mr.steps.KapUpdateCubeInfoAfterBuildStep.doWork(SourceFile:47)}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:129)}}
>  \{{ ... 6 more}}





[jira] [Updated] (KYLIN-3348) "missing LastBuildJobID" error when building new cube segment

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3348:

Affects Version/s: v2.3.0
Fix Version/s: v2.4.0

> "missing LastBuildJobID" error when building new cube segment
> -
>
> Key: KYLIN-3348
> URL: https://issues.apache.org/jira/browse/KYLIN-3348
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.3.0
>Reporter: liyang
>Priority: Major
> Fix For: v2.4.0
>
>
> An unstable exception. Likely to happen when there are multiple concurrent 
> builds.
> {{2018-04-18 20:11:16,856 ERROR [pool-33-thread-11] 
> threadpool.DefaultScheduler : ExecuteException 
> job:cc08da19-f53e-4344-a6c5-05e764834924}}
>  {{ org.apache.kylin.job.exception.ExecuteException: 
> org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: For cube CUBE[name=cube2], segment 
> cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:140)}}
>  \{{ at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:307)}}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
>  \{{ at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
>  \{{ at java.lang.Thread.run(Thread.java:748)}}
>  {{ Caused by: org.apache.kylin.job.exception.ExecuteException: 
> java.lang.IllegalStateException: For cube CUBE[name=cube2], segment 
> cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:140)}}
>  \{{ at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:129)}}
>  \{{ ... 4 more}}
>  {{ Caused by: java.lang.IllegalStateException: For cube CUBE[name=cube2], 
> segment cube2[2018041423000_2018041423001] missing LastBuildJobID}}
>  \{{ at 
> org.apache.kylin.cube.CubeManager$SegmentAssist.promoteNewlyBuiltSegments(CubeManager.java:810)}}
>  \{{ at 
> org.apache.kylin.cube.CubeManager.promoteNewlyBuiltSegments(CubeManager.java:535)}}
>  \{{ at 
> org.apache.kylin.engine.mr.steps.UpdateCubeInfoAfterBuildStep.doWork(UpdateCubeInfoAfterBuildStep.java:78)}}
>  \{{ at 
> io.kyligence.kap.engine.mr.steps.KapUpdateCubeInfoAfterBuildStep.doWork(SourceFile:47)}}
>  \{{ at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:129)}}
>  \{{ ... 6 more}}





[jira] [Assigned] (KYLIN-3353) Merge job should not be blocked by "kylin.cube.max-building-segments"

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3353:
---

 Assignee: Shaofeng SHI
Fix Version/s: v2.4.0

> Merge job should not be blocked by "kylin.cube.max-building-segments"
> -
>
> Key: KYLIN-3353
> URL: https://issues.apache.org/jira/browse/KYLIN-3353
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v2.4.0
>
>
> Currently there is a config, "kylin.cube.max-building-segments" (default 
> 10), that sets the max number of building jobs for a cube.
> In a frequent-build case, it is possible to have 10 segments being 
> built concurrently; then there is no room for the merge jobs. If the merge 
> job is blocked, more segments will accumulate, which then impacts 
> query performance.
>  
> So I suggest disabling this check for merge jobs.
>  
>  





[jira] [Assigned] (KYLIN-3289) Refactor the storage garbage clean up code

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3289:
---

Assignee: Guangyao Li

> Refactor the storage garbage clean up code
> --
>
> Key: KYLIN-3289
> URL: https://issues.apache.org/jira/browse/KYLIN-3289
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v2.3.0
>Reporter: Guangyao Li
>Assignee: Guangyao Li
>Priority: Minor
> Fix For: v2.4.0
>
>
> Kylin produces some garbage data in storage as it runs.
> Currently, the clean-up tool "{{kylin.sh org.apache.kylin.tool.StorageCleanupJob}}" 
> can show what the garbage data is, or clean it up, via the option 
> "--delete false" or "--delete true".
> But Kylin can't show users the size of the garbage data.
> This refactoring adds some member variables and methods that record the 
> garbage size during the detection process. 
> After the clean-up job finishes, Kylin can report the garbage 
> size.





[jira] [Assigned] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI reassigned KYLIN-3258:
---

Assignee: Shaofeng SHI

> No check for duplicate cube name when creating a hybrid cube
> 
>
> Key: KYLIN-3258
> URL: https://issues.apache.org/jira/browse/KYLIN-3258
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v2.4.0
>
>
> When loading hybrid cube definitions via the REST API, there is no check for 
> duplicate cube names in the list. If, due to a user error or an incorrectly 
> generated list of cubes from an external application/script, the same cube name 
> is listed more than once, the new or updated hybrid cube will contain the same 
> cube multiple times.
> It does not seem to cause any immediate issues with querying, but it's just 
> not right. The REST API should throw an exception when the same cube name is 
> listed multiple times.





[jira] [Updated] (KYLIN-3289) Refactor the storage garbage clean up code

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3289:

Fix Version/s: v2.4.0

> Refactor the storage garbage clean up code
> --
>
> Key: KYLIN-3289
> URL: https://issues.apache.org/jira/browse/KYLIN-3289
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v2.3.0
>Reporter: Guangyao Li
>Priority: Minor
> Fix For: v2.4.0
>
>
> Kylin produces some garbage data in storage as it runs.
> Currently, the cleanup tool "{{kylin.sh org.apache.kylin.tool.StorageCleanupJob}}" 
> can either list the garbage data or delete it, via the options 
> "--delete false" or "--delete true".
> But Kylin can't show users the size of the garbage data.
> This refactoring adds some member variables and methods that record the 
> garbage size during the detection process.
> After the cleanup job finishes, Kylin can report the garbage size.
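The proposed refactoring boils down to accumulating a size alongside each detected garbage item. A hedged sketch of the idea in Python (class and method names are hypothetical, not Kylin's actual StorageCleanupJob API):

```python
class StorageCleanup:
    """Toy model of a cleanup job that records garbage sizes while detecting."""

    def __init__(self, delete=False):
        self.delete = delete
        self.garbage = []  # list of (path, size_bytes) detected as garbage

    def detect(self, path, size_bytes):
        # Detection phase: remember the item and its size instead of only its path.
        self.garbage.append((path, size_bytes))

    def total_garbage_bytes(self):
        # After the job runs, the total size can be reported to the user.
        return sum(size for _, size in self.garbage)
```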





[jira] [Updated] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3258:

Fix Version/s: v2.4.0

> No check for duplicate cube name when creating a hybrid cube
> 
>
> Key: KYLIN-3258
> URL: https://issues.apache.org/jira/browse/KYLIN-3258
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Shaofeng SHI
>Priority: Minor
> Fix For: v2.4.0
>
>
> When loading hybrid cube definitions via the REST API, there is no check for 
> duplicate cube names in the list. If, due to user error or an incorrectly 
> generated cube list from an external application or script, the same cube 
> name is listed more than once, the new or updated hybrid cube will contain 
> the same cube multiple times.
> It does not seem to cause any immediate issues with querying, but it is 
> still wrong. The REST API should throw an exception when the same cube name 
> is listed multiple times.





[jira] [Resolved] (KYLIN-2987) Add 'auto.purge=true' when creating intermediate hive table or redistribute a hive table

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-2987.
-
   Resolution: Fixed
Fix Version/s: v2.3.0

> Add 'auto.purge=true' when creating intermediate hive table or redistribute a 
> hive table
> 
>
> Key: KYLIN-2987
> URL: https://issues.apache.org/jira/browse/KYLIN-2987
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Trivial
> Fix For: v2.3.0
>
> Attachments: APACHE-KYLIN-2987.patch
>
>
> On the Kylin side, we can add auto.purge=true when creating the intermediate 
> table. However, to make 'auto.purge' effective for "insert overwrite table", 
> we still need a patch for Hive:
> https://issues.apache.org/jira/browse/HIVE-15880
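With `auto.purge` set as a table property, data removed by a drop or overwrite bypasses the HDFS trash. A sketch of the kind of DDL Kylin could emit for its intermediate table (table and column names here are placeholders, not Kylin's actual generated names):

```python
def intermediate_table_ddl(table, columns):
    """Build a Hive CREATE TABLE statement with auto.purge enabled."""
    cols = ", ".join("%s %s" % (name, typ) for name, typ in columns)
    # TBLPROPERTIES('auto.purge'='true') makes Hive skip the trash on drop/overwrite.
    return ("CREATE TABLE IF NOT EXISTS %s (%s) "
            "TBLPROPERTIES('auto.purge'='true')" % (table, cols))
```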





[jira] [Closed] (KYLIN-2987) Add 'auto.purge=true' when creating intermediate hive table or redistribute a hive table

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI closed KYLIN-2987.
---

> Add 'auto.purge=true' when creating intermediate hive table or redistribute a 
> hive table
> 
>
> Key: KYLIN-2987
> URL: https://issues.apache.org/jira/browse/KYLIN-2987
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Trivial
> Fix For: v2.3.0
>
> Attachments: APACHE-KYLIN-2987.patch
>
>
> On the Kylin side, we can add auto.purge=true when creating the intermediate 
> table. However, to make 'auto.purge' effective for "insert overwrite table", 
> we still need a patch for Hive:
> https://issues.apache.org/jira/browse/HIVE-15880





[jira] [Commented] (KYLIN-2987) Add 'auto.purge=true' when creating intermediate hive table or redistribute a hive table

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463455#comment-16463455
 ] 

Shaofeng SHI commented on KYLIN-2987:
-

I see; thanks for the information.

> Add 'auto.purge=true' when creating intermediate hive table or redistribute a 
> hive table
> 
>
> Key: KYLIN-2987
> URL: https://issues.apache.org/jira/browse/KYLIN-2987
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Trivial
> Fix For: v2.3.0
>
> Attachments: APACHE-KYLIN-2987.patch
>
>
> On the Kylin side, we can add auto.purge=true when creating the intermediate 
> table. However, to make 'auto.purge' effective for "insert overwrite table", 
> we still need a patch for Hive:
> https://issues.apache.org/jira/browse/HIVE-15880





[jira] [Closed] (KYLIN-3325) Can't translate value to dictionary ID when using shard by

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI closed KYLIN-3325.
---

> Can't translate value to dictionary ID when using shard by
> --
>
> Key: KYLIN-3325
> URL: https://issues.apache.org/jira/browse/KYLIN-3325
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
> Fix For: v2.3.1
>
>
> When I set the row key to shard by one high-cardinality dimension, I get 
> this error in one Spark executor while building the cube:
> ERROR dimension.DictionaryDimEnc: Can't translate value 700321 to dictionary 
> ID, roundingFlag 0. Using default value \xFF
> This error only happens in one executor; the others are fine. It makes the 
> Spark tasks in that executor run much slower than the rest.





[jira] [Resolved] (KYLIN-3325) Can't translate value to dictionary ID when using shard by

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-3325.
-
Resolution: Duplicate

> Can't translate value to dictionary ID when using shard by
> --
>
> Key: KYLIN-3325
> URL: https://issues.apache.org/jira/browse/KYLIN-3325
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
> Fix For: v2.3.1
>
>
> When I set the row key to shard by one high-cardinality dimension, I get 
> this error in one Spark executor while building the cube:
> ERROR dimension.DictionaryDimEnc: Can't translate value 700321 to dictionary 
> ID, roundingFlag 0. Using default value \xFF
> This error only happens in one executor; the others are fine. It makes the 
> Spark tasks in that executor run much slower than the rest.





[jira] [Commented] (KYLIN-3294) Remove HBaseMROutput.java, RangeKeyDistributionJob.java and other sunset classes

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463453#comment-16463453
 ] 

Shaofeng SHI commented on KYLIN-3294:
-

Hi Wenzheng, a patch is welcome~

> Remove HBaseMROutput.java, RangeKeyDistributionJob.java and other sunset 
> classes
> 
>
> Key: KYLIN-3294
> URL: https://issues.apache.org/jira/browse/KYLIN-3294
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Shaofeng SHI
>Assignee: Wenzheng Liu
>Priority: Major
> Fix For: v2.4.0
>
>
> These are legacy classes; keeping them adds maintenance effort, especially 
> when upgrading the HBase version. They should be deleted.





[jira] [Updated] (KYLIN-3294) Remove HBaseMROutput.java, RangeKeyDistributionJob.java and other sunset classes

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3294:

Fix Version/s: v2.4.0

> Remove HBaseMROutput.java, RangeKeyDistributionJob.java and other sunset 
> classes
> 
>
> Key: KYLIN-3294
> URL: https://issues.apache.org/jira/browse/KYLIN-3294
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Shaofeng SHI
>Assignee: Wenzheng Liu
>Priority: Major
> Fix For: v2.4.0
>
>
> These are legacy classes; keeping them adds maintenance effort, especially 
> when upgrading the HBase version. They should be deleted.





[jira] [Updated] (KYLIN-3325) Can't translate value to dictionary ID when using shard by

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3325:

Fix Version/s: v2.3.1

> Can't translate value to dictionary ID when using shard by
> --
>
> Key: KYLIN-3325
> URL: https://issues.apache.org/jira/browse/KYLIN-3325
> Project: Kylin
>  Issue Type: Bug
>Reporter: Le Anh Vu
>Priority: Major
> Fix For: v2.3.1
>
>
> When I set the row key to shard by one high-cardinality dimension, I get 
> this error in one Spark executor while building the cube:
> ERROR dimension.DictionaryDimEnc: Can't translate value 700321 to dictionary 
> ID, roundingFlag 0. Using default value \xFF
> This error only happens in one executor; the others are fine. It makes the 
> Spark tasks in that executor run much slower than the rest.





[jira] [Updated] (KYLIN-3321) Set MALLOC_ARENA_MAX in script

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3321:

Fix Version/s: v2.4.0

> Set MALLOC_ARENA_MAX in script
> --
>
> Key: KYLIN-3321
> URL: https://issues.apache.org/jira/browse/KYLIN-3321
> Project: Kylin
>  Issue Type: Task
>  Components: Environment 
>Reporter: Ted Yu
>Assignee: jiatao.tao
>Priority: Major
> Fix For: v2.4.0
>
>
> conf/setenv.sh would be a good place to set MALLOC_ARENA_MAX, which prevents 
> native-memory OOM.
> See https://github.com/prestodb/presto/issues/8993
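Whatever script sets it, the variable only matters if it reaches the environment of the launched JVM, which is what exporting it from conf/setenv.sh achieves. A small Python sketch of that mechanism (the value 4 is a commonly used cap for glibc malloc arenas, not an official Kylin recommendation):

```python
import os
import subprocess
import sys

def run_with_arena_cap(cmd, arenas=4):
    """Run a child process with MALLOC_ARENA_MAX set, like setenv.sh would for the JVM."""
    env = dict(os.environ, MALLOC_ARENA_MAX=str(arenas))
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Verify the child process actually sees the variable.
result = run_with_arena_cap(
    [sys.executable, "-c", "import os; print(os.environ['MALLOC_ARENA_MAX'])"])
```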





[jira] [Updated] (KYLIN-3296) When merge cube,get java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method)

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3296:

Fix Version/s: v2.4.0

> When merge cube,get java.lang.ArrayIndexOutOfBoundsException at 
> java.lang.System.arraycopy(Native Method)
> -
>
> Key: KYLIN-3296
> URL: https://issues.apache.org/jira/browse/KYLIN-3296
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.3.0
>Reporter: RenZhiMin
>Assignee: RenZhiMin
>Priority: Major
>  Labels: patch
> Fix For: v2.4.0
>
> Attachments: JIRA.master.3296.patch
>
>
> In the cube's rowkey design, one dimension is encoded with a fixed length of 
> 500, and the daily build uses the in-memory cubing algorithm. When merging 
> the cube, the map tasks of the generated MR job fail with 
> "java.lang.ArrayIndexOutOfBoundsException at 
> java.lang.System.arraycopy(Native Method)".
> Investigation shows that the map tasks need to split the rowkey of the 
> cuboid data being merged: for each dimension, the length is derived from its 
> encoding, and the corresponding bytes are read from the rowkey and assigned 
> to the value of a SplittedBytes. Because the value array is initialized with 
> a fixed size of 255, splitting a dimension value longer than 255 bytes 
> causes the index-out-of-bounds error.
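A minimal reproduction of the failure mode described in the report, sketched in Python rather than the actual Java splitter: each dimension's bytes are copied into a pre-allocated buffer, and a buffer hard-coded to 255 bytes overflows for a fixed-length-500 dimension.

```python
BUFFER_SIZE = 255  # the hard-coded limit described in the report

def split_rowkey(rowkey, dim_lengths):
    """Split a rowkey into per-dimension byte slices of the given encoded lengths."""
    parts, offset = [], 0
    for length in dim_lengths:
        if length > BUFFER_SIZE:
            # Python analogue of Java's ArrayIndexOutOfBoundsException in arraycopy
            raise IndexError("dimension value of %d bytes exceeds %d-byte buffer"
                             % (length, BUFFER_SIZE))
        parts.append(rowkey[offset:offset + length])
        offset += length
    return parts
```

The attached patch presumably sizes the buffer from the actual dimension encodings instead of a constant.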





[jira] [Commented] (KYLIN-3346) kylin.web.hide-measures=RAW breaks ordinary sum drill-down queries

2018-05-04 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463447#comment-16463447
 ] 

Shaofeng SHI commented on KYLIN-3346:
-

Hi Chen Chen, 

Kylin still supports drill-down, but drilling down to raw data is not 
recommended, because a cube is not good at persisting raw data.

RAW was an attempt to model raw data as a special measure. Later we realized 
the limitations of that feature and hid it from Kylin 2.3 onward. If you want 
to keep using RAW, you can unset the kylin.web.hide-measures property; all 
other behavior is the same, I believe.

> kylin.web.hide-measures=RAW breaks ordinary sum drill-down queries
> --
>
> Key: KYLIN-3346
> URL: https://issues.apache.org/jira/browse/KYLIN-3346
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.3.0
>Reporter: chenchen
>Priority: Major
>
> My scenario requires drill-down at query time: both aggregation and 
> drill-down must be supported. In Kylin 2.2, sum measures supported 
> drill-down queries by default. After upgrading to Kylin 2.3, this is no 
> longer supported (and the kylin.web.hide-measures=RAW setting was added). 
> Why is Kylin not backward compatible with the older version?





[jira] [Resolved] (KYLIN-3346) kylin.web.hide-measures=RAW breaks ordinary sum drill-down queries

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI resolved KYLIN-3346.
-
Resolution: Not A Problem

> kylin.web.hide-measures=RAW breaks ordinary sum drill-down queries
> --
>
> Key: KYLIN-3346
> URL: https://issues.apache.org/jira/browse/KYLIN-3346
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.3.0
>Reporter: chenchen
>Priority: Major
>
> My scenario requires drill-down at query time: both aggregation and 
> drill-down must be supported. In Kylin 2.2, sum measures supported 
> drill-down queries by default. After upgrading to Kylin 2.3, this is no 
> longer supported (and the kylin.web.hide-measures=RAW setting was added). 
> Why is Kylin not backward compatible with the older version?





[jira] [Updated] (KYLIN-3331) Kylin start script hangs while retrieving hive dependencies

2018-05-04 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3331:

Fix Version/s: v2.4.0

> Kylin start script hangs while retrieving hive dependencies
> ---
>
> Key: KYLIN-3331
> URL: https://issues.apache.org/jira/browse/KYLIN-3331
> Project: Kylin
>  Issue Type: Improvement
>Reporter: nichunen
>Assignee: nichunen
>Priority: Minor
> Fix For: v2.4.0
>
>
> This happens if the hive client mode is set to "cli": the hive command may 
> hang if the cluster is in an unhealthy state (for instance, ZooKeeper is 
> stopped). The script should detect this and kill the process on timeout.
> Failing fast and telling the user is the right way.
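The proposed fix amounts to wrapping the dependency-probing command in a timeout and reporting the failure instead of hanging. A hedged sketch in Python (the shell start script would use an equivalent mechanism; the timeout value and message are placeholders):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_sec):
    """Run a command, killing it and returning None if it exceeds the timeout."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        # Fail fast and tell the user, instead of hanging indefinitely.
        sys.stderr.write("command timed out after %ds; "
                         "check cluster health (e.g. ZooKeeper)\n" % timeout_sec)
        return None
```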


