[jira] [Commented] (KUDU-2226) Frequently updated table does not flush DeltaMemStore in time and will occupy a lot of memory

2017-11-30 Thread ZhangZhen (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272545#comment-16272545
 ] 

ZhangZhen commented on KUDU-2226:
-

The table has almost 15K DRSs in one tablet, which results in 15K DMSs in 
memory. Each DMS consumes about 60KB of memory, including data and overhead, 
so 15,000 x 60KB is roughly 900MB in total. Thanks to [~jdcryans] for helping 
me figure this out.

Following JD's suggestion, I added more maintenance threads (default 1, set to 
5) and set maintenance_manager_polling_interval_ms to 50 instead of the default 
250, hoping the 15K DRSs would get a chance to be compacted. After restarting 
the server I can see more maintenance instances running, but most of the 
completed operations are "FlushMRSOp", "UndoDeltaBlockGCOp", etc. It seems 
CompactRowSetsOp gets a low maintenance score and is never executed.
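
For reference, the tablet server flags I changed were along these lines 
(maintenance_manager_polling_interval_ms is the flag named above; I'm assuming 
the thread count is controlled by maintenance_manager_num_threads):

    # Tablet server gflags used for this experiment (Kudu 1.3)
    --maintenance_manager_num_threads=5           # default: 1 (assumed flag name)
    --maintenance_manager_polling_interval_ms=50  # default: 250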

> Frequently updated table does not flush DeltaMemStore in time and will occupy 
> a lot of memory
> -
>
> Key: KUDU-2226
> URL: https://issues.apache.org/jira/browse/KUDU-2226
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.3.0
> Environment: CentOS6.5 Linux 2.6.32-431
> Kudu1.3.0 
> GitCommit 00813f96b9cb
>Reporter: ZhangZhen
>
> I have a table with 10M rows in total, hash partitioned into 16 buckets. 
> Each tablet has about 100MB of on-disk data according to the /tablets Web 
> UI. Every day about 50K new rows are inserted into this table and about 5M 
> rows are updated, i.e. about half of all rows, with each row updated only 
> once. 
> Then I found something strange: according to the /mem-trackers UI of the 
> TS, every tablet of this table occupied about 900MB of memory, mostly in 
> DeltaMemStore, with a peak memory consumption of about 1.8G. 
> I don't understand why the DeltaMemStore costs so much memory; 900MB of DMS 
> vs 100MB on disk seems strange to me. What's more, these DMS are flushed 
> very slowly, so the memory stays occupied for a long time, which causes 
> "Soft memory limit exceeded" on the TS and in turn causes "Rejecting 
> consensus request".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2227) Managing tablet copy sessions via kudu CLI tool

2017-11-30 Thread Alexey Serbin (JIRA)
Alexey Serbin created KUDU-2227:
---

 Summary: Managing tablet copy sessions via kudu CLI tool
 Key: KUDU-2227
 URL: https://issues.apache.org/jira/browse/KUDU-2227
 Project: Kudu
  Issue Type: Improvement
Reporter: Alexey Serbin
Priority: Minor


Once the functionality to abort a tablet copy session is introduced, it would 
be nice to add a way to abort tablet copies via the kudu CLI tool. At least 
the following would be useful:

* list the currently running tablet copy sessions (with information on their 
duration)
* abort a specified tablet copy session

In this context, it would also make sense to add an option to specify the 
maximum allowed duration of a tablet copy when moving tablet replicas.
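
Purely as an illustration (these subcommands do not exist; the names below are 
hypothetical), the CLI surface could look something like:

    # Hypothetical subcommands, names invented for illustration only
    kudu tserver list_tablet_copies <tserver_address>             # running copy sessions with their durations
    kudu tserver abort_tablet_copy <tserver_address> <tablet_id>  # abort the copy session for the given tablet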



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2228) Make Messenger options configurable

2017-11-30 Thread Sailesh Mukil (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273482#comment-16273482
 ] 

Sailesh Mukil commented on KUDU-2228:
-

CC: [~kwho]

> Make Messenger options configurable
> ---
>
> Key: KUDU-2228
> URL: https://issues.apache.org/jira/browse/KUDU-2228
> Project: Kudu
>  Issue Type: Task
>  Components: rpc
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>  Labels: refactor, rpc
>
> Currently, the RPC layer accesses many gflags directly to make certain 
> decisions, e.g. whether to turn on encryption, authentication, etc.
> Since the RPC layer is meant to be used more like a library, these should be 
> configurable options passed to the Messenger (which is the API endpoint for 
> the application using the RPC layer), instead of the RPC layer itself 
> reading these flags directly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2228) Make Messenger options configurable

2017-11-30 Thread Sailesh Mukil (JIRA)
Sailesh Mukil created KUDU-2228:
---

 Summary: Make Messenger options configurable
 Key: KUDU-2228
 URL: https://issues.apache.org/jira/browse/KUDU-2228
 Project: Kudu
  Issue Type: Task
  Components: rpc
Reporter: Sailesh Mukil
Assignee: Sailesh Mukil


Currently, the RPC layer accesses many gflags directly to make certain 
decisions, e.g. whether to turn on encryption, authentication, etc.

Since the RPC layer is meant to be used more like a library, these should be 
configurable options passed to the Messenger (which is the API endpoint for 
the application using the RPC layer), instead of the RPC layer itself reading 
these flags directly.
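
As a rough sketch of the direction (not Kudu's actual API; the MessengerOptions 
struct and its fields are invented for illustration), the idea is that the 
embedding application resolves its configuration, possibly from gflags, and 
hands the result to the Messenger, so the RPC library itself never reads flags:

    // Illustrative C++ sketch only; not the real Kudu RPC API.
    #include <string>

    struct MessengerOptions {
      bool enable_encryption = false;       // instead of the RPC code reading an encryption gflag
      bool require_authentication = false;  // instead of reading an authentication gflag
      int num_reactors = 4;
      std::string service_name = "kudu";
    };

    class Messenger {
     public:
      explicit Messenger(const MessengerOptions& opts) : opts_(opts) {}
      // ... RPC setup consults opts_ rather than global flags ...
     private:
      MessengerOptions opts_;
    };

    // A server binary that does use gflags would translate them at startup, e.g.:
    //   MessengerOptions opts;
    //   opts.enable_encryption = (FLAGS_rpc_encryption != "disabled");  // hypothetical translation
    //   Messenger messenger(opts);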



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2226) Frequently updated table does not flush DeltaMemStore in time and will occupy a lot of memory

2017-11-30 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273716#comment-16273716
 ] 

Jean-Daniel Cryans commented on KUDU-2226:
--

There's probably a big backlog; it will be interesting to see whether it gets 
to DMS flushes or compactions after some time.

> Frequently updated table does not flush DeltaMemStore in time and will occupy 
> a lot of memory
> -
>
> Key: KUDU-2226
> URL: https://issues.apache.org/jira/browse/KUDU-2226
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.3.0
> Environment: CentOS6.5 Linux 2.6.32-431
> Kudu1.3.0 
> GitCommit 00813f96b9cb
>Reporter: ZhangZhen
>
> I have a table with 10M rows in total, hash partitioned into 16 buckets. 
> Each tablet has about 100MB of on-disk data according to the /tablets Web 
> UI. Every day about 50K new rows are inserted into this table and about 5M 
> rows are updated, i.e. about half of all rows, with each row updated only 
> once. 
> Then I found something strange: according to the /mem-trackers UI of the 
> TS, every tablet of this table occupied about 900MB of memory, mostly in 
> DeltaMemStore, with a peak memory consumption of about 1.8G. 
> I don't understand why the DeltaMemStore costs so much memory; 900MB of DMS 
> vs 100MB on disk seems strange to me. What's more, these DMS are flushed 
> very slowly, so the memory stays occupied for a long time, which causes 
> "Soft memory limit exceeded" on the TS and in turn causes "Rejecting 
> consensus request".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2229) log spam: Not starting pre-election -- already a leader

2017-11-30 Thread Mike Percy (JIRA)
Mike Percy created KUDU-2229:


 Summary: log spam: Not starting pre-election -- already a leader
 Key: KUDU-2229
 URL: https://issues.apache.org/jira/browse/KUDU-2229
 Project: Kudu
  Issue Type: Bug
  Components: consensus
Affects Versions: 1.6.0
Reporter: Mike Percy
Assignee: Mike Percy


A bug was introduced during the 1.6.0 development cycle that enables the 
failure detector on the leader after a configuration change, which results in 
a lot of harmless (and useless) "Not starting pre-election -- already a leader" 
log messages. We'll fix this before releasing 1.6.0, so no release will ship 
with this log spam bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2230) Safe-to-evict logic should always consider local node viable

2017-11-30 Thread Mike Percy (JIRA)
Mike Percy created KUDU-2230:


 Summary: Safe-to-evict logic should always consider local node 
viable
 Key: KUDU-2230
 URL: https://issues.apache.org/jira/browse/KUDU-2230
 Project: Kudu
  Issue Type: Bug
  Components: consensus
Affects Versions: 1.6.0
Reporter: Mike Percy


During the 1.6.0 development cycle, we introduced an improvement to address 
KUDU-2048 in 1e4db3148a1cb4e340aa96edaea85c733cfdbf5a that uses heuristics to 
decide whether it is safe to evict a failed replica or not. That patch appears 
to have a bug wherein the leader checks against the last time it communicated 
with itself, which is nonsensical. The issue is that the field it checks 
against is never updated for the local node.

We're currently investigating this issue and working on a fix.
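
To illustrate the shape of the problem and the likely fix direction (this is an 
invented sketch, not Kudu's real code or types), a last-communication-based 
viability check needs to special-case the local peer, whose timestamp is never 
refreshed:

    // Invented illustration only; not Kudu's actual implementation.
    #include <chrono>
    #include <string>

    struct PeerHealth {
      std::string uuid;
      std::chrono::steady_clock::time_point last_communication;  // never updated for the local node
    };

    // Hypothetical check used when deciding whether a failed replica can be evicted.
    bool PeerLooksViable(const PeerHealth& peer,
                         const std::string& local_uuid,
                         std::chrono::steady_clock::time_point now,
                         std::chrono::seconds unreachable_timeout) {
      if (peer.uuid == local_uuid) {
        // The leader should always consider itself viable; comparing against its own
        // never-updated last_communication field is what makes the heuristic misfire.
        return true;
      }
      return (now - peer.last_communication) < unreachable_timeout;
    }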



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)