[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2018-03-22 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1641#comment-1641
 ] 

Mostafa Mokhtar commented on KUDU-2086:
---

A higher number of reactor threads and reduced tcmalloc contention in the reactor-thread 
code path alleviated the issue.
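
For illustration only (a standalone sketch, not Kudu source): hashing peer addresses onto a 
small reactor pool can leave one reactor handling a disproportionate share of connections, 
and raising the reactor count spreads the load, which matches the observation above. The 
hosts and reactor counts below are made up.

{code}
// Standalone illustration (not Kudu code): distribute 128 hypothetical peer
// addresses over a reactor pool by hashing, and report the hottest reactor.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

static size_t MaxPerReactor(int num_peers, int num_reactors) {
  std::vector<size_t> load(num_reactors, 0);
  for (int i = 0; i < num_peers; ++i) {
    // Stand-in for "hash the remote address to pick a reactor".
    std::string peer = "host-" + std::to_string(i) + ":7050";
    load[std::hash<std::string>{}(peer) % num_reactors]++;
  }
  size_t max_load = 0;
  for (size_t l : load) max_load = std::max(max_load, l);
  return max_load;
}

int main() {
  for (int reactors : {4, 8, 16, 32}) {
    std::printf("reactors=%2d  hottest reactor handles %zu of 128 connections\n",
                reactors, MaxPerReactor(128, reactors));
  }
  return 0;
}
{code}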

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Joe McDonnell
>Priority: Major
> Attachments: krpc_hash_test.c
>
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2355) Uneven assignment of work across

2018-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated KUDU-2355:
--
Summary: Uneven assignment of work across   (was: Uneven assignment o)

> Uneven assignment of work across 
> -
>
> Key: KUDU-2355
> URL: https://issues.apache.org/jira/browse/KUDU-2355
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>
> While running inserts into a Kudu table, I noticed that work is assigned 
> unevenly across maintenance threads. This reduces the scalability of 
> maintenance, since adding more threads won't always mean better throughput.
> |Thread name||Cumulative User CPU(s)||Cumulative Kernel CPU(s)||Cumulative IO-wait(s)|
> |MaintenanceMgr [worker]-59102|   3410.94|352.07| 866.71|
> |MaintenanceMgr [worker]-59101|   3056.77|319.32| 794.1|
> |MaintenanceMgr [worker]-59100|   2924.29|300.35| 831.66|
> |MaintenanceMgr [worker]-59099|   3021.22|307.3| 783.84|
> |MaintenanceMgr [worker]-59098|   2174.47|216.55| 716|
> |MaintenanceMgr [worker]-59097|   3240.47|335.55| 846.99|
> |MaintenanceMgr [worker]-59096|   2206.57|218.63| 752.62|
> |MaintenanceMgr [worker]-59095|   2112.76|210.93| 720.67|
> Snapshot from top
> {code}
>    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  59102 kudu  20   0 25.3g  15g  11m R 105.0  6.1  64:42.20 MaintenanceMgr
>  59097 kudu  20   0 25.3g  15g  11m R 101.7  6.1  61:15.90 MaintenanceMgr
>  59096 kudu  20   0 25.3g  15g  11m R 98.4  6.1  42:12.19 MaintenanceMgr
>  59098 kudu  20   0 25.3g  15g  11m R 98.4  6.1  41:53.22 MaintenanceMgr
>  59100 kudu  20   0 25.3g  15g  11m D 36.1  6.1  55:49.99 MaintenanceMgr
>  59095 kudu  20   0 25.3g  15g  11m D 29.5  6.1  40:34.79 MaintenanceMgr
>  59099 kudu  20   0 25.3g  15g  11m D  0.0  6.1  57:28.81 MaintenanceMgr
>  59101 kudu  20   0 25.3g  15g  11m D  0.0  6.1  58:03.63 MaintenanceMgr
> {code}
> This was found using 
> kudu 1.8.0-SNAPSHOT (rev e70c5ee0d6d598ba53d002ebfc7f81bb2ceda404)
> server uuid bd4cc6fdd79d4ebc8a4def27004b011d



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2355) Uneven assignment of work across Maintenance Mgr threads

2018-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated KUDU-2355:
--
Summary: Uneven assignment of work across Maintenance Mgr threads  (was: 
Uneven assignment of work across )

> Uneven assignment of work across Maintenance Mgr threads
> 
>
> Key: KUDU-2355
> URL: https://issues.apache.org/jira/browse/KUDU-2355
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>
> While running inserts into a Kudu table, I noticed that work is assigned 
> unevenly across maintenance threads. This reduces the scalability of 
> maintenance, since adding more threads won't always mean better throughput.
> |Thread name||Cumulative User CPU(s)||Cumulative Kernel CPU(s)||Cumulative IO-wait(s)|
> |MaintenanceMgr [worker]-59102|   3410.94|352.07| 866.71|
> |MaintenanceMgr [worker]-59101|   3056.77|319.32| 794.1|
> |MaintenanceMgr [worker]-59100|   2924.29|300.35| 831.66|
> |MaintenanceMgr [worker]-59099|   3021.22|307.3| 783.84|
> |MaintenanceMgr [worker]-59098|   2174.47|216.55| 716|
> |MaintenanceMgr [worker]-59097|   3240.47|335.55| 846.99|
> |MaintenanceMgr [worker]-59096|   2206.57|218.63| 752.62|
> |MaintenanceMgr [worker]-59095|   2112.76|210.93| 720.67|
> Snapshot from top
> {code}
>    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  59102 kudu  20   0 25.3g  15g  11m R 105.0  6.1  64:42.20 MaintenanceMgr
>  59097 kudu  20   0 25.3g  15g  11m R 101.7  6.1  61:15.90 MaintenanceMgr
>  59096 kudu  20   0 25.3g  15g  11m R 98.4  6.1  42:12.19 MaintenanceMgr
>  59098 kudu  20   0 25.3g  15g  11m R 98.4  6.1  41:53.22 MaintenanceMgr
>  59100 kudu  20   0 25.3g  15g  11m D 36.1  6.1  55:49.99 MaintenanceMgr
>  59095 kudu  20   0 25.3g  15g  11m D 29.5  6.1  40:34.79 MaintenanceMgr
>  59099 kudu  20   0 25.3g  15g  11m D  0.0  6.1  57:28.81 MaintenanceMgr
>  59101 kudu  20   0 25.3g  15g  11m D  0.0  6.1  58:03.63 MaintenanceMgr
> {code}
> This was found using 
> kudu 1.8.0-SNAPSHOT (rev e70c5ee0d6d598ba53d002ebfc7f81bb2ceda404)
> server uuid bd4cc6fdd79d4ebc8a4def27004b011d



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2355) Uneven assignment o

2018-03-16 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created KUDU-2355:
-

 Summary: Uneven assignment o
 Key: KUDU-2355
 URL: https://issues.apache.org/jira/browse/KUDU-2355
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Mostafa Mokhtar


While running inserts into a Kudu table, I noticed that work is assigned unevenly 
across maintenance threads. This reduces the scalability of maintenance, since 
adding more threads won't always mean better throughput.

|Thread name||Cumulative User CPU(s)||Cumulative Kernel CPU(s)||Cumulative IO-wait(s)|
|MaintenanceMgr [worker]-59102| 3410.94|352.07| 866.71|
|MaintenanceMgr [worker]-59101| 3056.77|319.32| 794.1|
|MaintenanceMgr [worker]-59100| 2924.29|300.35| 831.66|
|MaintenanceMgr [worker]-59099| 3021.22|307.3| 783.84|
|MaintenanceMgr [worker]-59098| 2174.47|216.55| 716|
|MaintenanceMgr [worker]-59097| 3240.47|335.55| 846.99|
|MaintenanceMgr [worker]-59096| 2206.57|218.63| 752.62|
|MaintenanceMgr [worker]-59095| 2112.76|210.93| 720.67|


Snapshot from top
{code}
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 59102 kudu  20   0 25.3g  15g  11m R 105.0  6.1  64:42.20 MaintenanceMgr
 59097 kudu  20   0 25.3g  15g  11m R 101.7  6.1  61:15.90 MaintenanceMgr
 59096 kudu  20   0 25.3g  15g  11m R 98.4  6.1  42:12.19 MaintenanceMgr
 59098 kudu  20   0 25.3g  15g  11m R 98.4  6.1  41:53.22 MaintenanceMgr
 59100 kudu  20   0 25.3g  15g  11m D 36.1  6.1  55:49.99 MaintenanceMgr
 59095 kudu  20   0 25.3g  15g  11m D 29.5  6.1  40:34.79 MaintenanceMgr
 59099 kudu  20   0 25.3g  15g  11m D  0.0  6.1  57:28.81 MaintenanceMgr
 59101 kudu  20   0 25.3g  15g  11m D  0.0  6.1  58:03.63 MaintenanceMgr
{code}

This was found using 
kudu 1.8.0-SNAPSHOT (rev e70c5ee0d6d598ba53d002ebfc7f81bb2ceda404)
server uuid bd4cc6fdd79d4ebc8a4def27004b011d
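
As a quick illustration (not Kudu code), the imbalance in the table above can be quantified 
by comparing the hottest and coldest workers against the mean cumulative user CPU:

{code}
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  // Cumulative user CPU seconds per MaintenanceMgr worker, from the table above.
  std::vector<double> user_cpu = {3410.94, 3056.77, 2924.29, 3021.22,
                                  2174.47, 3240.47, 2206.57, 2112.76};
  double mean = std::accumulate(user_cpu.begin(), user_cpu.end(), 0.0) /
                user_cpu.size();
  double hottest = *std::max_element(user_cpu.begin(), user_cpu.end());
  double coldest = *std::min_element(user_cpu.begin(), user_cpu.end());
  std::printf("mean=%.1fs hottest=%.1fs coldest=%.1fs hottest/coldest=%.2fx\n",
              mean, hottest, coldest, hottest / coldest);
  // Prints roughly: mean=2768.4s hottest=3410.9s coldest=2112.8s hottest/coldest=1.61x
  return 0;
}
{code}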



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2342) Insert into Lineitem table with 1340 tablets on 129 node cluster failed with "Failed to write batch "

2018-03-13 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated KUDU-2342:
--
Attachment: Impala query profile.txt

> Insert into Lineitem table with 1340 tablets on 129 node cluster failed with 
> "Failed to write batch "
> -
>
> Key: KUDU-2342
> URL: https://issues.apache.org/jira/browse/KUDU-2342
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.7.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: scalability
> Attachments: Impala query profile.txt
>
>
> While loading TPCH 30TB on a 129-node cluster via Impala, the write operation 
> failed with:
> Query Status: Kudu error(s) reported, first error: Timed out: Failed to 
> write batch of 38590 ops to tablet b8431200388d486995a4426c88bc06a2 after 1 
> attempt(s): Failed to write to server: a260dca5a9c846e99cb621881a7b86b8 
> (vc1515.halxg.cloudera.com:7050): Write RPC to X.X.X.X:7050 timed out after 
> 180.000s (SENT)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2342) Insert into Lineitem table with 1340 tablets on 129 node cluster failed with "Failed to write batch "

2018-03-13 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created KUDU-2342:
-

 Summary: Insert into Lineitem table with 1340 tablets on 129 node 
cluster failed with "Failed to write batch "
 Key: KUDU-2342
 URL: https://issues.apache.org/jira/browse/KUDU-2342
 Project: Kudu
  Issue Type: Bug
  Components: tablet
Affects Versions: 1.7.0
Reporter: Mostafa Mokhtar


While loading TPCH 30TB on a 129-node cluster via Impala, the write operation failed 
with:
Query Status: Kudu error(s) reported, first error: Timed out: Failed to 
write batch of 38590 ops to tablet b8431200388d486995a4426c88bc06a2 after 1 
attempt(s): Failed to write to server: a260dca5a9c846e99cb621881a7b86b8 
(vc1515.halxg.cloudera.com:7050): Write RPC to X.X.X.X:7050 timed out after 
180.000s (SENT)





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2192) KRPC should have a timer to close stuck connections

2018-01-23 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336750#comment-16336750
 ] 

Mostafa Mokhtar commented on KUDU-2192:
---

[~kwho]

Queries fail in a similar way when network partitioning is introduced after the 
query has been running for a minute:

{code}
Status: TransmitData() to 10.00.000.29:27000 failed: Network error: recv 
error: Connection timed out (error 110)
{code}

> KRPC should have a timer to close stuck connections
> ---
>
> Key: KUDU-2192
> URL: https://issues.apache.org/jira/browse/KUDU-2192
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Michael Ho
>Priority: Major
>
> If the remote host goes down or its network gets unplugged, all pending RPCs 
> to that host will be stuck if there is no timeout specified. While those RPCs 
> which have finished sending their payloads or those which haven't started 
> sending payloads can be cancelled quickly, those in mid-transmission (i.e. an 
> RPC at the front of the outbound queue with part of its payload sent already) 
> cannot be cancelled until the payload has been completely sent. Therefore, 
> it's beneficial to have a timeout to kill a connection if it's not making any 
> progress for an extended period of time so the RPC will fail and get unstuck. 
> The timeout may need to be conservatively large to avoid aggressive closing 
> of connections due to transient network issues. One can consider augmenting 
> the existing maintenance thread logic, which checks for idle connections, to 
> check for this kind of timeout. Please feel free to propose other 
> alternatives (e.g. TCP keepalive timeout) in this JIRA.
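
An illustrative sketch of the proposed mechanism, with hypothetical names (not the actual 
KRPC classes): each connection records when it last made transfer progress, and a periodic 
pass, e.g. piggybacked on the existing idle-connection maintenance scan, closes connections 
whose stall exceeds a conservative threshold.

{code}
#include <chrono>
#include <memory>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical connection handle; a real implementation would live in the
// reactor thread that owns the connection.
struct Connection {
  Clock::time_point last_progress = Clock::now();
  bool closed = false;

  // Called whenever bytes are actually written to / read from the socket.
  void RecordProgress() { last_progress = Clock::now(); }
  void Shutdown() { closed = true; }  // would abort pending RPCs with an error
};

// Periodic scan; the threshold must be conservative to tolerate transient
// network slowness without tearing down healthy connections.
void CloseStuckConnections(std::vector<std::shared_ptr<Connection>>* conns,
                           std::chrono::seconds stall_threshold) {
  auto now = Clock::now();
  for (auto& c : *conns) {
    if (!c->closed && now - c->last_progress > stall_threshold) {
      c->Shutdown();  // pending RPCs fail fast instead of hanging forever
    }
  }
}
{code}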



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2192) KRPC should have a timer to close stuck connections

2018-01-19 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333051#comment-16333051
 ] 

Mostafa Mokhtar commented on KUDU-2192:
---

[~kwho] [~sailesh] [~hubert.sun]

Tried network partitioning between two backends with KRPC enabled, on 
10.00.000.28:

sudo /sbin/iptables -I INPUT -s 10.00.000.29 -j DROP

The query failed with the error below within 30 minutes:

Query Status: TransmitData() to 10.00.000.28:27000 failed: Network error: recv 
error: Connection timed out (error 110)

Thrift failed in a similar way, but within 15 minutes.

> KRPC should have a timer to close stuck connections
> ---
>
> Key: KUDU-2192
> URL: https://issues.apache.org/jira/browse/KUDU-2192
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Michael Ho
>Priority: Major
>
> If the remote host goes down or its network gets unplugged, all pending RPCs 
> to that host will be stuck if there is no timeout specified. While those RPCs 
> which have finished sending their payloads or those which haven't started 
> sending payloads can be cancelled quickly, those in mid-transmission (i.e. an 
> RPC at the front of the outbound queue with part of its payload sent already) 
> cannot be cancelled until the payload has been completely sent. Therefore, 
> it's beneficial to have a timeout to kill a connection if it's not making any 
> progress for an extended period of time so the RPC will fail and get unstuck. 
> The timeout may need to be conservatively large to avoid aggressive closing 
> of connections due to transient network issues. One can consider augmenting 
> the existing maintenance thread logic, which checks for idle connections, to 
> check for this kind of timeout. Please feel free to propose other 
> alternatives (e.g. TCP keepalive timeout) in this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2017-09-05 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154663#comment-16154663
 ] 

Mostafa Mokhtar commented on KUDU-2086:
---

[~tlipcon]

What about switching to round-robin distribution?
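
A minimal sketch of what a round-robin alternative could look like (hypothetical names, not 
the actual Kudu messenger API): an atomic counter picks the next reactor for each new 
connection, so placement no longer depends on how peer addresses hash.

{code}
#include <atomic>
#include <cstddef>
#include <vector>

class Reactor;  // Placeholder for the per-thread event loop.

// Hypothetical round-robin assignment: each new connection takes the next
// reactor in turn, spreading load evenly across the pool.
class RoundRobinPicker {
 public:
  explicit RoundRobinPicker(std::vector<Reactor*> reactors)
      : reactors_(std::move(reactors)) {}

  Reactor* NextReactor() {
    size_t idx = next_.fetch_add(1, std::memory_order_relaxed) % reactors_.size();
    return reactors_[idx];
  }

 private:
  std::vector<Reactor*> reactors_;
  std::atomic<size_t> next_{0};
};
{code}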

> Uneven assignment of connections to Reactor threads creates skew and limits 
> transfer throughput
> ---
>
> Key: KUDU-2086
> URL: https://issues.apache.org/jira/browse/KUDU-2086
> Project: Kudu
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 1.4.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>
> Uneven assignment of connections to Reactor threads causes a couple of 
> reactor threads to run @100%, which limits overall system throughput.
> Increasing the number of reactor threads alleviates the problem, but some 
> threads are still running much hotter than others.
> The snapshot below is from a 20-node cluster:
> {code}
> ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
> 00:03:17  69387  69596 ?00:03:17 rpc reactor-695
> 00:03:20  69387  69632 ?00:03:20 rpc reactor-696
> 00:03:21  69387  69607 ?00:03:21 rpc reactor-696
> 00:03:25  69387  69629 ?00:03:25 rpc reactor-696
> 00:03:26  69387  69594 ?00:03:26 rpc reactor-695
> 00:03:34  69387  69595 ?00:03:34 rpc reactor-695
> 00:03:35  69387  69625 ?00:03:35 rpc reactor-696
> 00:03:38  69387  69570 ?00:03:38 rpc reactor-695
> 00:03:38  69387  69620 ?00:03:38 rpc reactor-696
> 00:03:47  69387  69639 ?00:03:47 rpc reactor-696
> 00:03:48  69387  69593 ?00:03:48 rpc reactor-695
> 00:03:49  69387  69591 ?00:03:49 rpc reactor-695
> 00:04:04  69387  69600 ?00:04:04 rpc reactor-696
> 00:07:16  69387  69640 ?00:07:16 rpc reactor-696
> 00:07:39  69387  69616 ?00:07:39 rpc reactor-696
> 00:07:54  69387  69572 ?00:07:54 rpc reactor-695
> 00:09:10  69387  69613 ?00:09:10 rpc reactor-696
> 00:09:28  69387  69567 ?00:09:28 rpc reactor-695
> 00:09:39  69387  69603 ?00:09:39 rpc reactor-696
> 00:09:42  69387  69641 ?00:09:42 rpc reactor-696
> 00:09:59  69387  69604 ?00:09:59 rpc reactor-696
> 00:10:06  69387  69623 ?00:10:06 rpc reactor-696
> 00:10:43  69387  69636 ?00:10:43 rpc reactor-696
> 00:10:59  69387  69642 ?00:10:59 rpc reactor-696
> 00:11:28  69387  69585 ?00:11:28 rpc reactor-695
> 00:12:43  69387  69598 ?00:12:43 rpc reactor-695
> 00:15:42  69387  69578 ?00:15:42 rpc reactor-695
> 00:16:10  69387  69614 ?00:16:10 rpc reactor-696
> 00:17:43  69387  69575 ?00:17:43 rpc reactor-695
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2086) Uneven assignment of connections to Reactor threads creates skew and limits transfer throughput

2017-07-31 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created KUDU-2086:
-

 Summary: Uneven assignment of connections to Reactor threads 
creates skew and limits transfer throughput
 Key: KUDU-2086
 URL: https://issues.apache.org/jira/browse/KUDU-2086
 Project: Kudu
  Issue Type: Bug
  Components: rpc
Affects Versions: 1.4.0
Reporter: Mostafa Mokhtar


Uneven assignment of connections to Reactor threads causes a couple of reactor 
threads to run @100%, which limits overall system throughput.

Increasing the number of reactor threads alleviates the problem, but some threads 
are still running much hotter than others.

The snapshot below is from a 20-node cluster:

{code}
ps -T -p 69387 | grep rpc |  grep -v "00:00"  | awk '{print $4,$0}' | sort
00:03:17  69387  69596 ?00:03:17 rpc reactor-695
00:03:20  69387  69632 ?00:03:20 rpc reactor-696
00:03:21  69387  69607 ?00:03:21 rpc reactor-696
00:03:25  69387  69629 ?00:03:25 rpc reactor-696
00:03:26  69387  69594 ?00:03:26 rpc reactor-695
00:03:34  69387  69595 ?00:03:34 rpc reactor-695
00:03:35  69387  69625 ?00:03:35 rpc reactor-696
00:03:38  69387  69570 ?00:03:38 rpc reactor-695
00:03:38  69387  69620 ?00:03:38 rpc reactor-696
00:03:47  69387  69639 ?00:03:47 rpc reactor-696
00:03:48  69387  69593 ?00:03:48 rpc reactor-695
00:03:49  69387  69591 ?00:03:49 rpc reactor-695
00:04:04  69387  69600 ?00:04:04 rpc reactor-696
00:07:16  69387  69640 ?00:07:16 rpc reactor-696
00:07:39  69387  69616 ?00:07:39 rpc reactor-696
00:07:54  69387  69572 ?00:07:54 rpc reactor-695
00:09:10  69387  69613 ?00:09:10 rpc reactor-696
00:09:28  69387  69567 ?00:09:28 rpc reactor-695
00:09:39  69387  69603 ?00:09:39 rpc reactor-696
00:09:42  69387  69641 ?00:09:42 rpc reactor-696
00:09:59  69387  69604 ?00:09:59 rpc reactor-696
00:10:06  69387  69623 ?00:10:06 rpc reactor-696
00:10:43  69387  69636 ?00:10:43 rpc reactor-696
00:10:59  69387  69642 ?00:10:59 rpc reactor-696
00:11:28  69387  69585 ?00:11:28 rpc reactor-695
00:12:43  69387  69598 ?00:12:43 rpc reactor-695
00:15:42  69387  69578 ?00:15:42 rpc reactor-695
00:16:10  69387  69614 ?00:16:10 rpc reactor-696
00:17:43  69387  69575 ?00:17:43 rpc reactor-695
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-1447) Document recommendation to disable THP

2017-02-14 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1584#comment-1584
 ] 

Mostafa Mokhtar commented on KUDU-1447:
---

[~tlipcon]
Recommendations in CM are easily overlooked; it would help if Kudu reported 
warnings in its logs until madv_nohugepage is used.
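
A minimal sketch of that suggestion (hypothetical, not existing Kudu code): read the THP 
mode from sysfs at startup and log a warning while it is still set to "always".

{code}
#include <fstream>
#include <iostream>
#include <string>

// The kernel marks the active THP mode with brackets, e.g.
// "always [madvise] never" in this sysfs file.
int main() {
  std::ifstream thp("/sys/kernel/mm/transparent_hugepage/enabled");
  std::string line;
  std::getline(thp, line);
  if (line.find("[always]") != std::string::npos) {
    std::cerr << "WARNING: transparent hugepages are set to 'always'; "
              << "set them to 'madvise' or 'never' to avoid khugepaged stalls"
              << std::endl;
  }
  return 0;
}
{code}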

> Document recommendation to disable THP
> --
>
> Key: KUDU-1447
> URL: https://issues.apache.org/jira/browse/KUDU-1447
> Project: Kudu
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.8.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> Doing a bunch of cluster testing, I finally got to the root of why sometimes 
> threads take several seconds to start up, causing various timeout issues, 
> false elections, etc. It turns out that khugepaged does synchronous page 
> compaction while holding a process's mmap semaphore, and when that's 
> concurrent with lots of IO, it can block for several seconds.
> https://lkml.org/lkml/2011/7/26/103
> To avoid this, we should tell users to set hugepages to "madvise" or "never" 
> -- it's not sufficient to just disable defrag, because khugepaged still runs 
> in the background in that case and causes this sporadic issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KUDU-1754) Columns should default to NULL opposed to NOT NULL

2016-11-23 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created KUDU-1754:
-

 Summary: Columns should default to NULL opposed to NOT NULL 
 Key: KUDU-1754
 URL: https://issues.apache.org/jira/browse/KUDU-1754
 Project: Kudu
  Issue Type: Bug
  Components: api
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar


Columns default to "NOT NULL" if the nullability field is not specified.
This behavior differs from Oracle, Teradata, MS SQL Server, MySQL... 
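
For comparison, an illustrative snippet using the Kudu C++ client's KuduSchemaBuilder, which 
lets callers state nullability explicitly rather than relying on the NOT NULL default 
described here (column names are made up):

{code}
#include <kudu/client/client.h>
#include <kudu/client/schema.h>

using kudu::client::KuduColumnSchema;
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;

// Illustrative schema: nullability is spelled out on every column so the
// behavior does not depend on the builder's default.
kudu::Status BuildExampleSchema(KuduSchema* schema) {
  KuduSchemaBuilder b;
  b.AddColumn("id")->Type(KuduColumnSchema::INT64)->NotNull()->PrimaryKey();
  b.AddColumn("comment")->Type(KuduColumnSchema::STRING)->Nullable();
  return b.Build(schema);
}
{code}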



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KUDU-1366) Consider switching to jemalloc

2016-09-15 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494959#comment-15494959
 ] 

Mostafa Mokhtar commented on KUDU-1366:
---

[~tarmstrong] FYI

> Consider switching to jemalloc
> --
>
> Key: KUDU-1366
> URL: https://issues.apache.org/jira/browse/KUDU-1366
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Reporter: Todd Lipcon
> Attachments: Kudu Benchmarks.pdf, Kudu Benchmarks.pdf
>
>
> We spend a fair amount of time in the allocator. While we could spend some 
> time trying to use arenas more, it's also worth considering switching 
> allocators. I ran a few quick tests with jemalloc 4.1 and it seems like it 
> might be better than the version of tcmalloc that we use (and has much more 
> active development)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)