Re: [MarkLogic Dev General] One node CPU utilization maxed out but others not in a 5 node cluster once load increases

2013-10-12 Thread Khan, Kashif
I apologize. I replied to the wrong email chain.


Kashif Khan, PMI-ACP


From: Kashif Khan <kashif.k...@hmhco.com>
Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Saturday, October 12, 2013 1:47 PM
To: indar verma <send2i...@yahoo.co.in>,
 MarkLogic Developer Discussion <general@developer.marklogic.com>,
 Michael Blakeley <m...@blakeley.com>
Cc: "general-requ...@developer.marklogic.com"
 <general-requ...@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] One node CPU utilization maxed out but
 others not in a 5 node cluster once load increases

It looks good now. Thanks. You can close the ticket.


Kashif Khan, PMI-ACP
Sr. Solution Architect
Publishing Technology

Houghton Mifflin Harcourt
9400 South Park Center Loop
Orlando, FL 32819
Office: 407.345.3420
Mobile: 407.949.4697
hmhco.com

From: indar verma <send2i...@yahoo.co.in>
Reply-To: indar verma <send2i...@yahoo.co.in>,
 MarkLogic Developer Discussion <general@developer.marklogic.com>
Date: Monday, September 30, 2013 8:35 AM
To: Michael Blakeley <m...@blakeley.com>,
 MarkLogic Developer Discussion <general@developer.marklogic.com>
Cc: "general-requ...@developer.marklogic.com"
 <general-requ...@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] One node CPU utilization maxed out but
 others not in a 5 node cluster once load increases

Hi Michael,

Thanks a lot for your suggestions and for explaining the problem in detail.

There are 4 forests on each node: 2 masters and 2 replicas.

That makes 20 forests in total (10 masters + 10 replicas).

I am attaching some screenshots of the DB.

I have started looking into the xqy and am trying to reduce response time.

I will follow your other instructions too, to check the other factors.

The actual problem is that I have to justify why CPU usage is maxed out on the
ML4 node only, even though it is a data node and not all of the data resides on
that node. So I am struggling to find a concrete reason. Every time, my customer
asks why only the ML4 node maxes out.

Thanks & Regards,
JJ


From: Michael Blakeley <m...@blakeley.com>
To: indar verma <send2i...@yahoo.co.in>;
 MarkLogic Developer Discussion <general@developer.marklogic.com>
Cc: "general-requ...@developer.marklogic.com"
 <general-requ...@developer.marklogic.com>
Sent: Monday, 30 September 2013 1:44 AM
Subject: Re: [MarkLogic Dev General] One node CPU utilization maxed out but
 others not in a 5 node cluster once load increases

Does "zenoss" mean Xen virtualization? PVM or HVM?

How many forests are on each host?

You could simply try upgrading from 6.0-3.2 to the latest release, 6.0-4, and 
see if that helps. But if it were me I would want to know which query or 
queries caused the problem.

Even though you aren't sending queries directly to that busy host, it's 
resolving index lookups as requested by the eval hosts. So it's still important 
to look at long-running queries, as these are the ones likely driving the load 
on your busy host. You also want to have a reproducible test case, and the best 
way to build that is to isolate a query that recreates the high load.

At the same time, dig into how utilization is measured and exactly what the 
numbers are. It's not enough to say that a host is "maxed out": you need to 
understand which subsystem is the bottleneck. It's quite difficult to drive a 
16-core or 32-core host to 0% idle, especially if the workload is mixed between 
network, disk, and CPU activity. You really want to know how much of each is 
involved, to better understand what "maxed out" really means. For example 
'iostat -mxz 15' is a good way to monitor current activity, or if sysstat is 
collecting data then sar can display it. Just to illustrate the point, here are 
some low-utilization numbers from a system I happen to have handy.

12:00:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
12:05:01 AM  all   6.24   0.30     0.50     0.24    0.09  92.64
12:15:01 AM  all   2.79   0.00     0.12     0.09    0.05  96.95
12:25:01 AM  all   3.39   0.00     0.16     0.10    0.07  96.27
12:35:01 AM  all   2.80   0.00     0.13     0.06    0.06  96.96

If this host were "maxed out", that could appear as high %user, or %nice, or 
%system, or %iowait, or %steal - or any mix of those. That, in turn, would tell 
you something about why the host is busy.
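
To make that concrete, here is a minimal Python sketch of how sar CPU output
could be broken down per interval. It is an illustration only: it assumes the
column layout shown in the sample in this message, and the names SAMPLE,
parse_sar_cpu and busiest_component are made up for this example. They are not
part of sysstat or MarkLogic.

```python
# Sample copied from the sar output quoted in this message.
SAMPLE = """\
12:00:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
12:05:01 AM  all   6.24   0.30     0.50     0.24    0.09  92.64
12:15:01 AM  all   2.79   0.00     0.12     0.09    0.05  96.95
12:25:01 AM  all   3.39   0.00     0.16     0.10    0.07  96.27
12:35:01 AM  all   2.80   0.00     0.13     0.06    0.06  96.96
"""


def parse_sar_cpu(text):
    """Parse sar CPU lines into dicts keyed by column name (hypothetical helper)."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue  # skip blank lines and the summary line
        if "CPU" in fields:
            # Header line: every field after "CPU" names a percentage column.
            columns = fields[fields.index("CPU") + 1:]
            continue
        if columns is None or len(fields) < len(columns) + 1:
            continue  # not a data line we understand
        values = fields[-len(columns):]
        try:
            row = {name: float(value) for name, value in zip(columns, values)}
        except ValueError:
            continue  # non-numeric line, ignore it
        # Whatever precedes the CPU id and the values is the timestamp.
        row["time"] = " ".join(fields[:-len(columns) - 1])
        rows.append(row)
    return rows


def busiest_component(row):
    """Return the non-idle column with the largest share for one interval."""
    busy = {k: v for k, v in row.items() if k.startswith("%") and k != "%idle"}
    return max(busy, key=busy.get)


rows = parse_sar_cpu(SAMPLE)
for row in rows:
    print(row["time"], "busiest:", busiest_component(row), "idle:", row["%idle"])
```

On a genuinely busy host the interesting part is which column dominates: a
large %iowait points toward disk, a large %steal toward the hypervisor, and a
large %user toward query evaluation itself.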

If it turns out to be high %system or %iowait, take a look at the 
:8001/host-status.xqy page for the host in question. At the bottom you'll see a 
table of rates and loads, which will tell you something about where the host is 
spending its time.

-- Mike

On 29 Sep 2013, at 11:57, indar verma <send2i...@yahoo.co.in> wrote:

> One more thing to add:
>
> We are sending requests to ML1 through ML3 in round-robin fashion from the
> application end, so ML4 and ML5 are not accepting any direct requests from
> the front-end app.
>
> We are ingesting data through these two newly added nodes, ML4 and ML5.
>
> Thanks,
> JJ
>
> From: indar verma <send2i...@yahoo.co.in>
> To: "general@developer.marklogic.com" <general@developer.marklogic.com>
> Cc: "general-requ...@developer.marklogic.com"
>  <general-requ...@developer.marklogic.com>
> Sent: Monday, 30 Sep