Re: Code performance question #2

2006-08-08 Thread Pid

David Kerber wrote:
 Pid wrote:
 
 here's another obvious question:

 if you're in a servlet, and you're getting an '&'-separated string from
 somewhere, where is the somewhere that you're getting it from?

 does the servlet activate and collect the data somehow, or does the
 data get sent to the servlet (in, for example the query string)?
  

 The data is sent via an HTTP POST request, with the query string lightly
 encrypted.

if it's sent to the servlet using a POST, is there anything wrong with
using the hreq.getParameterNames() & hreq.getParameter(param_name)
methods?


Why encrypt the query string when you could just use an SSL connection
and encrypt the pipe? (Ignore this Q if you don't have control of the
other end).







 Peter Crowther wrote:
  

 From: David Kerber [mailto:[EMAIL PROTECTED] Is there a more
 efficient split method I could use?  Or am I completely missing
 the point of what you are suggesting?
 
 I think you've slightly missed the point.  I assume you're calling
 your function 5 times, each with a different field name that you want
 out of it.  You're then invoking string handling functions on the
 entire decrypted string 5 times, each time going through the bytes to
 extract the piece you need.  In the process, you traverse bytes you
 don't need several times.  My suggestion is that you tokenise this
 *once*, and hence only pay the string-handling overhead once.  Then
 you get all the parameters out of the same tokenised version.

 However, if the next thing you do is to write this to disk, I am even
 more convinced that you're optimising the wrong piece of code as the
 disk I/O is likely to take vastly more instructions than the string
 parse.

 These may be naïve questions, but I'll ask them anyway.  How have you
 identified these two pieces of code as the targets for optimisation? 
 What profiler have you used, under what conditions?  What proportion
 of your overall CPU budget is taken by these two snippets of code? 
 Is the machine CPU-bound in the first place, or is the bottleneck
 elsewhere?  If these are the worst culprits in your app, I'll be very
 surprised.

 - Peter
   
 
 
 
 -
 To start a new topic, e-mail: users@tomcat.apache.org
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 
 





RE: Code performance question #2

2006-08-08 Thread Peter Crowther
 From: David Kerber [mailto:[EMAIL PROTECTED] 
 Do you think 
 it would be more efficient to scan the string once and grab the 
 field values as I get to each field marker?

Yes.

 Yes, the machine is cpu-bound.  
 My 768k data line will spike the cpu to 100% and hold it 
 above 95% until 
 the transmission queues on the other end of the WAN are 
 caught up.

(Wince).  Ouch.

 watching task manager tells me that the disk subsystem 
 seems to be able to keep up.

If you're on Windows, start up Performance Monitor and add the following
counters:

Processor: % CPU time
Memory: Pages/sec
Physical disk: Avg disk queue length (by array or disk if you have more
than one)

(I would add network: Bytes/sec, but it seems that's not the bottleneck)

The key disk counter is the queue length.  MS documentation suggests
that when the average queue length climbs above 2 per spindle in the
array, your disk subsystem is likely to be the bottleneck.  So if you
have a 5-disk array, queues that climb over 10 show a disk issue.

Memory: Pages/sec is a general indicator of paging traffic.
Consistently high values tend to show a paging problem that could
possibly be solved by adding RAM.

Processor: % CPU time is a general processor counter.  Sustained values
above 80% tend to indicate a bottleneck according to MS.  It's sometimes
worth adding the % user time counter as well to see whether the issue is
your code or OS code.
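Those same counters can also be sampled from a command prompt with Windows' typeperf tool (a sketch; the counter paths assume an English-language install and default instance names, so adjust them to match your machine):

```shell
REM Sample CPU, paging and disk-queue counters every 5 seconds.
REM Counter paths are configuration; rename instances (e.g. the disk
REM or array number) to match your own system before running.
typeperf "\Processor(_Total)\% Processor Time" ^
         "\Memory\Pages/sec" ^
         "\PhysicalDisk(_Total)\Avg. Disk Queue Length" ^
         -si 5
```

Logging to a CSV with -o makes it easier to compare runs before and after a tuning change.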

 I haven't run a profiler on this code; I've tried, but getting the 
 configuration figured out has stumped me every time.  I 
 picked out these 
 particular routines (and one other I haven't posted) because of the 
 principle that 90% of the cpu time is taken by 10% of the code, and 
 these routines are the only loops in the entire servlet (i.e. 
 the only 
 lines of code which are executed more than once per incoming data 
 line).

Seems like a reasonable heuristic, I agree.  You may find that Tomcat
itself is the bottleneck - this is an area where profiling is of great
help.  However, I'd beware hidden complexity: the complexity behind
function calls into third-party libraries.  For example, you say you're
decrypting the arguments.  Depending on the exact crypto algorithm used,
this ranges from moderately expensive to horribly expensive; once again,
profiling would reveal this, and might indicate where a change to the
crypto could be of benefit.

Can you set up a simple Java test harness outside your servlet that
simply calls the servlet's service routine repeatedly with a few sample
lines?  If you can construct something that will run outside Tomcat,
it'll be easier to instrument and you'll be able to analyse the impact
of your tuning changes more easily.  I also see Mark's willing to help
getting a profiler set up... :-).
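A minimal harness along those lines might look like the following (hypothetical code, not the poster's actual servlet; it times the field-parsing routine from the earlier post in isolation, and assumes 1-character field names as the original getField's "ii + 2" offset does):

```java
// Hypothetical micro-harness: exercises the parsing routine outside
// Tomcat so it can be timed and profiled without the container.
public class ParseHarness {

    static String getField(String fieldName, String dataString) {
        int ii = dataString.indexOf(fieldName + "=");
        if (ii == -1) return null;
        int kk = dataString.indexOf("&", ii);
        if (kk == -1) kk = dataString.length();
        // ii + 2 skips a 1-character field name plus the '='.
        return dataString.substring(ii + 2, kk);
    }

    public static void main(String[] args) {
        String sample = "a=1&b=2&c=3&d=4&e=hello";
        long start = System.nanoTime();
        for (int i = 0; i < 1000000; i++) {
            getField("e", sample);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("1M calls took " + elapsedMs + " ms");
    }
}
```

Swapping in alternative parsing strategies and re-running the loop gives a crude before/after comparison even without a profiler.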

Sorry to point you off in a different direction from your likely
preferred route, but I've seen a lot of people spend a lot of time
optimising the wrong area of their code.  In a past life, I wrote
highly-optimised classifier code for an inference engine (admittedly in
C++); I found a profiler was the only way to work out what was
*actually* happening.  I ended up getting a factor of 20 out of my code
by combining optimisations in the most unlikely places, giving the
company the fastest engine in the world at that time.  I simply couldn't
have done that with static analysis - I kept guessing wrong!

- Peter




Re: Code performance question #2

2006-08-08 Thread Martin Gainty
Good Morning --
Please read
http://www.javaranch.com/newsletter/200401/IntroToCodeCoverage.html
paying particular attention to race conditions, deadly embraces and basic 
coverage of Functions

For the first iteration I would strongly urge you to use JCoverage
http://cms.jcoverage.com/

If you have the money then look into Clover
http://www.cenqua.com/login!default.jspa;jsessionid=F235AE9B72517AA7B94BC19A1D8D3559

If your clients require lightning fast performance follow the lead of google 
and other search engines and code the most CPU intensive
routines in C++

Caveat Emptor,
Martin --






RE: Code performance question #2

2006-08-08 Thread Peter Crowther
 From: Martin Gainty [mailto:[EMAIL PROTECTED] 
 Please read
 http://www.javaranch.com/newsletter/200401/IntroToCodeCoverage.html
 paying particular attention to race conditions, deadly 
 embraces and basic coverage of Functions

Martin, I'm confused - could you just outline how code coverage tools
are of relevance to the OP's request for code *optimisation*, other than
by showing that particular code is exercised during a given run?  I
accept they're another useful development weapon, but haven't worked out
how they help in this case.

- Peter




Re: Code performance question #2

2006-08-08 Thread David Kerber

Pid wrote:


David Kerber wrote:
 


Pid wrote:

   


here's another obvious question:

if you're in a servlet, and you're getting an '&'-separated string from
somewhere, where is the somewhere that you're getting it from?

does the servlet activate and collect the data somehow, or does the
data get sent to the servlet (in, for example the query string)?


 


The data is sent via an HTTP POST request, with the query string lightly
encrypted.
   



if it's sent to the servlet using a POST, is there anything wrong with
using the hreq.getParameterNames() & hreq.getParameter(param_name)
methods?
 

Encryption/obfuscation of the data.  I receive one parameter in the 
clear, and a 2nd parameter is a lightly encrypted string containing the 
rest of the data.  That makes it a bit harder to break the data than 
encrypting each parameter separately would.  I grab these two items with 
the getParameter() methods, but then need to break out additional data 
from the data string once it is decrypted.




Why encrypt the query string when you could just use an SSL connection
and encrypt the pipe? (Ignore this Q if you don't have control of the
other end).
 

It's a major hassle to change the other end of the pipe, though it can 
be done if absolutely necessary.  SSL also takes a lot more cpu power to 
process than the light encryption this app uses.


Dave






Re: Code performance question #2

2006-08-08 Thread David Kerber

Peter Crowther wrote:

From: David Kerber [mailto:[EMAIL PROTECTED] 
Do you think 
it would be more efficient to scan the string once and grab the 
field values as I get to each field marker?
   



Yes.

 

Yes, the machine is cpu-bound.  
My 768k data line will spike the cpu to 100% and hold it 
above 95% until 
the transmission queues on the other end of the WAN are 
caught up.
   



(Wince).  Ouch.
 

Yeah.  That surprised me when I first noticed it, too.  I never expected 
a 768k pipe to saturate this cpu.


 

watching task manager tells me that the disk subsystem 
seems to be able to keep up.
   



If you're on Windows, start up Performance Monitor and add the following
counters:

Processor: % CPU time
Memory: Pages/sec
Physical disk: Avg disk queue length (by array or disk if you have more
than one)

(I would add network: Bytes/sec, but it seems that's not the bottleneck)

The key disk counter is the queue length.  MS documentation suggests
that when the average queue length climbs above 2 per spindle in the
array, your disk subsystem is likely to be the bottleneck.  So if you
have a 5-disk array, queues that climb over 10 show a disk issue.
 

I'll check on that; I never knew what values to look for to spot 
disk-bound problems.



Memory: Pages/sec is a general indicator of paging traffic.
Consistently high values tend to show a paging problem that could
possibly be solved by adding RAM.
 

I already added some RAM when I noted that the allocated memory was 
larger than the physical RAM.  Now it only has about 700MB allocated, 
with 1.5GB of physical RAM.  Wouldn't hurt to do some more checking, though.



Processor: % CPU time is a general processor counter.  Sustained values
above 80% tend to indicate a bottleneck according to MS.  It's sometimes
worth adding the % user time counter as well to see whether the issue is
your code or OS code.
 


Good point.

 

I haven't run a profiler on this code; I've tried, but getting the 
configuration figured out has stumped me every time.  I 
picked out these 
particular routines (and one other I haven't posted) because of the 
principle that 90% of the cpu time is taken by 10% of the code, and 
these routines are the only loops in the entire servlet (i.e. 
the only 
lines of code which are executed more than once per incoming data 
line).
   



Seems like a reasonable heuristic, I agree.  You may find that Tomcat
itself is the bottleneck - this is an area where profiling is of great
 

Yes, that's something I've considered.  I'm trying to pick the 
low-hanging fruit first and make sure my code is reasonably efficient 
before I go pointing fingers at Tomcat.  It may turn out that I just 
need to throw more hardware at the problem.



help.  However, I'd beware hidden complexity: the complexity behind
function calls into third-party libraries.  For example, you say you're
decrypting the arguments.  Depending on the exact crypto algorithm used,
this ranges from moderately expensive to horribly expensive; once again,
profiling would reveal this, and might indicate where a change to the
crypto could be of benefit.
 

It's a home-grown light-encryption algorithm, but based on responses you 
guys have posted to my two questions, I have some ideas on things to 
check there as well.



Can you set up a simple Java test harness outside your servlet that
simply calls the servlet's service routine repeatedly with a few sample
lines?  If you can construct something that will run outside Tomcat,
it'll be easier to instrument and you'll be able to analyse the impact
of your tuning changes more easily.  I also see Mark's willing to help
getting a profiler set up... :-).

Sorry to point you off in a different direction from your likely
preferred route, but I've seen a lot of people spend a lot of time
optimising the wrong area of their code.  In a past life, I wrote
highly-optimised classifier code for an inference engine (admittedly in
C++); I found a profiler was the only way to work out what was
*actually* happening.  I ended up getting a factor of 20 out of my code
by combining optimisations in the most unlikely places, giving the
company the fastest engine in the world at that time.  I simply couldn't
have done that with static analysis - I kept guessing wrong!

- Peter
 

Thanks, Peter!  I'll post back when I get more useful information, 
including how much the various suggestions helped.


Dave






Re: Code performance question #2

2006-08-07 Thread Leon Rosenberg

that's ugly, why don't you tokenize it into string pairs, store the
pairs and work with them?
leon

On 8/7/06, David Kerber [EMAIL PROTECTED] wrote:

This code is part of a servlet running in TC 5.5.12, jre ver 1.5.0.6.

I use this code to break out individual data fields from a line which is
structured as "a=1&b=2&c=3&d=4&e=5".  It is
executed for over 2 million data lines per day, so this routine is
executed over 10 million times per day.  All the fields are short (< 10
chars) except the 5th one, which can be up to about 40 characters.  Is
there a more cpu-efficient way of doing this, or is this about as good
as it gets?

private static String getField ( String fieldName, String dataString ) {
    Integer ii, kk;

    ii = dataString.indexOf( fieldName + "=" );
    if ( ii == -1 ) return null;
    kk = dataString.indexOf( "&", ii );
    if ( kk.equals( -1 )) {
        kk = dataString.length();
    }
    return ( dataString.substring( ii + 2, kk ));
}

TIA!
Dave






RE: Code performance question #2

2006-08-07 Thread Peter Crowther
 From: David Kerber [mailto:[EMAIL PROTECTED] 
 It is 
 executed for over 2 million data lines per day, so this routine is 
 executed over 10 million times per day.
[snippet of code that parses the line each time elided]

Opinion: You're optimising the wrong piece of code.

You're calling this 5 times in quick succession.  This means you're
parsing the same string the same way 5 times in quick succession.  Split
it on the "&", put it into a suitably keyed structure such as a Map (or
even an array if you want the speed and know exactly what your
parameters are) and pass the keyed structure around.  That way you only
parse the string once.
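A sketch of that approach (hypothetical class and method names, assuming the '&'-separated name=value format from the original post):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggestion above: pay the string-handling
// cost once by splitting the decrypted line on '&' into a Map, then
// look each field up by name instead of re-scanning the whole string.
public class LineParser {
    public static Map<String, String> parseLine(String dataString) {
        Map<String, String> fields = new HashMap<String, String>();
        for (String pair : dataString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }
}
```

Callers then do fields.get("e") and so on, so the line is traversed once no matter how many fields are read.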

- Peter




Re: Code performance question #2

2006-08-07 Thread David Kerber
I'm not sure how to "Split it on the '&', put it into a suitably keyed 
structure such as a Map" other than the way I'm doing it already, unless 
I'm not understanding your suggestion.  So I think I need to give a bit 
more info about how this is used:


I can't control the data coming in; it's from a different application 
which sends this string in an encrypted form over a WAN.  After I 
decrypt the data string, I'm left with the string I describe below.  I 
need to parse out the data from the fields in the string, and then write 
them out to disk in a different format.  There is no other processing 
done with the data, so I don't need to reference any of them more than 
once.  Therefore I think loading them into an array probably adds more 
overhead than splitting the string the way I do right now.


Is there a more efficient split method I could use?  Or am I 
completely missing the point of what you are suggesting?


Dave


Peter Crowther wrote:

From: David Kerber [mailto:[EMAIL PROTECTED] 
It is 
executed for over 2 million data lines per day, so this routine is 
executed over 10 million times per day.
   


[snippet of code that parses the line each time elided]

Opinion: You're optimising the wrong piece of code.

You're calling this 5 times in quick succession.  This means you're
parsing the same string the same way 5 times in quick succession.  Split
it on the "&", put it into a suitably keyed structure such as a Map (or
even an array if you want the speed and know exactly what your
parameters are) and pass the keyed structure around.  That way you only
parse the string once.

- Peter
 








Re: Code performance question #2

2006-08-07 Thread David Kerber
See my response to Peter; I can't control the format of that data string 
(it's from a different application).  I just need to split out the data 
fields and store them away in a disk file.  Or am I missing the point of 
your suggestion?


Dave


Leon Rosenberg wrote:


that's ugly, why don't you tokenize it into string pairs, store the
pairs and work with them?
leon

On 8/7/06, David Kerber [EMAIL PROTECTED] wrote:


This code is part of a servlet running in TC 5.5.12, jre ver 1.5.0.6.

I use this code to break out individual data fields from a line which is
structured as "a=1&b=2&c=3&d=4&e=5".  It is
executed for over 2 million data lines per day, so this routine is
executed over 10 million times per day.  All the fields are short (< 10
chars) except the 5th one, which can be up to about 40 characters.  Is
there a more cpu-efficient way of doing this, or is this about as good
as it gets?

private static String getField ( String fieldName, String dataString ) {
    Integer ii, kk;

    ii = dataString.indexOf( fieldName + "=" );
    if ( ii == -1 ) return null;
    kk = dataString.indexOf( "&", ii );
    if ( kk.equals( -1 )) {
        kk = dataString.length();
    }
    return ( dataString.substring( ii + 2, kk ));
}

TIA!
Dave









RE: Code performance question #2

2006-08-07 Thread Peter Crowther
 From: David Kerber [mailto:[EMAIL PROTECTED] 
 Is there a more efficient split method I could use?  Or am I 
 completely missing the point of what you are suggesting?

I think you've slightly missed the point.  I assume you're calling your 
function 5 times, each with a different field name that you want out of it.  
You're then invoking string handling functions on the entire decrypted string 5 
times, each time going through the bytes to extract the piece you need.  In the 
process, you traverse bytes you don't need several times.  My suggestion is 
that you tokenise this *once*, and hence only pay the string-handling overhead 
once.  Then you get all the parameters out of the same tokenised version.

However, if the next thing you do is to write this to disk, I am even more 
convinced that you're optimising the wrong piece of code as the disk I/O is 
likely to take vastly more instructions than the string parse.

These may be naïve questions, but I'll ask them anyway.  How have you 
identified these two pieces of code as the targets for optimisation?  What 
profiler have you used, under what conditions?  What proportion of your overall 
CPU budget is taken by these two snippets of code?  Is the machine CPU-bound in 
the first place, or is the bottleneck elsewhere?  If these are the worst 
culprits in your app, I'll be very surprised.

- Peter




Re: Code performance question #2

2006-08-07 Thread Pid
here's another obvious question:

if you're in a servlet, and you're getting an '&'-separated string from
somewhere, where is the somewhere that you're getting it from?

does the servlet activate and collect the data somehow, or does the
data get sent to the servlet (in, for example the query string)?





Peter Crowther wrote:
 From: David Kerber [mailto:[EMAIL PROTECTED] 
 Is there a more efficient split method I could use?  Or am I 
 completely missing the point of what you are suggesting?
 
 I think you've slightly missed the point.  I assume you're calling your 
 function 5 times, each with a different field name that you want out of it.  
 You're then invoking string handling functions on the entire decrypted string 
 5 times, each time going through the bytes to extract the piece you need.  In 
 the process, you traverse bytes you don't need several times.  My suggestion 
 is that you tokenise this *once*, and hence only pay the string-handling 
 overhead once.  Then you get all the parameters out of the same tokenised 
 version.
 
 However, if the next thing you do is to write this to disk, I am even more 
 convinced that you're optimising the wrong piece of code as the disk I/O is 
 likely to take vastly more instructions than the string parse.
 
 These may be naïve questions, but I'll ask them anyway.  How have you 
 identified these two pieces of code as the targets for optimisation?  What 
 profiler have you used, under what conditions?  What proportion of your 
 overall CPU budget is taken by these two snippets of code?  Is the machine 
 CPU-bound in the first place, or is the bottleneck elsewhere?  If these are 
 the worst culprits in your app, I'll be very surprised.
 
   - Peter
 
 
 
 





Re: Code performance question #2

2006-08-07 Thread David Kerber

Peter Crowther wrote:

From: David Kerber [mailto:[EMAIL PROTECTED] 
Is there a more efficient split method I could use?  Or am I 
completely missing the point of what you are suggesting?
   



I think you've slightly missed the point.  I assume you're calling your function 5 times, each with a different field 


Yes.


name that you want out of it.  You're then invoking string handling functions 
on the entire decrypted string 5 times, each time going through the bytes to 
extract the piece you need.  In the process, you traverse bytes you don't need 
several times.  My suggestion is that you tokenise this *once*, and hence only 
pay the string-handling overhead once.  Then you get all the parameters out of 
the same tokenised version.
 

That is essentially my question:  how do I tokenize this more 
efficiently, without doing the search for the field names?  Do you think 
it would be more efficient to scan the string once and grab the field values 
as I get to each field marker?  I can do that no problem.
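That single-pass scan could be sketched roughly like this (hypothetical class name; assumes single '&' markers between fields and a '=' after each 1-or-more-character field name):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical one-pass scanner: walk the string once, collecting each
// name=value pair as the '&' field markers are reached, with no
// per-field search over the whole line.
public class OnePassScanner {
    public static Map<String, String> scan(String s) {
        Map<String, String> out = new HashMap<String, String>();
        int start = 0;
        while (start < s.length()) {
            int amp = s.indexOf('&', start);      // end of this field
            if (amp == -1) amp = s.length();      // last field has no '&'
            int eq = s.indexOf('=', start);       // name/value divider
            if (eq != -1 && eq < amp) {
                out.put(s.substring(start, eq), s.substring(eq + 1, amp));
            }
            start = amp + 1;                      // step past the marker
        }
        return out;
    }
}
```

Whether this actually beats a one-time split() is exactly the kind of thing the profiler discussion elsewhere in the thread would settle.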




However, if the next thing you do is to write this to disk, I am even more 
convinced that you're optimising the wrong piece of code as the disk I/O is 
likely to take vastly more instructions than the string parse.

These may be naïve questions, but I'll ask them anyway.  How have you 
identified these two pieces of code as the targets for optimisation?  What 
profiler have you used, under what conditions?  What proportion of your overall 
CPU budget is taken by these two snippets of code?  Is the machine CPU-bound in 
the first place, or is the bottleneck elsewhere?  If these are the worst 
culprits in your app, I'll be very surprised.
 

Those are good questions, but I've already considered them over the past 
few weeks as I've been working on this.  Yes, the machine is cpu-bound.  
My 768k data line will spike the cpu to 100% and hold it above 95% until 
the transmission queues on the other end of the WAN are caught up.  I've 
seen it take up to several hours depending on how long the comms were 
down.  The HD lights are busy, but this machine has a fast RAID system, 
and watching task manager tells me that the disk subsystem seems to be 
able to keep up. 

I haven't run a profiler on this code; I've tried, but getting the 
configuration figured out has stumped me every time.  I picked out these 
particular routines (and one other I haven't posted) because of the 
principle that 90% of the cpu time is taken by 10% of the code, and 
these routines are the only loops in the entire servlet (i.e. the only 
lines of code which are executed more than once per incoming data 
line).  The servlet itself is quite small, only 431 lines, including 
comments, declares, initialization, and functional code.


Thanks for your comments...
Dave


- Peter
 








Re: Code performance question #2

2006-08-07 Thread David Kerber

Pid wrote:


here's another obvious question:

if you're in a servlet, and you're getting an '&'-separated string from
somewhere, where is the somewhere that you're getting it from?

does the servlet activate and collect the data somehow, or does the
data get sent to the servlet (in, for example the query string)?
 

The data is sent via an HTTP POST request, with the query string lightly 
encrypted.







Peter Crowther wrote:
 

From: David Kerber [mailto:[EMAIL PROTECTED] 
Is there a more efficient split method I could use?  Or am I 
completely missing the point of what you are suggesting?
 


I think you've slightly missed the point.  I assume you're calling your 
function 5 times, each with a different field name that you want out of it.  
You're then invoking string handling functions on the entire decrypted string 5 
times, each time going through the bytes to extract the piece you need.  In the 
process, you traverse bytes you don't need several times.  My suggestion is 
that you tokenise this *once*, and hence only pay the string-handling overhead 
once.  Then you get all the parameters out of the same tokenised version.

However, if the next thing you do is to write this to disk, I am even more 
convinced that you're optimising the wrong piece of code as the disk I/O is 
likely to take vastly more instructions than the string parse.

These may be naïve questions, but I'll ask them anyway.  How have you 
identified these two pieces of code as the targets for optimisation?  What 
profiler have you used, under what conditions?  What proportion of your overall 
CPU budget is taken by these two snippets of code?  Is the machine CPU-bound in 
the first place, or is the bottleneck elsewhere?  If these are the worst 
culprits in your app, I'll be very surprised.

- Peter
   








Re: Code performance question #2

2006-08-07 Thread Mark Thomas
David Kerber wrote:
 I haven't run a profiler on this code; I've tried, but getting the
 configuration figured out has stumped me every time.

I have had good results with YourKit. Simple to set up and a nice
output that shows where the time is spent. I have used it to
investigate reported performance issues with core Tomcat code in the
past with no problems. There is a 15-day evaluation licence you can
use to see if you can get it working in your environment. I should be
able to help you get it set up if you like - and no, I am not on
commission ;)

Mark
