I was going to say all of these same things ... plus

Consider two tables, one of "stable keys" and one of "new keys." Sort and
binary search the stable list; do a linear search on the additions.

You can pseudo-delete a key from the stable table by turning on a "deleted"
flag.

You might also consider a B-Tree that you update dynamically.

For either of these possibilities you need to consider frequency of update
versus frequency of lookup.

Charles


-----Original Message-----
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU]
On Behalf Of Tom Harper
Sent: Tuesday, February 22, 2022 2:45 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: SRST vs. SRSTU

Gary,

The first thought that comes to mind is why not sort the table and use a
binary search?

If in fact the table is being updated and that is not practical, then your
options are a serial search (which you have ruled out) or a hash. 

As you may have guessed, SRST and SRSTU are almost certainly milli-coded
instructions and probably not very fast. 

Secondly, hash performance is highly dependent on the number of synonyms you
have. Considering how many entries in your table, this is a distinct
possibility. A full word makes for a large number of entries. The best hash
tables are of a size of a Mersenne Prime. Right now you are not using one. 

Lastly, obtain storage for your table in a one Mb page frame to minimize TLB
misses. 

Summary:

1 - If table is static, sort and use a binary search.

2 - Otherwise, create a hash table of a Mersenne Prime of size large enough
to avoid most synonyms. 

Tom Harper

Phoenix Software International 

Sent from my iPhone

> On Feb 22, 2022, at 5:09 PM, Gary Weinhold <weinh...@dkl.com> wrote:
> 
> We are trying to optimize a search routine for keys in fixed length rows
in an unordered array.  As the number of rows in the array grows, a serial
search becomes relatively inefficient, so we looked for another technique.
We tried a SEARCH STRING (SRST) against a one byte hash of the key to see if
it could give us better performance.  The relative positive of the matching
byte in the SRST array was used to determine the location of the key in the
original array; if the keys match, the row is found; if they don't match, we
redrive the SRST.  At about 50 rows, SRST is more efficient than a serial
search so it justifies maintaining the hash array.
>  On the average, we assume the SRST would have to be redriven about
(n/256)/2 times, where n is the number of rows in the array.  This would not
be a big factor for several thousand rows, but as the number of rows went
into the tens of thousands, we tried Search String Unicode (SRSTU).  It
appears to be identical to SRST, except it compares 2-byte values (at 2-byte
boundaries). So we created a 2-byte hash and, using the same technique based
on relative position, tested for performance improvements compared to SRST
when the number of rows exceeded 10000.  We thought that the reduction in
the number of redrives due to non-matching keys (on average,  (n/65536)/2)
would more than offset the hash array doubling in size.
> 
> Our preliminary results show SRSTU about taking about 50-60% more time for
15000 and 25000 rows.  That came as a surprise to us.  We will do more
testing.
> 
> Is there a possibility we are encountering a hardware vs. microcode
implementation of the instuctions?  Has anyone else tested the performance
of these instructions?
> 
> Regards, Gary
> 
> Gary Weinhold
> Senior Application Architect
> We are trying to optimize a search routine for keys in fixed length rows
in an unordered array.  As the number of rows in the array grows, a serial
search becomes relatively inefficient, so we looked for another technique.
We tried a SEARCH STRING (SRST) against a one byte hash of the key to see if
it could give us better performance.  The relative positive of the matching
byte in the SRST array was used to determine the location of the key in the
original array; if the keys match, the row is found; if they don't match, we
redrive the SRST.  At about 50 rows, SRST is more efficient than a serial
search so it justifies maintaining the hash array.
>  On the average, we assume the SRST would have to be redriven about
(n/256)/2 times, where n is the number of rows in the array.  This would not
be a big factor for several thousand rows, but as the number of rows went
into the tens of thousands, we tried Search String Unicode (SRSTU).  It
appears to be identical to SRST, except it compares 2-byte values (at 2-byte
boundaries). So we created a 2-byte hash and, using the same technique based
on relative position, tested for performance improvements compared to SRST
when the number of rows exceeded 10000.  We thought that the reduction in
the number of redrives due to non-matching keys (on average,  (n/65536)/2)
would more than offset the hash array doubling in size.
> 
> Our preliminary results show SRSTU about taking about 50-60% more time for
15000 and 25000 rows.  That came as a surprise to us.  We will do more
testing.
> 
> Is there a possibility we are encountering a hardware vs. microcode
implementation of the instuctions?  Has anyone else tested the performance
of these instructions?
> 
> Regards, Gary
> 
> Gary Weinhold
> Senior Application Architect
> DATAKINETICS | Data Performance & Optimization
> Phone:+1.613.523.5500 x216
> Email: weinh...@dkl.com
> Visit us online at www.DKL.com
> E-mail Notification: The information contained in this email and any
attachments is confidential and may be subject to copyright or other
intellectual property protection. If you are not the intended recipient, you
are not authorized to use or disclose this information, and we request that
you notify us by reply mail or telephone and delete the original message
from your mail system. 


----------------------------------------------------------------------------
----
This e-mail message, including any attachments, appended messages and the
information contained therein, is for the sole use of the intended
recipient(s). If you are not an intended recipient or have otherwise
received this email message in error, any use, dissemination, distribution,
review, storage or copying of this e-mail message and the information
contained therein is strictly prohibited. If you are not an intended
recipient, please contact the sender by reply e-mail and destroy all copies
of this email message and do not otherwise utilize or retain this email
message or any or all of the information contained therein. Although this
email message and any attachments or appended messages are believed to be
free of any virus or other defect that might affect any computer system into
which it is received and opened, it is the responsibility of the recipient
to ensure that it is virus free and no responsibility is accepted by the
sender for any loss or damage arising in any way from its opening or use.

Reply via email to