Re: Curosity Question About ESTA and MSTA
The short answer to your question is No. I think you have a fundamental misunderstanding of the operation of the Linkage Stack mechanism. You should carefully read "Chapter 2. Linkage stack" within IBM publication SA23-1394-00 "z/OS MVS Programming: Extended Addressability Guide" (this is the z/OS 2.1 version, but nothing of great significance has changed in this area for quite some time). The Modifiable Area of a Linkage Stack frame is typically intended to be used in association with ARR's. For some relevant discussion, look at "Chapter 18. Providing recovery" within IBM publication SA23-1371-02 "z/OS MVS Programming: Authorized Assembler Services Guide Version 2 Release 1" With respect to the z/Architecture perspective on the Modifiable Area of a Linkage Stack frame, read subtopic "Adding and Retrieving Information" within major topic "Linkage-Stack Introduction" within IBM publication SA22-7832-11 "z/Architecture Principles of Operation". Here is the text extracted from that subtopic: + Adding and Retrieving Information + + The instruction MODIFY STACKED STATE can be used by a program to + place two words of information, contained in a designated + general-register pair, in an area, called the modifiable area, of + the current linkage-stack state entry (a branch state entry or a + program-call state entry). This is intended to allow a called + program to establish a recovery routine that will be given control + by the control program, if necessary. Bob
Re: Curosity Question About ESTA and MSTA
No, MSTA will not change these fields. But I am curious as to what you are trying to build and why you want to change the AX while in stacked state/PC routine. On Sun, 10 Jun 2018 18:58:16 GMT "esst...@juno.com" wrote: :>Is this possible ? :> :>A hypothetical scenario - :> :>A Long Running Started Task issues an Extract Stacked State Instruction (ESTA) using code zero. :>LA R2,00 :>ESTA R0,R2 :>. :>Code zero returns the PKM, SASN, EAX, and PASN of the Started Task. :>The Started Task provides the EAX to other jobs. :>. :>A Problem Program (Non Supervisor state) retrieves the Started task EAX. :>. :>The Problem Program issues an Extract Stacked State Instruction to obtain an save its PKM, SASN, EAX, and PASN. :>. :>Can the Problem Program Issue a Modify Stack State Instruction (MSTA) :>replacing its own EAX with that EAX of the Started Task ? :>. :>For some reason I expected this to be an Exception of some sort. :>ESTA and MSTA are not privileged instructions. :>. :>Would this sequence of Instructions actually change the problem programs EAX in other words giving it the EAX of the Started Task ? :>. :>. :>Paul :>. -- Binyamin Dissen http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
Re: Curosity Question About ESTA and MSTA
I don't know about Modify Stack State (sic), but Modify Stacked State (MSTA) does not have an R2 field and can only alter the modifiable area, byte positions 152-159, of the stack entry. -- Shmuel (Seymour J.) Metz http://mason.gmu.edu/~smetz3 From: IBM Mainframe Assembler List on behalf of esst...@juno.com Sent: Sunday, June 10, 2018 2:58 PM To: ASSEMBLER-LIST@listserv.uga.edu Subject: Curosity Question About ESTA and MSTA Is this possible ? A hypothetical scenario - A Long Running Started Task issues an Extract Stacked State Instruction (ESTA) using code zero. LA R2,00 ESTA R0,R2 . Code zero returns the PKM, SASN, EAX, and PASN of the Started Task. The Started Task provides the EAX to other jobs. . A Problem Program (Non Supervisor state) retrieves the Started task EAX. . The Problem Program issues an Extract Stacked State Instruction to obtain an save its PKM, SASN, EAX, and PASN. . Can the Problem Program Issue a Modify Stack State Instruction (MSTA) replacing its own EAX with that EAX of the Started Task ? . For some reason I expected this to be an Exception of some sort. ESTA and MSTA are not privileged instructions. . Would this sequence of Instructions actually change the problem programs EAX in other words giving it the EAX of the Started Task ? . . Paul .
Re: Curosity Question About TCTL
On Sun, 24 Feb 2013 14:57:02 GMT "esst...@juno.com" wrote: :>Let me see If I understand this. :>Lets Say I Attach a Task (TCB) - the program (RB) associated with that TCB runs and ends. The TCB is not detached. :>At some undertimed time latter an SRB is scheduled into this address space. :>The SRB Issues TCTL with/to the TCB. :>So Do I understand that the program (RB) will be re-executed Again ? There is no RB, thus there is nothing to dispatch. I would expect an 070 abend. :>Is my understanding correct. No. :>Paul D'Angelo :> :> :>-- Original Message -- :>From: David Stokes :>To: ASSEMBLER-LIST@LISTSERV.UGA.EDU :>Subject: AW: Curosity Question About TCTL :>Date: Sat, 23 Feb 2013 17:48:23 + :> :>The TCTL or RESUME kicks off the TCB code without it having to wait for the usual scheduling, as long as it is dispatchable. What the TCB code does then is up to it. :> :>>>When the resumed routine finishes executing under the TCB should it re-enter the WAIT state again ? :> :>No, but it presumably will at some point or terminate. :> :>-Ursprüngliche Nachricht- :>Von: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] Im Auftrag von esst...@juno.com :>Gesendet: Freitag, 22. Februar 2013 17:31 :>An: ASSEMBLER-LIST@LISTSERV.UGA.EDU :>Betreff: Curosity Question About TCTL :> :>I was reading about Scheduling SRBs using IEAMSCHD and discovered the TCTL macro. :> :>The TCTL (transfer control) macro allows an SRB routine to exit from its processing and to pass control to a task with minimal system overhead. When an SRB specifies RESUME RETURN=N, control transfers to the resumed TCB. Control then passes to the top RB on the TCB/RB chain, but only if the system determines that the RB is dispatchable. :> :>Again this is a curosity question and I have nop real reason to use TCTL. :>Am I to understand that the TCB needs to be in a WAIT state ? :> :>When the resumed routine finishes executing under the TCB should it re-enter :>the WAIT state again ? :> :>Its not clear to me how this facility should work. :> :>Can Some one explain the mechanics behind using TCTL and the target Task Control Block. :> :>Paul D'Angelo :>--- -- Binyamin Dissen http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
Re: Curosity Question About TCTL
(Reminder: this list is for questions about the assembler and use of it, not about the behavior of z/OS system services. IBM-Main would be a much better place to ask such questions.) >but only if the system determines that the RB is dispatchable. >... >Am I to understand that the TCB needs to be in a WAIT state ? No, there is no necessary correlation between 'dispatchable" and "WAIT". Consider a task that is running, then gets interrupted for a time slice. It is still dispatchable. It is just not running, waiting for its next opportunity to be dispatched. Peter Relson z/OS Core Technology Design
Re: Curosity Question
If you want to see a good example of a well used assembler hashing algorithm you need to be an IMS customer. IMS has the, curiously misnamed, HDAM, PHDAM and DEDB "randomiser". It should be more correctly called a hashing algorithm. The purpose of DFSHDC40, DBFHDC40 and DBFHDC44 is to take a database key and hash it to a root anchor point within the database. Those routines have been around since IMS first got the hierarchical direct access method. We pass four parameters: key, # of RAPS, # of blocks in root addressable area and max number of bytes in a root segment to go in the RAA. The output is a block & rap number (or area, block & RAP for DEDB). The code used to be published in the IMS docs, it's now only in IMS.SDFSSRC The comments say is uses a "RANGE RATIO METHOD" to hash the key to the RAP. If anyone would like a copy of the source code send my an email off the list. On 2 November 2012 21:43, Paul Gilmartin wrote: > On 2012-11-02 15:01, Martin Truebner wrote: > > I still do not see how changing a numbering scheme from based > > on ten to a system based on thirty-six does create any clusters. > > > It doesn't. Robin has pretty much acknowledged that if the > input data are uniform, a modulus hash will likewise be uniform. > Others in this thread are more fixated on defending prime > moduli. > > But consider: if the input data are binary strings and you > choose 2^n as a modulus, the hash will merely be the last > n bits of the input datum. Powers of two are a bad choice > for modulus-hashing EBCDIC text where that last character > is likely to be clustered around displayable code points. > > OTOH if the input data are base-37 numbers a modulus 37 > hash merely returns the last digit; again a bad choice. > > But computer scientists have a cultural bias toward base > 2, not any other prime such as 37, even as number theorists > have some cultural bias toward base 10 (evident, at least, > in recreational essays). > > -- gil > -- http://twitter.com/DougieLawson
Re: Curosity Question
On 2012-11-02 15:01, Martin Truebner wrote: > I still do not see how changing a numbering scheme from based > on ten to a system based on thirty-six does create any clusters. > It doesn't. Robin has pretty much acknowledged that if the input data are uniform, a modulus hash will likewise be uniform. Others in this thread are more fixated on defending prime moduli. But consider: if the input data are binary strings and you choose 2^n as a modulus, the hash will merely be the last n bits of the input datum. Powers of two are a bad choice for modulus-hashing EBCDIC text where that last character is likely to be clustered around displayable code points. OTOH if the input data are base-37 numbers a modulus 37 hash merely returns the last digit; again a bad choice. But computer scientists have a cultural bias toward base 2, not any other prime such as 37, even as number theorists have some cultural bias toward base 10 (evident, at least, in recreational essays). -- gil
Re: Curosity Question
I still do not see how changing a numbering scheme from based on ten to a system based on thirty-six does create any clusters. -- Martin Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE more at http://www.picapcpu.de
Re: Curosity Question
On 2 November 2012 04:05, Martin Truebner wrote: > John, > >>> Dividing and taking the remainder to achieve a more-fewer mapping > is---by definition---division-method hashing; > > and I insist: the number of possible input values is 46656 and this is > the number of output values exact 46656 (not more-fewer but identical) > > - and > > (surprise surprise) 46656 is 36 ** 3 Regretably the code snippets posted at the start of this thread are just that, and there are at least three reasons (flow of control, mismatch of code and comments, and line numbering anomalies) to think that the code we see is not the code that runs successfully. For example, the GETMAIN appears to be done using a length that is 46656+46656, whereas the length of each map entry (whatever such may be) is unknown. If we infer from the code and comments on lines 9804 and 9904 that each map entry is just one byte long, and that the doubling of the GETMAIN value is an error (or for all I know of CICS, that its GETMAIN works in nybbles), then the range of input offsets is indeed 0 to 46655, and the code merely converts from notional base 256 to base 36, and then converts that to a handily printable scheme. I have trouble calling this hashing up to this point. But if the size of a map entry is greater than 1, the scheme wraps, and two (or more) addresses will produce identical outputs and therefor terminal IDs in an entirely predictable way. This could reasonably be called hashing, but the regularity of the input and absense of any sign of collision detection, makes it a bad configuration. Conceivably the "terminal prefix" in WSYSIDCR is used elsewhere in the code to manage more than 36**3 IDs. > If we all go back and look at the what the OP posted as code-snippet it > would be fun- If we argue about distribution and clusters (and > algorithms), I am out. Your analysis of my analysis could be useful, for those who are still listening... Tony H.
Re: Curosity Question
On 11/2/2012 7:40 AM, Bill Fairchild wrote: Rather than belabor this issue any further, if you could post a link to some explanation of the mathematical proof why clustering occurs around prime factors of a composite modulo, I would love to read it and try to understand what is going on. ISTR that Knuth provides this proof in his SaS book. Unfortunately, it's in my other office so I can't check for myself until Monday... -- Edward E Jaffe Phoenix Software International, Inc 831 Parkview Drive North El Segundo, CA 90245 310-338-0400 x318 edja...@phoenixsoftware.com http://www.phoenixsoftware.com/
Re: Curosity Question
Those interested in the mathematics of hashing should, predictably, consult Knuth. The 2nd volume, Sorting and searching, of TACP, any edition, contains a section called Hash functions. On page 509 of the 1st edition there appears the summarizing text: Such considerations suggest that we choose M to be a prime number such that r^k = +|-a modulo M for small k and a. Knuth explains his notation there, and I will not rehearse it here. --jg -- John Gilmore, Ashland, MA 01721 - USA Avant d'imprimer cet e-mail, réfléchissons à l'impact sur l'environnement.
Re: Curosity Question
John, I understand prime numbers and also channel programming. What I don't understand is the mathematical basis for the CAUSE of the clustering around prime factors of a composite modulo. You have explained what happens but not why. I agree with Robin's statement that real data is not random (or, if some prefer, real datums are not random). E.g., the initial source of data for this discussion was some group of characters that are used in some real, living language. English has 26 letters and 10 numerical digits, so 36 was one possible modulo. If the 36 possible characters are seriously random, then there should be no clustering. Yet we find clustering. I aver that this is caused by quirks in the English language rather than a mathematical cause due simply to the fact that 36 is not prime, and that any clustering of any group of data hashed with any given modulo will always cluster around certain values that are intrinsically related to the underlying non-randomness of the input data. The most commonly used letters in English, in decreasing usage, are approximately ETAIONSHRDLU. If we assign unique values to these letters based on A=0, B=1, ..., Z=25, and then the numbers 0=26, 1=27, ... 9=35, then those same most common 12 letters convert to 4, 19, 0, 8, 14, 13, 18, 7, 17, 3, 11, and 20. Hashing a typical sample of English text will produce significant clustering around these 12 values, but the clustering has nothing to do with the primeness of the modulo value but everything to do with how English has evolved and the sequence of letters in our alphabet, which also is not random but has evolved over 3,000 or so years beginning with the sequence used in the ancient Phoenician alphabet. If we should see clustering around the values 2 and 3, which are prime factors of 36, then we should expect to see also a large percent of English words beginning with or containing the letters C and D. There are many such English words, but there are far more beginning with or containing all five vowels, e.g., none of which is a C or D, the other consonants T, N, S, etc. Another post described using the integers 1 through 1000 for input to hashing by dividing by 36. It seems to me that the sequence of modulo values produced will be exactly 27 sets of 1, 2, 3, ..., 35, 0 and then one final set having the values 1, 2, 3..., 27, 28. I don't see any evidence of clustering here, nor any reason why there would be 84 occurrences of any one value. Rather than belabor this issue any further, if you could post a link to some explanation of the mathematical proof why clustering occurs around prime factors of a composite modulo, I would love to read it and try to understand what is going on. Bill Fairchild Programmer Rocket Software 408 Chamberlain Park Lane . Franklin, TN 37069-2526 . USA t: +1.617.614.4503 . e: bfairch...@rocketsoftware.com . w: www.rocketsoftware.com -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of John Gilmore Sent: Friday, November 02, 2012 8:41 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Martin Truebner (Trübner?) writes: and I insist: the number of possible input values is 46656 and this is the number of output values exact 46656 (not more-fewer but identical) It is of course immediate that for a set of n characters the number of permutations of them taken m at a time is n^m and that, in particular, 36^3 = 46656. If one calculates N hash values one has or at least at some point had N hash values. The obvious disposed of, what is of interest here is the distribution of these values. Consider the function f(i) = mod(i,2) for i = 0, 1, 2, 3, . . . Evaluating it for the first N elements of this sequence certainly yields N values, but what is interesting about them is not that there are N of them but that, crudely, half of them are 0 and half of them are 1. Now one can of course arrange things otherwise. The function f(i) = mod(2i + 1,2), i = 0, 1, 2, 3, . . . always has the value 1, and the function f(i) = mod(2i,2), i = 0, 1, 2, 3, . . . always has the value 0. I entered this discussion because an examination of clustering side effects is necessary for any hashing scheme. They may be important, or again they may not be, but it is essential that they be evaluated. What I got for my trouble was a lot of guff, much of it specious and some of it literal nonsense, from people who lack a minimal grasp of the issues involved. I shall take this to heart. In the future I shall avoid discussing mathematical topics, in which I have some competence, here just as I avoid discussing channel-programming topics, in which I have none. --jg
Re: Curosity Question
Greetings, Mr. Gilmore, Apparently I fail to understand some aspect of your claim regarding the remainders when 36 is used as a divisor. If I divide each of the numbers from 1 to n by 36, each of the values from 0 to 35 is encountered as a remainder as many times as all other values, provided n is a multiple of 36; if n is not a multiple of 36 then there are n mod 36 remainders that occur (only) one additional time. If 2 and/or 3 appear as remainders more often then there must be some attribute of the distribution of the dividends that causes this. Have I misunderstood your claim, or is there another condition that has been implied (or stated explicitly) that I overlooked? Thank you for any information you can give me. - mb Mark Boonie z/TPF Development 845-433-4918 (t/l 293-4918) From: John Gilmore To: ASSEMBLER-LIST@listserv.uga.edu, Date: 11/01/2012 06:19 PM Subject:Re: Curosity Question Sent by:IBM Mainframe Assembler List If a hashing scheme is working well there is almost no clustering. Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our hashing modulus.. Remainders will have one of the 17 values 0, 1, 2, . . . , 16. Then some goodly number of hashing operations the same or about the same number of of the hash values 0, 1, 2, . . . , 16 are generated, clustering does not occur. For concreteness, suppose we do 170 divisions. Then if clustering does not occur there are about ten remainders having the value 0, about 10 having the value 1, about 10 having the value 2, etc., etc. What happens when the divisor used is composite is that hash values that are prime factors of the divisor occur more frequently than others. For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2 or 2**2 x 3**2. Its prime factors are 2 and 3; and when it is used as a divisor there are more remainders having the value 2 and the value 3 than there are having other pairs of values. 37, on the other hand, is prime, divisible only by 1 and itself. Its use as a divisor yields no clustering of remainders. Never hesitate to ask notional gurus such questions. A request for a further explanation is always in order. --jg
Re: Curosity Question
Martin Truebner (Trübner?) writes: and I insist: the number of possible input values is 46656 and this is the number of output values exact 46656 (not more-fewer but identical) It is of course immediate that for a set of n characters the number of permutations of them taken m at a time is n^m and that, in particular, 36^3 = 46656. If one calculates N hash values one has or at least at some point had N hash values. The obvious disposed of, what is of interest here is the distribution of these values. Consider the function f(i) = mod(i,2) for i = 0, 1, 2, 3, . . . Evaluating it for the first N elements of this sequence certainly yields N values, but what is interesting about them is not that there are N of them but that, crudely, half of them are 0 and half of them are 1. Now one can of course arrange things otherwise. The function f(i) = mod(2i + 1,2), i = 0, 1, 2, 3, . . . always has the value 1, and the function f(i) = mod(2i,2), i = 0, 1, 2, 3, . . . always has the value 0. I entered this discussion because an examination of clustering side effects is necessary for any hashing scheme. They may be important, or again they may not be, but it is essential that they be evaluated. What I got for my trouble was a lot of guff, much of it specious and some of it literal nonsense, from people who lack a minimal grasp of the issues involved. I shall take this to heart. In the future I shall avoid discussing mathematical topics, in which I have some competence, here just as I avoid discussing channel-programming topics, in which I have none. --jg
Re: Curosity Question
John, >> Dividing and taking the remainder to achieve a more-fewer mapping is---by definition---division-method hashing; and I insist: the number of possible input values is 46656 and this is the number of output values exact 46656 (not more-fewer but identical) - and (surprise surprise) 46656 is 36 ** 3 If you want to give it a try: entry no 1 gets 001 entry no 9 gets 009 entry no 10 gets 00A entry no 35 gets 00Z entry no 36 gets 010 >> But enough! If this is a dispute about terminology rather than substance, it is not very important. If we all go back and look at the what the OP posted as code-snippet it would be fun- If we argue about distribution and clusters (and algorithms), I am out. -- Martin Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE more at http://www.picapcpu.de
Re: Curosity Question
On Nov 1, 2012, at 23:49, robin wrote: > >> Here I see no evidence of clustering. > > There won't be. > All the data is uniformly distributed. > > Real data is not uniform. OK. I'm starting to understand. But can you be more specific? Narrow down the definitions of "real" and "not uniform" For example, if I I choose the prime divison 37, advocated by JG, and my real data are "not uniform" but their density happens to correlate with a periodicity of 37, I will get adverse clustering with that modulus. I suppose JG's assertion (and yours) is a consequence of some postulate such as "'Real data' are unlikely to exhibit a periodicity of a(ny) prime number, but more likely to exhibit a periodicity which is a composite number." Is this intuitive (can you motivate me?) or empirical (can you cite the observations?) I grant that the non-uniformity of "real data" is the reason that compression techniques work. The Kolmogorov complexity of uniform data is itself. -- gil
Re: Curosity Question
From: Paul Gilmartin Sent: Friday, 2 November 2012 3:37 PM I'm missing something here; most likely an unstated assumption. My intuition agrees with test program making a histogram of consecutive integers mod 3: SPPG@MVS3:139$ cat foomod /* Rexx */ signal on novalue; /* Doc: Demonstrate hash clustering. */ Counts. = 000 /* Set all counters to 0. */ do I = 1 to 9 /* Generate some samples. */ R = I // 36/* mod 36 */ Counts.R = Counts.R + 1; end do I = 0 to 35 /* Display counters. */ say 'Counts.'right(I, 2, 0 ) '=' right( Counts.I, 4 ); end Which produces output: SPPG@MVS3:140$ foomod Counts.00 = 2777 ... Counts.35 = 2777 Here I see no evidence of clustering. There won't be. All the data is uniformly distributed. Real data is not uniform.
Re: Curosity Question
Really? I'd have thought the remainders would be equally dispersed among 0,1,2...35. In fact, for divisions of the numbers 1 through 1000, there are 84 that have a remainder of each of the numbers 1 through 12, and 84 of each of the other numbers, including zero. You can demonstrate it in an Excel spreadsheet, in about 2 minutes. David de Jongh -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of John Gilmore Sent: Thursday, November 01, 2012 5:15 PM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question If a hashing scheme is working well there is almost no clustering. Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our hashing modulus.. Remainders will have one of the 17 values 0, 1, 2, . . . , 16. Then some goodly number of hashing operations the same or about the same number of of the hash values 0, 1, 2, . . . , 16 are generated, clustering does not occur. For concreteness, suppose we do 170 divisions. Then if clustering does not occur there are about ten remainders having the value 0, about 10 having the value 1, about 10 having the value 2, etc., etc. What happens when the divisor used is composite is that hash values that are prime factors of the divisor occur more frequently than others. For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2 or 2**2 x 3**2. Its prime factors are 2 and 3; and when it is used as a divisor there are more remainders having the value 2 and the value 3 than there are having other pairs of values. 37, on the other hand, is prime, divisible only by 1 and itself. Its use as a divisor yields no clustering of remainders. Never hesitate to ask notional gurus such questions. A request for a further explanation is always in order. --jg
Re: Curosity Question
On Nov 1, 2012, at 16:14, John Gilmore wrote: > If a hashing scheme is working well there is almost no clustering. > Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our > hashing modulus.. Remainders will have one of the 17 values > > 0, 1, 2, . . . , 16. > > Then some goodly number of hashing operations the same or about the > same number of of the hash values 0, 1, 2, . . . , 16 are generated, > clustering does not occur. > > For concreteness, suppose we do 170 divisions. Then if clustering > does not occur there are about ten remainders having the value 0, > about 10 having the value 1, about 10 having the value 2, etc., etc. > > What happens when the divisor used is composite is that hash values > that are prime factors of the divisor occur more frequently than > others. > > For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2 > or 2**2 x 3**2. Its prime factors are 2 and 3; and when it is used as > a divisor there are more remainders having the value 2 and the value 3 > than there are having other pairs of values. > > 37, on the other hand, is prime, divisible only by 1 and itself. Its > use as a divisor yields no clustering of remainders. > > Never hesitate to ask notional gurus such questions. A request for a > further explanation is always in order. > I'm missing something here; most likely an unstated assumption. My intuition agrees with test program making a histogram of consecutive integers mod 3: SPPG@MVS3:139$ cat foomod /* Rexx */ signal on novalue; /* Doc: Demonstrate hash clustering. */ Counts. = 000 /* Set all counters to 0. */ do I = 1 to 9 /* Generate some samples. */ R = I // 36/* mod 36 */ Counts.R = Counts.R + 1; end do I = 0 to 35 /* Display counters. */ say 'Counts.'right(I, 2, 0 ) '=' right( Counts.I, 4 ); end Which produces output: SPPG@MVS3:140$ foomod Counts.00 = 2777 Counts.01 = 2778 Counts.02 = 2778 Counts.03 = 2778 Counts.04 = 2778 Counts.05 = 2778 Counts.06 = 2778 Counts.07 = 2778 Counts.08 = 2778 Counts.09 = 2778 Counts.10 = 2778 Counts.11 = 2778 Counts.12 = 2778 Counts.13 = 2778 Counts.14 = 2778 Counts.15 = 2778 Counts.16 = 2778 Counts.17 = 2778 Counts.18 = 2778 Counts.19 = 2778 Counts.20 = 2778 Counts.21 = 2778 Counts.22 = 2778 Counts.23 = 2778 Counts.24 = 2778 Counts.25 = 2778 Counts.26 = 2778 Counts.27 = 2778 Counts.28 = 2777 Counts.29 = 2777 Counts.30 = 2777 Counts.31 = 2777 Counts.32 = 2777 Counts.33 = 2777 Counts.34 = 2777 Counts.35 = 2777 Here I see no evidence of clustering. Perhaps I should try it with random rather than consecutive integers. But if it were to show clustering I'd be more inclined to consider it an indictment of the PRNG than of the modulo 36 hash. -- gil
Re: Curosity Question
At 11:30 AM 11/2/2012 +1100, you wrote: >From: John Gilmore >Sent: Friday, 2 November 2012 8:25 AM > >>Long division is not hashing. > >Even long division can be hashing. Just so. See Knuth, TAoCP, Ch 6.4 . Mike >> Dividing and taking the remainder to >>achieve a more-fewer mapping is---by definition---division-method >>hashing; and it of course makes use of just one divide operation.
Re: Curosity Question
From: John Gilmore Sent: Friday, 2 November 2012 8:25 AM Long division is not hashing. Even long division can be hashing. Dividing and taking the remainder to achieve a more-fewer mapping is---by definition---division-method hashing; and it of course makes use of just one divide operation.
Re: Curosity Question
From: Jon Perryman Sent: Friday, 2 November 2012 10:10 AM Clustering is a statics term that describes the tendency of events to occur around groups and less frequently around other groups. In this case, it is where we are trying to predict (or guess) values to make better use of the hash table storage. To make it easier to understand, try to guess the first character of a word that I am thinking. Did you guess X or Z? No, because there are very few words that begin with with these letters. No, you might well have chosen one of those X or Z for that very reason. On the other hand a larger number of words begin with T and A so these are more likely. Actually, "S" would have been a better choice, because the number of words beginning with "S" is several times more than those commencing with "T". The cluster size of T and A is significantly lager than X and Z. You can see the clustering at http://www.ask.com/wiki/Letter_frequency if you scroll down to the graphs. You can also see it by looking in any dictionary.
Re: Curosity Question
From: Martin Truebner Sent: Friday, 2 November 2012 5:23 AM However named, this is a hashing scheme. Okay - If converting a number from a system based on whatever to a system based on a different number is hashing.. maybe my english needs amending Long established number-theoretic results then predict clustering of these hash values at the prime divisors of a composite m. I do not argue that and I understand the value of prime numbers, but I highly doubt that the simple occurrence of a divide operation does automatically make the conversion a hashing algorithm. It does, really truly. Hashing is a means of transforming some value in whatever form to another form, in order that it may be used as an index.
Re: Curosity Question
Clustering is a statics term that describes the tendency of events to occur around groups and less frequently around other groups. In this case, it is where we are trying to predict (or guess) values to make better use of the hash table storage. To make it easier to understand, try to guess the first character of a word that I am thinking. Did you guess X or Z? No, because there are very few words that begin with with these letters. On the other hand a larger number of words begin with T and A so these are more likely. The cluster size of T and A is significantly lager than X and Z. You can see the clustering at http://www.ask.com/wiki/Letter_frequency if you scroll down to the graphs. Clustering changes depending upon the situation. E.g. how often a letter occurs in words which is another graph on that web page which you can compare. MVS messages cluster even more. Regards, Jon Perryman. From: "esst...@juno.com" Not being a matchamtician, what exactly do you mean by clustering. -- Original Message -- From: John Gilmore Long established number-theoretic results then predict clustering of these hash values at the prime divisors of a composite m.
Re: Curosity Question
On Thu, Nov 1, 2012 at 6:14 PM, John Gilmore wrote: > If a hashing scheme is working well there is almost no clustering. > Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our > hashing modulus.. Remainders will have one of the 17 values > > 0, 1, 2, . . . , 16. > > <...snip...> > Never hesitate to ask notional gurus such questions. A request for a > further explanation is always in order. > > --jg > Thank you John. Lucid and cogent. -- Mike Shaw MVS/QuickRef Support Group Chicago-Soft, Ltd.
Re: Curosity Question
If a hashing scheme is working well there is almost no clustering. Suppose we divide by 17, a prime, i.e., use it, in the jargon, as our hashing modulus.. Remainders will have one of the 17 values 0, 1, 2, . . . , 16. Then some goodly number of hashing operations the same or about the same number of of the hash values 0, 1, 2, . . . , 16 are generated, clustering does not occur. For concreteness, suppose we do 170 divisions. Then if clustering does not occur there are about ten remainders having the value 0, about 10 having the value 1, about 10 having the value 2, etc., etc. What happens when the divisor used is composite is that hash values that are prime factors of the divisor occur more frequently than others. For 36 we have 36 = 2 x 2 x 3 x 3, which is usually written 2^2 x 3^2 or 2**2 x 3**2. Its prime factors are 2 and 3; and when it is used as a divisor there are more remainders having the value 2 and the value 3 than there are having other pairs of values. 37, on the other hand, is prime, divisible only by 1 and itself. Its use as a divisor yields no clustering of remainders. Never hesitate to ask notional gurus such questions. A request for a further explanation is always in order. --jg
Re: Curosity Question
I resemble one of those old time mainframers and proud of it -- Original Message -- From: "McKown, John" To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Date: Thu, 1 Nov 2012 09:45:02 -0500 Yes, and also must update the BASETAB and BASETAB2 tables appropriately. Alternately, you could extend to base 62 by using lower case letters as well. I.e. 0-9A-Za-z . But most people don't seem to like to distinguish between upper and lower case. Especially old tyme mainframers. -- John McKown Systems Engineer IV IT Administrative Services Group HealthMarkets(r) 9151 Boulevard 26 * N. Richland Hills * TX 76010 (817) 255-3225 phone * john.mck...@healthmarkets.com * www.HealthMarkets.com Confidentiality Notice: This e-mail message may contain confidential or proprietary information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. HealthMarkets(r) is the brand name for products underwritten and issued by the insurance subsidiaries of HealthMarkets, Inc. -The Chesapeake Life Insurance Company(r), Mid-West National Life Insurance Company of TennesseeSM and The MEGA Life and Health Insurance Company.SM > -Original Message- > From: IBM Mainframe Assembler List [mailto:ASSEMBLER- > l...@listserv.uga.edu] On Behalf Of esst...@juno.com > Sent: Thursday, November 01, 2012 9:25 AM > To: ASSEMBLER-LIST@LISTSERV.UGA.EDU > Subject: Re: Curosity Question > > Rob van der Heij > wrote > " > It's encoding an address in base-36 (using only uppercase letters and > digits). With 4 positions you can create 36**4 different combinations > (about 2**24/10). Depending on the size of the control block and the > total address space, this could produce enough unique identifiers (if > you're careful)." > > Thanks For responding . > If I wanted to add additional 10 characters such as ! @ # $ % & * + = / > wOULD i HAVE TO CHANGE THE base 36 TO 46 ? >
Re: Curosity Question
John Gilmore Not being a matchamtician, what exactly do you mean by clustering. -- Original Message -- From: John Gilmore To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Date: Thu, 1 Nov 2012 12:09:20 -0500 Perhaps you are being dense. However named, this is a hashing scheme. Division by a modulus m yields the remainders, m of them, 0, 1, 2, . . . , m - 1. Long established number-theoretic results then predict clustering of these hash values at the prime divisors of a composite m. Without clustering the expected number of instances of a hash value H in a sample of size N is just N/m. When clustering occurs the values for the prime factors of m are larger. --jg
Re: Curosity Question
Long division is not hashing. Dividing and taking the remainder to achieve a more-fewer mapping is---by definition---division-method hashing; and it of course makes use of just one divide operation. But enough! If this is a dispute about terminology rather than substance, it is not very important. --jg
Re: Curosity Question
John, >> However named, this is a hashing scheme. Okay - If converting a number from a system based on whatever to a system based on a different number is hashing.. maybe my english needs amending >> Long established number-theoretic results then predict clustering of these hash values at the prime divisors of a composite m. I do not argue that and I understand the value of prime numbers, but I highly doubt that the simple occurrence of a divide operation does automatically make the conversion a hashing algorithm. -- Martin Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE more at http://www.picapcpu.de
Re: Curosity Question
On 1 November 2012 15:45, McKown, John wrote: > Yes, and also must update the BASETAB and BASETAB2 tables appropriately. > Alternately, you could extend to base 62 by using lower case letters as well. > I.e. 0-9A-Za-z . But most people don't seem to like to distinguish between > upper and lower case. Especially old tyme mainframers. Or whatever steps in between that don't preserve case. But if lowercase works, then I'd go for base-64 and avoid division... PS If the addresses are far enough apart (because of the size of the block) then encoding the 20 most significant bits should be unique enough (eg 2K blocks in a 2G address space) Rob
Re: Curosity Question
Perhaps you are being dense. However named, this is a hashing scheme. Division by a modulus m yields the remainders, m of them, 0, 1, 2, . . . , m - 1. Long established number-theoretic results then predict clustering of these hash values at the prime divisors of a composite m. Without clustering the expected number of instances of a hash value H in a sample of size N is just N/m. When clustering occurs the values for the prime factors of m are larger. --jg
Re: Curosity Question
Am I dense or what? >> Here you may expect disproportionately high numbers of the hash values 2 and 3. To me the code takes the entry-number and uses that number to come to a number in a base 36 system All that is happening is Benford's Law will show, but where will the clustering occur? -- Martin Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE more at http://www.picapcpu.de
Re: Curosity Question
The number 36 is composite: 36 = 2^2 x 3^2. Long experience with division-method hashing schemes has established that the use of composite hashing moduli produces significant clustering at their prime factors. Here you may expect disproportionately high numbers of the hash values 2 and 3. For a hashing modulus of 46 = 2 x 23 clustering instead occurs at 2 and 23. Without knowing more about the uses you plan to make of these hash values it is hard to know how serious a limitation this clustering will be, but you should be aware of it. Also worth noting is that if 37 had been chosen as the hashing modulus originally---say by treating A-Z, 0-9, and $ as licit characters---clustering would have been avoided entirely because 37 is prime. Analogously and serendipitously, if you made 47 characters licit, thus making 47 your hashing modulus, you could avoid clustering too since 47 is also prime. John Gilmore, Ashland, MA 01721 - USA
Re: Curosity Question
Hey! I resemble that remark about "old tyme"! Please don't tar us all with that particular brush. SOME of us are not that creaky. To the OP: Yes, you can extend the range of terminal ID's generated by changing the BASENUM to 46 and updating the BASETAB characters and BASETAB2 table to use your new characters, or to 62 with lower case letters as John says. Peter -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of McKown, John Sent: Thursday, November 01, 2012 10:45 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Yes, and also must update the BASETAB and BASETAB2 tables appropriately. Alternately, you could extend to base 62 by using lower case letters as well. I.e. 0-9A-Za-z . But most people don't seem to like to distinguish between upper and lower case. Especially old tyme mainframers. -- John McKown Systems Engineer IV IT Administrative Services Group HealthMarkets(r) 9151 Boulevard 26 * N. Richland Hills * TX 76010 (817) 255-3225 phone * john.mck...@healthmarkets.com * www.HealthMarkets.com Confidentiality Notice: This e-mail message may contain confidential or proprietary information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. HealthMarkets(r) is the brand name for products underwritten and issued by the insurance subsidiaries of HealthMarkets, Inc. -The Chesapeake Life Insurance Company(r), Mid-West National Life Insurance Company of TennesseeSM and The MEGA Life and Health Insurance Company.SM > -Original Message- > From: IBM Mainframe Assembler List [mailto:ASSEMBLER- > l...@listserv.uga.edu] On Behalf Of esst...@juno.com > Sent: Thursday, November 01, 2012 9:25 AM > To: ASSEMBLER-LIST@LISTSERV.UGA.EDU > Subject: Re: Curosity Question > > Rob van der Heij > wrote > " > It's encoding an address in base-36 (using only uppercase letters and > digits). With 4 positions you can create 36**4 different combinations > (about 2**24/10). Depending on the size of the control block and the > total address space, this could produce enough unique identifiers (if > you're careful)." > > Thanks For responding . > If I wanted to add additional 10 characters such as ! @ # $ % & * + = / > wOULD i HAVE TO CHANGE THE base 36 TO 46 ? -- This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail and delete the message and any attachments from your system.
Re: Curosity Question
Yes, and also must update the BASETAB and BASETAB2 tables appropriately. Alternately, you could extend to base 62 by using lower case letters as well. I.e. 0-9A-Za-z . But most people don't seem to like to distinguish between upper and lower case. Especially old tyme mainframers. -- John McKown Systems Engineer IV IT Administrative Services Group HealthMarkets(r) 9151 Boulevard 26 * N. Richland Hills * TX 76010 (817) 255-3225 phone * john.mck...@healthmarkets.com * www.HealthMarkets.com Confidentiality Notice: This e-mail message may contain confidential or proprietary information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. HealthMarkets(r) is the brand name for products underwritten and issued by the insurance subsidiaries of HealthMarkets, Inc. -The Chesapeake Life Insurance Company(r), Mid-West National Life Insurance Company of TennesseeSM and The MEGA Life and Health Insurance Company.SM > -Original Message- > From: IBM Mainframe Assembler List [mailto:ASSEMBLER- > l...@listserv.uga.edu] On Behalf Of esst...@juno.com > Sent: Thursday, November 01, 2012 9:25 AM > To: ASSEMBLER-LIST@LISTSERV.UGA.EDU > Subject: Re: Curosity Question > > Rob van der Heij > wrote > " > It's encoding an address in base-36 (using only uppercase letters and > digits). With 4 positions you can create 36**4 different combinations > (about 2**24/10). Depending on the size of the control block and the > total address space, this could produce enough unique identifiers (if > you're careful)." > > Thanks For responding . > If I wanted to add additional 10 characters such as ! @ # $ % & * + = / > wOULD i HAVE TO CHANGE THE base 36 TO 46 ? >
Re: Curosity Question
Thank You -- Original Message -- From: Chuck Arney To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Date: Thu, 1 Nov 2012 09:08:42 -0500 It divides by 36 and uses the remainder as a character digit because there are 36 possible characters in it's translate table. A-Z + 0-9 = 36 possible characters. Chuck Arney Arney Computer Systems -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of esst...@juno.com Sent: Thursday, November 01, 2012 8:49 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Curosity Question Im providing some snipets of code which is used to derive a CICS Terminal ID from an address in storage. The Starting address resides above the line (31-bit). In the CALCID Routine below: The code snipet derives the Termid from the lower order 3 bytes of the storage address by dividing a F'36' (BASENUM). This value is further divided by F'36' (BASENUM) and then again (3Times in Total) This calcualtion produces the last three characters of the 4 Character Terminal ID. Im not sure I understand the significance of the F'36' (BASENUM). Can someone explain the significance of using a F'36' bsing used to divide into a 31 Bit Address ? I suspect it has to do with 24 Bit Addresses. Its not clear to me what that vaule is used. What is the significance of F'36' in the CALCID routine below.
Re: Curosity Question
Rob van der Heij wrote " It's encoding an address in base-36 (using only uppercase letters and digits). With 4 positions you can create 36**4 different combinations (about 2**24/10). Depending on the size of the control block and the total address space, this could produce enough unique identifiers (if you're careful)." Thanks For responding . If I wanted to add additional 10 characters such as ! @ # $ % & * + = / wOULD i HAVE TO CHANGE THE base 36 TO 46 ? -- Original Message -- From: Rob van der Heij To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Curosity Question Date: Thu, 1 Nov 2012 15:03:51 +0100 On 1 November 2012 14:48, esst...@juno.com wrote: > Im not sure I understand the significance of the F'36' (BASENUM). > Can someone explain the significance of using a F'36' bsing used to divide > into a 31 Bit Address ? > I suspect it has to do with 24 Bit Addresses. > Its not clear to me what that vaule is used. > What is the significance of F'36' in the CALCID routine below. It's encoding an address in base-36 (using only uppercase letters and digits). With 4 positions you can create 36**4 different combinations (about 2**24/10). Depending on the size of the control block and the total address space, this could produce enough unique identifiers (if you're careful). -Rob
Re: Curosity Question
It divides by 36 and uses the remainder as a character digit because there are 36 possible characters in it's translate table. A-Z + 0-9 = 36 possible characters. Chuck Arney Arney Computer Systems -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of esst...@juno.com Sent: Thursday, November 01, 2012 8:49 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Curosity Question Im providing some snipets of code which is used to derive a CICS Terminal ID from an address in storage. The Starting address resides above the line (31-bit). In the CALCID Routine below: The code snipet derives the Termid from the lower order 3 bytes of the storage address by dividing a F'36' (BASENUM). This value is further divided by F'36' (BASENUM) and then again (3Times in Total) This calcualtion produces the last three characters of the 4 Character Terminal ID. Im not sure I understand the significance of the F'36' (BASENUM). Can someone explain the significance of using a F'36' bsing used to divide into a 31 Bit Address ? I suspect it has to do with 24 Bit Addresses. Its not clear to me what that vaule is used. What is the significance of F'36' in the CALCID routine below.
Re: Curosity Question
On 1 November 2012 14:48, esst...@juno.com wrote: > Im not sure I understand the significance of the F'36' (BASENUM). > Can someone explain the significance of using a F'36' bsing used to divide > into a 31 Bit Address ? > I suspect it has to do with 24 Bit Addresses. > Its not clear to me what that vaule is used. > What is the significance of F'36' in the CALCID routine below. It's encoding an address in base-36 (using only uppercase letters and digits). With 4 positions you can create 36**4 different combinations (about 2**24/10). Depending on the size of the control block and the total address space, this could produce enough unique identifiers (if you're careful). -Rob