Re: Sortlessness?
Googling turns up the fact that the fundamental paper by René Haentjens, "Ordering universal character strings", is available at http://www.hpl.hp.com/hpjournal/dtj/vol5num3/vol5num3art4.txt Tony H was right to mention the work of the National Language Technical Center at IBM's Toronto Laboratory. (It is|was actually in North York, Ontario, a Toronto suburb.) I cherish my copies of its multivolume National Language Design Guide, and anyone who can find copies of them on the net should download them. Why the NLTC was killed off, notionally by IBM Canada, is unlikely ever to be fully understood. No one outside IBM is in any position to speculate about such things, and those inside it all have their own organizational political imperatives to defend. What is clear in the record is that it was a centre of excellence. Nowhere else, for example, have I seen other cogent treatments of the problems of treating Cyrillic text embedded in roman text, roman text embedded in Arabic text, and the like. John Gilmore, Ashland, MA 01721 - USA
Re: Sortlessness?
(I'm trying to move this to IBM-MAIN; it really doesn't belong on ASSEMBLER-LIST. And not trimming quoted material as much as I usually would.) On 2013-09-04 10:29, Tony Harminc wrote: > On 1 September 2013 00:51, Paul Gilmartin wrote: >> On 2013-08-31, at 08:55, John Gilmore wrote: >>> >>> ... They use data transformations to make it possible >>> for two keys to be compared using a single CLC[L]. (DB2 does similar >>> things too.) >>> >> This can be particularly complex for literary collating conventions >> such as EN_US which DFSORT gets terribly wrong. I tried a PMR on >> this a few years ago. When I reported that DFSORT and a C program >> using strcoll() produce similar incorrect results, DFSORT and I >> agreed that the problem should belong to LE. >> >> LE gave me WAD with a rationale so outrageous that I gave up in >> disgust, making no effort to escalate. > > Isn't it a POSIX violation to produce incorrect collation results for > a locale? Not, I suppose, that that's stopped them before. > > It's a shame because IBM was in the forefront of getting this > collation stuff right, and into the POSIX standards. See the early > Redbook GG24-3516 Keys to Sort and Search for Culturally Expected > Results, and much subsequent work from IBM's long gone National > Language Technical Center. > Thanks for the reference. I'll look for it on publibz. Or might I find it on InfoCenter? The first point of frustration is the inconsistency in the *names* of the locales. They're case-sensitive on most platforms; case-insensitive (I think) on z/OS.
I needed to supply the following preamble to make my test case portable:

    static char
    #if defined( __APPLE__ )
        *US = "en_US.UTF-8",
        *CA = "en_CA.UTF-8",
    #elif defined( __linux__ )
        *US = "en_US.utf8",
        *CA = "en_CA.utf8",
    #elif defined( __MVS__ )
    #if ( '0' == 0xf0 )
        *US = "En_US.IBM-1047",      /* EBCDIC */
        *CA = "En_CA.IBM-1047",
    #else
        *US = "En_US.UTF-8.xplink",  /* ASCII */
        *CA = "En_GB.UTF-8.xplink",
    #endif
    #elif defined( __sun )
        *US = "en_US.ISO8859-1",
        *CA = "en_CA.ISO8859-1",
    #else
        *US = "en_US.utf8",
        *CA = "en_CA.utf8",
    #endif
        *C = "C";

gil
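[Editorial sketch, not part of gil's test case.] A locale name such as those in the preamble above would typically be passed to setlocale() before calling strcoll(). The helper below is a minimal illustration under that assumption; the name collate_sign and the LOCALE_UNAVAILABLE code are hypothetical, and the actual test cases mentioned in this thread are not shown here.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

#define LOCALE_UNAVAILABLE 2   /* hypothetical return code for this sketch */

/* Compare a and b under the LC_COLLATE rules of the named locale.
 * Returns -1, 0 or 1, or LOCALE_UNAVAILABLE if setlocale() rejects
 * the name (locale names differ per platform). */
int collate_sign(const char *locale, const char *a, const char *b)
{
    /* Save the current setting; the string returned by setlocale()
     * may be overwritten by later calls, so copy it. */
    const char *current = setlocale(LC_COLLATE, NULL);
    char *saved = NULL;
    int sign = LOCALE_UNAVAILABLE;

    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    if (setlocale(LC_COLLATE, locale) != NULL) {
        int r = strcoll(a, b);
        sign = (r > 0) - (r < 0);   /* normalise to -1, 0, 1 */
    }

    /* Restore the caller's collation locale. */
    if (saved != NULL) {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return sign;
}
```

With the "C" locale strcoll() behaves like strcmp(), so collate_sign("C", "B", "a") is negative on an ASCII platform. What it returns for the US or CA names above depends on the platform's locale definitions, which is the point of contention in this thread.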
Re: Sortlessness?
Peter, The URL you cite is worthy, but it is a small proper subset of the NLTC materials. I found myself disagreeing with some of it as I read it, but that is no bad thing. John Gilmore, Ashland, MA 01721 - USA
Re: Sortlessness?
John, Does this set of IBM "globalization guidelines" web pages match any part(s) of the NLTC design guide you mentioned? http://www-01.ibm.com/software/globalization/guidelines/outline.html Just curious if what I found there matches what you have. Peter -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of John Gilmore Sent: Wednesday, September 04, 2013 5:07 PM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Sortlessness? Googling turns up the fact that the fundamental paper by René Haentjens, "Ordering universal character strings", is available at http://www.hpl.hp.com/hpjournal/dtj/vol5num3/vol5num3art4.txt Tony H was right to mention the work of the National Language Technical Center at IBM's Toronto Laboratory. (It is|was actually in North York, Ontario, a Toronto suburb.) I cherish my copies of its multivolume National Language Design Guide, and anyone who can find copies of them on the net should download them. Why the NLTC was killed off, notionally by IBM Canada, is unlikely ever to be fully understood. No one outside IBM is in any position to speculate about such things, and those inside it all have their own organizational political imperatives to defend. What is clear in the record is that it was a centre of excellence. Nowhere else, for example, have I seen other cogent treatments of the problems of treating cyrillic text embedded in roman text, roman text embedded in arabic text, and the like.
Re: Sortlessness?
On 1 September 2013 00:51, Paul Gilmartin wrote: > On 2013-08-31, at 08:55, John Gilmore wrote: >> >> ... They use data transformations to make it possible >> for two keys to be compared using a single CLC[L]. (DB2 does similar >> things too.) >> > This can be particularly complex for literary collating conventions > such as EN_US which DFSORT gets terribly wrong. I tried a PMR on > this a few years ago. When I reported that DFSORT and a C program > using strcoll() produce similar incorrect results, DFSORT and I > agreed that the problem should belong to LE. > > LE gave me WAD with a rationale so outrageous that I gave up in > disgust, making no effort to escalate. Isn't it a POSIX violation to produce incorrect collation results for a locale? Not, I suppose, that that's stopped them before. It's a shame because IBM was in the forefront of getting this collation stuff right, and into the POSIX standards. See the early Redbook GG24-3516 Keys to Sort and Search for Culturally Expected Results, and much subsequent work from IBM's long gone National Language Technical Center. Tony H.
Re: Sortlessness?
Paul, >> ... so outrageous that I gave up in disgust Reflects my feeling in some of the arguments I had lately about PMRs (which had to be opened to begin with, and the reasons they were closed later). -- Martin Pi_cap_CPU - all you ever need around MWLC/SCRT/CMT in z/VSE more at http://www.picapcpu.de
Re: Sortlessness?
On 2013-08-31, at 08:55, John Gilmore wrote: > > ... They use data transformations to make it possible > for two keys to be compared using a single CLC[L]. (DB2 does similar > things too.) > This can be particularly complex for literary collating conventions such as EN_US which DFSORT gets terribly wrong. I tried a PMR on this a few years ago. When I reported that DFSORT and a C program using strcoll() produce similar incorrect results, DFSORT and I agreed that the problem should belong to LE. LE gave me WAD with a rationale so outrageous that I gave up in disgust, making no effort to escalate. "Never argue with an idiot. They will only bring you down to their level and beat you with experience" -- diverse attributions, from George Carlin to Mark Twain to far older. If anyone else cares to take up the banner, I'll gladly donate my test cases, both DFSORT and C; a mere few dozen lines each. -- gil
Re: Sortlessness?
zMan ... /me notes that his quip about using a card sorter has, as usual, caused rampant pedanticism and topic drift, wonders why he bothers... No idea how many will search the list archives once the topic has finally been beaten to death. Hopefully they will find one or more posts with complete, accurate and comprehensible information. That's why I bothered posting on this trivial subject. Andreas Geissbuehler
Re: Sortlessness?
On 2013-08-30 16:04, zMan wrote: If you need one, there's always http://www.ebay.com/itm/IBM-Model-83-Punch-Card-Sorter-/300954726197?pt=US_Vintage_Computers_Mainframes&hash=item46124caf35 At 16:28 -0600 on 08/30/2013, Paul Gilmartin wrote about Re: Sortlessness?: And it operates in time linear with respect to the size of the input data set, implying that for a sufficiently large input data set it will outperform most competing technologies. -- gil Robert A. Rosenberg added: I think that should be "it operates in time linear with respect to the size of the input data set TIMES THE LENGTH OF THE SORT FIELD". IOW: The actual time (ignoring the time it takes to collect the 12 stacks of cards and putting them back into the feed tray for sorting on the next column) is the same as a single column/pass sort of a deck whose size is X times as large (where X is the number of columns you are sorting on). FWIW...alpha-numeric columns need 2 passes through the sorter !! I think it should be "it operates in time linear with respect to the NUMBER OF CARDS TIMES the SUM of the sort field columns PLUS the number of alpha-numeric columns in the sort field. The latter includes the number of numeric field columns with +/- sign." E.g. a request for some report sorted on: cc.71-72 Prov/State and cc.11-16 Date MMDDYY requires *10* passes through the sorter in this order: sort N on cc. 14, 13, 12, 11, 16, 15, 72 sort Z on cc. 72 sort N on cc. 71 sort Z on cc. 71 Say 20,000 cards / 1000 cpm = 20 min / pass, 3:20 hrs total excluding card jams, human errors, ... Andreas Geissbuehler
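[Editorial sketch.] The pass-count rule in the message above (one pass per sort column, plus one extra pass per alphanumeric column) and the timing estimate can be written out as follows. The function names are hypothetical, and the model deliberately ignores card handling, jams and human error, as the message does.

```c
/* Passes through the sorter: one per numeric column, two per
 * alphanumeric column (a numeric pass and a zone pass). */
unsigned sorter_passes(unsigned numeric_cols, unsigned alpha_cols)
{
    return numeric_cols + 2u * alpha_cols;
}

/* Feed time in whole minutes for the given deck size, feed rate
 * (cards per minute) and number of passes. Returns 0 if the feed
 * rate is 0 rather than dividing by zero. */
unsigned sorter_minutes(unsigned cards, unsigned cards_per_minute,
                        unsigned passes)
{
    if (cards_per_minute == 0)
        return 0;
    return passes * cards / cards_per_minute;
}
```

For the example above, the six date columns are numeric and the two Prov/State columns are alphanumeric, so sorter_passes(6, 2) gives 10, and sorter_minutes(20000, 1000, 10) gives 200 minutes, i.e. 3:20 hrs.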
Re: Sortlessness?
/me notes that his quip about using a card sorter has, as usual, caused rampant pedanticism and topic drift, wonders why he bothers... On Sat, Aug 31, 2013 at 10:55 AM, John Gilmore wrote: > There is a qualitative difference between modern sorting technology > and the very simple 'logical' or lexicographic sorting operations > performed by a card sorter, one that Chris.Baicher properly emphasized > in an earlier post. They use data transformations to make it possible > for two keys to be compared using a single CLC[L]. (DB2 does similar > things too.) > > A single example will suffice here. The four signed binary-integer > storage formats and the nine floating-point storage formats all use > a high-order sign bit, 0b for non-negative or 1b for > negative. The single-byte signed representation of -128 is thus > 10000000b and that of +127 is 01111111b. Lexicographically, the 2C > representation of -128 is greater than the 2C representation of +127. > This inconvenience can be dealt with in a constructed key that > concatenates 'mixed' data-type sort fields in at least two ways, e.g., > by complementing the high-order, leftmost bits of such quantities. > > Operations of this kind are well beyond the scope of card sorters. > Sorts have become black boxes. Few of their users know or care much > about what goes on inside them, and this is a pity because they embody > a lot of not at all obvious technology that is of considerable > interest. Much of it is or, better, would be useful elsewhere too. > > John Gilmore, Ashland, MA 01721 - USA > -- zMan -- "I've got a mainframe and I'm not afraid to use it"
Re: Sortlessness?
There is a qualitative difference between modern sorting technology and the very simple 'logical' or lexicographic sorting operations performed by a card sorter, one that Chris.Baicher properly emphasized in an earlier post. They use data transformations to make it possible for two keys to be compared using a single CLC[L]. (DB2 does similar things too.) A single example will suffice here. The four signed binary-integer storage formats and the nine floating-point storage formats all use a high-order sign bit, 0b for non-negative or 1b for negative. The single-byte signed representation of -128 is thus 10000000b and that of +127 is 01111111b. Lexicographically, the 2C representation of -128 is greater than the 2C representation of +127. This inconvenience can be dealt with in a constructed key that concatenates 'mixed' data-type sort fields in at least two ways, e.g., by complementing the high-order, leftmost bits of such quantities. Operations of this kind are well beyond the scope of card sorters. Sorts have become black boxes. Few of their users know or care much about what goes on inside them, and this is a pity because they embody a lot of not at all obvious technology that is of considerable interest. Much of it is or, better, would be useful elsewhere too. John Gilmore, Ashland, MA 01721 - USA
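[Editorial sketch.] The sign-bit complement described above can be illustrated in C, with memcmp() standing in for a single CLC. This is only an illustration for 32-bit twos-complement integers, not a description of what any sort product actually does; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build a 4-byte big-endian key from a signed 32-bit integer so that
 * an unsigned byte-wise comparison of two keys agrees with the
 * numeric order of the original values. */
void make_sort_key(int32_t value, unsigned char key[4])
{
    /* Complement the high-order (sign) bit: negative values now start
     * with a 0 bit and non-negative values with a 1 bit. */
    uint32_t u = (uint32_t)value ^ UINT32_C(0x80000000);

    key[0] = (unsigned char)(u >> 24);
    key[1] = (unsigned char)(u >> 16);
    key[2] = (unsigned char)(u >> 8);
    key[3] = (unsigned char)u;
}

/* Compare two integers through their constructed keys. */
int compare_as_keys(int32_t a, int32_t b)
{
    unsigned char ka[4], kb[4];

    make_sort_key(a, ka);
    make_sort_key(b, kb);
    return memcmp(ka, kb, sizeof ka);
}
```

Without the XOR, the key for -128 would begin with X'FF' and compare high against the key for +127; with it, compare_as_keys(-128, 127) is negative, as wanted.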
Re: Sortlessness?
On 8/31/2013 9:42 AM, Blaicher, Christopher Y. wrote: Consider that a fast card sorter could process about 2,000 cards a minute, or even if it could process 20,000 cards a minute, that works out to about 2,666 bytes a second for the 2,000 card case or 26,666 bytes a second for the fictional 20,000 card case. It's worse than that, since the sorter requires either one (numeric only) or two (alphanumeric) passes per sort column. Luckily my only use was confined to short numeric fields. Gerhard Postpischil Bradford, Vermont
Re: Sortlessness?
I find the comment "for a sufficiently large input data set it will outperform most competing technologies" most interesting, and dated. It beat competing technologies of the day, but not of today. Consider that a fast card sorter could process about 2,000 cards a minute, or even if it could process 20,000 cards a minute, that works out to about 2,666 bytes a second for the 2,000 card case or 26,666 bytes a second for the fictional 20,000 card case. Syncsort MFX typically will process over 100,000,000 bytes per second, depending on the input and output devices as they tend to be the limiting factors. CPU time may not be linear based on a number of factors, but I can buy more processors, I cannot buy more wall clock time. The card sorter was the speed demon of its day, but it was replaced for a reason. Chris Blaicher Principal Software Engineer, Software Development Syncsort Incorporated 50 Tice Boulevard, Woodcliff Lake, NJ 07677 P: 201-930-8260 | M: 512-627-3803 E: cblaic...@syncsort.com -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Paul Gilmartin Sent: Friday, August 30, 2013 5:29 PM To: MVS List Server 2 Subject: Re: Sortlessness? On 2013-08-30 16:04, zMan wrote: > If you need one, there's always > http://www.ebay.com/itm/IBM-Model-83-Punch-Card-Sorter-/300954726197?pt=US_Vintage_Computers_Mainframes&hash=item46124caf35 > And it operates in time linear with respect to the size of the input data set, implying that for a sufficiently large input data set it will outperform most competing technologies. -- gil
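[Editorial sketch.] The throughput figures in the message above follow from treating each card as 80 bytes, which is an assumption on my part (the standard 80-column card); the function name is hypothetical.

```c
/* Bytes per second read by a sorter feeding 80-column cards at the
 * given rate, counting every column of every card as one byte. */
double sorter_bytes_per_second(double cards_per_minute)
{
    const double bytes_per_card = 80.0;

    return cards_per_minute * bytes_per_card / 60.0;
}
```

2,000 cards a minute works out to 160,000 bytes a minute, or about 2,666 bytes a second; ten times the feed rate gives about 26,666. Both are per pass, before the per-column passes discussed elsewhere in the thread are taken into account.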
Re: Sortlessness?
At 16:28 -0600 on 08/30/2013, Paul Gilmartin wrote about Re: Sortlessness?: On 2013-08-30 16:04, zMan wrote: If you need one, there's always http://www.ebay.com/itm/IBM-Model-83-Punch-Card-Sorter-/300954726197?pt=US_Vintage_Computers_Mainframes&hash=item46124caf35 And it operates in time linear with respect to the size of the input data set, implying that for a sufficiently large input data set it will outperform most competing technologies. -- gil I think that should be "it operates in time linear with respect to the size of the input data set TIMES THE LENGTH OF THE SORT FIELD". IOW: The actual time (ignoring the time it takes to collect the 12 stacks of cards and putting them back into the feed tray for sorting on the next column) is the same as a single column/pass sort of a deck whose size is X times as large (where X is the number of columns you are sorting on).
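[Editorial sketch.] The "size of the data set times the length of the sort field" behaviour described above is that of a least-significant-column-first radix sort. The C model below makes one stable distribution pass per column, rightmost column first. It is a simplification: it distributes on the full byte value in a single pass, whereas a real sorter has 12 pockets and needs two passes for alphanumeric columns. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define CARD_LEN 80

/* One stable pass over the deck on a single column (0-based):
 * a counting sort on that column's byte value. */
static void sort_pass(char (*cards)[CARD_LEN], char (*tmp)[CARD_LEN],
                      size_t n, size_t col)
{
    size_t start[257] = {0};
    size_t i, v;

    for (i = 0; i < n; i++)                 /* count cards per pocket */
        start[(unsigned char)cards[i][col] + 1]++;
    for (v = 0; v < 256; v++)               /* pocket start positions */
        start[v + 1] += start[v];
    for (i = 0; i < n; i++)                 /* distribute, keeping order */
        memcpy(tmp[start[(unsigned char)cards[i][col]]++],
               cards[i], CARD_LEN);
    memcpy(cards, tmp, n * CARD_LEN);       /* restack the deck */
}

/* Sort the deck on columns first..last (0-based, inclusive), working
 * from the rightmost column to the leftmost. Returns the number of
 * passes made, or 0 if working storage could not be obtained. */
size_t sorter_sort(char (*cards)[CARD_LEN], size_t n,
                   size_t first, size_t last)
{
    char (*tmp)[CARD_LEN] = malloc(n * sizeof *tmp);
    size_t passes = 0;
    size_t col;

    if (tmp == NULL)
        return 0;
    for (col = last + 1; col-- > first; ) {
        sort_pass(cards, tmp, n, col);
        passes++;
    }
    free(tmp);
    return passes;
}
```

Each pass touches every card once, so the total work is proportional to the number of cards times the number of columns sorted on, which is the correction made above.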
Re: Sortlessness?
As a programmer, I only had to put my deck in the sorter AFTER I dropped it! :) On Fri, Aug 30, 2013 at 2:43 PM, Capps, Joey wrote: > Not since the old days when you dropped your card deck in a physical card > sorter :-) > > -Original Message- > From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] > On Behalf Of Gord Tomlin > Sent: Friday, August 30, 2013 3:33 PM > To: ASSEMBLER-LIST@LISTSERV.UGA.EDU > Subject: Sortlessness? > > Just a little Friday afternoon curiosity...does anyone have, or know of, > any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) > installed? Before anyone asks, no, I am not planning to develop one! > > -- > > Regards, Gord Tomlin > Action Software International > (a division of Mazda Computer Corporation) > Tel: (905) 470-7113, Fax: (905) 470-6507 >
Re: Sortlessness?
On 2013-08-30 14:32, Gord Tomlin wrote: > Just a little Friday afternoon curiosity...does anyone have, or know of, > any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) > installed? Before anyone asks, no, I am not planning to develop one! > Does the POSIX sort command count? I find it very useful. -- gil
Re: Sortlessness?
If you need one, there's always http://www.ebay.com/itm/IBM-Model-83-Punch-Card-Sorter-/300954726197?pt=US_Vintage_Computers_Mainframes&hash=item46124caf35 On Fri, Aug 30, 2013 at 2:54 PM, Paul Gilmartin wrote: > On 2013-08-30 14:32, Gord Tomlin wrote: > > Just a little Friday afternoon curiosity...does anyone have, or know of, > > any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) > > installed? Before anyone asks, no, I am not planning to develop one! > > > Does the POSIX sort command count? I find it very useful. > > -- gil > -- zMan -- "I've got a mainframe and I'm not afraid to use it"
Re: Sortlessness?
On 2013-08-30 16:04, zMan wrote: > If you need one, there's always > http://www.ebay.com/itm/IBM-Model-83-Punch-Card-Sorter-/300954726197?pt=US_Vintage_Computers_Mainframes&hash=item46124caf35 > And it operates in time linear with respect to the size of the input data set, implying that for a sufficiently large input data set it will outperform most competing technologies. -- gil
Re: Sortlessness?
Not since the old days when you dropped your card deck in a physical card sorter :-) -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Gord Tomlin Sent: Friday, August 30, 2013 3:33 PM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Sortlessness? Just a little Friday afternoon curiosity...does anyone have, or know of, any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) installed? Before anyone asks, no, I am not planning to develop one! -- Regards, Gord Tomlin Action Software International (a division of Mazda Computer Corporation) Tel: (905) 470-7113, Fax: (905) 470-6507
Re: Sortlessness?
On 8/30/2013 1:32 PM, Gord Tomlin wrote: Just a little Friday afternoon curiosity...does anyone have, or know of, any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) installed? Before anyone asks, no, I am not planning to develop one! We ran for many years without a commercial sort product. Hard to imagine a production customer could get away with that... -- Edward E Jaffe Phoenix Software International, Inc 831 Parkview Drive North El Segundo, CA 90245 http://www.phoenixsoftware.com/
Sortlessness?
Just a little Friday afternoon curiosity...does anyone have, or know of, any z/OS systems that have *no* sort product (DFSORT, Syncsort, etc.) installed? Before anyone asks, no, I am not planning to develop one! -- Regards, Gord Tomlin Action Software International (a division of Mazda Computer Corporation) Tel: (905) 470-7113, Fax: (905) 470-6507