Re: Why sort (was Microprocessor Optimization Primer)
David, I haven't seen benchmarks yet, just a bunch of wall-time comparisons. We have terabytes of memory on our z13 that hasn't been deployed yet. I'm cynical, but our guys will have to start running interference.

On Thu, Apr 7, 2016 at 9:46 PM, David Crayford wrote:
> On 7/04/2016 6:59 PM, Wayne Bickerdike wrote:
>> Of greater concern is the implication that Oracle on AIX outperforms DB2
>> on z/OS at our shop. Surely not :(
>
> Do you have real workload benchmarks that prove it, Wayne?
Re: Why sort (was Microprocessor Optimization Primer)
An excellent synopsis of mainframe history. It follows that most mature shops use SORT extensively because, until recently, the platform pretty much required it for reasonable performance as measured by wall clock. One could argue--maybe even prove--that today's DASD allows more random updating than in the days of yore, but a mature shop that has orchestrated batch around sorting would find it a hard sell to convince business units (i.e. paying customers) to reengineer massive production processes just because it's possible.

We explored TVS (Transactional VSAM) in ESP some years ago. As wonderful as it sounded--and probably was--the target applications folks balked at having to redesign their update programs because the processing logic is totally different. Unfortunately, I think they moved off of the mainframe instead. ;-((

J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-302-7535 Office
robin...@sce.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Joel C. Ewing
Sent: Wednesday, April 06, 2016 9:59 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: (External):Re: Why sort (was Microprocessor Optimization Primer)
[snip]
Re: Why sort (was Microprocessor Optimization Primer)
...hey Wayne.

Mitch Mccluhan
mitc...@aol.com

On Thursday, April 7, 2016 Wayne Bickerdike wrote:
> I'm slightly gobsmacked that this discussion is needed. I guess the forest
> is lost in the trees.
>
> I can recommend "Principles of Program Design" by Michael Jackson, c. 1975.
>
> Of greater concern is the implication that Oracle on AIX outperforms DB2 on
> z/OS at our shop. Surely not :(
[snip]
Re: Why sort (was Microprocessor Optimization Primer)
On 7/04/2016 7:56 PM, John McKown wrote:
> On Thu, Apr 7, 2016 at 5:59 AM, Wayne Bickerdike wrote:
>> Of greater concern is the implication that Oracle on AIX outperforms DB2 on
>> z/OS at our shop. Surely not :(
>
> Without knowing the hardware & software setups, I can believe this. Why?
> Because our distributed systems people "proved" that a Sun running Solaris
> was faster than z/Linux on a z. Of course, they were comparing a _dedicated_
> Sun server (don't know the exact model) with 10 CPs and 10 GB of memory to
> z/Linux running on a z890 with 2 GB of memory and a single IFL. Kind of like
> proving that a Chevy is better than a Mazda by comparing a Corvette's
> performance on a race track to my Mazda 3 on the same race track.

So that Sun setup is basically what you get with a $5000 x86 blade!

--
Wayne V. Bickerdike

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Why sort (was Microprocessor Optimization Primer)
On Thu, Apr 7, 2016 at 5:59 AM, Wayne Bickerdike wrote:
> I'm slightly gobsmacked that this discussion is needed. I guess the forest
> is lost in the trees.
>
> I can recommend "Principles of Program Design" by Michael Jackson, c. 1975.
>
> Of greater concern is the implication that Oracle on AIX outperforms DB2 on
> z/OS at our shop. Surely not :(

Without knowing the hardware & software setups, I can believe this. Why? Because our distributed systems people "proved" that a Sun running Solaris was faster than z/Linux on a z. Of course, they were comparing a _dedicated_ Sun server (don't know the exact model) with 10 CPs and 10 GB of memory to z/Linux running on a z890 with 2 GB of memory and a single IFL. Kind of like proving that a Chevy is better than a Mazda by comparing a Corvette's performance on a race track to my Mazda 3 on the same race track.

--
How many surrealists does it take to screw in a lightbulb? One to hold the giraffe and one to fill the bathtub with brightly colored power tools.

Maranatha! <><
John McKown
Re: Why sort (was Microprocessor Optimization Primer)
On 7/04/2016 6:59 PM, Wayne Bickerdike wrote:
> I'm slightly gobsmacked that this discussion is needed. I guess the forest
> is lost in the trees.
>
> I can recommend "Principles of Program Design" by Michael Jackson, c. 1975.
>
> Of greater concern is the implication that Oracle on AIX outperforms DB2 on
> z/OS at our shop. Surely not :(

Do you have real workload benchmarks that prove it, Wayne?
Re: Why sort (was Microprocessor Optimization Primer)
I'm slightly gobsmacked that this discussion is needed. I guess the forest is lost in the trees.

I can recommend "Principles of Program Design" by Michael Jackson, c. 1975.

Of greater concern is the implication that Oracle on AIX outperforms DB2 on z/OS at our shop. Surely not :(

On Thu, Apr 7, 2016 at 2:59 PM, Joel C. Ewing wrote:
> I believe others have already alluded to the potential time advantage of
> processing a large number of updates in key order rather than randomly
> when external data is indexed but actually physically ordered by some key.
[snip]
Re: Why sort (was Microprocessor Optimization Primer)
On 04/06/2016 07:01 AM, Andrew Rowley wrote:
> On 05/04/2016 01:20 AM, Tom Marchant wrote:
>> On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote:
>>
>>> A Hashmap potentially allows you to read sequentially and match records
>>> between files, without caring about the order.
>>
>> Can you please explain what you mean by this? Are you talking about using
>> the hashmap to determine which record to read next, and so to read the
>> records in an order that is logically sequential, but physically random?
>> If so, that is not at all like reading the records sequentially.
>
> If one file fits in memory, you can read it sequentially into a Hashmap
> using the data you want to match as the key. Then read the second one,
> also sequentially, retrieving matching records from the Hashmap by key.
> You can also remove them from the Hashmap as they are found if you need
> to know if any are unmatched.
>
> But this is a solution for a made up case - I don't know whether it is a
> common situation. I was interested in hearing real reasons why sort is
> so common on z/OS i.e. Why sort?
>
> On Hashmaps etc. in general - they are the memory equivalent to indexed
> datasets (VSAM etc) versus sequential datasets. Their availability opens
> up many new ways to process data - and algorithm changes are often where
> the big savings can be made.

I believe others have already alluded to the potential time advantage of processing a large number of updates in key order rather than randomly when external data is indexed but actually physically ordered by some key. The reason why this has historically been the case is that external disk storage devices which allow random access have rotational-latency delay and access-head-positioning delay, which are minimized when doing full-track or even multi-track I/O and when accessing adjacent cylinders.
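Andrew's two-pass hashmap match can be sketched in a few lines. (Python here for brevity, with a dict standing in for a Java HashMap; the record layout and sample data are invented for illustration.)

```python
# Match records between two files without sorting either one:
# load the smaller file into a hash map keyed on the match field,
# then stream the larger file against it. "key,data" records assumed.

def hash_match(small_lines, large_lines):
    # Pass 1: read the smaller file sequentially into a map.
    lookup = {}
    for line in small_lines:
        key, _, rest = line.partition(",")
        lookup[key] = rest
    # Pass 2: read the larger file sequentially, matching by key.
    matched = []
    for line in large_lines:
        key, _, rest = line.partition(",")
        if key in lookup:
            # pop() removes the entry, so leftovers are the unmatched set
            matched.append((key, lookup.pop(key), rest))
    return matched, lookup  # lookup now holds only unmatched records

master = ["A01,apple", "B02,banana", "C03,cherry"]
trans = ["B02,sale", "A01,refund"]
matched, unmatched = hash_match(master, trans)
print(matched)    # [('B02', 'banana', 'sale'), ('A01', 'apple', 'refund')]
print(unmatched)  # {'C03': 'cherry'}
```

Note that neither input is in key order; the map absorbs the ordering problem, at the cost of holding one file in memory.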
The way to update the data in minimal real time is to do the I/O in minimal disk rotations, accessing all data needed on the same track in one rotation and all data in one cylinder before moving to an adjacent cylinder. Crucial to this concept is understanding that z/OS includes support within its I/O access methods which allows applications to exploit the ability of DASD hardware to transfer one, several, or all data blocks on a track as a single operation within a single disk revolution.

With emulated DASD and hardware DASD caching, the effects of physical track and cylinder boundaries may be unknown, but it is still likely that minimizing repeated visits to an emulated track or an emulated cylinder will achieve similar locality of reference on physical DASD, reduce latency delays, and improve the effectiveness of hardware caching. Processing transaction records in the same order as the database records are physically stored on an external file gives the best odds of grouping together transactions needing the same track and cylinder, minimizing I/O delays, and minimizing demands on DASD cache storage and on processor storage for file buffers. Processing transactions in a different order increases the likelihood that the file data needed to process a transaction is no longer in processor memory or disk cache, and that at a minimum the time equivalent of another disk revolution will be required to obtain it.

It was not uncommon with VSAM files for transaction sorting to improve real-time processing speed sufficiently that the break-even point, even with sorting overhead, could be as low as updating only 5% of the database. These techniques were common in MVS and its z/OS successor applications because it was common for those systems to deal with very large files and databases, where tricks like this were necessary in order to meet constrained nightly batch processing windows.
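The locality argument above can be made concrete with a toy model: if records live ten to an (emulated) track in key order, applying updates in key order visits each track at most once, while random order forces a track change on almost every update. (The track size and key range here are arbitrary, chosen only to illustrate the effect.)

```python
import random

RECORDS_PER_TRACK = 10  # toy figure; real geometry varies by device

def track_switches(keys):
    # Count how many times processing would have to move to a different
    # track, assuming records are stored in key order, ten per track.
    switches, current = 0, None
    for k in keys:
        track = k // RECORDS_PER_TRACK
        if track != current:
            switches += 1
            current = track
    return switches

random.seed(1)
updates = random.sample(range(1000), 300)  # update 30% of a 1000-record file

print(track_switches(sorted(updates)))  # at most 100: each track visited once
print(track_switches(updates))          # typically close to 300
```

The sorted run is bounded by the number of tracks touched; the unsorted run approaches one revisit per update, which is exactly the "another disk revolution" penalty described above.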
Since it is common in z/OS to be dealing with very large files and databases, there are always files in those environments that are too large to consider placing the entire file in memory, no matter how large processor memory becomes.

Hash maps are not really equivalent to VSAM data sets, because a VSAM file is not just indexed but indexed-sequential, which means once you have successfully stored records in the file, reading the records in key order from a VSAM file is just a trivial sequential read. A hash map makes it trivial to find a record with a given key, but if you also need to access the records in key order, a sort of the keys is still required. I have applications that have used hash tables in exactly that way, doing a tag-sort of the keys after the fact to allow ordered access, but that is not a feature inherent in hash-mapped records like it is with a VSAM data set. While, as you point out, it is possible to process a transaction file against a database file without either being sorted, by reading records from one file (presumably the smaller one) into a hash map memory table and then processing the other file and searching the hash table for records with
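Joel's tag-sort point -- a hash map gives fast lookup but no key order, so ordered access needs a separate sort over the keys -- looks like this in practice (Python dict standing in for a HashMap, invented data):

```python
# A hash map makes lookup by key trivial, but the keys come back in no
# useful key order; to read records in key sequence you must first
# "tag-sort" the keys, then walk the map in that order. A VSAM KSDS
# gives you the ordered walk for free; a hash map does not.
records = {"C03": "cherry", "A01": "apple", "B02": "banana"}

print(records["B02"])  # direct lookup, no order needed: banana

for key in sorted(records):    # the tag sort: order the keys, not the records
    print(key, records[key])   # A01 apple / B02 banana / C03 cherry
```

The sort touches only the keys (tags), not the records themselves, which is what keeps this cheap when records are large.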
Re: Why sort (was Microprocessor Optimization Primer)
On Wed, Apr 6, 2016 at 7:01 AM, Andrew Rowley wrote: > On 05/04/2016 01:20 AM, Tom Marchant wrote: > >> On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote: >> >> A Hashmap potentially allows you to read sequentially and match records >>> between files, without caring about the order. >>> >> Can you please explain what you mean by this? Are you talking about using >> the hashmap to determine which record to read next, and so to read the >> records in an order that is logically sequential, but physically random? >> If so, >> that is not at all like reading the records sequentially. >> >> > If one file fits in memory, you can read it sequentially into a Hashmap > using the data you want to match as the key. > Then read the second one, also sequentially, retrieving matching records > from the Hashmap by key. You can also remove them from the Hashmap as they > are found if you need to know if any are unmatched. > > But this is a solution for a made up case - I don't know whether it is a > common situation. I was interested in hearing real reasons why sort is so > common on z/OS i.e. Why sort? > Not meaning to sound silly, but I fear the main reason may be the good old: "We've always done it that way". And, since most of the in-house software written on z/OS is in some version of COBOL, there is no other real choice because COBOL does not have anything like a content-addressable "array" built into the language. IMO, a major deficiency in IBM's COBOL, and maybe other vendors' COBOLs, is that it does not come with a great library of functionality. It is simple to do things in Java, Perl, PHP, Python, and Go because of the huge amount of support in the libraries. COBOL basically has the barest of native data types. And basically only has integer indexed arrays and structures as ways to "group" things together. Also, COBOL has pretty much the barest of run time routines. And the only invocation of anything in a library is via the CALL verb.
I guess that it's sad that the object-oriented portion of the latest COBOL compilers seems to be ignored. So, why not migrate away from COBOL to a more advanced language? Many places are doing so for new work or development (or going to a non-z platform). Also, do you really need to buffer up everything in a Hashmap if your data resides in a relational database? It is generally much better to let the RDBMS do most of the work. And it will buffer up the active data, not only from your program but every program which is accessing the data. In this case, doing a SORT could possibly be unnecessary. Or you may need to do a SORT if you are writing a report sorted by a value created in the program itself. Do you really want to use a Hashmap to store the unsorted electricity bills for Los Angeles, and then, at the end, read & write said bills by reading the Hashmap by key? This sort of thing goes on a _lot_ on z/OS. Just my take on it. I'm not against using something other than SORT if I think it will work well. But SORT (DFSORT & Syncsort) are extremely fast and efficient. So if I need something done which they can do, then I think it is best to use them rather than code something up myself, in any language. > > On Hashmaps etc. in general - they are the memory equivalent to indexed > datasets (VSAM etc) versus sequential datasets. Their availability opens up > many new ways to process data - and algorithm changes are often where the > big savings can be made. > > -- How many surrealists does it take to screw in a lightbulb? One to hold the giraffe and one to fill the bathtub with brightly colored power tools. Maranatha! <>< John McKown -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Why sort (was Microprocessor Optimization Primer)
On 05/04/2016 01:20 AM, Tom Marchant wrote: On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote: A Hashmap potentially allows you to read sequentially and match records between files, without caring about the order. Can you please explain what you mean by this? Are you talking about using the hashmap to determine which record to read next, and so to read the records in an order that is logically sequential, but physically random? If so, that is not at all like reading the records sequentially. If one file fits in memory, you can read it sequentially into a Hashmap using the data you want to match as the key. Then read the second one, also sequentially, retrieving matching records from the Hashmap by key. You can also remove them from the Hashmap as they are found if you need to know if any are unmatched. But this is a solution for a made up case - I don't know whether it is a common situation. I was interested in hearing real reasons why sort is so common on z/OS i.e. Why sort? On Hashmaps etc. in general - they are the memory equivalent to indexed datasets (VSAM etc) versus sequential datasets. Their availability opens up many new ways to process data - and algorithm changes are often where the big savings can be made.
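The two-file match Andrew describes can be sketched in a few lines of Java. This is a minimal illustration, not production code: the "files" are simulated with lists of {key, payload} pairs, and the record layout is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the HashMap match: load the smaller file into a map keyed on
// the match field, then stream the second file and probe by key.
public class HashJoin {
    static List<String> match(List<String[]> small, List<String[]> large) {
        // Pass 1: read the smaller "file" sequentially into a HashMap.
        Map<String, String> byKey = new HashMap<>();
        for (String[] rec : small) byKey.put(rec[0], rec[1]);

        // Pass 2: read the larger "file" sequentially, probing by key.
        List<String> matched = new ArrayList<>();
        for (String[] rec : large) {
            String other = byKey.remove(rec[0]); // remove() so leftovers = unmatched
            if (other != null) matched.add(rec[0] + ":" + other + "+" + rec[1]);
        }
        // Anything still in byKey had no partner in the large file.
        return matched;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
            new String[]{"K1", "left1"}, new String[]{"K2", "left2"});
        List<String[]> large = Arrays.asList(
            new String[]{"K2", "right2"}, new String[]{"K9", "right9"});
        System.out.println(match(small, large)); // only K2 has a partner
    }
}
```

Neither input needs to be in any particular order; removing matched entries also gives the unmatched set for free, as described above.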
Re: Why sort (was Microprocessor Optimization Primer)
On Mon, 4 Apr 2016 16:45:37 +1000, Andrew Rowley wrote: >A Hashmap potentially allows you to read sequentially and match records >between files, without caring about the order. Can you please explain what you mean by this? Are you talking about using the hashmap to determine which record to read next, and so to read the records in an order that is logically sequential, but physically random? If so, that is not at all like reading the records sequentially. -- Tom Marchant
Re: Why sort (was Microprocessor Optimization Primer)
On 4/04/2016 11:25, David Betten wrote: First the idea of loading all the data into a large hashmap to do the sort tends to eliminate one very important thing and that's overlap. Essentially, you read the entire input, conduct your massive hashsort, and then write the output with no overlap of those three phases. The approach I prefer is an iterative process of sorting smaller amounts and writing them to work files (either on disk or in memory) and then at end of input, you almost immediately begin the output process of merging those sorted strings. This technique is very efficient and I can tell you many z/OS customers are sorting tens to hundreds of gigabytes of data this way. I wasn't actually suggesting sorting using a Hashmap, or that Java sort was more efficient than DFSORT (although the overhead of transferring data between Java<->DFSORT might make Java sort preferable when the data is already in Java). I was more wondering whether collection classes like Hashmap could avoid the need to sort the data altogether, at which point the efficiency becomes moot. One common example given for sorting of data is to do grouping and totals, which can easily be implemented using a Hashmap with unordered data. Second point I'd like to make also is related to overlap. Sorting the files allows downstream process to read them sequentially rather than random gets from say VSAM or a data base. When you read or write sequentially, you have opportunities for I/O overlap along with blocking and chaining. So you can be reading the next set of data while your program is processing the previous set of data. This results in considerable elapsed time savings and reduction in I/O overhead since more data is transferred with each I/O. This is more what I had in mind - other reasons for sorting data before processing. I can see that VSAM would benefit from reading in order. 
I'm not so sure that a database like DB2 stores data in order - DB2 might be fastest if you don't specify a sort order and just take it as it comes from the database. There's also the question of whether you save enough CPU and I/O to make up for the cost of the sort. A Hashmap potentially allows you to read sequentially and match records between files, without caring about the order. This doesn't really relate to the work I am doing. It was just speculation about whether Java etc. on z/OS provided opportunity to reduce CPU by implementing better algorithms, prompted by the comment about the amount of batch DFSORT people run. -- Andrew Rowley Black Hill Software +61 413 302 386
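The grouping-and-totals case mentioned above can be sketched with `Map.merge()`, which accumulates per-key totals regardless of input order, so no sort is needed at all. Keys and amounts here are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: compute totals per group from unordered records.
// A traditional batch approach would sort by key, then use control breaks;
// a HashMap makes the order irrelevant.
public class GroupTotals {
    static Map<String, Long> totals(List<Map.Entry<String, Long>> records) {
        Map<String, Long> sums = new HashMap<>();
        for (Map.Entry<String, Long> r : records)
            sums.merge(r.getKey(), r.getValue(), Long::sum); // add to running total
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> recs = Arrays.asList(
            Map.entry("EAST", 10L), Map.entry("WEST", 5L), Map.entry("EAST", 7L));
        System.out.println(totals(recs)); // EAST totals 17, WEST totals 5
    }
}
```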
Re: Why sort (was Microprocessor Optimization Primer)
On 2016-04-03, at 19:25, David Betten wrote: > First of all, full disclaimer that I was in DFSORT development for about 8 > years so I might be biased. But I just want to share a few thoughts. > > First the idea of loading all the data into a large hashmap to do the sort > tends to eliminate one very important thing and that's overlap. > Essentially, you read the entire input, conduct your massive hashsort, and > then write the output with no overlap of those three phases. ... > Strawman. Or red herring. Or some metaphor. You seem to have deliberately made an adverse choice so you can refute it. Rather than hash, use a B-tree so sorting fully overlaps input. One might argue that given sufficient page data space any sort could be performed in virtual storage. I suspect performance would be suboptimal. I suspect that for a large enough data set Cooley-Tukey FFT brutally defies LoR. But some of the operations in C-T are hauntingly similar to a balanced merge. Might sorting techniques with workfiles implement a C-T that outperforms a virtual storage implementation? -- gil
Re: Why sort (was Microprocessor Optimization Primer)
On 4/3/2016 6:21 PM, Ed Jaffe wrote: DFSORT, Syncsort, etc. use the CPC/UPT hardware instructions to implement the fastest sort on the platform. Typo. Of course, I meant to write CFC/UPT... :-[ -- Edward E Jaffe Phoenix Software International, Inc 831 Parkview Drive North El Segundo, CA 90245 http://www.phoenixsoftware.com/
Re: Why sort (was Microprocessor Optimization Primer)
On Apr 3, 2016 20:53, "David Crayford" wrote: > > On 4/04/2016 7:41 AM, John McKown wrote: >> >> >> I'm not an application programmer. But I can just imagine the looks of >> astonishment and the "talk", if I were to write a COBOL program which does >> a SORT verb with INPUT PROCEDURE IS and OUTPUT PROCEDURE IS which only did >> a SORT FIELDS=COPY operation. Even more astonishment if I coded the INCLUDE >> or EXCLUDE to subset my data in addition to, or instead of, using COBOL >> code. I don't know if such coding would pass the majority of the "peer >> review" type processes. I'd love to try. Especially if I were smart enough >> to do so initially and keep the output listing. Then allow code review to >> force me to use normal COBOL methods. And then show the differences, >> assuming the SORT method actually is superior. Of course, I'd better know >> my management. I was at one shop (sysprog) where my boss (sysprog + >> manager) did that with a major application that would max the 3083 (long >> ago). Basically he proved it was due to a flawed design. Unfortunately, that >> cost him his job because the design was actually done by the head of >> the company (software development company). >> > > I'm sure the application folks would think you're a crazy, performance-obsessed systems programmer and should go back to your cave! And they'd be right! And they do, sometimes. But, my management would adore it __IF IT COULD BE DONE RELIABLY BY THE REGULAR PROGRAMMERS__. Why? Because more than __anything__ else at present, they want to decrease the cost of I.T. (They consider it a "money pit" and seem to emotionally consider it to be an "unnecessary" expense which is not really related to the core business). So if a technique, if consistently applied, would allow them to reduce the MSU cap, thus reducing our software bill, they want it to be done. I was typing more, but really got way too sarcastic. > FileManager was developed at the IBM APC labs in Perth.
I worked with one of the lead developers on that product and they try to utilize DFSORT as much as possible. > There must be significant man years of work optimizing the I/O in DFSORT. It's sensible to try and leverage that. In the case of Andrew's I/O-bound product he could possibly > significantly accelerate the throughput if he could somehow hook into sort. Is it a big deal that DFSORT doesn't run on a zIIP when most of the workload is I/O bound? > > http://www.ibm.com/support/knowledgecenter/SSXJAV_13.1.0/com.ibm.filemanager.doc_13.1/base/funtips.htm > > LOL! IBM had to write a FASTREXX subset because standard REXX was a dog! >
Re: Why sort (was Microprocessor Optimization Primer)
On 4/04/2016 7:41 AM, John McKown wrote: On Sun, Apr 3, 2016 at 6:00 PM, Andrew Rowley wrote: On 3/04/2016 22:43, David Crayford wrote: Good question! Sort can be utilised for other purposes than sorting, it can be used as an I/O engine. DFSORT (or Syncsort) implements bespoke highly optimized I/O using low-level programming interfaces such as chained EXCPs which are significantly faster than using standard access methods like QSAM or BSAM, including overlapping BSAM I/O. DFSORT has exit routines (callbacks) which get called for each record. Basically it's supercharged I/O. One of our products does just that as do many others. IIRC, IBM FileManager uses sort for I/O. The trouble with using this technique with Java is the JNI/callback overhead. I'm aware of the efficient I/O, but I'm more interested in the use to put data into a particular order. My own programs I never sort input data, frequently sort small subsets of data during processing (likely always too small quantities for something like DFSORT) and almost always sort for presentation. Presentation is hopefully also too small quantities for DFSORT. It is an interesting idea though to use it to read data via the exits without actually giving it back to DFSORT to process. I'm not an application programmer. But I can just imagine the looks of astonishment and the "talk", if I were to write a COBOL program which does a SORT verb with INPUT PROCEDURE IS and OUTPUT PROCEDURE IS which only did a SORT FIELDS=COPY operation. Even more astonishment if I coded the INCLUDE or EXCLUDE to subset my data in addition to, or instead of, using COBOL code. I don't know if such coding would pass the majority of the "peer review" type processes. I'd love to try. Especially if I were smart enough to do so initially and keep the output listing. Then allow code review to force me to use normal COBOL methods. And then show the differences, assuming the SORT method actually is superior. Of course, I'd better know my management.
I was at one shop (sysprog) where my boss (sysprog + manager) did that with a major application that would max the 3083 (long ago). Basically he proved it was due to a flawed design. Unfortunately, that cost him his job because the design was actually done by the head of the company (software development company). I'm sure the application folks would think you're a crazy, performance-obsessed systems programmer and should go back to your cave! FileManager was developed at the IBM APC labs in Perth. I worked with one of the lead developers on that product and they try to utilize DFSORT as much as possible. There must be significant man years of work optimizing the I/O in DFSORT. It's sensible to try and leverage that. In the case of Andrew's I/O-bound product he could possibly significantly accelerate the throughput if he could somehow hook into sort. Is it a big deal that DFSORT doesn't run on a zIIP when most of the workload is I/O bound? http://www.ibm.com/support/knowledgecenter/SSXJAV_13.1.0/com.ibm.filemanager.doc_13.1/base/funtips.htm LOL! IBM had to write a FASTREXX subset because standard REXX was a dog! -- Andrew Rowley Black Hill Software +61 413 302 386
Re: Why sort (was Microprocessor Optimization Primer)
First of all, full disclaimer that I was in DFSORT development for about 8 years so I might be biased. But I just want to share a few thoughts. First the idea of loading all the data into a large hashmap to do the sort tends to eliminate one very important thing and that's overlap. Essentially, you read the entire input, conduct your massive hashsort, and then write the output with no overlap of those three phases. The approach I prefer is an iterative process of sorting smaller amounts and writing them to work files (either on disk or in memory) and then at end of input, you almost immediately begin the output process of merging those sorted strings. This technique is very efficient and I can tell you many z/OS customers are sorting tens to hundreds of gigabytes of data this way. Second point I'd like to make also is related to overlap. Sorting the files allows downstream process to read them sequentially rather than random gets from say VSAM or a data base. When you read or write sequentially, you have opportunities for I/O overlap along with blocking and chaining. So you can be reading the next set of data while your program is processing the previous set of data. This results in considerable elapsed time savings and reduction in I/O overhead since more data is transferred with each I/O. And that's just my 2 cents! Have a nice day, Dave Betten z/OS Performance Specialist Cloud and Systems Performance IBM Corporation email: bet...@us.ibm.com IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> wrote on 04/03/2016 07:28:39 PM: > From: Andrew Rowley <and...@blackhillsoftware.com> > To: IBM-MAIN@LISTSERV.UA.EDU > Date: 04/03/2016 07:32 PM > Subject: Re: Why sort (was Microprocessor Optimization Primer) > Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU> > > The reason I like Java on Z so much is I got used to using Hashtable in > C#, then tried to use Rexx stems to do the same thing. 
(It was semi > successful but I always felt like it was very fragile due to the > potential for unexpected values etc. for the stems.) Then I found Java > had real hash tables. They make so many different problems so much easier. > > A million 1500 byte entries should be about 1.5 GB I think, and I would > expect a hashmap to handle it without difficulty as long as the real > storage was available. But typically a hashtable would hold an object > with the specific items you're interested in rather than the whole 1500 > byte item. > > As for sorting a List of a million 1500 byte items - again I would > expect Java to do this without difficulty as long as real storage is > available. Java is actually pretty efficient at this because you're > actually sorting a list of pointers - you go all over memory to do the > compares, but should be only shuffling 8MB of data in storage if you > have a million 64 bit pointers. I regularly test EasySMF (written in C#) > displaying lists of 1,000,000+ items on the PC. It has column click > sorting, and it copes just fine with 1,000,000+ lists. Sorting a column > takes a few seconds at most on a not particularly fast PC. > > DFSORT seems to be most useful where you need to sort more data than can > be processed in storage - but I'm wondering how often that really needs > to be done. I'm not so interested in utilities and databases calling it > under the covers - more in applications that require records in a > particular order. Nor am I saying that's wrong - I'm really just asking > whether languages like Java provide opportunities to eliminate some sorting. > > On 3/04/2016 22:36, John McKown wrote: > > Sure, but how often do you have a Java HashMap which contains, say, a > > million entries? Oh, and the entries are not something like an "int", but > > more like a C struct where the size of each struct is around 1500 bytes. > > That would require about 1.5 Terabytes of memory. 
Not many systems have > > that much to give you for a single "object". And yes, we _do_ sort such > > monsters. Not often, granted, but we're doing a conversion right now and > > the programmer is doing work on claims which go back 10 years! That's a > > _lot_ of data! And, we don't have _any_ data bases, just VSAM and > > sequential data sets. I've actually used VSAM to do "sorting", by inserting > > records randomly, then reading them back in keyed order. The performance > > was horrible. DB2, or other database system, could be used in such a > > manner to avoid sorting. But I'd bet it would also be horrible. Of course, > > if you're reading an already existing VSAM keyed file, or a database, then > > you're golden. I'd bet most of the data in the non-z/OS world is kept in > > such a manner, as opposed to a regular "file". >
Re: Why sort (was Microprocessor Optimization Primer)
On 4/3/2016 4:28 PM, Andrew Rowley wrote: DFSORT seems to be most useful where you need to sort more data than can be processed in storage - but I'm wondering how often that really needs to be done. I'm not so interested in utilities and databases calling it under the covers - more in applications that require records in a particular order. Nor am I saying that's wrong - I'm really just asking whether languages like Java provide opportunities to eliminate some sorting. DFSORT, Syncsort, etc. use the CPC/UPT hardware instructions to implement the fastest sort on the platform. Are there java methods that also do this? Or do they use relatively inefficient software-based algorithms? -- Edward E Jaffe Phoenix Software International, Inc 831 Parkview Drive North El Segundo, CA 90245 http://www.phoenixsoftware.com/
Re: Why sort (was Microprocessor Optimization Primer)
On Sun, Apr 3, 2016 at 6:28 PM, Andrew Rowley wrote: > The reason I like Java on Z so much is I got used to using Hashtable in > C#, then tried to use Rexx stems to do the same thing. (It was semi > successful but I always felt like it was very fragile due to the potential > for unexpected values etc. for the stems.) Then I found Java had real hash > tables. They make so many different problems so much easier. > > A million 1500 byte entries should be about 1.5 GB I think, and I would > expect a hashmap to handle it without difficulty as long as the real > storage was available. But typically a hashtable would hold an object with > the specific items you're interested in rather than the whole 1500 byte > item. > Yeah, my arithmetic is really bad. > As for sorting a List of a million 1500 byte items - again I would expect > Java to do this without difficulty as long as real storage is available. > Java is actually pretty efficient at this because you're actually sorting a > list of pointers - you go all over memory to do the compares Hmm, just a concern of mine (it may be obsolete) would be the working set in memory of doing that. z/OS, even on our small shop, is probably running 8 other batch jobs, 5 TSO users (we're small), and 7 CICS regions. I'd worry about sizing the real memory on the LPAR if all 8 jobs were "going all over memory". But, again, I have a very small z9BC system, so I worry about things that the big boys would sneer at. > , but should be only shuffling 8MB of data in storage if you have a > million 64 bit pointers. I regularly test EasySMF (written in C#) > displaying lists of 1,000,000+ items on the PC. It has column click > sorting, and it copes just fine with 1,000,000+ lists. Sorting a column > takes a few seconds at most on a not particularly fast PC. > > DFSORT seems to be most useful where you need to sort more data than can > be processed in storage - but I'm wondering how often that really needs to > be done.
I'm not so interested in utilities and databases calling it under > the covers - more in applications that require records in a particular > order. Nor am I saying that's wrong - I'm really just asking whether > languages like Java provide opportunities to eliminate some sorting. > You have a good point about using SORT directly. Let the thing in the infrastructure use sort, like SQL "ORDER BY" or other things. Of course, it would be easier in our shop to do this if the COBOL language had a hashing facility built into it. Most of our code is COBOL and a CA product called EasyTrieve. We don't have any "fancy" or "up to date" languages like Java, Python, Ruby, Go, ... insert others ... . Took me a while to write and I had to rewrite a number of times when my current bitterness about things at work got to be too much. I'm going to go watch some ALF and "Get Smart" episodes to cheer up. > > > -- > Andrew Rowley > Black Hill Software > +61 413 302 386 > -- How many surrealists does it take to screw in a lightbulb? One to hold the giraffe and one to fill the bathtub with brightly colored power tools. Maranatha! <>< John McKown
Re: Why sort (was Microprocessor Optimization Primer)
On Sun, Apr 3, 2016 at 6:00 PM, Andrew Rowley wrote: > On 3/04/2016 22:43, David Crayford wrote: > >> Good question! Sort can be utilised for other purposes than sorting, it >> can be used as an I/O engine. DFSORT (or Syncsort) implements bespoke >> highly optimized I/O using low-level programming interfaces such as chained >> EXCPs which are significantly faster than using standard access methods >> like QSAM or BSAM, including overlapping BSAM I/O. DFSORT has exit routines >> (callbacks) which get called for each record. Basically it's supercharged >> I/O. One of our products does just that as do many others. IIRC, IBM >> FileManager uses sort for I/O. The trouble with using this technique with >> Java is the JNI/callback overhead. >> > > I'm aware of the efficient I/O, but I'm more interested in the use to put > data into a particular order. My own programs I never sort input data, > frequently sort small subsets of data during processing (likely always too > small quantities for something like DFSORT) and almost always sort for > presentation. Presentation is hopefully also too small quantities for > DFSORT. > > It is an interesting idea though to use it to read data via the exits > without actually giving it back to DFSORT to process. I'm not an application programmer. But I can just imagine the looks of astonishment and the "talk", if I were to write a COBOL program which does a SORT verb with INPUT PROCEDURE IS and OUTPUT PROCEDURE IS which only did a SORT FIELDS=COPY operation. Even more astonishment if I coded the INCLUDE or EXCLUDE to subset my data in addition to, or instead of, using COBOL code. I don't know if such coding would pass the majority of the "peer review" type processes. I'd love to try. Especially if I were smart enough to do so initially and keep the output listing. Then allow code review to force me to use normal COBOL methods. And then show the differences, assuming the SORT method actually is superior.
Of course, I'd better know my management. I was at one shop (sysprog) where my boss (sysprog + manager) did that with a major application that would max the 3083 (long ago). Basically he proved it was due to a flawed design. Unfortunately, that cost him his job because the design was actually done by the head of the company (software development company). > > > -- > Andrew Rowley > Black Hill Software > +61 413 302 386 > -- How many surrealists does it take to screw in a lightbulb? One to hold the giraffe and one to fill the bathtub with brightly colored power tools. Maranatha! <>< John McKown
Re: Why sort (was Microprocessor Optimization Primer)
The reason I like Java on Z so much is I got used to using Hashtable in C#, then tried to use Rexx stems to do the same thing. (It was semi successful but I always felt like it was very fragile due to the potential for unexpected values etc. for the stems.) Then I found Java had real hash tables. They make so many different problems so much easier. A million 1500 byte entries should be about 1.5 GB I think, and I would expect a hashmap to handle it without difficulty as long as the real storage was available. But typically a hashtable would hold an object with the specific items you're interested in rather than the whole 1500 byte item. As for sorting a List of a million 1500 byte items - again I would expect Java to do this without difficulty as long as real storage is available. Java is actually pretty efficient at this because you're actually sorting a list of pointers - you go all over memory to do the compares, but should be only shuffling 8MB of data in storage if you have a million 64 bit pointers. I regularly test EasySMF (written in C#) displaying lists of 1,000,000+ items on the PC. It has column click sorting, and it copes just fine with 1,000,000+ lists. Sorting a column takes a few seconds at most on a not particularly fast PC. DFSORT seems to be most useful where you need to sort more data than can be processed in storage - but I'm wondering how often that really needs to be done. I'm not so interested in utilities and databases calling it under the covers - more in applications that require records in a particular order. Nor am I saying that's wrong - I'm really just asking whether languages like Java provide opportunities to eliminate some sorting. On 3/04/2016 22:36, John McKown wrote: Sure, but how often do you have a Java HashMap which contains, say, a million entries? Oh, and the entries are not something like an "int", but more like a C struct where the size of each struct is around 1500 bytes. That would require about 1.5 Terabytes of memory. 
Not many systems have that much to give you for a single "object". And yes, we _do_ sort such monsters. Not often, granted, but we're doing a conversion right now and the programmer is doing work on claims which go back 10 years! That's a _lot_ of data! And, we don't have _any_ data bases, just VSAM and sequential data sets. I've actually used VSAM to do "sorting", by inserting records randomly, then reading them back in keyed order. The performance was horrible. DB2, or other database system, could be used in such a manner to avoid sorting. But I'd bet it would also be horrible. Of course, if you're reading an already existing VSAM keyed file, or a database, then you're golden. I'd bet most of the data in the non-z/OS world is kept in such a manner, as opposed to a regular "file". On z/OS, REXX has "stem" variables which are "content addressable", much like a HashMap. The COBOL language doesn't have anything like this built in. Neither does PL/I. Of course, IBM's Java for z/OS does. As do other languages in the UNIX environment such as Perl. But there just aren't as many of them in z/OS due to the effort to make them work in an EBCDIC environment instead of an ASCII (or Unicode) environment. For Perl, Larry Wall just said "forget it, we're not doing it any more". I know that there is a port of LUA ( http://lua4z.com/ ), but I don't know how popular it is. Unfortunately, z/OS people (programmers, sysprogs, and management) don't really seem to be very interested in doing UNIX type work on z/OS. Possibly because "it's too expensive!" or "it's not how we have done things in the past and it's too difficult to bother learning." Or, maybe, just plain NIH syndrome (Not Invented Here). I mean, have you read the screams here about the latest COBOL requiring PDSEs for their executable output? You'd think that they'd been told to convert their COBOL to FORTRAN.
-- Andrew Rowley Black Hill Software +61 413 302 386 -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
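The match-by-key approach described above (read one file into a HashMap keyed on the match field, then stream the second file against it, removing entries as they match) can be sketched in a few lines of Java. This is a hedged illustration, not code from the thread: the record shape (a String key mapped to a String payload) and the names KeyMatch and match are hypothetical.

```java
import java.util.*;

public class KeyMatch {
    // Load the smaller input into a map keyed on the match field, then
    // read the larger input sequentially and pull matches out by key.
    // Removing entries as they match leaves the unmatched ones behind.
    public static Map<String, String> match(Map<String, String> smaller,
                                            List<String> largerKeys,
                                            List<String[]> matchedOut) {
        Map<String, String> pending = new HashMap<>(smaller);
        for (String key : largerKeys) {
            String rec = pending.remove(key);        // O(1) average lookup, no sort needed
            if (rec != null) {
                matchedOut.add(new String[] { key, rec });
            }
        }
        return pending;                              // records with no match in the second file
    }
}
```

Because lookups and removals are constant time on average, neither input needs to be in any particular order; whatever remains in the returned map is unmatched.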
Re: Why sort (was Microprocessor Optimization Primer)
On 3/04/2016 22:43, David Crayford wrote: Good question! Sort can be utilised for other purposes than sorting; it can be used as an I/O engine. DFSORT (or Syncsort) implements bespoke, highly optimized I/O using low-level programming interfaces such as chained EXCPs which are significantly faster than standard access methods like QSAM or BSAM, including overlapping BSAM I/O. DFSORT has exit routines (callbacks) which get called for each record. Basically it's supercharged I/O. One of our products does just that, as do many others. IIRC, IBM FileManager uses sort for I/O. The trouble with using this technique with Java is the JNI/callback overhead. I'm aware of the efficient I/O, but I'm more interested in the use of sort to put data into a particular order. In my own programs I never sort input data, frequently sort small subsets of data during processing (likely always quantities too small for something like DFSORT), and almost always sort for presentation. Presentation hopefully also involves quantities too small for DFSORT. It is an interesting idea, though, to use it to read data via the exits without actually giving it back to DFSORT to process. -- Andrew Rowley Black Hill Software +61 413 302 386
Re: Why sort (was Microprocessor Optimization Primer)
On 3 April 2016 at 02:50, Andrew Rowley wrote: > One question that puzzles me (maybe it's my lack of an application > programming background): Why is sort used so much on z/OS? As others have pointed out, sort on z/OS (whether IBM's or other vendors') can be used for all sorts (heh) of general I/O with high performance. But "sort" also covers the notion of merge, and more generally of collation. Many languages have constructs that implicitly sort, and all relational (and probably other) databases will sort implicitly as required, whether they implement their own sorts or call the system one. The database product I worked on 20 years ago had three levels of sort: for a few rows it did its own in-storage sort, for thousands of rows it did its own with work files, and for bigger stuff it called the system sort. Today those thresholds would be at least 10x higher because of much cheaper and bigger main storage, but the concept holds. A historical reason for the use of sort on z/OS may be that "way back in the days of steam powered computing", main storage was very expensive, disk was expensive, and tape was cheap. It was not unusual for sort to use tapes for work files; how else would you sort tens of millions of records on a machine with, say, 512 kB of storage and a few hundred megabytes of disk? UNIX and indeed most other systems didn't start out doing commercial data processing, and to this day don't do batch processing very well. Tony H.
Re: Why sort (was Microprocessor Optimization Primer)
I have always argued that a company can buy more CPU, but no one can buy more wall clock time. Yes, sometimes you need CPU to be king, which is why MFX (Syncsort, to you old timers) has offered multiple optimization options for years. Chris Blaicher Technical Architect Ironstream Development Syncsort Incorporated 50 Tice Boulevard, Woodcliff Lake, NJ 07677 P: 201-930-8234 | M: 512-627-3803 E: cblaic...@syncsort.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jesse 1 Robinson Sent: Sunday, April 03, 2016 4:21 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Why sort (was Microprocessor Optimization Primer) I used to work for the late great Security Pacific, at the time the largest bank based in Los Angeles. When DFSORT was a pimply-faced teenager, some of us sysprogs were invited to Santa Teresa to meet with product developers to share some real-world feedback. They were a young and earnest bunch. They wanted us to help them decide between two frequently conflicting goals: minimize CPU time, and minimize I/O count. Enhancing one often came at the expense of the other. We couldn't wait to lay it on them. Every business day at 2 AM, a messenger would arrive at our data center to collect a bag containing all the checks processed that day along with reports tied to them. The bag was to be delivered to 'the feds downtown'. If the bag was ready for pickup, all was sweetness and light. If the bag was late, there would be h*ll to pay. That's all that mattered: wall clock time. It was a revelation to the developers. Every serious business has to sort data for myriad reasons, all of which boil down to this: somewhere along the line--surely more than once--data must be processed in some kind of order. Maybe by account number. Maybe by account name. Maybe by account value. Each of these needs requires ordering unsorted data or data previously sorted for another purpose. 
Sort is a huge linchpin in the foundation of any large business. Argue about CPU or I/O stats all you want. You either meet your 'messenger' deadline or you don't. . . . J.O.Skip Robinson Southern California Edison Company Electric Dragon Team Paddler SHARE MVS Program Co-Manager 323-715-0595 Mobile 626-302-7535 Office robin...@sce.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Blaicher, Christopher Y. Sent: Sunday, April 03, 2016 6:52 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: (External):Re: Why sort (was Microprocessor Optimization Primer) Along with the other reasons outlined by others, it significantly improves bulk processing. I shy away from the term batch because that has come to have a bad connotation. When dealing with individual transactions, such as an ATM transaction or a web transaction, sorted data is not needed. But when a company goes to process all the payments received that day, or checks that cleared, etc., processing is much improved when the data coming in is in the same sequence as the existing data structure. It improves because of locality of reference. Using a relational data base, or any other random access method, doesn't mean you have to access it randomly. Chris Blaicher Technical Architect Software Development Syncsort Incorporated 50 Tice Boulevard, Woodcliff Lake, NJ 07677 P: 201-930-8234 | M: 512-627-3803 E: cblaic...@syncsort.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Sunday, April 03, 2016 2:51 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Why sort (was Microprocessor Optimization Primer) On 02/04/2016 10:09 PM, David Crayford wrote: > IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a > zIIP so they could do the same for DFSORT. A well engineered library > could handle the callbacks so the client just reads records like a > normal API. 
That would certainly push Java batch up a notch. One question that puzzles me (maybe it's my lack of an application programming background): Why is sort used so much on z/OS? I know you can then e.g. do grouping based on key changes, but is that really necessary in current programs? Is that the reason it is commonly used? I generally use e.g. Java HashMap, C# Hashtable for grouping so the data doesn't need to be sorted. Do other common languages on z/OS provide similar functions? (C++ I know.) Are there opportunities to use programming language features to avoid sorts altogether? Andrew Rowley
Re: Why sort (was Microprocessor Optimization Primer)
I used to work for the late great Security Pacific, at the time the largest bank based in Los Angeles. When DFSORT was a pimply-faced teenager, some of us sysprogs were invited to Santa Teresa to meet with product developers to share some real-world feedback. They were a young and earnest bunch. They wanted us to help them decide between two frequently conflicting goals: minimize CPU time, and minimize I/O count. Enhancing one often came at the expense of the other. We couldn't wait to lay it on them. Every business day at 2 AM, a messenger would arrive at our data center to collect a bag containing all the checks processed that day along with reports tied to them. The bag was to be delivered to 'the feds downtown'. If the bag was ready for pickup, all was sweetness and light. If the bag was late, there would be h*ll to pay. That's all that mattered: wall clock time. It was a revelation to the developers. Every serious business has to sort data for myriad reasons, all of which boil down to this: somewhere along the line--surely more than once--data must be processed in some kind of order. Maybe by account number. Maybe by account name. Maybe by account value. Each of these needs requires ordering unsorted data or data previously sorted for another purpose. Sort is a huge linchpin in the foundation of any large business. Argue about CPU or I/O stats all you want. You either meet your 'messenger' deadline or you don't. . . . J.O.Skip Robinson Southern California Edison Company Electric Dragon Team Paddler SHARE MVS Program Co-Manager 323-715-0595 Mobile 626-302-7535 Office robin...@sce.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Blaicher, Christopher Y. 
Sent: Sunday, April 03, 2016 6:52 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: (External):Re: Why sort (was Microprocessor Optimization Primer) Along with the other reasons outlined by others, it significantly improves bulk processing. I shy away from the term batch because that has come to have a bad connotation. When dealing with individual transactions, such as an ATM transaction or a web transaction, sorted data is not needed. But when a company goes to process all the payments received that day, or checks that cleared, etc., processing is much improved when the data coming in is in the same sequence as the existing data structure. It improves because of locality of reference. Using a relational data base, or any other random access method, doesn't mean you have to access it randomly. Chris Blaicher Technical Architect Software Development Syncsort Incorporated 50 Tice Boulevard, Woodcliff Lake, NJ 07677 P: 201-930-8234 | M: 512-627-3803 E: cblaic...@syncsort.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Sunday, April 03, 2016 2:51 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Why sort (was Microprocessor Optimization Primer) On 02/04/2016 10:09 PM, David Crayford wrote: > IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a > zIIP so they could do the same for DFSORT. A well engineered library > could handle the callbacks so the client just reads records like a > normal API. That would certainly push Java batch up a notch. One question that puzzles me (maybe it's my lack of an application programming background): Why is sort used so much on z/OS? I know you can then e.g. do grouping based on key changes, but is that really necessary in current programs? Is that the reason it is commonly used? I generally use e.g. Java HashMap, C# Hashtable for grouping so the data doesn't need to be sorted. Do other common languages on z/OS provide similar functions? (C++ I know.) 
Are there opportunities to use programming language features to avoid sorts altogether? Andrew Rowley
Re: Why sort (was Microprocessor Optimization Primer)
Along with the other reasons outlined by others, it significantly improves bulk processing. I shy away from the term batch because that has come to have a bad connotation. When dealing with individual transactions, such as an ATM transaction or a web transaction, sorted data is not needed. But when a company goes to process all the payments received that day, or checks that cleared, etc., processing is much improved when the data coming in is in the same sequence as the existing data structure. It improves because of locality of reference. Using a relational data base, or any other random access method, doesn't mean you have to access it randomly. Chris Blaicher Technical Architect Software Development Syncsort Incorporated 50 Tice Boulevard, Woodcliff Lake, NJ 07677 P: 201-930-8234 | M: 512-627-3803 E: cblaic...@syncsort.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Sunday, April 03, 2016 2:51 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Why sort (was Microprocessor Optimization Primer) On 02/04/2016 10:09 PM, David Crayford wrote: > IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a > zIIP so they could do the same for DFSORT. A well engineered library > could handle the callbacks so the client just reads records like a > normal API. That would certainly push Java batch up a notch. One question that puzzles me (maybe it's my lack of an application programming background): Why is sort used so much on z/OS? I know you can then e.g. do grouping based on key changes, but is that really necessary in current programs? Is that the reason it is commonly used? I generally use e.g. Java HashMap, C# Hashtable for grouping so the data doesn't need to be sorted. Do other common languages on z/OS provide similar functions? (C++ I know.) Are there opportunities to use programming language features to avoid sorts altogether? 
Andrew Rowley ATTENTION: - The information contained in this message (including any files transmitted with this message) may contain proprietary, trade secret or other confidential and/or legally privileged information. Any pricing information contained in this message or in any files transmitted with this message is always confidential and cannot be shared with any third parties without prior written approval from Syncsort. This message is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any use, disclosure, copying or distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or Syncsort and destroy all copies of this message in your possession, custody or control.
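Chris's locality-of-reference point is what the classic sequential match exploits: when the transaction file and the master file are already in the same key sequence, a single forward pass pairs them up with no random access at all. A minimal sketch in Java, with hypothetical names (MatchMerge; key lists stand in for full records):

```java
import java.util.*;

public class MatchMerge {
    // Sequential match of two inputs sorted on the same key:
    // advance whichever side has the lower key, emit on equality.
    public static List<String> merge(List<String> txnKeys, List<String> masterKeys) {
        List<String> matched = new ArrayList<>();
        int i = 0, j = 0;
        while (i < txnKeys.size() && j < masterKeys.size()) {
            int cmp = txnKeys.get(i).compareTo(masterKeys.get(j));
            if (cmp == 0) {            // keys match: process the pair
                matched.add(txnKeys.get(i));
                i++; j++;
            } else if (cmp < 0) {
                i++;                   // transaction with no master record
            } else {
                j++;                   // master record with no transaction today
            }
        }
        return matched;
    }
}
```

Each input is touched exactly once, in order, which is why sorted bulk input plays so well with sequential data structures and cache/DASD locality.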
Re: Why sort (was Microprocessor Optimization Primer)
On 3/04/2016 2:50 PM, Andrew Rowley wrote: On 02/04/2016 10:09 PM, David Crayford wrote: IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a zIIP so they could do the same for DFSORT. A well engineered library could handle the callbacks so the client just reads records like a normal API. That would certainly push Java batch up a notch. One question that puzzles me (maybe it's my lack of an application programming background): Why is sort used so much on z/OS? Good question! Sort can be utilised for other purposes than sorting; it can be used as an I/O engine. DFSORT (or Syncsort) implements bespoke, highly optimized I/O using low-level programming interfaces such as chained EXCPs which are significantly faster than standard access methods like QSAM or BSAM, including overlapping BSAM I/O. DFSORT has exit routines (callbacks) which get called for each record. Basically it's supercharged I/O. One of our products does just that, as do many others. IIRC, IBM FileManager uses sort for I/O. The trouble with using this technique with Java is the JNI/callback overhead. I know you can then e.g. do grouping based on key changes, but is that really necessary in current programs? Is that the reason it is commonly used? I generally use e.g. Java HashMap, C# Hashtable for grouping so the data doesn't need to be sorted. Do other common languages on z/OS provide similar functions? (C++ I know.) Are there opportunities to use programming language features to avoid sorts altogether? Andrew Rowley
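The "sort as an I/O engine" pattern David describes is typically a copy operation with an exit routine doing the real work. A hedged sketch of what such a job might look like, based on DFSORT's documented OPTION COPY and MODS/E35 statements; the data set names, the exit name MYEXIT, and the storage figure are made up for illustration:

```jcl
//COPYSTEP EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.DATA,DISP=SHR
//SORTOUT  DD DUMMY
//EXITLIB  DD DSN=MY.EXIT.LOADLIB,DISP=SHR
//SYSIN    DD *
  OPTION COPY
* MYEXIT sees every record on the way out; returning RC=4 deletes the
* record, so nothing reaches SORTOUT and SORT acts purely as a reader.
  MODS E35=(MYEXIT,4096,EXITLIB,N)
/*
```

The exit gets each record via DFSORT's high-performance I/O path, which is the effect the thread describes products like FileManager using.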
Re: Why sort (was Microprocessor Optimization Primer)
On Sun, Apr 3, 2016 at 1:50 AM, Andrew Rowley wrote: > On 02/04/2016 10:09 PM, David Crayford wrote: > >> IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a >> zIIP so they could do the same for DFSORT. A well engineered library >> could handle the callbacks so the client just reads records like a normal >> API. That would certainly push Java batch up a notch. >> > One question that puzzles me (maybe it's my lack of an application > programming background): Why is sort used so much on z/OS? > > I know you can then e.g. do grouping based on key changes, but is that > really necessary in current programs? Is that the reason it is commonly > used? > In my shop, it is used mainly so that the output, such as reports sent to a web site for perusal or email, is in some order such as account number. Also, DB2 uses it a lot when you do a CREATE INDEX, I think. IDCAMS uses it when you build an alternate index. This is done when a VSAM file (yes, there are a lot of them still around) is "reorganized" for performance or space reasons. > > I generally use e.g. Java HashMap, C# Hashtable for grouping so the data > doesn't need to be sorted. Do other common languages on z/OS provide > similar functions? (C++ I know.) Are there opportunities to use programming > language features to avoid sorts altogether? > Sure, but how often do you have a Java HashMap which contains, say, a million entries? Oh, and the entries are not something like an "int", but more like a C struct where the size of each struct is around 1500 bytes. That would require about 1.5 Terabytes of memory. Not many systems have that much to give you for a single "object". And yes, we _do_ sort such monsters. Not often, granted, but we're doing a conversion right now and the programmer is doing work on claims which go back 10 years! That's a _lot_ of data! And, we don't have _any_ data bases, just VSAM and sequential data sets. 
I've actually used VSAM to do "sorting", by inserting records randomly, then reading them back in keyed order. The performance was horrible. DB2, or another database system, could be used in such a manner to avoid sorting. But I'd bet it would also be horrible. Of course, if you're reading an already existing VSAM keyed file, or a database, then you're golden. I'd bet most of the data in the non-z/OS world is kept in such a manner, as opposed to a regular "file". On z/OS, REXX has "stem" variables which are "content addressable", much like a HashMap. The COBOL language doesn't have anything like this built in. Neither does PL/I. Of course, IBM's Java for z/OS does. As do other languages in the UNIX environment such as Perl. But there just aren't as many of them on z/OS due to the effort to make them work in an EBCDIC environment instead of an ASCII (or Unicode) environment. For Perl, Larry Wall just said "forget it, we're not doing it any more". I know that there is a port of Lua ( http://lua4z.com/ ), but I don't know how popular it is. Unfortunately, z/OS people (programmers, sysprogs, and management) don't really seem to be very interested in doing UNIX type work on z/OS. Possibly because "it's too expensive!" or "it's not how we have done things in the past and it's too difficult to bother learning." Or, maybe, just plain NIH syndrome (Not Invented Here). I mean, have you read the screams here about the latest COBOL requiring PDSEs for their executable output? You'd think that they'd been told to convert their COBOL to FORTRAN. > > Andrew Rowley > -- How many surrealists does it take to screw in a lightbulb? One to hold the giraffe and one to fill the bathtub with brightly colored power tools. Maranatha! 
<>< John McKown
Why sort (was Microprocessor Optimization Primer)
On 02/04/2016 10:09 PM, David Crayford wrote: IBM switched the magic bit to offload the JZOS JNI C/C++ workload to a zIIP so they could do the same for DFSORT. A well engineered library could handle the callbacks so the client just reads records like a normal API. That would certainly push Java batch up a notch. One question that puzzles me (maybe it's my lack of an application programming background): Why is sort used so much on z/OS? I know you can then e.g. do grouping based on key changes, but is that really necessary in current programs? Is that the reason it is commonly used? I generally use e.g. Java HashMap, C# Hashtable for grouping so the data doesn't need to be sorted. Do other common languages on z/OS provide similar functions? (C++ I know.) Are there opportunities to use programming language features to avoid sorts altogether? Andrew Rowley
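The grouping Andrew describes, accumulating totals by key instead of sorting and then breaking on key change, can be sketched with Java's HashMap.merge. The record shape (account, amount pairs) and the name GroupTotals are hypothetical illustrations, not from the thread:

```java
import java.util.*;

public class GroupTotals {
    // Group-and-sum without a sort: totals accumulate in a HashMap
    // keyed on account, so the input order never matters.
    public static Map<String, Long> totalByAccount(List<String[]> txns) {
        Map<String, Long> totals = new HashMap<>();
        for (String[] t : txns) {                  // t[0] = account, t[1] = amount
            totals.merge(t[0], Long.parseLong(t[1]), Long::sum);
        }
        return totals;
    }
}
```

If the output must then be presented in account order, only the (usually much smaller) set of distinct keys needs sorting, not the raw records.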