> Message: 2
> Date: Wed, 13 Feb 2019 15:03:41 -0500
> From: Paul Koning <paulkon...@comcast.net>
> To: Jay Jaeger <cu...@charter.net>, "General Discussion: On-Topic and
>     Off-Topic Posts" <cctalk@classiccmp.org>
> Subject: Re: PDP-11/45 RSTS/E boot problem
> Message-ID: <c07861a6-bfd8-4ad0-aab9-f4715904b...@comcast.net>
> Content-Type: text/plain; charset=us-ascii
>
> > On Feb 13, 2019, at 1:20 PM, Jay Jaeger via cctalk <cctalk@classiccmp.org> wrote:
> >
> > ...
> > Maybe that story about FE's using Unix as a test to confirm operation
> > even when diagnostics said the machine was OK was not so much just a
> > legend?
>
> It still feels like a legend. My experience with DEC field service
> engineers is that they used the diagnostics. In the PDP-11 era, Unix
> knowledge around DEC was pretty sparse, especially early on when it
> could be found only in the Telephone Products Group (Armando
> Stettner). RSTS would be more plausible, but I never saw that in the
> hands of FS engineers either.
> By and large diagnostics would find problems. I've seen a number in
> the 1970s, including a messy data path failure in the 11/45 MMU where
> we (college students) did the initial diagnosis while the FS engineer
> was on his way. My suspicion is that things not solved by diagnostics
> would be escalated to the "wizard from Maynard". And they'd probably
> start replacing whole subsystems. I've seen that once, when our
> college RSTS-11 system (11/20, 16 DL-11 lines) was crashing on average
> once a day for months. DEC brought in several of those "wizards". The
> "fix" was to replace the 11/20 by a "spare part" -- an 11/45 with more
> memory, a DH11, and RSTS/E. Decades later I was told that the wizards
> actually pinned the blame on the college FM broadcast transmitter,
> about 200 feet down the hall from the computer center. That may well
> be, though I didn't hear that at the time.
>
> RSTS did get used in manufacturing, at Final Assembly & Test sites
> like Westminster MA and Salem NH, where PDP-11 systems large enough to
> run RSTS/E were subjected to a load test of exerciser programs running
> under that OS. The way it was explained to us is that a system that
> would be happy with such a test would also be happy with any customer
> application. It's not clear if that was because RSTS would load things
> more than most, or was more finicky about hardware glitches than most,
> but it certainly was the practice for quite some time. Of course, not
> all PDP-11 configurations could be tested that way.
>
> paul
I guess the experience in NJ was a bit different, since AT&T had two dedicated Field Service offices that handled their sites, including Bell Labs. I was on the Commercial/Government side from 81-86 and we didn't get to play with RSTS on customer sites at all (but sometimes we got to play on the in-house machines in Princeton or on our own hardware).

It was a bit different on the VAX side, since many diags were run under VAX/VMS, and as a brand new hire I was doing VAX installs -- including installing VMS 2.x and 3.x on 11/780's and 11/750's at install time. If they had paid for software installation, the software guys would wipe and reinstall. If not, we left the pack and prayed the customer wouldn't wipe the diags that we installed on the disk when we built the VMS pack. Realistically the only thing the customer needed to do after we got done was tweak the system parameters, check the swap etc., and lay on the layered products like languages. Things got much more interesting when VMS 3.x and 4.x got CI780's and HSC50's. That was more involved than the easy VMS 2.x-3.x install.

As far as the 11/70's -- I'm building a pidp1170... My last 11/70 install was around 84 or so, when I put in a late DECDatasystem 570 blue 11/70 with the FCC cabinets at AT&T in Freehold.

As far as the Wizard from Maynard -- one story from my branch support guy (rumored to be about his brother on the 11/70 line, I think in Westminster MA... not Salem or the other NH plants): they had an intermittent 11/70 that would crash every couple of days, and they could run all the diags and DEC X11 with no issues. They called over their in-house wizard, who ran toggle-in programs from the front panel -- playing the switches like piano keys with both hands. After about a half hour his comment was "Clean the terminator fingers." Machine ran like a SOB once the gold fingers were cleaned.

Weirdest 11/70 mess I had was after I left DEC to work for a third party maintenance group.
Their regional support was in Dallas. I was in NJ. They couldn't find their support guy, so they rushed me onto a plane to Chicago to work with two techs who were babysitting a mess they had no clue about. The site was WW Granger in Skokie and I arrived at 3AM...

They had a 5 or 6 story warehouse which was a totally robotic automated site, picking water heaters and other industrial equipment from what looked like an over-sized 6 floor tape library. Two 11/34's running RSX11 ran the picker. One had been down for weeks. Their 11/70 was half disassembled with two techs working on it. They were VAX trained at a third party school but they weren't PDP11 techs. An RM03 on the 11/34's was down as well. The 11/70 was an RSTS/E box doing all the billing and inventory for Granger at the site.

I walked in at 3AM with my Digital trucker's cap on and found they couldn't boot XXDP+ from tape. The OS wouldn't come up either. The customer gave me a pile of error logs dating back over six months (I think Sept through March) and they all showed memory management error aborts and retries. The techs who thought they were changing memory never found the MOS memory box... they were swapping cache boards thinking they were memory.

I went to 10000, deposited 014747 and ran it... It failed either on addresses ending in 0 or 4, or on addresses ending in 2 or 6. The MOS on the 11/70 had two controllers and interleaved the memory. Pulled one of the interleave controllers, ran the toggle-in and it worked. Aha... bad memory controller. Booted diags and sent for the spare board.

Decided the RM03 would be a bitch to work on without the tester or tools, and management found a spare locally at a used DEC joint in the area. Swapped the drive once we carried the new one up the stairs.

The 11/34 had a problem... the machine wouldn't boot and the run light (IIRC) was on all the time. The machine had two full Unibus DD11-DK boxes even though it didn't need them all. Terminated at the CPU backplane and did toggle-ins. OK. Worked...
jumpered out the next UNUSED segment of the Unibus backplane with a Unibus ribbon cable and the problem was gone. The guys had been there over two weeks digging themselves a hole.

Third party service on DEC stuff varied with the person. Some were ex-DEC genius types who were consultant level experts on the hardware. Some just knew to swap the board with the red LED lit.

Another time I ran into an engineer who told me which chip to change (chip info here is faked -- don't pull the prints... too many years to have kept TE16/TM03 prints). A call comes in to dispatch with the following information: "The TE16 at Naval Air Propulsion, Trenton is down. It doesn't come on line. The light is lit but the system doesn't see it. I put the board on an extender and U34 pin 12 is low and doesn't go high. I need someone to come out and change the chip."

I call the site back. I'm in Princeton, 15-20 minutes away. I get the customer on the line and tell him I'll be there in 3 weeks or so. DECservice 2 hour response won't cover the call, since he wants a chip changed in 1985 and we don't stock them -- so it will be a special source issue for logistics and we'll get back to him. Or... I can swap the M8916 Logic And Write board in about 15 minutes. Does he want it fixed, or does he want to prove he called the correct chip...

Bill