Re: Cultural underground legende Seymour Cray and his legacy
On Thu, Apr 22, 2021 at 08:00:04PM +0200, Balder Oddson wrote:
> On Thu, Apr 22, 2021 at 12:28:28AM +0200, Balder Oddson wrote:
> > Whereof everyone is interested,
> >
> > A few things about his architecture are extraordinarily special.
> >
> > #1 ideal properties, can never be done better for some things.
> > #1.1 analogue, you need ground and good drain, to do work during weak force
> > pull.
> > #1.2 physical, independent ICs, relying on physics for synchronization.
> > #1.2.1 allowing digital global sync between die slots, async, but local
> > sync with global clock.
> > #1.3 as a Turing machine, everything is virtually represented with
> > arrays of addresses in contiguous memory.
> > #1.3.1 You get scalar operations on your vectors with SIMD instructions.
> > #1.3.2 Remotely scatter data in remote memory, that is gathered into
> > another contiguous area of memory with addresses to data.
> >
> > On the one hand, where this gives 8x the performance at a high price, it
> > likely caused as much awe, inspiration and anxiety in the finance sector
> > where Cray got the funding to research, build and sell these beasts.
> >
> > The Cult of the Holy Cow, and The Cult of the Dead Cow are oxymorons if
> > the contexts and historic circumstances are to be considered.
> >
> > Using hex numbers would ideally imply an understanding of the Cray
> > architecture, and why it perhaps now can be software defined.
>
> The puns were uninviting, and didn't inspire snide remarks and comments
> that weren't drivel without content and context.
>
> Thereof, interest in logic has invited investigations of tautologies as
> a concept in logic, whereof one cannot speak and can merely add drivel.
>
> Not sure if it's entirely true, but for the original Crays, first an
> engineer came to try and get it to work; if not, Seymour gave it a try
> before shipping a replacement. Likely because he tortured the
> electromechanical properties around the central part so much that it was
> touchy-feely.
>
> Anyone intelligible around this topic likely has a passing interest, from
> having a gray beard and being sick and tired of "what did Cray do",
> "what if he set a more reasonable goal than 10x the closest competitor".

That is how that machine worked, and it was also synonymous with supercomputing, which essentially died with the company. The only relevant reason to have a demon as a logo for UNIX is the allusion to Maxwell's tortured physics demon. Anyone not a pundit who is familiar with this may correct me?
Re: Cultural underground legende Seymour Cray and his legacy
On Thu, Apr 22, 2021 at 12:28:28AM +0200, Balder Oddson wrote:
> Whereof everyone is interested,
>
> A few things about his architecture are extraordinarily special.
>
> #1 ideal properties, can never be done better for some things.
> #1.1 analogue, you need ground and good drain, to do work during weak force
> pull.
> #1.2 physical, independent ICs, relying on physics for synchronization.
> #1.2.1 allowing digital global sync between die slots, async, but local
> sync with global clock.
> #1.3 as a Turing machine, everything is virtually represented with
> arrays of addresses in contiguous memory.
> #1.3.1 You get scalar operations on your vectors with SIMD instructions.
> #1.3.2 Remotely scatter data in remote memory, that is gathered into
> another contiguous area of memory with addresses to data.
>
> On the one hand, where this gives 8x the performance at a high price, it
> likely caused as much awe, inspiration and anxiety in the finance sector
> where Cray got the funding to research, build and sell these beasts.
>
> The Cult of the Holy Cow, and The Cult of the Dead Cow are oxymorons if
> the contexts and historic circumstances are to be considered.
>
> Using hex numbers would ideally imply an understanding of the Cray
> architecture, and why it perhaps now can be software defined.

The puns were uninviting, and didn't inspire snide remarks and comments that weren't drivel without content and context.

Thereof, interest in logic has invited investigations of tautologies as a concept in logic, whereof one cannot speak and can merely add drivel.

Not sure if it's entirely true, but for the original Crays, first an engineer came to try and get it to work; if not, Seymour gave it a try before shipping a replacement. Likely because he tortured the electromechanical properties around the central part so much that it was touchy-feely.

Anyone intelligible around this topic likely has a passing interest, from having a gray beard and being sick and tired of "what did Cray do", "what if he set a more reasonable goal than 10x the closest competitor".

Ciao,
Balder
Re: Cultural underground legende Seymour Cray and his legacy
On Thu, Apr 22, 2021 at 10:24:32AM +0200, Marc Espie wrote:
> Is this a new UMF experiment?

Does it involve integrating this on a chip? Not sure if past successes are that great.

-- Balder Oddson
Cultural underground legende Seymour Cray and his legacy
Whereof everyone is interested,

A few things about his architecture are extraordinarily special.

#1 ideal properties, can never be done better for some things.
#1.1 analogue, you need ground and good drain, to do work during weak force pull.
#1.2 physical, independent ICs, relying on physics for synchronization.
#1.2.1 allowing digital global sync between die slots, async, but local sync with global clock.
#1.3 as a Turing machine, everything is virtually represented with arrays of addresses in contiguous memory.
#1.3.1 You get scalar operations on your vectors with SIMD instructions.
#1.3.2 Remotely scatter data in remote memory, that is gathered into another contiguous area of memory with addresses to data.

On the one hand, where this gives 8x the performance at a high price, it likely caused as much awe, inspiration and anxiety in the finance sector where Cray got the funding to research, build and sell these beasts.

The Cult of the Holy Cow, and The Cult of the Dead Cow are oxymorons if the contexts and historic circumstances are to be considered.

Using hex numbers would ideally imply an understanding of the Cray architecture, and why it perhaps now can be software defined.

-- Balder Oddson
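To make points #1.3.1 and #1.3.2 concrete, here is a minimal sketch of gather, a scalar operation on the gathered vector, and scatter. It assumes memory can be modelled as a flat Python list; the names `gather`, `scatter` and `scale` are made up for illustration and are not taken from any Cray instruction set.

```python
def gather(memory, addresses):
    """Collect the words at scattered addresses into one contiguous list."""
    return [memory[a] for a in addresses]


def scatter(memory, addresses, values):
    """Write a contiguous list of values back to scattered addresses."""
    if len(addresses) != len(values):
        raise ValueError("need exactly one value per address")
    for a, v in zip(addresses, values):
        memory[a] = v


def scale(vector, scalar):
    """Apply one scalar operation to every element of the vector."""
    return [scalar * x for x in vector]


# Hypothetical memory image: the interesting words sit at odd addresses.
memory = [0, 10, 0, 20, 0, 30, 0, 40]
addresses = [1, 3, 5, 7]

packed = gather(memory, addresses)   # contiguous copy of the scattered words
scaled = scale(packed, 2)            # scalar operation on the whole vector
scatter(memory, addresses, scaled)   # results go back where they came from
```

The array of addresses is the only thing that ties the contiguous working vector back to where the data actually lives, which is the "arrays of addresses" idea from #1.3.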
Re: The simplest full cray data core with 3 cpu's and a physics hack that makes it work
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
> > On 2 Apr 2021, at 14:17, Benjamin Baier wrote:
> >
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
>
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up: six segments in a circle, with flat sides and close together. One control line for double data rate goes to the opposite segment and its neighbours, such that the only data path goes straight forward. Let's imagine that each segment is the equivalent of 16*32-bit vector operations per core per cycle, and that the chip matches the speed of light across this hexagon or whatever, such that you can pull and push on this link so hard that you cause bremsstrahlung from trying to go too fast in parts of the segment or chip, killing parts of it over time and leaving it inoperable during the operation.

Before you say that it's insane to run this at 10 GHz, and that a von Neumann architecture is better or has a better-tuned pipeline: I'll pump my neighbouring nodes at full speed. Each clock cycle gives each segment one of the states 0xfeedbeef, 0xdeadbeef, 0xbeef or 0xfeedface. So the two neighbouring segments do deadbeef and use the beefy link to pump data to the other half of the CPU, and I'll start doing remote DDR SRAM operations to drive it as a von Neumann chip.

Which patent would you suggest for this if the important vectorization is done in software, in a UNIX model that should run on it, where some things are physical necessities, like a UNIX console to a segment and a daemon that filters instructions and data and handles address space?

You have your big lock that mainly creates the machine state every clock cycle. There are six fully functional segments that must initialise and run a local terminal. Very few have a relationship to Cray; I don't, neither to the original nor to the modern Crays.

If you open up a Cray to try and work out how it works, you find empty space with a bunch of wires, get angry at the evil inside and go with a bunch of DECs, as they don't involve physics shenanigans and actually have the important part inside. But it is easier to tweak your digital spec based on the length of wires. There was possibly even a reason for picking Intel, as they focused on the part everyone liked about IBM compared to the Crays.
Re: The simplest full cray data core with 3 cpu's and a physics hack that makes it work
On Fri, Apr 02, 2021 at 02:39:42PM +0200, Balder Oddson wrote:
> Made of three processing rings, with 3 control wires, direct opposite
> ring segment, and its two neighbours, this is your double data rate, or
> dead beef and the global clock. The local clock is the segment and its
> immediate neighbours. Stack three of them, and add a dimension in the
> topology, and as many datapaths as possible between the faster parts of
> the system, with digital sync between the local clock and speed of light
> in vacuum. Which is an architecture where scatter-gather is extremely
> useful, as that works on the global clock. So a total of 18 dies and a
> very difficult juggling act, where cable lengths are legendary for the
> premium original Crays. If you think you have a problem with your local
> segment, just feed beef.
>
> Not many explanations of this architecture are around, but culture
> references like cult of the dead cow as a pun and wishes on those that
> occupied the whole system. Anyone that's been around a real one to know?
> If you want to know what's inside a Cray, it's basically evil inside if
> you thought that would reveal something.

Yes and no, as this likely works because of direct wires, the shortest distance, and the speed of light in the material as the clock. The simplest setup is one ring with 6 sockets; what's on each segment is a beef, or a processor as usual.

Guarantees on digital sync that it knows:
#1 being written to, or writing to another.
#2 that you are beef, and may or may not be doing a shared task.
#3 idle or beef, exception level, local/global root.

This is important, as the digital clock should be the same as the wired clock, where the die clock can skew just fine as long as being in the state of feedbeef or deadbeef is very tight. This is the general-purpose brute-force method you have, of scattering instructions in memory to your exact opposite node in the circle, with or without your neighbours.

This allows wriggle room where this may work, and where spending extra on cooling and perhaps carbon nanotubes for the wires could make this cache-coherent beast fly. Then there are these pop-culture references like feedbeef, deadcow, deadbeef and feedface (terminal), and likewise the temptation to call it a scalar-vector machine data core, as it's not an inefficient or rubbish architecture, just complicated around this 6-segment configuration.

Due to the ability to skew, it's practically going faster than the speed of light, on the premise that it is cache coherent with control wires to the directly opposite node and its neighbours, not your own, with just one datapath across with wires for each segment. You SIMD and vector scatter and gather as if it weren't for Cray aspirations in most things ever since. And it should be open to relying on some ideal properties and quirks.

How that system would behave and make noise I don't know, but you could likely guess when it was writing the results, or gathering them in memory. I doubt this would be interesting for bitcoin, but you should be able to scrub any size of link you can fit on a segment.

There are many old and cool antique architectures, but Cray is the premier one: he promised 10x performance and delivered. You're not likely to get one on eBay to boot BSD on, and I'm not sure you can get the OS or blueprints either.
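As a rough sketch of the wiring described above, one ring of six segments can be modelled with plain modular arithmetic. The numbering of segments 0..5 around the circle and the function names are assumptions made for illustration; nothing here comes from a real Cray design.

```python
SEGMENTS = 6  # one ring with six sockets, as described above


def opposite(segment):
    """Segment directly across the ring."""
    return (segment + SEGMENTS // 2) % SEGMENTS


def neighbours(segment):
    """The two segments adjacent to the given one on the ring."""
    return ((segment - 1) % SEGMENTS, (segment + 1) % SEGMENTS)


def control_targets(segment):
    """Opposite segment plus the opposite segment's two neighbours."""
    opp = opposite(segment)
    left, right = neighbours(opp)
    return (left, opp, right)


for s in range(SEGMENTS):
    print(s, "->", control_targets(s))
```

With six segments the three control targets never include the segment itself or its own two neighbours, which matches the "not your own" remark.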
The simplest full cray data core with 3 cpu's and a physics hack that makes it work
Made of three processing rings, with 3 control wires, direct opposite ring segment, and its two neighbours, this is your double data rate, or dead beef and the global clock. The local clock is the segment and its immediate neighbours. Stack three of them, and add a dimension in the topology, and as many datapaths as possible between the faster parts of the system, with digital sync between the local clock and speed of light in vacuum. Which is an architecture where scatter-gather is extremely useful, as that works on the global clock. So a total of 18 dies and a very difficult juggling act, where cable lengths are legendary for the premium original Crays. If you think you have a problem with your local segment, just feed beef.

Not many explanations of this architecture are around, but culture references like cult of the dead cow as a pun and wishes on those that occupied the whole system. Anyone that's been around a real one to know? If you want to know what's inside a Cray, it's basically evil inside if you thought that would reveal something.

-- Balder Oddson
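The die count can be checked with a small sketch. It assumes six segments per ring (the figure that gives 18 dies across three stacked rings) and models the local clock group as a segment plus its two immediate neighbours on the same ring; the `(ring, segment)` addressing and the function name are illustrative only.

```python
RINGS = 3
SEGMENTS_PER_RING = 6  # assumption: 3 rings * 6 segments = 18 dies

# Every die addressed as (ring, segment).
dies = [(ring, seg) for ring in range(RINGS) for seg in range(SEGMENTS_PER_RING)]


def local_clock_group(ring, seg):
    """A segment and its immediate neighbours on the same ring."""
    return [
        (ring, (seg - 1) % SEGMENTS_PER_RING),
        (ring, seg),
        (ring, (seg + 1) % SEGMENTS_PER_RING),
    ]


print(len(dies))                # total number of dies in the stack
print(local_clock_group(1, 0))  # wraps around the ring at segment 0
```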
The old argument for the original cray architecture
Everyone knows the most common classical architectures. The ideal solution is to use many chips to make one beefy one: for lack of a better word, given the difference, a data core, something that can be referred to as super or Formula 1. It supports two configurations, circular or horizontal pie segments. Each segment of a circle has DDR on the digital clock of the "beef", perhaps ideally superconducting to increase available space and speed.

The first thing any segment needs to ask electrically is whether it is deadbeef or feedbeef: do I have the UNIX console, or does another one have it? Am I single data rate and feedbeef, or double data rate and deadbeef? If you have sync on double data rate, you are deadbeef; if not, feedbeef. You have a local clock that is always good, then you have this internal structure where speed is more important, as it's the global clock that should ideally match the local clock in speed.

By more modern standards, there would be something better than direct wires between segments. A virtual Cray architecture can be done with SR-IOV and MR-IOV to handle device addresses, and likewise with IOMMU and hardware virtualization. The aim is to achieve the ideal electrical and physical properties by creating this hardware mapping using AArch64 EL3, and to treat the processor as a classical Cray scalar-vector machine. Whether you connect each segment to memory or to a data link shouldn't matter for the architecture itself. Gather-scatter and scatter-gather don't give you an ideal Ethernet switch, but it can probably act as a hub for such a protocol as well.

I think this is an ideal general-purpose architecture, something BSD was meant to run on, or was striving towards. For IT security and performance, feed beef was the right answer for decades if you could get a Cray. Vectorizing pf towards scalar-vector operations looks like a more viable option where security and performance both matter, given the inherent qualities of a real Cray architecture, which is bad at doing one thing at a time very few times. Maybe something that looks like a supercomputer will be built again. Can a monster be built to handle the largest internet cable in the world?

-- Balder Oddson
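The deadbeef/feedbeef rule stated above (sync on double data rate means deadbeef, otherwise feedbeef) fits in a few lines. This is only a sketch: representing sync as a boolean per segment and the name `segment_state` are assumptions made for illustration.

```python
DEADBEEF = 0xDEADBEEF  # segment has sync on double data rate
FEEDBEEF = 0xFEEDBEEF  # segment runs single data rate


def segment_state(has_ddr_sync):
    """Return the state word a segment reports for the current clock cycle."""
    return DEADBEEF if has_ddr_sync else FEEDBEEF


# Hypothetical ring of six segments where only segments 2 and 4 have DDR sync.
ddr_sync = [False, False, True, False, True, False]
states = [segment_state(s) for s in ddr_sync]
print([hex(s) for s in states])
```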