Re: Cultural underground legend Seymour Cray and his legacy

2021-04-22 Thread Balder Oddson
On Thu, Apr 22, 2021 at 08:00:04PM +0200, Balder Oddson wrote:
> The puns were uninviting, and didn't inspire snide remarks or comments
> that weren't drivel without content and context.
>
> Hence an interest in logic has invited investigation of tautologies as
> a concept in logic: whereof one cannot speak, one merely adds drivel.
>
> Not sure if it's entirely true, but for the original Crays, first an
> engineer came to try and get it to work; if not, Seymour gave it a try
> before shipping a replacement. Likely because he tortured the
> electromechanical properties around the central part so much that it was
> touchy-feely.
>
> Anyone intelligible on this topic likely has only a passing interest, for
> having a gray beard and being sick and tired of "what did Cray do" and
> "what if he had set a more reasonable goal than 10x the closest competitor".
> 

That is how that machine worked. It was also synonymous with
supercomputing, which essentially died with the company.

The only relevant reason to have a demon as a logo for UNIX is as an
allusion to Maxwell's tortured physics demon.

Can anyone who is familiar with this, and not a pundit, correct me?



Re: Cultural underground legend Seymour Cray and his legacy

2021-04-22 Thread Balder Oddson
On Thu, Apr 22, 2021 at 12:28:28AM +0200, Balder Oddson wrote:
> Whereof everyone is interested,
> 
> 
> 
> A few things about his architecture are extraordinarily special.
>
> #1 ideal properties, can never be done better for some things.
> #1.1 analogue, you need ground and a good drain to do work during weak
> force pull.
> #1.2 physical, independent ICs, relying on physics for synchronization.
> #1.2.1 allowing digital global sync between die slots, async, but local
> sync with global clock.
> #1.3 as a Turing machine, everything is virtually represented with
> arrays of addresses in contiguous memory.
> #1.3.1 You get scalar operations on your vectors with SIMD instructions.
> #1.3.2 Remotely scatter data in remote memory, which is gathered into
> another contiguous area of memory with addresses to the data.
> 
> 
> On the one hand, where this gives 8x the performance at a high price, it
> likely caused as much awe, inspiration and anxiety in the finance sector,
> where Cray got the funding to research, build and sell these beasts.
>
> The Cult of the Holy Cow and The Cult of the Dead Cow are oxymorons if
> the contexts and historic circumstances are considered.
>
> Using hex numbers would ideally imply an understanding of the Cray
> architecture, and why it can perhaps now be software-defined.
> 

The puns were uninviting, and didn't inspire snide remarks or comments
that weren't drivel without content and context.

Hence an interest in logic has invited investigation of tautologies as
a concept in logic: whereof one cannot speak, one merely adds drivel.

Not sure if it's entirely true, but for the original Crays, first an
engineer came to try and get it to work; if not, Seymour gave it a try
before shipping a replacement. Likely because he tortured the
electromechanical properties around the central part so much that it was
touchy-feely.

Anyone intelligible on this topic likely has only a passing interest, for
having a gray beard and being sick and tired of "what did Cray do" and
"what if he had set a more reasonable goal than 10x the closest competitor".


Ciao,
Balder



Re: Cultural underground legend Seymour Cray and his legacy

2021-04-22 Thread Balder Oddson
On Thu, Apr 22, 2021 at 10:24:32AM +0200, Marc Espie wrote:
> Is this a new UMF experiment ?

Does it involve integrating this on a chip? Not sure if past successes
are that great.

-- 
Balder Oddson



Cultural underground legend Seymour Cray and his legacy

2021-04-21 Thread Balder Oddson
Whereof everyone is interested,



A few things about his architecture are extraordinarily special.

#1 ideal properties, can never be done better for some things.
#1.1 analogue, you need ground and a good drain to do work during weak
force pull.
#1.2 physical, independent ICs, relying on physics for synchronization.
#1.2.1 allowing digital global sync between die slots, async, but local
sync with global clock.
#1.3 as a Turing machine, everything is virtually represented with
arrays of addresses in contiguous memory.
#1.3.1 You get scalar operations on your vectors with SIMD instructions.
#1.3.2 Remotely scatter data in remote memory, which is gathered into
another contiguous area of memory with addresses to the data.
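
As a rough illustration of #1.3.1 and #1.3.2, here is a minimal sketch in
plain Python of gathering through an array of addresses, applying one scalar
to the whole vector, and scattering the results back. The names (gather,
scatter, memory, indices) and the values are made up for the example; this
is not a claim about how any Cray actually implemented it.

```python
def gather(memory, indices):
    """Collect memory[i] for every index into one contiguous list."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Return a copy of memory with values[k] written to indices[k]."""
    out = list(memory)
    for i, v in zip(indices, values):
        out[i] = v
    return out


# Hypothetical data: eight memory words and an address vector.
memory = [10, 20, 30, 40, 50, 60, 70, 80]
indices = [6, 1, 4]

packed = gather(memory, indices)           # contiguous working vector
scaled = [2 * x for x in packed]           # one scalar applied to each element
result = scatter(memory, indices, scaled)  # written back through the addresses
```

The address vector is the only thing that ties the packed vector back to
where the data lives, which is the point of #1.3.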


On the one hand, where this gives 8x the performance at a high price, it
likely caused as much awe, inspiration and anxiety in the finance sector,
where Cray got the funding to research, build and sell these beasts.

The Cult of the Holy Cow and The Cult of the Dead Cow are oxymorons if
the contexts and historic circumstances are considered.

Using hex numbers would ideally imply an understanding of the Cray
architecture, and why it can perhaps now be software-defined.


-- 
Balder Oddson



Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

2021-04-03 Thread Balder Oddson
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
> 
> > On 2 Apr 2021, at 14:17, Benjamin Baier wrote:
> > 
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
> > 
> 
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up:
6 segments in a circle, with flat sides and close together.
One control line for double data rate to the opposite segment and its
neighbours, such that the only data path goes straight forward.
Let's imagine that each segment is the equivalent of 16*32-bit vector
operations per core per cycle, and that the chip matches the speed of
light across this hexagon or whatever, such that you can pull and push
on this link so hard you cause bremsstrahlung from trying to go too fast
in parts of the segment or chip, killing parts of it over time and
leaving it inoperable during the operation.
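
To make the wiring concrete, a small sketch of which segments one control
line reaches in a 6-segment ring, plus the data width per cycle under the
16*32-bit assumption above. Segment numbering 0..5 and the function names
are my own; only the "opposite segment and its neighbours" rule and the
16*32 figure come from the description.

```python
SEGMENTS = 6                # six segments in the ring, as described
LANES = 16                  # assumed vector lanes per segment
BITS_PER_LANE = 32          # assumed lane width
BITS_PER_CYCLE = LANES * BITS_PER_LANE


def opposite(seg, n=SEGMENTS):
    """Segment directly across the ring from seg."""
    return (seg + n // 2) % n


def control_targets(seg, n=SEGMENTS):
    """The opposite segment together with its two neighbours."""
    opp = opposite(seg, n)
    return [(opp - 1) % n, opp, (opp + 1) % n]


targets_of_0 = control_targets(0)   # segments reached from segment 0
targets_of_5 = control_targets(5)   # wraps around the ring
```

With six segments, the three targets are exactly the segments that are not
the sender or one of its own neighbours.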

Before you say that it's insane to run this at 10 GHz, and that the von
Neumann architecture is better or has a better-tuned pipeline:
I'll pump my neighbouring nodes at full speed.
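
For a sense of scale on the 10 GHz figure, a back-of-the-envelope sketch of
how far a signal can travel in one clock cycle. The speed of light in vacuum
is exact; the velocity factor for a real wire is a placeholder parameter,
and the 80 MHz comparison is the commonly cited Cray-1 clock rate.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s (exact by definition)


def distance_per_cycle(freq_hz, velocity_factor=1.0):
    """Metres a signal covers in one clock period.

    velocity_factor is the fraction of c reached in the medium; 1.0 means
    vacuum. Real cables are slower, and the value to use is an assumption.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    return C_VACUUM * velocity_factor / freq_hz


at_10_ghz = distance_per_cycle(10e9)   # about 3 cm per cycle in vacuum
at_80_mhz = distance_per_cycle(80e6)   # about 3.7 m per cycle in vacuum
```

At 10 GHz the whole ring has to fit within roughly 3 cm of signal path per
cycle even in the ideal case, which is why wire length ends up in the spec.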

Each clock cycle gives each segment the state of 0xfeedbeef, 0xdeadbeef,
0xbeef or 0xfeedface.

So the two neighbouring segments do deadbeef and use the beefy link to
pump data to the other half of the CPU; I'll start doing remote DDR SRAM
operations to drive it as a von Neumann chip.

Which patent would you suggest for this, if the important vectorization
is done in software, in a UNIX model that should run on it, where some
things are physical necessities, like a UNIX console to a segment and a
daemon that filters instructions and data and handles address space?

You have your big lock that mainly creates the machine state every clock
cycle. There are six fully functional segments that must initialise and
run a local terminal.

Very few have a relationship to Cray; I don't, neither to the original
nor to modern Crays. If you open up a Cray to try and work out how it
works, you find empty space with a bunch of wires, get angry at the evil
inside and go with a bunch of DECs, as that doesn't involve physics
shenanigans and actually has the important part inside.
But it is easier to tweak your digital spec based on the length of wires.
There was possibly even a reason for picking Intel, as they focused on
the part everyone liked about IBM compared to Crays.




Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

2021-04-02 Thread Balder Oddson
On Fri, Apr 02, 2021 at 02:39:42PM +0200, Balder Oddson wrote:
> Made of three processing rings, with 3 control wires, to the directly
> opposite ring segment and its two neighbours; this is your double data
> rate, or dead beef, and the global clock. The local clock is the segment
> and its immediate neighbours. Stack three of them, add a dimension to the
> topology, and add as many datapaths as possible between the faster parts
> of the system, with digital sync between the local clock and the speed of
> light in vacuum. This is an architecture where scatter-gather is extremely
> useful, as that works on the global clock. So a total of 18 dies and a
> very difficult juggling act, where cable lengths are legendary for the
> premium original Crays. If you think you have a problem with your local
> segment, just feed beef.
> 
> There are not many explanations of this architecture around, only cultural
> references like Cult of the Dead Cow as a pun, and wishes on those that
> occupied the whole system. Has anyone been around a real one to know?
> If you want to know what's inside a Cray, it's basically evil inside, if
> you thought that would reveal something.
> 

Yes and no, as this likely works because:
With direct wires and the shortest distance, the speed of light in the
material is the clock. The simplest setup is one ring with 6 sockets;
what's on each segment is a beef, or a processor as usual. Guarantees on
digital sync that it knows:
#1 being written to, or writing to another.
#2 that you are beef, and may or may not be doing a shared task.
#3 idle or beef, exception level, local/global root.

This is important, as the digital clock should be the same as the
wired clock, where the die clock can skew just fine as long as being in
the state of feedbeef or deadbeef is very tight. This is the
general-purpose brute-force method you have, of scattering instructions in
memory to your exact opposite node in the circle, with or without your
neighbours. This allows wiggle room where this may work, and where it pays
to spend extra on cooling and perhaps carbon nanotubes for the wires to
make this cache-coherent beast fly.

Hence these pop-culture references like feedbeef, deadcow, deadbeef and
feedface (terminal), and likewise the temptation to call it a
scalar-vector machine data core, as it's not an inefficient or rubbish
architecture, just complicated around this 6-segment configuration.

Due to the ability to skew, it's practically going faster than the speed
of light, on the premise that it is cache coherent with control wires
to the directly opposite node and its neighbours, not your own, with just
one datapath across with wires for each segment. You SIMD and vector
scatter and gather as if it weren't for Cray aspirations in most things
ever since.

And it should be open for relying on some ideal properties and quirks.
How that system would behave and make noise I don't know, but you could
likely guess when it was writing the results, or gathering it in memory.

Doubt this would be interesting to bitcoin, but you should be able to
scrub any size link you can fit on a segment.

There are many old and cool antique architectures, but Cray is the premier
one: he promised 10x performance and delivered. You're not likely to get
one on eBay to boot BSD on, and I'm not sure you can get the OS or
blueprints either.




The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

2021-04-02 Thread Balder Oddson
Made of three processing rings, with 3 control wires, to the directly
opposite ring segment and its two neighbours; this is your double data
rate, or dead beef, and the global clock. The local clock is the segment
and its immediate neighbours. Stack three of them, add a dimension to the
topology, and add as many datapaths as possible between the faster parts
of the system, with digital sync between the local clock and the speed of
light in vacuum. This is an architecture where scatter-gather is extremely
useful, as that works on the global clock. So a total of 18 dies and a
very difficult juggling act, where cable lengths are legendary for the
premium original Crays. If you think you have a problem with your local
segment, just feed beef.
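
A minimal sketch of the die count and the two clock groups, assuming three
stacked rings of six segments each and numbering dies as (ring, segment).
The function names are made up, and the datapaths between stacked rings are
left out because the description doesn't pin them down.

```python
RINGS = 3               # three stacked processing rings
SEGMENTS_PER_RING = 6   # six segments per ring (assumed from 18 dies total)


def all_dies():
    """Every die as a (ring, segment) pair."""
    return [(r, s) for r in range(RINGS) for s in range(SEGMENTS_PER_RING)]


def local_group(ring, seg):
    """Local clock: the segment and its immediate neighbours in the ring."""
    n = SEGMENTS_PER_RING
    return [(ring, (seg + d) % n) for d in (-1, 0, 1)]


def global_group(ring, seg):
    """Global clock / DDR: the opposite segment and its two neighbours."""
    n = SEGMENTS_PER_RING
    opp = (seg + n // 2) % n
    return [(ring, (opp + d) % n) for d in (-1, 0, 1)]


die_count = len(all_dies())
```

For any segment, the local group and the global group together cover the
whole ring with no overlap, so every die is on exactly one of the two clocks
from a given segment's point of view.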

There are not many explanations of this architecture around, only cultural
references like Cult of the Dead Cow as a pun, and wishes on those that
occupied the whole system. Has anyone been around a real one to know?
If you want to know what's inside a Cray, it's basically evil inside, if
you thought that would reveal something.

-- 
Balder Oddson



The old argument for the original Cray architecture

2021-04-01 Thread Balder Oddson


Everyone knows the most common classical architectures.

The ideal solution is to use many chips to make one beefy one: for lack
of a better word due to the difference, a data core, something that can
be referred to as super or formula1.

It supports two configurations, circular or horizontal pie segments.
Each segment of a circle has DDR on the digital clock of the "beef".
Perhaps ideally superconducting, to increase available space and speed.
The first thing any segment needs to ask electrically is whether it is
deadbeef or feedbeef: do I have the UNIX console, or does another one
have it? Am I single data rate and feed beef, or double data rate and
dead beef?

If you have sync on double data rate, you are deadbeef; if not, feedbeef.
You have a local clock that is always good, then you have this internal
structure where speed is more important, as it's the global clock that
should ideally match the local clock in speed. By more modern standards,
there would be something better than direct wires between segments.
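
The deadbeef/feedbeef rule can be written down as a tiny sketch. The enum
and function names are hypothetical; only the two magic numbers and the rule
"DDR sync means deadbeef, otherwise feedbeef" are taken from the text.

```python
from enum import IntEnum


class SegmentState(IntEnum):
    """State words a segment reports; the values are the usual hex puns."""
    FEEDBEEF = 0xFEEDBEEF  # single data rate, no sync on the DDR link
    DEADBEEF = 0xDEADBEEF  # double data rate, in sync on the DDR link


def classify(has_ddr_sync):
    """Apply the rule: sync on double data rate means deadbeef."""
    if has_ddr_sync:
        return SegmentState.DEADBEEF
    return SegmentState.FEEDBEEF


synced = classify(True)
unsynced = classify(False)
```

Who holds the UNIX console is a separate question for each segment and is
not modelled here.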

A virtual Cray architecture can be done with SR-IOV and MR-IOV to handle
device addresses, and likewise with IOMMU and hardware virtualization,
to achieve ideal properties around electrical and physical properties by
creating this hardware mapping using aarch64 EL3, and
treating the processor as a classical Cray scalar-vector machine.

Whether you connect each segment to memory or a data link shouldn't
matter for the architecture itself, and gather-scatter and
scatter-gather don't give you an ideal ethernet switch, but it can
probably act as a hub for such a protocol as well.


I think this is the ideal general-purpose architecture that something like
BSD was meant to run on, or was striving towards.

For IT security and performance, feed beef was the right answer for
decades if you could get a Cray.

Vectorizing pf towards scalar-vector operations is a more viable option
where security and performance both matter, given the inherent qualities
of a real Cray architecture, which is bad at doing one thing at a time
very few times. Maybe something that looks like a supercomputer will be
built again. Can a monster be built to handle the largest internet cable
in the world?


-- 
Balder Oddson