> > > > > > > > > > > > [so how did they get deepseek running on this thing?
> > > > > > > > > > > > the page on it
> > > > > > > > > > > > has a link to their OS image ...
> > > > > > > > > > >
> > > > > > > > > > > https://www.eswincomputing.com/en/bocupload/2024/06/19/17187920991529ene8q.pdf
> > > > > > > > > > > indicates that there are GPU drivers for normal
> > > > > > > > > > > python-based
> > > > > > > > > > > frameworks for this CPU ("Pytorch, Tensorflow,
> > > > > > > > > > > PaddlePaddle, ONNX,
> > > > > > > > > > > etc")
> > > > > > > > > >
> > > > > > > > > > i wonder if the lesson is that if you port a mainstream
> > > > > > > > > > language model
> > > > > > > > > > to a small chip and then sell it, everybody will buy it.
> > > > > > > > >
> > > > > > > > > OOPS
> > > > > > > >
> > > > > > > > not today it seems
> > > > > > > >
> > > > > > > > there's still some interest in image generation, but it mostly
> > > > > > > > assumes
> > > > > > > > that there's already a local way to do this.
> > > > > > > >
> > > > > > > > i'm on windows right now, hopefully temporarily. i use wsl2
> > > > > > > > ubuntu :s :s :s
> > > > > > >
> > > > > > > so maybe network servic--- [becau--
> > > > > >
> > > > > > what if we used a super tiny model? maybe that's interesting
> > > > > > or music synthesis or something !
> > > > >
> > > > > pulling away from httptransformer could help sort things out a
> > > > > little. it was definitely never designed for diffusion models
> > > >
> > > > [machine learning in general is designed in opposition to the things
> > > > people like me try to do, the design choices are based around lots of
> > > > infrastructure and minimal algorithmic resear
> > >
> > > maybe let's try a tiny diffusion model or something
> > >
> > > the huggingface diffusers architecture seems more flexible than the
> > > transformers architecture, they [seem to kind of parameterize their
> > > pipelines to load submodels and wire them together, of course it also
> > > looks hardcoded into constructor classes,
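for reference, that parameterization lives in each pipeline's model_index.json, which maps component names to (library, class) pairs that from_pretrained instantiates and wires together. roughly like this (abridged sketch from memory; exact fields and classes vary per pipeline):

```json
{
  "_class_name": "StableDiffusionPipeline",
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "tokenizer": ["transformers", "CLIPTokenizer"],
  "scheduler": ["diffusers", "PNDMScheduler"]
}
```

so the wiring is data-driven at load time, even though each pipeline class still hardcodes its component slots in its constructor.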
> >
> > ran into a problem needing further exploration regarding
> > torch.unsqueeze(NetTensor):
> > sometimes this was returning a NetTensor for me, other times a
> > tensor(device=meta), which is a bug.
> > placing a breakpoint in __torch_function__, my handler was not getting
> > called when tensor(device=meta) was returned.
> > a normal next step for me could be to step into the torch source
> > (which i have) and comprehend where and when __torch_function__ is
> > called and such.
> > if torch.unsqueeze() can't be hooked via the normal api then i could
> > kind of polyfill something in.
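a minimal way to check where dispatch goes is a torch.Tensor subclass whose __torch_function__ records every call routed through it. NetTensor below is just a stand-in sketch for the real class, not its actual implementation:

```python
import torch

seen = []  # names of functions that reached __torch_function__

class NetTensor(torch.Tensor):
    """stand-in subclass that records every call dispatched through it."""
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        seen.append(getattr(func, '__name__', str(func)))
        # fall through to the default behavior
        return super().__torch_function__(func, types, args, kwargs)

t = torch.tensor([1]).as_subclass(NetTensor)
out = torch.unsqueeze(t, 0)
print('unsqueeze dispatched?', 'unsqueeze' in seen)
print('result type:', type(out).__name__)
```

if 'unsqueeze' shows up in seen here but not in the failing path, the skip is specific to that code path rather than to the api itself.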
possible information on torch polyfills
>>> import torch, functools
>>> __unsqueeze = torch.unsqueeze
>>> torch.unsqueeze = functools.wraps(__unsqueeze)(lambda *ps, **ks:
...     [print('wrapping unsqueeze!'), __unsqueeze(*ps, **ks)][1])
>>> torch.unsqueeze(torch.tensor([1]),0)
wrapping unsqueeze!
tensor([[1]])
>>>
torch lets you assign to its public attributes, so functionality can be
added by simply writing functions that compose them.
of course it would be appropriate to file issues for any bugs found at
https://github.com/pytorch/pytorch/issues and that does of course
notify the world that you are trying to work around them
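if the builtin dispatch really is getting skipped somewhere, the assignment trick above could be extended to route through __torch_function__ by hand, using torch's own override helpers. a sketch, assuming has_torch_function/handle_torch_function in torch.overrides behave as documented:

```python
import functools
import torch
from torch.overrides import has_torch_function, handle_torch_function

_orig_unsqueeze = torch.unsqueeze

@functools.wraps(_orig_unsqueeze)
def _patched_unsqueeze(input, dim):
    # if any argument defines __torch_function__, dispatch to it
    # explicitly instead of trusting the builtin to do so
    if has_torch_function((input,)):
        return handle_torch_function(_orig_unsqueeze, (input,), input, dim)
    return _orig_unsqueeze(input, dim)

torch.unsqueeze = _patched_unsqueeze

# plain tensors take the fast path unchanged
print(torch.unsqueeze(torch.tensor([1]), 0))
```

passing _orig_unsqueeze as the public api means subclass handlers see the real function and calling it from inside a handler can't recurse into the patch.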
> > a simple approach would be to ensure this tensor is fetched in advance
> > or to patch the code using it, but this wouldn't address the problem
> > in other cases
> [[when i see errors that could be nondeterministic in an unexplained
> manner i wonder if they are a cybercompromise which stimulates [i
> guess my internal part that some identify as something like the slave
> boss character [possible mistake]] to make it harder to do what i am
> doing to avoid engaging it