>You can always check what kind of data the program gives to the neural
network as the program is free software. If the data is valid runtime
input it is also valid training data.

That's not necessarily true. For example, an image-generating program
is trained on image + caption pairs, but running it involves giving it
just the captions. So seeing what the program feeds the model at
runtime doesn't inherently show you how to retrain the model.
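
To make the distinction concrete, here is a minimal Python sketch. It
assumes a toy one-layer linear model standing in for an image
generator; the names generate() and train_step() and all the numbers
are made up for illustration and don't come from any real program.

```python
# Toy stand-in for a caption-to-image model: a single linear layer.
# Every name and value here is hypothetical, for illustration only.

def generate(weights, caption_vec):
    """Inference: the only input is the caption encoding."""
    return [sum(w * x for w, x in zip(row, caption_vec)) for row in weights]

def train_step(weights, caption_vec, target_image, lr=0.1):
    """Training: needs the caption AND the image it should produce."""
    predicted = generate(weights, caption_vec)
    # Squared-error gradient for a linear layer: (pred - target) * input
    return [
        [w - lr * (p - t) * x for w, x in zip(row, caption_vec)]
        for row, p, t in zip(weights, predicted, target_image)
    ]

weights = [[0.0, 0.0], [0.0, 0.0]]
caption = [1.0, 0.0]   # runtime input: visible by reading or running the program
image = [0.5, -0.5]    # training target: never passed in at inference time

for _ in range(100):
    weights = train_step(weights, caption, image)

print(generate(weights, caption))  # close to the target image values
```

Reading generate() tells you what a caption vector looks like, but
nothing about the shape or encoding of target_image. That only shows
up in train_step(), which the runtime program doesn't need to ship.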

>You can't exactly *know* that any extra training doesn't break the model
but the same holds for editing the original training data.

With the original training data, you can know with more certainty that
it doesn't break the model.

On Tue, Aug 1, 2023 at 11:46 PM Saku Laesvuori <s...@laesvuori.fi> wrote:
>
> > >If you know how to convert the blob to weights in the neural network
> > >(something the program has to do to make any use of the blob) and know
> > >the error function, you can continue the training with new data.
> >
> > Yeah, I get that, but you don't necessarily know what the weights
> > mean. Let's charitably assume you know the blob works on image data
> > (instead of audio data or whatever). Do you know if it needs to be
> > trained on images of a particular size, or color depth, or encoding,
> > or color format, etc.? And what about models for more complex data
> > than images like genetic data?
>
> You can always check what kind of data the program gives to the neural
> network as the program is free software. If the data is valid runtime
> input it is also valid training data.
>
> > How do you know you're not going to end up with a network that spews
> > out invalid garbage if you re-train it with things that are
> > incompatible with the original training dataset? And how do you know
> > that, beyond trial and error, unless you have the original dataset?
>
> You can't exactly *know* that any extra training doesn't break the model
> but the same holds for editing the original training data. It is only
> very likely that training with new data improves the model, but you
> can't know it before you try.
>
> In this specific case we also do have access to the training data. We
> just don't want to spend the computing resources on training the model
> from scratch.
