On Mon, May 9, 2022, 8:14 AM Undiscussed Horrific Abuse, One Victim of Many <gmk...@gmail.com> wrote:
>
> On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of
> Many <gmk...@gmail.com> wrote:
>
>> On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of
>> Many <gmk...@gmail.com> wrote:
>>
>>>> To represent normal goal behavior with maximization, the
>>
>> This is all confusing to me, but normally when we meet goals we don't
>> influence things unrelated to the goal. That is not usually included
>> in maximization, unless
>>
>>>> return function needs to not only be incredibly complex, but
>>
>> the return to be maximized were to include them, perhaps by always
>> being 1.0; I don't really know.
>>
>>>> also feed back to its own evaluation, in a way not
>>
>> Maybe this relates to not learning habits unrelated to the goal, which
>> would influence other goals badly.
>>
>>>> provided for in these libraries.
>>
>> But something different is thinking at this time. It is the role of a
>> part of a mind to try to relate with the other parts. Improving this
>> in a general way is likely well known to be important.
>>
>>> Daydreaming: I'm thinking of how, in reality and normality, we have
>>> many, many goals going at once (most of them "common sense" and/or
>>> "staying a living human"). Similarly, I'm thinking of how, with
>>> normal transformer models, one trains according to a loss rather
>>> than a reward.
>>>
>>> I'm considering: what if it were more interesting when an agent
>>> _fails_ to meet a goal? Its reward would usually be full, 1.0, but
>>> would be multiplied by losses when goals are not met.
>>>
>>> This seems much nicer to me.
>
> I don't know how RL works since I haven't taken the course, but it
> looks to me from a distance like it would just learn at a different
> (slower) rate [with other differences].

yes

I think it relates to the other inhibited concept, of value vs. action
learning. A reward starts at just the event of interest, for example, but
the system then learns to apply rewards to things that can relate to the
event, like preceding time points [states].
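
A minimal sketch of the multiplicative-loss reward idea quoted above,
assuming each goal reports a loss in [0, 1) where 0 means the goal is
fully met; the function name and the example losses are made up for
illustration, not from any actual RL library:

def multiplicative_reward(goal_losses):
    """Start from a full reward of 1.0 and shrink it per unmet goal."""
    reward = 1.0
    for loss in goal_losses:
        reward *= (1.0 - loss)  # a met goal (loss 0) leaves reward unchanged
    return reward

# Example: two goals met, one badly missed.
print(multiplicative_reward([0.0, 0.0, 0.9]))  # ~0.1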
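
And a rough sketch of that last idea, assuming a tabular TD(0) value
update: a reward given only at the final event of interest gradually
raises the values of the states that precede it. The alpha/gamma values
and the tiny trajectory here are arbitrary choices for illustration:

import collections

def td0_update(V, states, rewards, alpha=0.1, gamma=0.9):
    # rewards[t] is the reward received on leaving states[t]; each
    # state's value moves toward reward + discounted next-state value,
    # so a reward at the end flows backward to earlier states.
    for t in range(len(states)):
        next_v = V[states[t + 1]] if t + 1 < len(states) else 0.0
        V[states[t]] += alpha * (rewards[t] + gamma * next_v - V[states[t]])

V = collections.defaultdict(float)
states = ["a", "b", "c"]
rewards = [0.0, 0.0, 1.0]  # the event of interest happens only at "c"
for _ in range(200):
    td0_update(V, states, rewards)
print(dict(V))  # roughly {"a": 0.81, "b": 0.9, "c": 1.0}

The earlier states end up with smaller but nonzero values, which is the
"rewards applied to preceding time points" behavior, just learned rather
than hand-assigned.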