On Mon, May 9, 2022, 8:12 AM Undiscussed Horrific Abuse, One Victim of Many
<gmk...@gmail.com> wrote:

>
>
> On Mon, May 9, 2022, 8:05 AM Undiscussed Horrific Abuse, One Victim of
> Many <gmk...@gmail.com> wrote:
>
>> To represent normal goal behavior with maximization, the
>>>>>>>>
>>>>>>>
> This is all confusing to me, but normally when we meet goals we don't
> influence things unrelated to the goal. That constraint isn't usually part
> of maximization, unless
>
> return function needs to not only be incredibly complex, but
>>>>>>>>
>>>>>>>
> the return being maximized were to include them, maybe by always being
> 1.0; I don't really know.
>
> also feed back to its own evaluation, in a way not
>>>>>>>>
>>>>>>>
> Maybe this relates to not learning habits unrelated to the goal, habits
> that would influence other goals badly.
>
> provided for in these libraries.
>>>>>>>>
>>>>>>>
> But something different is doing the thinking at this time. It is the role
> of one part of a mind to try to relate to the other parts. Improving this
> in a general way is likely well known to be important.
>
>
>> Daydreaming: I'm thinking of how, in ordinary reality, we have many goals
>> going at once (most of them "common sense" and/or "staying a living
>> human"). Similarly, I'm thinking of how, with normal transformer models,
>> one trains according to a loss rather than a reward.
>>
>> I'm considering: what if the interesting case were when an agent _fails_
>> to meet a goal? Its reward would usually be full, 1.0, but would be
>> multiplied by losses when goals are not met.
>>
>> This seems much nicer to me.
>>
>
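
As an aside on the "loss rather than a reward" point: a transformer is
typically trained by minimizing something like a next-token cross-entropy
loss; there is no explicit reward signal anywhere. A minimal sketch in
PyTorch, where the toy model and shapes are made up to stand in for a real
transformer:

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size = 100
# Toy stand-in for a transformer: anything mapping token ids to
# next-token logits plays the same role here.
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))

tokens = torch.randint(0, vocab_size, (8, 16))   # (batch, sequence) token ids
logits = model(tokens[:, :-1])                   # predict each next token
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),              # (batch*seq, vocab)
    tokens[:, 1:].reshape(-1),                   # shifted targets
)
loss.backward()  # training only pushes this loss down; no reward anywhere
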
I don't know how RL works since I haven't taken the course, but from a
distance it looks to me like it would just learn at a different (slower) rate
[with other differences]
> yes
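
To make the "1.0 multiplied by losses" idea above concrete, here is a rough,
hypothetical sketch. The goal names, the loss functions, and the
1 / (1 + loss) conversion from a loss into a multiplicative factor are all
assumptions for illustration, not anything the libraries provide:

from typing import Callable, Dict

def multiplicative_reward(state: Dict, goal_losses: Dict[str, Callable]) -> float:
    """Reward starts at a full 1.0 and shrinks for every unmet goal."""
    reward = 1.0
    for name, loss_fn in goal_losses.items():
        loss = loss_fn(state)            # 0.0 when the goal is met
        reward *= 1.0 / (1.0 + loss)     # met goal -> factor 1.0; unmet -> < 1.0
    return reward

# Many goals at once: "common sense" goals plus a task goal (all made up).
example_goals = {
    "stay_alive":         lambda s: 0.0 if s["alive"] else 10.0,
    "avoid_side_effects": lambda s: s["impact"],
    "reach_target":       lambda s: s["distance_to_target"],
}

state = {"alive": True, "impact": 0.2, "distance_to_target": 1.5}
print(multiplicative_reward(state, example_goals))   # below 1.0: some goals unmet

With every goal met, every factor is 1.0 and the reward stays at its full
1.0; each unmet goal only shrinks it, which matches the "usually full"
framing above.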

>
