Re: [ot][spam][crazy] draft: learning RL

Undiscussed Horrific Abuse, One Victim of Many Mon, 09 May 2022 01:42:06 -0700

On Mon, May 9, 2022, 4:40 AM Undiscussed Horrific Abuse, One Victim of Many
<gmk...@gmail.com> wrote:


>
>
> On Mon, May 9, 2022, 4:38 AM Undiscussed Horrific Abuse, One Victim of
> Many <gmk...@gmail.com> wrote:
>
>>
>>
>> On Mon, May 9, 2022, 4:22 AM Undiscussed Horrific Abuse, One Victim of
>> Many <gmk...@gmail.com> wrote:
>>
>>>> To represent normal goal behavior with maximization, the return function
>>>> must not only be incredibly complex but also feed back into its own
>>>> evaluation, in a way these libraries do not provide for.
>>>>
>>>
>>> Anything inside the policy that can change should be part of its
>>> environment state.
>>>
>>
There is censorship here: many important parts of the idea are left out,
and what remains focuses on only one projection of the error.

The concern is a strong norm of acting before observing, a habit known
to cause severe errors regardless of training and practice.
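
A rough sketch of the quoted idea, in case it helps: the observation is
taken before each action, and it includes the part of the policy that can
change. Everything here (Policy, observe, run, step_size, the toy update
rule) is made up for illustration and is not the API of any RL library.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical mutable parameter: the only part of this policy that changes.
    step_size: float = 1.0

    def act(self, obs):
        # Act only on what was observed, including the policy's own step size.
        direction = 1.0 if obs["target"] > obs["position"] else -1.0
        return direction * obs["policy_step_size"]

    def update(self, reward):
        # Toy rule (an assumption): halve the step when a move did not help.
        if reward <= 0:
            self.step_size *= 0.5


def observe(position, target, policy):
    # The observation exposes the policy's changeable state next to the
    # environment's own state.
    return {
        "position": position,
        "target": target,
        "policy_step_size": policy.step_size,
    }


def run(steps=5, target=1.5):
    policy, position, history = Policy(), 0.0, []
    for _ in range(steps):
        obs = observe(position, target, policy)  # observe first
        action = policy.act(obs)                 # then act
        new_position = position + action
        # Reward is how much closer the move brought us to the target.
        reward = abs(target - position) - abs(target - new_position)
        policy.update(reward)
        position = new_position
        history.append(obs)
    return history
```

Calling run() returns the list of observations in order, so the changes to
the policy's own state are visible in what the policy was shown.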


>>> This should be done even if it doesn't help, because observing before
>>> acting is so important, in all situations.
>>>
>>
>> There is unexpected conflict around this combined expression of more
>> useful processes, and safer observation before influence. I believe this is
>> important (if acontextual), and wrong only in ways that are smaller than
>> the eventual problems it reduces, but I understand that my perception is
>> incorrect in some way.
>>
>
> I am hearing/guessing that the problem is that the information is designed
> for human consumption rather than automated consumption, and the harm is
> significantly increased when automated consumption happens before human
> consumption.
>
>>
