the lab says huggingface's model hub, which I mostly use as a remote server to
store pretrained language models (and which sends data to my government on
when and where I use them), now has deep reinforcement learning models
available at
https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads

Here's the import code, retyped:

import gym

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login  # for uploading to your account from a notebook

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env

Of course, uploading to the hub is possibly a very bad idea unless you are
an experienced activist or researcher or spy, or have something important
to share with your government or huggingface, or are only doing this
casually and might get a job in it one day.

The lab then provides an intro to Gym, a python library openai made that, in
the opinion of my pessimistic half, has the effect of making it hard to move
technologies out of research, by verbosifying the construction of useful
environments under the assumption that they are only for testing model
architectures.

The lab says Gym is used a lot, and provides:
- an interface to create RL environments
- a collection of environments

This is true.
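To make that interface concrete, here's a toy sketch of an environment that follows the Gym shape, reset() returning an observation and step(action) returning (observation, reward, done, info), without importing gym at all. The class and its dynamics are invented for illustration; a real Gym env would subclass gym.Env and define action_space and observation_space too.

```python
class CountdownEnv:
    """Made-up toy env mimicking the Gym interface: start at 10,
    action 0 decrements the counter, action 1 does nothing.
    Episode ends when the counter reaches 0."""

    def __init__(self):
        self.state = None

    def reset(self):
        self.state = 10
        return self.state  # initial observation

    def step(self, action):
        if action == 0:
            self.state -= 1
        reward = 1 if action == 0 else 0  # reward making progress
        done = self.state == 0
        info = {}  # extra diagnostics would go here
        return self.state, reward, done, info

env = CountdownEnv()
observation = env.reset()
observation, reward, done, info = env.step(0)
print(observation, reward, done)  # 9 1 False
```

The point is just the calling convention: the environment, not the agent, computes and hands back the reward.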

They visually redescribe that an agent performs actions in an environment,
which then returns reward and state to the agent.

This coupling of reward with the environment, rather than with the agent,
which would usually have goals itself, is possibly part of the verbosifying.
Maybe "environment" is more "environment interface"; I'm actually having
trouble thinking here. I always get confused around gym environments. Maybe
jocks make better programmers nowadays.

Reiteration:

- Agent receives state S0 from the Environment
- Based on S0, agent takes action A0
- Environment transitions to a new frame, state S1
- Environment gives reward R1 to the agent.

Steps of using Gym:
- create environment using gym.make()
- reset environment to initial state with observation = env.reset()

At each step:
- get an action using policy model
- using env.step(action), get from the environment: observation (the new
state), reward, done (whether the episode terminated), info (a dict of
additional info)

If the episode is done, the environment is reset to its initial state with
observation = env.reset().
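Those steps can be sketched end-to-end with a stub environment and a random policy. The environment here is invented for illustration (a counter that ends the episode at zero), and sample_action() stands in for env.action_space.sample(); it is not a real Gym env.

```python
import random

class StubEnv:
    """Invented stub following the Gym loop: action 1 decrements a
    counter, action 0 does nothing; the episode ends at zero."""

    def reset(self):
        self.counter = 3
        return self.counter  # initial observation

    def step(self, action):
        self.counter -= action
        done = self.counter <= 0
        reward = 1.0 if done else 0.0
        return self.counter, reward, done, {}

    def sample_action(self):
        # stand-in for env.action_space.sample()
        return random.choice([0, 1])

random.seed(0)  # reproducible run
env = StubEnv()
observation = env.reset()
episodes_finished = 0

for _ in range(20):
    action = env.sample_action()  # a policy model would go here
    observation, reward, done, info = env.step(action)
    if done:
        episodes_finished += 1
        observation = env.reset()  # back to the initial state

print('episodes finished:', episodes_finished)
```

The shape of the loop (act, step, check done, reset) is the same whatever the environment is.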

This is very normative openai stuff that looks like it was read off a Gym
example from their readme or such.

It's interesting that huggingface is building their own libraries to pair
with this course as it progresses. I wonder if some of that normativeness
will shift even more toward increased utility.

Here's a retype of the first example code:

import gym

# create environment
env = gym.make('LunarLander-v2')

# reset environment
observation = env.reset()

for _ in range(20):
  # take random action
  action = env.action_space.sample()
  print('Action taken:', action)

  # do action and get next state, reward, etc
  observation, reward, done, info = env.step(action)

  # if the game is done (land, crash, timeout)
  if done:
    # reset
    print('Environment is reset')
    observation = env.reset()
