On Thu, Dec 8, 2011 at 10:04 PM, Takatoshi MATSUO <matsuo....@gmail.com>wrote:

> Hello Serge
>
> 2011/12/8 Serge Dubrouski <serge...@gmail.com>:
> >
> >
> > On Mon, Dec 5, 2011 at 9:15 PM, Takatoshi MATSUO <matsuo....@gmail.com>
> > wrote:
> >>
> >> Hello Serge
> >>
> >> Serge Dubrouski <serge...@gmail.com>:
> >> > Hello -
> >> >
> >> > Takatoshi MATSUO did a tremendous job on implementing support for
> >> > streaming
> >> > replication feature in pgsql RA. Also it looks like PostgeSQL 9.1 has
> >> > all
> >> > necessary interfaces to successfully implement  Pacemaker's M/S
> concept.
> >> > So
> >> > I think it's time to start discussion on how to merge Takatoshi's work
> >> > into
> >> > pgsql RA baseline. Here is the link to Takatoshi's GitHUB if somebody
> >> > wants
> >> > to test his RA:
> >> >
> >> > https://github.com/t-matsuo/
> >> >
> >> > So far I tested it for backward compatibility in a standard
> >> > non-replication
> >> > mode  and also tested M/S model and found no real issues. Though it
> >> > definitely requires some more polishing and testing.
> >> >
> >> > Takatoshi, here are some changes that I want to discuss with you:
> >> >
> >> > 1. Is it possible to add a check for PostgreSQL version and fail with
> >> > OCF_ERR_INSTALLED when one tries to start replication on version less
> >> > than
> >> > 9.1? A simple cat on PG_VERSION with some analysis would probably do.
> >>
> >> I'll add a check.
>
> I added a check.
>
> https://github.com/t-matsuo/resource-agents/commit/3ab7cfdcce118043cd149b348740e50e7a946eb3
>
> >>
> >> > 2. I think that following lines should be moved from pgsql_start to
> >> > pgsql_validate_all
> >> >
> >> >  535     # Check whether tmpdir is readable by pgdba user
> >> >  536     if ! runasowner "test -r $OCF_RESKEY_tmpdir"; then
> >> >  537         ocf_log err "Directory $OCF_RESKEY_tmpdir is not readable
> >> > by
> >> > $OCF_RESKEY_pgdba"
> >> >  538         return $OCF_ERR_PERM
> >> >  539     fi
> >>
> >> Thanks. I think so too.
> >> I'll fix it.
> >>
>
> I fixed it and I deleted a check for tmpdir existence
> because the checking for permittion fills the role.
>
> https://github.com/t-matsuo/resource-agents/commit/82d4939486bcca429e2deb804d7faf756099bb59
>
>
> > On a second thought I'm not sure why we need that parameter and
> directory at
> > all. Why not to create rep_mode.conf, PGSQL.lock and xlog.note in
> > $OCF_RESKEY_pgdata ? What problems it can create?
> >
> > One more advantage to do
> > it in $OCF_RESKEY_pgdata is an ability to handle more than PostgreSQL
> > instance on the same server without a need for additional temp
> directories.
>
> When backup is needed, customers may backup these files and restore it.
> It may cause problems.
> Specially PGSQL.lock causes an error on start.
>
> I think that they should be treated as a separate thing.
> because they are independent from PostgreSQL's data.
>

Ok. Then may be it should default to something like
${OCF_RESKEY_pgdata}/temp and RA should create it and set the right
ownership and permissions if it doesn't exist? And again may be in this
case we don't need that parameter?



>
> Incidentally I considered to handle more than PostgreSQL instance.
> Initially I added port number to these filenames, but I deleted it
> to simplify filenames.
>
> https://github.com/t-matsuo/resource-agents/commit/b16faf2d797200048dc0fc07a45b6751cf5be190
>
>
> > Also I think it would be good if RA was able to take care of adding
> "include
> > $WHATEVER_DIR/rep_mode.conf" in postgresql.conf. It will make the RA self
> > sustainable. In a current situation admin has add that directive
> manually.
> > RA though can something like this in a start function for replication
> mode:
> >
> > if ! grep -i "include $WHATEVER_DIR/rep_mode.conf" $OCF_RESKEY_config
> > then
> >      echo "include $WHATEVER_DIR/rep_mode.conf" >> $OCF_RESKEY_config
> > fi
>
> Sounds good.
>
> > Don't know if it makes sense to remove it on stop.
>
> I think it doesn't make sense to remove it,
> because rep_mode.conf becomes empty on stop.
>

The only problem here if admin changes temp_dir parameter but doesn't
delete records from postgres.conf. We could end up with several include
records then or with records pointing to the non existing files.



> >> > 3. I don't really like this part of pgsql_real_monitor:
> >> >
> >> >  775     if ! is_replication; then
> >> >  776         OCF_RESKEY_monitor_sql=`escape_string >
> >> > "$OCF_RESKEY_monitor_sql"`
> >> >  777         runasowner -q $loglevel "$OCF_RESKEY_psql $psql_options
> -c
> >> > > '$OCF_RESKEY_monitor_sql'"
> >> >  778         rc=$?
> >> >  779     else
> >> >  780         output=`su $OCF_RESKEY_pgdba -c "cd $OCF_RESKEY_pgdata; >
> >> > $OCF_RESKEY_psql $psql_options -Atc \"${CHECK_MS_SQL}\""`
> >> >  781         rc=$?
> >> >  782     fi
> >> >
> >> > I think that functional monitor (the one that uses monitor_sql) should
> >> > run
> >> > always independently of DB mode since its primary role is to check
> data
> >> > and
> >> > fail if it's not correct or corrupted. In replication mode there
> should
> >> > be
> >> > additional monitoring. Other way it misleads customer on a usage of
> >> > monitor_sql.
> >>
> >> All right.
> >>
> >> Does it need to execute "select now();" if monitor_sql parameter is
> empty?
> >> I think it's unnecessary.
> >
> >
> > For a case of an empty parameter you are right and running "select now()"
> > probably unnecessary, Non-empty monitro_sql shall be executed in my
> opinion.
> >
> >>
> >>
> >> > 4. You already populate several attributes with crm_attribute. Does it
> >> > make
> >> > sense to populate a name of a Master node in promote function and use
> it
> >> > later on instead of running crm_mon on each monitor command?
> >> >
> >>
> >> Do you mean that you don't want to use crm_mon in monitor?
> >
> >
> > I prefer to have as few dependencies  on external programs as possible.
>
> Agree.
>
> > Using crm_attribute to communicate the name of a master node would be
> > consistent with the rest of the script since you already use it for
> > communicating state of the nodes. If you prefer using crm_mon than you
> have
> > to add a check that that binary exists on the server. In 99.99% of the
> case
> > it will but still the check is necessary I think.
>
> I use crm_mon to check for node existence and node online too.
> It is impossible to get it from attribute, so I prefer to using crm_mon.
>
> Do I have to add a check for crm_master, crm_attribute, crm_failcount too?
>
>
That's a different  interesting topic. What is the reason for doing this?

my_fail_count=`$CRM_FAILCOUNT -r $OCF_RESOURCE_INSTANCE -N $HOSTNAME -G -Q
| sed "s/INFINITY/1000000/g"`
if [ "$my_fail_count" != "0" ]; then
      ocf_log info "I don't start PostgreSQL on post-demote because of my
fail-count=$my_fail_count."
      return $OCF_SUCCESS
fi

Why not to leave pacemaker to do its job on managing failcounts?



>
> >>
> >>
> >> > 5. It also requires some changes in terms of a proper English but we
> can
> >> > go
> >> > through it later.
> >>
> >> I hope your help.
> >
> >
> > Will do.
> >
> >>
> >>
> >> >
> >> > And yet again thanks for a brilliant work.
> >> >
> >> Thank you for the compliment.
> >>
> >> > Florian, Dejan how would you like to merge a patch when we are ready?
> >> > The
> >> > patch will be rather big one and AFAIK you have some policy on the
> >> > amount of
> >> > changes for one patch.
>
>
> Regards,
> Takatoshi MATSUO
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>



-- 
Serge Dubrouski.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to