bug#48945: PostgreSQL + Cuirass Errors

2021-08-11 Mathieu Othacehe


Hello,

> After deleting the records with `starttime` equal to 0 from the `builds` table,
> Cuirass could start again, but the issue comes back after a while.

This is fixed by commits aa2f682facce5de727bdae5bbd5d1a2a27923ebb and
1dcaebc66097ce503bd827c7b28e0a0936c1daee.

Thanks,

Mathieu





bug#48945: PostgreSQL + Cuirass Errors

2021-07-19 Reza Alizadeh Majd
Hi, 

Is there any update on this issue?

I get the same "division by zero" error after running Cuirass for
a while.

After deleting the records with `starttime` equal to 0 from the `builds` table,
Cuirass could start again, but the issue comes back after a while.



--
Reza Alizadeh Majd
PantherX Team
https://pantherx.org





bug#48945: PostgreSQL + Cuirass Errors

2021-06-15 Eric Brown
Eric Brown  writes:

> Hello:
>
> Executive Summary:
> - Can't reinstall Cuirass and/or PostgreSQL
> - Divide by 0 error reported by postgres when computing metrics
>

An update on this:

I have reinstalled, and PostgreSQL now works. I think my problem was
trying to "reset" Cuirass by repeatedly removing it from config.scm
and shuffling UIDs/GIDs, etc. I think I can avoid this.

The other problem remains: Cuirass runs fine for a while and then
reports an error. It may be triggered by my adding a package build
rule after reconfiguring, but I think it has also appeared with just
these packages.




2021-06-14T09:35:01 Updating metric percentage-failure-10-last-eval-per-spec (my-texlive) to 0.0.
2021-06-14T09:35:01 Updating metric percentage-failure-100-last-eval-per-spec (my-texlive) to 18.42105263157895.
2021-06-14T09:35:01 Updating metric percentage-failed-eval-per-spec (my-texlive) to 18.42105263157895.
2021-06-14T09:35:01 Updating metric average-10-last-eval-duration-per-spec (my-xfce) to 31.0.
2021-06-14T09:35:01 Updating metric average-100-last-eval-duration-per-spec (my-xfce) to 31.0.
2021-06-14T09:35:01 Updating metric average-eval-duration-per-spec (my-xfce) to 31.0.
2021-06-14T09:35:01 Updating metric percentage-failure-10-last-eval-per-spec (my-xfce) to 0.0.
2021-06-14T09:35:01 Updating metric percentage-failure-100-last-eval-per-spec (my-xfce) to 0.0.
2021-06-14T09:35:01 Updating metric percentage-failed-eval-per-spec (my-xfce) to 0.0.
2021-06-14T09:35:01 Failed to compute metric average-eval-build-start-time (14335).
2021-06-14T09:35:01 Updating metric average-eval-build-complete-time (14335) to 12.0.
2021-06-14T09:35:01 Updating metric evaluation-completion-speed (14335) to 300.0.
2021-06-14T09:35:01 Failed to compute metric average-eval-build-start-time (14206).
2021-06-14T09:35:01 Updating metric average-eval-build-complete-time (14206) to 1.0.
2021-06-14T09:35:01 Updating metric evaluation-completion-speed (14206) to 3600.0.
2021-06-14T09:35:01 Failed to compute metric average-eval-build-start-time (14196).
2021-06-14T09:35:01 Updating metric average-eval-build-complete-time (14196) to 0.0.
2021-06-14T09:35:01 fatal: uncaught exception 'psql-query-error' in 'metrics' fiber!
2021-06-14T09:35:01 exception arguments: (fatal-error "PGRES_FATAL_ERROR" "ERROR:  division by zero\n")
In ice-9/boot-9.scm:
  1747:15 11 (with-exception-handler # ?)
  1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
    724:2  9 (call-with-prompt ("break") # ?)
    724:2  8 (call-with-prompt ("continue") # ?)
In ice-9/eval.scm:
    619:8  7 (_ #(#(# ?)))
In cuirass/logging.scm:
    58:18  6 (call-with-time-logging "Metrics update" #)
In ice-9/boot-9.scm:

bug#48945: PostgreSQL + Cuirass Errors

2021-06-10 Eric Brown
Hello:

Executive Summary:
- Can't reinstall Cuirass and/or PostgreSQL
- Divide by 0 error reported by postgres when computing metrics

Details:
I am having trouble reconfiguring Cuirass and PostgreSQL. I wonder whether
these problems are related to several issues in PostgreSQL; they seem to
occur when I reconfigure Cuirass and/or PostgreSQL without Cuirass present,
i.e. my "database server".


/etc/config.scm:


(define %cuirass-specs
  #~(list (specification
           (name "my-cbc")
           (build '(packages "cbc")))
          (specification
           (name "my-ipopt")
           (build '(packages "ipopt")))
          (specification
           (name "my-linux-libre")
           (build '(packages "linux-libre")))
          (specification
           (name "my-openblas-ilp64")
           (build '(packages "openblas-ilp64")))
          (specification
           (name "my-qtbase")
           (build '(packages "qtbase")))
          (specification
           (name "my-sylpheed")
           (build '(packages "sylpheed")))
          (specification
           (name "my-texlive")
           (build '(packages "texlive")))))

(service cuirass-service-type
 (cuirass-configuration
  (specifications %cuirass-specs)))




An example session trying to reinstall Cuirass:

1) Comment out Cuirass in /etc/config.scm and reconfigure

building /gnu/store/9nmk3q8nwk51wqanpw4a5agwak0yfhpj-upgrade-shepherd-services.scm.drv...
shepherd: Removing service 'cuirass-web'...
shepherd: Done.
shepherd: Removing service 'postgres-roles'...
shepherd: Done.
shepherd: Removing service 'cuirass'...
shepherd: Done.
shepherd: Removing service 'postgres'...
shepherd: Done.
shepherd: Service host-name has been started.
shepherd: Service user-homes has been started.
shepherd: Service sysctl has been started.
shepherd: Service host-name has been started.
shepherd: Service term-auto could not be started.
To complete the upgrade, run 'herd restart SERVICE' to stop,
upgrade, and restart each service that was not automatically restarted.
Run 'herd status' to view the list of services on your system

2) At the shell:
# rm -rf /var/log/cuirass /var/log/cuirass.log* /var/log/cuirass.log /var/log/cuirass-web.log /var/cache/cuirass /var/lib/postgresql/data /var/lib/cuirass

3) Reboot

4) Check that none of the files above are regenerated, e.g. by other services
requiring PostgreSQL (none found)

5) Re-enable Cuirass in /etc/config.scm and reconfigure (the error at the end
of this output is frequently observed):

selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... US/Central
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... sh: locale: command not found
2021-06-10 05:57:26.532 CDT [1370] WARNING:  no usable system locales were found
ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/gnu/store/jsa77nkqcvsck4ksvm2b8sccl174hai4-postgresql-10.17/bin/pg_ctl -D /var/lib/postgresql/data -l logfile start

The following derivation will be built:
   /gnu/store/bmzhdkki40d8y6d6n9a3gw4g70xmv824-install-bootloader.scm.drv

building /gnu/store/bmzhdkki40d8y6d6n9a3gw4g70xmv824-install-bootloader.scm.drv...
guix system: bootloader successfully installed on '/boot/efi'
shepherd: Service host-name has been started.
shepherd: Service user-homes has been started.
shepherd: Service sysctl has been started.
shepherd: Service host-name has been started.
shepherd: Service term-auto could not be started.
guix system: warning: exception caught while executing 'start' on service 'postgres':
Throw to key `%exception' with args `("#<&invoke-error program: \"/gnu/store/4x3h2096cvzvq65wv40a4acwdyks9ivc-pg_ctl-wrapper\" arguments: (\"start\") exit-status: 1 term-signal: #f stop-signal: #f>")'.
guix system: warning: some services could not be upgraded
hint: To allow changes to all the system services to take effect, you will need to reboot.

6) Reboot

7) telnet localhost 5432

telnet localhost 5432
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused



I am also seeing divide-by-zero errors reported by a PostgreSQL process when
computing metrics. Perhaps this is harmless, but it produces a Scheme stack
trace that doesn't look good. I was unable to capture the exact message
because I was busy repeatedly restarting Cuirass and the database.

I can reproduce this on several machines; this is my third attempt to
install on a fresh machine and use it as I expect (adding, removing, and
reconfiguring services, etc.).

This may be a red herring, but I can't help feeling that PostgreSQL is
getting pulled in by other services as well, and that the