Hi Gordon,
Thanks for your assistance and the recommendations. Freezing Postgres
sounds like hell to me :-)
abrt was indeed filling the root directory, so I have disabled it.
I have done some exporting tests, and the behaviour is not consistent.
1. *size*: in general, exports succeeded for smaller datasets and usually
crashed on bigger ones (starting from 3 GB). So is size the key?
2. But I have now found several histories of 4.5 GB that I was able to
export... So much for the size hypothesis.
Another observation: when the export crashes, the corresponding
webhandler process dies.
So now I suspect that something is wrong with the datasets, but I am not
able to trace anything meaningful in the logs. I am not yet confident
about turning on logging in Python, but apparently this is done with the
"logging" module, initialised like logging.getLogger(__name__).
Cheers,
Joachim
Joachim Jacob
Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib
On 03/25/2013 05:18 PM, Assaf Gordon wrote:
Hello Joachim,
Couple of things to check:
On Mar 25, 2013, at 10:01 AM, Joachim Jacob | VIB | wrote:
Hi,
About the exporting of history, which fails:
1. the preparation seems to work fine: choosing 'Export this history'
in the History menu leads to a URL that initially reports that the export is
still in progress.
2. when the export is finished and I click the download link, the root partition fills up
and the browser displays "Error reading from remote server". A folder
ccpp-2013-03-25-14:51:15-27045.new is created in the directory /var/spool/abrt, and it is
this folder that fills the root partition.
Something in your export is likely not finishing cleanly, but crashing instead
(either the creation of the archive, or the download).
The folder "/var/spool/abrt/ccpp-XXXX" (and especially a file named "coredump")
hints that the program crashed.
"abrt" is a daemon (at least on Fedora) that monitors crashes and tries to keep
all relevant information about the program which crashed
(http://docs.fedoraproject.org/en-US/Fedora/13/html/Deployment_Guide/ch-abrt.html).
So what might have happened is that a program (Galaxy's export_history.py or another)
crashed during your export, and "abrt" then picked up the pieces (storing a
memory dump, for example) and filled your disk.
The handler reports in its log:
"""
galaxy.jobs DEBUG 2013-03-25 14:38:33,322 (8318) Working directory for job is:
/mnt/galaxydb/job_working_directory/008/8318
galaxy.jobs.handler DEBUG 2013-03-25 14:38:33,322 dispatching job 8318 to local
runner
galaxy.jobs.handler INFO 2013-03-25 14:38:33,368 (8318) Job dispatched
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,432 Local runner: starting
job 8318
galaxy.jobs.runners.local DEBUG 2013-03-25 14:38:33,572 executing: python
/home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs.runners.local DEBUG 2013-03-25 14:41:29,420 execution finished:
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
galaxy.jobs DEBUG 2013-03-25 14:41:29,476 Tool did not define exit code or
stdio handling; checking stderr for success
galaxy.tools DEBUG 2013-03-25 14:41:29,530 Error opening galaxy.json file:
[Errno 2] No such file or directory:
'/mnt/galaxydb/job_working_directory/008/8318/galaxy.json'
galaxy.jobs DEBUG 2013-03-25 14:41:29,555 job 8318 ended
"""
The system reports:
"""
Mar 25 14:51:26 galaxy abrt[16805]: Write error: No space left on device
Mar 25 14:51:27 galaxy abrt[16805]: Error writing
'/var/spool/abrt/ccpp-2013-03-25-14:51:15-27045.new/coredump'
"""
One thing to try: if you have Galaxy configured to keep its temporary files, run the
"export" command manually:
===
python /home/galaxy/galaxy-dist/lib/galaxy/tools/imp_exp/export_history.py -G
/mnt/galaxytemp/tmpHAEokb/tmpQM6g_R /mnt/galaxytemp/tmpHAEokb/tmpeg7bYF
/mnt/galaxytemp/tmpHAEokb/tmpPXJ245 /mnt/galaxydb/files/013/dataset_13993.dat
===
Another thing to try: modify "export_history.py" to add debug messages that
track its progress and whether it finishes.
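A sketch of what I mean (the helper name and the call sites are just illustrative; the real script's structure will differ):

```python
import sys
import time

def debug(msg):
    # Write progress markers to stderr with a timestamp, flushing
    # immediately so the last message survives a hard crash.
    sys.stderr.write("[export %s] %s\n" % (time.strftime("%H:%M:%S"), msg))
    sys.stderr.flush()

debug("starting archive creation")
# ... the script's real archive-building code would go here ...
debug("archive creation finished")
```

If the crash happens mid-way, the last marker you see tells you which step died.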
And: check the "abrt" program's GUI; perhaps you'll see previous crashes that
were stored successfully, which would give more information about which program crashed.
As a general rule, it's best to keep the "/var" directory on a separate
partition on production systems, exactly so that filling it up with junk can't
interfere with other programs.
Even better, give each sub-directory of "/var" its own dedicated partition, so that filling up
"/var/log" or "/var/spool" would not fill up "/var/lib/pgsql" and stop Postgres from
working.
-gordon
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/