Re: [galaxy-dev] Galaxy Cloudman - How to analyse > 1TB data ?

Enis Afgan Sun, 19 Feb 2012 14:51:03 -0800

Hi Yves,
When you create the LVM file system, are you composing it from the volume
that already contains the data (i.e., the directory structure etc. created
by CloudMan) and then adding another volume to the LVM, or are you starting
with 2 new, clean volumes?
Maybe trying again without touching SGE at all would at least resolve the
SGE issue. Namely, SGE is on the root file system so it should be fine as
is. I'd suggest stopping the Galaxy and PostgreSQL services (from the
CloudMan Admin), then, from the CLI, unmounting the galaxyData file system
and proceeding to create the LVM. Mount the file system and ensure the
directories and the data that were there are still there. Then start the
PostgreSQL and Galaxy services back up. See if it all comes up fine and,
if it does, try adding a worker node.
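
For the "ensure the directories and the data are still there" step, a small
script can make the check less error-prone than comparing listings by eye.
The following is only a rough sketch, not anything CloudMan provides: the
function names snapshot and compare are made up for illustration, and it
compares file paths and sizes only, not file contents.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so broken links do not raise
            files[os.path.relpath(full, root)] = os.lstat(full).st_size
    return files


def compare(before, after):
    """Return (missing, changed) relative to the 'before' snapshot.

    missing: paths present before but absent after
    changed: paths present in both whose size differs
    """
    missing = sorted(p for p in before if p not in after)
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed
```

The idea would be to call snapshot("/mnt/galaxyData") once the services are
stopped but before unmounting, keep the result somewhere outside that file
system (e.g., dumped to a JSON file on the root volume), and call it again
after the remount. If compare returns two empty lists, the same files with
the same sizes are present.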


Currently, CloudMan does not support composition of a file system from
multiple volumes but I would think that as long as you did not restart the
cluster and created the file system manually, things would work fine. I've
been thinking about why you're seeing the described behavior and am not
really sure so please let me know how the above process works out.



On Thu, Feb 16, 2012 at 7:37 PM, Wetzels, Yves [JRDBE Extern] <
ywet...@its.jnj.com> wrote:

> Hi Brad
>
> I did not restart the master CloudMan node.
> I only restarted the services (Galaxy, PostgreSQL and SGE).
> I do not have these problems without creating the logical volume.
>
> Kind Regards
> Yves
>
>
> Yves;
> I'm hoping Enis can jump in here since he is more familiar with the
> internals of CloudMan and may be able to offer better advice. I can tell
> you what I see from your error messages.
>
> > I used LVM2 to create the logical volume.
>
> Does this involve stopping and restarting the master CloudMan node? The
> error messages you are seeing look like SGE is missing or not properly
> configured on the master node:
>
> > 02/15/2012 11:22:08|  main|domU-12-31-39-0A-62-12|E|error opening file
> > "/opt/sge/default/common/./sched_configuration" for reading: No such
> > file or directory
> [...]
> > DeniedByDrmException: code 17: error: no suitable queues
>
> which is causing the job submission to fail since it can't find the SGE
> cluster environment to submit to. The strange thing is that SGE is
> present in /opt on the main EBS store, so I wouldn't expect your
> modified /mnt/galaxyData volume to influence this.
>
> Since starting worker nodes appears to be fine, I'd focus on the main
> instance manipulations you are doing. Perhaps some of the setup causes
> the problem without creating the logical volume? This could help narrow
> down the issue and hopefully get you running again.
>
> Hope this helps,
> Brad
>
>
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
>  http://lists.bx.psu.edu/
>
