Chris,

I have confirmed that entering an instance name longer than 50 characters does 
indeed cause a 500 Internal Server Error, so I am going to assume this is the issue.

This needs to be addressed both in Condor and in the RHEVM driver.  The driver 
should run its validation checks before sending the API request to the back-end 
provider, so that it can return a more helpful error message.
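
As a rough illustration of the kind of pre-flight check meant here, the sketch
below rejects an over-long name before any request is sent.  It is written in
Python purely for illustration and is not the driver's actual code: the names
MAX_RHEVM_NAME_LENGTH, InvalidInstanceName and validate_instance_name are
hypothetical, and the 50-character limit is the one quoted in this thread.

```python
# Hypothetical pre-flight check; not taken from the real RHEVM driver.
MAX_RHEVM_NAME_LENGTH = 50  # limit quoted in this thread


class InvalidInstanceName(ValueError):
    """Raised when a name would be rejected by the back-end provider."""


def validate_instance_name(name: str) -> str:
    """Return the name unchanged if it is acceptable, otherwise raise."""
    if not name:
        raise InvalidInstanceName("Instance name must not be empty")
    if len(name) > MAX_RHEVM_NAME_LENGTH:
        raise InvalidInstanceName(
            f"Instance name is {len(name)} characters long; "
            f"RHEV-M allows at most {MAX_RHEVM_NAME_LENGTH}"
        )
    return name
```

Failing this way lets the caller see the actual cause instead of a generic
500 from the back end.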

Cheers

Martyn

----- Original Message -----
From: "Chris Lalancette" <[email protected]>
To: "Martyn Taylor" <[email protected]>
Cc: "Chris Lalancette" <[email protected]>, "Michael Orazi" 
<[email protected]>
Sent: Monday, 23 May, 2011 1:17:04 PM
Subject: Re: e2e failing w/rhevm 2.2

On 05/23/11 - 07:38:12AM, Martyn Taylor wrote:
> Hi Chris,
> 
> I've been trying this morning to get full e2e working with rhevm 2.2.
> 
> However, I'm still getting: Create_Instance_Failure: 500 Internal Server Error
> 
> You mentioned on Friday's call that there still might be some issues with 
> Condor.  Is this now fixed?  Could this be the issue I'm seeing here?
> 
> Snippet of GridManager Log:
> 
> 05/23/11 12:30:49 [18785] querying for removed/held jobs
> 05/23/11 12:30:49 [18785] Using constraint 
> ((Owner=?="aeolus"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && 
> (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= 
> "External"))
> 05/23/11 12:30:49 [18785] Fetched 0 job ads from schedd
> 05/23/11 12:30:49 [18785] Updating classad values for 6.0:
> 05/23/11 12:30:49 [18785]    EnteredCurrentStatus = 1306150249
> 05/23/11 12:30:49 [18785]    HoldReason = "Create_Instance_Failure: 500 
> Internal Server Error"
> 05/23/11 12:30:49 [18785]    HoldReasonCode = 0
> 05/23/11 12:30:49 [18785]    HoldReasonSubCode = 0
> 05/23/11 12:30:49 [18785]    JobStatus = 5
> 05/23/11 12:30:49 [18785]    Managed = "Schedd"
> 05/23/11 12:30:49 [18785]    NumSystemHolds = 1
> 05/23/11 12:30:49 [18785]    ReleaseReason = undefined
> 05/23/11 12:30:49 [18785] No jobs left, shutting down
> 05/23/11 12:30:49 [18785] leaving doContactSchedd()
> 05/23/11 12:30:49 [18785] Got SIGTERM. Performing graceful shutdown.
> 05/23/11 12:30:49 [18785] Started timer to call main_shutdown_fast in 1800 
> seconds
> 05/23/11 12:30:49 [18785] **** condor_gridmanager (condor_GRIDMANAGER) pid 
> 18785 EXITING WITH STATUS

It is hard to say if this is the exact problem.

The problem I still know about is that condor generates instance names like:

Condor_localhost.localdomain_job#12.0 (or something like that)

Unfortunately, RHEV-M VM names can only be 50 characters, and the above is
always longer than 50 characters.  Whether that causes the above error is
unclear to me, but until we fix that, RHEV-M is not going to work.
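
One possible way to keep generated names under the limit, sketched in Python
for illustration only: truncate the name and append a short hash of the full
name so that different jobs still get different names.  This is an assumption
about how a fix could look, not what Condor does; shorten_instance_name and
the prefix-plus-digest scheme are hypothetical.

```python
import hashlib

MAX_RHEVM_NAME_LENGTH = 50  # limit quoted in this thread


def shorten_instance_name(name: str, limit: int = MAX_RHEVM_NAME_LENGTH) -> str:
    """Return a name no longer than `limit` characters (hypothetical helper)."""
    if len(name) <= limit:
        return name
    # Eight hex digits of a hash of the full name keep shortened names distinct.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    # Leave room for the separator and the digest.
    prefix = name[: limit - len(digest) - 1]
    return f"{prefix}-{digest}"
```

Names that already fit are returned unchanged, so only the over-long ones are
rewritten.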

-- 
Chris Lalancette
_______________________________________________
deltacloud-devel mailing list
[email protected]
https://fedorahosted.org/mailman/listinfo/deltacloud-devel
