Re: [Alchemi-users] [Alchemi-developers] CPU usage

Matt Valerio Wed, 06 Sep 2006 20:12:34 -0700

Hi Krishna,

Thanks for replying.

Could you please put this into a feature request?

Sure.

Having said that, there is another important consideration for Alchemi
users. This relates to the upcoming changes in Alchemi's Executor code,
which will run GThreads inside a tight sandbox. In such cases, you may
not be able to spawn off a separate Process, since the threads may not
be assigned sufficient privileges. This is actually a good time to
start a discussion about questions such as:
(i) what level of sandboxing is considered acceptable in a real
production environment

Disclaimer: I've only been using Alchemi for 8 days and I am still a "n00b". (On the other hand, that is a testimony to how well-written and easy to use the Alchemi framework is.)

Here is my application; maybe you can tell me whether a GJob would be better to use than a GThread.

I have a legacy FORTRAN application that I have written a C# wrapper for. My C# wrapper knows how to read and write all of the various input/output files that the FORTRAN app consumes/produces. This was a lot of work to write, but has certainly paid off. The main motivation was to have a document object model for these files, meaning that I can create a file programmatically in memory and then write it to disk. I can also read a file into memory and access all of the various pieces of information at will from code. I've also written "tasks", which are C# classes that know how to write the input files, run the FORTRAN exe, read the files back into memory, and delete the files. I have tasks that execute both synchronously (the function returns when the simulation is done) and asynchronously (the task raises events signaling that the FORTRAN exe has produced data on stdout and stderr, which I mainly use for updating a textbox in a Windows Form...). These "task" classes contain the details involved with spawning the exe process.

My C# wrapper for the legacy code works well for what I am doing. A generous sprinkling of [Serializable] attributes was all I needed to get my code working from within a GThread.

So, basically to answer your question "Why use a GThread instead of a GJob?", my answer would be:
I am dynamically creating the input files using my document object model in memory, and I can avoid the performance hit of having to write this data out to disk (though it is possible) by using serialization.
I would also possibly need to create and destroy directories, because the legacy FORTRAN code expects all input files to be named the same, and produces output files named the same.

But I admit that the other reason I have been using the GThread is that this is how the examples were written, and I really didn't know much about the GJob until I looked it up just now :)
I suppose that my application could be re-worked to fit within the guidelines of a GJob.

My code doesn't need any fancy file permissions: just a temporary place to dump files and delete them when the exe is done, whether that is in the current directory or another temporary location.
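
As a rough illustration of that "temporary place to dump files" pattern (a sketch in Python, not the actual C# wrapper; the function name `run_in_scratch_dir` and its parameters are hypothetical), each run gets its own scratch directory, so fixed input/output file names never collide between runs, and the directory is deleted once the outputs have been read back into memory:

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_scratch_dir(command, inputs, output_names):
    """Run `command` in a fresh temporary directory and return its outputs.

    inputs:       dict mapping fixed file names to their text content
    output_names: file names the program is expected to produce
    (All names here are assumptions made for the sketch.)
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_path = Path(scratch)
        # Write the in-memory input data under the names the program expects.
        for name, text in inputs.items():
            (scratch_path / name).write_text(text)
        # Run the program with the scratch directory as its working directory;
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=scratch, check=True)
        # Read the results back into memory before the directory is deleted.
        return {name: (scratch_path / name).read_text() for name in output_names}
    # Leaving the `with` block removes the directory and everything in it.
```

The only permission this needs beyond CPU time is the ability to create, write, and delete files under one temporary location, plus whatever is required to start the child process.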

For example, if a GThread is fully sandboxed and has only 'Internet'
permissions, it will not be able to do much more than just use CPU. My
idea is to give it some more permissions, such as the ability to write
to its working directory on disk. Other than that, nothing else may be
allowed. We could then add policies to allow for execution of 'unsafe'
threads, such as those that spawn off a Process (including the GJob),
if and only if the code is in the GAC, is digitally signed, or
something similar.

Just my $0.02 about this...

I would stay away from relying on whether the code is in the GAC or not; this makes for a pretty big deployment headache.
Part of why Alchemi is so flexible is that no DLLs or EXEs need to be deployed to the Executor computer -- everything is serialized with the thread. I've been doing some pretty heavy development in the last week or so by adding Alchemi support to my C# wrapper library, and have changed the code executing in the DLL so much that I don't think I've used the same DLL version twice. If I had to deploy a new version of my DLL to the GAC of every machine in the Alchemi grid after each build, that would be a nightmare.

(ii) in continuation with (i) above, I would like to know why people are
using a GThread to spawn off a process, as opposed to a GJob. Are there
additional things that need to be done? Do we need to extend the GJob
framework to support more scenarios for pre- and post-processing?

Part of what helps my C# wrapper library is that there are actually 2 FORTRAN exes that need to be run, one before the other. Possibly extending the GJob to allow an arbitrary number of command executions would prove useful. For example, my process flow has to be

input1.txt -> Preprocessor.exe -> data1.txt, data2.txt
data1.txt, data2.txt -> Simulator.exe -> output1.txt, output2.txt, output3.txt

It would be pretty wasteful if that had to be split into 2 GJobs, because then data1.txt and data2.txt (which can be pretty large files) would need to be transferred back to the manager and then out to an executor again.
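
To make the "arbitrary number of command executions" idea concrete, here is a minimal sketch in Python (not Alchemi's GJob API; `run_pipeline` and its parameters are hypothetical names). All stages run in one working directory on the same machine, so intermediate files such as data1.txt and data2.txt never leave it; only the initial inputs go in and the final outputs come out:

```python
import subprocess
import tempfile
from pathlib import Path


def run_pipeline(stages, inputs, final_outputs):
    """Run each command in `stages`, in order, inside one scratch directory.

    stages:        list of argument lists, one per command
    inputs:        dict of file name -> bytes, written before the first stage
    final_outputs: names of the files to return after the last stage
    (All names here are assumptions made for the sketch.)
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, data in inputs.items():
            Path(workdir, name).write_bytes(data)
        for command in stages:
            # check=True stops the chain if a stage exits with a non-zero
            # status, so a later stage never runs on missing intermediates.
            subprocess.run(command, cwd=workdir, check=True)
        # Intermediate files are discarded with the directory; only the
        # requested final outputs are read back.
        return {name: Path(workdir, name).read_bytes() for name in final_outputs}
```

For the flow above, `stages` would hold the Preprocessor.exe command followed by the Simulator.exe command, `inputs` would contain input1.txt, and `final_outputs` would list output1.txt, output2.txt, and output3.txt.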

If so, we can do that, and allow only GJob to run with more privileges
than a normal user's GThread, so that it can spawn processes, etc.

Also, I have not yet looked into how we can sandbox unmanaged (i.e.,
non-.NET) code in Windows. Any suggestions will be welcome...

I would like to know what others think about this...

Hope my feedback helps, and thanks a ton for a great framework!
-Matt

