Re: [Alchemi-developers] ideas for a dll cache

Matt Valerio Thu, 13 Sep 2007 22:55:56 -0700

Hi Anton,

Here are some of my thoughts for what they're worth.  Just opening up things
for discussion on the list :)

On 9/13/07, Anton Melser <[EMAIL PROTECTED]> wrote:
>
> ...
> One of the concerns is security - how can we guarantee that a dll has
> not been tampered with? I am not sure this is a big issue but wanted
> to see if anyone else had any thoughts. My original impression was
> that it would be a problem, but the executor either executes as the
> user (who needs the password to connect to the grid, so we can assume
> is trustworthy), or as system, who we can also assume is trustworthy.
> There are very probably nuances to Windows security that I am missing
> however... (my *nix security knowledge is much stronger...).

I think that we should require that all DLLs in the cache be signed.  This
would also imply that Alchemi itself needs to be signed (which we should
have done a long time ago, as I've mentioned on the list before as well...)

Currently Alchemi is set up to use .NET CAS (code access security) to create
a limited sandbox to run the thread in.  CAS itself gives you fine-grained
control over what kinds of operations a DLL can perform, though I think it's
hard to configure at the moment.  Perhaps we could associate a set of permissions
with each user or group in the database.

> The next is speed. Clearly, a dll cache has an interest for both
> reducing network traffic and speed. What is the best we are reasonably
> going to get?

Ideally the Manager and all Executors would each have a cache that follows
these rules:
1) An assembly is referenced by its full name (name, version, culture, and
public key token) -- like "System.Drawing, Version=1.0.3300.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089" -- and is unique (the Assembly.FullName
property gives you this)
2) The assembly is transferred over the wire from the Owner to the
Manager only once
3) The assembly is transferred over the wire from the Manager to each
Executor only once
4) The assembly is never transferred to an Executor that never executes a
thread having the assembly as a dependency
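
To make rule 1 concrete, here is a minimal sketch of pulling a full name
apart into its identity fields with System.Reflection.AssemblyName.  The
CacheKey class and its Describe method are hypothetical names used only for
illustration; nothing like them exists in Alchemi today.

```csharp
using System;
using System.Reflection;

public static class CacheKey
{
    // Hypothetical helper: returns "name | version | public key token"
    // for a full assembly name such as the System.Drawing example above.
    public static string Describe(string assemblyFullName)
    {
        AssemblyName parsed = new AssemblyName(assemblyFullName);

        // Unsigned assemblies (PublicKeyToken=null) have no token bytes.
        byte[] token = parsed.GetPublicKeyToken() ?? new byte[0];
        string tokenHex = BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        return parsed.Name + " | " + parsed.Version + " | " + tokenHex;
    }
}
```

Two builds of the same library with different versions produce different
strings, so they would occupy separate cache entries.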

> It seems clear that some sort of storage mechanism is
> necessary. But do we go for a (embedded) DB?

We basically want to model our Alchemi-only assembly storage like the .NET
GAC.  Call it the Alchemi GAC (or AGAC for short).
I think that it could just be a folder (like c:\program
files\alchemi\{manager,executor}\agac\)

> Do we need for the
> executor to remember across restarts?

Yes, I think we should definitely remember threads.  If we just have a
folder with dll's and exe's in it, this should be easy.

> If we have a restart and the
> executor has to reflect all the dlls in his cache to get what he has
> got... if we can use a db (like db4o, for example), then this info can
> be stored. But is this a big deal?

I don't think this will be a big deal.  We can use ReflectionOnlyLoad.
Perhaps we could just add some methods to the IExecutor and IManager
interfaces so that there are things like
bool HasAssembly(string assemblyFullName)
void AddAssemblyToCache(string assemblyFullName, byte[] data)
void RemoveAssemblyFromCache(string assemblyFullName)
List<string> GetAssembliesInCache()
etc
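
As a rough sketch of how those four methods could sit on top of a plain
folder: the IAssemblyCache and FolderAssemblyCache names are made up for
this example, and naming each file after the URL-escaped full name is an
assumption of the sketch.  That naming lets the cache list its contents
without reflecting over every DLL, so it is an alternative to
ReflectionOnlyLoad rather than an illustration of it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical interface; the members mirror the signatures proposed above.
public interface IAssemblyCache
{
    bool HasAssembly(string assemblyFullName);
    void AddAssemblyToCache(string assemblyFullName, byte[] data);
    void RemoveAssemblyFromCache(string assemblyFullName);
    List<string> GetAssembliesInCache();
}

public sealed class FolderAssemblyCache : IAssemblyCache
{
    private readonly string _root;

    public FolderAssemblyCache(string root)
    {
        _root = root;
        Directory.CreateDirectory(root); // does nothing if the folder exists
    }

    // Assumption: escaping the full name gives a legal, reversible file name.
    private string PathFor(string assemblyFullName)
    {
        return Path.Combine(_root, Uri.EscapeDataString(assemblyFullName) + ".dll");
    }

    public bool HasAssembly(string assemblyFullName)
    {
        return File.Exists(PathFor(assemblyFullName));
    }

    public void AddAssemblyToCache(string assemblyFullName, byte[] data)
    {
        // An assembly is stored once; adding the same full name again is ignored.
        if (!HasAssembly(assemblyFullName))
            File.WriteAllBytes(PathFor(assemblyFullName), data);
    }

    public void RemoveAssemblyFromCache(string assemblyFullName)
    {
        File.Delete(PathFor(assemblyFullName)); // no error if the file is absent
    }

    public List<string> GetAssembliesInCache()
    {
        List<string> names = new List<string>();
        foreach (string file in Directory.GetFiles(_root, "*.dll"))
            names.Add(Uri.UnescapeDataString(Path.GetFileNameWithoutExtension(file)));
        return names;
    }
}
```

Because the folder is the only state, the cache survives an executor
restart without any extra bookkeeping.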

> Should we keep the dlls loaded?

Definitely not -- especially if there are quite a few assemblies this could
get out of hand.

> Will this work with the appdomains?

Yeah, an assembly can be loaded into an AppDomain, and the only way to
unload it is to unload the whole AppDomain.  This is the sandbox.  So we'd
just need to copy the DLL from the AGAC to the executor thread's private
folder while the thread is running.

> We would have a potential
> situation where 5 meg+ of dlls (yes, they are definitely doing
> cleanup, and no, I didn't write any of the code!) would be needed for
> each thread, so even copying locally is something to be avoided.

Yeah, I have some pretty large DLLs as well, but I don't think copying
locally will be that big of a slowdown at all.  Also, we need to do the
local copy anyway.  When we create the sandbox, we create an AppDomain to run
in, and for the deserialization of the objects to work, the assemblies
containing those types need to be found.  The easiest way to do that is
to set the CodeBase property of the AppDomain to the current directory of
the thread that is executing (this is how it's done now).  So basically,
instead of serializing all of the byte[] arrays containing the
thread's ModuleDependencies, we would just have a string[] array of the
full names of the assemblies, which we copy from the AGAC to the thread's
current working directory.
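
A minimal sketch of that copy step, under two assumptions: cached files are
named after the URL-escaped full name (a made-up convention for this
example), and every dependency is a .dll.  ThreadDependencyStager, CachePath
and Stage are hypothetical names, not existing Alchemi code.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ThreadDependencyStager
{
    // Assumption: the AGAC is a flat folder keyed by the escaped full name.
    public static string CachePath(string agacDir, string assemblyFullName)
    {
        return Path.Combine(agacDir, Uri.EscapeDataString(assemblyFullName) + ".dll");
    }

    // Copies each named assembly from the AGAC into the thread's working directory.
    public static void Stage(string agacDir, string workDir, string[] assemblyFullNames)
    {
        Directory.CreateDirectory(workDir);
        foreach (string fullName in assemblyFullNames)
        {
            string source = CachePath(agacDir, fullName);

            // The runtime probes the application directory for "<simple name>.dll",
            // so the copy is named after the simple name rather than the full name.
            string simpleName = new AssemblyName(fullName).Name;
            string target = Path.Combine(workDir, simpleName + ".dll");

            File.Copy(source, target, true); // overwrite a stale copy if present
        }
    }
}
```

With this shape the thread itself only carries the string[] of full names;
the bytes never travel with it.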

> I definitely think that the Manager should store the dlls in the DB.

Maybe, maybe not. Need to think about that....

> Anyway, my first idea was the following, which would probably require
> reimplementing the relation between ModuleDependency and
> EmbeddedFileDependency as one of has-a instead of is-a (sorry, I
> forget the terms :-)).
>
> 1. application starts, basically as is.
> 2. application adds module dependencies and starts.
> 3. In GApplication Init(), application says "My app needs this".
> 4. Manager replies with the modules he needs.
> 5. app loads file dependencies and sends off (a couple of extra lines
> in init()).
> 6. Manager does his usual thing.
> 7. Manager schedules a thread on an executor, sending the new module
> dependencies.
> 8. Executor replies with a "well you had better send x, y, z then!".
> 9. Manager loads the EmbeddedFileDependency's that the executor needs
> in the module dependencies.

Hmmmm.
I think we're thinking along the same lines, though I would phrase the
interaction like this:
1. A GApplication is created with a number of GThread-derived classes, just
like it is now
2. ModuleDependencies are added that point to a specific assembly on disk.
Instead of serializing these DLLs to a byte[] array, though, only the
fully-qualified name is added to the list.
3. When the GApplication is started, it sends the manager a list of the full
names of all of the assemblies it needs, and the manager replies back with
the list of assemblies that it doesn't have
4. If there are some assemblies that the manager doesn't have yet, then the
owner sends them to the manager, and the manager puts them in its AGAC
(whether it's just a folder or in the DB)
5. Manager does its usual thing and schedules threads to executors
6. The manager-executor interaction mirrors the owner-manager one: when a
thread is scheduled to an executor, the manager sends along a list of the
full names of the assemblies it needs.  The executor checks its AGAC and
replies back with the list of assemblies that it needs from the manager's
AGAC.
7. If there are some assemblies that the executor doesn't have yet, then
the manager sends them to the executor, and the executor puts them in its
AGAC.
8. Before the executor starts the thread in its sandboxed AppDomain, it
copies all of the DLLs (given the list of their full names) from the AGAC to
the thread's working directory.
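
The negotiation in steps 3-4 and 6-7 is the same exchange both times: the
receiver reports which full names it lacks, and the sender ships only those.
Here is a small in-memory sketch of that exchange.  The Node and Protocol
classes are hypothetical stand-ins for the owner, manager and executor; a
Dictionary plays the role of each AGAC, and there is no remoting involved.

```csharp
using System;
using System.Collections.Generic;

public sealed class Node
{
    // full name -> assembly bytes; stands in for this node's AGAC
    private readonly Dictionary<string, byte[]> _agac = new Dictionary<string, byte[]>();

    // Number of assemblies this node has received "over the wire".
    public int Transfers;

    // Returns the requested full names that are not in the cache yet.
    public List<string> Missing(IEnumerable<string> needed)
    {
        List<string> missing = new List<string>();
        foreach (string name in needed)
            if (!_agac.ContainsKey(name) && !missing.Contains(name))
                missing.Add(name);
        return missing;
    }

    public void Receive(string fullName, byte[] data)
    {
        _agac[fullName] = data;
        Transfers++;
    }

    public byte[] Get(string fullName)
    {
        return _agac[fullName];
    }
}

public static class Protocol
{
    // The receiver says what it lacks; the sender transfers only that.
    public static void Sync(Node sender, Node receiver, IEnumerable<string> needed)
    {
        foreach (string name in receiver.Missing(needed))
            receiver.Receive(name, sender.Get(name));
    }
}
```

Calling Sync(owner, manager, names) twice moves each assembly only on the
first call, and an executor that is never handed a thread needing a given
assembly never receives it.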

Maybe that helps? :)  What does everyone think?

-Matt