Re: Harmonizing on modularity
Hi Renaud,

Renaud BECHADE wrote:
> > I think this discussion soon gets into a Java language/system debate,
> > because one could argue why we need this tight bundling between the
> > bunch of classes in rt.jar and the VM version. For instance: why do I
> > have to wait for JVM 6 to fix that bug in Swing, which I need now in my
> > implementation? On the other hand, this "expected behavior" is what
> > makes Java very appealing to integrators.
>
> You are kind of pinpointing subtle incompatibilities that /will/ exist
> and require some packaging effort so that users, well... use the VM with
> ergonomic perception ("it just works"). If we consider an OS parable,
> this is just like FreeBSD vs. Linux: both are POSIX and you should not
> have to patch code to run it on the other (you can even run Linux ELF
> code on FreeBSD), but in practice some adaptations are required (for
> instance, on my machines the Sun Linux JDK crashes with great
> facility...). Hence FreeBSD has its own ports system.

Sure, these are minor issues. But on the other hand, Java is more than an
operating system. In current operating systems, the developer works with
third-party libs most of the time; the only direct access to POSIX stuff is
through core libraries, which don't change much. The Java runtime packages
many things, from java.lang.String to Swing to Xalan and Xerces. So I
wouldn't say that rt.jar makes the Java runtime - rt.jar is just a packaging
model that suits Sun. One could take other directions here. But to remain
compatible (and thus be competitive), one has to make sure that it works
like rt.jar does. Java's notion of binary compatibility makes this easier
than it is in C/C++, for instance.

-- Jakob
Re: Harmonizing on modularity
Hi Doug,

thanks for joining the discussion.

Doug Lea wrote:
> No matter whether you think you are starting with a JVM written in
> Java or a micro-kernel-ish one in C (which seem to be the leading
> options), you will probably discover that you end up writing most of
> it in Java.

I think that a SystemJava dialect, as in Jikes (where the compiler does the
magic), is very interesting. I was and am a big Self/Smalltalk fan, where
this debate is much larger, but I also think writing the core in C/C++ can
offer some interesting points too: a kind of language agnosticism, the
ability to export some features more easily to non-Java systems, and, when
it comes to address manipulation, C is much more handy. [This is just my
opinion.] As I said, the most interesting part is that of a lower-level
common infrastructure a) to extend the VM more safely, b) to build bridges
to other runtimes (parley), and c) to make pluggable extensions.
Implementing a middle layer in Java makes much sense to me.

> For just about every major subsystem, you will find that
> some of it has to be in Java anyway,

I would agree with you, but I wouldn't call it Java here, because it is
mostly a very restricted sub-dialect of Java ("no" invocation costs of
methods, etc.).

> and most of it turns out to be
> better to write in Java. Steve Blackburn mentioned some of the
> technical issues with respect to GC. Similar ones arise with
> concurrency support, reflection, IO, verification, code generation,
> and so on, as has been discovered by people developing both
> research and commercial JVMs.

What were your results from the OVM project? Probably true. But I would
favor:

* A systems language for the very low-level infrastructure and to express
raw access (pointers, explicit memory layout) to objects, for rare
high-performance use.
* Special Java (JikesRVM-like) to have a low-level interface that abstracts
away much of the very low-level stuff (implemented on top of/in terms of
the above) -> GC details. Perhaps even restrict much of Java's dynamic
behavior for that. Having multiple linkages would be interesting here
(implemented via attributes, for instance).
* Java for much of the rest.

AFAIK working with LLVM is also a joy (it is written in C++), and this has
many advantages too.

> One of the challenges here is that the Java programming language
> currently does not make it possible to distinguish those classes
> sitting in the JRE library (usually, the stuff in "rt.jar") that are
> logically part of the JVM vs the normal APIs that normal Java
> programmers are supposed to use. (Although most existing JVMs
> dynamically enforce inaccessibility of some APIs by exploiting rules
> about bootclasspaths in some special cases, this is not a general
> solution.)

Good point. I would see it a bit differently. One has to question which
classes are "system" ones and which are not. rt.jar has kind of grown out
of Sun's implementation. I think they chose this big rt.jar because you can
mmap it and thus access all the "core" Java APIs faster. I think one should
make a distinction between classes that are closer to the VM (for instance,
the AtomicXxx classes would be of that kind) and those that are not (the
Xerces/Xalan classes). rt.jar is much too coarse-grained for that. Making
system classes cross-VM implies that some special API exists for them,
which is probably hard to achieve (as you said below). But having such
system classes would at least make explicit treatment of special classes
much easier.

I think this discussion soon gets into a Java language/system debate,
because one could argue why we need this tight bundling between the bunch
of classes in rt.jar and the VM version. For instance: why do I have to
wait for JVM 6 to fix that bug in Swing, which I need now in my
implementation?
On the other hand, this "expected behavior" is what makes Java very
appealing to integrators.

> Independently, there has been a lot of discussion lately of possible
> language enhancements resulting in some kind of module support for
> J2SE 7.0 (yes, the one after Mustang) to provide a more general
> solution to the need for semi-private interfaces among subsystem-level
> components, as well as for similar issues that arise in layered
> middleware, as well as higher-level escape-from-jar-hell issues.

A powerful friend-like approach would be very cool. I would like to join
the discussion on that.

> Some people would like to see a first-class module system (see for
> example MJ http://www.research.ibm.com/people/d/dgrove/papers/oopsla03.html,
> as well as similar work at Utah http://www.cs.utah.edu/flux/). Some
> people would like something more like a saner version of C++
> "friends". I don't think that any of these have been subjected to
> enough thought and empirical experience to make a decision about which
> way to go.

:-) - I started a modjava project at sf.net (modjava.sf.net) some time ago,
but doing it without the power of th
Re: timeframe for mid-level decisions
hi Geir,

Geir Magnusson Jr. wrote:
> On May 19, 2005, at 8:18 AM, Jakob Praher wrote:
>> Geir Magnusson Jr. wrote:
>>> On May 19, 2005, at 5:24 AM, Jakob Praher wrote:
>>>
>>> Both of these are conventional expectations, and we can meet this via
>>> pluggability, right?

If you have, for instance, completely different object layouts, caching
mechanisms, and method lookup, then pluggability becomes difficult. We are
talking about high-performance stuff, so it shouldn't get too much wrapping
- except for #ifdefs (Jikes RVM also has this sort of stuff, in Java :-)).
One thing that should be common is the intermediate representations - see
below.

What would be interesting is to implement method lookup based on hashing
(Java signatures are strings, after all), with inline caching based on
hashes (not vtables), and then compare the result with a vtable approach,
which in Java must also be built at load time. The Self/Smalltalk/...
community has proven that this can be done quite efficiently and gives you
a whole lot of flexibility - speaking of "hot swapping", for instance, or
implementing scripting languages on top of it.

>> Depends on the divergence of the 2 systems. If for instance you have 2
>> VMs (e.g. one in Java for server and one in C/C++ for client), then
>> IMHO it would be better to make two separate projects. These two VMs
>> wouldn't have much in common.
>
> That's sort of true. :)
>
> I'd agree that they would be separate efforts, but there's no reason
> why they couldn't be in the same Apache Harmony community. For
> example, look at the Apache Logging community - there are alternate
> language implementations of the same functionality.
>
> I'd want to keep things close, as we do want to be able to share things.

Yes, that's sort of clear. Since it's likely that Harmony is becoming a
top-level project, these projects should definitely go under the "harmony
roof". What I was trying to say with LLVM is that you could implement
exactly that. You have a spec, a MIR, ...
So you implement one VM/compiler in Java and the other in C/C++ - you get
the advantage of sticking to the same specs and could also reuse some of
the portions, for instance the dynamic compiler: after the C implementation
has bootstrapped the VM and set up the execution environment, it can
compile itself using the one written in Java. This also works with other
MIRs. But to make sure: build on something that is already specified nicely
(like LLVM, for instance) and extend that spec to fix things (more
evolutionary). I think starting with a whole new IR spec and so on would be
much work.

>> Ok. It's only for me to get an understanding of the project's identity.
>> Yes, that's probably true, especially since we - as opposed to the
>> closed-source VMs - don't have a business interest in keeping secrets
>> about inner layouts.
>
> Right - we're forming our identity as we do this. Patience :)

:-)

> I want to use APR to *implement* our platform interfacing layer, but I
> have no idea if APR is the right *definition* of the API for OS
> interfacing. I'd rather not presume an API until we understand what's
> required by the VM.

Ok. APR is quite interesting as a low-level abstraction layer, and it is
favorable because it is a very low layer. Too much abstraction from the OS
is probably more problematic than having some modules cleanly implemented
per OS.

> Much more :)

Yes, much more is in my head. Counter-question: what's your favored VM
technology? I've heard on the list that a VM might be contributed - what
does that mean?

-- Jakob
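The hash-based method lookup with per-call-site inline caching mentioned in this message can be sketched in a few lines of Java. Everything here is invented for illustration (a toy global method table keyed on class name plus signature string, and a monomorphic cache per call site); it is not any real VM's dispatch scheme:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of hash-based method lookup with a per-call-site inline
// cache, as opposed to a vtable index. All names here are illustrative.
class LookupDemo {
    // Global "method table": class name + signature string -> method id.
    static final Map<String, Integer> methods = new HashMap<>();

    // A call site remembers the receiver class it last dispatched on
    // and the method it resolved to (the inline cache).
    static class CallSite {
        String cachedClass;   // null until the first dispatch
        int cachedMethod;
        int slowPathHits;     // counts full hash lookups, for the demo

        int dispatch(String receiverClass, String signature) {
            if (receiverClass.equals(cachedClass)) {
                return cachedMethod;            // cache hit: no hash lookup
            }
            slowPathHits++;                     // cache miss: full lookup
            Integer m = methods.get(receiverClass + "#" + signature);
            if (m == null) throw new NoSuchMethodError(signature);
            cachedClass = receiverClass;        // refill the cache
            cachedMethod = m;
            return m;
        }
    }
}
```

As long as a call site stays monomorphic, every dispatch after the first is a single string comparison; the hash table is only consulted on a miss, which is what makes "hot swapping" cheap - invalidating the caches is enough.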
Re: timeframe for mid-level decisions
hi Tom,

Tom Tromey wrote:
>>>>> "Jakob" == Jakob Praher <[EMAIL PROTECTED]> writes:
>
> Jakob> do we want to build something that competes with sun j2se/mono on
> Jakob> the desktop side (gnome/redhat would be interested in that)
>
> I don't speak for Red Hat, but I can explain a little about why we
> ship gcj and not other VMs. In addition to all our in-house history
> with gcc and gcj, it basically boils down to 3 things:
>
> 1. Platform coverage. The solution has to work on at least whatever
> platforms Fedora Core and RHEL work on.
>
> 2. Performance. The result has to be reasonably competitive
> performance-wise. E.g., starting Eclipse has to be reasonable both in
> time and space.

How are we doing with gcj in this direction?

> 3. Debugging. There has to be some debugger story.
>
> Harmony would have to excel on all of these before I would even
> consider, say, recommending it for FC.

These are important points. I think that the two platforms should be
interoperable, since many installations will probably use gcj soon (at
least if it's the default for Fedora). Especially for UI stuff it would be
really interesting to get interoperability right. IMHO Harmony should be
able to understand the gcj ABI and the .so cache, which is right now at the
very heart of gcj. Perhaps a second implementation of that would lead to
more specification, which would have positive results for gcj too.

-- Jakob
Re: timeframe for mid-level decisions
hi David,

thanks for pointing that out. I haven't looked into the application, but
some notes from my side.

David Griffiths wrote:
> From the llvm web site: "LLVM does not currently support garbage
> collection of multi-threaded programs or GC-safe points other than
> function calls, but these will be added in the future as there is
> interest." I would imagine that's quite a lot of work.

First of all, that is an issue with the existing LLVM garbage collector
implementation, not with the specification/bytecode stuff. So it is more or
less an implementation detail.

AFAIK GC-safe points are important for stop-the-world algorithms. A safe
point is a distinct point where GC is allowed, or "safe", to happen. There
are some papers around on implementing patching/polling to advance another
thread to a safe point (the one that triggers GC is already at a safe
point). So the problem really is multi-threaded apps, since safe points
don't matter for single-threaded apps. The question is why they are facing
problems:

1) Allocation is in principle the GC trigger; safe points could be basic
blocks that don't contain function calls or allocations.
2) Other problems with GC and multi-threaded apps will have to be dealt
with. (I don't think they are hard ones - e.g. the garbage collector not
being thread-safe in its own respect could simply be worked around.)

I will try to get into the details. Given the amount of good papers and
work in that area, I don't think it is too big a problem to get worked out.

-- Jakob
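To make the safe-point idea above concrete, here is a toy Java sketch of stop-the-world polling: mutator threads check a flag at their loop back-edges (the "poll") and park until the collector releases them. The names and the latch-based protocol are invented for illustration; this is not LLVM's (or any real VM's) mechanism:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy sketch of stop-the-world safepoint polling. A mutator thread checks
// a flag at loop back-edges; when the collector requests a safepoint it
// waits until every mutator has parked. All names are illustrative.
class SafepointDemo {
    static volatile boolean safepointRequested = false;
    static CountDownLatch allParked;   // the collector waits on this
    static CountDownLatch resume;      // parked mutators wait on this

    // What a compiler would emit at loop back-edges and allocation sites.
    static void poll() throws InterruptedException {
        if (safepointRequested) {
            allParked.countDown();     // announce: I am at a safe point
            resume.await();            // wait for the collector to finish
        }
    }

    // Collector side: park the given number of mutators, "collect", resume.
    static void stopTheWorld(int mutators) throws InterruptedException {
        allParked = new CountDownLatch(mutators);
        resume = new CountDownLatch(1);
        safepointRequested = true;     // volatile write publishes the latches
        allParked.await();             // all mutators are at safe points now
        // ... garbage collection would happen here ...
        safepointRequested = false;
        resume.countDown();            // release the mutators
    }

    // Demo harness: run one mutator, stop the world once, resume it.
    static long runOnce() throws InterruptedException {
        final long[] iterations = {0};
        final AtomicBoolean keepRunning = new AtomicBoolean(true);
        Thread mutator = new Thread(() -> {
            try {
                while (keepRunning.get()) { iterations[0]++; poll(); }
            } catch (InterruptedException ignored) {}
        });
        mutator.start();
        stopTheWorld(1);               // returns after the mutator parked
        keepRunning.set(false);
        mutator.join();
        return iterations[0];
    }
}
```

The point of the "patching" variants discussed in the literature is to avoid the per-iteration flag check in the common case, but the protocol is the same: bring every thread to a point where the stack maps are known, then collect.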
Re: timeframe for mid-level decisions
I've put some corrections in, so that it's more understandable.

Jakob Praher wrote:
> Geir Magnusson Jr. wrote:
>> On May 19, 2005, at 5:24 AM, Jakob Praher wrote:
>>
>> I don't understand
>
> Take the Classpath project. It aims at working across open VMs, so you
> have to build a glue layer between what is intrinsic to the VM and what
> is expressible in plain Java. Classpath has VM classes which are
> implemented by the different VMs (mostly public static and often native
> stuff) and which get called from within Classpath, so per VM you only
> have to reimplement the VM classes. Proceeding like this has pros and
> cons. For Classpath it's the only way to meet the goal of implementing
> a whole runtime without knowing the exact details of the VM.
> On the other hand there are projects like gcj, which use the C++ ABI
> for performance reasons for most of the core classes (I mean the really
> core stuff) - they currently have libjava, containing the whole
> classpath - but they would also like to use the Classpath Java classes
> without importing them into libjava (for code management and such)
> where appropriate.

To state it more clearly: GCJ has a "copy" of Classpath in its own tree.
Formerly, Classpath and gcj didn't work that closely together. Most of the
libjava classes are 1:1 copies of the Classpath classes, but for some
high-performance / low-level classes the gcj team rewrites them. This is a
problem if they want to merge the whole Classpath tree in. Why did I
mention that: this is an example of the VMXxx classes being too high-level
- a faster implementation can be done using CNI (the GCJ native interface
that uses the C++ ABI).

-- Jakob
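The VM-class split described above can be pictured with a tiny sketch. The class and method names here are invented; in real GNU Classpath the VM* classes are mostly `static native` methods backed by the VM's internals rather than plain Java:

```java
// Sketch of the Classpath-style VM-class pattern (invented names).

// Portable library code, shipped identically to every VM: everything
// VM-intrinsic is delegated to the VM* companion class.
class RuntimeInfo {
    public static String runtimeName() {
        return VMRuntimeInfo.runtimeName();
    }
}

// The per-VM glue class: each VM provides its own implementation of
// this one class, and the rest of the library works unchanged. In a
// real VM this method would typically be 'static native'.
class VMRuntimeInfo {
    static String runtimeName() {
        return "toy-vm";
    }
}
```

The pro is obvious (one portable library, one small per-VM surface); the con Jakob raises is that the fixed VM* boundary can sit too high for a VM like gcj that could answer the same question without a call at all.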
Re: timeframe for mid-level decisions
Geir Magnusson Jr. wrote:
> On May 19, 2005, at 5:24 AM, Jakob Praher wrote:
>>
>> -> do we want to concentrate on the server side (jikes rvm would
>> probably be fine for that) - for instance: no startup issues
>>
>> -> do we want to build something that competes with sun j2se/mono on
>> the desktop side (gnome/redhat would be interested in that)
>
> Both of these are conventional expectations, and we can meet this via
> pluggability, right?

Depends on the divergence of the two systems. If for instance you have two
VMs (e.g. one in Java for the server and one in C/C++ for the client), then
IMHO it would be better to make two separate projects. These two VMs
wouldn't have much in common.

>> -> do we want to have different projects for different tasks (is that
>> affordable now - what is harmony then - a meta project?)
>
> Not now - right now, I think we stick close together until we start
> getting big. This *will* get big, but I think that the structure
> should be driven over time.

This sounds promising :-). Sticking close together is what I'd like to see.
Don't get too diverse now.

>> -> are the java specs enough for vm interoperability or should we add
>> yet another interoperability layer between runtimes?
>
> I don't understand

Take the Classpath project. It aims at working across open VMs, so you have
to build a glue layer between what is intrinsic to the VM and what is
expressible in plain Java. Classpath has VM classes which are implemented
by the different VMs (mostly public static and often native stuff) and
which get called from within Classpath, so per VM you only have to
reimplement the VM classes. Proceeding like this has pros and cons. For
Classpath it's the only way to meet the goal of implementing a whole
runtime without knowing the exact details of the VM.
On the other hand there are projects like gcj, which use the C++ ABI for
performance reasons for most of the core classes (I mean the really core
stuff) - they currently have libjava, containing the whole classpath - but
they would also like to use the Classpath Java classes without importing
them into libjava (for code management and such) where appropriate.
Appropriate here means that the gcj project makes the trade-off towards
performance (at least that was the result of the FOSDEM discussion) when
they have to decide.

>> -> should we just be a forum for vm implementors and should we specify
>> cross vm stuff (like the gnu.gcj.RawData class) in terms of enhancement
>> requests?
>
> I think that we clearly want to implement, but it's not a bad place for
> "enhancement", as long as we are clear that enhancement doesn't mean
> distortion of the standard, or "extending".

Ok. It's only for me to get an understanding of the project's identity.
Yes, that's probably true, especially since we - as opposed to the
closed-source VMs - don't have a business interest in keeping secrets about
inner layouts.

>> I think the best projects (in that area) are those that have a special
>> goal and don't want to be all things to all people. I don't know which
>> way harmony is going here.
>
> I think we have a goal, and I do think it's important that we hear
> about alternative paths to get there. I do agree that such discussions
> can't go on forever, and hence the pushing to start looking at some of
> the existing VMs (both in C and Java). I'm really hoping we can focus
> a little on that, how we can find ways to couple cleanly to GNU
> Classpath, etc.

That's true. Perhaps I'm a little too eager - I have seen a lot of projects
in the past.

>> You might disagree here - but I think that this project is a bit
>> different from other apache projects. Many things completely depend on
>> the initial decisions.
>> So I don't want to see all the people waiting for some technical
>> decision to take place and thus deadlock their efforts. At the same
>> time the possibility matrix is so huge that you can't take into
>> account every project that's going on. So again: make some decisions
>> in the next months and go for that. Sure, the project is in its
>> infancy - I don't want to push too much.
>
> Right. Where do you stand on current VMs to look at or language?

As I've posted, I think we should stick close to the individual target
platforms (Unix, Win32, ...) and build on stuff that works quite well (no
Java-in-Java stuff). Recently I have become a big fan of the LLVM project,
since it is a very well-defined baseline. And I think that having a good
foundation, like:

-> a MIR (intermediate representation to do optimizations in)
-> a bytecode format to store op
Re: timeframe for mid-level decisions
Hi Leo,

Leo Simons wrote:
> Hi Jakob!
>
> On 18-05-2005 22:29, "Jakob Praher" <[EMAIL PROTECTED]> wrote:
>
>> When do you want the first Harmony J2SE alpha snapshots to reach the
>> masses?
>
> "when they're ready"

I think that the psychological aspect of actually having some
milestones/deadlines helps sort out some of the long-term stuff from the
short-term hacking. Especially in VM technology there is so much great
stuff out there - so many projects to build on - that you might never reach
a point of consensus if you only ship when everyone is happy with it.

IMHO thinking about this "big picture" stuff helps sort out some
infrastructure decisions:

-> do we want to concentrate on the server side (jikes rvm would probably
be fine for that) - for instance: no startup issues
-> do we want to build something that competes with sun j2se/mono on the
desktop side (gnome/redhat would be interested in that)
-> do we want to have different projects for different tasks (is that
affordable now - what is harmony then - a meta project?)
-> are the java specs enough for vm interoperability or should we add yet
another interoperability layer between runtimes?
-> should we just be a forum for vm implementors and should we specify
cross-vm stuff (like the gnu.gcj.RawData class) in terms of enhancement
requests?
-> how much manpower is available in the early stages - that helps to
clarify how broad the first aim should be?

Questions like that are essential for establishing the project's identity.
I think the best projects (in that area) are those that have a special goal
and don't want to be all things to all people. I don't know which way
harmony is going here. You might disagree here - but I think that this
project is a bit different from other apache projects. Many things
completely depend on the initial decisions. So I don't want to see all the
people waiting for some technical decision to take place and thus deadlock
their efforts.
At the same time the possibility matrix is so huge that you can't take into
account every project that's going on. So again: make some decisions in the
next months and go for that. Sure, the project is in its infancy - I don't
want to push too much.

>> But to be clear - the message from my side: fix decision deadlines and
>> stick to them. Build something, ship it early. This is the only way to
>> *) make testers/developers aware
>> *) gain ground in the VM cake
>
> Seriously, that works well when you can actually do resource planning.
> In volunteer efforts like this, we don't know what resources we have
> (esp. not the human resources, i.e. developer hours to spend), so we
> can't have a deadline.

Ok. I didn't mean that everybody should work for 24 hours to meet
deadlines. I see deadlines merely as meta-guidelines, for people to orient
themselves by - for instance, if I work on some big topics, I soon get lost
when I don't have any organisational structure. This should help people,
not force them to work!

> Do you know any open source project friendly to volunteer participation
> that is able to fix then keep any kind of deadline? We don't do that at
> apache, it causes stress :)

Ok. I don't want to cause stress but rather relieve it, through decision
making, structuring and creating a project identity.

bye,
-- Jakob
Re: timeframe for mid-level decisions
Hi Renaud,

thanks for your response on the technical side. See my comments inline.

Renaud BECHADE wrote:
>> * Stay close to the operating system - GCJ, LLVM, ... - See
>> http://www.research.ibm.com/vee04/Boehm.pdf
>
> Especially as GCJ and LLVM (especially LLVM) are relatively fast, NOW.
> (I remember comparing some floating point test between gcc -O 10^8 and
> llvm-gcc + llvi. The llvm was damn fast on my machine - I think I had a
> 20% or 30% improvement, which is indeed quite good - as good as, say,
> the Intel compiler on Intel machines. It was a pity Objective-C was not
> available :-( )

Yeah, it's pretty aggressive. Objective-C is something very interesting
too, since they (Apple) have one of the fastest dynamic dispatch
algorithms, based on caching
[http://www.mulle-kybernetik.com/artikel/Optimization/opti-9.html].

>> As a side note:
>> Also the LLVM guys are heavily thinking about implementing many
>> intrinsics based on APR and other core libraries. Sure there are many
>> things which have to be built yet (garbage collector, memory model
>> (which is very low level right now) ...) but an llvm-java frontend is
>> already being worked on.
>
> Since GCJ and GCC share the same code generator, and since the GCC code
> generator has been ported to LLVM (with maybe heavy changes, though),
> why not just make sure a GCJ-LLVM-with-jit-plugin is developed, once
> GCJ gets some minimal support for java5 bytecode
> ("all-except-reflection")?

There are some things I'm working on as a side project for now. LLVM, for
instance, uses a different exception handling mechanism than GCC. But one
could work around this, because LLVM has a very high-level view of
exceptions (which is quite nice): every invoke instruction has an implicit
catch target, so the way exceptions are actually implemented can be
user-specific. I personally see LLVM as a MIR, but with the advantage of
being well specified.
Jikes also has this sort of IR, but in LLVM you can see the MIR as a kind
of execution engine, since it supports a type system and has a
well-carried-out binary representation. The binary representation would
help in many things; for instance, one could store optimizations in the
LLVM bytecode format and reuse them later. From this side I'm also
interested in the C-- project, which comes from the ML/OCaml people. It is
a very low-level C representation which features continuations, exceptions,
... [I haven't had time to look through it carefully.]

> So far as I know, LLVM enables or is scheduled to enable at-runtime
> LLVM bytecode generation (used for Ruby and Scheme interpreters I
> think[1]), so that we could have a cool JIT without the burden of
> actually generating low-level machine code (kind of a write once, run
> everywhere JIT).

So you mean a JIT that generates LLVM bytecode - please tell me more about
that. Actually there is an interpreter and a JIT in the CVS tree. The JIT
(called JELLO) is special since it works with the target machine code as if
it were in SSA form. I don't know how good or bad this is in practice; I've
just read the paper from 2002. You can come a long way with the interpreter
alone, since it is very low-levelish - many of the optimizations are
already done in the bytecode, which is not possible for HIRs like Java
bytecode - and the registers are in SSA form, so you have superb data flow
information available at no cost.

> This kind of plan could also have the theoretical advantage of being
> able to release early, and on a very big bunch of architectures, code
> that /actually/ works, and is kind-of used by default by many, many,
> many people (*BSD users, embedded systems, linux hackers, and so on).

Yes, that's true. Such a system would be very interesting, since you can
extend the VM with LLVM bytecodes and reuse many aspects. There is also a
new wave of building secure systems based on Proof-Carrying Code (PCC).
Since LLVM has a typed instruction set and needs explicit casts to do type
conversion (the cast instruction), you can say whether a piece of code is
type-safe or not. I don't know how hard it would be to add proof-carrying
code to LLVM, but if that were enabled, a whole set of new interesting
possibilities would arise (extending the VM by PCC-based LLVM
instructions).
[http://www.cs.nott.ac.uk/~fxr/papers/2004_scp/franz_scp.pdf]

-- Jakob
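As a toy illustration of why a typed instruction set with explicit casts is mechanically checkable, here is a minimal checker over an invented three-address IR. Everything here (the instruction shape, the type names, the `cast` convention) is made up for the sketch; LLVM's real type system is far richer:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy typed IR: every value carries a declared type, and the only
// instruction allowed to change a value's type is an explicit 'cast'.
// Type safety then reduces to a simple linear scan.
class TypedIrDemo {
    static class Instr {
        final String op, dest, destType, src, srcType;
        Instr(String op, String dest, String destType,
              String src, String srcType) {
            this.op = op; this.dest = dest; this.destType = destType;
            this.src = src; this.srcType = srcType;
        }
    }

    // Returns true iff no non-cast instruction changes a value's type.
    static boolean typeSafe(List<Instr> code) {
        Map<String, String> types = new HashMap<>();  // value name -> type
        for (Instr i : code) {
            // Type of the source operand: the type it was defined with,
            // or (for operand-less instructions) the destination type.
            String actual = (i.src == null)
                    ? i.destType
                    : types.getOrDefault(i.src, i.srcType);
            if (!i.op.equals("cast") && !actual.equals(i.destType)) {
                return false;  // implicit conversion: reject the code
            }
            types.put(i.dest, i.destType);
        }
        return true;
    }
}
```

So `a:i32; b:f64 = cast a` passes, while `a:i32; b:f64 = copy a` is rejected, because the conversion was never made explicit. A PCC-style extension would attach a proof of exactly this kind of judgment to the code instead of recomputing it.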
timeframe for mid-level decisions
hi people,

first of all, it's great to hear so much harmony for open source Java! Ever
since I started working with open source Java software, I have dreamt of a
great open source Java VM movement. Given the amount of great people
involved here, it sounds like it would truly be possible.

Now, the reason I write this mail is that I'd like to see some actual
timelines put up, and I would like to see a separation of the philosophical
(long-term) issues from implementation issues (tasks that can be started
asap). The central point is: when do you want the first Harmony J2SE alpha
snapshots to reach the masses?

From this central question, I think the tasks of highest priority are:

-> Which VM(s) should be chosen? [this will be the hardest decision]
-> Should there be a compatibility layer between VMs (like the Classpath
VM classes)?

IMHO it would be important to take a multi-layer approach:

A) A short-term project (ship it early and improve it along the way)
B) A mid-term project (use the huge amount of knowledge which is out there
in the VM projects and build on that)

Some "advice":

* First try to be conservative and build a VM based on what works now, and
try to gain people by actually having a VM that works quite well. [I don't
think that building a VM in Java is good for achieving that result for
now.]
* Stay close to the operating system - GCJ, LLVM, ... - See
http://www.research.ibm.com/vee04/Boehm.pdf

I am currently working with LLVM as part of my diploma thesis and really
like it for its open possibilities (for other languages too) and for its
typed, SSA-based, RISC-like IR specification. The compiler implementation
is also quite nice, and it would give enough room to implement many
features and different runtimes
[http://www.research.ibm.com/vee04/Grove.pdf].

As a side note: the LLVM guys are also heavily thinking about implementing
many intrinsics based on APR and other core libraries.
Sure, there are many things which have to be built yet (garbage collector,
memory model (which is very low level right now), ...) but an llvm-java
frontend is already being worked on.

But to be clear - the message from my side: fix decision deadlines and
stick to them. Build something, ship it early. This is the only way to:
*) make testers/developers aware
*) gain ground in the VM cake

Just some thoughts from my side.

-- Jakob