Re: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]
Hi,

> However, I'm a bit concerned about the revolutionary approach of the SPI effort. Rather than refactoring the Jackrabbit core to better separate the session-local parts, the SPI comes up with a brand new interface contract. This is probably the best thing to do given the SPI goals, but it does leave the big question of how and when are we going to integrate it with Jackrabbit core unanswered.

I think you are right. Just to be clear, I do not expect the architecture suggested or hinted at by the SPI to be implemented in a clean way in the very near future. My original intention with this thread really was to stimulate some discussion around a possible Jackrabbit 2.x architecture.

> As of now the easiest way I see to integrate the SPI effort with the Jackrabbit core is through a generic spi2jcr adapter, but that doesn't really affect the core design or increase code re-use.

I agree. As a side note: I even see value in an spi2jcr adapter beyond the mid-term goal of better remoting for the current Jackrabbit core. I think that an spi2jcr adapter (in conjunction with the jcr2spi client and the protocol bindings of the SPI) could serve as a general-purpose remoting layer for any JCR-compliant repository.

Generally, I think we could also look at a phased approach that allows us to test, evolve and mature the components individually. I think we could do something like:

(Step 1) Isolate the session-local parts into a standalone client (JCR2SPI)
(Step 2) Build the SPI2JCR layer that exposes the current Jackrabbit implementation to the SPI
(Step 3) Refactor the Jackrabbit core to natively implement the SPI

Thoughts?

> What would a more deeply integrated spi2jackrabbit component look like, and how would we implement it in the core?

I am not sure what that would look like, and I guess that this would be subject to some investigation. It may well be that some portions would benefit from being refactored to work efficiently.

> And on the other hand, how can the SPI effort better reuse the experience built into the session-local parts of Jackrabbit core? For example looking at the SessionImpl implementations from both jcr2spi and the core, I see quite a lot of duplicate functionality. How does the SPI make sure that the lessons learned developing the core are included in the new codebase?

I agree. I think the lessons learned should be carried over through the Jackrabbit community and its experience with JCR and Jackrabbit over the past years. Of course I would also prefer to re-use as much existing and well-tested code as possible. But personally I think we should not make architectural sacrifices at this point.

I believe that the overlap and redundancy of the code between the session-local parts and the core are rooted in the original compact and intertwined design. Do you think we would see the same overlap if we basically had a straight-up SPI implementation (on the server side), more or less from scratch, and strictly separated the session-local parts?

regards,
david
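To make Step 2 of the phased approach a little more concrete, here is a minimal sketch of what an spi2jcr-style adapter could look like. It is only an illustration under stated assumptions: `ContentService`, `Change` and `LegacySession` are hypothetical names invented for this example (they are neither the actual SPI nor javax.jcr), and the in-memory maps merely stand in for a real repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an adapter that exposes a session-based API through a flat interface. */
final class Spi2JcrAdapterSketch {

    /** One property change; a whole batch of these travels in a single call. */
    static final class Change {
        final String path;
        final String name;
        final String value;

        Change(String path, String name, String value) {
            this.path = path;
            this.name = name;
            this.value = value;
        }
    }

    /** Assumed flat, service-style interface standing in for the SPI. */
    interface ContentService {
        String getProperty(String path, String name);

        void applyBatch(List<Change> changes);
    }

    /** Stand-in for an existing session-based repository API (not javax.jcr). */
    static final class LegacySession {
        private final Map<String, String> pending = new HashMap<>();
        private final Map<String, String> persisted = new HashMap<>();
        int saves; // counts how often changes were persisted

        void setProperty(String path, String name, String value) {
            pending.put(path + "/" + name, value);
        }

        String getProperty(String path, String name) {
            String key = path + "/" + name;
            return pending.containsKey(key) ? pending.get(key) : persisted.get(key);
        }

        void save() {
            persisted.putAll(pending);
            pending.clear();
            saves++;
        }
    }

    /** The adapter: translates flat calls into calls on the existing implementation. */
    static final class Spi2JcrAdapter implements ContentService {
        private final LegacySession session;

        Spi2JcrAdapter(LegacySession session) {
            this.session = session;
        }

        @Override
        public String getProperty(String path, String name) {
            return session.getProperty(path, name);
        }

        @Override
        public void applyBatch(List<Change> changes) {
            for (Change c : changes) {
                session.setProperty(c.path, c.name, c.value);
            }
            session.save(); // one save per batch, however many changes it holds
        }
    }

    public static void main(String[] args) {
        LegacySession session = new LegacySession();
        ContentService service = new Spi2JcrAdapter(session);

        List<Change> batch = new ArrayList<>();
        batch.add(new Change("/contacts/alice", "phone", "555-0100"));
        batch.add(new Change("/contacts/alice", "city", "Basel"));
        service.applyBatch(batch);

        System.out.println(service.getProperty("/contacts/alice", "phone"));
        System.out.println("saves: " + session.saves);
    }
}
```

The point of the sketch is only that the adapter leaves the existing implementation untouched, which is why it would not by itself change the core design or increase code re-use.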
Re: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]
Hi,

On 9/7/06, David Nuescheler [EMAIL PROTECTED] wrote:
> In my mind the introduction of the SPI would lead to a clean split of the Jackrabbit core architecture that allows for much better re-use and better transparency. Essentially the core could be reduced to the server which should significantly reduce the complexity.

I very much agree with the benefits of the SPI approach, especially for remoting and re-use.

However, I'm a bit concerned about the revolutionary approach of the SPI effort. Rather than refactoring the Jackrabbit core to better separate the session-local parts, the SPI comes up with a brand new interface contract. This is probably the best thing to do given the SPI goals, but it does leave the big question of how and when are we going to integrate it with Jackrabbit core unanswered.

As of now the easiest way I see to integrate the SPI effort with the Jackrabbit core is through a generic spi2jcr adapter, but that doesn't really affect the core design or increase code re-use. What would a more deeply integrated spi2jackrabbit component look like, and how would we implement it in the core?

And on the other hand, how can the SPI effort better reuse the experience built into the session-local parts of Jackrabbit core? For example looking at the SessionImpl implementations from both jcr2spi and the core, I see quite a lot of duplicate functionality. How does the SPI make sure that the lessons learned developing the core are included in the new codebase?

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
Re: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]
Hi,

I think it's a good thing to do. Some random ideas:

I don't understand why it needs to be stateless (about my understanding of stateless, see http://en.wikipedia.org/wiki/Stateless_server). As far as I can see, stateless means it's slower, and I really don't like slow ;-) Even HTTP is becoming more and more stateful to improve performance, I guess. Maybe Roy could give his view about stateless versus stateful. I know there are some other advantages / disadvantages.

Maybe we actually need two 'standards': a stateful binary protocol, and a stateless SOAP-style protocol. TCP/IP is the most common 'fast' protocol. I suggest standardizing the binary wire-level protocol as well (details could be done after/independent of the SPI).

Requirements of clustering should be kept in mind as well.

And I like the graphic! Nice 3D effect!

Thomas
RE: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]
David,

Agreed regarding use of client and server as terminology - I think this leads to some fuzzy thinking as the client and the application that uses the client get confused.

Could we borrow from JDBC and call them the JCR Driver and JCR Server? That to me gives the right sort of thinking in terms of having a Driver that is used by a client application. The JCR Driver can handle calls to the JCR Server either locally or remotely (via whatever actual transport is required). The JCR Server exposes the SPI as a low-level interface and the JCR Driver translates this into a standards-compliant API.

Miro

-----Original Message-----
From: David Nuescheler [mailto:[EMAIL PROTECTED]
Sent: 07 September 2006 10:00
To: dev@jackrabbit.apache.org
Subject: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]

Hi All,

I would like to use Jukka's initiative as a starting point to discuss a couple of high-level architecture topics around the SPI initiative and its potential effect on the overall Jackrabbit architecture.

Please consider all of the following comments as my personal views which I would like to put up for discussion. Nothing should at any point be considered set in stone. I would like to trigger a design discussion on a potential revision of the Jackrabbit architecture.

Looking at the current core architecture of Jackrabbit [1] we have a relatively tightly coupled, heavily interdependent, monolithic and compact core architecture, which really has no additional public interface between the JCR API and the Persistence Manager Interface.

Over the course of the last months (even years) we encountered new requirements: people wanted to re-use portions of the Jackrabbit core code, and we wanted to support deployment Model 3 [2] in a reasonable way from a network perspective. This led to the SPI initiative [3], which was first intended to be developed as part of JSR-283 but was decided to be out of scope for this spec.

Generally, the idea of the SPI (Service Provider Interface) is to create an interface that separates the transient space (I will call it the client for the lack of a better name) from the persistent portion of a repository (which I will call the server [better naming would be very welcome]) of Jackrabbit. Not only should this allow us to better componentize the Jackrabbit core and re-use the client and the server independently, but it also should allow for meaningful remoting.

I think a resulting architecture could look something like this [4], showing a clean split into a client and a server portion.

As mentioned above, the introduction of the SPI should allow for meaningful remoting; this involves a somewhat stateless and flat (service-oriented) interface that lends itself to remoting. The SPI should also provide an abstraction for people to plug in any remoting layer, be it RMI, WebDAV, SOAP or a more efficient binary protocol that is specifically designed for that layer. The SPI should also work without a remoting layer altogether, to still support the deployment models 1 and 2 [5] in an efficient way.

Providing such a more suitable remoting layer will also allow people to write clients in non-Java environments more easily [6][7] (I am also aware of .NET and JavaScript clients that are at early stages of development).

The SPI also has triggered the interest of (commercial) developers who want to implement a JCR layer on top of their existing legacy repository, without having to re-implement all the client portions needed in the transient space. Those developers would look at implementing the SPI (the server) and leverage the common Jackrabbit client to reach JCR compliance quicker. I think this could lead to a more widely used Jackrabbit client and therefore to a very well tested and scalable implementation.

In my mind the introduction of the SPI would lead to a clean split of the Jackrabbit core architecture that allows for much better re-use and better transparency. Essentially the core could be reduced to the server which should significantly reduce the complexity.

Please let me know what you think; as mentioned before, this should be the starting point of an architecture discussion.

regards,
david

[1] http://jackrabbit.apache.org/images/arch/jackrabbit-ism.jpg
[2] http://jackrabbit.apache.org/doc/deploy/howto-model3.html
[3] http://www.mail-archive.com/dev@jackrabbit.apache.org/msg01496.html
[4] http://www.day.com/o.file/spi-arch.jpg?get=0a1a63a2a86ab7041a6bce2e0f55b4a0
[5] http://jackrabbit.apache.org/doc/deploy.html
[6] http://search.cpan.org/~hanenkamp/Java-JCR-0.07/
[7] http://svn.apache.org/repos/asf/jackrabbit/trunk/contrib/phpcr/
Re: Nuclear Fission, Splitting the core: The SPI Effect [was: Improving the accessibility of the Jackrabbit core]
Hi Thomas,

Thanks for your thoughtful comment.

> I don't understand why it needs to be stateless (about my understanding of stateless, see http://en.wikipedia.org/wiki/Stateless_server). As far as I see stateless means it's slower, and I really don't like slow ;-) Even HTTP is becoming more and more stateful to improve performance I guess. Maybe Roy could give his view about stateless versus stateful. I know there are some other advantages / disadvantages.

Hmm... I am not sure if I would agree with the "generally slower" statement, but I completely agree that this is a touchy topic. I remember a somewhat lengthy verbal discussion revolving around a similar topic.

Some of the legacy repositories that may want to implement the SPI are stateful, which makes it less intuitive for them to implement a completely stateless SPI. I still have not found a completely satisfying solution for that, but somehow it would be great if a well-behaved client could issue something like login() and possibly logout() to indicate to the server that some heavy-weight resources can be disposed of.

I think I understand that something like that could possibly break the stateless contract, but it could solve a very practical need. I could envision something along the lines of passing something like a token (or a cookie, to borrow an HTTP analogy) on the login() call, which would be passed back to the server on subsequent calls to help identify the server session. Of course the server should also be able to work without this token, but with it the server would be able to optimize the use of some of its resources from a performance perspective.

What do you think?

regards,
david
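The token idea can be sketched roughly as follows. This is a minimal illustration and not a proposal for the actual SPI contract: `TokenAwareServer`, `ServerSession` and the method signatures are hypothetical names made up for this example, and the per-session object only stands in for whatever heavy-weight resources a real server would hold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: a server that works without a token but reuses resources when given one. */
final class SessionTokenSketch {

    /** Stand-in for heavy-weight per-session state (connections, caches, ...). */
    static final class ServerSession {
        final String userId;
        int callCount;

        ServerSession(String userId) {
            this.userId = userId;
        }
    }

    static final class TokenAwareServer {
        private final Map<String, ServerSession> sessions = new HashMap<>();
        int sessionsCreated; // how often the expensive setup had to run

        /** Optional call: returns a token the client may pass back on later calls. */
        String login(String userId) {
            String token = UUID.randomUUID().toString();
            sessions.put(token, newSession(userId));
            return token;
        }

        /** Optional call: lets a well-behaved client release server resources early. */
        void logout(String token) {
            sessions.remove(token);
        }

        /** Every call is self-describing, so it also works with a null or unknown token. */
        String read(String userId, String token, String path) {
            ServerSession session = (token == null) ? null : sessions.get(token);
            if (session == null) {
                session = newSession(userId); // stateless fallback: throw-away session
            }
            session.callCount++;
            return userId + " read " + path;
        }

        int openSessions() {
            return sessions.size();
        }

        private ServerSession newSession(String userId) {
            sessionsCreated++;
            return new ServerSession(userId);
        }
    }

    public static void main(String[] args) {
        TokenAwareServer server = new TokenAwareServer();
        String token = server.login("alice");
        server.read("alice", token, "/a");
        server.read("alice", token, "/b");
        server.read("alice", null, "/c"); // no token: still answered, but set up from scratch
        server.logout(token);
        System.out.println("sessions created: " + server.sessionsCreated);
    }
}
```

In this sketch the token is purely an optimization hint: dropping it costs an extra session setup per call but never changes the result, which is one possible reading of "the server should also be able to work without this token".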
Re: Improving the accessibility of the Jackrabbit core
Hi All,

Dave, thanks a lot for your input.

> . Screenshots or easily downloadable sample app which actually does something with custom node types. the base war download is good, but how far could you go with it. Most open source applications have a contacts application or a phone book, or something similar. something that has a face, like a jsp to view whats in the repository would be great
> . the wiki has not been updated regularly, either the information is old or not many people go to it
> . the deployment models - creating a complete tomcat dist, which has the various deployment options running right out of the box would be nice.
> . a java example to add node types, for example for a phone book, which CRUDs the node types would be nice
> . maybe a page, which lists the possibilities of applications that could be built with JR will be useful for newbies.

I completely agree with you that all of the above are excellent measures that we should be looking at to ease adoption by new content application developers. I think it is very important that people get things up and running very quickly and are equipped with very good user documentation.

Personally, I think we have to separate the concerns though. I think Jukka's initial post was going in the direction of making the internals of the core more accessible to more developers. I think that there are a number of steps that we can take in that direction, and I also think that, for example, the separation eventually provided by the SPI will bring some more architectural clarity.

While I agree that we need to have a modular design where people can plug in their extensions at certain defined interfaces and extension points, I would discourage the idea that every user needs to be able to submit patches to the core. In my mind the core should be very compact and very controlled since it has to be extremely stable and scalable, meaning that there is not really a need to have dozens of developers working on a more smallish core.

regards,
david
Re: Improving the accessibility of the Jackrabbit core
On 9/6/06, David Nuescheler [EMAIL PROTECTED] wrote:
> While I agree that we need to have a modular design where people can plug-in their extensions at certain defined interfaces and extension points, I would discourage the idea that every user needs to be able to submit patches to the core. In my mind the core should be very compact and very controlled since it has to be extremely stable and scalable, meaning that there is not really a need to have dozens of developers working on a more smallish core.

Hi,

My two cents on the subject, drawing from my experience on the backup tool.

At first Jukka and I wanted to avoid impact on the core for the reasons you mentioned. It turned out we eventually had to update some parts of the core: some functionality was simply not there. We minimized the changes (only a few lines)... But they were quite bad (I exposed something that shouldn't be exposed). After some rethinking and a few tryouts, I am back to my initial plan with a few classes added to the core.

This example shows the core is not finished, in the sense that it lacks some functionality (for instance, in my case, a way to import the versions).

I think we need to remember JR is still a fairly new project and some use cases have still not been detected. Some functionality has not been needed yet by the core contributors but might emerge from other companies/individuals (for instance, my company would need to extend JR to support our needs). I think discouraging those contributions can be a bad idea: we should encourage them, keep the code and refactor it if necessary. This way both the contributor and the community benefit from it: new functionality with cleaner code.

I agree with you though that we should encourage contributions and not updates to the core. But we should document the core. In my case, it took me a lot of time to understand the part I needed (I wrote a new UpdatableStateManager since I couldn't figure out how the EventFactory was working).

BR
Nicolas
my blog! http://www.deviant-abstraction.net !!
Re: Improving the accessibility of the Jackrabbit core
Hi,

On 9/6/06, David Nuescheler [EMAIL PROTECTED] wrote:
> Personally, I think we have to separate the concerns though, I think Jukka's initial post was going into the direction of making the internals of the core more accessible to more developers.

Correct. In any case, Dave's points are a valuable addition to the feedback I gathered a while ago before the 1.0 release with the issue of streamlining the end-user experience.

> While I agree that we need to have a modular design where people can plug-in their extensions at certain defined interfaces and extension points, I would discourage the idea that every user needs to be able to submit patches to the core.

I'm most concerned about the overhead for people going in trying to trace why Jackrabbit is behaving the way it does in some specific issue. This is often the first step of becoming a contributor, and in my opinion it's currently quite a high step to overcome.

> In my mind the core should be very compact and very controlled since it has to be extremely stable and scalable, meaning that there is not really a need to have dozens of developers working on a more smallish core.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
Re: Improving the accessibility of the Jackrabbit core
On 9/6/06, Nicolas [EMAIL PROTECTED] wrote:
> On 9/6/06, David Nuescheler [EMAIL PROTECTED] wrote:
> > While I agree that we need to have a modular design where people can plug-in their extensions at certain defined interfaces and extension points, I would discourage the idea that every user needs to be able to submit patches to the core. In my mind the core should be very compact and very controlled since it has to be extremely stable and scalable, meaning that there is not really a need to have dozens of developers working on a more smallish core.
>
> Hi,
>
> My two cents on the subject, drawing from my experience on the backup tool.
>
> At first Jukka and I wanted to avoid impact on the core for the reasons you mentioned. It turned out we eventually had to update some parts of the core: some functionality was simply not there. We minimized the changes (only a few lines)... But they were quite bad (I exposed something that shouldn't be exposed). After some rethinking and a few tryouts, I am back to my initial plan with a few classes added to the core.
>
> This example shows the core is not finished, in the sense that it lacks some functionality (for instance, in my case, a way to import the versions).
>
> I think we need to remember JR is still a fairly new project and some use cases have still not been detected. Some functionality has not been needed yet by the core contributors but might emerge from other companies/individuals (for instance, my company would need to extend JR to support our needs). I think discouraging those contributions can be a bad idea: we should encourage them, keep the code and refactor it if necessary. This way both the contributor and the community benefit from it: new functionality with cleaner code.

i don't follow your argumentation. why would this lead to cleaner code?

cheers
stefan

> I agree with you though that we should encourage contributions and not updates to the core. But we should document the core. In my case, it took me a lot of time to understand the part I needed (I wrote a new UpdatableStateManager since I couldn't figure out how the EventFactory was working).
>
> BR
> Nicolas
> my blog! http://www.deviant-abstraction.net !!
Re: Improving the accessibility of the Jackrabbit core
Hi Nico,

Thanks for your mail.

> I will work on the documentation directly on the wiki (when I can start this task). I will ask a lot of questions *though*.

Looking forward to it ;)

> One precision on the backup tool: it is working (and I am polishing the code that needs to fit in Core). And with my new JR understanding, I plan to start implementing a version 2 in my spare time having hotbackup.

Excellent, thanks for all your efforts. I did not mean to imply that the backup tool was not working. If I said anything like that, I would like to apologize.

regards,
david
Re: Improving the accessibility of the Jackrabbit core
Hi,

On 9/6/06, David Nuescheler [EMAIL PROTECTED] wrote:
> Got it. Generally, I am more of a "given the right eyeballs, all bugs are shallow" type of person to begin with.

Perhaps we can find common ground at "enough right eyeballs". ;-)

> If I currently take a look at the shallowness of actual core bugs ;) in Jackrabbit I see that the Jackrabbit community has an outstanding bug resolution time. To me this is probably one of the biggest strengths of Jackrabbit and its community. Do you see this as a weakness that needs improvement?

Definitely not. :-) What I do see as a weakness is that we rely on a handful of core developers to keep up this level of support when we could better tap the great potential within the community. In fact I'd rather see the core developers spending more time being proactive, designing new features and improvements (like improving performance, scalability, etc.), than reactive, analyzing user issues, when large parts of that work could be distributed.

> I think in the end it all boils down to a matter of priorities and I would be very interested in having a discussion around what we think drives and hinders the Jackrabbit adoption and community today and tomorrow, and therefore what we should focus on.

+1

There's already quite a lot of feedback on the adoption part, but that would need to be summarized and analyzed to better focus the efforts.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
Re: Improving the accessibility of the Jackrabbit core
On Sep 6, 2006, at 4:14 AM, David Nuescheler wrote:
> Personally, I believe that for example a restore facility has to be buried deep down in the core and therefore the code has to comply with the high quality requirements that we have for code in the core and for the seasoned Jackrabbit experience of a developer.

That is why each of the core developers has veto power over the code. If we want to ensure that every line is adequately reviewed, then ask for the core code to be governed by the RTC (review-then-commit) rule. Note, however, that such a requirement will extend to all commits on that part of the code.

> In my mind your experience with developing very close to the heart of Jackrabbit should not lead us to opening up the core so inexperienced Jackrabbit developers can contribute, but it should help us realize that we have very high requirements for Jackrabbit developers that make modifications to the core.

I don't think you understand. This is an Apache project and anyone can contribute to any part of it. The degree of review we require of those contributions is decided by the PMC (our committers). We can increase the requirements on review of the core code and we can separate compatible and incompatible changes into versioned branches, but we cannot ask of others what we do not accept of ourselves.

In my opinion, the core code continues to evolve as people try to do larger and more expressive things with Jackrabbit and apply JCR to real problem sets. We need to welcome that and change things based on their technical merits, not any preconceived notions of how much a person knows about the current (highly opaque) core architecture. Most likely, this will mean simplifying the core by removing or refactoring some of the spaghetti dependencies.

One of those things that will change is the degree of extensibility, since that is the heart of any successful open source project and Jackrabbit isn't even halfway there yet. I am sure that others with fresh energy will see new ways to solve the same problem that will not be burdened with the legacy decisions that we made for one reason or another. When those ideas are presented, they will be subject to intense scrutiny and adopted based only on their proven benefits. They will not be judged based on who wrote them or how much time they spent writing the initial core code.

Roy
Re: Improving the accessibility of the Jackrabbit core
I will put in my 2c since I did not see many replies to this post and I think addressing this question is very important for any open source project. I have not had much time to play with JR due to other work, so some of this might already be there.

. Screenshots or an easily downloadable sample app which actually does something with custom node types. The base war download is good, but how far could you go with it? Most open source applications have a contacts application or a phone book, or something similar. Something that has a face, like a JSP to view what's in the repository, would be great.
. The wiki has not been updated regularly; either the information is old or not many people go to it.
. The deployment models - creating a complete Tomcat dist which has the various deployment options running right out of the box would be nice.
. A Java example to add node types, for example for a phone book, which CRUDs the node types, would be nice.
. Maybe a page which lists the possibilities of applications that could be built with JR would be useful for newbies.

just my 2c.

Thanks
Dave

Nicolas [EMAIL PROTECTED] wrote:

Hi,

I have become familiar with the JR codebase in the last few months, and the following is based on my experience with the backup tool.

The community is really helpful when you need some help, but in order to understand the basic concepts you need to dig into the code and into the JCR spec.

General documentation might be a good idea: a user one where key concepts are explained (versioning, nodetypes, and so on). I think we can mostly copy/paste from the JCR spec to the wiki.

We also need, I believe, some documentation about JR's internals: how a node is updated, what an ItemState is.

BR
Nico
my blog! http://www.deviant-abstraction.net !!
Improving the accessibility of the Jackrabbit core
Hi,

Based on private discussions I'd like to raise the issue of the accessibility of the Jackrabbit core codebase. We have a small number of people who are intimately familiar with the core codebase (see the numbers below), but others find the core hard to navigate, and this raises the barrier to entry for contributing to Jackrabbit.

Please share any good ideas on how we could best lower the barrier. I'm open to all sorts of ideas, like more documentation (javadocs, UML diagrams, architectural descriptions, etc.), scheduled Q&A sessions on IRC, an informal Jackrabbit workshop during the Hackathon at ApacheCon, etc.

I'm also interested in the priorities, i.e. what would give us the most bang for the buck in terms of making it easier for people to get familiar with the Jackrabbit core and start contributing.

$ svn log src/main/java/org/apache/jackrabbit/core | \
    perl -lne '/^r[0-9]+ \| (.*?) \|/ and print $1' | sort | uniq -c | sort -n -r
    371 stefan
    199 tripod
    185 mreutegg
    127 jukka
     27 dpfister
     13 fielding
     10 angela
      4 fmeschbe
      3 edgarpoce
      2 sylvain

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - [EMAIL PROTECTED]
Software craftsmanship, JCR consulting, and Java development
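For readers who prefer Java to a perl one-liner, the per-author counting done by the pipeline above could be sketched like this. It is only an illustrative equivalent, not something from the original mail: the class name `CommitCountSketch` is invented, the sample log lines in `main` are made up, and it expects the output of `svn log` to be supplied as a list of lines. It reuses the same header pattern as the perl command.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: count commits per author from "svn log" output lines. */
final class CommitCountSketch {

    // Same pattern as the perl one-liner: matches "r123 | author | ..."
    private static final Pattern HEADER = Pattern.compile("^r[0-9]+ \\| (.*?) \\|");

    /** Returns (author, count) pairs, most commits first; ties are ordered by author name. */
    static List<Map.Entry<String, Integer>> countByAuthor(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample input, shaped like "svn log" revision headers and messages.
        List<String> sample = Arrays.asList(
                "r3 | alice | 2006-09-03 10:00:00 +0000 | 1 line",
                "",
                "fix typo",
                "r2 | bob | 2006-09-02 10:00:00 +0000 | 1 line",
                "r1 | alice | 2006-09-01 10:00:00 +0000 | 1 line");
        for (Map.Entry<String, Integer> e : countByAuthor(sample)) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```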