> I am a little bit impressed by the lack of action on this topic. I hate to
> be "that guy", especially being new here, but it has to be done.
>
> If I've got this right, we have here a chance of developing Gluster even
> further, sponsored by Google, with a dedicated programmer for the summer.
> In other words, if we play our cards right, we can get a free programmer
> and at least a good start/advance on this fantastic project.
Welcome, Carlos. I think it's great that you're taking initiative here. However, it's also important to set proper expectations for what a GSoC intern could reasonably be expected to achieve. I've seen some amazing stuff come out of GSoC, but if we set the bar too high, we end up with incomplete code and the student doesn't learn much except frustration.

GlusterFS consists of 430K lines of code in the core project alone. Most of it is written in a style that is generally hard for newcomers to pick up: both callback-oriented and highly concurrent, often using our own "unique" interpretation of standard concepts. It's also in an area (storage) that is not well taught in most universities. Given those facts and the short duration of GSoC, it's important to focus on projects that don't require deep knowledge of the existing code, to keep the learning curve short and productive time correspondingly high.

With that in mind, let's look at some of your suggestions.

> I think it would be nice to listen to the COMMUNITY (yes, that means YOU),
> for either suggestions, or at least a vote.

It certainly would have been nice to have you at the community IRC meeting yesterday, at which we discussed release content for 3.6 based on the feature proposals here:

http://www.gluster.org/community/documentation/index.php/Planning36

The results are here:

http://titanpad.com/glusterfs-3-6-planning

> My opinion, being also my vote, in order of PERSONAL preference:
>
> 1) There is a project going on (https://forge.gluster.org/disperse) that
> consists of rewriting the stripe module in Gluster. This is especially
> important because it has a HUGE impact on Total Cost of Implementation
> (customer side) and Total Cost of Ownership, and also matches what the
> competition has to offer. Among other things, it would allow Gluster to
> implement a RAIDZ/RAID5 type of fault tolerance, much more efficiently,
> and would, as far as I understand, allow you to use 3 nodes as a minimum
> for stripe+replication.
> This means 25% less money in computer hardware, with increased data
> safety/resilience.

This was decided as a core feature for 3.6. I'll let Xavier (the feature owner) answer w.r.t. whether there's any part of it that would be appropriate for GSoC.

> 2) We have a recurring issue with split-brain resolution. There is an
> entry on Trello asking/suggesting a mechanism that arbitrates this
> resolution automatically. I pretty much think this could come together
> with another solution, which is a file replication consistency check.

This is also core for 3.6, under the name "policy based split brain resolution":

http://www.gluster.org/community/documentation/index.php/Features/pbspbr

Implementing this feature requires significant knowledge of AFR, which both causes split brain and would be involved in its repair. Because AFR is also one of our most complicated components, and the person who just rewrote it won't be around to offer help, I don't think this project *as a whole* would be a good fit for GSoC. On the other hand, there might be specific pieces of the policy implementation (not execution) that would be a good fit.

> 3) Accelerator node project. Some storage solutions out there offer an
> "accelerator node", which is, in short, an extra node with a lot of RAM,
> possibly fast disks (SSD), that works like a proxy to the regular
> volumes. Active chunks of files are moved there, logs (ZIL-style) are
> recorded on fast media, among other things. There is NO active project
> for this, or Trello entry, because it is something I started discussing
> with a few fellows just a couple of days ago. I thought of starting to
> play with RAM disks (tmpfs) as scratch disks, but, since we have an
> opportunity to do something more efficient, or at the very least start
> it, why not?

Looks like somebody has read the Isilon marketing materials. ;) A full production-level implementation of this, with cache consistency and so on, would be a major project.
However, a non-consistent prototype good for specific use cases - especially Hadoop, as Jay mentions - would be pretty easy to build. Having a GlusterFS server (for the real clients) also be a GlusterFS client (to the real cluster) is pretty straightforward. Testing performance would also be a significant component of this, and IMO that's something more developers should learn about early in their careers. I encourage you to keep thinking about how this could be turned into a real GSoC proposal.

Keep the ideas coming!

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
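P.S. For anyone wondering where the "25% less money in computer hardware" figure for the dispersed layout comes from, here is a back-of-the-envelope sketch (plain Python, numbers illustrative only, not taken from the disperse design docs). It compares the raw disk needed per usable terabyte for 2-way replication against a hypothetical 2+1 dispersed (erasure-coded) configuration:

```python
# Back-of-the-envelope storage overhead comparison (illustrative only).
# Replication with n copies stores every byte n times; a dispersed
# (erasure-coded) volume with k data bricks + r redundancy bricks stores
# every byte (k + r) / k times.

def raw_per_usable(data_bricks: int, total_bricks: int) -> float:
    """Raw disk capacity needed per unit of usable capacity."""
    return total_bricks / data_bricks

replica2 = raw_per_usable(1, 2)         # 2 copies of each byte -> 2.0x
dispersed_2_1 = raw_per_usable(2, 3)    # 2 data + 1 redundancy -> 1.5x

savings = 1 - dispersed_2_1 / replica2  # fraction of raw disk saved

print(f"replica 2:     {replica2:.2f}x raw per usable")
print(f"dispersed 2+1: {dispersed_2_1:.2f}x raw per usable")
print(f"hardware savings: {savings:.0%}")
# -> hardware savings: 25%
```

So for the same usable capacity, the 2+1 dispersed layout needs 1.5x raw disk instead of 2x, which is where the 25% saving comes from, while still tolerating the loss of any one brick.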