I don't have enough experience in this field to comment very sensibly (or offer to mentor), but I wanted to say that I think it would be really nice to have a broader selection of specialised persistent data structures available for Clojure.
On 12 March 2014 09:56, Matteo Ceccarello <matteo.ceccare...@gmail.com> wrote: > The student application opened, so I'm going to submit my proposal. > Does anyone have suggestions or requests? > > Cheers, > Matteo > > Il giorno lunedì 3 marzo 2014 15:36:43 UTC+1, Matteo Ceccarello ha scritto: >> >> Hello everybody, >> >> I'm Matteo Ceccarello, a PhD student in Computer Engineering from >> university of >> Padova, Italy. Recently, I've started working with Clojure, a language >> that I >> find both powerful and fun. I've participated to GSoC 2013 and 2012 with >> JPF. I'm interested in >> participating to GSoC 2014 with Clojure, in order to gain confidence with >> clojure under the guidance of a mentor while contributing to the >> community. I'll >> describe what I'd like to do below. >> >> Motivation >> >> Currently, alongside my research on parallel graph algorithms, I've >> started >> writing a web crawler in Clojure. The reasons I want to do this in clojure >> are >> many: >> >> The web crawling software we are using >> (Heritrix) performs blocking IO. I think that in >> big crawls this is a showstopper, since to achieve a good throughput one >> must >> use many Java threads (in the order of 500) that end up spending most of >> their >> time performing blocking waits. >> Clojure has the wonderful core.async library, with all the magic that go >> blocks and channels bring. >> The http-kit library makes possible to perform async requests to web >> servers in a very simple way. >> Clojure has an awesome support for concurrency. >> >> What I want to achieve eventually is a concurrent asynchronous web >> crawler, >> without a single blocking call. >> >> Since I'm new to the language, before starting to get some real work done, >> I've >> worked on a few small projects, to get a feel of the language. What I >> learned is >> that everything you put in a ref or atom must be immutable. A fundamental >> component of a web crawler is a data structure that tells you if you have >> already visited a URL. Bloom filters are a common choice for the >> implementation >> of this component. Since this bloom filter will be put in a ref, I want it >> to >> be immutable and, for efficiency reasons, persistent. After some research >> on the >> web, I found out that an immutable persistent implementation of the bloom >> filter >> is still missing from the Clojure ecosystem. >> >> Since it seems that Clojure is missing a persistent implementation of many >> probabilistic data structures, I came up with the following idea. >> >> Persistent probabilistic data structures for Clojure >> >> Clojure seems to miss persistent implementations of many useful >> probabilistic >> data structures, in particular: >> >> Bloom filters >> Counting Bloom filters >> Compact approximators >> HyperLogLog counters >> Count-min sketches >> >> Of course one can use one of the various implementations of these data >> structures for >> Java, however, being these implementations mutable, they cannot be used >> in idiomatic concurrent clojure code (as for my understanding of idiomatic >> concurrent clojure code). >> >> What I propose to realize is an optimized persistent implementation of >> these >> libraries. Hence I plan to explore different paths using >> benchmarks as a guide, >> for instance to decide whether it's more convenient to use standard >> clojure vectors >> to represent bit vectors or to provide a custom persistent >> bit vector implementation. >> >> Since I like the feeling of having a static type checker that >> prevents common errors, the library will be annotated with >> Typed Clojure. Moreover I find interesting the idea >> of using tests as machine checkable documentation, and >> midje-doc seems the right tool for the job. >> >> Is there someone interested in mentoring me with this project? >> >> Yours sincerely >> Matteo > > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with your > first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.