Re: [GSoC] Proposal: persistent probabilistic data structures
Thank you Nicola :D Il giorno mercoledì 19 marzo 2014 13:42:31 UTC+1, Nicola Mometto ha scritto: Matteo, When sending the CA adding a project name is optional, you can leave it empty. Usually the name for a contrib project will be chosen in the thread in clojure-dev when proposing the library as a contrib project, see http://dev.clojure.org/display/community/Guidelines+for+Clojure+Contrib+committers for more details Nicola Matteo Ceccarello writes: I just had a short conversation with Ambrose through comments on my proposal. He suggests to offer this library to Clojure contrib. So I have to sign a Clojure CA, however I don't know what project name to put in the form, since the project has yet to start. Any suggestion? What should I do? Thanks Matteo Il giorno venerdì 14 marzo 2014 14:53:44 UTC+1, Matteo Ceccarello ha scritto: Nicola, thank you :) It's really nice to see that I'm not alone! Everybody, I just updated the proposal, following the guidelines suggested by the wiki. Cheers, Matteo Il giorno giovedì 13 marzo 2014 23:06:11 UTC+1, Nicola Mometto ha scritto: Matteo, best of luck with your proposal, all of those seem like good potential additions to the clojure data-structure landscape. BTW I'm too a fellow UniPD clojure user so you can inc the counter :) Nicola Matteo Ceccarello writes: Hello everybody, I just submitted my proposal for this year's GSoC. I already introduced the topic on this mailing list. Here you can find the complete proposal: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/matteo_ceccarello/5778586438991872 So, if you have suggestions on how to improve it, or if you have any request, please tell me :) I'm really looking forward to work with Clojure for this year's GSoC: I want to learn it really well, since my long term goal is to make more and more people at my University use it. Working on an open source project under the guidance of an expert mentor seems to me the best way to achieve this. Bests Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [GSoC] Proposal: persistent probabilistic data structures
I just had a short conversation with Ambrose through comments on my proposal. He suggests to offer this library to Clojure contrib. So I have to sign a Clojure CA, however I don't know what project name to put in the form, since the project has yet to start. Any suggestion? What should I do? Thanks Matteo Il giorno venerdì 14 marzo 2014 14:53:44 UTC+1, Matteo Ceccarello ha scritto: Nicola, thank you :) It's really nice to see that I'm not alone! Everybody, I just updated the proposal, following the guidelines suggested by the wiki. Cheers, Matteo Il giorno giovedì 13 marzo 2014 23:06:11 UTC+1, Nicola Mometto ha scritto: Matteo, best of luck with your proposal, all of those seem like good potential additions to the clojure data-structure landscape. BTW I'm too a fellow UniPD clojure user so you can inc the counter :) Nicola Matteo Ceccarello writes: Hello everybody, I just submitted my proposal for this year's GSoC. I already introduced the topic on this mailing list. Here you can find the complete proposal: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/matteo_ceccarello/5778586438991872 So, if you have suggestions on how to improve it, or if you have any request, please tell me :) I'm really looking forward to work with Clojure for this year's GSoC: I want to learn it really well, since my long term goal is to make more and more people at my University use it. Working on an open source project under the guidance of an expert mentor seems to me the best way to achieve this. Bests Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [GSoC] Proposal: persistent probabilistic data structures
Nicola, thank you :) It's really nice to see that I'm not alone! Everybody, I just updated the proposal, following the guidelines suggested by the wiki. Cheers, Matteo Il giorno giovedì 13 marzo 2014 23:06:11 UTC+1, Nicola Mometto ha scritto: Matteo, best of luck with your proposal, all of those seem like good potential additions to the clojure data-structure landscape. BTW I'm too a fellow UniPD clojure user so you can inc the counter :) Nicola Matteo Ceccarello writes: Hello everybody, I just submitted my proposal for this year's GSoC. I already introduced the topic on this mailing list. Here you can find the complete proposal: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/matteo_ceccarello/5778586438991872 So, if you have suggestions on how to improve it, or if you have any request, please tell me :) I'm really looking forward to work with Clojure for this year's GSoC: I want to learn it really well, since my long term goal is to make more and more people at my University use it. Working on an open source project under the guidance of an expert mentor seems to me the best way to achieve this. Bests Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[GSoC] Proposal: persistent probabilistic data structures
Hello everybody, I just submitted my proposal for this year's GSoC. I already introduced the topic on this mailing list. Here you can find the complete proposal: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/matteo_ceccarello/5778586438991872 So, if you have suggestions on how to improve it, or if you have any request, please tell me :) I'm really looking forward to work with Clojure for this year's GSoC: I want to learn it really well, since my long term goal is to make more and more people at my University use it. Working on an open source project under the guidance of an expert mentor seems to me the best way to achieve this. Bests Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Google Summer of Code 2014 proposal
The student application opened, so I'm going to submit my proposal. Does anyone have suggestions or requests? Cheers, Matteo Il giorno lunedì 3 marzo 2014 15:36:43 UTC+1, Matteo Ceccarello ha scritto: Hello everybody, I’m Matteo Ceccarello, a PhD student in Computer Engineering from university of Padova, Italy. Recently, I’ve started working with Clojure, a language that I find both powerful and fun. I’ve participated to GSoC 2013 and 2012 with JPF http://babelfish.arc.nasa.gov/trac/jpf/wiki. I’m interested in participating to GSoC 2014 with Clojure, in order to gain confidence with clojure under the guidance of a mentor while contributing to the community. I’ll describe what I’d like to do below. Motivation Currently, alongside my research on parallel graph algorithms, I’ve started writing a web crawler in Clojure. The reasons I want to do this in clojure are many: - The web crawling software we are using (Heritrix https://crawler.archive.org) performs blocking IO. I think that in big crawls this is a showstopper, since to achieve a good throughput one must use many Java threads (in the order of 500) that end up spending most of their time performing blocking waits. - Clojure has the wonderful core.async library, with all the magic that go blocks and channels bring. - The http-kit library makes possible to perform async requests to web servers in a very simple way. - Clojure has an awesome support for concurrency. What I want to achieve eventually is a concurrent asynchronous web crawler, without a single blocking call. Since I’m new to the language, before starting to get some real work done, I’ve worked on a few small projects, to get a feel of the language. What I learned is that everything you put in a ref or atom must be immutable. A fundamental component of a web crawler is a data structure that tells you if you have already visited a URL. Bloom filters are a common choice for the implementation of this component. Since this bloom filter will be put in a ref, I want it to be immutable and, for efficiency reasons, persistent. After some research on the web, I found out that an immutable persistent implementation of the bloom filter is still missing from the Clojure ecosystem. Since it seems that Clojure is missing a persistent implementation of many probabilistic data structures, I came up with the following idea. Persistent probabilistic data structures for Clojure Clojure seems to miss persistent implementations of many useful probabilistic data structures, in particular: - Bloom filters http://webhdd.ru/library/files/10.1.1.127.9672.pdf - Counting Bloom filtershttp://webhdd.ru/library/files/10.1.1.127.9672.pdf - Compact approximators http://arxiv.org/pdf/cs.DS/0306046.pdf - HyperLogLog countershttp://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewPDFInterstitial/dmAH0110/2100 - Count-min sketcheshttp://twiki.di.uniroma1.it/pub/Ing_algo/WebHome/p14_Cormode_JAl_05.pdf Of course one can use one of the various implementations of these data structures for Java, however, being these implementations mutable, they cannot be used in idiomatic concurrent clojure code (as for my understanding of idiomatic concurrent clojure code). What I propose to realize is an optimized persistent implementation of these libraries. Hence I plan to explore different paths using benchmarks https://github.com/hugoduncan/criterium as a guide, for instance to decide whether it’s more convenient to use standard clojure vectors to represent bit vectors or to provide a custom persistent bit vector implementation. Since I like the feeling of having a static type checker that prevents common errors, the library will be annotated with Typed Clojure http://typedclojure.org/. Moreover I find interesting the idea of using tests as machine checkable documentation, and midje-doc http://docs.caudate.me/lein-midje-doc/ seems the right tool for the job. Is there someone interested in mentoring me with this project? Yours sincerely Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Google Summer of Code 2014 proposal
Hello everybody, I’m Matteo Ceccarello, a PhD student in Computer Engineering from university of Padova, Italy. Recently, I’ve started working with Clojure, a language that I find both powerful and fun. I’ve participated to GSoC 2013 and 2012 with JPF http://babelfish.arc.nasa.gov/trac/jpf/wiki. I’m interested in participating to GSoC 2014 with Clojure, in order to gain confidence with clojure under the guidance of a mentor while contributing to the community. I’ll describe what I’d like to do below. Motivation Currently, alongside my research on parallel graph algorithms, I’ve started writing a web crawler in Clojure. The reasons I want to do this in clojure are many: - The web crawling software we are using (Heritrix https://crawler.archive.org) performs blocking IO. I think that in big crawls this is a showstopper, since to achieve a good throughput one must use many Java threads (in the order of 500) that end up spending most of their time performing blocking waits. - Clojure has the wonderful core.async library, with all the magic that go blocks and channels bring. - The http-kit library makes possible to perform async requests to web servers in a very simple way. - Clojure has an awesome support for concurrency. What I want to achieve eventually is a concurrent asynchronous web crawler, without a single blocking call. Since I’m new to the language, before starting to get some real work done, I’ve worked on a few small projects, to get a feel of the language. What I learned is that everything you put in a ref or atom must be immutable. A fundamental component of a web crawler is a data structure that tells you if you have already visited a URL. Bloom filters are a common choice for the implementation of this component. Since this bloom filter will be put in a ref, I want it to be immutable and, for efficiency reasons, persistent. After some research on the web, I found out that an immutable persistent implementation of the bloom filter is still missing from the Clojure ecosystem. Since it seems that Clojure is missing a persistent implementation of many probabilistic data structures, I came up with the following idea. Persistent probabilistic data structures for Clojure Clojure seems to miss persistent implementations of many useful probabilistic data structures, in particular: - Bloom filters http://webhdd.ru/library/files/10.1.1.127.9672.pdf - Counting Bloom filtershttp://webhdd.ru/library/files/10.1.1.127.9672.pdf - Compact approximators http://arxiv.org/pdf/cs.DS/0306046.pdf - HyperLogLog countershttp://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewPDFInterstitial/dmAH0110/2100 - Count-min sketcheshttp://twiki.di.uniroma1.it/pub/Ing_algo/WebHome/p14_Cormode_JAl_05.pdf Of course one can use one of the various implementations of these data structures for Java, however, being these implementations mutable, they cannot be used in idiomatic concurrent clojure code (as for my understanding of idiomatic concurrent clojure code). What I propose to realize is an optimized persistent implementation of these libraries. Hence I plan to explore different paths using benchmarks https://github.com/hugoduncan/criterium as a guide, for instance to decide whether it’s more convenient to use standard clojure vectors to represent bit vectors or to provide a custom persistent bit vector implementation. Since I like the feeling of having a static type checker that prevents common errors, the library will be annotated with Typed Clojure http://typedclojure.org/. Moreover I find interesting the idea of using tests as machine checkable documentation, and midje-doc http://docs.caudate.me/lein-midje-doc/ seems the right tool for the job. Is there someone interested in mentoring me with this project? Yours sincerely Matteo -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.