Re: [GSoC] Proposal: persistent probabilistic data structures

2014-03-20 Thread Matteo Ceccarello
Thank you, Nicola :D

On Wednesday, March 19, 2014 13:42:31 UTC+1, Nicola Mometto wrote:


 Matteo, 
 When sending the CA, adding a project name is optional; you can leave
 it empty.

 Usually the name of a contrib project is chosen in the clojure-dev
 thread where the library is proposed as a contrib project; see

 http://dev.clojure.org/display/community/Guidelines+for+Clojure+Contrib+committers

 for more details.

 Nicola 



-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [GSoC] Proposal: persistent probabilistic data structures

2014-03-19 Thread Matteo Ceccarello
I just had a short conversation with Ambrose through comments on my
proposal. He suggested offering this library to Clojure contrib,
so I have to sign a Clojure CA; however, I don't know what project name to
put in the form, since the project has not started yet.

Any suggestions? What should I do?

Thanks
Matteo



Re: [GSoC] Proposal: persistent probabilistic data structures

2014-03-14 Thread Matteo Ceccarello
Nicola,
thank you :) It's really nice to see that I'm not alone!

Everybody, I just updated the proposal, following the guidelines suggested
on the wiki.

Cheers,
Matteo

On Thursday, March 13, 2014 23:06:11 UTC+1, Nicola Mometto wrote:


 Matteo, 
 best of luck with your proposal; all of those seem like good potential
 additions to the Clojure data-structure landscape.

 BTW, I'm also a fellow UniPD Clojure user, so you can inc the counter :)

 Nicola 



[GSoC] Proposal: persistent probabilistic data structures

2014-03-13 Thread Matteo Ceccarello
Hello everybody,

I just submitted my proposal for this year's GSoC. I already introduced the 
topic on this mailing list.
Here you can find the complete proposal:

http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/matteo_ceccarello/5778586438991872

So, if you have any suggestions on how to improve it, or any
requests, please tell me :)

I'm really looking forward to working with Clojure for this year's GSoC. I
want to learn it really well, since my long-term goal is to get more and
more people at my university to use it. Working on an open source project
under the guidance of an expert mentor seems to me the best way to achieve
this.

Bests
Matteo



Re: Google Summer of Code 2014 proposal

2014-03-12 Thread Matteo Ceccarello
Student applications have opened, so I'm going to submit my proposal.
Does anyone have suggestions or requests?

Cheers,
Matteo



Google Summer of Code 2014 proposal

2014-03-03 Thread Matteo Ceccarello


Hello everybody,

I’m Matteo Ceccarello, a PhD student in Computer Engineering at the
University of Padova, Italy. I recently started working with Clojure, a
language that I find both powerful and fun. I participated in GSoC 2012 and
2013 with JPF http://babelfish.arc.nasa.gov/trac/jpf/wiki. I’m interested in
participating in GSoC 2014 with Clojure, in order to become proficient in
Clojure under the guidance of a mentor while contributing to the community.
I describe what I’d like to do below.
Motivation 

Currently, alongside my research on parallel graph algorithms, I’ve started
writing a web crawler in Clojure. I want to do this in Clojure for several
reasons:

   - The web crawling software we are using
   (Heritrix https://crawler.archive.org) performs blocking IO. I think
   that in big crawls this is a showstopper: to achieve good throughput, one
   must use many Java threads (on the order of 500) that end up spending most
   of their time in blocking waits.
   - Clojure has the wonderful core.async library, with all the magic that go
   blocks and channels bring.
   - The http-kit library makes it possible to perform asynchronous requests
   to web servers in a very simple way.
   - Clojure has excellent support for concurrency.

What I want to achieve eventually is a concurrent asynchronous web crawler,
without a single blocking call.

Since I’m new to the language, before starting to get some real work done,
I’ve worked on a few small projects to get a feel for the language. One thing
I learned is that everything you put in a ref or atom must be immutable,
because swap! may call its update function more than once and a dosync
transaction may be retried. A fundamental component of a web crawler is a
data structure that tells you whether you have already visited a URL. Bloom
filters are a common choice for implementing this component. Since this Bloom
filter will be put in a ref, I want it to be immutable and, for efficiency
reasons, persistent. After some research on the web, I found that an
immutable persistent implementation of the Bloom filter is still missing from
the Clojure ecosystem.
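To make the "immutable filter inside a ref or atom" requirement concrete, here is a minimal sketch in Java. Java is chosen only because it runs on the same JVM; this is not the proposed Clojure API. Every add returns a new filter and leaves the old one untouched. An AtomicReference updated with updateAndGet stands in for a Clojure atom updated with swap!. The class names and the double-hashing scheme are illustrative assumptions. Note that this version is immutable but not persistent: add copies the whole bit set instead of sharing structure with the previous version.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: an immutable (copy-on-write, not yet persistent) Bloom filter.
final class ImmutableBloomFilter {
    private final BitSet bits; // never mutated after construction
    private final int m;       // number of bits
    private final int k;       // number of hash functions

    private ImmutableBloomFilter(BitSet bits, int m, int k) {
        this.bits = bits;
        this.m = m;
        this.k = k;
    }

    static ImmutableBloomFilter empty(int m, int k) {
        return new ImmutableBloomFilter(new BitSet(m), m, k);
    }

    // Double hashing: index_i = h1 + i * h2 (mod m), with h2 forced odd.
    // Assumption: String.hashCode plus a simple mix is good enough for a demo.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) * 0x9E3779B9 | 1;
        return Math.floorMod(h1 + i * h2, m);
    }

    // Returns a new filter; this one is left unchanged (O(m) copy per add).
    ImmutableBloomFilter add(String item) {
        BitSet copy = (BitSet) bits.clone();
        for (int i = 0; i < k; i++) {
            copy.set(index(item, i));
        }
        return new ImmutableBloomFilter(copy, m, k);
    }

    // false means "definitely never added"; true means "probably added".
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}

class VisitedUrlsDemo {
    public static void main(String[] args) {
        // AtomicReference + updateAndGet plays the role of a Clojure atom + swap!:
        // the update function may be re-applied under contention, which is safe
        // only because add() never mutates the existing filter.
        AtomicReference<ImmutableBloomFilter> visited =
                new AtomicReference<>(ImmutableBloomFilter.empty(1 << 16, 4));
        ImmutableBloomFilter before = visited.get();
        visited.updateAndGet(f -> f.add("http://example.org/"));
        System.out.println(visited.get().mightContain("http://example.org/")); // true
        System.out.println(before.mightContain("http://example.org/"));        // false
    }
}
```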

Since this gap seems to extend to many other probabilistic data structures,
I came up with the following idea.
Persistent probabilistic data structures for Clojure 

In particular, Clojure seems to lack persistent implementations of these
useful probabilistic data structures:

   - Bloom filters http://webhdd.ru/library/files/10.1.1.127.9672.pdf
   - Counting Bloom filters http://webhdd.ru/library/files/10.1.1.127.9672.pdf
   - Compact approximators http://arxiv.org/pdf/cs.DS/0306046.pdf
   - HyperLogLog counters http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewPDFInterstitial/dmAH0110/2100
   - Count-min sketches http://twiki.di.uniroma1.it/pub/Ing_algo/WebHome/p14_Cormode_JAl_05.pdf
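Here is a minimal, hypothetical Java sketch of one of the structures listed above: an immutable count-min sketch. It is a depth-by-width table of counters in which each row has its own hash function. An update increments one counter per row, and a query returns the minimum over the rows, so with non-negative counts the estimate never undercounts. The per-row hash below is a simple mixing function assumed for the demo. It is not the pairwise-independent hash family analyzed in the paper linked above. Also, add copies every row rather than sharing structure.

```java
// Hypothetical illustration: an immutable count-min sketch (copy-on-write rows).
final class CountMinSketch {
    private final long[][] table; // depth rows, width counters each
    private final int width;

    private CountMinSketch(long[][] table, int width) {
        this.table = table;
        this.width = width;
    }

    static CountMinSketch empty(int depth, int width) {
        return new CountMinSketch(new long[depth][width], width);
    }

    // Assumed per-row hash: a simple mix seeded by the row number.
    private int index(Object item, int row) {
        int h = item.hashCode() * (2 * row + 1) + row * 0x9E3779B9;
        h ^= h >>> 15;
        return Math.floorMod(h, width);
    }

    // Returns a new sketch with `count` added to one counter per row.
    CountMinSketch add(Object item, long count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        long[][] copy = new long[table.length][];
        for (int r = 0; r < table.length; r++) {
            copy[r] = table[r].clone();
            copy[r][index(item, r)] += count;
        }
        return new CountMinSketch(copy, width);
    }

    // Minimum over rows: an upper bound on the true count (it never underestimates).
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < table.length; r++) {
            min = Math.min(min, table[r][index(item, r)]);
        }
        return min;
    }
}

class CountMinDemo {
    public static void main(String[] args) {
        CountMinSketch empty = CountMinSketch.empty(4, 256);
        CountMinSketch s = empty.add("example.org", 3).add("clojure.org", 5);
        System.out.println(s.estimate("example.org") >= 3); // true
        System.out.println(empty.estimate("example.org"));  // 0
    }
}
```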
 

Of course, one could use one of the various Java implementations of these
data structures; however, since these implementations are mutable, they
cannot be used in idiomatic concurrent Clojure code (as far as I understand
idiomatic concurrent Clojure).

What I propose is an optimized persistent implementation of these data
structures. I plan to explore different design options using benchmarks
(criterium https://github.com/hugoduncan/criterium) as a guide, for
instance to decide whether it’s better to represent bit vectors with
standard Clojure vectors or to provide a custom persistent bit vector
implementation.
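The custom option could look roughly like the following minimal Java sketch. This is an assumption about one possible design, not a worked-out plan. The bit vector is stored as a 32-way trie of 64-bit words, loosely modeled on the tree layout behind Clojure's persistent vectors. set copies only the nodes on the path from the root to the touched leaf (path copying), so old and new versions share everything else. Untouched regions stay null and read as zeros. The class name PersistentBitVector is hypothetical.

```java
// Hypothetical illustration: a persistent bit vector with structural sharing.
final class PersistentBitVector {
    private static final int BITS = 5;
    private static final int WIDTH = 1 << BITS; // 32 children per node, 32 words per leaf
    private static final int MASK = WIDTH - 1;

    private final Object root; // inner node: Object[WIDTH]; leaf: long[WIDTH]; null: all zeros
    private final int shift;   // BITS * (number of inner levels above the leaves)
    private final long size;   // number of addressable bits

    private PersistentBitVector(Object root, int shift, long size) {
        this.root = root;
        this.shift = shift;
        this.size = size;
    }

    static PersistentBitVector empty(long size) {
        long words = (size + 63) >>> 6;
        int shift = 0;
        while (((long) WIDTH << shift) < words) { // grow the trie until it covers all words
            shift += BITS;
        }
        return new PersistentBitVector(null, shift, size);
    }

    private void checkIndex(long bit) {
        if (bit < 0 || bit >= size) {
            throw new IndexOutOfBoundsException("bit " + bit + " out of range [0, " + size + ")");
        }
    }

    boolean get(long bit) {
        checkIndex(bit);
        long w = bit >>> 6; // index of the 64-bit word holding this bit
        Object node = root;
        for (int s = shift; s > 0 && node != null; s -= BITS) {
            node = ((Object[]) node)[(int) ((w >>> s) & MASK)];
        }
        // 1L << bit uses only the low 6 bits of the shift distance, i.e. bit % 64.
        return node != null && (((long[]) node)[(int) (w & MASK)] & (1L << bit)) != 0;
    }

    // Returns a new vector with the bit set; this vector is left unchanged.
    PersistentBitVector set(long bit) {
        if (get(bit)) {
            return this; // nothing to change: share the whole tree
        }
        return new PersistentBitVector(setIn(root, shift, bit), shift, size);
    }

    // Path copying: clones (or creates) only the nodes on the root-to-leaf path.
    private static Object setIn(Object node, int s, long bit) {
        long w = bit >>> 6;
        if (s == 0) {
            long[] leaf = node == null ? new long[WIDTH] : ((long[]) node).clone();
            leaf[(int) (w & MASK)] |= 1L << bit;
            return leaf;
        }
        Object[] inner = node == null ? new Object[WIDTH] : ((Object[]) node).clone();
        int i = (int) ((w >>> s) & MASK);
        inner[i] = setIn(inner[i], s - BITS, bit);
        return inner;
    }
}

class BitVectorDemo {
    public static void main(String[] args) {
        PersistentBitVector v0 = PersistentBitVector.empty(100_000);
        PersistentBitVector v1 = v0.set(12_345);
        PersistentBitVector v2 = v1.set(99_999);
        System.out.println(v0.get(12_345) + " " + v1.get(12_345) + " " + v2.get(99_999)); // false true true
    }
}
```

The trade-off is that each set allocates a few 32-slot nodes along one path, while each get walks the same few levels. Whether that is cheaper than plain Clojure vectors is exactly the kind of question the benchmarks would have to settle.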

Since I like having a static type checker that prevents common errors, the
library will be annotated with Typed Clojure http://typedclojure.org/.
Moreover, I find the idea of using tests as machine-checkable documentation
interesting, and midje-doc http://docs.caudate.me/lein-midje-doc/ seems the
right tool for the job.

Is anyone interested in mentoring me on this project?

Yours sincerely
Matteo
