Re: Clojure / Common Lisp Question

2010-06-26 Thread Michael Gardner
On Jun 26, 2010, at 12:37 AM, rob levy wrote:

>> Rich Hickey's insightful videos have caused me to stop
>> writing loops whenever possible. For me this is the same
>> level of "thinking-change" that happened when I moved to
>> using "Structured Programming" rather than GOTO (in Fortran).
>> Rich needs to write a paper called
>>  "Loops considered harmful"
>> 
> 
> That is a great thing, I like that about both Common Lisp and Clojure.  
> Compare with Perl or even Python; you can use map/grep, list comprehensions 
> etc some of the time but not all of the time.  I Lisp it's always possible to 
> that in a neat way I think.  I know there is a loop macro in CL, which I'm 
> sure can cause many people to just write in some other language's idiom 
> instead of the native one.

You can use Perl's map/grep pretty much anywhere, though they're not as nice to 
use because the language is a mess (albeit more functional than Python). I, 
too, have found myself using fewer and fewer explicit loops as time goes on, 
starting with a revelation about the versatility of map while I was still 
living in Perlistan. I haven't used a single loop statement yet in Clojure, and 
I doubt I will ever do so except for performance reasons.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Enhanced primitive support - redux

2010-06-26 Thread Stefan Kamphausen
Hi,

On 25 Jun., 21:04, Rich Hickey  wrote:
> equiv, the revenge of num
[...]
> Feedback welcome,

sorry, no feedback, but one question which is rather important to me:
will this make it into Clojure 1.2?

Kind regards,
Stefan

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Enhanced primitive support - redux

2010-06-26 Thread Nicolas Oury
I think I pointed it out, and I reiterate it will probably not improve
performance a lot (Except if you use always the 5 same numbers).

On Sat, Jun 26, 2010 at 7:59 AM, B Smith-Mannschott
wrote:

> This was suggested on the previous thread on this topic as well, but
> I don't think it was pointed out that *Java already does this*.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

ClojureX

2010-06-26 Thread Michael Kohl
Please not that ClojureX is now discontinued. There are enough viable
alternatives like David's clj [1] by now, so I don't see the need for
this project anymore. If you do however, feel free to fork away :-)

Michael

[1] http://github.com/liebke/clj

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: State of Clojure web development

2010-06-26 Thread James Reeves
On 26 June 2010 03:36, MarkSwanson  wrote:
> I still like timestamps even with per-user logging.

Sorry, I guess wasn't entirely clear.

The maps would still contain timestamps. My point is that we should
get away from the idea thinking of logs as unstructured text strings.

For example, a typical log message for a web application might look
something like:

{:application "Some app"
 :timestamp "20100626T112154Z"
 :server-ip "10.23.1.34"
 :request-uuid "885902e5-4fd0-43b3-b3d6-8961e8987278"
 :session-user-id 102
 :level "error"
 :message "Inconsistent user data"
 :stacktrace ["" "" ...]}

Something with lots of juicy information we can filter on, and
something that can be elegantly displayed in a web-based user
interface.

> I prefer files. Using a DB prevents me from using existing powerful
> tools to search and query the logs: vim and emacs.

Databases are usually quite good at querying information as well; that
is something they are designed to do, after all. I believe MongoDB has
support for regular expression searches, so there's not really
anything you can do with a text editor that you can't with a
sufficiently advanced database.

You also get more fine-grained control over storage. For instance, I
might want to have debugging messages expire after five days, but
error messages expire after three months. Or perhaps I want to blind
user information before exposing the logs to developers.

That said, any good logging library has support for multiple back
ends. If you didn't want to write out to a database, you could plug in
your own storage system that divides files up by user ID and date, and
then appends the log messages to a text file.

- James

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Drawable Sets

2010-06-26 Thread Nicolas Oury
Dear all,

for a project, I need a data structure to represent finite distribution,
that is a set of elements, each of which having a mass in the set, and two
functions:

- total-mass : the sum of the masses of each element
- draw : return an element with a probability proportional to its mass

I currently use the simple idea : a Persistent Hash Map from elt to mass:
- the total-mass is the reduce of (+ . second)
- drawing is choosing a random-number between 0 and the total-mass and
looping through the seq, to look for the corresponding element.

Both operation are linear in the number of elements. This is a problem,
because these sets tend to be quite big in my project.

I think that if I used a HashTrie with total-mass cached in each node,
total-mass would be in constant time and draw in log time.

So here is my question : does anyone has, for example for the "clojure in
clojure" project, a sketch of an implementation of PersistentHashMap
in clojure itself? (I would be happy to contribute, if it needs more work).
Or has anyone a better idea that mine to have drawable sets?

Thanks for your help,

Best regards,

Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Enhanced primitive support - redux

2010-06-26 Thread Mike Meyer
[Format recovered from top posting.]

"Nicolas Oury"  wrote:
>On Sat, Jun 26, 2010 at 7:59 AM, B Smith-Mannschott
>wrote:
>
>> This was suggested on the previous thread on this topic as well, but
>> I don't think it was pointed out that *Java already does this*.

Doesn't matter if clojure is boxing them as well. Yes, all the boxes will point 
to the cache, but you still have different boxes, and allocating those is the 
problem.

>I think I pointed it out, and I reiterate it will probably not improve
>performance a lot (Except if you use always the 5 same numbers).

Reiteration won't make it true.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Enhanced primitive support - redux

2010-06-26 Thread Andrzej
On Sat, Jun 26, 2010 at 3:59 PM, B Smith-Mannschott
 wrote:
>
> This was suggested on the previous thread on this topic as well, but
> I don't think it was pointed out that *Java already does this*.
>
> See the inner class IntegerCache at line 608:
>
> http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/3956cdee6712/src/share/classes/java/lang/Integer.java

Thank you. I wasn't aware of it.

It is not exactly what I meant, though. The above Java code
preallocates some low Integers, that are likely to be encountered in
actual programs. It obviously doesn't help at all when you happen to
use other numbers.

What I'd rather like to have is an array of N preallocated objects
waiting to be assigned values and used. This way an allocation cycle
could be triggered every N Integer constructor calls and all boxes
used in a single procedure would be gathered in one place.

Andrzej

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Enhanced primitive support - redux

2010-06-26 Thread Nicolas Oury
http://tech.puredanger.com/2007/02/01/valueof/

Apparently, a program that
only access 1 integer (0) is only twice as fast when caching.
We are very far from the *10/20 that occurs.

On Sat, Jun 26, 2010 at 2:04 PM, Mike Meyer <
mwm-keyword-googlegroups.620...@mired.org> wrote:

>
> >I think I pointed it out, and I reiterate it will probably not improve
> >performance a lot (Except if you use always the 5 same numbers).
>
> Reiteration won't make it true.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Enhanced primitive support - redux

2010-06-26 Thread Nicolas Oury
And, by the way, after having looked into the compiler source, the current
Master already uses the valeOf function.

On Sat, Jun 26, 2010 at 2:31 PM, Nicolas Oury wrote:

> http://tech.puredanger.com/2007/02/01/valueof/
>
> Apparently, a program that
> only access 1 integer (0) is only twice as fast when caching.
> We are very far from the *10/20 that occurs.
>
> On Sat, Jun 26, 2010 at 2:04 PM, Mike Meyer <
> mwm-keyword-googlegroups.620...@mired.org> wrote:
>
>>
>> >I think I pointed it out, and I reiterate it will probably not improve
>> >performance a lot (Except if you use always the 5 same numbers).
>>
>> Reiteration won't make it true.
>
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Drawable Sets

2010-06-26 Thread Jules
You can store the numbers in a vector [2 5 3]. Instead of doing this
store the cumulative weight v = [0 2 7 10]. What your problem comes
down to is given a number n, find the the lowest index i such that
v[i] <= n. This can be accomplished with binary search. You can also
use a binary search tree instead of a vector. You also keep a vector w
with the actual objects. Drawing is done like this: generate a random
number n between 0 and total-weight = (last v). Then find the smallest
index i such that v[i] <= n. The drawn object is w[i]. Hope this
helps.

On 26 jun, 15:01, Nicolas Oury  wrote:
> Dear all,
>
> for a project, I need a data structure to represent finite distribution,
> that is a set of elements, each of which having a mass in the set, and two
> functions:
>
> - total-mass : the sum of the masses of each element
> - draw : return an element with a probability proportional to its mass
>
> I currently use the simple idea : a Persistent Hash Map from elt to mass:
> - the total-mass is the reduce of (+ . second)
> - drawing is choosing a random-number between 0 and the total-mass and
> looping through the seq, to look for the corresponding element.
>
> Both operation are linear in the number of elements. This is a problem,
> because these sets tend to be quite big in my project.
>
> I think that if I used a HashTrie with total-mass cached in each node,
> total-mass would be in constant time and draw in log time.
>
> So here is my question : does anyone has, for example for the "clojure in
> clojure" project, a sketch of an implementation of PersistentHashMap
> in clojure itself? (I would be happy to contribute, if it needs more work).
> Or has anyone a better idea that mine to have drawable sets?
>
> Thanks for your help,
>
> Best regards,
>
> Nicolas.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Drawable Sets

2010-06-26 Thread Nicolas Oury
Thank you very much for your answer. I forgot to tell you  that I also needs
to be able to have usual set operations.

They become linear with your 1st idea. (But, sorry, it's my bad! I should
have asked the right question. Your answer is very good for the question I
actually asked, not the one I had in mind :))
The second idea is also good, but would perform a bit less good than using
wider trees than binary.
(I might use this second idea, though, with a set instead of a vector, if I
can't manage to make HashTries work).

On Sat, Jun 26, 2010 at 2:50 PM, Jules  wrote:

> You can store the numbers in a vector [2 5 3]. Instead of doing this
> store the cumulative weight v = [0 2 7 10]. What your problem comes
> down to is given a number n, find the the lowest index i such that
> v[i] <= n. This can be accomplished with binary search. You can also
> use a binary search tree instead of a vector. You also keep a vector w
> with the actual objects. Drawing is done like this: generate a random
> number n between 0 and total-weight = (last v). Then find the smallest
> index i such that v[i] <= n. The drawn object is w[i]. Hope this
> helps.
>
> On 26 jun, 15:01, Nicolas Oury  wrote:
> > Dear all,
> >
> > for a project, I need a data structure to represent finite distribution,
> > that is a set of elements, each of which having a mass in the set, and
> two
> > functions:
> >
> > - total-mass : the sum of the masses of each element
> > - draw : return an element with a probability proportional to its mass
> >
> > I currently use the simple idea : a Persistent Hash Map from elt to mass:
> > - the total-mass is the reduce of (+ . second)
> > - drawing is choosing a random-number between 0 and the total-mass and
> > looping through the seq, to look for the corresponding element.
> >
> > Both operation are linear in the number of elements. This is a problem,
> > because these sets tend to be quite big in my project.
> >
> > I think that if I used a HashTrie with total-mass cached in each node,
> > total-mass would be in constant time and draw in log time.
> >
> > So here is my question : does anyone has, for example for the "clojure in
> > clojure" project, a sketch of an implementation of PersistentHashMap
> > in clojure itself? (I would be happy to contribute, if it needs more
> work).
> > Or has anyone a better idea that mine to have drawable sets?
> >
> > Thanks for your help,
> >
> > Best regards,
> >
> > Nicolas.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Clojure / Common Lisp Question

2010-06-26 Thread rob levy
> You can use Perl's map/grep pretty much anywhere, though they're not as
> nice to use because the language is a mess (albeit more functional than
> Python). I, too, have found myself using fewer and fewer explicit loops as
> time goes on, starting with a revelation about the versatility of map while
> I was still living in Perlistan. I haven't used a single loop statement yet
> in Clojure, and I doubt I will ever do so except for performance reasons.
>
>
Yes, that is true, you can use that approach everywhere in Perl, but for the
exact reason you cite an explicit loop in Perl can sometimes be less
awkward/ hard to read/ require less auxiliary yak-shaving, whereas in Lisps
these higher order forms that abstract away the looping are consistently the
natural way to do things, and the code is more  declarative/clean.  In Perl,
the "right way" to do it seems to depend on the situation, sometimes a while
loop, sometimes a foreach-style loop (though I can't remember the last time
it seemed like a good idea to use a classic "for i to n"-style loop).  Perl
offers an obscene amount  of syntax to choose from which is good for
expressiveness, but I'll take ultra-simplicity, homoiconicity,
no-syntax/DIY-syntax approach to expressiveness of the Lisp family of
languages over that any day.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Enhanced primitive support - redux

2010-06-26 Thread David Powell

Hi,

Re: caching boxed ints:

>>I think I pointed it out, and I reiterate it will probably not improve
>>performance a lot (Except if you use always the 5 same numbers).

> Reiteration won't make it true.

At about 10m - 12m into this video, Cliff Click suggests that Java's
caching of Integer objects might do some harm to performance because
it prevents the JIT from being able to do inlining and escape
analysis.

http://www.infoq.com/presentations/click-fast-bytecodes-funny-languages


-- 
Dave

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Drawable Sets

2010-06-26 Thread Garth Sheldon-Coulson
Are there going to be a lot of deletions from the set? Or mostly insertions?

If it's mostly insertions (or if it's just a static data structure that
stays the same once built) then I think I can help.

On Sat, Jun 26, 2010 at 9:01 AM, Nicolas Oury wrote:

> Dear all,
>
> for a project, I need a data structure to represent finite distribution,
> that is a set of elements, each of which having a mass in the set, and two
> functions:
>
> - total-mass : the sum of the masses of each element
> - draw : return an element with a probability proportional to its mass
>
> I currently use the simple idea : a Persistent Hash Map from elt to mass:
> - the total-mass is the reduce of (+ . second)
> - drawing is choosing a random-number between 0 and the total-mass and
> looping through the seq, to look for the corresponding element.
>
> Both operation are linear in the number of elements. This is a problem,
> because these sets tend to be quite big in my project.
>
> I think that if I used a HashTrie with total-mass cached in each node,
> total-mass would be in constant time and draw in log time.
>
> So here is my question : does anyone has, for example for the "clojure in
> clojure" project, a sketch of an implementation of PersistentHashMap
> in clojure itself? (I would be happy to contribute, if it needs more work).
> Or has anyone a better idea that mine to have drawable sets?
>
> Thanks for your help,
>
> Best regards,
>
> Nicolas.
>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Drawable Sets

2010-06-26 Thread Nicolas Oury
Insertion and deletions. But I would like to hear your idea anyway.
Always good to hear ideas :)

On Sat, Jun 26, 2010 at 7:58 PM, Garth Sheldon-Coulson  wrote:

> Are there going to be a lot of deletions from the set? Or mostly
> insertions?
>
> If it's mostly insertions (or if it's just a static data structure that
> stays the same once built) then I think I can help.
>
> On Sat, Jun 26, 2010 at 9:01 AM, Nicolas Oury wrote:
>
>> Dear all,
>>
>> for a project, I need a data structure to represent finite distribution,
>> that is a set of elements, each of which having a mass in the set, and two
>> functions:
>>
>> - total-mass : the sum of the masses of each element
>> - draw : return an element with a probability proportional to its mass
>>
>> I currently use the simple idea : a Persistent Hash Map from elt to mass:
>> - the total-mass is the reduce of (+ . second)
>> - drawing is choosing a random-number between 0 and the total-mass and
>> looping through the seq, to look for the corresponding element.
>>
>> Both operation are linear in the number of elements. This is a problem,
>> because these sets tend to be quite big in my project.
>>
>> I think that if I used a HashTrie with total-mass cached in each node,
>> total-mass would be in constant time and draw in log time.
>>
>> So here is my question : does anyone has, for example for the "clojure in
>> clojure" project, a sketch of an implementation of PersistentHashMap
>> in clojure itself? (I would be happy to contribute, if it needs more
>> work).
>> Or has anyone a better idea that mine to have drawable sets?
>>
>> Thanks for your help,
>>
>> Best regards,
>>
>> Nicolas.
>>
>>  --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>>
>
>  --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Drawable Sets

2010-06-26 Thread Garth Sheldon-Coulson
Well, my idea involves rejection sampling. In order to sample an element
from the set in a mass-weighted fashion, you can sample an element in a
uniform fashion and then reject the sample with a certain probability (more
below). If you do reject the sample, you keep drawing from the uniform
proposal distribution until you get an acceptance. This can be quite
efficient if the most massive element in the set is not too much more
massive than the majority of the other elements. If, however, the elements
have very different masses, then it can be inefficient. Regardless, my
intuition is that you can't do better than rejection sampling in this case,
because the probability distribution from which you're sampling is
constantly changing and not of any particular functional form.

In order to make rejection sampling work, I think you'll need to be able to
do two things:

1. Efficiently sample an element from the set in a uniform
(non-mass-weighted) fashion.
2. Efficiently look up the mass of the most massive element currently in the
set.

And you said you want set-like lookup by element, so here's a third
requirement:

3. Efficiently look up an element's mass by element.

The hard part is actually (1). In order to get (1), you somewhere need an
index to your elements indexed by index, so that you can uniformly sample an
element efficiently by saying (nth index (rand-int (count index))) or just
(rand-nth index). For this you need the index to be a vector, and this makes
it hard to get efficient deletions. Below I use weak deletions, but this
isn't ideal. This is why I asked about deletions. AFAIK, there's no way to
efficiently sample a uniform random element from a hashmap, hashset, sorted
map, or sorted set... you would need to call seq on any of those to query it
by index, which would be really bad for lookup times.

But here's the idea. I think it's not bad anyway.

Let's say you have j objects object_i with masses mass_i, where 0 <= i < j.
Then the data structure I have in mind is composed of:

1. A vector v whose elements are object_i.
2. A sorted set index ss whose elements are [mass_i, object_i] and whose
comparator is #(> (first %1) (first %2)) - so, sorted from largest to
smallest.
3. A hashmap index h whose key->value pairs are object_i->[mass_i,
vindex_i], where vindex_i is the index of object_i in the vector v. This
allows O(log32 n) access to the mass of each element and also to the vector
index of each element.
4. A scalar tm holding the total mass of all objects in the set.

To insert an object, simply conj it onto each structure as appropriate and
update the total mass.

To delete an element object_k, first look up its mass_k and vindex_k in the
hashmap, and then return a data structure with:

1. Vector (assoc v vindex_k ::unused), i.e. a weak deletion. This is
problematic because the length of the vector never decreases with deletions,
and this affects the efficiency of sampling later. I can't think of any way
around this, because we need an index indexed by numbers.
2. Sorted set (disj ss [mass_k, object_k]).
3. Hashmap (disj h object_k).
4. Scalar (- tm mass_k).

To sample in a mass-weighted manner, we do rejection sampling. Rejection
sampling has two steps: sampling from a uniform proposal distribution over
the elements in the set, and then accepting the sampled element with some
probability (and, if it is rejected, sampling another). The acceptance
probability depends on the mass of the sampled element and also on the mass
of the largest element in the set, because we need to make sure the proposal
distribution, which is a uniform times some factor, envelopes the
distribution from which we are trying to sample.

As an algorithm over v, ss, h, and tm:

1. Sample an integer pindex ("proposal index") from a uniform distribution
over [0, (count v)), i.e. let pindex be (rand-int (count v)).
2. Let p ("proposal") be (nth v pindex). This step is O(log32 n). If p is
::unused, then go back to step 1 -- in other words, keep sampling until we
get a uniform random element of v.
3. Look up the mass of p, i.e. let mass_p be (first (h p)). This step is
O(log32 n).
4. Look up the mass of the most massive element in the set, i.e. let
max-mass be (first (first ss)). This step is O(1).
5. Let n be (count v). Accept and return the proposal with probability
mass_p/max-mass = (mass_p/tm)/((1/n)*(max-mass*n/tm)), and otherwise reject
and go back to step 1. This step is fast if there aren't too many rejections
on average.

On multicore hardware, instead of looping from step 5 back to step 1, you
could do things in parallel: put steps 1-5 in a function f yielding an
accepted value or a nil for rejection, and then do (some (apply pcalls
(repeat f))) to get an acceptance.

I hope this helps. Definitely you should double-check my reasoning =).

Garth


On Sat, Jun 26, 2010 at 3:24 PM, Nicolas Oury wrote:

> Insertion and deletions. But I would like to hear your idea anyway.
> Always good to hear ideas :)
>
>
> On Sat, Jun

Re: Drawable Sets

2010-06-26 Thread Garth Sheldon-Coulson
Sorry - the call at the end of my message should be: (some identity (apply
pcalls (repeat f))).

I also realized that the random seed contention issues that were discussed
on the list earlier might prevent very much gain from parallelizing the
accept-reject.

Garth

On Sat, Jun 26, 2010 at 5:04 PM, Garth Sheldon-Coulson  wrote:

> Well, my idea involves rejection sampling. In order to sample an element
> from the set in a mass-weighted fashion, you can sample an element in a
> uniform fashion and then reject the sample with a certain probability (more
> below). If you do reject the sample, you keep drawing from the uniform
> proposal distribution until you get an acceptance. This can be quite
> efficient if the most massive element in the set is not too much more
> massive than the majority of the other elements. If, however, the elements
> have very different masses, then it can be inefficient. Regardless, my
> intuition is that you can't do better than rejection sampling in this case,
> because the probability distribution from which you're sampling is
> constantly changing and not of any particular functional form.
>
> In order to make rejection sampling work, I think you'll need to be able to
> do two things:
>
> 1. Efficiently sample an element from the set in a uniform
> (non-mass-weighted) fashion.
> 2. Efficiently look up the mass of the most massive element currently in
> the set.
>
> And you said you want set-like lookup by element, so here's a third
> requirement:
>
> 3. Efficiently look up an element's mass by element.
>
> The hard part is actually (1). In order to get (1), you somewhere need an
> index to your elements indexed by index, so that you can uniformly sample an
> element efficiently by saying (nth index (rand-int (count index))) or just
> (rand-nth index). For this you need the index to be a vector, and this makes
> it hard to get efficient deletions. Below I use weak deletions, but this
> isn't ideal. This is why I asked about deletions. AFAIK, there's no way to
> efficiently sample a uniform random element from a hashmap, hashset, sorted
> map, or sorted set... you would need to call seq on any of those to query it
> by index, which would be really bad for lookup times.
>
> But here's the idea. I think it's not bad anyway.
>
> Let's say you have j objects object_i with masses mass_i, where 0 <= i < j.
> Then the data structure I have in mind is composed of:
>
> 1. A vector v whose elements are object_i.
> 2. A sorted set index ss whose elements are [mass_i, object_i] and whose
> comparator is #(> (first %1) (first %2)) - so, sorted from largest to
> smallest.
> 3. A hashmap index h whose key->value pairs are object_i->[mass_i,
> vindex_i], where vindex_i is the index of object_i in the vector v. This
> allows O(log32 n) access to the mass of each element and also to the vector
> index of each element.
> 4. A scalar tm holding the total mass of all objects in the set.
>
> To insert an object, simply conj it onto each structure as appropriate and
> update the total mass.
>
> To delete an element object_k, first look up its mass_k and vindex_k in the
> hashmap, and then return a data structure with:
>
> 1. Vector (assoc v vindex_k ::unused), i.e. a weak deletion. This is
> problematic because the length of the vector never decreases with deletions,
> and this affects the efficiency of sampling later. I can't think of any way
> around this, because we need an index indexed by numbers.
> 2. Sorted set (disj ss [mass_k, object_k]).
> 3. Hashmap (disj h object_k).
> 4. Scalar (- tm mass_k).
>
> To sample in a mass-weighted manner, we do rejection sampling. Rejection
> sampling has two steps: sampling from a uniform proposal distribution over
> the elements in the set, and then accepting the sampled element with some
> probability (and, if it is rejected, sampling another). The acceptance
> probability depends on the mass of the sampled element and also on the mass
> of the largest element in the set, because we need to make sure the proposal
> distribution, which is a uniform times some factor, envelopes the
> distribution from which we are trying to sample.
>
> As an algorithm over v, ss, h, and tm:
>
> 1. Sample an integer pindex ("proposal index") from a uniform distribution
> over [0, (count v)), i.e. let pindex be (rand-int (count v)).
> 2. Let p ("proposal") be (nth v pindex). This step is O(log32 n). If p is
> ::unused, then go back to step 1 -- in other words, keep sampling until we
> get a uniform random element of v.
> 3. Look up the mass of p, i.e. let mass_p be (first (h p)). This step is
> O(log32 n).
> 4. Look up the mass of the most massive element in the set, i.e. let
> max-mass be (first (first ss)). This step is O(1).
> 5. Let n be (count v). Accept and return the proposal with probability
> mass_p/max-mass = (mass_p/tm)/((1/n)*(max-mass*n/tm)), and otherwise reject
> and go back to step 1. This step is fast if there aren't too many rejections
> on average.
>
> On multi

Re: Drawable Sets

2010-06-26 Thread Jules
By storing the set in a binary (or n-ary) tree and maintaining the
total weight in a node you can make every operation (draw, insert,
union, intersect) O(log n) and total-weight O(1). Draw like this
(where node(w,l,r) denotes a node with weight w, left subtree l and
right subtree r and leaf(w,v) denotes a leaf with weight w and value
v):

draw(node) = search(random number between 0 and weight(node), node)
search(x,node(w,l,r)) = if x < weight(l) then search(x,l) else
search(x-weight(l),r)
search(x,leaf(w,v)) = v

To make this work you need to maintain the invariant w =
weight(node(w,l,r)) = weight(l) + weight(r) in the set operations.

You can use the same idea for hash tries, just maintain the total
weight in each node and draw with this algorithm.

On 26 jun, 15:54, Nicolas Oury  wrote:
> Thank you very much for your answer. I forgot to tell you  that I also needs
> to be able to have usual set operations.
>
> They become linear with your 1st idea. (But, sorry, it's my bad! I should
> have asked the right question. Your answer is very good for the question I
> actually asked, not the one I had in mind :))
> The second idea is also good, but would perform a bit less good than using
> wider trees than binary.
> (I might use this second idea, though, with a set instead of a vector, if I
> can't manage to make HashTries work).
>
>
>
> On Sat, Jun 26, 2010 at 2:50 PM, Jules  wrote:
> > You can store the numbers in a vector [2 5 3]. Instead of doing this
> > store the cumulative weight v = [0 2 7 10]. What your problem comes
> > down to is given a number n, find the the lowest index i such that
> > v[i] <= n. This can be accomplished with binary search. You can also
> > use a binary search tree instead of a vector. You also keep a vector w
> > with the actual objects. Drawing is done like this: generate a random
> > number n between 0 and total-weight = (last v). Then find the smallest
> > index i such that v[i] <= n. The drawn object is w[i]. Hope this
> > helps.
>
> > On 26 jun, 15:01, Nicolas Oury  wrote:
> > > Dear all,
>
> > > for a project, I need a data structure to represent finite distribution,
> > > that is a set of elements, each of which having a mass in the set, and
> > two
> > > functions:
>
> > > - total-mass : the sum of the masses of each element
> > > - draw : return an element with a probability proportional to its mass
>
> > > I currently use the simple idea : a Persistent Hash Map from elt to mass:
> > > - the total-mass is the reduce of (+ . second)
> > > - drawing is choosing a random-number between 0 and the total-mass and
> > > looping through the seq, to look for the corresponding element.
>
> > > Both operation are linear in the number of elements. This is a problem,
> > > because these sets tend to be quite big in my project.
>
> > > I think that if I used a HashTrie with total-mass cached in each node,
> > > total-mass would be in constant time and draw in log time.
>
> > > So here is my question : does anyone has, for example for the "clojure in
> > > clojure" project, a sketch of an implementation of PersistentHashMap
> > > in clojure itself? (I would be happy to contribute, if it needs more
> > work).
> > > Or has anyone a better idea that mine to have drawable sets?
>
> > > Thanks for your help,
>
> > > Best regards,
>
> > > Nicolas.
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to clojure@googlegroups.com
> > Note that posts from new members are moderated - please be patient with
> > your first post.
> > To unsubscribe from this group, send email to
> > clojure+unsubscr...@googlegroups.com > >
> > For more options, visit this group at
> >http://groups.google.com/group/clojure?hl=en

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Protocols

2010-06-26 Thread Mark Engelberg
Is there a list somewhere of all the protocols built-in to Clojure
1.2's core that are available for extension?

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en