[Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread Nicolas Cellier
Klaus D. Witzel klaus.witzel at cobss.com writes:

 
 On Thu, 24 Jul 2008 20:09:53 +0200, Zulq Alam wrote:
 
 This is very good for small n :)
 
  combinations
 | combinations |
 combinations := (PluggableSet new: (2 raisedTo: self size))
   hashBlock:
 [:aSet |
 aSet
   inject: aSet size
   into: [:hash :each | hash * each hash + each]];
   yourself.
 self combinationsInto: combinations.
 ^ combinations
 
  combinationsInto: aSet
 (aSet includes: self) ifTrue: [^ self].
 aSet add: self.
 self do:
   [:each |
   (self copyWithout: each) combinationsInto: aSet].  
 
  Even now, I'm still not clear on how one takes a change like this and  
  drives it into a squeak release - Tests, Mantis, then what?
 
 Put it on Mantis as enhancement into the collection category. The rest  
 follows automatically ;) I suggest to put something like fast powerset  
 method for collections into the summary line.
 
  Z.
 

Zulq,
the algorithm you are proposing is very simple but has major problems:

1) it is not efficient for large size n: it will do (n factorial) loops when
only (2 raisedTo: n) are necessary
(2 raisedTo: 10) - 1024
(10 factorial) - 3628800

2) each loop will result in a costly hash lookup.
Your hashBlock involve LargeInteger of size (B raisedTo: p) where
- B is the size of an elementary hash (presumably 4 bytes),
- p the size of the partition (ranging from 0 to n)
The cost of a single hash of size p involves p multiplications of growing size.
Each multiplication of k digits has a cost k^2
(this is the naive algorithm currently programmed in LargeInteger plugin).
Thus a single hash costs (B^2+(2*B)^2+...+(p*B)^2)
Combine this with the (n factorial) loops and you will understand the problem.

3) the algorithm must store all partitions even if we don't want to collect but
just to iterate on partitions.

No offense, but you'd better not bother opening a mantis issue for this time.

Cheers

Nicolas






___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread Zulq Alam

Nicolas Cellier wrote:


Zulq,
the algorithm you are proposing is very simple but has major problems:

1) it is not efficient for large size n: it will do (n factorial) loops when
only (2 raisedTo: n) are necessary


It's better than N! because it will not loop over a set already 
processed. For instance, for a set of 5 elements it will try 81 sets but 
only process 32 of these. Not 120 in either case (5 factorial).




2) each loop will result in a costly hash lookup.
Your hashBlock involve LargeInteger of size (B raisedTo: p) where


It doesn't need to. That was just a very rough attempt at producing a 
hash that didn't evaluate to only 16 values. It should be possible to 
create one that produces SmallIntegers but with a higher cardinality.



3) the algorithm must store all partitions even if we don't want to collect but
just to iterate on partitions.


Yes.



No offense, but you'd better not bother opening a mantis issue for this time.



Agreed. I was just curious about why the naive algorithm was so slow and 
then as a seperate question how one gets such changes in.


Z.
___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread nicolas cellier
Zulq Alam me at zulq.net writes:

 
 Nicolas Cellier wrote:
 
  Zulq,
  the algorithm you are proposing is very simple but has major problems:
  
  1) it is not efficient for large size n: it will do (n factorial) loops when
  only (2 raisedTo: n) are necessary
 
 It's better than N! because it will not loop over a set already 
 processed. For instance, for a set of 5 elements it will try 81 sets but 
 only process 32 of these. Not 120 in either case (5 factorial).
 

Apologies for this case of blindness!
I forgot the includes: test was cutting branches.

  
  2) each loop will result in a costly hash lookup.
  Your hashBlock involve LargeInteger of size (B raisedTo: p) where
 
 It doesn't need to. That was just a very rough attempt at producing a 
 hash that didn't evaluate to only 16 values. It should be possible to 
 create one that produces SmallIntegers but with a higher cardinality.
 

Yes, modular arithmetic would be a way to go.
Beware, small LargeIntegers intermediate might still spoil the game...
Crafting a good hash is an art (see work from Andres Valloud).

  3) the algorithm must store all partitions even if we don't want to collect
 but
  just to iterate on partitions.
 
 Yes.
 
  
  No offense, but you'd better not bother opening a mantis issue for this
 time.
  
 
 Agreed. I was just curious about why the naive algorithm was so slow and 
 then as a seperate question how one gets such changes in.
 
 Z.
 

Right, you did a good job exercizing your (and our) curiousity.
This is very enlightening.
Beginners should also learn using MessageTally spyOn: [],
to analyze were the real costs go.

Cheers

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


Re: [Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread cdrick



 He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!


:)




 I like bit sift solution too for it's simplicity.


me too even if we could argue this is not self explaining...
That's what Andres Valoud said here (thanks Marcin):
http://blogten.blogspot.com/2005/09/very-nice-methods.html



 The problem is that it will iterate p times for creating each subset of
 size p.
 #combinations:atATimeDo: does not. It is building subsets in parallel.


uhm, ok...
the diferrence is an order or 2 more or less...



 The required copy does an iteration though, but in a primitive!
 Hence the difference...

 Nicolas


Thanks all, this was fun and interesting ;)...

Cédrick
___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread nicolas cellier
nicolas cellier ncellier at ifrance.com writes:
 
 He, press ALT-v to get versions of #combinations:atATimeDo: and thanks tk!
 
 I like bit sift solution too for it's simplicity.
 The problem is that it will iterate p times for creating subset of size p.
 #combinations:atATimeDo: does not. It is building subsets in parallel.
 The required copy does an iteration though, but in a primitive!
 Hence the difference...
 
 Nicolas
 

What i wrote is true only if you collect: Arrays.
It is not true if you use #asSet instead of #copy. #asSet is not primitive.

The cost of bit sift can be greatly reduced: factor the reverse operations:

Setsubsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
  subsets add:
(workArray maskedBy: (sift printStringBase: 2) reverse) asSet].
   ^subsets

ArrayedCollectionmaskedBy: aBitString
   | maxSize result |
   maxSize := self size min: aBitString size.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
  (aBitString  at: index) = $1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

While at it, anyway, you do not need to reverse at all...
Ordering does not matter.
This eliminates one loop.
It still costs 3 loops per partition:
 (1 to: maxSize) and asSet plus printStringBase:...
Thus 2P+Q loops for a subset of size P out of N with P=Q=N

You can eliminate the printString and send directly #bitAt: to integers.
See http://bugs.squeak.org/view.php?id=6985 and load as pre-requisite:
Installer mantis ensureFix: 6985.

Setsubsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
  subsets add:
(workArray maskedByIntegerBits: sift) asSet].
   ^subsets

ArrayedCollectionmaskedByIntegerBits: anInteger
   | maxSize result |
   maxSize := self size min: anInteger highBit.
   result := (self species new: maxSize) writeStream.
   1 to: maxSize do: [:index |
  (anInteger bitAt: index) = 1 ifTrue: [result nextPut: (self at: index)]].
   ^result contents

You still have P+Q loops, P= Q=N
You can now eliminate the asSet loop using collect:into: / select:into:
(I think we can call this Klaus trick).

Setsubsets
   | subsetsSize subsets workArray |
   workArray := self asArray.
   subsetsSize := 2 raisedTo: self size.
   subsets := Set new: subsetsSize.
   1 to: subsetsSize do: [:sift |
  subsets add:
(workArray selectIntegerMask: sift into: (Set new: sift highBit)].
   ^subsets

ArrayedCollectionselectIntegerMask: anInteger into: aCollection
   1 to: (self size min: anInteger highBit) do: [:index |
  (anInteger bitAt: index) = 1 ifTrue: [aCollection add: (self at: index)]].
   ^aCollection

Only costs Q loops per partition...
Maybe you can now approach #combinations:atAtTimeDo:
I let you try.

Nicolas

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread nicolas cellier
cdrick cdrick65 at gmail.com writes:

 
 I like bit sift solution too for it's simplicity.
 
 me too even if we could argue this is not self explaining...That's what Andres
Valoud said here (thanks Marcin):
http://blogten.blogspot.com/2005/09/very-nice-methods.html
  
 

which is the exact transposition of #combinations:atATimeDo: ...

Nicolas




___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


Re: [Newbies] Re: Another extension proposal - subsets

2008-07-25 Thread cdrick
really interesting.
but it reveals slower but it was because of Set operations. To be consistent
with other method, I used an array.

subsets6
  | subsetsSize subsets workArray |
  workArray := self asArray.
  subsetsSize := 2 raisedTo: self size.
  subsets := Array new: subsetsSize.
  1 to: subsetsSize do: [:sift |
 subsets at: sift put:
   (workArray selectIntegerMask: sift into: (Set new: sift highBit) )].
  ^subsets

Putting asSet as the end is very expensive when the collection is big


results:

set := (1 to: 15) asSet.
[ set subsets first sift ] timeToRun . 2484
[ set subsets4  combinationsSize:do:] timeToRun.  1598
[ set subsets5  Zulk] timeToRun. 12493I think he uses Set for the
subset
[ set subsets6  optimized sift] timeToRun.  1436   Winner :) 

Thats was instructive, especially the way you deal with bit. Nice also how
you removed the reverse(s).

Cédrick
___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-24 Thread Zulq Alam
Actually, we don't need #permutationsDo: as we're only looking for 
combinations. With a few optimisations:


combinations
  ^ self combinationsInto: (Set new: (2 raisedTo: self size))

combinationsInto: aSet
  (aSet includes: self) ifTrue: [^ self]. optimisation
  aSet add: self.
  self do:
[:each |
(self copyWithout: each) combinationsInto: aSet].   
  ^ aSet

[(1 to: 8) asSet combinations] timeToRun 215
[(1 to: 9) asSet combinations] timeToRun 918
[(1 to: 10) asSet combinations] timeToRun 3989
[(1 to: 11) asSet combinations] timeToRun 16349 
[(1 to: 12) asSet combinations] timeToRun  68780

So a little better, but I was expecting much more. What's worrying is this:

(1 to: 10) asSet combinations size  1024
((1 to: 10) asSet combinations collect: [:e | e hash]) asSet size 16

So, 1024 distinct Sets have only 16 distinct hashes between them? That 
seems pretty bad. It would probably be possible to get a little bit more 
out of the routine with a better (more appropriate) hash function. Even 
so, I don't think it will ever be as fast as your method but will 
happily be proved wrong! :)


Z.

cdrick wrote:

Setsubsets
 | subsets |
 subsets := Set with: self.
 self asArray permutationsDo: [:e | subsets add: e asSet].
 self do: [:e | subsets addAll: (self copyWithout: e) subsets].
 ^ subsets


nice too and frustrating how you got it quick :)

I tried a recursive method too first but found the byte ressemblance so...
My only consolation is that the recursive solution (subsets2) is
slower and hardly work for Set with more than 10 elements.

set := #(1 2 3 4 5) asSet.
[ set subsets  ] timeToRun . 1
[ set subsets2  ] timeToRun.  8

set := #(1 2 3 4 5 6 7) asSet.
[ set subsets  ] timeToRun . 5
[ set subsets2  ] timeToRun.  233

set := #(1 2 3 4 5 6 7 8 ) asSet.
[ set subsets  ] timeToRun .  11
[ set subsets2  ] timeToRun.  1683

set := (1 to: 10) asSet.
[ set subsets  ] timeToRun .  46

set := (1 to: 15) asSet.
[ set subsets  ] timeToRun . 2484

set := (1 to: 20) asSet.
[ set subsets  ] timeToRun .  559953  but here the result has (2
raisedTo: 20)  1 048 576 Sets :)

set := (1 to: 50) asSet.
[ set subsets  ] timeToRun .I got a space is low   wow :)

I have to go,
That was fun :)

See you

Cédrick




___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


Re: [Newbies] Re: Another extension proposal - subsets

2008-07-24 Thread cdrick
or withNicolas suggestion:

SetasPowerset
| subset |
subset := (OrderedCollection new: (2 raisedTo: self size))
add: Set new;
yourself.
1 to: self size do: [:subSize | don't copy anymore as there is the
Set conversion
self asArray combinations: subSize atATimeDo: [:subArray | subset
add: subArray asSet]].
^ subset asSet

I'd still prefer #combinationsSize:do:   instead of#combinations:atATimeDo:


 How about naming

 Collection#asPowerset
^ self asSet powersetInto: (Set new: (2 raisedTo: self size))

 with Set#powersetInto: and putting that into the next release.

 /Klaus

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-24 Thread Klaus D. Witzel

On Thu, 24 Jul 2008 18:27:48 +0200, Cedrick wrote:


or withNicolas suggestion:

SetasPowerset
| subset |
subset := (OrderedCollection new: (2 raisedTo: self size))
add: Set new;
yourself.
1 to: self size do: [:subSize | don't copy anymore as there is the
Set conversion
self asArray combinations: subSize atATimeDo: [:subArray | subset
add: subArray asSet]].
^ subset asSet


Ah, you francophones always want to see Blaise Pascal's triangle at work ;)

Didn't check #combinations:* implementation, does it avoid #includes: ?

I'd still prefer #combinationsSize:do:   instead of 
#combinations:atATimeDo:


+1 but it looks alt-w friendly already (like #detectSum: looks ;)



How about naming

Collection#asPowerset
   ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))

with Set#powersetInto: and putting that into the next release.

/Klaus




___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


Re: [Newbies] Re: Another extension proposal - subsets

2008-07-24 Thread cdrick

 Didn't check #combinations:* implementation, does it avoid #includes: ?

I think so. It calls #combinationsAt:in: after: do:  wich is recursive.


 I'd still prefer #combinationsSize:do:   instead of
  #combinations:atATimeDo:

 +1 but it looks alt-w friendly already (like #detectSum: looks ;)


Funny to see it was probably the original name. See the comment of
#combinationsAt:in: after: (last line)

Choose k of N items and put in aCollection.  jj-1 already chosen.
Indexes of items are in numerical order, to avoid the same combo being
used twice.  In this slot, we are allowed to use items in self indexed
by nn+1 to self size.  nn is the index used for position jj-1.
(1 to: 6) combinationsSize: 3 do: [:each | Transcript cr; show: each
printString]


Cédrick

 How about naming

 Collection#asPowerset
   ^ self asSet powersetInto: (Set new: (2 raisedTo: self size))

 with Set#powersetInto: and putting that into the next release.

 /Klaus



 ___
 Beginners mailing list
 Beginners@lists.squeakfoundation.org
 http://lists.squeakfoundation.org/mailman/listinfo/beginners

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-24 Thread Zulq Alam

Klaus D. Witzel wrote:

Squeak looks to be already fast in this case, your routine is almost 
optimal :) Implementing it in Set so that Set's internal can be of 
benefit won't bring these figures down much -- what remains is (aSet 
includes: another), times #copyWithout: loop, and that seems to be 
unavoidable.


It's significantly faster with a more appropriate hashing function:

{2-0 . 3-0 . 4-1 . 5-1 . 6-3 . 7-9 . 8-22 . 9-59 . 10-138 . 
11-305 . 12-686 . 13-1640 . 14-4366 . 15-10550 . 16-28104 . 
17-93373 . 18-303425}


combinations
  | combinations |
  combinations := (PluggableSet new: (2 raisedTo: self size))
hashBlock:
  [:aSet |
  aSet
inject: aSet size
into: [:hash :each | hash * each hash + each]];
yourself.
  self combinationsInto: combinations.
  ^ combinations

combinationsInto: aSet
  (aSet includes: self) ifTrue: [^ self].
  aSet add: self.
  self do:
[:each |
(self copyWithout: each) combinationsInto: aSet].   

I think the problem is that most of the sets contain similar data so the 
hashes (calculated with bitXor) tend towards to a small set of values. 
This means there are a lot of collisions and each #add: and #includes: 
is very expensive.




How about naming

Collection#asPowerset
^ self asSet powersetInto: (Set new: (2 raisedTo: self size))

with Set#powersetInto: and putting that into the next release.



Even now, I'm still not clear on how one takes a change like this and 
drives it into a squeak release - Tests, Mantis, then what?


Z.

___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


[Newbies] Re: Another extension proposal - subsets

2008-07-23 Thread Zulq Alam
Couldn't find anything obvious but remembered #permutationsDo: which may 
help. Not sure if these does what you want, but maybe it can be adapted?


Setsubsets
  | subsets |
  subsets := Set with: self.
  self asArray permutationsDo: [:e | subsets add: e asSet].
  self do: [:e | subsets addAll: (self copyWithout: e) subsets].
  ^ subsets

#(1 2) asSet subsets.
a Set(a Set(1 2) a Set() a Set(2) a Set(1))

#(1 2 3) asSet subsets.
a Set(a Set(1 2) a Set() a Set(2) a Set(1 2 3) a Set(2 3) a Set(3) a 
Set(1 3) a Set(1))


Z.

cdrick wrote:

When we don't find method, we reinvent the wheel. Ok, so this is what
happened with my quest on rounding float. So here is another example.
I needed a method subsets ... to have all subsets of a Set...

#(1 2 ) asSet subsets  - a Set(a Set(1 2) a Set(1) a Set(2) a Set()) 

Is there already something ? I couldn't find so here is what I did ?
probably hackish (because of the binary mask) so feel free to comment
;)

Setsubsets
| subsetsSize subsets workArray |
workArray := self asArray.
subsetsSize := 2 raisedTo: self size.
subsets := OrderedCollection new.
1 to: subsetsSize do: [:ea |
subsets add: ((workArray masquedBy: (ea printStringBase: 2))
asSet)]. masque par une conversion binaire
^subsets asSet  could be an array of sets

ArrayedCollectionmasquedBy: aBitString
| result entry bitString |
entry := self reverse.
bitString := aBitString reverse.
result := OrderedCollection new.
1 to: (self size) do: [:ea |
 ((bitString  at: ea ifAbsent: []) = $1) ifTrue: [result add: 
(entry
at: ea)]].
^result reverse

Cédrick




___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners


___
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners