On Sat, Mar 17, 2012 at 03:27:12PM -0700, Ben Pfaff wrote:
John Darrington writes:
> Here's another idea that comes to mind: is there a maximum number
> of categories that makes sense? Would a "max categories" setting
> defaulting to, say, 1000, still allow most users to get real work
> done in realistic cases?
>
> 1000 would be much too high. How many machines can allocate 64GB of
> heap?
> "Realistic cases" is somewhat subjective. But I cannot envisage that in
> most instances more than 20 categories would be involved - but who knows?
I mean, 1000 categories per instance, not 1000 instances.
I think I've lost track of what we mean by "instance".
Presumably, 1000 categories do not need much memory (a few
kilobytes?) unless the space for categories is, say, O(n**2) in
the number of categories (I haven't looked).
Yes. If there is a very large number of categories, then it is highly likely
that each category contains only a small number of cases. But we don't
know that a priori.
The particular problem I encountered stems from the fact that, for every
category, I'm calling sort_create_writer, which in turn allocates space for a
large number of cases - even if they are never used. I think I can solve
that problem by sorting the cases before categorising. But I'm wondering if
similar situations will arise where such an optimisation cannot be done.
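
To make the difference concrete, here is a rough sketch in Python rather
than PSPP's C. It does not use sort_create_writer or any other PSPP API.
The names eager_writers, sorted_groups and SLOTS_PER_WRITER are hypothetical,
and the reservation is only counted, not actually allocated. The first
function models one writer per category, each reserving a fixed block up
front; the second sorts once and then walks each category as a contiguous run.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical figure: the number of case slots each writer reserves up front.
SLOTS_PER_WRITER = 4


def eager_writers(cases, key):
    """Model one writer per category, each reserving a fixed block of slots.

    Returns the per-category buffers and the total number of slots reserved,
    which grows with the number of categories, not the number of cases.
    """
    buffers = {}
    reserved = 0
    for case in cases:
        k = key(case)
        if k not in buffers:
            buffers[k] = []
            reserved += SLOTS_PER_WRITER  # paid once per category seen
        buffers[k].append(case)
    return buffers, reserved


def sorted_groups(cases, key):
    """Sort the cases once, then yield each category as a contiguous run.

    Only the run currently being processed is materialised as its own list,
    so no per-category reservation is needed.
    """
    for k, run in groupby(sorted(cases, key=key), key=key):
        yield k, list(run)
```

With 1000 categories the first approach reserves 1000 * SLOTS_PER_WRITER
slots regardless of how many cases each category really holds. The second
still has to sort everything (here in memory; a real implementation would
presumably use a single external sort), but its cost no longer depends on
the category count.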
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.
_______________________________________________
pspp-dev mailing list
https://lists.gnu.org/mailman/listinfo/pspp-dev
