[ https://issues.apache.org/jira/browse/JENA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118642#comment-13118642 ]
Paolo Castagna commented on JENA-126:
-------------------------------------
Stephen, why not have a threshold below which we never spill (as with
ThresholdPolicyCount), but once we pass that threshold we start checking how
much free memory we have (every N items), and if the amount of free memory goes
below a certain percentage we spill? It is a simplistic approach (maybe so
simple as to be "stupid"), but could it possibly work? This way we would not
need to estimate sizes and/or have a memory manager.
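
To make the idea concrete, here is a minimal sketch of such a policy. It is
only an illustration: the class name FreeMemorySpillPolicy, its parameters
(minItems, checkEvery, minFreeFraction) and the injectable free-memory probe
are all hypothetical, and it does not implement any existing ARQ interface.
The default probe reads java.lang.Runtime, which only gives a rough picture
because garbage that has not been collected yet still counts as used.

```java
import java.util.function.DoubleSupplier;

/**
 * Hypothetical spill policy: never spill below minItems; past that point,
 * sample the free-memory fraction every checkEvery items and ask for a
 * spill when it drops below minFreeFraction.
 */
final class FreeMemorySpillPolicy {
    private final long minItems;
    private final long checkEvery;
    private final double minFreeFraction;
    private final DoubleSupplier freeFraction; // probe, injectable for testing
    private long count = 0;

    FreeMemorySpillPolicy(long minItems, long checkEvery,
                          double minFreeFraction, DoubleSupplier freeFraction) {
        if (minItems < 0 || checkEvery < 1
                || minFreeFraction < 0.0 || minFreeFraction > 1.0) {
            throw new IllegalArgumentException("invalid policy parameters");
        }
        this.minItems = minItems;
        this.checkEvery = checkEvery;
        this.minFreeFraction = minFreeFraction;
        this.freeFraction = freeFraction;
    }

    /** Rough JVM-wide estimate: fraction of the maximum heap not yet in use. */
    static double jvmFreeFraction() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) (max - used) / (double) max;
    }

    /** Records one added item; returns true if the caller should spill now. */
    boolean addAndCheck() {
        count++;
        if (count <= minItems) {
            return false; // below the count threshold: never spill
        }
        if ((count - minItems) % checkEvery != 0) {
            return false; // not a sampling point
        }
        return freeFraction.getAsDouble() < minFreeFraction;
    }

    /** Call after spilling so that counting starts again from zero. */
    void reset() {
        count = 0;
    }
}
```

A caller would construct it with something like
new FreeMemorySpillPolicy(10000, 1000, 0.2, FreeMemorySpillPolicy::jvmFreeFraction)
and call addAndCheck() for each item added; those numbers are placeholders,
not recommendations.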
> Change temporary table threshold policy from count to memory size
> -----------------------------------------------------------------
>
> Key: JENA-126
> URL: https://issues.apache.org/jira/browse/JENA-126
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Stephen Allen
>
> The "workCount" setting for temporary table sizes is not a good configuration
> option. Binding sizes could potentially vary from as little as 32 bytes (8
> byte ref to the binding + 8 byte ref to a variable + 8 byte nodeID + 8 byte
> object overhead), to some bindings with multi-megabyte strings. Asking the
> user to know which one it is likely to be, and then how that count translates
> into memory usage (the real resource we are attempting to control) is already
> way too much IMO.
> OK, so what the user wants is a way to specify the amount of memory that can
> be used by each query operator for temporary tables [1][2][3]. Hmm, wait, no,
> maybe what he wants is a way to specify the total memory used for temporary
> tables per query? No, maybe he wants to specify it for the whole query
> engine.
> But that last paragraph is not accurate. What he *really* wants is a system
> that answers all of his queries for whatever data he has as fast as possible.
> He doesn't want to have to configure any parameters. Unfortunately, this is
> a really hard dynamic optimization problem so we foist it off on the user,
> hoping he'll be able to come up with some value.
> We need to decide on what we want to use as a config parameter. I believe it
> should be a "workMem" or "tmpTableSize" setting that specifies the max memory
> usage of a temporary table before it is converted into an on-disk table.
> [1] This is what most DB systems provide; specifically, PostgreSQL and MySQL
> both have per-operator temporary table sizes. PostgreSQL calls the setting
> "work_mem" and MySQL calls it "tmp_table_size".
> [2] http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html
> [3] http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html
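
For comparison, the "workMem" setting proposed in the quoted description could
be sketched roughly as below. This is an assumption-laden illustration, not
ARQ code: the class name WorkMemThresholdPolicy and the caller-supplied size
estimator are hypothetical, and the estimator is exactly the part that is hard
to get right in practice.

```java
import java.util.function.ToLongFunction;

/**
 * Hypothetical memory-based threshold: accumulates an estimated size per
 * item and reports when the running total exceeds workMemBytes.
 */
final class WorkMemThresholdPolicy<T> {
    private final long workMemBytes;
    private final ToLongFunction<T> sizeEstimator; // caller-supplied estimate, in bytes
    private long usedBytes = 0;

    WorkMemThresholdPolicy(long workMemBytes, ToLongFunction<T> sizeEstimator) {
        if (workMemBytes < 0) {
            throw new IllegalArgumentException("workMemBytes must be >= 0");
        }
        this.workMemBytes = workMemBytes;
        this.sizeEstimator = sizeEstimator;
    }

    /** Adds the estimated size of one item to the running total. */
    void increment(T item) {
        usedBytes += sizeEstimator.applyAsLong(item);
    }

    /** True once the estimated total has passed the configured limit. */
    boolean isThresholdExceeded() {
        return usedBytes > workMemBytes;
    }

    long usedBytes() {
        return usedBytes;
    }

    /** Call after the table has been converted to an on-disk table. */
    void reset() {
        usedBytes = 0;
    }
}
```

With an estimator that charges the 32-byte minimum from the description plus
two bytes per character, two empty strings and one 20-character string add up
to 32 + 32 + 72 = 136 estimated bytes, so a 100-byte limit is exceeded after
three items, whereas a count-based limit would treat all three alike.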