[
https://issues.apache.org/jira/browse/JENA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119377#comment-13119377
]
Stephen Allen commented on JENA-126:
------------------------------------
Paulo, that's a good idea. I've been stuck thinking about the problem in terms
of a full SPARQL server with lots of concurrent requests. I think your idea
could work well when you only have a single databag like in tdbloader. I would
be interested to see how it scales up as the number of bags increases.
> Change temporary table threshold policy from count to memory size
> -----------------------------------------------------------------
>
> Key: JENA-126
> URL: https://issues.apache.org/jira/browse/JENA-126
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Stephen Allen
>
> The "workCount" setting for temporary table sizes is not a good configuration
> option. Binding sizes could potentially vary from as little as 32 bytes (8
> byte ref to the binding + 8 byte ref to a variable + 8 byte nodeID + 8 byte
> object overhead), to some bindings with multi-megabyte strings. Asking the
> user to know which one it is likely to be, and then how that count translates
> into memory usage (the real resource we are attempting to control) is already
> way too much IMO.
> OK, so what the user wants is a way to specify the amount of memory that can
> be used by each query operator for temporary tables [1][2][3]. Hmm, wait, no,
> maybe what he wants is a way to specify the total memory used for temporary
> tables per query? No, maybe he wants to specify it for the whole query
> engine.
> But that last paragraph is not accurate. What he *really* wants is a system
> that answers all of his queries for whatever data he has as fast as possible.
> He doesn't want to have to configure any parameters. Unfortunately, this is
> a really hard dynamic optimization problem so we foist it off on the user,
> hoping he'll be able to come up with some value.
> We need to decide on what we want to use as a config parameter. I believe it
> should be a "workMem" or "tmpTableSize" setting that specifies the max memory
> usage of a temporary table before it is converted into an on-disk table.
> [1] This is what most DB systems provide; specifically, PostgreSQL and MySQL
> both have per-operator temporary table sizes. PostgreSQL calls the setting
> "work_mem" and MySQL calls it "tmp_table_size".
> [2] http://www.postgresql.org/docs/8.3/static/runtime-config-resource.html
> [3] http://dev.mysql.com/doc/refman/5.0/en/internal-temporary-tables.html
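
To make the proposed policy concrete, here is a minimal sketch of a bag that
spills once an estimated byte budget is exceeded, rather than once an item
count is exceeded. Everything in it is hypothetical and for illustration only:
the class name MemoryThresholdBag, the estimator function, and the placeholder
writeToDisk() are not ARQ APIs. The string estimate assumes the 32-byte fixed
overhead from the description plus 2 bytes per character, which is itself only
a rough guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch (not ARQ code): buffers items in memory and switches to
 * "on-disk" mode once the estimated memory use exceeds a workMem-style budget.
 */
class MemoryThresholdBag<T> {
    private final long workMemBytes;                // budget in bytes, like "workMem"
    private final ToLongFunction<T> sizeEstimator;  // caller-supplied size estimate
    private final List<T> inMemory = new ArrayList<>();
    private long estimatedBytes = 0;
    private boolean spilled = false;
    private long spilledCount = 0;

    MemoryThresholdBag(long workMemBytes, ToLongFunction<T> sizeEstimator) {
        this.workMemBytes = workMemBytes;
        this.sizeEstimator = sizeEstimator;
    }

    void add(T item) {
        if (spilled) {
            // Already over budget: new items go straight to disk.
            writeToDisk(item);
            return;
        }
        inMemory.add(item);
        estimatedBytes += sizeEstimator.applyAsLong(item);
        if (estimatedBytes > workMemBytes) {
            spill();
        }
    }

    /** Moves everything buffered so far to disk and stays in on-disk mode. */
    private void spill() {
        for (T item : inMemory) {
            writeToDisk(item);
        }
        inMemory.clear();
        estimatedBytes = 0;
        spilled = true;
    }

    /** Placeholder: a real implementation would serialize to a temp file. */
    private void writeToDisk(T item) {
        spilledCount++;
    }

    boolean isSpilled() { return spilled; }
    long spilledCount() { return spilledCount; }
    int inMemoryCount() { return inMemory.size(); }

    /** Assumed estimate: 32 bytes fixed overhead + 2 bytes per character. */
    static long estimateStringBinding(String s) {
        return 32L + 2L * s.length();
    }

    public static void main(String[] args) {
        MemoryThresholdBag<String> bag =
            new MemoryThresholdBag<>(100, MemoryThresholdBag::estimateStringBinding);
        bag.add("a");
        bag.add("b");
        bag.add("c");
        System.out.println("spilled=" + bag.isSpilled());
    }
}
```

With this shape the user only picks a byte budget; whether that budget is
reached after three tiny bindings or one multi-megabyte string is decided by
the estimator, not by the user.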