On 01/02/11 16:36, Stephen Allen wrote:
Andy,

I have started implementing the serializer (SinkBindingOutput) by using
org.openjena.riot.SinkQuadOutput as a guide and using OutputLangUtils to
print out the variable/values.  I created the deserializer (LangBindings) by
extending org.openjena.riot.lang.LangNTuple.  I'm using the paired var/value
format you described below.  For now I'll start with a straightforward
implementation with no compression, but I like your ideas in this area.  I'll
try to do some measurements to see if any other compression is beneficial.
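(The paired var/value row idea could be sketched roughly as follows. This is only an illustrative stand-in, not the actual SinkBindingOutput/OutputLangUtils code; the class and method names here are hypothetical.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a paired var/value row format for one binding:
// each row is a sequence of "?var value" pairs, terminated by " ." in the
// style of RIOT's N-Tuples syntax. Names are illustrative only.
public class BindingRowSketch {
    public static String formatRow(Map<String, String> binding) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : binding.entrySet()) {
            sb.append('?').append(e.getKey()).append(' ')
              .append(e.getValue()).append(' ');
        }
        return sb.append('.').toString();
    }

    public static void main(String[] args) {
        // One solution row binding ?x and ?y.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("x", "<http://example/s>");
        row.put("y", "\"abc\"");
        System.out.println(formatRow(row));
    }
}
```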

Sounds good.


I did not define an org.openjena.riot.Lang enum for the deserializer
(because it isn't an RDF language) but I was planning on putting the
LangBindings class in the org.openjena.riot.lang package.

As good a place as any at the moment.

I've just been digging out some code that does tuple I/O from an experimental system from a while ago (a clustered query engine ...).


For determining when to spill bindings to disk, there are a few options (in
order of least difficulty):
1) Store binding objects in an list, and then spill them to disk once the
list size passes a threshold
2) Start serializing bindings immediately into something like
DeferredFileOutputStream [1] that will retain the data in memory until it
passes a memory threshold
3) Do 1), but try to calculate the size of the bindings in memory and use a
memory threshold instead of a number of bindings threshold

I think 1) should be sufficient if we come up with a reasonable guess for
the threshold.  Option 2) lets you get much better control over the memory
management, but I think the cost of unnecessarily serializing/deserializing
small queries may be too high.

Personally, I'd encapsulate this in a policy object and have different implementations. Well, maybe just one implementation: case 1, with a settable threshold for testing. (3) then becomes a smarter policy object to be done later, if needed.
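(A minimal sketch of that policy-object shape, assuming option 1 with a settable count threshold; the interface and class names here are made up for illustration, not actual ARQ types.)

```java
import java.util.ArrayList;
import java.util.List;

// Pluggable decision of when an in-memory list of bindings should spill
// to disk. A smarter, size-estimating policy (option 3) could implement
// the same interface later.
interface SpillPolicy<T> {
    boolean shouldSpill(List<T> inMemory);
}

// Option 1: spill once the list passes a settable count threshold.
class CountThresholdPolicy<T> implements SpillPolicy<T> {
    private final int threshold;

    CountThresholdPolicy(int threshold) { this.threshold = threshold; }

    @Override
    public boolean shouldSpill(List<T> inMemory) {
        return inMemory.size() >= threshold;
    }
}

public class SpillSketch {
    public static void main(String[] args) {
        SpillPolicy<String> policy = new CountThresholdPolicy<>(3);
        List<String> rows = new ArrayList<>();
        rows.add("binding1");
        rows.add("binding2");
        System.out.println(policy.shouldSpill(rows)); // below threshold
        rows.add("binding3");
        System.out.println(policy.shouldSpill(rows)); // at threshold: spill
    }
}
```

A low threshold makes the spill path easy to exercise in tests without building huge result sets.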

I share your concern on (2) about the serialization to memory costs.

        Andy
