On Mon, 30 Jan 2012 14:26:00 +0000, Paolo Castagna 
<[email protected]> said:

    paolo> Welcome William.

Thank you.

    paolo> When possible, I do this sort of things locally. I get a
    paolo> copy of the data I need or small slices of it, I load
    paolo> everything in TDB and run my SPARQL queries locally.

Right. However, for my applications (!!!) I don't want to do this
because:

  1. I cannot count on the remote data being available in bulk since
     some publishers habitually only make SPARQL endpoints available
     and not dumps.
  2. I don't know beforehand which slices of the data I will need;
     if I knew this, I wouldn't need to run the query.
  3. I cannot count on having my own temporary local store to put
     intermediate results into.

    paolo> Looking at HttpQuery.java [1] that seems to me to be the
    paolo> case (and it is probably ok for the majority of use cases).

Perhaps, although fixing this would improve performance by a
significant amount and should not break anything existing. And it
ought to be simple.

    paolo> See also/related:

    paolo>  "This feature is a basic building block to allow remote
    paolo> access in the middle of a query, not a general solution to
    paolo> the issues in distributed query evaluation...

Yes, I realise this and have read that caveat. I understand quite well
about pattern selectivity and the like.

I am perfectly happy for the query to take a long time to run as a
batch job, as long as it doesn't consume a lot of RAM and a
recoverable failure (e.g. of the HTTP response code 5XX kind, not 4XX)
doesn't cause the whole thing to fail and lose the work already done.
"Doesn't consume a lot of RAM" probably means "write results to
persistent storage or a file descriptor incrementally". That would
make Jena/ARQ usable in my application.
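As a rough sketch of what I mean (endpoint URL, output filename, and
the LIMIT are illustrative, not real; this assumes current Jena/ARQ
APIs, not necessarily the version under discussion): a SERVICE query
against an empty local model sends the pattern to the remote endpoint,
and ResultSetFormatter consumes the result iterator row by row, so
rows can go to disk as they arrive rather than piling up in memory.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.ModelFactory;

public class StreamingServiceQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and query, for illustration only.
        String queryString =
            "SELECT ?s ?p ?o WHERE { "
          + "  SERVICE <http://example.org/sparql> { ?s ?p ?o } "
          + "} LIMIT 1000";
        Query query = QueryFactory.create(queryString);

        // Execute against an empty local model: all the pattern
        // matching happens at the remote endpoint via SERVICE.
        try (QueryExecution qe = QueryExecutionFactory.create(
                 query, ModelFactory.createDefaultModel());
             OutputStream out = new FileOutputStream("results.tsv")) {
            ResultSet rs = qe.execSelect();
            // outputAsTSV pulls rows from the iterator one at a time,
            // so results reach the file descriptor incrementally.
            ResultSetFormatter.outputAsTSV(out, rs);
        }
    }
}
```

Whether the HTTP layer underneath actually streams (rather than
buffering the whole response) is exactly the HttpQuery.java question
above, of course.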

Cheers,
-w
