Hi Ying,
What is the next process step for GSoC this year?
As mentor, you probably want to see this project as starting now. As
you know, I don't have time to mentor this year and so can't really
guarantee technical involvement at short notice.
The proposal will need updating for the comments; it also needs to be made
consistent. This is better done now as part of the submission process, not
in the bonding stage. It'll improve the proposal evaluation.
e.g. the javacc grammar changes should be done first. Not much point
hacking the generated java parser (it'll be overwritten by the grammar
compiler).
e.g. Documentation should arrive with a deliverable, not later (writing
the document, which isn't going to be very large, helps check the design).
Ying - do you want to work with Qihong to do that? As a discussion on dev@?
The first half is ARQ changes, the second is Fuseki changes (Fuseki2) -
the first half is larger so it may not split at the mid-term evaluation.
What will be important for this project is regular email traffic to
dev@. It's all about making a smooth path for change. This isn't a new
module, this is making changes to an area users depend on already.
Several people here will want to know what's happening, hopefully
comment; we should break the deliverables up to get them in piece by
piece, not wait until the end.
Using github and generating pull requests throughout the project will
work best. There needs to be a fallback in case github is not
accessible after our experiences of last year.
Qihong - do you have any questions on the process?
Andy
On 07/04/15 15:00, Ying Jiang wrote:
Hi Andy,
Thanks for answering Qihong's questions! JENA-491 was not originally
my idea, so I'd be very grateful if you could help the student clarify
the project goals and scope. Then, as the mentor, I can keep the
project moving once it starts, with technical assistance regarding
Jena.
For the first question, is it OK to expose Quad to end user in
querying? I mean, we have 2 layers of Jena API: the higher one of
Model/Statement/Resource, and the underlying Graph/Triple/Quad/Node.
It makes sense to me if we encourage end users to use the former as
much as possible. Currently, in the API, we already have:
Iterator<Triple> QueryExecution.execConstructTriples(). I have the
same doubt about it. What's your opinion?
Best,
Ying Jiang
On Sun, Apr 5, 2015 at 2:04 AM, Andy Seaborne wrote:
On 03/04/15 03:47, Qihong Lin wrote:
Hello Andy,
It's submitted in time.
Good.
Ying - what is the next process step?
I saw your notes, thanks. Here're some further
questions.
1) API of QueryExecution
Does the API look like:
- Iterator<Quad> QueryExecution.execConstructQuads()
- Dataset QueryExecution.execConstructDataset()
Both. (One builds on the other anyway.)
It should mirror how execConstruct/execConstructTriples are done unless
there is a very good reason not to.
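A rough sketch of "one builds on the other", for illustration only. The types here (a Quad record, a map of graph name to set of quads standing in for a dataset, the DEFAULT_GRAPH string) are hypothetical stand-ins and not the Jena classes; the method names are made up too. The point is just that the dataset form can be a loop that drains the quad iterator.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in types for illustration only; these are NOT the Jena classes.
class ConstructSketch {
    // Hypothetical marker for the default graph (assumption, not Jena's constant).
    static final String DEFAULT_GRAPH = "urn:x-sketch:default-graph";

    record Quad(String graph, String subject, String predicate, String object) {}

    // Streaming form: hands back quads one at a time.
    static Iterator<Quad> constructQuads(List<Quad> results) {
        return results.iterator();
    }

    // Materialised form, built on the streaming one: group quads by graph name.
    static Map<String, Set<Quad>> constructDataset(List<Quad> results) {
        Map<String, Set<Quad>> dataset = new LinkedHashMap<>();
        // A dataset always has a default graph, even if it ends up empty.
        dataset.put(DEFAULT_GRAPH, new LinkedHashSet<>());
        Iterator<Quad> it = constructQuads(results);
        while (it.hasNext()) {
            Quad q = it.next();
            // Sets drop duplicate quads within a graph.
            dataset.computeIfAbsent(q.graph(), g -> new LinkedHashSet<>()).add(q);
        }
        return dataset;
    }
}
```

The real methods would presumably follow whatever execConstruct/execConstructTriples do today; this only shows the layering.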
2) master.jj
How does master.jj generate arq.jj? What tool? You mentioned "is
processed with cpp". What's cpp?
cpp is the C preprocessor (yes!!) It rewrites one text file to another text
file. ARQ does not use cpp macros; it just uses the defined symbols ARQ and
SPARQL_11 to put in different blocks of text.
It's also why there are no comments in arq.jj. cpp removes them and blank
lines (the alternative is lots of blank lines - it's yuk).
The script to drive it is jena-arq/Grammar/grammar (it's a bash script - I
don't know how well it runs on MS Windows - it used to, using cygwin). The
script directs the output to the right place in the java source code.
If you have trouble running it, edit arq.jj then run javacc.
The SyntaxARQ parts that are not SPARQL 1.1 are in sections
#ifdef ARQ
....
#endif
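To make the effect of those #ifdef sections concrete, here is a toy filter that keeps or drops blocks depending on which symbols are defined. It is only an illustration of the idea: the grammar script runs the real cpp, which does far more, and the class and method names here are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Toy illustration of "#ifdef SYMBOL ... #endif" selection; NOT the real cpp.
class IfdefSketch {
    // Keep a line only if every enclosing #ifdef names a defined symbol.
    // Assumes #ifdef/#endif are balanced; an extra #endif throws.
    static List<String> preprocess(List<String> lines, Set<String> defined) {
        List<String> out = new ArrayList<>();
        Deque<Boolean> active = new ArrayDeque<>(); // one entry per open #ifdef
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("#ifdef ")) {
                boolean parentActive = active.isEmpty() || active.peek();
                String symbol = t.substring("#ifdef ".length()).trim();
                active.push(parentActive && defined.contains(symbol));
            } else if (t.equals("#endif")) {
                active.pop();
            } else if (active.isEmpty() || active.peek()) {
                // Blank lines are dropped too, mirroring the output described above.
                if (!t.isEmpty()) {
                    out.add(line);
                }
            }
        }
        return out;
    }
}
```

With ARQ defined, the text inside an "#ifdef ARQ" block survives; with only SPARQL_11 defined, it disappears and the rest of the file is unchanged.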
3) query string syntax
I went through TriG syntax.
- For our query string, can it construct a Dataset with multiple
graphs, or just one default/named graph?
Multiple.
A dataset is one default and zero or more named graphs.
CONSTRUCT
{ GRAPH :g1 { ?s ?p ?o }
GRAPH :g2 { ?s ?p ?o }
GRAPH ?g { ?s ?p ?o }
} ...
only in real use the patterns will be bigger.
- Shall we consider using variables for named graphs? I mean "?g", not
":g":
CONSTRUCT {
# Named graph
GRAPH ?g { ?s :p ?o }
} WHERE
Yes.
Class Template can be made to work purely on quads. Where it currently uses
BasicPattern (which is triples), use QuadPattern.
That will work for non-extended SPARQL 1.1 as well because "CONSTRUCT { no
use of GRAPH }" will give a quad pattern of all quads for the default graph.
There is a magic constant for "this quad is for the default graph" - see
class Quad.
So you don't need two different sets of machinery - update Template to
handle quads and the syntactic restrictions of SPARQL_11 will stop it
getting named graphs in CONSTRUCT.
execConstruct/execConstructTriples then work on the default graph of a
dataset.
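A minimal sketch of the quads-only idea, using stand-in types rather than Jena's Template, QuadPattern or Quad. The DEFAULT_GRAPH string is a hypothetical placeholder for the constant in class Quad, and terms are plain strings with "?x" meaning a variable. Skipping a quad with an unbound variable follows the SPARQL rule for CONSTRUCT triples; applying it to the graph slot as well is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration only; these are NOT the Jena classes.
class QuadTemplateSketch {
    // Hypothetical placeholder for the "this quad is for the default graph" constant.
    static final String DEFAULT_GRAPH = "urn:x-sketch:default-graph";

    record Quad(String g, String s, String p, String o) {}

    // Replace a "?var" term by its binding (null if unbound); other terms pass through.
    static String subst(String term, Map<String, String> row) {
        return term.startsWith("?") ? row.get(term) : term;
    }

    // Instantiate every template quad once per solution row,
    // skipping any quad that still has an unbound variable.
    static List<Quad> instantiate(List<Quad> template, List<Map<String, String>> rows) {
        List<Quad> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            for (Quad t : template) {
                String g = subst(t.g(), row);
                String s = subst(t.s(), row);
                String p = subst(t.p(), row);
                String o = subst(t.o(), row);
                if (g != null && s != null && p != null && o != null) {
                    out.add(new Quad(g, s, p, o));
                }
            }
        }
        return out;
    }

    // Triple-style view: only the quads that landed in the default graph.
    static List<Quad> defaultGraphOnly(List<Quad> quads) {
        List<Quad> out = new ArrayList<>();
        for (Quad q : quads) {
            if (q.g().equals(DEFAULT_GRAPH)) {
                out.add(q);
            }
        }
        return out;
    }
}
```

A template written without GRAPH would carry DEFAULT_GRAPH in every quad, so the same instantiate code serves both syntaxes and the triple-returning calls are just the default-graph view.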
You may find it helpful to look at the TriG parser output. That parser is
not Javacc (it's much faster). It's informative but you will need to write
the javacc for this project.
regards,
Qihong
Andy