Hi Ying,
It's the mid-point of the GSoC programme, so it's a good time to assess
the state of the project. It looks close to the plan, and I'd like you to
(briefly) write up how the project is going. Also check that you are
getting what you want out of the project; it is not just about code
production. Does the rest of the plan still look right?
Looking at the repository, there are a few things I'd like to see:
1/ More tests - each test should check one specific thing, so that
when a test fails it is easier to see what the root cause might be.
2/ Examples and documentation
3/ Evaluation:
For example, is the property table specialisation resulting in a smaller
storage cost? And, iteratively, can the design be changed to be more
compact? Maybe some indexing isn't needed; maybe a different way to
index the same access patterns would take less space.
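
One rough way to frame the comparison above is to count stored entries
under each layout before measuring real sizes. The sketch below is only
an illustration under stated assumptions: the class and method names are
hypothetical, the generic layout is assumed to keep one copy of every
triple per index, and the table layout is assumed to keep one cell per
triple plus one extra entry per cell for each column that has a value
index. It counts entries, not bytes.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Jena or the project.
class StorageCostSketch {

    // Assumption: a generic triple layout stores each triple once per index.
    static long genericEntries(long triples, int indexes) {
        return triples * indexes;
    }

    // Assumption: a row-oriented table stores each cell once, plus one more
    // entry per cell for every column that carries a value index.
    static long tableEntries(Map<String, Long> cellsPerColumn, Set<String> indexedColumns) {
        long total = 0;
        for (Map.Entry<String, Long> column : cellsPerColumn.entrySet()) {
            total += column.getValue();                 // the cells themselves
            if (indexedColumns.contains(column.getKey())) {
                total += column.getValue();             // value index for this column
            }
        }
        return total;
    }
}
```

Varying the set of indexed columns shows directly how much each index
contributes, which is the kind of number that helps decide whether an
index is worth keeping.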
Other:
The code can be packaged under org.apache.jena. We're trying to avoid
com.hp.hpl.jena.
A specific question:
Access by subject is an important use case even when the rows are blank
nodes. It will matter for SPARQL, since "find by subject column/value,
then get row by subject" (that is, two graph.find calls) seems a
reasonable access pattern.
I could not see graph.find(subject, ANY, ANY) using
PropertyTable.getRow anywhere in the graph.find code path, and I
expected it would. Did I miss something?
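
To make the expected behaviour concrete, here is a minimal sketch in
Java. Everything in it is a stand-in invented for illustration:
SketchTable is not the project's PropertyTable, SketchGraph is not
Jena's Graph, nodes are plain strings, and a null plays the role of ANY.
The only point is the dispatch: when the subject is concrete, find goes
straight to the row instead of scanning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a property table: NOT the project's PropertyTable class.
class SketchTable {
    // subject (row key) -> column (predicate) -> value
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    int rowLookups = 0; // how often getRow was used
    int fullScans = 0;  // how often every row was visited

    void put(String subject, String column, String value) {
        rows.computeIfAbsent(subject, k -> new LinkedHashMap<>()).put(column, value);
    }

    Map<String, String> getRow(String subject) {
        rowLookups++;
        Map<String, String> row = rows.get(subject);
        return row == null ? Collections.<String, String>emptyMap() : row;
    }

    Map<String, Map<String, String>> allRows() {
        fullScans++;
        return rows;
    }
}

// Stand-in for a graph over the table: NOT Jena's Graph interface.
class SketchGraph {
    static final String ANY = null; // wildcard, playing the role of ANY
    private final SketchTable table;

    SketchGraph(SketchTable table) {
        this.table = table;
    }

    List<String[]> find(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        if (s != ANY) {
            // Concrete subject: one row lookup, no scan.
            collect(s, table.getRow(s), p, o, out);
            return out;
        }
        // No subject given: fall back to visiting every row.
        for (Map.Entry<String, Map<String, String>> row : table.allRows().entrySet()) {
            collect(row.getKey(), row.getValue(), p, o, out);
        }
        return out;
    }

    private static void collect(String s, Map<String, String> row,
                                String p, String o, List<String[]> out) {
        for (Map.Entry<String, String> cell : row.entrySet()) {
            boolean pOk = p == ANY || p.equals(cell.getKey());
            boolean oOk = o == ANY || o.equals(cell.getValue());
            if (pOk && oOk) {
                out.add(new String[] { s, cell.getKey(), cell.getValue() });
            }
        }
    }
}
```

In this sketch the two-call pattern would be find(ANY, "name", "Bob")
to discover the subject, followed by find(thatSubject, ANY, ANY), where
the second call touches only the one row. In the sketch the first call
is a scan; a per-column value index could serve it instead.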
Andy