GSoC : CSV PropertyTables : mid-term checkpoint

Andy Seaborne Mon, 23 Jun 2014 13:27:07 -0700

Hi Ying,

It's the mid-point of the GSoC programme so it's a good time to assess the state of the project. It looks close to the plan and I'd like you to (briefly) write up how the project is going. Check you are getting what you want out of the project as well. It is not just code production. Is the rest of the plan looking right still?



Looking at the repository, there are a few things I'd like to see:

1/ More tests - tests should be structured so each one tests a specific thing, so when/if there are test failures, it's easier to see what the root cause might be.


2/ Examples and documentation

3/ Evaluation :

For example, is the property table specialisation resulting in a smaller storage cost? And, iteratively, can the design be changed to be more compact? Maybe some indexing isn't needed; maybe a different way to index the same access patterns would take less space.




Other:

The code can be packaged under org.apache.jena. We're trying to avoid com.hp.hpl.jena.


A specific question:

Access by subject is an important use case even when the rows are blank nodes. It will matter for SPARQL, and more generally too: "find by subject column/value then get row by subject", that is, two graph.find calls, seems a reasonable access pattern.

I could not see that graph.find(subject, ANY, ANY) is using PropertyTable.getRow in the graph.find codepath and I expected it would be. Did I miss something?
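
To make the access pattern concrete, here is a rough sketch of the dispatch I have in mind. It is only an illustration: Triple and ToyPropertyTable are made-up stand-ins using plain strings, not the Jena classes or the code in the repository, and null plays the role of ANY.

```java
import java.util.*;

// Hypothetical stand-in for a triple; not the Jena class.
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    @Override public String toString() { return s + " " + p + " " + o; }
}

// Hypothetical toy table: one row per subject, one cell per column.
final class ToyPropertyTable {
    // subject -> (property -> value)
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    int fullScans = 0; // how many times find() had to walk every row

    void add(String s, String p, String o) {
        rows.computeIfAbsent(s, k -> new LinkedHashMap<>()).put(p, o);
    }

    // Direct row lookup by subject; empty map if there is no such row.
    Map<String, String> getRow(String subject) {
        return rows.getOrDefault(subject, Collections.emptyMap());
    }

    // null means ANY. A bound subject goes through getRow, not a scan.
    List<Triple> find(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        if (s != null) {
            collect(s, getRow(s), p, o, out);
        } else {
            fullScans++;
            for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
                collect(e.getKey(), e.getValue(), p, o, out);
            }
        }
        return out;
    }

    // Add the cells of one row that match the (possibly ANY) p and o.
    private static void collect(String s, Map<String, String> row,
                                String p, String o, List<Triple> out) {
        for (Map.Entry<String, String> cell : row.entrySet()) {
            boolean pOk = (p == null) || p.equals(cell.getKey());
            boolean oOk = (o == null) || o.equals(cell.getValue());
            if (pOk && oOk) {
                out.add(new Triple(s, cell.getKey(), cell.getValue()));
            }
        }
    }
}
```

The two-call pattern is then find(null, "town", "Leeds") to get the row's subject, followed by find(subject, null, null) to get the whole row. In this sketch only the first call needs a scan; the second is a single row lookup.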


        Andy
