Thanks, I hope the community can help progress this more, and that this
is just a very small start. I'm sure I'm not the only one interested in
the possibilities of column stores as RDF repositories.
"I'll be back...." :-)
On 11/10/11 12:15, Paolo Castagna wrote:
Hi nat lu
nat lu wrote:
Hi,
Had a quick look - the patch seems to want to delete (in entirety) a
couple of existing SDB classes which I had made minimal modifications
to, so I've re-checked out SDB 1.3.4 and created a new patch which I can
upload later. So the first attachment in Jira shouldn't be used.
Great. Thanks for double checking and acting on this.
This will make life easier for whoever reviews your patch.
Secondly, I haven't recreated all the unit test code that exists for
other RDBMS supported by SDB because, simply by the looks of it, there's
quite a bit of it, and I haven't had time so far.
Ack.
A patch without tests is better than no patch at all.
However, tests are really important and should be considered part of the
process of submitting a patch.
They certainly help us a lot and can get a patch into trunk faster.
I've also, as I said before, not used Monet with any data yet, just got
it to the point that SDBConfig doesn't fall over, and the Jira issue
raised is simply to record the desire to support it (and perhaps other
column stores if conceptually they prove to have merit when used as an
RDF repo).
Ack.
Others wanting to have MonetDB will see your issue and might come and
help you with that.
I personally don't use SDB much and I've never used MonetDB before.
So, personally, I'd prefer (I assume you-all as well) that nothing
happens too quickly with this, until I or someone else gets some time to
do some work and produce the unit tests.
Sure.
We tend not to commit untested code. :-)
Having low quality or unfinished code committed to trunk is a bad practice
and we avoid that.
Rules have exceptions though, for example, I recently committed an RDF/JSON
writer which isn't complete/fully tested, however it's not going to affect
users in any way (see JENA-135). I did that to make Rob's life easier in case
he wants to have an RDF/JSON writer as well as a parser (which he contributed).
Nat, don't take my last two messages too personally; I am just
taking the opportunity to send across a few messages that are IMHO important:
- we welcome patches and new ideas/features (we really do!)
- proper patches that are well tested and apply cleanly make our life
so much easier (so we appreciate any effort in this direction).
It's very difficult to get it wrong with svn diff > JENA-XYZ.patch. :-)
Thanks again and have fun with SDB + MonetDB!
Paolo
On 11/10/11 08:07, Paolo Castagna wrote:
Hi,
first of all, thank you for your patch.
I had a quick look, but I did not try to apply it (yet).
May I ask how you created your patch?
We added a section on the Getting Involved page on the Jena website:
"Patches should be attached to issues in Jira (click on
More Actions > Attach Files). To create a patch you can simply
use the command:
svn diff > JENA-XYZ.patch
Please, inspect your patch and make sure it includes all (and only)
the relevant changes for a single issue. Don't forget tests! If you
want to test if a patch applies cleanly you can use:
patch -p0 < JENA-XYZ.patch
If you use Eclipse: right click on the project name in Package Explorer,
select Team > Create Patch or Team > Apply Patch."
- http://jena.staging.apache.org/jena/getting_involved/#submit_your_patches
It really helps if a patch contains only the lines you added or removed
and it applies cleanly. It saves a lot of time and speeds up reviewing
it.
Your patch may be perfectly fine, but I wanted to take the opportunity
to send the message across.
I am really curious to run a few benchmarks when it's done to compare
MonetDB with a more traditional SQL system.
By the way, about benchmarks, Andy is (secretly) working on this:
https://svn.apache.org/repos/asf/incubator/jena/Experimental/JenaPerf/trunk/
I have not had time to try it yet, but it seems very interesting. :-)
Thank you again for the new interesting feature.
Paolo
nat lu wrote:
Added, with patch file, at
https://issues.apache.org/jira/browse/JENA-134
I have made no more progress on testing it out other than sdbconfig so
far, hope to soon.
On 09/09/11 16:43, Paolo Castagna wrote:
Hi,
why don't you open a new JIRA issue (as a New Feature) for this?
https://issues.apache.org/jira/browse/JENA
You can then attach a patch to it. This way others can look at what
you have done so far (and maybe help you out).
Thanks for your help,
Paolo
nat lu wrote:
I made a start, and tried to use one of the existing flavours, but ended
up creating one for MonetDB - a combination of Derby and DB2. It doesn't
like longs or unbounded varchars.
So, I got as far as getting SDBConfig to complete, but haven't done an
sdbload yet.
On 09/09/11 10:37, Andy Seaborne wrote:
On 04/09/11 13:03, nat lu wrote:
I'm going to give it a go sometime soon and report back on my
non-scientific findings. Your point about the small number of columns is
well made, but the research paper cited earlier also mentions this and
reports that because of column store optimisations even when they
vertically partitioned their data rather than using a property-table
approach they still saw good improvement. However, again, I'm no column
store expert so perhaps I'm missing some point here :-). Anyway, time to
"suck it and see", all in the name of progress of course.
On 03/09/11 16:29, David Jordan wrote:
I have not used a column-oriented database, but I am somewhat familiar
with them. My understanding of them is that the storage is partitioned
on a column basis, such that there is no physical clustering together
of all the columns for a given row. An advantage of this would be in
the case where you have tables with many columns, but the particular
application only needs a small subset of columns.
With the SDB representation of triples (3 columns) and quads (4
columns), and access typically based on having a specific value for
one or two of the columns, I am not so sure that a column-based
approach would offer any advantage.
But again, I am no expert on these types of databases.
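To make the layout difference described above concrete, here is a minimal sketch in Python, assuming a toy in-memory triples table. All names (`row_store`, `column_store`, `objects_row`, `objects_col`) are hypothetical and exist only for illustration; this is not SDB or MonetDB code and says nothing about how either actually stores data.

```python
# Toy illustration only: hypothetical names, not SDB or MonetDB code.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]

# Row-oriented layout: the three values of each row are kept together.
row_store = list(triples)

# Column-oriented layout: one list per column; row i is spread across
# column_store["s"][i], column_store["p"][i] and column_store["o"][i].
column_store = {
    "s": [t[0] for t in triples],
    "p": [t[1] for t in triples],
    "o": [t[2] for t in triples],
}


def objects_row(rows, s, p):
    """Find objects for (s, p) by reading every complete row."""
    return [ro for (rs, rp, ro) in rows if rs == s and rp == p]


def objects_col(cols, s, p):
    """Find objects for (s, p) by scanning only the s and p columns,
    then reading the o column at the matching positions."""
    hits = [
        i
        for i, (cs, cp) in enumerate(zip(cols["s"], cols["p"]))
        if cs == s and cp == p
    ]
    return [cols["o"][i] for i in hits]


print(objects_row(row_store, "ex:alice", "foaf:name"))
print(objects_col(column_store, "ex:alice", "foaf:name"))
```

Both functions return the same answer. With only three columns, a lookup on two of them already reads most of each row in either layout, which is the concern raised above about triples and quads.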
These discussions about alternative datastore representations of RDF/OWL
data are very useful for gaining a better understanding of which data
architectures yield the best implementation approach for high
performance.
p.s. If Monet provides support for JDBC, I would not think much effort
is needed to support it with SDB.
Shouldn't be too hard :-) SDB targets SQL-92 and there are a few
extension points to cope with the vagaries of different SQL engines.
It's one of the reasons there are ~10 small files to write, to capture
the uniqueness of each SQL syntax.
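As a rough picture of the "common base plus small per-engine overrides" idea, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names (`BASE_TYPES`, `DIALECT_OVERRIDES`, `engine_a`, `engine_b`, `create_nodes_ddl`), the table layout and the type choices are hypothetical, and none of it mirrors SDB's real classes or what any particular engine requires.

```python
# Hypothetical sketch: these are not SDB classes, and the type choices
# below are illustrative guesses rather than facts about any engine.

# Defaults shared by every engine.
BASE_TYPES = {
    "int_type": "INTEGER",
    "text_type": "VARCHAR(255)",
}

# Each engine only states where it differs from the defaults.
DIALECT_OVERRIDES = {
    "engine_a": {"text_type": "TEXT"},
    "engine_b": {"int_type": "BIGINT"},
}


def column_types(engine):
    """Return the defaults with the engine's overrides applied."""
    types = dict(BASE_TYPES)
    types.update(DIALECT_OVERRIDES.get(engine, {}))
    return types


def create_nodes_ddl(engine):
    """Build a CREATE TABLE statement for a made-up node table."""
    t = column_types(engine)
    return (
        "CREATE TABLE Nodes ("
        "id {int_type} NOT NULL, "
        "lex {text_type} NOT NULL)"
    ).format(**t)


for name in ("engine_a", "engine_b", "some_other_engine"):
    print(name, "->", create_nodes_ddl(name))
```

An engine with no entry in `DIALECT_OVERRIDES` simply gets the defaults, so adding a new engine means writing only the differences.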
Andy