On 2016-08-17 06:09, Chetan Mehrotra wrote:
Hi Tomek,

I like the idea of revisiting our current schema based on usage so
far. However, a couple of points around potential issues with such a
normalized approach:

- This approach would lead to a thin and very long table. As noted in
[1], in a small repo of ~14 M nodes we have ~26 M properties. With
multiple revisions (GC takes some time) this can go higher. This would
then increase the memory requirement for the id index. Memory
consumption increases further with an id+key+revision index. For any
db to perform optimally the index should fit in RAM. So such a design
would possibly reduce the max repository size which can be supported
(compared to the older one) for a given amount of memory.

- The read for a specific id can be done in 1 remote call. But that
would involve a select across multiple rows, which might increase the
time taken as it would involve 'm' index lookups and then 'm' reads of
row data for any node having 'n' properties (m > n, assuming multiple
revisions per property are present).

Yes, that scared me as well when I thought about the schema a long time ago. (Maybe it would make sense to consider something less drastic, keeping the JSON, but having one row per revision?)
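
Just to make that concrete for myself (table and column names below
are made up for illustration, not an actual proposal), the fully
normalized layout and the "one row per revision" variant would look
roughly like:

  -- fully normalized: one row per property and revision
  CREATE TABLE NODE_PROPS (
    ID       VARCHAR(512)   NOT NULL,  -- document id (path)
    PROPKEY  VARCHAR(255)   NOT NULL,  -- property name
    REVISION VARCHAR(64)    NOT NULL,  -- revision the value belongs to
    PROPVAL  VARCHAR(16384),           -- serialized value
    PRIMARY KEY (ID, PROPKEY, REVISION)
  );
  -- reading one document is a single remote call, but fans out into
  -- 'm' index lookups and 'm' row reads:
  SELECT PROPKEY, REVISION, PROPVAL FROM NODE_PROPS WHERE ID = ?;

  -- less drastic: keep the serialized JSON, but one row per revision
  CREATE TABLE NODE_REVS (
    ID       VARCHAR(512)   NOT NULL,
    REVISION VARCHAR(64)    NOT NULL,
    DATA     VARCHAR(16384),           -- JSON with that revision's changes
    PRIMARY KEY (ID, REVISION)
  );

With the second variant the primary key index grows with the number of
revisions rather than with properties times revisions, so it should
stay quite a bit smaller.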

Another thing to think about is that with the current scheme, updating a document maps to a row update. With the proposed scheme, updating a document always requires inserting new rows. That sounds like something that could be very costly.
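
Roughly (again, names are only illustrative and the current layout is
simplified), the difference is between something like:

  -- current scheme: a commit conditionally updates one row in place
  UPDATE NODES
    SET DATA = ?, MODCOUNT = MODCOUNT + 1
    WHERE ID = ? AND MODCOUNT = ?;

  -- proposed scheme: every commit inserts new rows, one per changed
  -- property (or per revision), plus the corresponding index entries
  INSERT INTO NODE_PROPS (ID, PROPKEY, REVISION, PROPVAL)
    VALUES (?, ?, ?, ?);

So besides the write itself, every update also grows the table and its
indexes until revision GC catches up.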

Maybe we should explore the json support being introduced in multiple
dbs: DB2 [2], SQL Server [3], Oracle [4], Postgres [5], MySql [6]. The
problem here is that we would need DB-specific implementations, and it
also increases the testing effort!
...

That, and that the JSON support really doesn't help that much, as it doesn't address document updates.
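
For example with Postgres (just to illustrate, and assuming DATA were
a jsonb column; the other databases have their own syntax for the same
idea): the JSON support mostly helps with reading into the document,

  SELECT DATA -> '_lastRev' FROM NODES WHERE ID = ?;

and there is even an in-place modification function,

  UPDATE NODES
    SET DATA = jsonb_set(DATA, '{propName}', '"someValue"')
    WHERE ID = ?;

but as far as I can tell the engine still writes a complete new copy
of the jsonb value (new row version), so a small change to a large
document doesn't get any cheaper, and we would end up with one such
dialect per database anyway.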

Best regards, Julian
