On 2016-08-17 06:09, Chetan Mehrotra wrote:
Hi Tomek,

I like the idea of revisiting our current schema based on usage so
far. However, a couple of points around potential issues with such a
normalized approach:

- This approach would lead to a thin and very long table. As noted in
[1], in a small repo of ~14 M nodes we have ~26 M properties. With
multiple revisions (GC takes some time) this can go higher. This would
then increase the memory requirement for the id index. Memory
consumption increases further with an id+key+revision index. For any
db to perform optimally the index should fit in RAM. So such a design
would possibly reduce the max repository size which can be supported
(compared to the older one) for a given amount of memory.

- The read for a specific id can be done in 1 remote call. But that
would involve a select across multiple rows, which might increase the
time taken as it would involve 'm' index lookups and then 'm' reads of
row data for any node having 'n' properties (m > n, assuming multiple
revisions per property are present).

Yes, that scared me as well when I thought about the schema a long time ago. (Maybe it would make sense to consider something less drastic, keeping the JSON, but having one row per revision?)
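
Just to make that concrete for myself (table and column names below
are made up for illustration, not an actual proposal), the fully
normalized layout and the "one row per revision" variant would look
roughly like:

  -- fully normalized: one row per property and revision
  CREATE TABLE NODE_PROPS (
    ID       VARCHAR(512)   NOT NULL,  -- document id (path)
    PROPKEY  VARCHAR(255)   NOT NULL,  -- property name
    REVISION VARCHAR(64)    NOT NULL,  -- revision the value belongs to
    PROPVAL  VARCHAR(16384),           -- serialized value
    PRIMARY KEY (ID, PROPKEY, REVISION)
  );
  -- reading one document is a single remote call, but fans out into
  -- 'm' index lookups and 'm' row reads:
  SELECT PROPKEY, REVISION, PROPVAL FROM NODE_PROPS WHERE ID = ?;

  -- less drastic: keep the serialized JSON, but one row per revision
  CREATE TABLE NODE_REVS (
    ID       VARCHAR(512)   NOT NULL,
    REVISION VARCHAR(64)    NOT NULL,
    DATA     VARCHAR(16384),           -- JSON with that revision's changes
    PRIMARY KEY (ID, REVISION)
  );

With the second variant the primary key index grows with the number of
revisions rather than with properties times revisions, so it should
stay quite a bit smaller.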

Another thing to think about is that with the current scheme, updating a document maps to a row update. With the proposed scheme, updating a document always requires inserting new rows. That sounds like something that could be very costly.
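
Roughly (again, names are only illustrative and the current layout is
simplified), the difference is between something like:

  -- current scheme: a commit conditionally updates one row in place
  UPDATE NODES
    SET DATA = ?, MODCOUNT = MODCOUNT + 1
    WHERE ID = ? AND MODCOUNT = ?;

  -- proposed scheme: every commit inserts new rows, one per changed
  -- property (or per revision), plus the corresponding index entries
  INSERT INTO NODE_PROPS (ID, PROPKEY, REVISION, PROPVAL)
    VALUES (?, ?, ?, ?);

So besides the write itself, every update also grows the table and its
indexes until revision GC catches up.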

Maybe we should explore the json support being introduced in multiple
dbs: DB2 [2], SQL Server [3], Oracle [4], Postgres [5], MySql [6]. The
problem here is that we would need DB-specific implementations, and it
also increases the testing effort!
...

That, and that the JSON support really doesn't help that much, as it doesn't address document updates.
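
For example with Postgres (just to illustrate, and assuming DATA were
a jsonb column; the other databases have their own syntax for the same
idea): the JSON support mostly helps with reading into the document,

  SELECT DATA -> '_lastRev' FROM NODES WHERE ID = ?;

and there is even an in-place modification function,

  UPDATE NODES
    SET DATA = jsonb_set(DATA, '{propName}', '"someValue"')
    WHERE ID = ?;

but as far as I can tell the engine still writes a complete new copy
of the jsonb value (new row version), so a small change to a large
document doesn't get any cheaper, and we would end up with one such
dialect per database anyway.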

Best regards, Julian
