Makes sense; I think we arrived at the same spot (narrowing the conflict down
to the commit). The comment about responsibility for readers and writers
was with respect to encapsulation (because that one object is responsible, we
can fix it in one location).

It is just hard to talk clearly about this, as we have two levels of readers
and writers (direct ones on the source used by the commit, and indirect
wrappers handed out per transaction).

If I can take a moment to build on your two-transaction example...

The adding of f1 and f2 makes sense and will work if we introduce
read/write locks.
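
As a rough sketch of that f1/f2 case: one read/write lock per underlying
file, shared by every transaction, with commit taking the write lock before
it reads the original content. SharedFileStore and CommitLockSketch are
made-up names (not GeoTools classes), and an in-memory list stands in for
the shapefile and its temp-file rewrite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a file-backed store; NOT a GeoTools class.
class SharedFileStore {
    // One lock per underlying file, shared by all transactions.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> features = new ArrayList<>();

    // Any reader, from any transaction, holds the read lock while it uses the file.
    List<String> read() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(features);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Commit takes the write lock BEFORE reading the original content,
    // so a second commit always starts from the first commit's result.
    void commit(List<String> added) {
        lock.writeLock().lock();
        try {
            List<String> rewritten = new ArrayList<>(features); // "write out to temp file"
            rewritten.addAll(added);                            // apply the diff
            features = rewritten;                               // "replace the original"
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class CommitLockSketch {
    // t1 adds f1 and t2 adds f2 from separate threads; both must survive.
    static List<String> runTwoCommits() {
        SharedFileStore store = new SharedFileStore();
        Thread t1 = new Thread(() -> store.commit(Arrays.asList("f1")));
        Thread t2 = new Thread(() -> store.commit(Arrays.asList("f2")));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.read();
    }
}
```

Whatever the thread timing, the result holds both f1 and f2; only their
order can differ.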

Two transactions updating u1 and u2 would also work.

Two transactions both updating u3 would conflict++ with the last one being
successful (as you would expect). I am willing to live with this...

The only real gotcha is the first transaction doing a delete d4, and the
second transaction doing an update u4. As Diff tracks deletes and updates
separately, it should work, but I would want to confirm with a test case.
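
The shape of such a test could look like the sketch below. It does not use
org.geotools.data.Diff; FidDiffSketch is a made-up, simplified model, and
it assumes that an update to a fid that is no longer present gets skipped
(rather than re-creating the feature). The real test would need to confirm
what Diff and the commit code actually do.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical per-transaction diff keyed by feature id.
// This is NOT org.geotools.data.Diff, only a simplified model of it.
class FidDiffSketch {
    final Map<String, String> updates = new HashMap<>(); // fid -> new value
    final Set<String> deletes = new HashSet<>();          // fids to remove

    // Apply one transaction's diff to the committed content.
    // Assumption: updating a fid that is gone is silently skipped.
    static Map<String, String> apply(Map<String, String> committed, FidDiffSketch diff) {
        Map<String, String> result = new TreeMap<>(committed);
        for (String fid : diff.deletes) {
            result.remove(fid);
        }
        for (Map.Entry<String, String> update : diff.updates.entrySet()) {
            result.computeIfPresent(update.getKey(), (fid, old) -> update.getValue());
        }
        return result;
    }

    // t1 deletes feature 4 and commits first; t2 updates feature 4 and commits second.
    static Map<String, String> deleteThenUpdate() {
        Map<String, String> file = new TreeMap<>();
        file.put("fid4", "original");

        FidDiffSketch t1 = new FidDiffSketch();
        t1.deletes.add("fid4");

        FidDiffSketch t2 = new FidDiffSketch();
        t2.updates.put("fid4", "updated");

        return apply(apply(file, t1), t2);
    }
}
```

Under that assumption the feature stays deleted after both commits.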

Reference:
- http://docs.geotools.org/stable/javadocs/org/geotools/data/Diff.html

++ The first-generation WFS data store stored a filter for each update
(rather than a fid) as it built up a transaction document. This is a
slightly better approach, as it prevents a conflict when the first
transaction changes the data such that the second transaction's update no
longer applies. Think of the first transaction t1 changing measure from 10
to 3, and the second transaction t2 deleting all features with measure > 10.
This is a corner case of a corner case, and I would recommend accepting it
as a known limitation.
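
To illustrate the difference between recording fids and recording a filter,
here is a small sketch. It is not WFS or GeoTools code:
FilterVersusFidSketch is a made-up name, a map of fid to measure stands in
for the data, and the starting measure is 12 (an illustrative value, chosen
so that the feature matches measure > 10 before t1 commits).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model: fid -> measure. Not WFS or GeoTools code.
class FilterVersusFidSketch {
    static Map<String, Integer> initial() {
        Map<String, Integer> file = new HashMap<>();
        file.put("fid1", 12); // illustrative value, matches "measure > 10"
        return file;
    }

    // Fid-based: t2 resolves "measure > 10" to fids when the delete is issued,
    // before t1 has committed, and later commits that (now stale) fid list.
    static Map<String, Integer> commitWithFids() {
        Map<String, Integer> file = initial();
        Set<String> t2Deletes = new HashSet<>();
        for (Map.Entry<String, Integer> feature : file.entrySet()) {
            if (feature.getValue() > 10) {
                t2Deletes.add(feature.getKey());
            }
        }
        file.put("fid1", 3);                // t1 commits: measure becomes 3
        file.keySet().removeAll(t2Deletes); // t2 commits: fid1 is removed anyway
        return file;
    }

    // Filter-based: t2 stores the filter and evaluates it at commit time.
    static Map<String, Integer> commitWithFilter() {
        Map<String, Integer> file = initial();
        Predicate<Integer> t2Filter = measure -> measure > 10;
        file.put("fid1", 3);                // t1 commits: measure becomes 3
        file.values().removeIf(t2Filter);   // t2 commits: nothing matches any more
        return file;
    }
}
```

With fids the feature is deleted even though it no longer matches; with the
stored filter it survives with measure 3.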


On Fri, Dec 7, 2018 at 10:00 AM Andrea Aime <andrea.a...@geo-solutions.it>
wrote:

> On Fri, Dec 7, 2018 at 6:42 PM Jody Garnett <jody.garn...@gmail.com>
> wrote:
>
>> Yeah I am aware of the gap, we did not implement a two phase commit.
>>
>> Can we introduce those read/write locks into TransactionStateDiff? I
>> thought the object was already responsible for producing reader and writer
>> wrappers which should give us a clean way to do it?
>>
>
> I believe there is a bit of confusion, let me try with an example. Say we
> have two threads using the same data store, one doing
> only reads (continuously, for the sake of example), the other doing one
> write with a transaction.
> When the transaction gets committed by the second thread, the shapefile
> gets read, written out in a temp file with modifications, and then
> the original shapefile gets replaced. During that replace moment, the
> first thread, the reader, still goes, but the file it has below becomes
> inconsistent, invalid, or worse, because it's being modified.
> The issue now happens also between two writing threads, each holding a
> different transaction, because they can commit close enough
> that the two rewrites happen at the same time, or comically, one thread
> might be reading out the original file to apply the diff while the other
> is writing on it.
>
> To work, the read/write locking mechanism has to consider all readers and
> all writers, no matter what transaction they come from.
> The write can take its dear time, but when the file replacement happens,
> then it has to be a "stop the world", with no reader (pure or otherwise)
> working,
> with no other writer working either. This calls for a read/write lock, so
> that writers cannot do their work while readers are busy using the file.
>
> Now that I think about it though, two threads committing their transaction
> must still wait on each other from the beginning of the write out
> to temp file, otherwise the second will do its work without considering
> the modifications done by the first one.
> In other words, if t1 adds f1, and t2 adds f2, I expect that when both
> transactions are committed both f1 and f2 show up in the final file,
> no matter what the timings of the threads are.
>
> In order to get there, and I'm aware it limits scalability, one would
> really have to grab the write lock before beginning to apply the diff.
> Scalability wise I'm not too worried, if one is using property, csv or
> shapefile they are not really concerned about scalability of concurrent
> writes to start with, or not? :-D
>
> Makes sense?
>
> Cheers
> Andrea
>
>
-- 
--
Jody Garnett
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel
