> Jesper,
> 
> I'd say fork master and submit a pull request. By the way, there's a 
> committers call today (Thursday) from 9-9:30am Eastern (UTC-5) if you want to 
> jump in. I think the agenda is mostly around the upcoming release of 3.6.2, 
> but I'd welcome having either (or both) of you and Nicolas join if you're 
> available.
> 
> https://wiki.duraspace.org/display/FCREPO/2012-11-08+-+Fedora+Committer+Meeting
> 
> Eddie

Hi Eddie.

For some reason I did not receive the above from the mailing list. I
just stumbled accidentally upon it when looking for something else in
the mailing-list archive.

I have created a fork, implemented what I think is necessary, and is
currently testing it out.
I'll submit a pull request soon.

-Jesper


On ons, 2012-11-07 at 16:12 +0100, Jesper Damkjaer wrote:
> I'll like to take a stab at this one.
> I'm not sure though if I can create a branch or if I should create a
> fork.
> Which do You prefer?
> 
> -Jesper 
> 
> On ons, 2012-11-07 at 12:09 +0100, Jesper Damkjaer wrote:
> > Yes, I think it would. The problem is that the PID cannot be determined
> > until just before the call to objectExists(). And moving the
> > pidLock.getWriteLock() up here would essentially do the same as You
> > suggest.
> > 
> > -Jesper
> > 
> > On ons, 2012-11-07 at 10:57 +0000, Nicolas Hervé wrote:
> > > A lock on a unique object representing the PID to be inserted would be 
> > > enough to prevent this, no ?
> > > ;ppvg usxd9ds=liiiiiiiiii 
> > > On 06/11/2012 23:51, Jesper Damkjaer wrote:
> > > > It might be possible to simply just move the call to
> > > > pidLock.getWriteLock() up before the call to objectExists().
> > > > Then the first thread will get the lock and the second thread will be
> > > > rejected. Even if the first thread have released the writelock, then it
> > > > will have registered the object and the second thread will be rejected
> > > > by objectExists().
> > > >
> > > > This may of course give a false result if the first thread for some
> > > > reason fails to actually create the object. Then both threads are
> > > > rejected and no objects are created even though the second thread might
> > > > have succeeded (if it had won the race).
> > > > I believe we have another kind of locking mechanism which is a
> > > > stringLock. This lock doesnt reject the second thread - it just make it
> > > > wait until the first thread releases the lock. If this kind of lock is
> > > > used, then only one of the threads can enter the last part of
> > > > getIngestWriter (assuming the lock is before objectExists()) and if the
> > > > first thread succeeds the second thread will, upon release, get an
> > > > exception from objectExists(). If on the other hand, the first thread
> > > > fails to create an object then, when the lock is released, the second
> > > > thread can try to create the object.
> > > >
> > > > -Jesper
> > > >
> > > >
> > > > On tir, 2012-11-06 at 17:17 -0500, [email protected] wrote:
> > > >> It's starting to sound like we need a pattern that is less 
> > > >> constrictive of resources than the use of "synchronized", but still 
> > > >> maintains some kind of guard against the kind of collision Jesper 
> > > >> describes. A simpleminded thought: perhaps we could insert a 
> > > >> lightweight check against the registry before registerObject() with a 
> > > >> line of execution that backs out any changes if the check fails? 
> > > >> Because it's an ingest, those changes shouldn't be hard to describe or 
> > > >> undo.
> > > >>
> > > >> ---
> > > >> A. Soroka
> > > >> Software & Systems Engineering :: Online Library Environment
> > > >> the University of Virginia Library
> > > >>
> > > >> On Nov 6, 2012, at 4:22 PM, Jesper Damkjaer wrote:
> > > >>
> > > >>> I just had a quick look at the code. As far as I can see (and I may 
> > > >>> not
> > > >>> be right) one problem may be the following:
> > > >>> Assume both threads with same PID are inside the method 
> > > >>> getIngestWriter
> > > >>> and has passed the method-call objectExists(...). At this point 
> > > >>> neither
> > > >>> of the objects exists assuming none of the threads has yet reached
> > > >>> registerObject(...).
> > > >>> If one of the threads then get the writeLock, does what it needs to do
> > > >>> and releases the writeLock before the second thread reaches writeLock,
> > > >>> then the second thread will reach registerObject(...). As far as I can
> > > >>> tell, inside registerObject(...) an SQLException will be thrown since
> > > >>> the row already exists in the database, and the method
> > > >>> unregisterObject(...) will be called actually unregistering a legally
> > > >>> created object.
> > > >>>
> > > >>> Someone with more insight into the code should verify if the above is
> > > >>> correct.
> > > >>>
> > > >>> -Jesper
> > > >>>
> > > >>>
> > > >>>
> > > >>> On tir, 2012-11-06 at 21:44 +0100, Jesper Damkjaer wrote:
> > > >>>> Hi.
> > > >>>>
> > > >>>> I originally opened the https://jira.duraspace.org/browse/FCREPO-1024
> > > >>>> I tried back then to create tests in order to convince myself that 
> > > >>>> there
> > > >>>> would be no problems in removing the synchronization, but that proved
> > > >>>> rather difficult.
> > > >>>> As far as I remember there seemed to be some problems associated with
> > > >>>> removing the synchronization, especially if two documents with the 
> > > >>>> same
> > > >>>> PID were ingested at the same time.
> > > >>>> I'm still interested in looking in to this, and will be happy to 
> > > >>>> look at
> > > >>>> my old branch.
> > > >>>>
> > > >>>> -Jesper
> > > >>>>
> > > >>>>
> > > >>>> On tir, 2012-11-06 at 12:46 -0500, [email protected] wrote:
> > > >>>>> There is a bit of history to this discussion:
> > > >>>>>
> > > >>>>> https://jira.duraspace.org/browse/FCREPO-1024
> > > >>>>>
> > > >>>>> ---
> > > >>>>> A. Soroka
> > > >>>>> Software & Systems Engineering :: Online Library Environment
> > > >>>>> the University of Virginia Library
> > > >>>>>
> > > >>>>> On Nov 6, 2012, at 11:53 AM, Scott Prater wrote:
> > > >>>>>
> > > >>>>>> Nicolas,
> > > >>>>>>
> > > >>>>>> That's very encouraging... this problem has been on my radar for 
> > > >>>>>> years.
> > > >>>>>>
> > > >>>>>> Not being very familiar with the details of the internals of 
> > > >>>>>> DefaultDOManager, I can't comment concretely on the merits of your 
> > > >>>>>> patch, but I would be especially interested in tests that provoke 
> > > >>>>>> race conditions and concurrent writes, and makes sure Fedora 
> > > >>>>>> handles those situations cleanly: even though PID generation is 
> > > >>>>>> synchronized, would it still be possible for Fedora to attempt to 
> > > >>>>>> write the same datastreams, provision identical values in the 
> > > >>>>>> resource index, etc. in different threads once the object has been 
> > > >>>>>> created and the PID assigned? As I dimly recall, I think the 
> > > >>>>>> getIngestWriter method was synchronized because there were some 
> > > >>>>>> problems in the early days with concurrent writes. That may be a 
> > > >>>>>> non-issue now, though, if PID generation is synchronized (the 
> > > >>>>>> Akubra storage layer is designed to handle writes more robustly).
> > > >>>>>>
> > > >>>>>> Another potential issue: if you're creating a hierarchical tree of 
> > > >>>>>> objects in parallel, and the ingest of a parent object fails: you 
> > > >>>>>> could be left with orphaned children. But that's something that 
> > > >>>>>> should be checked and handled higher up the stack, with some 
> > > >>>>>> transaction/rollback logic.
> > > >>>>>>
> > > >>>>>> -- Scott
> > > >>>>>>
> > > >>>>>> On 11/06/12, Nicolas Hervé  wrote:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> I think I've found the main problem with massive parallel 
> > > >>>>>>> ingestion.
> > > >>>>>>>
> > > >>>>>>> I'm working with the last github snapshot.
> > > >>>>>>>
> > > >>>>>>> In org.fcrepo.server.storage.DefaultDOManager, the 
> > > >>>>>>> getIngestWriter method should not be synchronized as it seems 
> > > >>>>>>> there is only a single instance of that class for the server. The 
> > > >>>>>>> internal objects of this class seem to be correctly synchronized 
> > > >>>>>>> (pid generation) of new objects are recreated on each call 
> > > >>>>>>> (inside Translator, a new DODeserializer is created and the same 
> > > >>>>>>> happens inside the Validator).
> > > >>>>>>>
> > > >>>>>>> I've tested with FOXML ingestion and now almost all the CPUs are 
> > > >>>>>>> used. I've not been deeper to check that every inserted object is 
> > > >>>>>>> not corrupted, but after a quick look, it seems OK. I guess the 
> > > >>>>>>> same kind of patch could also apply on object deletion.
> > > >>>>>>>
> > > >>>>>>> If one of you that better understand that part could have a look, 
> > > >>>>>>> it seems it would be a nice patch, not too hard to test, with 
> > > >>>>>>> great performance improvements.
> > > >>>>>>>
> > > >>>>>>> Regards,
> > > >>>>>>>
> > > >>>>>>> Nicolas HERVE
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On 28/09/2012 11:25, Nicolas Hervé wrote:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> indeed, it seems we are exactly in the same configuration 
> > > >>>>>>>> (millions of DO with some metadata and external content) with 
> > > >>>>>>>> almost the same hardware. I've not identified the bottleneck in 
> > > >>>>>>>> the massive parallel ingestion process right now, but I highly 
> > > >>>>>>>> suspect a synchronized portion of code somewhere in the chain. I 
> > > >>>>>>>> hope Edwin could say more about this :-)
> > > >>>>>>>>
> > > >>>>>>>> For the querying of dc fields, index have to be created in the 
> > > >>>>>>>> Mysql schema and SQL queries are far from being optimal. 
> > > >>>>>>>> Currently I only patched for my own purposes (my datamodel / my 
> > > >>>>>>>> queries) and I bypass some code portions in the following 
> > > >>>>>>>> classes :
> > > >>>>>>>>
> > > >>>>>>>> org.fcrepo.server.search.FieldSearchSQLImpl
> > > >>>>>>>> org.fcrepo.server.search.FieldSearchResultSQLImpl
> > > >>>>>>>>
> > > >>>>>>>> I'm really new to Fedora Commons but, from what I understand, 
> > > >>>>>>>> these SQL part is quite old. Changing them for optimizations 
> > > >>>>>>>> purposes could imply behaviour changes for other people. That's 
> > > >>>>>>>> why I don't think simple patches could do the job. It would need 
> > > >>>>>>>> a complete refactoring. That could only be done with a global 
> > > >>>>>>>> point of view on the different way this classes are used in the 
> > > >>>>>>>> different contexts where Fedora instances are running.
> > > >>>>>>>>
> > > >>>>>>>> Feel free to contact me to discuss this more precisely.
> > > >>>>>>>>
> > > >>>>>>>> Regards,
> > > >>>>>>>>
> > > >>>>>>>> Nicolas HERVE
> > > >>>>>>>> +33 1 49 83 21 66 (GMT + 2)
> > > >>>>>>>>
> > > >>>>>>>> On 27/09/2012 18:23, Jason V wrote:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> Hi Nicolas,
> > > >>>>>>>>>
> > > >>>>>>>>> My name is Jason Varghese and I'm a senior developer at the New 
> > > >>>>>>>>> York Public Library. I think you are doing work similar to what 
> > > >>>>>>>>> I am presently doing based on reading some of your posts.
> > > >>>>>>>>>
> > > >>>>>>>>> We have a relatively large scale Fedora implementation here. 
> > > >>>>>>>>> We've had all the hardware in place for some time and are in 
> > > >>>>>>>>> the process of migrating from a large homegrown repository to a 
> > > >>>>>>>>> Fedora based platform. We have a single Fedora ingest machine 
> > > >>>>>>>>> and 3 Fedora readers. The ingest machine alone is 4 x 6 core 
> > > >>>>>>>>> processors w/ 128GB RAM. I'm in the process of generating about 
> > > >>>>>>>>> 1 million+ digital objects and attaching to each DO all the 
> > > >>>>>>>>> metadata (as managed content datastreams) and all the digital 
> > > >>>>>>>>> assets (as external content datastreams). The digital assets 
> > > >>>>>>>>> currently are about 183 TB of content (this is replicated at 
> > > >>>>>>>>> two sites). I have a multithreaded java client I wrote to 
> > > >>>>>>>>> accomplish the task for the Fedora ingest/DO generation and I 
> > > >>>>>>>>> use the Mediashelf REST API client for connectivity to Fedora. 
> > > >>>>>>>>> I was able to successfully ingest 10's of thousands of digital 
> > > >>>>>>>>> objects, but really need ensure this process performs optimally 
> > > >>>>>>>>> and scales for millions of objects. What bottlenecks were you 
> > > >>>>>>>>> able to identify when running your multithreaded ingest 
> > > >>>>>>>>> process? Look forward to learning/sharing experiences from this 
> > > >>>>>>>>> process with you and the community and possibly collaborating. 
> > > >>>>>>>>> Thanks
> > > >>>>>>>>>
> > > >>>>>>>>> Jason Varghese
> > > >>>>>>>>> NYPL
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> ------------------------------------------------------------------------------
> > > >>>>>>>>  Got visibility? Most devs has no idea what their production app 
> > > >>>>>>>> looks like. Find out how fast your code is with AppDynamics 
> > > >>>>>>>> Lite. http://ad.doubleclick.net/clk;262219671;13503038;y? 
> > > >>>>>>>> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> _______________________________________________ 
> > > >>>>>>>> Fedora-commons-users mailing list 
> > > >>>>>>>> [email protected] 
> > > >>>>>>>> <[email protected]> 
> > > >>>>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
> > > >>>>>>>>
> > > >>>>>> --
> > > >>>>>> -- 
> > > >>>>>> Scott Prater
> > > >>>>>> Library, Instructional, and Research Applications (LIRA)
> > > >>>>>> Division of Information Technology (DoIT)
> > > >>>>>> University of Wisconsin - Madison
> > > >>>>>> [email protected]
> > > >>>>>>
> > > >>>>>> ------------------------------------------------------------------------------
> > > >>>>>> LogMeIn Central: Instant, anywhere, Remote PC access and 
> > > >>>>>> management.
> > > >>>>>> Stay in control, update software, and manage PCs from one command 
> > > >>>>>> center
> > > >>>>>> Diagnose problems and improve visibility into emerging IT issues
> > > >>>>>> Automate, monitor and manage. Do more in less time with Central
> > > >>>>>> http://p.sf.net/sfu/logmein12331_d2d
> > > >>>>>> _______________________________________________
> > > >>>>>> Fedora-commons-developers mailing list
> > > >>>>>> [email protected]
> > > >>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > >>>>>
> > > >>>>> ------------------------------------------------------------------------------
> > > >>>>> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > >>>>> Stay in control, update software, and manage PCs from one command 
> > > >>>>> center
> > > >>>>> Diagnose problems and improve visibility into emerging IT issues
> > > >>>>> Automate, monitor and manage. Do more in less time with Central
> > > >>>>> http://p.sf.net/sfu/logmein12331_d2d
> > > >>>>> _______________________________________________
> > > >>>>> Fedora-commons-developers mailing list
> > > >>>>> [email protected]
> > > >>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------------------------------------------------------
> > > >>>> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > >>>> Stay in control, update software, and manage PCs from one command 
> > > >>>> center
> > > >>>> Diagnose problems and improve visibility into emerging IT issues
> > > >>>> Automate, monitor and manage. Do more in less time with Central
> > > >>>> http://p.sf.net/sfu/logmein12331_d2d
> > > >>>> _______________________________________________
> > > >>>> Fedora-commons-developers mailing list
> > > >>>> [email protected]
> > > >>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > >>>
> > > >>>
> > > >>>
> > > >>> ------------------------------------------------------------------------------
> > > >>> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > >>> Stay in control, update software, and manage PCs from one command 
> > > >>> center
> > > >>> Diagnose problems and improve visibility into emerging IT issues
> > > >>> Automate, monitor and manage. Do more in less time with Central
> > > >>> http://p.sf.net/sfu/logmein12331_d2d
> > > >>> _______________________________________________
> > > >>> Fedora-commons-developers mailing list
> > > >>> [email protected]
> > > >>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > >>
> > > >> ------------------------------------------------------------------------------
> > > >> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > >> Stay in control, update software, and manage PCs from one command 
> > > >> center
> > > >> Diagnose problems and improve visibility into emerging IT issues
> > > >> Automate, monitor and manage. Do more in less time with Central
> > > >> http://p.sf.net/sfu/logmein12331_d2d
> > > >> _______________________________________________
> > > >> Fedora-commons-developers mailing list
> > > >> [email protected]
> > > >> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > > > ------------------------------------------------------------------------------
> > > > LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > > Stay in control, update software, and manage PCs from one command center
> > > > Diagnose problems and improve visibility into emerging IT issues
> > > > Automate, monitor and manage. Do more in less time with Central
> > > > http://p.sf.net/sfu/logmein12331_d2d
> > > > _______________________________________________
> > > > Fedora-commons-developers mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > > 
> > > ------------------------------------------------------------------------------
> > > LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > > Stay in control, update software, and manage PCs from one command center
> > > Diagnose problems and improve visibility into emerging IT issues
> > > Automate, monitor and manage. Do more in less time with Central
> > > http://p.sf.net/sfu/logmein12331_d2d
> > > _______________________________________________
> > > Fedora-commons-developers mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> > 
> > 
> > 
> > 
> > ------------------------------------------------------------------------------
> > LogMeIn Central: Instant, anywhere, Remote PC access and management.
> > Stay in control, update software, and manage PCs from one command center
> > Diagnose problems and improve visibility into emerging IT issues
> > Automate, monitor and manage. Do more in less time with Central
> > http://p.sf.net/sfu/logmein12331_d2d
> > _______________________________________________
> > Fedora-commons-developers mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> LogMeIn Central: Instant, anywhere, Remote PC access and management.
> Stay in control, update software, and manage PCs from one command center
> Diagnose problems and improve visibility into emerging IT issues
> Automate, monitor and manage. Do more in less time with Central
> http://p.sf.net/sfu/logmein12331_d2d
> _______________________________________________
> Fedora-commons-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers




------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_nov
_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers

Reply via email to