On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote: > Rohit Sarkar <rohitsarkar5...@gmail.com> writes: > > > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: > >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: > >> > >> > Hey, > >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: > >> >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: > >> >> > >> >> > Hey, > >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: > >> >> >> The parserelations command will be used to bulk import patch > >> >> >> relations into > >> >> >> Patchwork using a "patch groups" file. > >> >> >> The patch groups file is structured such that each line contains a > >> >> >> relation. > >> >> >> Eg patch groups file contents. > >> >> >> 1 3 4 > >> >> >> 2 > >> >> >> 5 9 10 > >> >> >> > >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and > >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit > >> >> >> references. > >> >> >> Note that before ingesting the relations the existing relations in > >> >> >> the > >> >> >> DB are removed. > >> >> >> > >> >> >> v1..v2 > >> >> >> - remove commit references from the patch groups file. > >> >> >> - Update documentation > >> >> >> - rename some variables to remove the overloading. > >> >> >> - use filter and update instead of get and save to reduce the db > >> >> >> calls. > >> >> >> (Visible performance enhancement) > >> >> >> > >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to > >> >> >> proceed. > >> >> >> > >> >> >> Rohit Sarkar (1): > >> >> >> management: introduce parserelations command > >> >> >> > >> >> >> docs/deployment/management.rst | 26 +++++++ > >> >> >> .../management/commands/parserelations.py | 71 > >> >> >> +++++++++++++++++++ > >> >> >> patchwork/tests/test_management.py | 7 ++ > >> >> >> 3 files changed, 104 insertions(+) > >> >> >> create mode 100644 patchwork/management/commands/parserelations.py > >> >> >> > >> >> >> -- > >> >> >> 2.23.0.385.gbc12974a89 > >> >> >> > >> >> > > >> >> > Just wanted to follow up on this. Does this look good? > >> >> > >> >> I was thinking about this as I washed the dishes last night. > >> >> > >> >> Purging all relations in the database means that if any API-based user > >> >> of relations emerges before the big public servers adopt a release with > >> >> this, then they'll never be able to use it. > >> >> > >> >> What's the speed impact of doing two passes through the data, with pass > >> >> 1 collecting all the projects that this touches and pass 2 actually > >> >> installing the relations? > >> >> > >> >> Regards, > >> >> Daniel > >> > I wanted to reopen the discussion on this with the aim being to get > >> > everyone on the same page, and either go with this approach or decide on > >> > an alternate approach. > >> > > >> > The primary goal is to have a "parserelations" command that can parse a > >> > "patch-groups" file that contains a relation on each line. There are 2 > >> > approaches: > >> > > >> > a. Refresh the relations table. The parserelations command will remove > >> > the existing relations before inserting the newly parsed relations. This > >> > avoids any conflicts between existing and new relations. However other > >> > users of the API might be affected. > >> > > >> > b. Insert relations in a non destructive way. We need a way to handle > >> > conflicts between existing and incoming relations. So the solution is > >> > not really clear in this case. What should have precedence in the case > >> > of conflicts? The parserelations script or the existing relations. > >> > > >> > >> > >> Right, sorry for dropping this, work got a bit crazy. > >> > >> It occurs to me that I have an embedded assumption that may not be true > >> here. Are you trying to build a set of relations that includes > >> cross-project relations, or are you just trying to build a set of > >> relations entirely or mostly within a single patchwork project? > >> > >> I've been assuming that your relations are entirely or predominantly > >> intra-project, but I'm starting to suspect that maybe my understanding > >> is not right. > > The relations can be inter project too. PaStA is configured with a list > > of Patchwork projects for which it will pull all the corresponding > > patches and perform it's analyses. PaStA can relate patches across > > projects too. > > > >> If you are expecting a non-trivial quantity of cross-project relations, > >> then my suggested approach earlier (caller of the script specifies the > >> projects that patches may belong to) doesn't make sense. It probably > >> makes most sense to go with approach (a) in that case - it's really > >> difficult to figure out how to do (b) in a way that preserves the > >> meaning that any other API user, or the PaStA tool, is trying to create. > >> > >> If we do go down route (a), I want to make it really hard for users to > >> accidentally shoot themselves in the foot here and wipe out data. So > >> probably I would say that parserelations should only insert relations if > >> the table is empty when the script begins, and there should be a > >> separate command that with a suitably scary name that purges all > >> existing relations. > > > > I agree that we should make parserelations actions very explicit to > > prevent any loss of data. However, if we use parserelations to > > differentially import relations we will need to be able to insert > > relations with a non empty table. Let me explain the complete flow > > between PaStA and Patchwork so that things are a bit clear. > > We plan to run a script, maybe as a cron job at regular time intervals > > which does the following: > > 1. Exports patches from Patchwork into PaStA > > 2. Runs PaStA's analyses > > 3. Runs the parserelations script to parse the patch-groups file > > generated by PaStA and inserts the relations into Patchwork. In this > > step we refresh the db and insert all the relations afresh. A non > > destructive approach would involve thinking of a way to resolve > > conflicts between existing and new relations. > > OK. > > I am happy for you to stick with a destructive approach for the > managment command, so long as it's clear to the user that data is > potentially about to be destroyed. I'm happy for that to be: > > - two commands: a command that purges relations, and then a parse > command that checks if the relations table is empty and refuses to > run if it is not empty. > > - a single command that requires you to pass a command line option like > '--really-delete-old-relations' > > - a single command with a name that captures the idea that data will be > destroyed. I don't know what that would be, maybe 'replace...'. > > I'm not fussy which option you go with, I just don't want a command > named 'parse...' to delete data without the user being properly warned. Yeah, I agree. The name "parserelations" doesnt make it explicit enough. I am fine with replacerelations.
> (You will at some point need to consider how to do incremental updates > over the API for a publicly usable version, but that can definitely wait > and might be out of scope for your GSoC project.) Agreed. > Regards, > Daniel > > > > >> > >> Have I correctly understood things here? > >> > >> Regards, > >> Daniel > >> > > Thanks, > > Rohit Thanks Rohit _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork