Rohit Sarkar <rohitsarkar5...@gmail.com> writes: > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: >> >> > Hey, >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: >> >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: >> >> >> >> > Hey, >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> >> >> The parserelations command will be used to bulk import patch relations >> >> >> into >> >> >> Patchwork using a "patch groups" file. >> >> >> The patch groups file is structured such that each line contains a >> >> >> relation. >> >> >> Eg patch groups file contents. >> >> >> 1 3 4 >> >> >> 2 >> >> >> 5 9 10 >> >> >> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> >> >> references. >> >> >> Note that before ingesting the relations the existing relations in the >> >> >> DB are removed. >> >> >> >> >> >> v1..v2 >> >> >> - remove commit references from the patch groups file. >> >> >> - Update documentation >> >> >> - rename some variables to remove the overloading. >> >> >> - use filter and update instead of get and save to reduce the db calls. >> >> >> (Visible performance enhancement) >> >> >> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to >> >> >> proceed. >> >> >> >> >> >> Rohit Sarkar (1): >> >> >> management: introduce parserelations command >> >> >> >> >> >> docs/deployment/management.rst | 26 +++++++ >> >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ >> >> >> patchwork/tests/test_management.py | 7 ++ >> >> >> 3 files changed, 104 insertions(+) >> >> >> create mode 100644 patchwork/management/commands/parserelations.py >> >> >> >> >> >> -- >> >> >> 2.23.0.385.gbc12974a89 >> >> >> >> >> > >> >> > Just wanted to follow up on this. Does this look good? >> >> >> >> I was thinking about this as I washed the dishes last night. >> >> >> >> Purging all relations in the database means that if any API-based user >> >> of relations emerges before the big public servers adopt a release with >> >> this, then they'll never be able to use it. >> >> >> >> What's the speed impact of doing two passes through the data, with pass >> >> 1 collecting all the projects that this touches and pass 2 actually >> >> installing the relations? >> >> >> >> Regards, >> >> Daniel >> > I wanted to reopen the discussion on this with the aim being to get >> > everyone on the same page, and either go with this approach or decide on >> > an alternate approach. >> > >> > The primary goal is to have a "parserelations" command that can parse a >> > "patch-groups" file that contains a relation on each line. There are 2 >> > approaches: >> > >> > a. Refresh the relations table. The parserelations command will remove >> > the existing relations before inserting the newly parsed relations. This >> > avoids any conflicts between existing and new relations. However other >> > users of the API might be affected. >> > >> > b. Insert relations in a non destructive way. We need a way to handle >> > conflicts between existing and incoming relations. So the solution is >> > not really clear in this case. What should have precedence in the case >> > of conflicts? The parserelations script or the existing relations. >> > >> >> >> Right, sorry for dropping this, work got a bit crazy. >> >> It occurs to me that I have an embedded assumption that may not be true >> here. Are you trying to build a set of relations that includes >> cross-project relations, or are you just trying to build a set of >> relations entirely or mostly within a single patchwork project? >> >> I've been assuming that your relations are entirely or predominantly >> intra-project, but I'm starting to suspect that maybe my understanding >> is not right. > The relations can be inter project too. PaStA is configured with a list > of Patchwork projects for which it will pull all the corresponding > patches and perform it's analyses. PaStA can relate patches across > projects too. > >> If you are expecting a non-trivial quantity of cross-project relations, >> then my suggested approach earlier (caller of the script specifies the >> projects that patches may belong to) doesn't make sense. It probably >> makes most sense to go with approach (a) in that case - it's really >> difficult to figure out how to do (b) in a way that preserves the >> meaning that any other API user, or the PaStA tool, is trying to create. >> >> If we do go down route (a), I want to make it really hard for users to >> accidentally shoot themselves in the foot here and wipe out data. So >> probably I would say that parserelations should only insert relations if >> the table is empty when the script begins, and there should be a >> separate command that with a suitably scary name that purges all >> existing relations. > > I agree that we should make parserelations actions very explicit to > prevent any loss of data. However, if we use parserelations to > differentially import relations we will need to be able to insert > relations with a non empty table. Let me explain the complete flow > between PaStA and Patchwork so that things are a bit clear. > We plan to run a script, maybe as a cron job at regular time intervals > which does the following: > 1. Exports patches from Patchwork into PaStA > 2. Runs PaStA's analyses > 3. Runs the parserelations script to parse the patch-groups file > generated by PaStA and inserts the relations into Patchwork. In this > step we refresh the db and insert all the relations afresh. A non > destructive approach would involve thinking of a way to resolve > conflicts between existing and new relations.
OK. I am happy for you to stick with a destructive approach for the managment command, so long as it's clear to the user that data is potentially about to be destroyed. I'm happy for that to be: - two commands: a command that purges relations, and then a parse command that checks if the relations table is empty and refuses to run if it is not empty. - a single command that requires you to pass a command line option like '--really-delete-old-relations' - a single command with a name that captures the idea that data will be destroyed. I don't know what that would be, maybe 'replace...'. I'm not fussy which option you go with, I just don't want a command named 'parse...' to delete data without the user being properly warned. (You will at some point need to consider how to do incremental updates over the API for a publicly usable version, but that can definitely wait and might be out of scope for your GSoC project.) Regards, Daniel > >> >> Have I correctly understood things here? >> >> Regards, >> Daniel >> > Thanks, > Rohit _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork