Rohit Sarkar <rohitsarkar5...@gmail.com> writes: > On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote: >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: >> >> > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: >> >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: >> >> >> >> > Hey, >> >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: >> >> >> Rohit Sarkar <rohitsarkar5...@gmail.com> writes: >> >> >> >> >> >> > Hey, >> >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> >> >> >> The parserelations command will be used to bulk import patch >> >> >> >> relations into >> >> >> >> Patchwork using a "patch groups" file. >> >> >> >> The patch groups file is structured such that each line contains a >> >> >> >> relation. >> >> >> >> Eg patch groups file contents. >> >> >> >> 1 3 4 >> >> >> >> 2 >> >> >> >> 5 9 10 >> >> >> >> >> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) >> >> >> >> and >> >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> >> >> >> references. >> >> >> >> Note that before ingesting the relations the existing relations in >> >> >> >> the >> >> >> >> DB are removed. >> >> >> >> >> >> >> >> v1..v2 >> >> >> >> - remove commit references from the patch groups file. >> >> >> >> - Update documentation >> >> >> >> - rename some variables to remove the overloading. >> >> >> >> - use filter and update instead of get and save to reduce the db >> >> >> >> calls. >> >> >> >> (Visible performance enhancement) >> >> >> >> >> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how >> >> >> >> to >> >> >> >> proceed. >> >> >> >> >> >> >> >> Rohit Sarkar (1): >> >> >> >> management: introduce parserelations command >> >> >> >> >> >> >> >> docs/deployment/management.rst | 26 +++++++ >> >> >> >> .../management/commands/parserelations.py | 71 >> >> >> >> +++++++++++++++++++ >> >> >> >> patchwork/tests/test_management.py | 7 ++ >> >> >> >> 3 files changed, 104 insertions(+) >> >> >> >> create mode 100644 patchwork/management/commands/parserelations.py >> >> >> >> >> >> >> >> -- >> >> >> >> 2.23.0.385.gbc12974a89 >> >> >> >> >> >> >> > >> >> >> > Just wanted to follow up on this. Does this look good? >> >> >> >> >> >> I was thinking about this as I washed the dishes last night. >> >> >> >> >> >> Purging all relations in the database means that if any API-based user >> >> >> of relations emerges before the big public servers adopt a release with >> >> >> this, then they'll never be able to use it. >> >> >> >> >> >> What's the speed impact of doing two passes through the data, with pass >> >> >> 1 collecting all the projects that this touches and pass 2 actually >> >> >> installing the relations? >> >> >> >> >> >> Regards, >> >> >> Daniel >> >> > I wanted to reopen the discussion on this with the aim being to get >> >> > everyone on the same page, and either go with this approach or decide on >> >> > an alternate approach. >> >> > >> >> > The primary goal is to have a "parserelations" command that can parse a >> >> > "patch-groups" file that contains a relation on each line. There are 2 >> >> > approaches: >> >> > >> >> > a. Refresh the relations table. The parserelations command will remove >> >> > the existing relations before inserting the newly parsed relations. This >> >> > avoids any conflicts between existing and new relations. However other >> >> > users of the API might be affected. >> >> > >> >> > b. Insert relations in a non destructive way. We need a way to handle >> >> > conflicts between existing and incoming relations. So the solution is >> >> > not really clear in this case. What should have precedence in the case >> >> > of conflicts? The parserelations script or the existing relations. >> >> > >> >> >> >> >> >> Right, sorry for dropping this, work got a bit crazy. >> >> >> >> It occurs to me that I have an embedded assumption that may not be true >> >> here. Are you trying to build a set of relations that includes >> >> cross-project relations, or are you just trying to build a set of >> >> relations entirely or mostly within a single patchwork project? >> >> >> >> I've been assuming that your relations are entirely or predominantly >> >> intra-project, but I'm starting to suspect that maybe my understanding >> >> is not right. >> > The relations can be inter project too. PaStA is configured with a list >> > of Patchwork projects for which it will pull all the corresponding >> > patches and perform it's analyses. PaStA can relate patches across >> > projects too. >> > >> >> If you are expecting a non-trivial quantity of cross-project relations, >> >> then my suggested approach earlier (caller of the script specifies the >> >> projects that patches may belong to) doesn't make sense. It probably >> >> makes most sense to go with approach (a) in that case - it's really >> >> difficult to figure out how to do (b) in a way that preserves the >> >> meaning that any other API user, or the PaStA tool, is trying to create. >> >> >> >> If we do go down route (a), I want to make it really hard for users to >> >> accidentally shoot themselves in the foot here and wipe out data. So >> >> probably I would say that parserelations should only insert relations if >> >> the table is empty when the script begins, and there should be a >> >> separate command that with a suitably scary name that purges all >> >> existing relations. >> > >> > I agree that we should make parserelations actions very explicit to >> > prevent any loss of data. However, if we use parserelations to >> > differentially import relations we will need to be able to insert >> > relations with a non empty table. Let me explain the complete flow >> > between PaStA and Patchwork so that things are a bit clear. >> > We plan to run a script, maybe as a cron job at regular time intervals >> > which does the following: >> > 1. Exports patches from Patchwork into PaStA >> > 2. Runs PaStA's analyses >> > 3. Runs the parserelations script to parse the patch-groups file >> > generated by PaStA and inserts the relations into Patchwork. In this >> > step we refresh the db and insert all the relations afresh. A non >> > destructive approach would involve thinking of a way to resolve >> > conflicts between existing and new relations. >> >> OK. >> >> I am happy for you to stick with a destructive approach for the >> managment command, so long as it's clear to the user that data is >> potentially about to be destroyed. I'm happy for that to be: >> >> - two commands: a command that purges relations, and then a parse >> command that checks if the relations table is empty and refuses to >> run if it is not empty. >> >> - a single command that requires you to pass a command line option like >> '--really-delete-old-relations' >> >> - a single command with a name that captures the idea that data will be >> destroyed. I don't know what that would be, maybe 'replace...'. >> >> I'm not fussy which option you go with, I just don't want a command >> named 'parse...' to delete data without the user being properly warned. > Yeah, I agree. The name "parserelations" doesnt make it explicit enough. > I am fine with replacerelations. >
Ok cool! If you send a patch with this name and with tests, then - assuming I don't have any code-quality nits - I'd be happy to take it. Regards, Daneil >> (You will at some point need to consider how to do incremental updates >> over the API for a publicly usable version, but that can definitely wait >> and might be out of scope for your GSoC project.) > Agreed. > >> Regards, >> Daniel >> >> > >> >> >> >> Have I correctly understood things here? >> >> >> >> Regards, >> >> Daniel >> >> >> > Thanks, >> > Rohit > Thanks > Rohit _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork