I honestly don't know what can be done and what has to be sacrificed. I'm pretty sure it'll be more difficult than svn->git conversion because more factors are involved. One tough thing to somehow preserve may be user names (reporters, etc.). I'm not sure how other projects dealt with that.
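One hypothetical way to keep user names from disappearing is to record the original Jira author as plain text in each migrated issue or comment. A GitHub handle is added only where a mapping is known. The following is a minimal bash sketch that assumes a hand-maintained map. `GITHUB_HANDLE`, `attribution_line`, and the example entry are illustrative names and are not part of any existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep the original Jira author visible in migrated text.
# Requires bash 4+ (associative arrays).

# Hand-maintained map from Jira usernames to GitHub handles (example entry only).
declare -A GITHUB_HANDLE=(
  [jira_user_a]="github-user-a"
)

# Print an attribution line for a migrated issue or comment.
# Usage: attribution_line <jira-username> <created-timestamp>
attribution_line() {
  local jira_user="$1" created="$2"
  local handle="${GITHUB_HANDLE[$jira_user]:-}"
  if [ -n "$handle" ]; then
    printf 'Original author: %s (GitHub: %s), created %s\n' "$jira_user" "$handle" "$created"
  else
    # No known GitHub account: keep the Jira name as plain text.
    printf 'Original author: %s (Jira; no GitHub account mapped), created %s\n' "$jira_user" "$created"
  fi
}
```

For example, `attribution_line jira_user_a 2022-06-18` prints `Original author: jira_user_a (GitHub: github-user-a), created 2022-06-18`. Users without a mapping still keep their Jira name.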
Perhaps a way to do it incrementally would be to create a structured (JSON/XML) dump of the Jira content and then write a converter into a similar JSON/XML dump for importing into GitHub. I remember it took many iterations of trial and error for the svn->git conversion to eventually reach its final shape, and it was simpler and faster to do it locally.

Dawid

On Sat, Jun 18, 2022 at 8:59 AM Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:

> I'll give it a try, though I'm really skeptical that it can be done
> with a satisfactory level of quality (we want to "preserve" issue
> history, not just have shallow/degraded copies, right?), and the
> migration will be significantly delayed while we figure out how to
> properly move all issues to GitHub.
> If there is another way to bypass this challenge, please let me know.
>
> Tomoko
>
> On Sat, Jun 18, 2022 at 15:44, Dawid Weiss <dawid.we...@gmail.com> wrote:
> >
> > Hi Tomoko,
> >
> > I've added a few bullet points that the script could/should handle under LUCENE-10557; I hope you don't mind. If you place these script(s) in the open, then perhaps we could indeed try to collaborate and see what can be done.
> >
> > Dawid
> >
> > On Sat, Jun 18, 2022 at 5:33 AM Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
> >>
> >> Replying to myself: Jira issues can be read via the REST API without any
> >> access token, and we can iterate over all issues by issue number.
> >> curl -s https://issues.apache.org/jira/rest/api/latest/issue/LUCENE-10557
> >>
> >> Would you please hold the discussion for a while? To me, it's a waste of
> >> our time without a working prototype. I will be back here with a
> >> sandbox GitHub repo where some of the existing Jira issues are migrated
> >> (on a best-effort basis).
> >> In the process, we could simultaneously figure out how to manage
> >> GitHub metadata (milestones/labels).
> >>
> >> Tomoko
> >>
> >> On Sat, Jun 18, 2022 at 10:41, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
> >> >
> >> > Does anyone have information on API access keys to Jira (preferably
> >> > read-only and limited to the Lucene project)?
> >> > https://issues.apache.org/jira/browse/LUCENE-10622
> >> >
> >> > On Sat, Jun 18, 2022 at 10:11, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
> >> > >
> >> > > I feel like we should delay the decision on the migration of existing
> >> > > issues until we have a clear picture of what can and cannot be done.
> >> > >
> >> > > I'll write a migration script that preserves the issue history as
> >> > > far as possible, then come back here with some experience.
> >> > > Let's make a decision based on concrete knowledge and information.
> >> > >
> >> > > Tomoko
> >> > >
> >> > > On Sat, Jun 18, 2022 at 9:26, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
> >> > > >
> >> > > > I don't intend to neglect the history in Jira... it's an important,
> >> > > > valuable asset for all of us and for possible future contributors.
> >> > > >
> >> > > > Precisely *because* it's important, I don't want to have degraded
> >> > > > copies of it on GitHub.
> >> > > > We cannot preserve all of the history. Again, there would be a lot of
> >> > > > significant information loss (timestamps, reporters, assignees,
> >> > > > markdown, metadata that cannot be ported to GitHub) if we attempted to
> >> > > > migrate the whole Jira history into GitHub. Rather than trying to have
> >> > > > such incomplete copies, I would preserve the Jira issues in a perfectly
> >> > > > archived state and simply refer to them.
> >> > > >
> >> > > > Tomoko
> >> > > >
> >> > > > On Sat, Jun 18, 2022 at 7:47, Gus Heck <gus.h...@gmail.com> wrote:
> >> > > > >
> >> > > > > I hope you count me as someone who sees history as important. It's important in more ways than one, however. You gave the example of trying to understand something by looking at the issue history directly.
> >> > > > > I also give weight to the scenario where someone has written a blog post about the topic and linked the issue ("For the latest see LUCENE-XXXX", for example), or where someone planning upgrades has a spreadsheet of things to track down. The existing links should point to a *complete* history of the issue.
> >> > > > >
> >> > > > > I don't see migrating everything to GitHub as being as critical as you do, but I'm not at all against migrating closed issues if someone wants to do that work. Perhaps we could even copy over existing open issues periodically as they become closed, and accelerate the close rate by aggressively closing silent issues. No new issues in Jira sounds fine, even better if enforced by Jira. Proceed from here in GitHub, since that's where the community wants to go. Links to the migrated version automatically added to Jira and/or backlinks to Jira would be just fine too. Readers might (hopefully needlessly) worry that something didn't get migrated, so we should make it easy to check.
> >> > > > >
> >> > > > > What I don't want is for someone to land on an issue via a link or a Google search (or via search in Jira, because they already use Jira for some other Apache project), read through it and then either A) think it never got resolved when it did, or B) miss the fact that it got reopened and further changes were made, and so only have half the story. The same goes for any other scenario where they are looking at an incomplete record of the issue (thus obscuring/splitting the very important rich history across systems).
> >> > > > >
> >> > > > > That's why I feel issues should be completely tracked in the system where they were created. Syncing old closed issues into a new system is probably fine, so long as there are periodic sweeps to pull in reopened or newly completed issues.
> >> > > > > We could even sync open issues, so long as they are clearly marked in the title as having their primary record in Jira. Each time new content is brought over, a final comment should add something like "last synced from JIRA on YYYY-MM-DD".
> >> > > > >
> >> > > > > For simplicity and workload, however, maybe just sync issues when they close. I guess it depends on how much effort the person writing the syncing code wants to put into it.
> >> > > > >
> >> > > > > Although I agree with Dawid on the "What if Elon buys it?" issue, that ship has sailed. The community accepts that risk, and we probably should not rehash it.
> >> > > > >
> >> > > > > WRT Robert's comments on PRs being issues: this has already worried me. I've seen a lot of discussion on PRs, and I've worried that this discussion has the potential to get lost or be hard to find. If there is one key positive of this move, it is that the discussion will become easier to find, since GitHub search can find it. I would say that a PR is not a substitute for a well-described issue report. That's probably a separate discussion, though, and I would hope it mirrors the policy that small edits (like typos or adding comments/javadoc) don't need an issue. I've also seen folks who like to clean up and remove old branches and PRs, which is problematic if that's where the important discussion is (possibly a third can of worms there).
> >> > > > >
> >> > > > > -Gus
> >> > > > >
> >> > > > > On Fri, Jun 17, 2022 at 4:34 PM Robert Muir <rcm...@gmail.com> wrote:
> >> > > > >>
> >> > > > >> On Fri, Jun 17, 2022 at 3:27 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
> >> > > > >> >
> >> > > > >> > I'd be more afraid of what happens to GitHub issues in two years (or longer). Will it look the same? Will it be different? Will it be gone (and how do we get a backup of the issue history then)? Unlike the Apache-hosted Jira, GitHub is very much an independent entity.
> >> > > > >> > If Elon Musk decides to buy and close it tomorrow... then what? :)
> >> > > > >> >
> >> > > > >>
> >> > > > >> We already have a ton of GitHub "issues" (pull requests, since PRs are issues).
> >> > > > >> If you want to "back them up", it's easy: you can paginate through them
> >> > > > >> 100 at a time. For example, run this command, incrementing 'page' until it
> >> > > > >> returns an empty list:
> >> > > > >>
> >> > > > >> curl -H "Accept: application/vnd.github.v3+json" "https://api.github.com/repos/apache/lucene/issues?per_page=100&page=1&direction=asc&state=all" > file1.json
> >> > > > >>
> >> > > > >> Yeah, of course if you want to back up the comments and such, you'll
> >> > > > >> need to do more.
> >> > > > >> But it is already the case today that a ton of this "history" is
> >> > > > >> in GitHub issues, as PRs. Most recent JIRAs are just useless
> >> > > > >> placeholders.
> >> > > > >> Also, the same risks apply to JIRA, except that there they are real
> >> > > > >> concerns rather than theoretical ones, no?
> >> > > > >> I thought Atlassian had deprecated "onsite" JIRA to try
> >> > > > >> to sucker you into their "Atlassian Cloud":
> >> > > > >> https://www.theregister.com/2020/10/19/atlassian_server_licenses/
> >> > > > >>
> >> > > > >> ---------------------------------------------------------------------
> >> > > > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> > > > >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > http://www.needhamsoftware.com (work)
> >> > > > > http://www.the111shift.com (play)
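Tomoko notes in the thread that Jira issues can be read through the REST API without a token and iterated by issue number. That observation can be turned into the kind of local dump Dawid suggests. The following is a minimal bash sketch. It assumes curl is installed and that some numbers in a range may not resolve; those are skipped. `fetch_issue`, `dump_issues`, and the one-file-per-issue layout are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: dump Jira issues LUCENE-<from>..LUCENE-<to> as raw JSON, one file per
# issue, by iterating over issue numbers. Assumes curl is installed.

JIRA_ISSUE_API="https://issues.apache.org/jira/rest/api/latest/issue"

# Print one issue as JSON; -f makes curl fail on HTTP errors (e.g. 404).
fetch_issue() {
  curl -s -f "${JIRA_ISSUE_API}/$1"
}

# Usage: dump_issues <from> <to> <out-dir>; prints how many issues were saved.
dump_issues() {
  local from="$1" to="$2" out_dir="$3" n key file saved=0
  mkdir -p "$out_dir" || return 1
  for ((n = from; n <= to; n++)); do
    key="LUCENE-${n}"
    file="${out_dir}/${key}.json"
    if fetch_issue "$key" > "$file"; then
      saved=$((saved + 1))
    else
      # Missing or inaccessible issue: drop the partial file and move on.
      rm -f "$file"
      echo "skipped ${key}" >&2
    fi
  done
  echo "$saved"
}
```

For a first trial run, something like `dump_issues 10550 10560 jira-sample` keeps the number of requests small. The plain issue endpoint does not include the field-change history. Jira's REST API accepts an `expand=changelog` parameter for that, which is worth verifying against the Jira version in use.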
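Robert's manual procedure can be wrapped in a loop: request 100 issues per page and increment `page` until the API returns an empty list. The following is a minimal bash sketch. It assumes curl is available and that requests are unauthenticated, which GitHub rate-limits heavily. `REPO`, `OUT_DIR`, the `pageN.json` naming, and the function names are hypothetical. As Robert says, comments are not included and would need further requests.

```shell
#!/usr/bin/env bash
# Sketch: back up all issues of a GitHub repository (pull requests included,
# since the issues endpoint lists PRs too) as page1.json, page2.json, ...
# Assumes curl is installed and requests are unauthenticated (rate-limited).
# Issue comments are NOT fetched.

REPO="${REPO:-apache/lucene}"
OUT_DIR="${OUT_DIR:-issues-backup}"

# Print one page (up to 100 issues) as raw JSON; -f fails on HTTP errors.
fetch_page() {
  curl -s -f -H "Accept: application/vnd.github.v3+json" \
    "https://api.github.com/repos/${REPO}/issues?per_page=100&page=$1&direction=asc&state=all"
}

# Succeed if the JSON text is an empty list, ignoring whitespace.
is_empty_list() {
  [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "[]" ]
}

# Fetch pages until an empty list comes back; print how many pages were saved.
backup_all_pages() {
  local page=1 body
  mkdir -p "$OUT_DIR" || return 1
  while :; do
    body="$(fetch_page "$page")" || { echo "request for page ${page} failed" >&2; return 1; }
    case "$body" in
      \[*) ;;  # a JSON list, as expected
      *) echo "unexpected response on page ${page}" >&2; return 1 ;;
    esac
    is_empty_list "$body" && break
    printf '%s\n' "$body" > "${OUT_DIR}/page${page}.json"
    page=$((page + 1))
  done
  echo "$((page - 1))"
}
```

After sourcing the script, `backup_all_pages` prints the number of pages it saved. The loop stops on an empty list, as described in the thread. GitHub also sends a `Link` response header with pagination URLs, which a more careful script could follow instead.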