[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-889565080 > IntelliJ? ... I don't mind reverting that when it's ready to merge ... Yes, I mostly use IntelliJ. Resolving `pom.xml` conflicts turned out to be an opportunity for me to fix the whitespace problems ;) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-889544503 Hello @kinow , Thank you for all your constructive comments! I have reverted the code to its older form, thereby sacrificing (slight gain in) preformance to readability of the code. I have also added JMH performance analyzer. Side note: I have not forgotted your comments about whitespaces; it looks like my editor adds them automatically when I reformatted the code to adjust the indentation. I can rectify them if you don't mind another commit. Thank you, Ali -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-17931 > Hi @ali-ghanbari > > I was going to start reviewing algorithm C, but I realized you had a few more differences in the pull request from the last time I read your code. > > There are problems with the current implementation, namely the @SInCE version that is not correct, one unnecessary blank space (I can fix that later if needed when merging), and the most important that was the parameters for the function that - I think - you moved to members of the class so that Checkstyle would not complain. > > It would be better to avoid maintaining state in the object if that's not necessary (someone reading the code will immediately know it's thread-safe, since it has no dependency to other classes, and no local state, among other benefits). > > Your previous version was really good, very close to being ready to be merged. Could you undo these changes I mentioned above, keeping whatever you had about algo B and C, please? > > Thanks! > Bruno Certainly, @kinow :) I will address all of your comments soon. By the way, JMH Maven profile and the test code are ready; I can push that also. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886826512 > Do you mind squashing the commits too? Adding an extra commit to add JMH should be fine and doesn't need to be squashed if you don't want to, but if the next review doesn't find any errors, we won't need to touch the branch. just approve and merge, which is a bit better (we could accidentally miss a commit when squashing for instance). > > Thanks Not at all! I have just squashed the commits. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886825188 > @ali-ghanbari we are not releqsing 2.0 yet. That's why the method with a typo was not removed. Could you put it back, please? Sorry for misunderstanding! I returned the method back :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886430226 Hello, Thank you @kinow for the insightful comment regarding `StringBuilder` object used in `algorithmC`. It took me a while to understand that it is possible to allocate the object outside of the method, but I finally managed to code it! I think `algorithmC` is now satisfying all the assumptions made by the paper and a back of the envelope calculation will reveal that the method has indeed linear space complexity. I have made a whole bunch of refactoring and added lots of comments and Javadoc to improve readability of the code. I have also deleted the method logestCommonSubsequence for you (the one that had typo in its name and it was planned to be removed in version 2.0). From now on, unless the reviewers request some changes, I am unlikely to make updates to the class. So, I would say, it is safe to review :) I will add the code for JMH as soon as I find some spare time. Thanks a lot! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886259223 > @ali-ghanbari , I will review it once I have some spare time. Thanks a lot for updating the PR and fixing CI issues. > > We can definitely ignore that issue since it is not being introduced here (will fix it in a bit). As for the JMH test, we could probably include it in the source code (if not already in this PR) and run only when a certain Maven profile is activated, or filter the suite with the test, etc. > > Thanks > Bruno Thanks a lot! FYI, for now JMH is in a separate project. I will figure it out how to embed the test code in this project (without having name conflicts) and selectively run it using Maven. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886257991 Thank you @kinow for mentioning the CI issue. It looks like that the issue is due to a recent change([see](https://github.com/apache/commons-text/commit/87640ed259da32f6af516b9b638ae46d96c9ff24) for the update that resolves the CI errors) in CI configuration which makes the older version of `StrTokenizer.java` (which contains several dangling `p` tags in its Javadocs) to fail checkstyle checks. I guess the issue can be ignored for the time being? Please let me know if we should update `StrTokenizer.java` according to the latest version. Please also see the newest version of the implementation which avoids checkstyle warnings by passing the arguments to `algorithmB` using instance fields (it used to complain because we had methods with more than 7 parameters). With this change it is imperative to have _separate instances_ of the class in case if we want to run the algorithm concurrently. By the way, I managed to run JMH. Please check out the Javadoc for `apply` method to see the results. Thank you again! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-886238608 > I think CI was passing before, but apparently some Javadoc needs updating @ali-ghanbari ? > > ``` > Error: Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.2.0:javadoc (default-cli) on project commons-text: An error has occurred in Javadoc report generation: > Error: Exit code: 1 - /home/runner/work/commons-text/commons-text/src/main/java/org/apache/commons/text/StrTokenizer.java:181: error: unexpected end tag: > Error: * > Error: ^ > ``` Yes, @kinow . I have just rectified the issue. I hope this didn't interrupt your review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-885962560 Sounds good @kinow :) Thanks a lot! I will update you as soon as I manage to run the JMH tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-885893788 > We'll need some JMH benchmarks to show what code performs best before picking a new method IMO... otherwise this is a no-go from my POV, it's just changing code... Agreed. Please note that this implementation improves **space complexity** (from quadratic to linear, which is a huge improvement!) and the general speed of the methods (asymptotically speaking) should remain more or less the same (as both are quadratic in time). I am not sure if JMH can measure memory footprint, but to test the memory usage, I suggest testing the new implementation with the first pair of large strings that make the previous implementation crash in one way or another. I happened to learn about this more efficient algorithm in a research project where I was getting OutOfMemoryError with the current implementation of `apply` method and I used this new implementation and it worked without any problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-884612592 > @ali-ghanbari thanks again for the great PR. I was surprised to see the paper is from the 70's! I think the last time I searched for LCS in GitHub (in Java/Python) the implementations were similar to algorithm A. Good improvement for Commons Text. Thanks @kinow :) Yes, the idea has been around for quite a long time; that was surprising to me as well when I first learned about this newer algorithm. Right, traditional implementations are more or less the same as "Algorithm A," according to that CACM paper (which is then refined by Algorithm B/C in the paper). > ... I will go once again, now with the debugger just to clear a few things and make sure I understand the improvements in the paper ... Sounds good! Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-881971343 > Finished reviewing another PR for Commons Text, now this one is the next in the list. Sorry for the long wait @ali-ghanbari Glad to hear that @kinow ! Many thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-849254619 Hi, Thanks a lot for the comments! I just applied the suggestions to the previous implementation and also refactored it, so that it is now matching the algorithm description of in the original paper. I also implemented linear space complexity algorithm for calculation of LCS. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-849112355 Hi @XenoAmess , @aherbert , Thank you for all your suggestions and comments. I will look into them and update you in the up coming days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence
ali-ghanbari commented on pull request #233: URL: https://github.com/apache/commons-text/pull/233#issuecomment-846470751 Thanks @kinow Excellent! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org