[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-29 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-889565080


   > IntelliJ? ... I don't mind reverting that when it's ready to merge ...
   
   Yes, I mostly use IntelliJ.
   Resolving `pom.xml` conflicts turned out to be an opportunity for me to fix 
the whitespace problems ;)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-29 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-889544503


   Hello @kinow ,
   
   Thank you for all your constructive comments!
   
   I have reverted the code to its older form, thereby sacrificing (slight gain 
in) preformance to readability of the code.
   
   I have also added JMH performance analyzer.
   
   Side note: I have not forgotted your comments about whitespaces; it looks 
like my editor adds them automatically when I reformatted the code to adjust 
the indentation. I can rectify them if you don't mind another commit.
   
   Thank you,
   Ali


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-28 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-17931


   > Hi @ali-ghanbari
   > 
   > I was going to start reviewing algorithm C, but I realized you had a few 
more differences in the pull request from the last time I read your code.
   > 
   > There are problems with the current implementation, namely the @SInCE 
version that is not correct, one unnecessary blank space (I can fix that later 
if needed when merging), and the most important that was the parameters for the 
function that - I think - you moved to members of the class so that Checkstyle 
would not complain.
   > 
   > It would be better to avoid maintaining state in the object if that's not 
necessary (someone reading the code will immediately know it's thread-safe, 
since it has no dependency to other classes, and no local state, among other 
benefits).
   > 
   > Your previous version was really good, very close to being ready to be 
merged. Could you undo these changes I mentioned above, keeping whatever you 
had about algo B and C, please?
   > 
   > Thanks!
   > Bruno
   
   Certainly, @kinow :) I will address all of your comments soon.
   By the way, JMH Maven profile and the test code are ready; I can push that 
also.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-26 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886826512


   > Do you mind squashing the commits too? Adding an extra commit to add JMH 
should be fine and doesn't need to be squashed if you don't want to, but if the 
next review doesn't find any errors, we won't need to touch the branch. just 
approve and merge, which is a bit better (we could accidentally miss a commit 
when squashing for instance).
   > 
   > Thanks
   
   Not at all! I have just squashed the commits.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-26 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886825188


   > @ali-ghanbari we are not releqsing 2.0 yet. That's why the method with a 
typo was not removed. Could you put it back, please?
   
   Sorry for misunderstanding! I returned the method back :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-26 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886430226


   Hello,
   
   Thank you @kinow for the insightful comment regarding `StringBuilder` object 
used in `algorithmC`.
   It took me a while to understand that it is possible to allocate the object 
outside of the method, but I finally managed to code it!
   I think `algorithmC` is now satisfying all the assumptions made by the paper 
and a back of the envelope calculation will reveal that the method has indeed 
linear space complexity.
   
   I have made a whole bunch of refactoring and added lots of comments and 
Javadoc to improve readability of the code.
   I have also deleted the method logestCommonSubsequence for you (the one that 
had typo in its name and it was planned to be removed in version 2.0).
   
   From now on, unless the reviewers request some changes, I am unlikely to 
make updates to the class. So, I would say, it is safe to review :)
   
   I will add the code for JMH as soon as I find some spare time.
   
   Thanks a lot!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-25 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886259223


   > @ali-ghanbari , I will review it once I have some spare time. Thanks a lot 
for updating the PR and fixing CI issues.
   > 
   > We can definitely ignore that issue since it is not being introduced here 
(will fix it in a bit). As for the JMH test, we could probably include it in 
the source code (if not already in this PR) and run only when a certain Maven 
profile is activated, or filter the suite with the test, etc.
   > 
   > Thanks
   > Bruno
   
   Thanks a lot!
   FYI, for now JMH is in a separate project.
   I will figure it out how to embed the test code in this project (without 
having name conflicts) and selectively run it using Maven.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-25 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886257991


   Thank you @kinow for mentioning the CI issue.
   
   It looks like that the issue is due to a recent 
change([see](https://github.com/apache/commons-text/commit/87640ed259da32f6af516b9b638ae46d96c9ff24)
 for the update that resolves the CI errors) in CI configuration which makes 
the older version of `StrTokenizer.java` (which contains several dangling `p` 
tags in its Javadocs) to fail checkstyle checks.
   I guess the issue can be ignored for the time being? Please let me know if 
we should update `StrTokenizer.java` according to the latest version.
   
   Please also see the newest version of the implementation which avoids 
checkstyle warnings by passing the arguments to `algorithmB` using instance 
fields (it used to complain because we had methods with more than 7 parameters).
   With this change it is imperative to have _separate instances_ of the class 
in case if we want to run the algorithm concurrently.
   
   By the way, I managed to run JMH. Please check out the Javadoc for `apply` 
method to see the results.
   
   Thank you again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-25 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-886238608


   > I think CI was passing before, but apparently some Javadoc needs updating 
@ali-ghanbari ?
   > 
   > ```
   >  Error:  Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:3.2.0:javadoc (default-cli) on 
project commons-text: An error has occurred in Javadoc report generation: 
   > Error:  Exit code: 1 - 
/home/runner/work/commons-text/commons-text/src/main/java/org/apache/commons/text/StrTokenizer.java:181:
 error: unexpected end tag: 
   > Error:   * 
   > Error: ^
   > ```
   
   Yes, @kinow . I have just rectified the issue. I hope this didn't interrupt 
your review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-23 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-885962560


   Sounds good @kinow :) Thanks a lot!
   
   I will update you as soon as I manage to run the JMH tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-23 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-885893788


   > We'll need some JMH benchmarks to show what code performs best before 
picking a new method IMO... otherwise this is a no-go from my POV, it's just 
changing code...
   
   Agreed.
   Please note that this implementation improves **space complexity** (from 
quadratic to linear, which is a huge improvement!) and the general speed of the 
methods (asymptotically speaking) should remain more or less the same (as both 
are quadratic in time). I am not sure if JMH can measure memory footprint, but 
to test the memory usage, I suggest testing the new implementation with the 
first pair of large strings that make the previous implementation crash in one 
way or another.
   I happened to learn about this more efficient algorithm in a research 
project where I was getting OutOfMemoryError with the current implementation of 
`apply` method and I used this new implementation and it worked without any 
problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-21 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-884612592


   > @ali-ghanbari thanks again for the great PR. I was surprised to see the 
paper is from the 70's! I think the last time I searched for LCS in GitHub (in 
Java/Python) the implementations were similar to algorithm A. Good improvement 
for Commons Text.
   
   Thanks @kinow :) Yes, the idea has been around for quite a long time; that 
was surprising to me as well when I first learned about this newer algorithm.
   Right, traditional implementations are more or less the same as "Algorithm 
A," according to that CACM paper (which is then refined by Algorithm B/C in the 
paper).
   
   > ... I will go once again, now with the debugger just to clear a few things 
and make sure I understand the improvements in the paper ...
   
   Sounds good! Thanks!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-07-17 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-881971343


   > Finished reviewing another PR for Commons Text, now this one is the next 
in the list. Sorry for the long wait @ali-ghanbari
   
   Glad to hear that @kinow ! Many thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@commons.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-05-26 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-849254619


   Hi,
   
   Thanks a lot for the comments! I just applied the suggestions to the 
previous implementation and also refactored it, so that it is now matching the 
algorithm description of in the original paper.
   
   I also implemented linear space complexity algorithm for calculation of LCS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-05-26 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-849112355


   Hi @XenoAmess , @aherbert ,
   Thank you for all your suggestions and comments.
   I will look into them and update you in the up coming days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [commons-text] ali-ghanbari commented on pull request #233: A More Efficient Implementation for Calculating Size of Longest Common Subsequence

2021-05-22 Thread GitBox


ali-ghanbari commented on pull request #233:
URL: https://github.com/apache/commons-text/pull/233#issuecomment-846470751


   Thanks  @kinow 
   Excellent! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org