Thanks a lot, Jakob, for trying it out for me. *Problem solved*! All thanks to Florian Florensen's help!
I turned this bunch of code,

https://github.com/go-dedup/text/blob/3d0d998bef3db3937496933778c55e4f01cab5e4/text.go#L37-L60

into a single simple line, and now the problem is gone:

https://github.com/go-dedup/fsimilar/commit/07c54af74664ddfbd995e59c0cb4eca758983650

That's the single change I've made to my entire code base. Everyone is welcome to double-check the claim.

So my guess is that `unicode.IsUpper()` returns different things under different language environments, even for plain ASCII text. I have not verified that, though, and as far as I know the `unicode` package is not supposed to depend on the locale, so the real cause may be somewhere else in the code I replaced. Otherwise, I can't explain why there are such huge differences.

On Mon, Sep 4, 2017 at 1:14 PM, Jakob Borg <ja...@kastelo.net> wrote:

> Then I can't guess. I checked out your code, and the tests fail for me in
> the same way as on travis.
>
> //jb
>
> > On 4 Sep 2017, at 15:12, Tong Sun <suntong...@gmail.com> wrote:
> >
> > Yes, we can say it is calculating hashes in some manner. However, all
> > the test content so far are pure ascii, which would not change regardless
> > how you are looking at it (unlike unicode), and the hashes is done on only
> > words, i.e., spaces and line endings will not affect the hashing.
> >
> > Thanks a lot for your help though, Jakob.
> >
> > On Mon, Sep 4, 2017 at 2:17 AM, Jakob Borg <ja...@kastelo.net> wrote:
> > Hi,
> >
> > It's not especially clear from your mail what your tool does, exactly.
> > But assuming that it calculates hashes of content in some manner, my first
> > guess would be that your test data character set and/or line endings get
> > changed by the git checkin/checkout procedure.
> >
> > //jb
> >
> > > On 4 Sep 2017, at 05:34, Tong Sun <suntong...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I've bumped into another "same code different result" problem -- my
> > > `go test` runs fine locally but on Travis,
> > > https://travis-ci.org/go-dedup/fsimilar/builds/271540570
> > > it is broken.
> > >
> > > I've verified at least four or five times that all my local code have
> > > been pushed to github. Now I've run out of ideas why the same source will
> > > have different behavior after compiling into executables on different
> > > machines.
> > >
> > > Mine is go 1.9 under Ubuntu 17.04.
> > >
> > > Somebody help please.
> > >
> > > FYI, the tool I'm building would spot similar files within the file
> > > system very quickly.
> > >
> > > $ fsimilar
> > > find/file similar
> > > Version 0.1.0 built on 2017-09-03
> > >
> > > Find similar files
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >
> > > Commands:
> > >
> > >   sim   Filter the input using simhash similarity check
> > >   vec   Use Vector Space for similarity check
> > >
> > > $ cat test/sim.lstA
> > > test/sim/Audio Book - The Grey Coloured Bunnie.mp3
> > > test/sim/GNU - Python Standard Library (2001).rar
> > > test/sim/PopupTest.java
> > > test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > > test/sim/Python Standard Library.zip
> > > test/sim/GNU - 2001 - Python Standard Library.pdf
> > > test/sim/LayoutTest.java
> > > test/sim/ColoredGrayBunny.ogg
> > >
> > > $ fsimilar sim
> > > Filter the input using simhash similarity check
> > >
> > > Usage:
> > >   mlocate -i soccer | fsimilar sim -i
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >   -d, --dist[=3]        the hamming distance of hashes within which to deem similar
> > >
> > > $ fsimilar sim -i test/sim.lstA -d 12
> > > 1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > > 1 test/sim/GNU - Python Standard Library (2001).rar
> > >
> > > 1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > > 1 test/sim/Python Standard Library.zip
> > >
> > > $ fsimilar vec
> > > Use Vector Space for similarity check
> > >
> > > Usage:
> > >   { mlocate -i soccer; mlocate -i football; } | fsimilar sim -i | fsimilar vec -i -S -Q -F
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >   -t, --thr[=0.86]      the threshold above which to deem similar (0.8 = 80%)
> > >
> > > $ fsimilar vec -i test/sim.lstA -t 0.7
> > > 1 test/sim/GNU - Python Standard Library (2001).rar
> > > 1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > > 1 test/sim/Python Standard Library.zip
> > > 1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > >
> > > $ fsimilar vec -i test/sim.lstA -t 0.7 -p
> > > 1 test/sim/Audio Book - The Grey Coloured Bunnie.mp3
> > > 1 test/sim/ColoredGrayBunny.ogg
> > >
> > > 1 test/sim/GNU - Python Standard Library (2001).rar
> > > 1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > > 1 test/sim/Python Standard Library.zip
> > > 1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > >
> > > I meant, I hope you can try pulling off from remote yourself and try
> > > testing it with your local machine, as it would be a useful tool eventually.
> > >
> > > Thanks for helping!
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "golang-nuts" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to golang-nuts+unsubscr...@googlegroups.com.
> > > For more options, visit https://groups.google.com/d/optout.