Thanks a lot Jakob for trying it out for me.

*Problem solved*!
All thanks to Florian Florensen's help!

I turned this bunch of code,
https://github.com/go-dedup/text/blob/3d0d998bef3db3937496933778c55e4f01cab5e4/text.go#L37-L60
into a single simple line, and now the problem is gone:

https://github.com/go-dedup/fsimilar/commit/07c54af74664ddfbd995e59c0cb4eca758983650

That's the single change I've made to my entire code base. Everyone is
welcome to double-check the claim.

So apparently `unicode.IsUpper()` returns different results under
different language environments, even for plain ASCII text.

Otherwise, I can't explain why there are such huge differences.
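
For anyone who wants to double-check that guess, here is a minimal sketch (not the code from either repository) that prints what `unicode.IsUpper` reports for a plain ASCII string. The `upperMask` helper and the choice of sample string are made up for illustration. As far as I know, Go's `unicode` package works from built-in tables and does not read `LANG` or `LC_ALL`, so running this under different locale settings is a way to test the explanation above rather than a confirmation of it.

```go
package main

import (
	"fmt"
	"unicode"
)

// upperMask is a hypothetical helper for this check only: it returns one
// byte per rune of s, 'U' where unicode.IsUpper reports upper case and '.'
// everywhere else.
func upperMask(s string) string {
	mask := make([]byte, 0, len(s))
	for _, r := range s {
		if unicode.IsUpper(r) {
			mask = append(mask, 'U')
		} else {
			mask = append(mask, '.')
		}
	}
	return string(mask)
}

func main() {
	// Sample input borrowed from one of the test file names in this thread.
	const sample = "ColoredGrayBunny"
	fmt.Println(sample)
	fmt.Println(upperMask(sample))
}
```

Running it once as `LC_ALL=C go run main.go` and once with another locale set, then comparing the two outputs, would show whether the environment really changes the result.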



On Mon, Sep 4, 2017 at 1:14 PM, Jakob Borg <ja...@kastelo.net> wrote:

> Then I can't guess. I checked out your code, and the tests fail for me in
> the same way as on travis.
>
> //jb
>
>
> > On 4 Sep 2017, at 15:12, Tong Sun <suntong...@gmail.com> wrote:
> >
> > Yes, we can say it is calculating hashes in some manner. However, all
> > the test content so far is pure ASCII, which does not change regardless
> > of how you look at it (unlike Unicode), and the hashing is done on
> > words only, i.e., spaces and line endings do not affect it.
> >
> > Thanks a lot for your help though, Jakob.
> >
> > On Mon, Sep 4, 2017 at 2:17 AM, Jakob Borg <ja...@kastelo.net> wrote:
> > Hi,
> >
> > It's not especially clear from your mail what your tool does, exactly.
> > But assuming that it calculates hashes of content in some manner, my
> > first guess would be that your test data character set and/or line
> > endings get changed by the git checkin/checkout procedure.
> >
> > //jb
> >
> > > On 4 Sep 2017, at 05:34, Tong Sun <suntong...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I've bumped into another "same code different result" problem -- my
> > > `go test` runs fine locally but on Travis,
> > > https://travis-ci.org/go-dedup/fsimilar/builds/271540570
> > > it is broken.
> > >
> > > I've verified at least four or five times that all my local code has
> > > been pushed to GitHub. Now I've run out of ideas why the same source
> > > would behave differently after being compiled into executables on
> > > different machines.
> > >
> > > Mine is go 1.9 under Ubuntu 17.04.
> > >
> > > Somebody help please.
> > >
> > > FYI, the tool I'm building would spot similar files within the file
> > > system very quickly.
> > >
> > > $ fsimilar
> > > find/file similar
> > > Version 0.1.0 built on 2017-09-03
> > >
> > > Find similar files
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >
> > > Commands:
> > >
> > >   sim   Filter the input using simhash similarity check
> > >   vec   Use Vector Space for similarity check
> > >
> > > $ cat test/sim.lstA
> > > test/sim/Audio Book - The Grey Coloured Bunnie.mp3
> > > test/sim/GNU - Python Standard Library (2001).rar
> > > test/sim/PopupTest.java
> > > test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > > test/sim/Python Standard Library.zip
> > > test/sim/GNU - 2001 - Python Standard Library.pdf
> > > test/sim/LayoutTest.java
> > > test/sim/ColoredGrayBunny.ogg
> > >
> > > $ fsimilar sim
> > > Filter the input using simhash similarity check
> > >
> > > Usage:
> > >   mlocate -i soccer | fsimilar sim -i
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >   -d, --dist[=3]        the hamming distance of hashes within which to deem similar
> > >
> > > $ fsimilar sim -i test/sim.lstA -d 12
> > >        1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > >        1 test/sim/GNU - Python Standard Library (2001).rar
> > >
> > >        1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > >        1 test/sim/Python Standard Library.zip
> > >
> > > $ fsimilar vec
> > > Use Vector Space for similarity check
> > >
> > > Usage:
> > >   { mlocate -i soccer; mlocate -i football; } | fsimilar sim -i | fsimilar vec -i -S -Q -F
> > >
> > > Options:
> > >
> > >   -h, --help            display help information
> > >   -S, --size-given      size of the files in input as first field
> > >   -Q, --query-size      query the file sizes from os
> > >   -i, --input          *input from stdin or the given file (mandatory)
> > >   -p, --phonetic        use phonetic as words for further error tolerant
> > >   -F, --final           produce final output, the recommendations
> > >   -c, --cp[=$FSIM_CP]   config path, path that hold all template files
> > >   -v, --verbose         verbose mode (multiple -v increase the verbosity)
> > >   -t, --thr[=0.86]      the threshold above which to deem similar (0.8 = 80%)
> > >
> > > $ fsimilar vec -i test/sim.lstA -t 0.7
> > >        1 test/sim/GNU - Python Standard Library (2001).rar
> > >        1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > >        1 test/sim/Python Standard Library.zip
> > >        1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > >
> > > $ fsimilar vec -i test/sim.lstA -t 0.7 -p
> > >        1 test/sim/Audio Book - The Grey Coloured Bunnie.mp3
> > >        1 test/sim/ColoredGrayBunny.ogg
> > >
> > >        1 test/sim/GNU - Python Standard Library (2001).rar
> > >        1 test/sim/(eBook) GNU - Python Standard Library 2001.pdf
> > >        1 test/sim/Python Standard Library.zip
> > >        1 test/sim/GNU - 2001 - Python Standard Library.pdf
> > >
> > > I mean, I hope you can try pulling from the remote yourself and
> > > testing it on your local machine, as it should be a useful tool
> > > eventually.
> > >
> > > Thanks for helping!
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "golang-nuts" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to golang-nuts+unsubscr...@googlegroups.com.
> > > For more options, visit https://groups.google.com/d/optout.
> >
> >
>
>

