Yep, that fixed it! After an svn update, the files on my machine have the same sizes as yours. All the tests in contrib\analyzers pass now.
Thanks,
Luc

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, 11 February 2005 19:25
To: Lucene Developers List
Subject: Re: failure in the Russian Analyzer in contrib

On Feb 11, 2005, at 11:19 AM, Vanlerberghe, Luc wrote:
> I suspect Subversion now: the stemsUnicode.txt and
> wordsUnicode.txt files are encoded in UTF-16 (they have the proper
> two-byte byte-order prefix) and have property svn:eol-style set to native.
> On my (Windows :( ) system the files are 904424 and 1101164 bytes long
> and are full of "0d 0a 00" byte sequences, which in Unicode should
> probably just be "0a 00" or "0d 00 0a 00".

My files have these sizes:

    $ ls -l
    total 3608
    -rw-r--r--  1 erik  erik   805080 11 Feb 08:30 stemsUnicode.txt
    -rw-r--r--  1 erik  erik  1001820 11 Feb 08:30 wordsUnicode.txt

> Is there a way to do a svn update --raw or something so that I can check
> this?

No, svn doesn't have this type of switch.

> If this is indeed the problem, a possible fix would be to set the
> svn:eol-style to LF or else let svn know that the file is in Unicode
> (perhaps setting the svn:mime-type property to something other than the
> default?)

I have set the svn:eol-style property to LF on both of those files. Let me know if that fixes the issue.

    Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
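The "0d 0a 00" runs Luc describes can be reproduced with a minimal Python sketch. It rests on a simplified assumption: that the native end-of-line translation on a Windows checkout behaves like a byte-oriented LF to CRLF filter, rewriting every 0x0a byte as 0x0d 0x0a without knowing the file is UTF-16. The function names and sample text are hypothetical, and this is not Subversion's actual translation code.

```python
import codecs

def simulate_bytewise_crlf(data: bytes) -> bytes:
    """Simplified model (an assumption, not svn's real code): rewrite every
    0x0a byte as 0x0d 0x0a, as a byte-oriented LF -> CRLF filter would."""
    return data.replace(b"\x0a", b"\x0d\x0a")

def count_mangled_newlines(data: bytes) -> int:
    """Count the 3-byte run 0d 0a 00 that a UTF-16LE newline (0a 00)
    turns into after byte-wise translation."""
    return data.count(b"\x0d\x0a\x00")

# Hypothetical sample lines, not taken from the real stemsUnicode.txt.
text = "стем\nслово\n"
# BOM ff fe followed by UTF-16LE code units, like the files in the thread.
original = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
mangled = simulate_bytewise_crlf(original)

print(len(original), len(mangled))      # 24 26: one extra byte per newline
print(count_mangled_newlines(mangled))  # 2
# Each inserted 0x0d shifts the 2-byte alignment, so everything after the
# first newline decodes into the wrong characters.
print(mangled.decode("utf-16") == text)  # False
```

Under this model each line ending adds a single byte, which breaks the two-byte alignment of every code unit that follows. With svn:eol-style set to LF, a Windows checkout no longer expands 0x0a into 0x0d 0x0a, which is consistent with the file sizes matching after the update.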