[ https://issues.apache.org/jira/browse/LUCENE-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-1756:
--------------------------------
Attachment: LUCENE-1756.patch
improved unit test for this analyzer
> contrib/memory: PatternAnalyzerTest is a very, very, VERY, bad unit test
> ------------------------------------------------------------------------
>
> Key: LUCENE-1756
> URL: https://issues.apache.org/jira/browse/LUCENE-1756
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Reporter: Hoss Man
> Priority: Minor
> Attachments: LUCENE-1756.patch
>
>
> While working on something else, I started getting consistent
> IllegalStateExceptions from PatternAnalyzerTest -- but only when running the
> test from the top level.
> Digging into the test, I've found numerous things that are very scary...
> * instead of using assertions to test that token streams match, it throws an
> IllegalStateException when they don't, and then logs a bunch of info about
> the token streams to System.out -- having assertion messages that tell you
> *exactly* what doesn't match would make a lot more sense.
> * it builds up a list of files to analyze using paths that it evaluates
> relative to the current working directory -- which means you get different
> files depending on whether you run the tests from the contrib level, or from
> the top level build file
> * the list of files it looks for includes: "../../*.txt", "../../*.html",
> "../../*.xml" ... so not only do you get different results when you run the
> tests in the contrib vs at the top level, but different people running the
> tests via the top level build file will get different results depending on
> what types of text, html, and xml files they happen to have two directories
> above where they checked out lucene.
> * the test comments indicate that its purpose is to show that
> PatternAnalyzer produces the same tokens as other analyzers - but point out
> this will fail for WhitespaceAnalyzer because of the 255 character token
> limit WhitespaceTokenizer imposes -- the test then proceeds to compare
> PatternAnalyzer to WhitespaceTokenizer, guaranteeing a test failure for anyone
> who happens to have a text file containing more than 255 characters of
> non-whitespace in a row somewhere in "../../" (in my case: my bookmarks.html
> file, and the hex encoded favicon.gif images)
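
The first two points in the description (assertion messages that say exactly what differs, and input that does not depend on the working directory) can be sketched roughly as follows. This is a minimal illustration in plain Java, not the contents of the attached patch and not Lucene API: the class TokenAssertSketch and the helpers firstMismatch, assertTokensEqual and splitOnWhitespace are hypothetical names, and the whitespace split is only a stand-in for real analyzer output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: compare two token lists and report the first difference.
final class TokenAssertSketch {

    /** Returns null when the lists match, otherwise a message naming the first difference. */
    static String firstMismatch(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return "token " + i + ": expected <" + expected.get(i)
                        + "> but was <" + actual.get(i) + ">";
            }
        }
        if (expected.size() != actual.size()) {
            return "token count: expected " + expected.size()
                    + " but was " + actual.size();
        }
        return null;
    }

    /** Fails with an AssertionError carrying the mismatch message, instead of logging to System.out. */
    static void assertTokensEqual(List<String> expected, List<String> actual) {
        String message = firstMismatch(expected, actual);
        if (message != null) {
            throw new AssertionError(message);
        }
    }

    /** Stand-in for analyzer output (an assumption for this sketch): split on runs of whitespace. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Fixed inline input: the result does not depend on the working directory.
        List<String> expected = Arrays.asList("quick", "brown", "fox");
        assertTokensEqual(expected, splitOnWhitespace("quick  brown\tfox"));

        try {
            assertTokensEqual(expected, splitOnWhitespace("quick red fox"));
        } catch (AssertionError e) {
            // prints: token 1: expected <brown> but was <red>
            System.out.println(e.getMessage());
        }
    }
}
```

In a real test the inline strings would be replaced by a small, fixed set of inputs checked in next to the test, so every checkout sees the same data.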