Hudson build is back to normal: Lucene-trunk #425
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/425/changes - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store
[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585368#action_12585368 ]

Cuong Hoang commented on LUCENE-1039:
-

>>Each document must only contain one token in the class field

Does that mean each document in the training set can only belong to one class?

I tried to run the test case but got a NullPointerException:

TestClassifier
org.apache.lucene.classifier.TestClassifier
test(org.apache.lucene.classifier.TestClassifier)
java.lang.NullPointerException
    at org.apache.lucene.index.MultiTermDocs.doc(MultiReader.java:356)
    at org.apache.lucene.classifier.BayesianClassifier.classFeatureFrequency(BayesianClassifier.java:92)
    at org.apache.lucene.classifier.BayesianClassifier.weightedFeatureClassProbability(BayesianClassifier.java:137)
    at org.apache.lucene.classifier.NaiveBayesClassifier.featuresClassProbability(NaiveBayesClassifier.java:54)
    at org.apache.lucene.classifier.NaiveBayesClassifier.classify(NaiveBayesClassifier.java:72)
    at org.apache.lucene.classifier.BayesianClassifier.classify(BayesianClassifier.java:70)
    at org.apache.lucene.classifier.BayesianClassifier.classify(BayesianClassifier.java:62)
    at org.apache.lucene.classifier.TestClassifier.testClassifier(TestClassifier.java:110)
    at org.apache.lucene.classifier.TestClassifier.test(TestClassifier.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

> Bayesian classifiers using Lucene as data store
> ---
>
> Key: LUCENE-1039
> URL: https://issues.apache.org/jira/browse/LUCENE-1039
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: LUCENE-1039.txt
>
>
> Bayesian classifiers using Lucene as data store. Based on the Naive Bayes and
> Fisher method algorithms as described by Toby Segaran in "Programming
> Collective Intelligence", ISBN 978-0-596-52932-1.
> Have fun.
> Poor java docs, but the TestCase shows how to use it:
> {code:java}
> public class TestClassifier extends TestCase {
>   public void test() throws Exception {
>     InstanceFactory instanceFactory = new InstanceFactory() {
>       public Document factory(String text, String _class) {
>         Document doc = new Document();
>         doc.add(new Field("class", _class, Field.Store.YES, Field.Index.NO_NORMS));
>         doc.add(new Field("text", text, Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
>         doc.add(new Field("text/ngrams/start", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/inner", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/end", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         return doc;
>       }
>     };
>     Analyzer analyzer = new Analyzer() {
>       private int minGram = 2;
>       private int maxGram = 3;
>       public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream ts = new StandardTokenizer(reader);
>         ts = new LowerCaseFilter(ts);
>         if (fieldName.endsWith("/ngrams/start")) {
>           ts = new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, minGram, maxGram);
>         } else if (fieldName.endsWith("/ngrams/inner")) {
>           ts = new NGramTo
[jira] Updated: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads
[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1026:

    Attachment: IndexAccessor.04032008.zip

Various tiny tweaks and now provides indexes based on Directory rather than File. This lets you use any Directory implementation rather than just FSDirectory.

SVN at: http://myhardshadow.com/indexaccessor/trunk/

- Mark

> Provide a simple way to concurrently access a Lucene index from multiple threads
>
>
> Key: LUCENE-1026
> URL: https://issues.apache.org/jira/browse/LUCENE-1026
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index, Search
> Reporter: Mark Miller
> Priority: Minor
> Attachments: DefaultIndexAccessor.java, DefaultMultiIndexAccessor.java,
>   IndexAccessor-02.04.2008.zip, IndexAccessor-02.07.2008.zip,
>   IndexAccessor-02.28.2008.zip, IndexAccessor-1.26.2008.zip,
>   IndexAccessor-2.15.2008.zip, IndexAccessor.04032008.zip,
>   IndexAccessor.java, IndexAccessor.zip, IndexAccessorFactory.java,
>   MultiIndexAccessor.java, shai-IndexAccessor-2.zip, shai-IndexAccessor.zip,
>   shai-IndexAccessor3.zip, SimpleSearchServer.java, StopWatch.java,
>   TestIndexAccessor.java
>
>
> For building interactive indexes accessed through a network/internet (multiple threads).
> This builds upon the LuceneIndexAccessor patch. That patch was not very newbie friendly and did not properly handle MultiSearchers (or at the least made it easy to get into trouble).
> This patch simplifies things and provides out of the box support for sharing the IndexAccessors across threads. There is also a simple test class and example SearchServer to get you started.
> Future revisions will be zipped.
> Works pretty solid as is, but could use the ability to warm new Searchers.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENE-1258) Increment position by default in StopFilter & QueryParser -> PhraseQuery
Increment position by default in StopFilter & QueryParser -> PhraseQuery

    Key: LUCENE-1258
    URL: https://issues.apache.org/jira/browse/LUCENE-1258
    Project: Lucene - Java
    Issue Type: Improvement
    Components: Analysis
    Affects Versions: 2.3.1, 2.3, 2.2, 2.1, 2.0.0, 1.9, 2.3.2, 2.4, 2.9
    Reporter: Michael McCandless
    Priority: Minor
    Fix For: 3.0

Spinoff from here:

    https://issues.apache.org/jira/browse/LUCENE-1095

I think for 3.0 we should change the default so that:

  * By default, StopFilter increments the positionIncrement whenever it skips stop words. Add option to revert back to old way. This is just toggling the boolean default.

  * By default, when QueryParser adds terms to a PhraseQuery it should include the position reported by the analyzer. Add option to revert back to old way.

I'm just opening this now, marking as 3.0 fix, to remind us all to actually fix it for 3.0.
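To make the first bullet concrete, here is a minimal self-contained sketch in plain Java (it does not use Lucene at all) of what "incrementing the position when stop words are skipped" means. The class StopPositionsDemo, the Tok type, and both method names are hypothetical and exist only for this illustration; the sketch also assumes a token's position is the running sum of position increments minus one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene's StopFilter.
final class StopPositionsDemo {

    // A surviving token together with its position increment.
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Drops stop words. When enablePositionIncrements is true, the number of
    // skipped stop words is added to the increment of the next kept token;
    // when false, every kept token gets an increment of 1 (the "old way").
    static List<Tok> filterStopWords(List<String> words, Set<String> stopWords,
                                     boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<Tok>();
        int skipped = 0;
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            out.add(new Tok(word, enablePositionIncrements ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }

    // Assumed rule: position = running sum of increments, minus one.
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> result = new ArrayList<Integer>();
        int position = -1;
        for (Tok tok : tokens) {
            position += tok.posInc;
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "quick", "fox", "of", "doom");
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "of"));
        // Gaps are preserved where stop words were removed.
        System.out.println(positions(filterStopWords(words, stop, true)));
        // Old behaviour: kept tokens are packed next to each other.
        System.out.println(positions(filterStopWords(words, stop, false)));
    }
}
```

With increments enabled, "fox" and "doom" end up two positions apart, so an exact phrase query for "fox doom" would only match if the query side also carries the positions reported by the analyzer, which is what the second bullet asks for.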
[jira] Resolved: (LUCENE-1255) CheckIndex should allow term position = -1
[ https://issues.apache.org/jira/browse/LUCENE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1255.

    Resolution: Fixed

> CheckIndex should allow term position = -1
> --
>
> Key: LUCENE-1255
> URL: https://issues.apache.org/jira/browse/LUCENE-1255
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1255.patch, LUCENE-1255.take2.patch
>
>
> Spinoff from this discussion:
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200803.mbox/[EMAIL PROTECTED]
> Right now CheckIndex claims the index is corrupt if you index a Token with -1 position, which happens if your first token has positionIncrementGap set to 0.
> But, as far as I can tell, Lucene doesn't "mind" when this happens.
> So I plan to fix CheckIndex to allow this case. I'll backport to 2.3.2 as well.
> LUCENE-1253 is one example where Lucene's core analyzers could do this.
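A small plain-Java sketch of the arithmetic behind the -1 position may help. It assumes, as my reading of the description rather than something stated in it, that the indexer assigns each token the running sum of position increments minus one, so a first token with an increment of 0 lands on -1. The class PositionCheckDemo and its methods are hypothetical and are not the CheckIndex code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's CheckIndex.
final class PositionCheckDemo {

    // Assumed rule: position = running sum of increments, minus one.
    static List<Integer> assignPositions(int[] increments) {
        List<Integer> positions = new ArrayList<Integer>();
        int position = -1;
        for (int inc : increments) {
            position += inc;
            positions.add(position);
        }
        return positions;
    }

    // Returns true if every position is at least minAllowed.
    // minAllowed = 0 models a strict check that rejects -1;
    // minAllowed = -1 models a lenient check that tolerates it.
    static boolean positionsValid(List<Integer> positions, int minAllowed) {
        for (int p : positions) {
            if (p < minAllowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First token has increment 0, the rest have the usual increment of 1.
        List<Integer> positions = assignPositions(new int[] {0, 1, 1});
        System.out.println(positions);
        System.out.println("strict check passes:  " + positionsValid(positions, 0));
        System.out.println("lenient check passes: " + positionsValid(positions, -1));
    }
}
```

Under that assumption the strict variant flags a perfectly readable token stream as invalid, which matches the complaint in the issue; the lenient variant accepts it while still rejecting anything below -1.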
[jira] Resolved: (LUCENE-1241) 0xffff char is not a string terminator
[ https://issues.apache.org/jira/browse/LUCENE-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1241.

    Resolution: Won't Fix
    Lucene Fields: [New, Patch Available] (was: [Patch Available, New])

> 0xffff char is not a string terminator
> --
>
> Key: LUCENE-1241
> URL: https://issues.apache.org/jira/browse/LUCENE-1241
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Hiroaki Kawai
> Assignee: Michael McCandless
> Attachments: ComparableCharSequence.java, LUCENE-1241.patch, LUCENE-1241.take2.patch
>
>
> Current trunk index.DocumentWriter uses "\uffff" as a string terminator, but it should not, for several reasons. \uffff is not a terminator char itself, and we can't handle a string that really contains \uffff. Also, we can calculate the end char position in a character sequence from the string length that we already know.
> However, I agree with its use for assertions, where "\uffff" is placed at the end of a string in a char sequence.
Re: [jira] Created: (LUCENE-1257) Port to Java5
> - replacement of indexed for loops with for each constructs

Is this always the best idea? Doesn't the for-each construct create an iterator, which can be much slower than an indexed for loop?
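For reference, a minimal sketch of the cases in question. Per the Java Language Specification, a for-each loop over an array is expanded by the compiler into an index-based loop, so no iterator is involved; only a for-each over an Iterable (such as a List) goes through iterator(). The class and method names below are hypothetical, and the sketch only shows that the forms are equivalent in result; whether the iterator form is measurably slower depends on the collection and the JIT, so it would need benchmarking.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of the loop forms being discussed.
final class ForEachDemo {

    // For-each over an array: compiled to an index-based loop, no Iterator.
    static int sumArrayForEach(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // For-each over a List: shorthand for the explicit Iterator loop below.
    static int sumListForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // What the compiler effectively generates for sumListForEach.
    static int sumListIterator(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    // Indexed access; cheap for ArrayList, but O(n) per get() on a LinkedList.
    static int sumListIndexed(List<Integer> values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3};
        List<Integer> list = new ArrayList<Integer>();
        for (int v : array) {
            list.add(v);
        }
        System.out.println(sumArrayForEach(array));
        System.out.println(sumListForEach(list));
        System.out.println(sumListIterator(list));
        System.out.println(sumListIndexed(list));
    }
}
```

So for array-backed hot loops the conversion should cost nothing, while loops over List instances are the ones worth checking case by case.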
Job Offer in Barcelona! Spanish most exciting start-up
Description

Trovit.com, a leading European classified ads website based in Barcelona, is looking to hire a senior Java programmer.

Trovit is growing rapidly, at a rate of 25% per month (real!). We don't want to be local. We are a global company with a presence in 11 countries (and growing), and by the end of the year we plan to be in more than 15 countries. We have more than 5.5 million users and more than 21 million page views. Our Alexa rank is (now) approx. 6k (we are a year and a half old).

...and we want to be #1 in the world (not joking!)

We are looking for somebody with knowledge in these areas:
- Fluent English and Spanish (this is a desire ;) )
- Strong Java experience
- Computer science degree
- Unix/Linux experience

Desired Experience
- Experience in high performance and high scalability systems
- Experience with Lucene
- Knowledge about Hadoop

We offer a great opportunity in Barcelona, Spain, so if you want to join our team, send us an e-mail with your resume to jobs @ trovit.com, referencing this post, please.

--
View this message in context: http://www.nabble.com/Job-Offer-in-Barcelona%21-Spanish-most-exciting-start-up-tp16467354p16467354.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.