date:20080403

Hudson build is back to normal: Lucene-trunk #425

2008-04-03 Thread Apache Hudson Server

See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/425/changes



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store

2008-04-03 Thread Cuong Hoang (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12585368#action_12585368 ]

Cuong Hoang commented on LUCENE-1039:
-

>>Each document must only contain one token in the class field

Does that mean each document in the training set can only belong to one class? 

I tried to run the test case but got a NullPointerException:

TestClassifier
org.apache.lucene.classifier.TestClassifier
test(org.apache.lucene.classifier.TestClassifier)
java.lang.NullPointerException
    at org.apache.lucene.index.MultiTermDocs.doc(MultiReader.java:356)
    at org.apache.lucene.classifier.BayesianClassifier.classFeatureFrequency(BayesianClassifier.java:92)
    at org.apache.lucene.classifier.BayesianClassifier.weightedFeatureClassProbability(BayesianClassifier.java:137)
    at org.apache.lucene.classifier.NaiveBayesClassifier.featuresClassProbability(NaiveBayesClassifier.java:54)
    at org.apache.lucene.classifier.NaiveBayesClassifier.classify(NaiveBayesClassifier.java:72)
    at org.apache.lucene.classifier.BayesianClassifier.classify(BayesianClassifier.java:70)
    at org.apache.lucene.classifier.BayesianClassifier.classify(BayesianClassifier.java:62)
    at org.apache.lucene.classifier.TestClassifier.testClassifier(TestClassifier.java:110)
    at org.apache.lucene.classifier.TestClassifier.test(TestClassifier.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
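
The method names in the trace (featuresClassProbability, weightedFeatureClassProbability, classFeatureFrequency) suggest the patch follows the Naive Bayes scoring from Segaran's book. As a rough sketch of that computation alone, here is a hypothetical in-memory version: counts live in HashMaps instead of a Lucene index, every class and method name is made up rather than taken from the patch, and each training document is assumed to carry exactly one class label. The weighting toward an assumed probability of 0.5 with weight 1.0 follows the defaults Segaran uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of Segaran-style Naive Bayes scoring.
// The LUCENE-1039 patch keeps its counts in a Lucene index; this keeps them in HashMaps.
class NaiveBayesSketch {
    // feature -> (class -> number of training documents in that class containing the feature)
    private final Map<String, Map<String, Integer>> featureClassCounts = new HashMap<>();
    // class -> number of training documents labelled with it
    private final Map<String, Integer> classCounts = new HashMap<>();
    private int totalDocs = 0;

    // Assumption of this sketch: each training document has exactly one class label.
    void train(List<String> features, String docClass) {
        classCounts.merge(docClass, 1, Integer::sum);
        totalDocs++;
        for (String f : new HashSet<>(features)) { // count a feature once per document
            featureClassCounts.computeIfAbsent(f, k -> new HashMap<>()).merge(docClass, 1, Integer::sum);
        }
    }

    private int featureClassCount(String feature, String c) {
        Map<String, Integer> byClass = featureClassCounts.get(feature);
        return byClass == null ? 0 : byClass.getOrDefault(c, 0);
    }

    // Fraction of class-c documents that contain the feature.
    private double featureProbability(String feature, String c) {
        int docsInClass = classCounts.getOrDefault(c, 0);
        return docsInClass == 0 ? 0.0 : (double) featureClassCount(feature, c) / docsInClass;
    }

    // Pull rarely seen features toward an assumed probability of 0.5 (weight 1.0).
    private double weightedProbability(String feature, String c) {
        double weight = 1.0, assumed = 0.5;
        int seen = 0;
        for (String other : classCounts.keySet()) seen += featureClassCount(feature, other);
        return (weight * assumed + seen * featureProbability(feature, c)) / (weight + seen);
    }

    // P(c) times the product of weighted P(feature | c); proportional to P(c | document).
    double score(List<String> features, String c) {
        double p = (double) classCounts.getOrDefault(c, 0) / totalDocs;
        for (String f : new HashSet<>(features)) p *= weightedProbability(f, c);
        return p;
    }

    // Return the highest-scoring class, or null if nothing has been trained yet.
    String classify(List<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : classCounts.keySet()) {
            double s = score(features, c);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train(Arrays.asList("cheap", "pills", "buy"), "spam");
        nb.train(Arrays.asList("cheap", "offer", "buy"), "spam");
        nb.train(Arrays.asList("lucene", "index", "patch"), "ham");
        nb.train(Arrays.asList("lucene", "search", "buy"), "ham");
        System.out.println(nb.classify(Arrays.asList("cheap", "pills")));  // spam
        System.out.println(nb.classify(Arrays.asList("lucene", "index"))); // ham
    }
}
```

The weighting keeps a feature seen only once from driving a class probability to 0 or 1 on its own; the Fisher method mentioned in the issue combines the per-feature probabilities differently and is not shown here.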
 

> Bayesian classifiers using Lucene as data store
> ---
>
> Key: LUCENE-1039
> URL: https://issues.apache.org/jira/browse/LUCENE-1039
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Karl Wettin
>Priority: Minor
> Attachments: LUCENE-1039.txt
>
>
> Bayesian classifiers using Lucene as data store. Based on the Naive Bayes and 
> Fisher method algorithms as described by Toby Segaran in "Programming 
> Collective Intelligence", ISBN 978-0-596-52932-1. 
> Have fun.
> Poor Javadocs, but the TestCase shows how to use it:
> {code:java}
> public class TestClassifier extends TestCase {
>   public void test() throws Exception {
>     InstanceFactory instanceFactory = new InstanceFactory() {
>       public Document factory(String text, String _class) {
>         Document doc = new Document();
>         doc.add(new Field("class", _class, Field.Store.YES, Field.Index.NO_NORMS));
>         doc.add(new Field("text", text, Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
>         doc.add(new Field("text/ngrams/start", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/inner", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         doc.add(new Field("text/ngrams/end", text, Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
>         return doc;
>       }
>       Analyzer analyzer = new Analyzer() {
>         private int minGram = 2;
>         private int maxGram = 3;
>         public TokenStream tokenStream(String fieldName, Reader reader) {
>           TokenStream ts = new StandardTokenizer(reader);
>           ts = new LowerCaseFilter(ts);
>           if (fieldName.endsWith("/ngrams/start")) {
>             ts = new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, minGram, maxGram);
>           } else if (fieldName.endsWith("/ngrams/inner")) {
>             ts = new NGramTo

[jira] Updated: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2008-04-03 Thread Mark Miller (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1026:


Attachment: IndexAccessor.04032008.zip

Various tiny tweaks, and it now provides indexes based on a Directory rather than 
a File. This lets you use any Directory implementation rather than just FSDirectory.

SVN at: http://myhardshadow.com/indexaccessor/trunk/

- Mark

> Provide a simple way to concurrently access a Lucene index from multiple 
> threads
> 
>
> Key: LUCENE-1026
> URL: https://issues.apache.org/jira/browse/LUCENE-1026
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index, Search
>Reporter: Mark Miller
>Priority: Minor
> Attachments: DefaultIndexAccessor.java, 
> DefaultMultiIndexAccessor.java, IndexAccessor-02.04.2008.zip, 
> IndexAccessor-02.07.2008.zip, IndexAccessor-02.28.2008.zip, 
> IndexAccessor-1.26.2008.zip, IndexAccessor-2.15.2008.zip, 
> IndexAccessor.04032008.zip, IndexAccessor.java, IndexAccessor.zip, 
> IndexAccessorFactory.java, MultiIndexAccessor.java, shai-IndexAccessor-2.zip, 
> shai-IndexAccessor.zip, shai-IndexAccessor3.zip, SimpleSearchServer.java, 
> StopWatch.java, TestIndexAccessor.java
>
>
> For building interactive indexes accessed through a network/internet 
> (multiple threads).
> This builds upon the LuceneIndexAccessor patch. That patch was not very 
> newbie friendly and did not properly handle MultiSearchers (or at the least 
> made it easy to get into trouble).
> This patch simplifies things and provides out of the box support for sharing 
> the IndexAccessors across threads. There is also a simple test class and 
> example SearchServer to get you started.
> Future revisions will be zipped.
> Works pretty solidly as is, but could use the ability to warm new Searchers.
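
Reference counting is one common way to let many threads share a searcher and swap in a new one without closing it under a running query. The following is a hypothetical, Lucene-free sketch of that technique only, under the assumptions that the shared resource is any AutoCloseable and that the caller decides when to reopen. It is not the IndexAccessor API from the attached zips; RefCountedAccessor, Handle, acquire and reopen are made-up names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch of reference counting for a shared, reopenable resource such as a
// searcher. It illustrates the general technique only; it is not the IndexAccessor API.
class RefCountedAccessor<T extends AutoCloseable> {

    // A resource plus the number of outstanding users; the accessor itself counts as one.
    private static final class Ref<R extends AutoCloseable> {
        final R resource;
        final AtomicInteger users = new AtomicInteger(1);

        Ref(R resource) { this.resource = resource; }

        // Close the resource only when its last user, including the accessor, lets go.
        void release() {
            if (users.decrementAndGet() == 0) {
                try {
                    resource.close();
                } catch (Exception e) {
                    throw new IllegalStateException("failed to close resource", e);
                }
            }
        }
    }

    // What a thread holds while it uses the resource; close() hands it back.
    final class Handle implements AutoCloseable {
        private final Ref<T> ref;

        private Handle(Ref<T> ref) { this.ref = ref; }

        T get() { return ref.resource; }

        @Override
        public void close() { ref.release(); }
    }

    private final Supplier<T> opener;
    private Ref<T> current;

    RefCountedAccessor(Supplier<T> opener) {
        this.opener = opener;
        this.current = new Ref<>(opener.get());
    }

    // Borrow the current resource; meant to be used with try-with-resources.
    synchronized Handle acquire() {
        current.users.incrementAndGet();
        return new Handle(current);
    }

    // Swap in a freshly opened resource (e.g. after a commit). The old one stays open
    // until every thread that acquired it has closed its Handle.
    synchronized void reopen() {
        Ref<T> old = current;
        current = new Ref<>(opener.get());
        old.release();
    }
}
```

A worker thread would typically write `try (RefCountedAccessor<S>.Handle h = accessor.acquire()) { ... h.get() ... }`, so the handle is returned even if the search throws. Warming a new searcher, which the description mentions as missing, would fit inside reopen() before the swap.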

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (LUCENE-1258) Increment position by default in StopFilter & QueryParser -> PhraseQuery

2008-04-03 Thread Michael McCandless (JIRA)

Increment position by default in StopFilter & QueryParser -> PhraseQuery


 Key: LUCENE-1258
 URL: https://issues.apache.org/jira/browse/LUCENE-1258
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.3.1, 2.3, 2.2, 2.1, 2.0.0, 1.9, 2.3.2, 2.4, 2.9
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.0


Spinoff from here:

  https://issues.apache.org/jira/browse/LUCENE-1095

I think for 3.0 we should change the default so that:

  * By default, StopFilter increments the positionIncrement whenever
it skips stop words.  Add an option to revert to the old behavior.  This is
just toggling the boolean default.

  * By default, when QueryParser adds terms to a PhraseQuery it should
include the position reported by the analyzer.  Add an option to
revert to the old behavior.

I'm just opening this now, marking as 3.0 fix, to remind us all to
actually fix it for 3.0.
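
The two proposed defaults can be modeled without Lucene at all. This hypothetical sketch uses its own Token class rather than Lucene's StopFilter, Token or QueryParser, and treats each token as a term plus a position increment. When increments are enabled, the increments of dropped stop words are carried over to the next kept token; phrase positions are then the running sum of increments, assumed here to start from -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the two LUCENE-1258 defaults; it does not use Lucene's classes.
class PositionIncrementSketch {
    // A term plus its position increment relative to the previous token.
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Drop stop words; when enabled, carry their increments over to the next kept token.
    static List<Token> stopFilter(List<Token> in, Set<String> stopWords, boolean enablePositionIncrements) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in) {
            if (stopWords.contains(t.term)) {
                skipped += t.posInc;
            } else {
                int inc = enablePositionIncrements ? t.posInc + skipped : t.posInc;
                out.add(new Token(t.term, inc));
                skipped = 0;
            }
        }
        return out;
    }

    // Absolute positions a phrase would use: running sum of increments starting from -1.
    static List<Integer> phrasePositions(List<Token> tokens) {
        List<Integer> positions = new ArrayList<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            positions.add(pos);
        }
        return positions;
    }

    // Whitespace "analyzer": every word gets increment 1.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.split(" ")) tokens.add(new Token(word, 1));
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("of", "the"));
        List<Token> tokens = tokenize("out of the box");
        System.out.println(phrasePositions(stopFilter(tokens, stop, false))); // [0, 1]
        System.out.println(phrasePositions(stopFilter(tokens, stop, true)));  // [0, 3]
    }
}
```

With increments enabled, "out of the box" yields positions [0, 3] instead of [0, 1], so a phrase built from those positions keeps the gap left by the removed stop words.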




[jira] Resolved: (LUCENE-1255) CheckIndex should allow term position = -1

2008-04-03 Thread Michael McCandless (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1255.


Resolution: Fixed

> CheckIndex should allow term position = -1
> --
>
> Key: LUCENE-1255
> URL: https://issues.apache.org/jira/browse/LUCENE-1255
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.4
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3.2, 2.4
>
> Attachments: LUCENE-1255.patch, LUCENE-1255.take2.patch
>
>
> Spinoff from this discussion:
> 
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200803.mbox/[EMAIL PROTECTED]
> Right now CheckIndex claims the index is corrupt if you index a Token with position -1, 
> which happens if your first token has its positionIncrement set to 0.
> But, as far as I can tell, Lucene doesn't "mind" when this happens.
> So I plan to fix CheckIndex to allow this case.  I'll backport to 2.3.2 as 
> well.
> LUCENE-1253 is one example where Lucene's core analyzers could do this.
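
To make the -1 case concrete, here is a hypothetical model that assumes, as the description implies, that positions start at -1 and each token adds its position increment. It is not CheckIndex's code; it only contrasts a rule that rejects every negative position with one that also accepts -1.

```java
import java.util.Arrays;

// Hypothetical model of the LUCENE-1255 case; it is not CheckIndex's code.
// Assumption: positions start at -1 and each token adds its position increment,
// so a first token with increment 0 lands on position -1.
class PositionCheckSketch {
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            out[i] = pos;
        }
        return out;
    }

    // Old-style rule: any negative position is reported as corruption.
    static boolean strictCheck(int[] positions) { return allAtLeast(positions, 0); }

    // Relaxed rule in the spirit of the fix: -1 is accepted, anything lower is not.
    static boolean relaxedCheck(int[] positions) { return allAtLeast(positions, -1); }

    private static boolean allAtLeast(int[] positions, int min) {
        for (int p : positions) {
            if (p < min) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] pos = positions(new int[] {0, 1, 1}); // first token has increment 0
        System.out.println(Arrays.toString(pos));   // [-1, 0, 1]
        System.out.println(strictCheck(pos));       // false
        System.out.println(relaxedCheck(pos));      // true
    }
}
```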


[jira] Resolved: (LUCENE-1241) 0xffff char is not a string terminator

2008-04-03 Thread Michael McCandless (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1241.


   Resolution: Won't Fix
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

> 0xffff char is not a string terminator
> --
>
> Key: LUCENE-1241
> URL: https://issues.apache.org/jira/browse/LUCENE-1241
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Hiroaki Kawai
>Assignee: Michael McCandless
> Attachments: ComparableCharSequence.java, LUCENE-1241.patch, 
> LUCENE-1241.take2.patch
>
>
> Current trunk index.DocumentWriter uses "\uffff" as a string terminator, but 
> it should not, for several reasons. \uffff is not itself a terminator char, 
> and we can't handle a string that really contains \uffff. Also, we can 
> calculate the end char position in a character sequence from the string 
> length that we already know.
> However, I agree with using it for assertions, where "\uffff" is placed 
> at the end of a string in a char sequence.
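
A small hypothetical illustration of the point, in plain Java rather than DocumentWriter's actual buffer layout: scanning for a U+FFFF sentinel to find where a term ends gives the wrong answer as soon as the term itself contains U+FFFF, whereas the known length is always right. The sentinel can still serve as a cross-check, as the last paragraph suggests. SentinelSketch and its methods are made-up names.

```java
// Hypothetical illustration for LUCENE-1241; not DocumentWriter's actual buffer code.
class SentinelSketch {
    static final char TERMINATOR = '\uffff';

    // Copy a term into a buffer and append the sentinel, as a terminator-based layout would.
    static char[] withSentinel(String term) {
        char[] buf = new char[term.length() + 1];
        term.getChars(0, term.length(), buf, 0);
        buf[term.length()] = TERMINATOR;
        return buf;
    }

    // Find the term's length by scanning for the sentinel.
    static int lengthBySentinel(char[] buf, int start) {
        int i = start;
        while (buf[i] != TERMINATOR) i++;
        return i - start;
    }

    public static void main(String[] args) {
        String term = "ab\uffffcd";                   // a term that really contains U+FFFF
        char[] buf = withSentinel(term);
        System.out.println(lengthBySentinel(buf, 0)); // 2: stops at the embedded char
        System.out.println(term.length());            // 5: the length we already know
        // The sentinel still works as a cross-check that the stored length is right.
        System.out.println(buf[term.length()] == TERMINATOR); // true
    }
}
```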


Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-03 Thread Mark Miller


> - replacement of indexed for loops with for each constructs

Is this always the best idea? Doesn't the for-each construct create an
Iterator, which can be much slower than an indexed for loop?
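
For what it's worth, the Java Language Specification (section 14.14.2) defines the enhanced for statement by two expansions: over an array it is equivalent to an indexed loop and creates no Iterator; over an Iterable it calls iterator(). So the concern applies to collections, not arrays. The hand-written expansions below (in a hypothetical ForEachSketch class) make that concrete; whether the Iterator is measurably slower in a given loop is a benchmarking question this sketch does not answer.

```java
import java.util.Iterator;
import java.util.List;

// What the enhanced for statement expands to (JLS 14.14.2), written out by hand.
class ForEachSketch {
    static int sumArrayForEach(int[] values) {
        int sum = 0;
        for (int v : values) sum += v;  // over an array: equivalent to the indexed loop below
        return sum;
    }

    static int sumArrayIndexed(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) sum += values[i];
        return sum;
    }

    static int sumListForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;  // over an Iterable: calls values.iterator()
        return sum;
    }

    static int sumListIterator(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) sum += it.next();
        return sum;
    }

    static int sumListIndexed(List<Integer> values) {
        int sum = 0;
        // No Iterator here, but get(i) is only cheap for RandomAccess lists such as ArrayList.
        for (int i = 0; i < values.size(); i++) sum += values.get(i);
        return sum;
    }
}
```

The indexed get(i) form avoids the Iterator but, on a LinkedList, each get(i) walks the list, so replacing for-each with indexed loops only pays off for arrays and RandomAccess lists.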





Job Offer in Barcelona! Spain's most exciting start-up

2008-04-03 Thread DaveBart


Description

Trovit.com, a leading classified-ads website in Europe based in Barcelona,
is looking to hire a senior Java programmer. Trovit is experiencing a
growth rate of 25% per month (really!). We don't want to be local. We
are a global company with a presence in 11 countries (and growing), and by the end
of the year we plan to be in more than 15 countries.

We have more than 5.5 million users and more than 21 million page views.
Our Alexa rank is (now) approx. 6k (we are a year and a half old)... and we
want to be #1 in the world (not joking!)


We are looking for somebody with knowledge in these areas:

- Fluent English and Spanish (desirable ;) )

- Strong Java experience

- Computer science degree

- Unix/Linux experience

Desired Experience

- Experience with high-performance, highly scalable systems
- Experience with Lucene
- Knowledge of Hadoop


We offer a great opportunity in Barcelona, Spain, so if you want to join our
team, please send us an e-mail with your resume to jobs @ trovit.com,
referencing this post.
-- 
View this message in context: 
http://www.nabble.com/Job-Offer-in-Barcelona%21-Spanish-most-exciting-start-up-tp16467354p16467354.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

