[jira] [Assigned] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned SOLR-2762:
-

Assignee: Dawid Weiss

> FSTLookup returns one less suggestion than it should when onlyMorePopular=true
> --
>
> Key: SOLR-2762
> URL: https://issues.apache.org/jira/browse/SOLR-2762
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
>
> I'm using the Suggester.  When I switched from TSTLookup to FSTLookup, I 
> noticed that it returned one fewer suggestion than what I asked for. I have 
> spellcheck.onlyMorePopular=true; when I set it to false, I see the correct 
> count. Another aspect of the bug is that this off-by-one bug only seems to 
> occur when my suggestion has an exact match.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104313#comment-13104313
 ] 

Dawid Weiss commented on SOLR-2762:
---

David, can you add a test case for this? I'll be happy to fix it, probably 
something trivial, but a failing test case would be a great start.





[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104384#comment-13104384
 ] 

Dawid Weiss commented on LUCENE-3429:
-

bq. the correct statement is that stop would not stop a thread that is waiting 
if interrupt would also not stop it

Ehm, too many negations for me, but I think you meant the other way around? 
Anyway, there's really little to it: stop() and interrupt() act similarly: 
both attempt to break the thread's execution by throwing an exception inside 
the thread's current call stack. The difference is that interrupt() sets a flag 
on the thread, which wait/sleep methods and I/O check and then throw as 
a checked exception, whereas stop() tries to throw an unchecked exception as 
early as possible, which theoretically can happen at any given statement.

In a piece of software that cleans up resources in finally blocks and doesn't 
catch and ignore Throwable/Error, this shouldn't really matter 
that much and should be safe.

Simon was worried about calling stop() and possibly leaving junk on disk or 
doing weird stuff. True, this can happen, but in the end it's what will happen 
anyway if a thread is stuck in an infinite busy loop or locked: either we will 
try to kill it or the JVM will at the end of its execution.

I will modify the code to use a more graceful cascade: interrupt(), wait a 
bit, then try to kill the thread, because I still think this has advantages over 
just leaving the problematic thread running in the background. The 
disadvantages of leaving it running are:

- the vm will never exit from tests if the threads are non-daemon threads,
- background threads may interfere with other threads and provide noise that 
will not be reproducible.

These are my motivating factors for using stop() as a last-resort option for 
threads that did go into an endless loop (or exceeded a largish timeout). 
Simon, I know you have a gut feeling that calling stop() is wrong, but you need 
to convince me with arguments other than just your gut feeling :)
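
The cascade described above can be sketched in plain Java. This is only an illustration, not the code from the attached patch: the class and method names are made up, and the final stop() step is left as a comment because Thread.stop() is deprecated and recent JDKs refuse to execute it. The sketch shows why interrupt() alone is enough for a thread blocked in sleep() but not for one that spins without checking its interrupt flag.

```java
final class GracefulStop {

    /**
     * Asks the thread to stop via interrupt() and waits up to graceMillis.
     * Returns true if the thread terminated within the grace period.
     */
    public static boolean interruptAndWait(Thread t, long graceMillis) {
        t.interrupt();                           // sets the flag; wakes wait/sleep/join
        try {
            t.join(graceMillis);                 // give finally blocks time to run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
        }
        return !t.isAlive();
    }

    /** A thread blocked in sleep(), so interrupt() reaches it. */
    public static Thread sleeper() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted while waiting: fall through and exit
            }
        }, "sleeper");
        t.setDaemon(true);
        return t;
    }

    /** Simulates a hung thread: spins for two seconds and never checks the interrupt flag. */
    public static Thread spinner() {
        Thread t = new Thread(() -> {
            long end = System.nanoTime() + 2_000_000_000L;
            while (System.nanoTime() < end) {
                // busy loop: never blocks, never calls Thread.interrupted()
            }
        }, "spinner");
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        Thread a = sleeper();
        a.start();
        System.out.println("sleeper stopped by interrupt: " + interruptAndWait(a, 1000));

        Thread b = spinner();
        b.start();
        System.out.println("spinner stopped by interrupt: " + interruptAndWait(b, 200));
        // Only at this point, as a last resort, would a harness escalate to stop().
    }
}
```

Both threads are daemons so that, in this sketch at least, a thread that survives the interrupt cannot keep the VM from exiting.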




> improve build system when tests hang
> 
>
> Key: LUCENE-3429
> URL: https://issues.apache.org/jira/browse/LUCENE-3429
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in Hudson, the build can stay hung for days until 
> we manually kill it.
> The problem is that when a hang happens it's probably serious. What we want 
> to do (I think) is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> separately, I think we should have an "ant-level" timeout for the whole 
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
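
The proposal in the description can be sketched without any test framework. Everything here is an assumption made for illustration: the property name tests.timeout.millis, the class name, and the choice to run the task on a worker thread while the caller waits (the description instead suggests a separate timer thread inside LuceneTestCase). The part that carries over either way is Thread.getAllStackTraces(), which gives the thread/stack information to dump when the timeout is exceeded.

```java
final class TestTimeoutSketch {

    /** Reads the timeout from a system property; 0 (the default) means "no timeout". */
    public static long timeoutMillis() {
        return Long.getLong("tests.timeout.millis", 0L);  // hypothetical property name
    }

    /** Formats the stack of every live thread. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (java.util.Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the task on a worker thread and waits for it.
     * Returns true if it finished in time; on timeout, dumps all stacks and returns false.
     */
    public static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "test-worker");
        worker.setDaemon(true);
        worker.start();
        try {
            worker.join(timeoutMillis);  // join(0) waits forever, matching "no timeout"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (worker.isAlive()) {
            System.err.println("Timeout after " + timeoutMillis + " ms; thread dump:");
            System.err.println(dumpAllThreads());
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean ok = runWithTimeout(() -> {
            try {
                Thread.sleep(5_000);     // stands in for a hung test
            } catch (InterruptedException e) {
                // ignored in this sketch
            }
        }, 200);
        System.out.println("finished in time: " + ok);
    }
}
```

A real harness would also print the random seed and fail the test at the point where this sketch returns false.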




[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104385#comment-13104385
 ] 

Dawid Weiss commented on LUCENE-3429:
-

Mike, Robert, Mark -- can you confirm my understanding of this snippet:
{code}
if (doFail && (Thread.currentThread().getName().equals("main") 
  || Thread.currentThread().getName().equals("Main Thread"))) {
{code}
What was the thread name check for? Can this method be called from threads 
other than the main JUnit thread (in which case it shouldn't enter the if 
block)?
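
For reference, a small standalone sketch of what such a name check does. The class and method names are invented for the example, and it says nothing about why the original code was written that way; it only shows that the condition is true for threads carrying one of the two names and false for any other thread, such as a worker spawned by a test.

```java
final class MainThreadCheck {

    /** Mirrors the name test in the snippet: true only for the two listed names. */
    public static boolean hasMainThreadName(Thread t) {
        String name = t.getName();
        return name.equals("main") || name.equals("Main Thread");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("current thread: " + hasMainThreadName(Thread.currentThread()));

        Thread worker = new Thread(
            () -> System.out.println("worker thread: "
                    + hasMainThreadName(Thread.currentThread())),
            "worker-1");
        worker.start();
        worker.join();
    }
}
```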





[jira] [Updated] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated "update.processor"

2011-09-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2750:
--

Fix Version/s: (was: 3.4)
   3.5

Have you looked at the patches? I have tested that they work and plan to commit 
soon.

> Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated 
> "update.processor"
> 
>
> Key: SOLR-2750
> URL: https://issues.apache.org/jira/browse/SOLR-2750
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2750-branch_3x.patch, SOLR-2750.patch
>
>
> CoreAdminHandler#handleMergeAction
> DataImportHandler#handleRequestBody




Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-14 Thread Jan Høydahl
Good! I'm +1 although I did not do any testing other than using latest 3x 
branch locally.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. sep. 2011, at 12:33, Michael McCandless wrote:

> This VOTE has passed!
> 
> I'll announce/publish.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Mon, Sep 12, 2011 at 5:11 PM, Yonik Seeley
>  wrote:
>> On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
>>  wrote:
>>> Please vote to release the RC1 artifacts at:
>>> 
>>>  
>>> https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
>>> 
>>> as Lucene 3.4.0 and Solr 3.4.0.
>> 
>> +1
>> 
>> -Yonik
>> http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>> 



[jira] [Created] (SOLR-2763) Extracting update request handler throws exception and returns 400 when zero-length file posted using multipart form post

2011-09-14 Thread Karl Wright (JIRA)
Extracting update request handler throws exception and returns 400 when 
zero-length file posted using multipart form post
-

 Key: SOLR-2763
 URL: https://issues.apache.org/jira/browse/SOLR-2763
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 3.3, 3.2, 3.1, 1.4.1
Reporter: Karl Wright


When zero-length documents are posted to the extracting update request handler, 
and the method used for posting is multipart form encoding, then you get a 400 
error returned and the following exception to stderr:

Sep 14, 2011 3:45:45 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:50)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Sep 14, 2011 3:45:45 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={id=123} status=400 QTime=300

Other ways of indexing zero-length data do not produce this error.

A curl command that will reproduce the problem easily is as follows:

curl --location -F "id=123" -F "file=@hello.txt" http://localhost:8983/solr/update/extract

... assuming hello.txt is a zero-length file.
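
To make the request shape concrete, here is a minimal sketch that builds a comparable multipart/form-data body by hand. It is not what curl emits byte for byte: the boundary string, the text/plain content type, and the class and method names are assumptions for the example. What it shows is that the file part is present, with its headers, but carries zero bytes of content.

```java
final class EmptyMultipartSketch {

    /** Builds a multipart/form-data body with one text field ("id") and one file part ("file"). */
    public static byte[] buildBody(String boundary, String id,
                                   String fileName, byte[] fileContent) {
        java.nio.charset.Charset utf8 = java.nio.charset.StandardCharsets.UTF_8;
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();

        String head =
            "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"id\"\r\n\r\n"
            + id + "\r\n"
            + "--" + boundary + "\r\n"
            + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
            + "Content-Type: text/plain\r\n\r\n";   // content type is an assumption
        byte[] headBytes = head.getBytes(utf8);
        out.write(headBytes, 0, headBytes.length);

        out.write(fileContent, 0, fileContent.length);  // zero bytes for an empty file

        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(utf8);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = buildBody("example-boundary", "123", "hello.txt", new byte[0]);
        System.out.print(new String(body, java.nio.charset.StandardCharsets.UTF_8));
    }
}
```

In the printed body the headers of the file part are followed directly by the closing boundary, which is the case a handler needs to distinguish from a request that has no file part at all.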

This ticket is related to CONNECTORS-254.






[jira] [Created] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread JIRA
Create a Norwegian plural/singular stemmer
--

 Key: SOLR-2764
 URL: https://issues.apache.org/jira/browse/SOLR-2764
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Jan Høydahl


We need a light-weight stemmer for plural/singular only in Norwegian.




[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104468#comment-13104468
 ] 

Jan Høydahl commented on SOLR-2764:
---

One idea is to try the Hunspell stemmer and modify the .aff file to only do 
plural/singular of nouns.

> Create a Norwegian plural/singular stemmer
> --
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>
> We need a light-weight stemmer for plural/singular only in Norwegian.




Re: svn commit: r1170559 - in /lucene/dev/branches/branch_3x/lucene/backwards: lib/lucene-core-3.4.0.jar lib/lucene-core-3.4.0RC0.jar src/test/ src/test/org/apache/lucene/index/TestIndexWriter.java sr

2011-09-14 Thread Michael McCandless
Thanks Uwe!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 14, 2011 at 8:18 AM,   wrote:
> Author: uschindler
> Date: Wed Sep 14 12:18:40 2011
> New Revision: 1170559
>
> URL: http://svn.apache.org/viewvc?rev=1170559&view=rev
> Log:
> Merge in backwards test changes due to release of 3.4.0. Also add final JAR 
> file.
>
> Added:
>    lucene/dev/branches/branch_3x/lucene/backwards/lib/lucene-core-3.4.0.jar   
> (with props)
> Modified:
>    lucene/dev/branches/branch_3x/lucene/backwards/lib/lucene-core-3.4.0RC0.jar
>    lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props changed)
>    
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriter.java
>    
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> Added: 
> lucene/dev/branches/branch_3x/lucene/backwards/lib/lucene-core-3.4.0.jar
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/backwards/lib/lucene-core-3.4.0.jar?rev=1170559&view=auto
> ==
> Binary file - no diff available.
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriter.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriter.java?rev=1170559&r1=1170558&r2=1170559&view=diff
> ==
> --- 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriter.java
>  (original)
> +++ 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriter.java
>  Wed Sep 14 12:18:40 2011
> @@ -1320,6 +1320,7 @@ public class TestIndexWriter extends Luc
>             IndexWriterConfig conf = newIndexWriterConfig(
>                                                           
> TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMaxBufferedDocs(2);
>             w = new IndexWriter(dir, conf);
> +            w.setInfoStream(VERBOSE ? System.out : null);
>
>             Document doc = new Document();
>             doc.add(newField("field", "some text contents", Field.Store.YES, 
> Field.Index.ANALYZED));
>
> Modified: 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriterDelete.java?rev=1170559&r1=1170558&r2=1170559&view=diff
> ==
> --- 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>  (original)
> +++ 
> lucene/dev/branches/branch_3x/lucene/backwards/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>  Wed Sep 14 12:18:40 2011
> @@ -847,7 +847,7 @@ public class TestIndexWriterDelete exten
>     }
>
>     modifier.close();
> -    TestIndexWriter.assertNoUnreferencedFiles(dir, "docswriter abort() 
> failed to delete unreferenced files");
> +    TestIndexWriter.assertNoUnreferencedFiles(dir, "docswriter abort() 
> failed to delete unreferenced files");
>     dir.close();
>   }
>
>
>
>




[jira] [Resolved] (SOLR-2742) Add commitWithin to convenience signatures for SolrServer.add(..)

2011-09-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-2742.
---

Resolution: Fixed

Committed to trunk and 3.x

> Add commitWithin to convenience signatures for SolrServer.add(..)
> -
>
> Key: SOLR-2742
> URL: https://issues.apache.org/jira/browse/SOLR-2742
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: SolrJ, commitWithin
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2742.patch, SOLR-2742.patch, SOLR-2742.patch
>
>
> Today you need to manually create an UpdateRequest in order to set the 
> commitWithin value.
> We should provide an optional commitWithin parameter on all 
> SolrServer.add(..) methods as a convenience




[jira] [Resolved] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-3426.


Resolution: Fixed

trunk: Committed revision 1170586.
3x: Committed revision 1170593.

> optimizer for n-gram PhraseQuery
> 
>
> Key: LUCENE-3426
> URL: https://issues.apache.org/jira/browse/LUCENE-3426
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 2.9.4, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
> LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java, 
> PerfTest.java
>
>
> If 2-gram is used and the length of query string is 4, for example q="ABCD", 
> QueryParser generates (when autoGeneratePhraseQueries is true) 
> PhraseQuery("AB BC CD") with slop 0. But it can be optimized to 
> PhraseQuery("AB CD") with appropriate positions.
> The idea came from the Japanese paper "N.M-gram: Implementation of Inverted 
> Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main 
> theme of the paper is different from the idea that I'm using here, though)
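
The idea in the description can be illustrated without Lucene. The sketch below is not the committed patch; the class and method names are made up, and the selection rule (keep every n-th gram plus the last one, each with its original position) is one straightforward way to get the "AB CD" result from the example. Because the kept grams still cover every character of the query at fixed offsets, a match on them implies a match on the full string.

```java
final class NGramPhraseSketch {

    /** All n-grams of the text, in order; the list index is the term position. */
    public static java.util.List<String> ngrams(String text, int n) {
        java.util.List<String> grams = new java.util.ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    /**
     * Keeps only the grams needed to cover the text: every n-th gram plus the last one.
     * Each kept gram is returned as "term@position".
     */
    public static java.util.List<String> optimize(java.util.List<String> grams, int n) {
        java.util.List<String> kept = new java.util.ArrayList<>();
        int last = grams.size() - 1;
        for (int pos = 0; pos <= last; pos++) {
            if (pos % n == 0 || pos == last) {
                kept.add(grams.get(pos) + "@" + pos);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("ABCD", 2));                 // [AB, BC, CD]
        System.out.println(optimize(ngrams("ABCD", 2), 2));    // [AB@0, CD@2]
        System.out.println(optimize(ngrams("ABCDE", 2), 2));   // [AB@0, CD@2, DE@3]
    }
}
```

For an odd-length query such as "ABCDE" the last gram overlaps the previous one, which is why it has to be kept explicitly.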




[jira] [Created] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-14 Thread Grant Ingersoll (JIRA)
Create a Size Estimator model for Lucene and Solr
-

 Key: LUCENE-3435
 URL: https://issues.apache.org/jira/browse/LUCENE-3435
 Project: Lucene - Java
  Issue Type: Task
  Components: core/other
Affects Versions: 4.0
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor


It is often handy to be able to estimate the amount of memory and disk space 
that both Lucene and Solr use, given certain assumptions.  I intend to check in 
an Excel spreadsheet that allows people to estimate memory and disk usage for 
trunk.  I propose to put it under dev-tools, as I don't think it should be 
official documentation just yet and like the IDE stuff, we'll see how well it 
gets maintained.




[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104479#comment-13104479
 ] 

Chris Male commented on SOLR-2764:
--

I don't know much about Norwegian, but I think it's best to follow the same 
model as the other light / minimal stemmers.  They are incredibly efficient, 
targeted and easy to understand.





[jira] [Commented] (SOLR-2750) Some places look for UpdateParams.UPDATE_CHAIN but not the deprecated "update.processor"

2011-09-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104491#comment-13104491
 ] 

Mark Miller commented on SOLR-2750:
---

+1





[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104500#comment-13104500
 ] 

Mark Miller commented on LUCENE-3429:
-

bq. but I think you meant the other way around?

I liked the tough negation wording! :) But I'm sticking to the order (as long 
as I'm peeling the negations off right). I can put it in the original terms 
though: if a *waiting* thread doesn't react to interrupt() it won't react to 
stop() either. But without the *waiting*, the statement is wrong.





[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104503#comment-13104503
 ] 

David Smiley commented on SOLR-2762:


Maybe some day; sorry. I'm trying to push out the 2nd edition of my Solr book 
and I ran into this when kicking the tires on the Suggester. I thought it best 
to at least report the problem instead of doing nothing.





[jira] [Resolved] (SOLR-2749) use BoundaryScanner in Solr FVH

2011-09-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2749.
--

Resolution: Fixed

trunk: Committed revision 1170616.
3x: Committed revision 1170620.

> use BoundaryScanner in Solr FVH
> ---
>
> Key: SOLR-2749
> URL: https://issues.apache.org/jira/browse/SOLR-2749
> Project: Solr
>  Issue Type: New Feature
>  Components: highlighter
>Affects Versions: 3.1, 3.2, 3.3, 3.4, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2749.patch, SOLR-2749.patch, SOLR-2749.patch
>
>
> After LUCENE-1824 was committed, Solr's FragmentsBuilder can snip fragments 
> at a "natural" boundary. But to expose the full feature, Solr should 
> support configuring an arbitrary BoundaryScanner in solrconfig.xml.




[jira] [Assigned] (SOLR-2372) Upgrade Solr to Tika 0.9

2011-09-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-2372:
-

Assignee: Jan Høydahl

> Upgrade Solr to Tika 0.9
> 
>
> Key: SOLR-2372
> URL: https://issues.apache.org/jira/browse/SOLR-2372
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Grant Ingersoll
>Assignee: Jan Høydahl
> Fix For: 3.4, 4.0
>
>
> as the title says




[jira] [Updated] (SOLR-2372) Upgrade Solr to Tika 0.9

2011-09-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2372:
--

Fix Version/s: (was: 3.4)
   3.5

Here's the diff between old and what I plan to commit. Does it look right?

Only in lib-0.9: apache-mime4j-0.6.jar
Only in lib-0.9: apache-mime4j-LICENSE-ASL.txt
Only in lib-0.9: apache-mime4j-NOTICE.txt
Only in lib-0.8: fontbox-1.3.1.jar
Only in lib-0.9: fontbox-1.4.0.jar
Only in lib-0.8: jempbox-1.3.1.jar
Only in lib-0.9: jempbox-1.4.0.jar
Only in lib-0.9: netcdf-4.2-min.jar
Only in lib-0.8: netcdf-4.2.jar
Only in lib-0.8: pdfbox-1.3.1.jar
Only in lib-0.9: pdfbox-1.4.0.jar
Only in lib-0.8: tika-core-0.8.jar
Only in lib-0.9: tika-core-0.9.jar
Only in lib-0.8: tika-parsers-0.8.jar
Only in lib-0.9: tika-parsers-0.9.jar

PS: I've built the Tika jars using Java 1.5; would that be an issue?

> Upgrade Solr to Tika 0.9
> 
>
> Key: SOLR-2372
> URL: https://issues.apache.org/jira/browse/SOLR-2372
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Grant Ingersoll
>Assignee: Jan Høydahl
> Fix For: 3.5, 4.0
>
>
> as the title says




[jira] [Updated] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2758:
---

Attachment: SOLR-2758_move_ConcurrentLRUCache.patch

The attached patch is svn based and it should hopefully work. I did it on trunk.

I decided against attempting to have FastLRUCache and ConcurrentLRUCache in the 
same package because FastLRUCache has numerous dependencies on the search 
package, whereas ConcurrentLRUCache has no dependencies on Solr at all, just 
Lucene's PriorityQueue.

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104539#comment-13104539
 ] 

Jan Høydahl commented on SOLR-2764:
---

Unfortunately, the rules for noun inflection are much more complex in Norwegian 
than in English, and there are many irregularities.

> Create a Norwegian plural/singular stemmer
> --
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>
> We need a light-weight stemmer for plural/singular only in Norwegian




[jira] [Commented] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104541#comment-13104541
 ] 

David Smiley commented on SOLR-2758:


Please apply this patch to both 3x and trunk branches. Someone might argue that 
_technically_ this is a breaking change because a class moved from point A to 
point B, but it is internal. And config files reference the caches via the 
"solr." convenience.

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Assigned] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2758:
-

Assignee: Steven Rowe

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Created] (SOLR-2765) Shard/Node states

2011-09-14 Thread Yonik Seeley (JIRA)
Shard/Node states
-

 Key: SOLR-2765
 URL: https://issues.apache.org/jira/browse/SOLR-2765
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley


Need states for shards that indicate whether they are recovering, active/enabled, or 
disabled.




[jira] [Commented] (SOLR-2765) Shard/Node states

2011-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104562#comment-13104562
 ] 

Yonik Seeley commented on SOLR-2765:


- we probably want states on a per shard basis (in case we go with 
micro-sharding, a node may have multiple shards in different states).
- we might want a state on the node also... a way to mark it as "disabled" in 
general (note to rest of cluster - consider the node to be down)
- an active/enabled shard should be preferred as a leader

Perhaps at the same time think about adding "roles" to nodes: a comma-separated 
list of values, some of them pre-defined, to which users may also 
add their own.  One example use case would be to have a 
bank of indexers for rich text (PDF, Word, etc) that do all the work of text 
extraction or other expensive processing and forward the results to the right 
leader.  This could also be used to remove all search traffic from a node (by 
removing the standard "searcher" role) but allow it to stay up-to-date by 
remaining in the indexing loop.
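
The states and roles described above could be modelled roughly as follows. This is only an illustrative sketch: the class, enum, and method names and the "indexer" and "pdf-extractor" role strings are assumptions made for the example, not anything that exists in Solr. Only the three states and the "searcher" role come from the comment.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch only; none of these names exist in Solr.
final class ShardStateSketch {

    /** Per-shard states named in the issue description. */
    enum ShardState { RECOVERING, ACTIVE, DISABLED }

    /** Parses a comma-separated role list such as "searcher, indexer". */
    static Set<String> parseRoles(String csv) {
        Set<String> roles = new LinkedHashSet<String>();
        for (String role : csv.split(",")) {
            String trimmed = role.trim();
            if (!trimmed.isEmpty()) {
                roles.add(trimmed);
            }
        }
        return Collections.unmodifiableSet(roles);
    }

    /** Search traffic goes only to active shards on nodes that carry the "searcher" role. */
    static boolean acceptsSearches(ShardState state, Set<String> nodeRoles) {
        return state == ShardState.ACTIVE && nodeRoles.contains("searcher");
    }

    /** Prefers an active shard as leader; otherwise keeps the first candidate. */
    static ShardState preferredLeader(ShardState a, ShardState b) {
        return (b == ShardState.ACTIVE && a != ShardState.ACTIVE) ? b : a;
    }

    public static void main(String[] args) {
        // A node that stays in the indexing loop but takes no search traffic.
        Set<String> roles = parseRoles("indexer, pdf-extractor");
        System.out.println(acceptsSearches(ShardState.ACTIVE, roles));
        System.out.println(preferredLeader(ShardState.RECOVERING, ShardState.ACTIVE));
    }
}
```

Dropping "searcher" from the role list is what removes search traffic in this sketch, while the shard state stays untouched, which keeps the two concepts independent.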



> Shard/Node states
> -
>
> Key: SOLR-2765
> URL: https://issues.apache.org/jira/browse/SOLR-2765
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud, update
>Reporter: Yonik Seeley
> Fix For: 4.0
>
>
> Need states for shards that indicate whether they are recovering, active/enabled, or 
> disabled.




[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104564#comment-13104564
 ] 

Jan Høydahl commented on LUCENE-3414:
-

Is there a JIRA for adding HunspellStemFilterFactory to Solr?

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3414.patch, LUCENE-3414.patch
>
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.




[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104565#comment-13104565
 ] 

Robert Muir commented on LUCENE-3429:
-

{quote}
what was it for (thread name checking)?
{quote}

Yes, it's "main" on Sun/IBM/... and "Main Thread" on JRockit

> improve build system when tests hang
> 
>
> Key: LUCENE-3429
> URL: https://issues.apache.org/jira/browse/LUCENE-3429
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in hudson, the build can stay hung for days until we manually 
> kill it.
> The problem is that when a hang happens it's probably serious. What we want to 
> do (I think) is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> separately, I think we should have an "ant-level" timeout for the whole 
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
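
A rough sketch of the timeout-and-dump idea from the description, in plain Java. Everything here is an assumption made for illustration: the class and method names and the "sketch.timeoutMillis" property are hypothetical, and this is not the LuceneTestCase patch. For brevity the task runs in a worker thread and the caller waits for it, which inverts the timer-thread arrangement the proposal describes.

```java
import java.util.Map;

// Hypothetical sketch; not the code attached to this issue.
final class HangWatchdogSketch {

    /** Formats the stack traces of all live threads for post-mortem debugging. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append("Thread \"").append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the task in a worker thread and waits at most timeoutMillis for it.
     * Returns true if the task finished in time. A timeout of zero waits forever,
     * which matches the "no timeout by default" idea in the proposal.
     */
    static boolean finishesWithin(Runnable task, long timeoutMillis) {
        Thread worker = new Thread(task, "test-worker");
        worker.setDaemon(true); // a hung worker must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis); // join(0) waits indefinitely
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        // Assumed property name; defaults to zero, meaning no timeout.
        long timeout = Long.getLong("sketch.timeoutMillis", 0L);
        Runnable quickTask = new Runnable() {
            public void run() { /* the test body would go here */ }
        };
        if (!finishesWithin(quickTask, timeout)) {
            System.err.println(dumpAllThreads());
            throw new AssertionError("test timed out after " + timeout + " ms");
        }
    }
}
```

A real implementation would also print the random seed before failing, as the description asks.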




[jira] [Commented] (LUCENE-3414) Bring Hunspell for Lucene into analysis module

2011-09-14 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104568#comment-13104568
 ] 

Chris Male commented on LUCENE-3414:


Nope, it's on my mental TODO list, but go for it.

> Bring Hunspell for Lucene into analysis module
> --
>
> Key: LUCENE-3414
> URL: https://issues.apache.org/jira/browse/LUCENE-3414
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Chris Male
>Assignee: Chris Male
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3414.patch, LUCENE-3414.patch
>
>
> Some time ago I, along with Robert and Uwe, wrote a Stemmer which uses the 
> Hunspell algorithm.  It has the benefit of supporting dictionaries for a wide 
> array of languages.   
> It seems to still be being used but has fallen out of date.  I think it would 
> benefit from being inside the analysis module where additional features such 
> as decompounding support, could be added.




[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104574#comment-13104574
 ] 

Otis Gospodnetic commented on LUCENE-3435:
--

Grant - what is your experience with this estimator (the one you just 
committed)?  That is, how often is it right or close (how close?) to what you 
see in reality, assuming you give it correct input?


> Create a Size Estimator model for Lucene and Solr
> -
>
> Key: LUCENE-3435
> URL: https://issues.apache.org/jira/browse/LUCENE-3435
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Affects Versions: 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet and like the IDE stuff, we'll see how well 
> it gets maintained.




[jira] [Commented] (SOLR-2764) Create a Norwegian plural/singular stemmer

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104577#comment-13104577
 ] 

Robert Muir commented on SOLR-2764:
---

I would leave the irregularities out (e.g. just like our English one, which basically 
'strips the s').
Someone can always deal with exceptions with their own list: 
stemmerOverrideFilter etc.

I don't know anything about Norwegian, but you can take the other languages as 
examples here 
and create the ruleset for the most common nominal inflections, e.g. strip { 
-a, -ene, -en, -er, -et } 
or whatever.
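
As a rough illustration of the suffix-stripping idea, here is a minimal Java sketch. The suffix list is the one from the comment above. The class name, the minimum stem length of three characters, and the sample words are assumptions for the example. It is not a Lucene filter and not the stemmer eventually written for this issue.

```java
// Hypothetical sketch of a light plural/singular suffix stripper.
final class LightNorwegianSuffixStripper {

    // Suffixes taken from the comment above.
    private static final String[] SUFFIXES = { "ene", "en", "er", "et", "a" };

    // Assumed lower bound so that very short words are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    /** Strips the first matching suffix if enough of the word remains; otherwise returns the word unchanged. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            int stemLength = word.length() - suffix.length();
            if (stemLength >= MIN_STEM_LENGTH && word.endsWith(suffix)) {
                return word.substring(0, stemLength);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] { "bil", "bilen", "biler", "bilene", "huset" }) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Irregular forms would be handled outside this routine with a user-supplied exception list, as the comment suggests.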


> Create a Norwegian plural/singular stemmer
> --
>
> Key: SOLR-2764
> URL: https://issues.apache.org/jira/browse/SOLR-2764
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>
> We need a light-weight stemmer for plural/singular only in Norwegian




[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104582#comment-13104582
 ] 

Michael McCandless commented on SOLR-2762:
--

bq. I thought it best to at least report the problem instead of do nothing.

+1


> FSTLookup returns one less suggestion than it should when onlyMorePopular=true
> --
>
> Key: SOLR-2762
> URL: https://issues.apache.org/jira/browse/SOLR-2762
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
>
> I'm using the Suggester.  When I switched from TSTLookup to FSTLookup, I 
> noticed that it returned one fewer suggestion than what I asked for. I have 
> spellcheck.onlyMorePopular=true; when I set it to false, I see the correct 
> count. Another aspect of the bug is that this off-by-one bug only seems to 
> occur when my suggestion has an exact match.




[jira] [Updated] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2758:
--

Attachment: SOLR-2758_move_ConcurrentLRUCache.patch

When I applied the patch (using the 'patch' utility), the file movement didn't 
happen, so I modified the patch to depend on the this svn script having been 
already run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Issue Comment Edited] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104586#comment-13104586
 ] 

Steven Rowe edited comment on SOLR-2758 at 9/14/11 3:54 PM:


When I applied the patch (using the 'patch' utility), the file movement didn't 
happen, so I modified the patch to depend on this svn script having been 
already run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.

  was (Author: steve_rowe):
When I applied the patch (using the 'patch' utility), the file movement 
didn't happen, so I modified the patch to depend on the this svn script having 
been already run:

{noformat}
svn mv solr/solrj/src/java/org/apache/solr/common/util/ConcurrentLRUCache.java 
solr/core/src/java/org/apache/solr/util/
{noformat}

(I generated the patch with {{svn --no-diff-deleted diff > ...}}, so that the 
source file's contents wouldn't be needlessly included in the patch.)

Also, I added a CHANGES.txt entry.

I plan on committing this shortly.
  
> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has to lucene-core 
> and that is indirectly via ConcurrentLRUCache which is in common/util.  SolrJ 
> doesn't even use this cache but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency that the SolrJ maven 
> pom.xml has on lucene-core.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104608#comment-13104608
 ] 

Michael McCandless commented on SOLR-2761:
--

What limitation of FSTs is causing us to discretize the term frequencies?

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Commented] (LUCENE-3435) Create a Size Estimator model for Lucene and Solr

2011-09-14 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104622#comment-13104622
 ] 

Grant Ingersoll commented on LUCENE-3435:
-

A good deal of it Mike and I worked out yesterday on IRC (well, mostly Mike 
explained and I took copious notes).  The disk storage stuff is based on LIA2.  
It is a theoretical model and not an empirical one, except that the bytes/term 
calculation was based on indexing Wikipedia.  

I would deem it a gross approximation of the state of trunk at this point in 
time.  My gut says the Lucene estimation is a little low, while Solr is fairly 
close (since I suspect Solr's memory usage is dominated by caching).  I imagine 
there are things still unaccounted for. For instance, I haven't reverse 
engineered the fieldValueCache memSize() method yet and I don't have a good 
sense of how much memory would be consumed in a highly concurrent system by the 
sheer number of Query objects instantiated or when one has really large Queries 
(say 5K terms).  It also is not meant to be one size fits all.  Lucene/Solr 
have a ton of tuning options that could change things significantly.

I did a few sanity checks against things I've seen in the past, and thought it 
was reasonable.  There is, of course, no substitute for good testing.  In other 
words, caveat emptor.

> Create a Size Estimator model for Lucene and Solr
> -
>
> Key: LUCENE-3435
> URL: https://issues.apache.org/jira/browse/LUCENE-3435
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/other
>Affects Versions: 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet and like the IDE stuff, we'll see how well 
> it gets maintained.




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104641#comment-13104641
 ] 

David Smiley commented on SOLR-2761:


FSTLookup is well documented, thanks to Dawid.  Here is a link to the Javadocs 
for your convenience, Mike: 
https://builds.apache.org/job/Lucene-3.x/javadoc/all/org/apache/lucene/search/suggest/fst/FSTLookup.html?is-external=true

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Commented] (LUCENE-3434) Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable

2011-09-14 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104646#comment-13104646
 ] 

Robert Muir commented on LUCENE-3434:
-

I think you can remove the @SuppressWarnings and use Collections.emptyMap() 
instead of Collections.EMPTY_MAP?
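
For readers unfamiliar with the difference, a small standalone Java sketch follows. The class name and the Map<String, Integer> type are made up for the example and are not taken from the patch under review.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical standalone example; not code from the patch.
final class EmptyMapSketch {

    // EMPTY_MAP is a raw Map, so assigning it to a parameterized type is an
    // unchecked conversion and needs the annotation to compile without a warning.
    @SuppressWarnings("unchecked")
    static Map<String, Integer> rawEmptyMap() {
        return Collections.EMPTY_MAP;
    }

    // emptyMap() is a generic method: the type arguments are inferred from the
    // return type, so no warning is produced and no annotation is needed.
    static Map<String, Integer> typedEmptyMap() {
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        System.out.println(rawEmptyMap().equals(typedEmptyMap()));
    }
}
```

Both methods return an immutable empty map; the only difference is compile-time type safety.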

> Make ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper immutable
> -
>
> Key: LUCENE-3434
> URL: https://issues.apache.org/jira/browse/LUCENE-3434
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Chris Male
> Attachments: LUCENE-3434-3x.patch, LUCENE-3434-trunk.patch
>
>
> Both ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper have setters which 
> change some state which impacts their analysis stack.  If these are going to 
> become reusable, then the state must be immutable as changing it will have no 
> effect.
> Process will be similar to QueryAutoStopWordAnalyzer, I will remove in trunk 
> and deprecate in 3x.




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104653#comment-13104653
 ] 

David Smiley commented on SOLR-2761:


It should be noted that Dawid left code comments about taking another approach:
{code}
// Distribute weights into at most N buckets. This is a form of discretization to
// limit the number of possible weights so that they can be efficiently encoded in the
// automaton.
//
// It is assumed the distribution of weights is _linear_ so proportional division
// of [min, max] range will be enough here. Other approaches could be to sort
// weights and divide into proportional ranges.
{code}
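
To make the two approaches in that comment concrete, here is a minimal Java sketch that compares them. It is not FSTLookup's code: the class and method names and the sample weights are assumptions for illustration. The first method divides the [min, max] range proportionally; the second sorts the weights and assigns buckets by rank.

```java
import java.util.Arrays;

// Hypothetical sketch; not the FSTLookup implementation.
final class WeightBucketsSketch {

    /** Proportional (linear) mapping of a weight in [min, max] onto buckets 0..buckets-1. */
    static int linearBucket(float weight, float min, float max, int buckets) {
        if (max <= min) {
            return 0;
        }
        int bucket = (int) ((weight - min) / (max - min) * buckets);
        return Math.min(bucket, buckets - 1); // weight == max would otherwise overflow
    }

    /** Rank-based mapping: the bucket depends on how many weights are strictly smaller. */
    static int rankBucket(float weight, float[] sortedWeights, int buckets) {
        if (sortedWeights.length == 0) {
            return 0;
        }
        // Lower-bound binary search: index of the first element >= weight.
        int lo = 0;
        int hi = sortedWeights.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedWeights[mid] < weight) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        int bucket = (int) ((long) lo * buckets / sortedWeights.length);
        return Math.min(bucket, buckets - 1);
    }

    public static void main(String[] args) {
        // Made-up long-tail weights: one very frequent term and many rare ones.
        float[] weights = { 1f, 2f, 3f, 5f, 8f, 13f, 21f, 1000f };
        Arrays.sort(weights);
        float min = weights[0];
        float max = weights[weights.length - 1];
        for (float w : weights) {
            System.out.println(w + ": linear=" + linearBucket(w, min, max, 4)
                    + " rank=" + rankBucket(w, weights, 4));
        }
    }
}
```

With these sample weights, the linear mapping puts every weight except the largest into bucket 0, while the rank-based mapping spreads them evenly across all four buckets. That is the effect the issue description attributes to long-tail distributions.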

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Commented] (SOLR-2762) FSTLookup returns one less suggestion than it should when onlyMorePopular=true

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104708#comment-13104708
 ] 

Dawid Weiss commented on SOLR-2762:
---

Ok, I'll try to reproduce on my own, thanks.

> FSTLookup returns one less suggestion than it should when onlyMorePopular=true
> --
>
> Key: SOLR-2762
> URL: https://issues.apache.org/jira/browse/SOLR-2762
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Dawid Weiss
>Priority: Minor
>
> I'm using the Suggester.  When I switched from TSTLookup to FSTLookup, I 
> noticed that it returned one fewer suggestion than what I asked for. I have 
> spellcheck.onlyMorePopular=true; when I set it to false, I see the correct 
> count. Another aspect of the bug is that this off-by-one bug only seems to 
> occur when my suggestion has an exact match.




[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104703#comment-13104703
 ] 

Dawid Weiss commented on LUCENE-3429:
-

bq. if a waiting thread doesn't react to interrupt() it won't react to stop() 
either. But without the waiting, the statement is wrong

Yes, this is probably true. If it's waiting (on a monitor or i/o) and doesn't 
react to interrupt, then it's in a deep hole somewhere where nothing's going to 
help it :)

bq. Yes, its "main" on Sun/IBM/... and "Main Thread" on Jrockit

Ok, so can I change it the way I suggested (i.e. have a "test" thread variable 
on the test superclass and compare to it instead)? You didn't explain the 
reason this code needs this comparison at all.
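
For concreteness, a minimal sketch of what I mean. The class and method names are made up for illustration and this is not the LuceneTestCase code; it also assumes the instance is constructed on the thread that runs the test. The idea is to remember that thread and compare against the reference instead of matching on the thread's name, which differs between JVMs.

```java
// Hypothetical sketch; names are illustrative and not part of LuceneTestCase.
class TestThreadMarker {
    // Captured once, on whichever thread constructs this instance.
    private final Thread testThread = Thread.currentThread();

    /** Returns true if {@code t} is the thread that created this instance. */
    boolean isTestThread(Thread t) {
        return t == testThread;
    }
}
```

A reference comparison like this does not care whether the JVM calls the thread "main" or "Main Thread".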

> improve build system when tests hang
> 
>
> Key: LUCENE-3429
> URL: https://issues.apache.org/jira/browse/LUCENE-3429
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3429.patch, LUCENE-3429.patch
>
>
> Currently, if tests hang in hudson the build can stay hung for days until we 
> manually kill it.
> The problem is that when a hang happens it's probably serious. What we want to 
> do (I think) is:
> # time out the build.
> # ensure we have enough debugging information to hopefully fix any hang.
> So I think the ideal solution would be:
> # add a sysprop "-D" that LuceneTestCase respects, it could default to no 
> timeout at all (some value like zero).
> # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
> the test class? method?
> # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
> information, random seed information to hopefully reproduce the hang, and 
> fails the test.
> # nightly builds would pass some reasonable -D for each test.
> separately, I think we should have an "ant-level" timeout for the whole 
> build, in case it goes completely crazy (e.g. jvm completely hangs or 
> something else), just as an additional safety.
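
A rough sketch of the dump step proposed in the description above, assuming nothing about LuceneTestCase internals. The class and method names are hypothetical. `Thread.getAllStackTraces()` is the standard JDK call for collecting the stack of every live thread, and a missing or zero timeout value is treated here as "no timeout", matching the suggested default.

```java
import java.util.Map;

// Hypothetical sketch; not the patch attached to this issue.
final class HangWatchdog {
    private HangWatchdog() {}

    /** Formats the name, state and stack of every live thread in this JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Parses a timeout in milliseconds; null, blank, zero or negative means "no timeout" (0). */
    static long timeoutMillis(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            return 0L;
        }
        return Math.max(0L, Long.parseLong(propertyValue.trim()));
    }
}
```

A timer thread would call `dumpAllThreads()` once the parsed timeout has elapsed, print the result together with the random seed, and then fail the test.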




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104710#comment-13104710
 ] 

Dawid Weiss commented on SOLR-2761:
---

Let me know if anything is not clear, Mike.

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Created] (SOLR-2766) ant prepare-release fails to package the javadocs for solr-test-framework and solr-solrj

2011-09-14 Thread Michael McCandless (JIRA)
ant prepare-release fails to package the javadocs for solr-test-framework and 
solr-solrj


 Key: SOLR-2766
 URL: https://issues.apache.org/jira/browse/SOLR-2766
 Project: Solr
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.5



I was updating Solr's web site with the 3.4.0 release, but suddenly
discovered that the javadocs for the test-framework and solrj (linked
under the Documentation tab on the left) are missing from
apache-solr-3.4.0.tgz.

Ie, when I "tar xzf" that, then:

{noformat}
find . -name index.html
./docs/index.html
./docs/api/index.html
{noformat}

(3.3.0's tgz does include them)

I think this is just a packaging problem (maybe from the recent
renaming?); I see their javadocs under solr/build/solr-solrj/docs/api
and solr/build/solr-test-framework/docs/api.

I also see separate javadocs for all solr contribs... are these
supposed to be published on the web site?

For now I've just copied up the solrj and test-framework javadocs, built
from the 3.4 branch, to the site.  But we should fix this for
3.5.0.





[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104761#comment-13104761
 ] 

Michael McCandless commented on SOLR-2761:
--

Ooooh, the javadocs and comments are awesome! -- thanks Dawid and
David.

I was just wondering what specifically is the limitation on our FST
impl and whether it's something we could improve.  It sounds like the
limitation is just how we quantize the incoming weights...

David, when you use > 100 buckets did you see bad performance for
low-weight lookups?

Maybe, in addition to the up-front quantization, we could also store
a more exact weight for each term (eg as the output).  Then on
retrieve we could re-sort the candidates by that exact weight.  But
this will make the FST larger...
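
Roughly what I have in mind for the re-sort, as a sketch only. `Candidate` and `topN` are illustrative names, not existing Lucene APIs, and how the exact weight would actually be stored in and read from the FST is left open here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Candidate and topN are illustrative names, not Lucene APIs.
final class ExactWeightResort {
    private ExactWeightResort() {}

    /** A completion candidate: its coarse bucket plus the exact source weight. */
    static final class Candidate {
        final String term;
        final int bucket;
        final long exactWeight;

        Candidate(String term, int bucket, long exactWeight) {
            this.term = term;
            this.bucket = bucket;
            this.exactWeight = exactWeight;
        }
    }

    /** Returns the terms of the n heaviest candidates, ordered by exact weight, descending. */
    static List<String> topN(List<Candidate> candidates, int n) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // Ignore the bucket here: it was only needed to collect the candidates.
        sorted.sort((a, b) -> Long.compare(b.exactWeight, a.exactWeight));
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < sorted.size() && i < n; i++) {
            terms.add(sorted.get(i).term);
        }
        return terms;
    }
}
```

Two terms that fall into the same bucket would then still come back in the right relative order.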


> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104766#comment-13104766
 ] 

David Smiley commented on SOLR-2761:


bq. David, when you use > 100 buckets did you see bad performance for 
low-weight lookups?

I didn't try in any serious way. I was simply writing about this feature when I 
observed the suggestions were poor compared to other Lookup impls and other 
ways of doing term completion. Then I started digging into why and what could 
be done about it.

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Commented] (SOLR-2766) ant prepare-release fails to package the javadocs for solr-test-framework and solr-solrj

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104773#comment-13104773
 ] 

Steven Rowe commented on SOLR-2766:
---

bq. I think this is just a packaging problem (maybe from the recent renaming?); 
I see their javadocs under solr/build/solr-solrj/docs/api and 
solr/build/solr-test-framework/docs/api.

Yes, this is undoubtedly my fault (SOLR-2452).  I'll investigate.

> ant prepare-release fails to package the javadocs for solr-test-framework and 
> solr-solrj
> 
>
> Key: SOLR-2766
> URL: https://issues.apache.org/jira/browse/SOLR-2766
> Project: Solr
>  Issue Type: Bug
>Reporter: Michael McCandless
> Fix For: 3.5
>
>
> I was updating Solr's web site with the 3.4.0 release, but suddenly
> discovered that the javadocs for the test-framework and solrj (linked
> under the Documentation tab on the left) are missing from
> apache-solr-3.4.0.tgz.
> Ie, when I "tar xzf" that, then:
> {noformat}
> find . -name index.html
> ./docs/index.html
> ./docs/api/index.html
> {noformat}
> (3.3.0's tgz does include them)
> I think this is just a packaging problem (maybe from the recent
> renaming?); I see their javadocs under solr/build/solr-solrj/docs/api
> and solr/build/solr-test-framework/docs/api.
> I also see separate javadocs for all solr contribs... are these
> supposed to be published on the web site?
> For now I've just copied up the solrj and test-framework jdocs, built
> from the 3.4 branch, to the site.  But we should fix this for
> 3.5.0




[jira] [Reopened] (SOLR-2540) CommitWithin as an Update Request parameter

2011-09-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reopened SOLR-2540:



The new ExtractingRequestHandlerTest.testCommitWithin method fails fairly 
reliably on multiple machines.

Noted by sarowe on the dev list...

{noformat}
Subject: Trunk test failure: ExtractingRequestHandlerTest.testCommitWithin() 
[was: [JENKINS-MAVEN]
Lucene-Solr-Maven-trunk #239: POMs out of sync]

This is 100% reproducible on my local machine (run from 
solr/contrib/extraction/):

ant test -Dtestcase=ExtractingRequestHandlerTest -Dtestmethod=testCommitWithin
-Dtests.seed=-2b35f16e02bddd0d:5c36eb67e44fc16d:-54d0d485d6a45315
{noformat}

...I can reproduce this failure every time I try (regardless of seed).


> CommitWithin as an Update Request parameter
> ---
>
> Key: SOLR-2540
> URL: https://issues.apache.org/jira/browse/SOLR-2540
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: commit, commitWithin
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2540.patch, SOLR-2540.patch
>
>
> It would be useful to support commitWithin HTTP GET request param on all 
> UpdateRequestHandlers.
> That way, you could set commitWithin on the request (for XML, JSON, CSV, 
> Binary and Extracting handlers) with this syntax:
> {code}
>   curl "http://localhost:8983/solr/update/extract?literal.id=123&commitWithin=1" \
>     -H "Content-Type: application/pdf" --data-binary @file.pdf
> {code}
> PS: The JsonUpdateRequestHandler and BinaryUpdateRequestHandler already 
> support this syntax.




Re: Trunk test failure: ExtractingRequestHandlerTest.testCommitWithin() [was: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync]

2011-09-14 Thread Chris Hostetter

: This is 100% reproducible on my local machine (run from solr/contrib/extraction/):
: 
: ant test -Dtestcase=ExtractingRequestHandlerTest -Dtestmethod=testCommitWithin -Dtests.seed=-2b35f16e02bddd0d:5c36eb67e44fc16d:-54d0d485d6a45315

I reopened SOLR-2540, where this test was added.

Jan? Are you looking at this?

: 
: Steve
: 
: > -Original Message-
: > From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
: > Sent: Tuesday, September 13, 2011 12:09 PM
: > To: dev@lucene.apache.org
: > Subject: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #239: POMs out of sync
: > 
: > Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/239/
: > 
: > 1 tests failed.
: > FAILED:
: > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
: > tWithin
: > 
: > Error Message:
: > Exception during query
: > 
: > Stack Trace:
: > java.lang.RuntimeException: Exception during query
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:396)
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:363)
: > at
: > org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testCommi
: > tWithin(ExtractingRequestHandlerTest.java:306)
: > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
: > at
: > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
: > :57)
: > at
: > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
: > mpl.java:43)
: > at java.lang.reflect.Method.invoke(Method.java:616)
: > at
: > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMeth
: > od.java:44)
: > at
: > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallabl
: > e.java:15)
: > at
: > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod
: > .java:41)
: > at
: > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.
: > java:20)
: > at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
: > at
: > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
: > :28)
: > at
: > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
: > 1)
: > at
: > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.
: > java:76)
: > at
: > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
: > .java:148)
: > at
: > org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner
: > .java:50)
: > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
: > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
: > at
: > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
: > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
: > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
: > at
: > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java
: > :28)
: > at
: > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
: > 1)
: > at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
: > at
: > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java
: > :35)
: > at
: > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Prov
: > ider.java:146)
: > at
: > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.jav
: > a:97)
: > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
: > at
: > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java
: > :57)
: > at
: > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
: > mpl.java:43)
: > at java.lang.reflect.Method.invoke(Method.java:616)
: > at
: > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(
: > ProviderFactory.java:103)
: > at $Proxy0.invoke(Unknown Source)
: > at
: > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireS
: > tarter.java:145)
: > at
: > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(Suref
: > ireStarter.java:87)
: > at
: > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
: > Caused by: java.lang.RuntimeException: REQUEST FAILED:
: > xpath=//*[@numFound='1']
: > xml response was: 
: > 
: > 0 name="QTime">0 start="0">
: > 
: > 
: > request was:start=0&q=id:one&qt=standard&rows=20&version=2.2
: > at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:389)
: > ... 36 more
: > 
: > 
: > 
: > 
: > Build Log (for compile errors):
: > [...truncated 24297 lines...]
: > 
: > 
: > 
: > -
: > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
: > For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 

-Hoss

-

[jira] [Resolved] (SOLR-2758) ConcurrentLRUCache should move from common/util to solr/util

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2758.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.5

Committed to trunk and branch_3x.

Thanks David!

> ConcurrentLRUCache should move from common/util to solr/util
> 
>
> Key: SOLR-2758
> URL: https://issues.apache.org/jira/browse/SOLR-2758
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch, 
> SOLR-2758_move_ConcurrentLRUCache.patch
>
>
> There is exactly one small dependency that the SolrJ jar has on lucene-core, 
> and that is indirectly via ConcurrentLRUCache, which is in common/util.  SolrJ 
> doesn't even use this cache, but it's there anyway.  Attached is a patch for 
> the move. It also removes the lucene-core dependency from the SolrJ maven 
> pom.xml.
> Steve Rowe agrees:
> https://issues.apache.org/jira/browse/SOLR-2756?focusedCommentId=13103103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13103103




[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104797#comment-13104797
 ] 

Steven Rowe commented on SOLR-2756:
---

bq. I think ConcurrentLRUCache should be moved from Solrj to solr-core.

Done: SOLR-2758

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and I noticed needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  |     \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] |     \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.




Re: [jira] [Created] (SOLR-2760) Cannot "ant dist or ant example"

2011-09-14 Thread Grant Ingersoll
Did you clean first?

On Sep 14, 2011, at 1:49 AM, Bill Bell wrote:

> Thoughts on this?
> 
> I did an "svn up"
> 
> 
> On 9/13/11 11:00 PM, "Bill Bell (JIRA)"  wrote:
> 
>> Cannot "ant dist or ant example"
>> 
>> 
>>Key: SOLR-2760
>>URL: https://issues.apache.org/jira/browse/SOLR-2760
>>Project: Solr
>> Issue Type: Bug
>>   Reporter: Bill Bell
>> 
>> 
>> Path: .
>> URL: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr
>> Repository Root: http://svn.apache.org/repos/asf
>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> Revision: 1170435
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: chrism
>> Last Changed Rev: 1170425
>> Last Changed Date: 2011-09-13 21:36:56 -0600 (Tue, 13 Sep 2011)
>> 
>> 
>> Then
>> 
>>> ant dist or ant example
>> 
>> compile-core:
>>   [javac] Compiling 23 source files to
>> /Users/bill/solr/trunk/modules/queries/build/classes/java
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:48: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put("searcher",searcher);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NormValueSource.java:61: cannot find symbol
>>   [javac] symbol  : class ConstDoubleDocValues
>>   [javac] location: class
>> org.apache.lucene.queries.function.valuesource.NormValueSource
>>   [javac]   return new ConstDoubleDocValues(0.0, this);
>>   [javac]  ^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/NumDocsValueSource.java:40: cannot find symbol
>>   [javac] symbol  : class ConstIntDocValues
>>   [javac] location: class
>> org.apache.lucene.queries.function.valuesource.NumDocsValueSource
>>   [javac] return new
>> ConstIntDocValues(ReaderUtil.getTopLevelContext(readerContext).reader.numD
>> ocs(), this);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/QueryValueSource.java:73: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put(this, w);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/ScaleFloatFunction.java:96: warning: [unchecked]
>> unchecked call to put(K,V) as a member of the raw type java.util.Map
>>   [javac] context.put(this.source, scaleInfo);
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/SumTotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>   [javac] context.put(this, new LongDocValues(this) {
>>   [javac]^
>>   [javac] 
>> /Users/bill/solr/trunk/modules/queries/src/java/org/apache/lucene/queries/
>> function/valuesource/TotalTermFreqValueSource.java:68: warning:
>> [unchecked] unchecked call to put(K,V) as a member of the raw type
>> java.util.Map
>>   [javac] context.put(this, new LongDocValues(this) {
>>   [javac]^
>>   [javac] 2 errors
>>   [javac] 5 warnings
>> 
>> 
>> --
>> This message is automatically generated by JIRA.
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com



[jira] [Updated] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2756:
--

Attachment: SOLR-2756-zookeeper-and-stax-api.patch

bq. there appears to be dependency on stax:stax-api:jar:1.0.1 that is 
questionably if we already depend on geronimo's stax API - which I assume is 
the same Stax API.

This version of the patch excludes the stax:stax-api transitive dependency.

I also added a {{CHANGES.txt}} entry.

I plan on committing later today to branch_3x, then applying the same 
stax:stax-api transitive dependency exclusion to trunk (the other changes to 
branch_3x are not applicable to trunk).

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and I noticed needless 
> transitive dependencies show up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  |     \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] |     \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.




[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-14 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104824#comment-13104824
 ] 

Shawn Heisey commented on SOLR-2739:


This is still a problem.  As it seems to be specific to my environment, I am 
very interested in tracking it down, but I have no idea where to begin.  My 
current test setup is CentOS 6, ext4, Oracle Java 1.6.0_27-b07.  Can you give 
me pointers on how to figure out what the problem is?  Do you need me to 
provide any more information than I have already?

Steps to reproduce current failures on my system from the commandline with 3.4 
or branch_3x:

svn co https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_4 lucene_solr_3_4
cd lucene_solr_3_4/solr
ant test

svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x branch_3x
cd branch_3x/solr
ant test

Some additional info:

[root@bigindy5 src]# rpm -qa | egrep "ant|junit|java"
java-1.6.0-openjdk-1.6.0.0-1.36.b17.el6_0.x86_64
java-1.6.0-sun-1.6.0.27-1.0.cf.x86_64
ant-nodeps-1.7.1-13.el6.x86_64
wpa_supplicant-0.6.8-10.el6.x86_64
libvirt-java-devel-0.4.5-2.el6.noarch
java-1.6.0-sun-devel-1.6.0.27-1.0.cf.x86_64
enchant-1.5.0-4.el6.x86_64
tzdata-java-2011g-1.el6.noarch
java_cup-0.10k-5.el6.x86_64
junit-3.8.2-6.5.el6.x86_64
java-1.6.0-sun-plugin-1.6.0.27-1.0.cf.x86_64
anthy-9100h-10.1.el6.x86_64
libvirt-java-0.4.5-2.el6.noarch
java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
ant-1.7.1-13.el6.x86_64
ant-junit-1.7.1-13.el6.x86_64
ibus-anthy-1.2.1-1.el6.x86_64
java-1.6.0-sun-jdbc-1.6.0.27-1.0.cf.x86_64
junit4-4.5-5.3.el6.noarch

[root@bigindy5 src]# java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode)

[root@bigindy5 src]# uname -a
Linux bigindy5 2.6.32-71.29.1.el6.centos.plus.x86_64 #1 SMP Sun Jun 26 16:27:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux


> TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some 
> systems
> ---
>
> Key: SOLR-2739
> URL: https://issues.apache.org/jira/browse/SOLR-2739
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.3
>Reporter: Shawn Heisey
>Assignee: Hoss Man
> Fix For: 3.4, 4.0
>
>
> Shawn Heisey noted on the mailing list that he was getting consistent 
> failures from TestSqlEntityProcessorDelta.testNonWritablePersistFile on his 
> machine.
> I can't reproduce his exact failures, but the test is hinky enough that i 
> want to try and clean it up.




[jira] [Commented] (SOLR-2739) TestSqlEntityProcessorDelta.testNonWritablePersistFile failures on some systems

2011-09-14 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104841#comment-13104841
 ] 

Shawn Heisey commented on SOLR-2739:


I've also been seeing intermittent failures in TestCSVLoader, in both 3.4 and 
branch_3x.  The nature of the failure is the same as with 
TestSqlEntityProcessorDelta: numFound shows a different value than the test 
expects.  If I run the following command over and over, it sometimes fails, but 
mostly it passes:

ant test -Dtestcase=TestCSVLoader -Dtestmethod=testCommitWithin

Here's one failure on lucene_solr_3_4:

[junit] Testsuite: org.apache.solr.handler.TestCSVLoader
[junit] Testcase: testCommitWithin(org.apache.solr.handler.TestCSVLoader):  
Caused an ERROR
[junit] Exception during query
[junit] java.lang.RuntimeException: Exception during query
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:385)
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:352)
[junit] at 
org.apache.solr.handler.TestCSVLoader.testCommitWithin(TestCSVLoader.java:121)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
[junit] Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:378)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 29.793 sec
[junit]
[junit] - Standard Error -
[junit] 2011-09-15 12:46:04.PD org.apache.solr.SolrTestCaseJ4 assertQ
[junit] SEVERE: REQUEST FAILED: xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] 2011-09-15 12:46:04.PD org.apache.solr.common.SolrException log
[junit] SEVERE: REQUEST FAILED: 
start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0:java.lang.RuntimeException:
 REQUEST FAILED: xpath=//*[@numFound='3']
[junit] xml response was: 
[junit] 
[junit] 
00
[junit] 
[junit]
[junit] request 
was:start=0&q=id:[100+TO+110]&qt=standard&rows=20&version=2.0
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:378)
[junit] at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:352)
[junit] at 
org.apache.solr.handler.TestCSVLoader.testCommitWithin(TestCSVLoader.java:121)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
[junit] at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit] at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
[junit] at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[junit] at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
[junit] at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
[junit] at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
[junit] at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
[junit] at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
[junit] at 
junit.framework.JUnit

[jira] [Created] (LUCENE-3436) Spellchecker "Suggest Mode" Support

2011-09-14 Thread James Dyer (JIRA)
Spellchecker "Suggest Mode" Support
---

 Key: LUCENE-3436
 URL: https://issues.apache.org/jira/browse/LUCENE-3436
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spellchecker
Affects Versions: 3.3, 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 3.5, 4.0


This is a spin-off from SOLR-2585.

Currently o.a.l.s.s.SpellChecker and o.a.l.s.s.DirectSpellChecker support two 
"Suggest Modes":
1. Suggest for terms that are not in the index.
2. Suggest "more popular" terms.

This issue is to add a third Suggest Mode:
3. Suggest always.

This will assist users in developing context-sensitive spell suggestions and/or 
did-you-mean suggestions.  See SOLR-2585 for a full discussion.

Note that o.a.l.s.s.SpellChecker already can support this functionality, if the 
user passes in a NULL term & IndexReader.  This, however, is not intuitive.  
o.a.l.s.s.DirectSpellChecker currently has no support for this third Suggest 
Mode.
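
The three modes listed above could be modelled as an enum plus a single 
decision method. This is only a minimal sketch and not the attached patch: the 
wrapper class, the method shouldSuggest, the constant names, and the exact 
frequency rules are assumptions made for illustration.

```java
/** Hypothetical illustration of three suggest modes; not the Lucene API. */
final class SuggestModeSketch {

    enum SuggestMode {
        SUGGEST_WHEN_NOT_IN_INDEX, // mode 1: only for terms missing from the index
        SUGGEST_MORE_POPULAR,      // mode 2: only candidates more frequent than the original
        SUGGEST_ALWAYS             // mode 3: regardless of frequency
    }

    /**
     * Decides whether a candidate should be offered for a query term.
     * originalDocFreq is the document frequency of the term as typed,
     * candidateDocFreq the document frequency of the proposed replacement.
     */
    static boolean shouldSuggest(SuggestMode mode, int originalDocFreq, int candidateDocFreq) {
        switch (mode) {
            case SUGGEST_WHEN_NOT_IN_INDEX:
                return originalDocFreq == 0;
            case SUGGEST_MORE_POPULAR:
                return candidateDocFreq > originalDocFreq;
            case SUGGEST_ALWAYS:
                return true;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

An explicit mode argument avoids the unintuitive convention of passing a null 
term and IndexReader to get the "always" behaviour.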




[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104843#comment-13104843
 ] 

Ryan McKinley commented on SOLR-2756:
-

bq. However, because ConcurrentLRUCache is the only class in Solrj that 
requires the lucene-core dependency, and solr-core's FastLRUCache is the only 
Lucene/Solr use of ConcurrentLRUCache, I think ConcurrentLRUCache should be 
moved from Solrj to solr-core.

+1

solrj is the *client*; it should not have any dependency on the server.

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed needless 
> transitive dependencies showing up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104845#comment-13104845
 ] 

Dawid Weiss commented on SOLR-2761:
---

I guess a lot depends on the use case. In my case quantization was not a 
problem (the scores were "rough" and query independent anyway, so they did fall 
into corresponding buckets). "poor" performance would then have to be backed by 
what the requirement really is -- if one needs sorting by exact scores then the 
method used to speed up FSTLookup simply isn't a good fit. Still, compared to 
fetching everything and resorting this is a hell of a lot faster, so many folks 
(including me) may find it helpful.

It all depends, in other words.

As for using more buckets -- sure, you can do this. In fact, you can combine 
both approaches and use quantization to prefetch a buffer of matches, then 
collect outputs, sort and if this fills your desired number of results then 
there is no need to search any further because all other buckets will have 
lower scores (exact).
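
The combined approach described above can be sketched roughly as follows. This 
is a hypothetical illustration and not FSTLookup code: the class BucketedTopN, 
the Entry type, and the assumption that buckets arrive ordered from highest to 
lowest weight range are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: prefetch by bucket, then sort by exact weight. */
final class BucketedTopN {

    static final class Entry {
        final String term;
        final float weight;

        Entry(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /**
     * Assumes bucketsHighToLow.get(0) covers the highest weight range and that
     * no entry in a later bucket outweighs an entry in an earlier one.
     */
    static List<Entry> topN(List<List<Entry>> bucketsHighToLow, int n) {
        List<Entry> collected = new ArrayList<>();
        for (List<Entry> bucket : bucketsHighToLow) {
            if (collected.size() >= n) {
                break; // remaining buckets can only hold lower exact weights
            }
            collected.addAll(bucket);
        }
        // Order the prefetched buffer by exact weight, highest first.
        collected.sort(Comparator.comparingDouble((Entry e) -> e.weight).reversed());
        return new ArrayList<>(collected.subList(0, Math.min(n, collected.size())));
    }
}
```

Only the buckets needed to fill the requested count are visited, so the exact 
sort runs over a small buffer rather than over every match.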

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Updated] (LUCENE-3436) Spellchecker "Suggest Mode" Support

2011-09-14 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated LUCENE-3436:
---

Attachment: LUCENE-3436.patch

- Creates a new Enum, "SuggestMode".  
- SpellChecker methods that used to take a boolean "morePopular" have been 
converted to use the new Enum.
- Old SpellChecker methods have been marked as @Deprecated with comments (can 
be removed entirely for Trunk.  Included here for possible 3.x inclusion)
- A new Unit Test method for o.a.l.s.s.SpellChecker tests 
"SUGGEST_MORE_POPULAR" and "SUGGEST_ALWAYS" (previously, "morePopular" had no 
test coverage).
- A new Unit Test scenario added to TestDirectSpellChecker for testing 
"SUGGEST_ALWAYS".

> Spellchecker "Suggest Mode" Support
> ---
>
> Key: LUCENE-3436
> URL: https://issues.apache.org/jira/browse/LUCENE-3436
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Affects Versions: 3.3, 4.0
>Reporter: James Dyer
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3436.patch
>
>
> This is a spin-off from SOLR-2585.
> Currently o.a.l.s.s.SpellChecker and o.a.l.s.s.DirectSpellChecker support two 
> "Suggest Modes":
> 1. Suggest for terms that are not in the index.
> 2. Suggest "more popular" terms.
> This issue is to add a third Suggest Mode:
> 3. Suggest always.
> This will assist users in developing context-sensitive spell suggestions 
> and/or did-you-mean suggestions.  See SOLR-2585 for a full discussion.
> Note that o.a.l.s.s.SpellChecker already can support this functionality, if 
> the user passes in a NULL term & IndexReader.  This, however, is not 
> intuitive.  o.a.l.s.s.DirectSpellChecker currently has no support for this 
> third Suggest Mode.




[jira] [Commented] (SOLR-2756) SolrJ maven dependencies are faulty; needless dependency on lucene-core

2011-09-14 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104857#comment-13104857
 ] 

Steven Rowe commented on SOLR-2756:
---

With the patch applied, the output under {{solr/solrj/}} from {{mvn 
dependency:tree}} under Java 1.6 is now:

{noformat}
[INFO] org.apache.solr:solr-solrj:jar:3.5-SNAPSHOT
[INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile (version managed 
from 1.0.4)
[INFO] |  \- commons-codec:commons-codec:jar:1.4:compile (version managed from 
1.2)
[INFO] +- commons-io:commons-io:jar:1.4:compile
[INFO] +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
[INFO] \- org.slf4j:slf4j-api:jar:1.6.1:compile
{noformat}

> SolrJ maven dependencies are faulty; needless dependency on lucene-core
> ---
>
> Key: SOLR-2756
> URL: https://issues.apache.org/jira/browse/SOLR-2756
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 3.3
>Reporter: David Smiley
>Priority: Minor
> Attachments: SOLR-2756-zookeeper-and-stax-api.patch, 
> SOLR-2756-zookeeper-and-stax-api.patch
>
>
> I included a SolrJ 3.3 dependency in a new project and noticed needless 
> transitive dependencies showing up.
> Here is a subset of the output from mvn dependency:tree:
> {noformat}
> [INFO] +- org.apache.solr:solr-solrj:jar:3.3.0:compile
> [INFO] |  +- org.apache.lucene:lucene-core:jar:3.3.0:compile
> [INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
> [INFO] |  |  \- commons-codec:commons-codec:jar:1.2:compile
> [INFO] |  +- 
> org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:compile
> [INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.3.1:compile
> [INFO] |  |  +- log4j:log4j:jar:1.2.15:compile
> [INFO] |  |  |  \- javax.mail:mail:jar:1.4:compile
> [INFO] |  |  | \- javax.activation:activation:jar:1.1:compile
> [INFO] |  |  \- jline:jline:jar:0.9.94:compile
> [INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:compile
> [INFO] | \- stax:stax-api:jar:1.0.1:compile
> {noformat}
> Clearly there is an inconsistency with solr/dist/solrj-lib and this list.
> * lucene-core dependency should be removed
> * AFAIK, geronimo-stax-api and wstx-asl are only needed for Java 1.5.  Right? 
>  These can be put in a maven profile activated by jdk1.5.
> * zookeeper dependency should be removed. Is this used in Solr 4?  Even if it 
> is, it strikes me as an optional dependency.




[jira] [Updated] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations

2011-09-14 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2585:
-

Attachment: SOLR-2585.patch

As previously discussed, the Lucene portion of this issue has been spun off to 
LUCENE-3436.  

This is a new patch with just the Solr piece.  Also, the new "Suggest Mode" 
enum is used both for the Original Lucene Spell Checker and DirectSpellChecker.

> Context-Sensitive Spelling Suggestions & Collations
> ---
>
> Key: SOLR-2585
> URL: https://issues.apache.org/jira/browse/SOLR-2585
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2585.patch, SOLR-2585.patch, SOLR-2585.patch
>
>
> Solr currently cannot offer what I'm calling here a "context-sensitive" 
> spelling suggestion.  That is, if a user enters one or more words that have 
> docFrequency > 0, but nevertheless are misspelled, then no suggestions are 
> offered.  Currently, Solr will always consider a word "correctly spelled" if 
> it is in the index and/or dictionary, regardless of context.  This issue & 
> patch add support for context-sensitive spelling suggestions. 
> See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical 
> use case for this functionality.  This tests both 
> IndexBasedSpellChecker and DirectSolrSpellChecker. 
> Two new Spelling Parameters were added:
>   - spellcheck.alternativeTermCount - The count of suggestions to return for 
> each query term existing in the index and/or dictionary.  Presumably, users 
> will want fewer suggestions for words with docFrequency>0.  Also setting this 
> value turns "on" context-sensitive spell suggestions. 
>   - spellcheck.maxResultsForSuggest - The maximum number of hits the request 
> can return in order to both generate spelling suggestions and set the 
> "correctlySpelled" element to "false".  For example, if this is set to 5 and 
> the user's query returns 5 or fewer results, the spellchecker will report 
> "correctlySpelled=false" and also offer suggestions (and collations if 
> requested).  Setting this greater than zero is useful for creating 
> "did-you-mean" suggestions for queries that return a low number of hits.
> I have also included a test using shards.  See additions to 
> DistributedSpellCheckComponentTest. 
> In Lucene, SpellChecker.java can already support this functionality (by 
> passing a null IndexReader and field-name).  The DirectSpellChecker, however, 
> needs a minor enhancement.  This gives the option to allow DirectSpellChecker 
> to return suggestions for all query terms regardless of frequency.




[jira] [Commented] (SOLR-2761) FSTLookup should use long-tail like discretization instead of proportional (linear)

2011-09-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104864#comment-13104864
 ] 

David Smiley commented on SOLR-2761:


I recommend that we follow through with the alternative suggested in the source 
code comment: sort by weight and divide evenly.  That will handle the actual 
distribution in the data no matter what the curve looks like.
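
A minimal sketch of the difference between the two mappings, under stated 
assumptions: the class WeightBucketing and both method names are hypothetical 
and are not taken from the FSTLookup source. The first method mimics a 
proportional mapping between minimum and maximum; the second sorts by weight 
and divides the ranks evenly.

```java
import java.util.Arrays;

/** Hypothetical illustration of two discretization strategies. */
final class WeightBucketing {

    /** Proportional (linear) mapping of a weight into [0, buckets - 1]. */
    static int linearBucket(float weight, float min, float max, int buckets) {
        if (max <= min) {
            return 0;
        }
        int bucket = (int) ((weight - min) / (max - min) * buckets);
        return Math.min(bucket, buckets - 1); // the maximum lands in the top bucket
    }

    /**
     * Equal-frequency mapping: rank all weights, then split the ranks into
     * groups of (nearly) equal size. Returns the bucket for each input index.
     * Note that tied weights may fall on either side of a bucket boundary.
     */
    static int[] equalFrequencyBuckets(float[] weights, int buckets) {
        int n = weights.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Float.compare(weights[a], weights[b]));
        int[] result = new int[n];
        for (int rank = 0; rank < n; rank++) {
            result[order[rank]] = (int) ((long) rank * buckets / n);
        }
        return result;
    }
}
```

With weights 1 through 7 plus a single outlier at 1000 and four buckets, the 
linear mapping places the first seven terms in bucket 0 and only the outlier in 
bucket 3, while the equal-frequency mapping assigns two terms to each bucket.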

> FSTLookup should use long-tail like discretization instead of proportional 
> (linear)
> ---
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
>Affects Versions: 3.4
>Reporter: David Smiley
>Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies 
> into a configurable number of buckets (configurable as "weightBuckets") in 
> order to deal with FST limitations. The mapping of a source frequency into a 
> bucket is a proportional (i.e. linear) mapping from the minimum and maximum 
> value. I don't think this makes sense at all given the well-known long-tail 
> like distribution of term frequencies. As a result of this problem, I've 
> found it necessary to increase weightBuckets substantially, like >100, to get 
> quality suggestions. 




[jira] [Created] (SOLR-2767) ClassCastException when using FieldAnalysisResponse and the analyzer list contains the CharMappingFilter (or any CharFilter)

2011-09-14 Thread Spyros Kapnissis (JIRA)
ClassCastException when using FieldAnalysisResponse and the analyzer list 
contains the CharMappingFilter (or any CharFilter)


 Key: SOLR-2767
 URL: https://issues.apache.org/jira/browse/SOLR-2767
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.3, 4.0
Reporter: Spyros Kapnissis


I use the FieldAnalysisResponse class in order to gather some information about 
the analysis process. However, I get a ClassCastException (cannot convert 
String to NamedList) thrown at AnalysisResponseBase.buildPhases method 
if I have included the CharMappingFilter in my configuration.

It seems that a CharFilter does not create a NamedList, but a String 
entry in the analysis response.
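
The failure mode described above, and one defensive way to handle it, can be 
sketched without any Solr classes. Everything here is an assumption for 
illustration: plain Map and List stand in for NamedList, and 
PhaseParsingSketch.describePhases is a hypothetical helper, not the actual 
AnalysisResponseBase.buildPhases code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: tolerate String-valued phases instead of casting blindly. */
final class PhaseParsingSketch {

    /**
     * Each phase maps an analysis stage name to its output. In this sketch,
     * token-producing stages carry a List, while a CharFilter stage carries
     * the filtered text as a String.
     */
    static List<String> describePhases(Map<String, Object> phases) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> phase : phases.entrySet()) {
            Object value = phase.getValue();
            if (value instanceof List) {
                out.add(phase.getKey() + ": " + ((List<?>) value).size() + " tokens");
            } else if (value instanceof String) {
                // An unconditional cast to the list type would throw ClassCastException here.
                out.add(phase.getKey() + ": text \"" + value + "\"");
            } else {
                String type = (value == null) ? "null" : value.getClass().getName();
                out.add(phase.getKey() + ": unsupported " + type);
            }
        }
        return out;
    }
}
```

Checking the runtime type before casting lets both kinds of phase entry be 
read from the same response.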




[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104917#comment-13104917
 ] 

Yonik Seeley commented on SOLR-2066:


I took a quick peek at this, and I see some changes to how distrib search works 
(ShardRequestFactory).
Could you give a brief explanation about the need for that and how it works?  
Maybe changes like this should be in their own issue so it's easy to tell other 
refactoring vs what's needed just for grouping.

> Search Grouping: support distributed search
> ---
>
> Key: SOLR-2066
> URL: https://issues.apache.org/jira/browse/SOLR-2066
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
> Fix For: 3.5, 4.0
>
> Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
> SOLR-2066.patch, SOLR-2066.patch
>
>
> Support distributed field collapsing / search grouping.




[ANNOUNCE] Apache Lucene 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Lucene™ 3.4.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 3.4.0.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:

   http://www.apache.org/dyn/closer.cgi/lucene/java (see note below).

If you are already using Apache Lucene 3.1, 3.2 or 3.3, we strongly
recommend you upgrade to 3.4.0 because of the index corruption bug on
OS or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 3.4.0 Release Highlights:

  * Fixed a major bug (LUCENE-3418) whereby a Lucene index could
easily become corrupted if the OS or computer crashed or lost
power.

  * Added a new faceting module (contrib/facet) for computing facet
counts (both hierarchical and non-hierarchical) at search
time (LUCENE-3079).

  * Added a new join module (contrib/join), enabling indexing and
searching of nested (parent/child) documents using
BlockJoinQuery/Collector (LUCENE-3171).

  * It is now possible to index documents with term frequencies
included but without positions (LUCENE-2048); previously
omitTermFreqAndPositions always omitted both.

  * The modular QueryParser (contrib/queryparser) can now create
NumericRangeQuery.

  * Added SynonymFilter, in contrib/analyzers, to apply multi-word
synonyms during indexing or querying, including parsers to read
the wordnet and solr synonym formats (LUCENE-3233).

  * You can now control how documents that don't have a value on the
sort field should sort (LUCENE-3390), using SortField.setMissingValue.

  * Fixed a case where term vectors could be silently deleted from the
index after addIndexes (LUCENE-3402).

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers




[ANNOUNCE] Apache Solr 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Solr™ 3.4.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0.

Apache Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
distributed search and index replication, and it powers the search and
navigation features of many of the world's largest internet sites.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:

   http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below).

If you are already using Apache Solr 3.1, 3.2 or 3.3, we strongly
recommend you upgrade to 3.4.0 because of the index corruption bug on OS
or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.4.0 Release Highlights:

  * Bug fixes and improvements from Apache Lucene 3.4.0, including a
major bug (LUCENE-3418) whereby a Lucene index could
easily become corrupted if the OS or computer crashed or lost
power.

  * SolrJ client can now parse grouped and range facets results
(SOLR-2523).

  * A new XsltUpdateRequestHandler allows posting XML that's
transformed by a provided XSLT into a valid Solr document
(SOLR-2630).

  * Post-group faceting option (group.truncate) can now compute
facet counts for only the highest ranking documents per-group.
(SOLR-2665).

  * Add commitWithin update request parameter to all update handlers
that were previously missing it.  This tells Solr to commit the
change within the specified amount of time (SOLR-2540).

  * You can now specify NIOFSDirectory (SOLR-2670).

  * New parameter hl.phraseLimit speeds up FastVectorHighlighter
(LUCENE-3234).

  * The query cache and filter cache can now be disabled per request.
See http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
(SOLR-2429).

  * Improved memory usage, build time, and performance of
SynonymFilterFactory (LUCENE-3233).

  * Added omitPositions to the schema, so you can omit position
information while still indexing term frequencies (LUCENE-2048).

  * Various fixes for multi-threaded DataImportHandler.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers




[jira] [Updated] (LUCENE-3380) enable FileSwitchDirectory randomly in tests and fix compound-file/NoSuchDirectoryException bugs

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3380:
---

Fix Version/s: (was: 3.4)
   3.5

> enable FileSwitchDirectory randomly in tests and fix 
> compound-file/NoSuchDirectoryException bugs
> 
>
> Key: LUCENE-3380
> URL: https://issues.apache.org/jira/browse/LUCENE-3380
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3380.patch
>
>
> Looks like FileSwitchDirectory has the same bugs in it as LUCENE-3374.
> We should randomly enable this guy in tests and flush them all out the same 
> way.




[jira] [Updated] (LUCENE-3363) minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3363:
---

Fix Version/s: (was: 3.4)
   3.5

> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> --
>
> Key: LUCENE-3363
> URL: https://issues.apache.org/jira/browse/LUCENE-3363
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch 
> (LUCENE-3030) and comment out the @Ignore in 
> TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {{ant 
> test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite 
> -Dtests.seed=-2577608857970454726:-2463580050179334504}}




[jira] [Updated] (LUCENE-3343) Comparison operators >,>=,<,<= and = support as RangeQuery syntax in QueryParser

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3343:
---

Fix Version/s: (was: 3.4)
   3.5

> Comparison operators >,>=,<,<= and = support as RangeQuery syntax in 
> QueryParser
> 
>
> Key: LUCENE-3343
> URL: https://issues.apache.org/jira/browse/LUCENE-3343
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/queryparser
>Reporter: Olivier Favre
>Assignee: Adriano Crestani
>Priority: Minor
>  Labels: parser, query
> Fix For: 3.5, 4.0
>
> Attachments: NumCompQueryParser-3x.patch, NumCompQueryParser.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> To offer better interoperability with other search engines and to provide an 
> easier and more straightforward syntax,
> the operators >, >=, <, <= and = should be available to express an open range 
> query.
> They should at least work for numeric queries.
> '=' can be made a synonym for ':'.
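
One way to picture the proposal is as a mapping from each operator to a range with at most one bound. The sketch below is only an illustration in plain Java: `ComparisonToRange`, `Range` and `toRange` are hypothetical names, not part of the attached patches, and treating `=` as a closed single-value range is an assumption made for numeric fields.

```java
/** Hypothetical mapping from a comparison operator to an (optionally open) numeric range. */
final class ComparisonToRange {

  /** A range whose bounds may be null, meaning "open" on that side. */
  static final class Range {
    final Long min;
    final Long max;
    final boolean minInclusive;
    final boolean maxInclusive;

    Range(Long min, Long max, boolean minInclusive, boolean maxInclusive) {
      this.min = min;
      this.max = max;
      this.minInclusive = minInclusive;
      this.maxInclusive = maxInclusive;
    }

    /** True if v lies inside the range; a null bound never excludes anything. */
    boolean contains(long v) {
      if (min != null && (minInclusive ? v < min : v <= min)) return false;
      if (max != null && (maxInclusive ? v > max : v >= max)) return false;
      return true;
    }
  }

  /** Translates "op value" into a range; the inclusive flag is ignored for open sides. */
  static Range toRange(String op, long value) {
    switch (op) {
      case ">":  return new Range(value, null, false, false);
      case ">=": return new Range(value, null, true, false);
      case "<":  return new Range(null, value, false, false);
      case "<=": return new Range(null, value, false, true);
      case "=":  return new Range(value, value, true, true); // assumption: exact match as a point range
      default:   throw new IllegalArgumentException("unknown operator: " + op);
    }
  }
}
```

A parser extension would presumably hand the resulting bounds to whatever range query the field type uses.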




[jira] [Updated] (LUCENE-3415) Snowball filter to include original word too

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3415:
---

Fix Version/s: (was: 3.4)
   3.5

> Snowball filter to include original word too
> 
>
> Key: LUCENE-3415
> URL: https://issues.apache.org/jira/browse/LUCENE-3415
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.3
> Environment: All
>Reporter: Manish
>  Labels: features
> Fix For: 3.5, 4.0
>
>
> 1. Currently, the snowball filter deletes the original word and adds only the 
> stemmed word to the index. So if I want to search both with and without 
> stemming, I have to keep 2 fields, one with stemming and one without it. 
> 2. If the filter had a configurable option to preserve the original word, it 
> would solve more business problems. 
> 3. Using a single field, I could then search with or without stemming by 
> changing the query filters. 
> The same could also be done for phonetic filters. 




[jira] [Updated] (LUCENE-3392) Combining analyzers output

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3392:
---

Fix Version/s: (was: 3.4)
   3.5

> Combining analyzers output
> --
>
> Key: LUCENE-3392
> URL: https://issues.apache.org/jira/browse/LUCENE-3392
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Olivier Favre
>Priority: Minor
>  Labels: analysis
> Fix For: 3.5, 4.0
>
> Attachments: ComboAnalyzer-lucene-trunk.patch, 
> ComboAnalyzer-lucene3x.patch, ComboAnalyzer-lucene3x.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It should be easy to combine the output of multiple Analyzers, or 
> TokenStreams.
> A ComboAnalyzer and a ComboTokenStream class would take multiple instances, 
> and multiplex their output, keeping a rough order of tokens like increasing 
> position then increasing start offset then increasing end offset.
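
The ordering described above (position, then start offset, then end offset) can be written as a comparator. The following is a simplified sketch, not the attached ComboAnalyzer patch: `TokenMerge` and `Token` are hypothetical names, positions are absolute rather than increments, and all tokens are buffered and sorted, whereas a real TokenStream would multiplex lazily.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, fully buffered illustration of multiplexing several token streams. */
final class TokenMerge {

  /** Minimal token: term text plus absolute position and character offsets. */
  static final class Token {
    final String term;
    final int position;
    final int startOffset;
    final int endOffset;

    Token(String term, int position, int startOffset, int endOffset) {
      this.term = term;
      this.position = position;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }
  }

  /** Increasing position, then increasing start offset, then increasing end offset. */
  static final Comparator<Token> ORDER =
      Comparator.comparingInt((Token t) -> t.position)
          .thenComparingInt(t -> t.startOffset)
          .thenComparingInt(t -> t.endOffset);

  /** Collects the tokens of every stream and sorts them with ORDER (stable sort). */
  static List<Token> merge(List<List<Token>> streams) {
    List<Token> all = new ArrayList<>();
    for (List<Token> stream : streams) {
      all.addAll(stream);
    }
    all.sort(ORDER);
    return all;
  }
}
```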




[jira] [Updated] (LUCENE-3229) SpanNearQuery: ordered spans should not overlap

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3229:
---

Fix Version/s: (was: 3.4)
   3.5

> SpanNearQuery: ordered spans should not overlap
> ---
>
> Key: LUCENE-3229
> URL: https://issues.apache.org/jira/browse/LUCENE-3229
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1
> Environment: Windows XP, Java 1.6
>Reporter: ludovic Boutros
> Fix For: 3.5
>
> Attachments: LUCENE-3229.patch, LUCENE-3229.patch, SpanOverlap.diff, 
> SpanOverlap2.diff, SpanOverlapTestUnit.diff
>
>
> While using Span queries I think I've found a little bug.
> With a document like this (from the TestNearSpansOrdered unit test) :
> "w1 w2 w3 w4 w5"
> If I try to search for this span query :
> spanNear([spanNear([field:w3, field:w5], 1, true), field:w4], 0, true)
> the above document is returned and I think it should not because 'w4' is not 
> after 'w5'.
> The 2 spans are not ordered, because there is an overlap.
> I will add a test patch in the TestNearSpansOrdered unit test.
> I will add a patch to solve this issue too.
> Basically it modifies the two docSpansOrdered functions to make sure that the 
> spans do not overlap.
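
The report can be reduced to two ordering predicates. In the sketch below, `OrderedSpanCheck` and `Span` are hypothetical stand-ins rather than the code in the attached patches; it assumes token positions start at 0 and that a span's end is exclusive, so the inner span for w3..w5 covers [2, 5) and w4 covers [3, 4).

```java
/** Hypothetical illustration of ordered-span checks with and without an overlap test. */
final class OrderedSpanCheck {

  /** A span covering token positions [start, end); end is exclusive. */
  static final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Looser check: compares start positions first, so overlapping spans can pass. */
  static boolean orderedByStart(Span first, Span second) {
    return first.start == second.start ? first.end < second.end : first.start < second.start;
  }

  /** Stricter check: the first span must end at or before the start of the second. */
  static boolean orderedNonOverlapping(Span first, Span second) {
    return first.end <= second.start;
  }
}
```

With the spans from the example document, the looser check accepts the pair (2 < 3) while the stricter one rejects it (5 > 3), which matches the behavior the reporter expects.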




[jira] [Updated] (LUCENE-3218) Make CFS appendable

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3218:
---

Fix Version/s: (was: 3.4)
   3.5

> Make CFS appendable  
> -
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 3.4, 4.0
>Reporter: Simon Willnauer
>Priority: Blocker
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
> LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
> LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch
>
>
> Currently the CFS is created once all files are written during a flush / merge. 
> Once on disk, the files are copied into the CFS format, which is basically 
> unnecessary for some of the files. We can at any time write at least one file 
> directly into the CFS, which can save a reasonable amount of IO. For instance, 
> stored fields could be written directly during indexing, and during a Codec 
> flush one of the written files can be appended directly. This optimization is 
> a nice side effect for Lucene indexing itself, but it matters more for DocValues 
> and LUCENE-3216: we could transparently pack per-field files into a single 
> file only for DocValues without changing any code once LUCENE-3216 is 
> resolved.




[jira] [Updated] (LUCENE-3184) add LuceneTestCase.rarely()/LuceneTestCase.atLeast()

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3184:
---

Fix Version/s: (was: 3.4)
   3.5

> add LuceneTestCase.rarely()/LuceneTestCase.atLeast()
> 
>
> Key: LUCENE-3184
> URL: https://issues.apache.org/jira/browse/LUCENE-3184
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3184.patch
>
>
> in LUCENE-3175, the tests were sped up a lot by using reasonable number of 
> iterations normally, but cranking up for NIGHTLY.
> we also do crazy things more 'rarely' for normal builds (e.g. simpletext, 
> payloads, crazy merge params, etc)
> also, we found some bugs by doing this, because in general our parameters are 
> too fixed.
> however, it made the code look messy... I propose some new methods:
> instead of some crazy code in your test like:
> {code}
> int numdocs = (TEST_NIGHTLY ? 1000 : 100) * RANDOM_MULTIPLIER;
> {code}
> you use:
> {code}
> int numdocs = atLeast(100);
> {code}
> this will apply the multiplier, also factor in nightly, and finally add some 
> random fudge... so e.g. in local runs it's sometimes 127 docs, sometimes 113 
> docs, etc.
> additionally instead of code like:
> {code}
> if ((TEST_NIGHTLY && random.nextBoolean()) || (random.nextInt(20) == 17)) {
> {code}
> you do
> {code}
> if (rarely()) {
> {code}
> which applies NIGHTLY and also the multiplier (logarithmic growth).
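
A rough, self-contained sketch of how such helpers could behave is given below. It is not the LuceneTestCase implementation: the class name `TestScaling`, the factor of 10 for nightly runs, the 50% fudge range and the percentages in `rarely()` are all assumptions chosen for illustration.

```java
import java.util.Random;

/** Hypothetical stand-in for the proposed helpers; names and constants are assumptions. */
final class TestScaling {
  private final Random random;
  private final boolean nightly;
  private final int multiplier;

  TestScaling(long seed, boolean nightly, int multiplier) {
    this.random = new Random(seed);
    this.nightly = nightly;
    this.multiplier = Math.max(1, multiplier);
  }

  /** Returns at least min, scaled for nightly runs and the multiplier, plus up to 50% random fudge. */
  int atLeast(int min) {
    int base = (nightly ? 10 * min : min) * multiplier;
    return base + random.nextInt(base / 2 + 1);
  }

  /** True with a small probability that is higher on nightly runs and grows with log2(multiplier). */
  boolean rarely() {
    int percent = nightly ? 10 : 1;
    percent += (int) (Math.log(multiplier) / Math.log(2));
    return random.nextInt(100) < Math.min(percent, 50);
  }
}
```

Keeping the seed in the constructor means a failing run can be reproduced by reusing the same seed.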




[jira] [Updated] (LUCENE-3237) FSDirectory.fsync() may not work properly

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3237:
---

Fix Version/s: (was: 3.4)
   3.5

> FSDirectory.fsync() may not work properly
> -
>
> Key: LUCENE-3237
> URL: https://issues.apache.org/jira/browse/LUCENE-3237
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/store
>Reporter: Shai Erera
> Fix For: 3.5, 4.0
>
>
> Spinoff from LUCENE-3230. FSDirectory.fsync() opens a new RAF, sync()s its 
> FileDescriptor and closes the RAF. It is not clear that this syncs whatever was 
> written to the file by other FileDescriptors. It would be better if we do 
> this operation on the actual RAF/FileOS which wrote the data. We can add 
> sync() to IndexOutput and FSIndexOutput will do that.
> Directory-wise, we should stop syncing on file names, and instead sync on the 
> IOs that performed the write operations.
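
The idea of syncing through the same handle that wrote the data can be shown with plain JDK classes. This is only a sketch: `SyncOnWriter` and `writeAndSync` are hypothetical names, and it says nothing about how IndexOutput or FSIndexOutput would actually expose sync().

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical illustration: sync the descriptor that performed the write. */
final class SyncOnWriter {

  /** Writes data and forces it to the device via the writer's own FileDescriptor. */
  static void writeAndSync(File file, byte[] data) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(data);
      out.flush();
      // Sync on the descriptor that wrote the bytes, not on one opened afterwards by name.
      out.getFD().sync();
    }
  }
}
```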




[jira] [Updated] (LUCENE-3269) Speed up Top-K sampling tests

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3269:
---

Fix Version/s: (was: 3.4)
   3.5

> Speed up Top-K sampling tests
> -
>
> Key: LUCENE-3269
> URL: https://issues.apache.org/jira/browse/LUCENE-3269
> Project: Lucene - Java
>  Issue Type: Test
>  Components: modules/facet
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3269.patch, LUCENE-3269.patch, LUCENE-3269.patch
>
>
> speed up the top-k sampling tests (but make sure they are thorough on nightly 
> etc still)
> usually we would do this with use of atLeast(), but these tests are somewhat 
> tricky,
> so maybe a different approach is needed.




[jira] [Updated] (LUCENE-3206) FST package API refactoring

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3206:
---

Fix Version/s: (was: 3.4)
   3.5

> FST package API refactoring
> ---
>
> Key: LUCENE-3206
> URL: https://issues.apache.org/jira/browse/LUCENE-3206
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Affects Versions: 3.2
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3206.patch
>
>
> The current API is still marked @experimental, so I think there's still time 
> to fiddle with it. I've been using the current API for some time and I do 
> have some ideas for improvement. This is a placeholder for these -- I'll post 
> a patch once I have a working proof of concept.




[jira] [Updated] (LUCENE-3339) TestNRTThreads hangs in nightly 3.x builds

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3339:
---

Fix Version/s: (was: 3.4)
   3.5

> TestNRTThreads hangs in nightly 3.x builds
> --
>
> Key: LUCENE-3339
> URL: https://issues.apache.org/jira/browse/LUCENE-3339
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Michael McCandless
> Fix For: 3.5
>
> Attachments: LUCENE-3339.patch
>
>
> Maybe we have a problem, maybe it's a bug in the test.
> But it's strange that lately the 3.x nightlies have been hanging here.




[jira] [Updated] (LUCENE-3205) remove MultiTermQuery get/inc/clear totalNumberOfTerms

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3205:
---

Fix Version/s: (was: 3.4)
   3.5

> remove MultiTermQuery get/inc/clear totalNumberOfTerms
> --
>
> Key: LUCENE-3205
> URL: https://issues.apache.org/jira/browse/LUCENE-3205
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Uwe Schindler
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3205.patch
>
>
> This method is not correct if the index has more than one segment.
> It's also not thread safe, and it means calling query.rewrite() modifies
> the original query. 
> All of these things add up to confusion; I think we should remove this 
> from MultiTermQuery. The only thing that "uses" it is the NRQ tests, which 
> conditionalize all the asserts anyway.




[jira] [Updated] (LUCENE-3175) speed up core tests

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3175:
---

Fix Version/s: (was: 3.4)
   3.5

> speed up core tests
> ---
>
> Key: LUCENE-3175
> URL: https://issues.apache.org/jira/browse/LUCENE-3175
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3175.patch, LUCENE-3175.patch, LUCENE-3175.patch, 
> LUCENE-3175_2.patch, test-core_core_2_duo_2-53GHZ.rtf, 
> test-core_core_2_duo_2-53GHZ.rtf
>
>
> Our core tests have gotten slower and slower; if you don't have a really fast 
> computer it's probably frustrating.
> I think we should:
> 1. still have random parameters, but make the 'obscene' settings like 
> SimpleText rarer... we can always make them happen more on NIGHTLY
> 2. tests that make a lot of documents can conditionalize on NIGHTLY so that 
> they are still doing a reasonable test on ordinary runs e.g. numdocs = 
> (NIGHTLY ? 1 : 1000) * multiplier
> 3. refactor some of the slow huge classes with lots of tests like 
> TestIW/TestIR, at least pull out really slow methods like TestIR.testDiskFull 
> into its own class. this gives better parallelization.




[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3201:
---

Fix Version/s: (was: 3.4)
   3.5

> improved compound file handling
> ---
>
> Key: LUCENE-3201
> URL: https://issues.apache.org/jira/browse/LUCENE-3201
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Robert Muir
>Assignee: Simon Willnauer
>Priority: Blocker
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements. I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput),
> add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldn't always throw read past EOF when you should,
> as a user could read into the next file and be left unaware.
> However, some directories could handle this better. For example MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file.
> Its underlying ByteBuffer naturally does bounds checks already, so it 
> wouldn't need to be buffered, and wouldn't even need to add any offsets to seek(),
> as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact compound files are implemented as a 
> Directory and transparent. at least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.
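
The "slice" idea can be illustrated without any Lucene types. In the sketch below a byte array stands in for the compound file, and `SliceInput` is a hypothetical class, not a proposed API: positions are relative to the sub-file, the offset is added internally, and reads are bounds-checked so a caller cannot run into the next file unnoticed.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical slice view over one sub-file of a compound file; not a Lucene class. */
final class SliceInput {
  private final byte[] compound; // stands in for the underlying compound file
  private final int offset;      // where the sub-file starts inside the compound file
  private final int length;      // length of the sub-file
  private int pos;               // read position, relative to the sub-file

  SliceInput(byte[] compound, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > compound.length) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    this.compound = compound;
    this.offset = offset;
    this.length = length;
  }

  /** Length of the slice, not of the whole compound file. */
  long length() {
    return length;
  }

  /** Seeks relative to the start of the slice. */
  void seek(int newPos) {
    if (newPos < 0 || newPos > length) {
      throw new IllegalArgumentException("seek outside slice: " + newPos);
    }
    pos = newPos;
  }

  /** Reads one byte, refusing to read past the end of the slice. */
  byte readByte() throws IOException {
    if (pos >= length) {
      throw new EOFException("read past EOF of slice");
    }
    return compound[offset + pos++];
  }
}
```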




[jira] [Updated] (LUCENE-3150) Wherever we catch & suppress Throwable we should not suppress ThreadInterruptedException

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3150:
---

Fix Version/s: (was: 3.4)
   3.5

> Wherever we catch & suppress Throwable we should not suppress 
> ThreadInterruptedException
> 
>
> Key: LUCENE-3150
> URL: https://issues.apache.org/jira/browse/LUCENE-3150
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> In various places we catch Throwable and suppress it, usually in exception 
> handlers where we want to just throw the first exc we had hit.
> But this is dangerous for a thread interrupt since it means we can swallow & 
> ignore the interrupt.
> We should at least catch the interrupt & restore the interrupt bit, if we 
> can't rethrow it.
> One example is in SegmentInfos where we write the segments.gen file... there 
> are many other examples in SegmentInfos too.
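
A minimal sketch of the "restore the interrupt bit" pattern follows. `QuietClose` and `closeSuppressing` are hypothetical names, and the checked `java.lang.InterruptedException` is used here as a stand-in for Lucene's own ThreadInterruptedException so that the example needs only the JDK.

```java
/** Hypothetical helper: suppress failures during cleanup without swallowing an interrupt. */
final class QuietClose {

  /** Closes the resource, dropping any failure but preserving the thread's interrupt status. */
  static void closeSuppressing(AutoCloseable resource) {
    try {
      resource.close();
    } catch (Throwable t) {
      if (t instanceof InterruptedException) {
        // We cannot rethrow here, so restore the flag for callers further up the stack.
        Thread.currentThread().interrupt();
      }
      // Every other failure is deliberately dropped so that an earlier exception wins.
    }
  }
}
```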




[jira] [Updated] (LUCENE-3161) consider warnings from the source compilation

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3161:
---

Fix Version/s: (was: 3.4)
   3.5

> consider warnings from the source compilation
> -
>
> Key: LUCENE-3161
> URL: https://issues.apache.org/jira/browse/LUCENE-3161
> Project: Lucene - Java
>  Issue Type: Task
>  Components: general/build
>Reporter: Robert Muir
>  Labels: maybe32blocker
> Fix For: 3.5, 4.0
>
>
> as Doron mentioned in his review: When compiling, there are various warnings 
> printed; I think it would be more reassuring for downloaders if the build runs 
> without warnings. These warnings are not a stopper.
> we could conditionalize these warnings so that they don't "display" when 
> compiling from actual releases, but I have to wonder if we should hide 
> these... being open source I think we should display all our warts, maybe 
> some contributor sees these warnings and decides they want to submit a patch 
> to fix some of them.




[jira] [Updated] (LUCENE-3133) Fix QueryParser to handle nested fields

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3133:
---

Fix Version/s: (was: 3.4)
   3.5

> Fix QueryParser to handle nested fields
> ---
>
> Key: LUCENE-3133
> URL: https://issues.apache.org/jira/browse/LUCENE-3133
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> Once we commit LUCENE-2454, we need to make it easy for apps to enable this 
> with QueryParser.
> It seems like it's a "schema" like behavior, ie we need to be able to express 
> the join structure of the related fields.
> And then whenever QP produces a query that spans fields requiring a join, the 
> NestedDocumentQuery is used to wrap the child fields?




[jira] [Updated] (LUCENE-3122) Cascaded grouping

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3122:
---

Fix Version/s: (was: 3.4)
   3.5

> Cascaded grouping
> -
>
> Key: LUCENE-3122
> URL: https://issues.apache.org/jira/browse/LUCENE-3122
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> Similar to SOLR-2526, in that you are grouping on 2 separate fields, but 
> instead of treating those fields as a single grouping by a compound key, this 
> change would let you first group on key1 for the primary groups and then 
> secondarily on key2 within the primary groups.
> Ie, the result you get back would have groups A, B, C (grouped by key1) but 
> then the documents within group A would be grouped by key 2.
> I think this will be important for apps whose documents are the product of 
> denormalizing, ie where the Lucene document is really a sub-document of a 
> different identifier field.  Borrowing an example from LUCENE-3097, you have 
> doctors but each doctor may have multiple offices (addresses) where they 
> practice and so you index doctor X address as your lucene documents.  In this 
> case, your "identifier" field (that which "counts" for facets, and should be 
> "grouped" for presentation) is doctorid.  When you offer users search over 
> this index, you'd likely want to 1) group by distance (ie, < 0.1 miles, < 0.2 
> miles, etc., as a function query), but 2) also group by doctorid, ie cascaded 
> grouping.
> I suspect this would be easier to implement than it sounds: the per-group 
> collector used by the 2nd pass grouping collector for key1's grouping just 
> needs to be another grouping collector.  Spookily, though, that collection 
> would also have to be 2-pass, so it could get tricky since grouping is sort 
> of recursing on itself. Once we have LUCENE-3112, though, that should 
> enable efficient single pass grouping by the identifier (doctorid).
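
The shape of the result (primary groups by key1, each holding secondary groups by key2) can be shown with nested maps. This is an in-memory illustration only: `CascadedGrouping` and `Doc` are hypothetical, and nothing here reflects how the grouping collectors would do this in one or two passes over an index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of cascaded (two-level) grouping. */
final class CascadedGrouping {

  /** A document with a primary group key, a secondary group key and an id. */
  static final class Doc {
    final String key1;
    final String key2;
    final String id;

    Doc(String key1, String key2, String id) {
      this.key1 = key1;
      this.key2 = key2;
      this.id = id;
    }
  }

  /** Groups by key1 first, then by key2 inside each primary group, keeping encounter order. */
  static Map<String, Map<String, List<String>>> group(List<Doc> docs) {
    Map<String, Map<String, List<String>>> primary = new LinkedHashMap<>();
    for (Doc d : docs) {
      primary
          .computeIfAbsent(d.key1, k -> new LinkedHashMap<>())
          .computeIfAbsent(d.key2, k -> new ArrayList<>())
          .add(d.id);
    }
    return primary;
  }
}
```

For the doctor example, key1 would be the distance bucket and key2 the doctorid, so each distance bucket lists doctors, and each doctor lists the matching offices.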




[jira] [Updated] (LUCENE-3145) FST APIs should support CharsRef too

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3145:
---

Fix Version/s: (was: 3.4)
   3.5

> FST APIs should support CharsRef too
> 
>
> Key: LUCENE-3145
> URL: https://issues.apache.org/jira/browse/LUCENE-3145
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> The Builder API at heart is IntsRef, but we have sugar to pass in BytesRef, 
> CharSequence, etc.  We should add CharsRef too.
> Likewise we have IntsRefFSTEnum, BytesRefFSTEnum; we should add CharsRef 
> there.
> Finally the static Util methods should accept CharsRef.




[jira] [Updated] (LUCENE-3004) Define Test Plan for 3.2

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3004:
---

Fix Version/s: (was: 3.4)
   3.5

> Define Test Plan for 3.2
> 
>
> Key: LUCENE-3004
> URL: https://issues.apache.org/jira/browse/LUCENE-3004
> Project: Lucene - Java
>  Issue Type: Test
>Reporter: Grant Ingersoll
>Priority: Blocker
> Fix For: 3.5, 4.0
>
>
> Before we can release, we need a test plan that defines what a successful 
> release candidate must do to be accepted.
> Test plan should be written at http://wiki.apache.org/lucene-java/TestPlans
> See 
> http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process




[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3120:
---

Fix Version/s: (was: 3.4)
   3.5

> span query matches too many docs when two query terms are the same unless 
> inOrder=true
> --
>
> Key: LUCENE-3120
> URL: https://issues.apache.org/jira/browse/LUCENE-3120
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3120.patch, LUCENE-3120.patch
>
>
> spinoff of user list discussion - [SpanNearQuery - inOrder 
> parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
> With 3 documents:
> *  "a b x c d"
> *  "a b b d"
> *  "a b x b y d"
> Here are a few queries (the number in parenthesis indicates expected #hits):
> These ones work *as expected*:
> * (1)  in-order, slop=0, "b", "x", "b"
> * (1)  in-order, slop=0, "b", "b"
> * (2)  in-order, slop=1, "b", "b"
> These ones match *too many* hits:
> * (1)  any-order, slop=0, "b", "x", "b"
> * (1)  any-order, slop=1, "b", "x", "b"
> * (1)  any-order, slop=2, "b", "x", "b"
> * (1)  any-order, slop=3, "b", "x", "b"
> These ones match *too many* hits as well:
> * (1)  any-order, slop=0, "b", "b"
> * (2)  any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no 
> in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span 
> queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
> so we might decide to close this bug after all, but I would like to at least 
> have the junit that exposes the behavior in JIRA.




[jira] [Updated] (LUCENE-2822) TimeLimitingCollector starts thread in static {} with no way to stop them

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2822:
---

Fix Version/s: (was: 3.4)
   3.5

> TimeLimitingCollector starts thread in static {} with no way to stop them
> -
>
> Key: LUCENE-2822
> URL: https://issues.apache.org/jira/browse/LUCENE-2822
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
>
> See the comment in LuceneTestCase.
> If you even do Class.forName("TimeLimitingCollector") it starts up a thread 
> in a static method, and there isn't a way to kill it.
> This is broken.
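
For contrast with a thread started from a static initializer, here is a sketch of a timer thread with an explicit lifecycle. `StoppableTimer` is a hypothetical class, not a proposed change to TimeLimitingCollector; the 1 ms sleep and the daemon flag are arbitrary choices for the illustration.

```java
/** Hypothetical timer thread that is started and stopped explicitly by its owner. */
final class StoppableTimer {
  private volatile boolean running;
  private volatile long ticks; // written only by the timer thread
  private Thread thread;

  /** Starts the timer thread if it is not already running. */
  synchronized void start() {
    if (thread != null) {
      return;
    }
    running = true;
    thread = new Thread(() -> {
      while (running) {
        ticks++;
        try {
          Thread.sleep(1);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "stoppable-timer");
    thread.setDaemon(true);
    thread.start();
  }

  /** Stops the timer thread and waits for it to exit. */
  synchronized void stop() {
    if (thread == null) {
      return;
    }
    running = false;
    thread.interrupt();
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    thread = null;
  }

  synchronized boolean isRunning() {
    return thread != null && thread.isAlive();
  }

  long ticks() {
    return ticks;
  }
}
```

Merely loading this class starts nothing, and a test framework can call stop() before checking for leaked threads.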




[jira] [Updated] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3071:
---

Fix Version/s: (was: 3.4)
   3.5

> PathHierarchyTokenizer adaptation for urls: splits reversed
> ---
>
> Key: LUCENE-3071
> URL: https://issues.apache.org/jira/browse/LUCENE-3071
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Olivier Favre
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3071.patch, LUCENE-3071.patch, LUCENE-3071.patch, 
> LUCENE-3071.patch, ant.log.tar.bz2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> {{PathHierarchyTokenizer}} should be usable to split urls in a "reversed" 
> way (useful for faceted search against urls):
> {{www.site.com}} -> {{www.site.com, site.com, com}}
> Moreover, it should be able to skip a given number of first (or last, if 
> reversed) tokens:
> {{/usr/share/doc/somesoftware/INTERESTING/PART}}
> Should give with 4 tokens skipped:
> {{INTERESTING}}
> {{INTERESTING/PART}}
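
The two behaviours requested above can be sketched on plain strings. This is not the attached patch and not the PathHierarchyTokenizer API: PathSplitSketch and split() are hypothetical names, and empty components (such as the one before a leading delimiter) are simply dropped to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch working on strings, not on a Lucene TokenStream.
class PathSplitSketch {
    static List<String> split(String input, char delimiter, boolean reverse, int skip) {
        String d = String.valueOf(delimiter);
        List<String> parts = new ArrayList<>();
        for (String p : input.split(Pattern.quote(d))) {
            if (!p.isEmpty()) {
                parts.add(p); // simplification: ignore empty components
            }
        }
        List<String> out = new ArrayList<>();
        if (skip >= parts.size()) {
            return out; // everything was skipped
        }
        if (reverse) {
            // skip the last components, then emit shrinking suffixes
            List<String> kept = parts.subList(0, parts.size() - skip);
            for (int i = 0; i < kept.size(); i++) {
                out.add(String.join(d, kept.subList(i, kept.size())));
            }
        } else {
            // skip the first components, then emit growing prefixes
            List<String> kept = parts.subList(skip, parts.size());
            for (int i = 1; i <= kept.size(); i++) {
                out.add(String.join(d, kept.subList(0, i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("www.site.com", '.', true, 0));
        System.out.println(Arrays.asList("skip 4:",
            split("/usr/share/doc/somesoftware/INTERESTING/PART", '/', false, 4)));
    }
}
```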




[jira] [Updated] (LUCENE-3116) pendingCommit in IndexWriter is not thoroughly tested

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3116:
---

Fix Version/s: (was: 3.4)
   3.5

> pendingCommit in IndexWriter is not thoroughly tested
> -
>
> Key: LUCENE-3116
> URL: https://issues.apache.org/jira/browse/LUCENE-3116
> Project: Lucene - Java
>  Issue Type: Test
>  Components: core/index
>Affects Versions: 3.2, 4.0
>Reporter: Uwe Schindler
> Fix For: 3.5, 4.0
>
>
> When working on LUCENE-3084, I had a copy-paste error in my patch (see 
> revision 1124307, corrected in 1124316): I replaced pendingCommit by 
> segmentInfos in IndexWriter. It was corrected by the following patch:
> {noformat}
> --- lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
> (original)
> +++ lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/IndexWriter.java 
> Wed May 18 16:16:29 2011
> @@ -2552,7 +2552,7 @@ public class IndexWriter implements Clos
>  lastCommitChangeCount = pendingCommitChangeCount;
>  segmentInfos.updateGeneration(pendingCommit);
>  segmentInfos.setUserData(pendingCommit.getUserData());
> -rollbackSegments = segmentInfos.createBackupSegmentInfos(true);
> +rollbackSegments = pendingCommit.createBackupSegmentInfos(true);
>  deleter.checkpoint(pendingCommit, true);
>} finally {
>  // Matches the incRef done in startCommit:
> {noformat}
> This did not cause any test failure.
> On IRC, Mike said:
> {quote}
> [19:21]   mikemccand: ThetaPh1: hmm
> [19:21]   mikemccand: well
> [19:22]   mikemccand: pendingCommit and sis only differ while commit() is 
> running
> [19:22]   mikemccand: ie if a thread starts commit
> [19:22]   mikemccand: but fsync is taking a long time
> [19:22]   mikemccand: and another thread makes a change to sis
> [19:22]   ThetaPh1: ok so hard to find that bug
> [19:22]   mikemccand: we need our mock dir wrapper to sometimes take a 
> long time syncing
> {quote}
> Maybe we need such a test; I feel bad when such stupid changes don't make any 
> test fail.




[jira] [Updated] (LUCENE-3138) IW.addIndexes should fail fast if an index is too old/new

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3138:
---

Fix Version/s: (was: 3.4)
   3.5

> IW.addIndexes should fail fast if an index is too old/new
> -
>
> Key: LUCENE-3138
> URL: https://issues.apache.org/jira/browse/LUCENE-3138
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> Today IW.addIndexes (both Dir and IR versions) does not check the format of the 
> incoming indexes. Therefore it could add an index that is too old/new, and the 
> app will discover that only later, maybe during commit() or segment merges. We 
> should check that up front and fail fast.
> This issue is relevant only to 4.0 at the moment, which will not support 2.x 
> indexes anymore.
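
The fail-fast idea can be sketched as "validate every incoming index before touching the target". Everything here is an assumption for illustration: index formats are modelled as plain major version numbers, and FailFastAddIndexes, OLDEST_SUPPORTED and NEWEST_SUPPORTED are hypothetical names rather than IndexWriter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: index formats are modelled as plain major version numbers.
class FailFastAddIndexes {
    static final int OLDEST_SUPPORTED = 3; // assumption: 2.x is no longer readable
    static final int NEWEST_SUPPORTED = 4; // assumption: the writer itself is 4.x

    static void addIndexes(List<Integer> target, int... incoming) {
        // First pass: reject unsupported formats before changing anything.
        for (int version : incoming) {
            if (version < OLDEST_SUPPORTED || version > NEWEST_SUPPORTED) {
                throw new IllegalArgumentException(
                    "unsupported index format version: " + version);
            }
        }
        // Second pass: only reached when every incoming index is acceptable.
        for (int version : incoming) {
            target.add(version);
        }
    }

    public static void main(String[] args) {
        List<Integer> target = new ArrayList<>();
        addIndexes(target, 3, 4);
        System.out.println(target);
    }
}
```

Because the check runs before the first add, a rejected call leaves the target unchanged instead of failing later during commit() or a merge.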




[jira] [Updated] (LUCENE-3097) Post grouping faceting

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3097:
---

Fix Version/s: (was: 3.4)

> Post grouping faceting
> --
>
> Key: LUCENE-3097
> URL: https://issues.apache.org/jira/browse/LUCENE-3097
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/grouping
>Reporter: Martijn van Groningen
>Assignee: Martijn van Groningen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-3097.patch, 
> LUCENE-3097.patch, LUCENE-3097.patch, LUCENE-30971.patch
>
>
> This issue focuses on implementing post grouping faceting.
> * How to handle multivalued fields. What field value to show with the facet.
> * What the facet counts should be based on
> ** Facet counts can be based on the normal documents. Ungrouped counts. 
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet 
> value. Matrix counts.   
> And probably more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first 
> option it calculates a DocSet based on the individual documents from the 
> query result. For the second option it calculates a DocSet for all the most 
> relevant documents of a group. Once the DocSet is computed the FacetComponent 
> and StatsComponent use the DocSet to create facets and statistics.  
> This last one is a bit more complex. I think it is best explained with an 
> example. Let's say we search on travel offers:
> ||hotel||departure_airport||duration||
> |Hotel a|AMS|5|
> |Hotel a|DUS|10|
> |Hotel b|AMS|5|
> |Hotel b|AMS|10|
> If we group by hotel and have a facet for airport, most end users expect 
> (in my experience, of course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get 
> counts AMS:3 and DUS:1 or 1 for both airports.
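
The difference between ungrouped counts and matrix counts can be checked on the four travel offers from the table. This is a minimal sketch, not the SOLR-236 code: GroupFacetSketch, ungrouped() and matrix() are hypothetical names, and each document is reduced to a {hotel, departure_airport} pair.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: each doc is {group value, facet value}.
class GroupFacetSketch {
    /** Ungrouped counts: every matching document counts once. */
    static Map<String, Integer> ungrouped(String[][] docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            counts.merge(doc[1], 1, Integer::sum);
        }
        return counts;
    }

    /** Matrix counts: every distinct (group, facet) combination counts once. */
    static Map<String, Integer> matrix(String[][] docs) {
        Set<String> seen = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            String pair = doc[0] + '\u0000' + doc[1]; // separator unlikely to occur in values
            if (seen.add(pair)) {
                counts.merge(doc[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] offers = {
            {"Hotel a", "AMS"}, {"Hotel a", "DUS"},
            {"Hotel b", "AMS"}, {"Hotel b", "AMS"},
        };
        System.out.println("ungrouped: " + ungrouped(offers));
        System.out.println("matrix:    " + matrix(offers));
    }
}
```

On this data the ungrouped variant yields AMS 3 and DUS 1, while the matrix variant yields AMS 2 and DUS 1, which matches the facet end users are said to expect.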




[jira] [Updated] (LUCENE-3003) Move UnInvertedField into Lucene core

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3003:
---

Fix Version/s: (was: 3.4)
   3.5

> Move UnInvertedField into Lucene core
> -
>
> Key: LUCENE-3003
> URL: https://issues.apache.org/jira/browse/LUCENE-3003
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3003.patch, LUCENE-3003.patch, 
> byte_size_32-bit-openjdk6.txt
>
>
> Solr's UnInvertedField lets you quickly look up all term ords for a
> given doc/field.
> Like FieldCache, it inverts the index to produce this, and creates a
> RAM-resident data structure holding the bits; but, unlike FieldCache,
> it can handle multiple values per doc, and, it does not hold the term
> bytes in RAM.  Rather, it holds only term ords, and then uses
> TermsEnum to resolve ord -> term.
> This is great eg for faceting, where you want to use int ords for all
> of your counting, and then only at the end you need to resolve the
> "top N" ords to their text.
> I think this is a useful core functionality, and we should move most
> of it into Lucene's core.  It's a good complement to FieldCache.  For
> this first baby step, I just move it into core and refactor Solr's
> usage of it.
> After this, as separate issues, I think there are some things we could
> explore/improve:
>   * The first-pass that allocates lots of tiny byte[] looks like it
> could be inefficient.  Maybe we could use the byte slices from the
> indexer for this...
>   * We can improve the RAM efficiency of the TermIndex: if the codec
> supports ords, and we are operating on one segment, we should just
> use it.  If not, we can use a more RAM-efficient data structure,
> eg an FST mapping to the ord.
>   * We may be able to improve on the main byte[] representation by
> using packed ints instead of delta-vInt?
>   * Eventually we should fold this ability into docvalues, ie we'd
> write the byte[] image at indexing time, and then loading would be
> fast, instead of uninverting
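
The faceting pattern described above (count with int ords, resolve only the "top N" ords to text at the end) can be sketched with plain arrays. This is an illustration, not UnInvertedField itself: OrdFacetSketch and topTerms() are hypothetical names, the per-document ords are given as an int[][] and a String[] stands in for the TermsEnum lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: arrays stand in for the uninverted field and the terms dictionary.
class OrdFacetSketch {
    static List<String> topTerms(int[][] docOrds, String[] termsByOrd, int n) {
        // Counting touches only ints; no term bytes are needed here.
        int[] counts = new int[termsByOrd.length];
        for (int[] ords : docOrds) {
            for (int ord : ords) {
                counts[ord]++;
            }
        }
        // Order ords by descending count, ties broken by ascending ord.
        Integer[] order = new Integer[counts.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> counts[a] != counts[b]
            ? Integer.compare(counts[b], counts[a])
            : Integer.compare(a, b));
        // Only now resolve the winning ords to their text.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, order.length); i++) {
            out.add(termsByOrd[order[i]] + "=" + counts[order[i]]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"ams", "dus", "fra"};
        int[][] docs = {{0, 1}, {0}, {2, 0}, {1}};
        System.out.println(topTerms(docs, terms, 2));
    }
}
```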




[jira] [Updated] (LUCENE-3019) FVH: uncontrollable color tags

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3019:
---

Fix Version/s: (was: 3.4)
   3.5

> FVH: uncontrollable color tags
> --
>
> Key: LUCENE-3019
> URL: https://issues.apache.org/jira/browse/LUCENE-3019
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9.4, 3.0.3, 3.1, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3019.patch
>
>
> Multi-colored tags are a feature of FVH, but which color is used for each 
> term is uncontrollable (or more precisely, unexpected by users).




[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3022:
---

Fix Version/s: (was: 3.4)
   3.5

> DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
> -
>
> Key: LUCENE-3022
> URL: https://issues.apache.org/jira/browse/LUCENE-3022
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 2.9.4, 3.1
>Reporter: Johann Höchtl
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3022.patch, LUCENE-3022.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> When using the DictionaryCompoundWordTokenFilter with a German dictionary, I 
> got strange behaviour:
> The German word "streifenbluse" (blouse with stripes) was decompounded to 
> "streifen" (stripe), "reifen" (tire), which makes no sense at all.
> I thought the flag onlyLongestMatch would fix this, because "streifen" is 
> longer than "reifen", but it had no effect.
> So I reviewed the source code and found the problem:
> [code]
> protected void decomposeInternal(final Token token) {
> // Only words longer than minWordSize get processed
> if (token.length() < this.minWordSize) {
>   return;
> }
> 
> char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.buffer());
> 
> for (int i=0;i<token.length()-this.minSubwordSize;++i) {
> Token longestMatchToken=null;
> for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
> if(i+j>token.length()) {
> break;
> }
> if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
> if (this.onlyLongestMatch) {
>if (longestMatchToken!=null) {
>      if (longestMatchToken.length()<j) {
>        longestMatchToken=createToken(i,j,token);
>  }
>} else {
>  longestMatchToken=createToken(i,j,token);
>}
> } else {
>tokens.add(createToken(i,j,token));
> }
> } 
> }
> if (this.onlyLongestMatch && longestMatchToken!=null) {
>   tokens.add(longestMatchToken);
> }
> }
>   }
> [/code]
> should be changed to 
> [code]
> protected void decomposeInternal(final Token token) {
> // Only words longer than minWordSize get processed
> if (token.termLength() < this.minWordSize) {
>   return;
> }
> char[] lowerCaseTermBuffer=makeLowerCaseCopy(token.termBuffer());
> Token longestMatchToken=null;
> for (int i=0;i<token.termLength()-this.minSubwordSize;++i) {
> for (int j=this.minSubwordSize-1;j<this.maxSubwordSize;++j) {
> if(i+j>token.termLength()) {
> break;
> }
> if(dictionary.contains(lowerCaseTermBuffer, i, j)) {
> if (this.onlyLongestMatch) {
>if (longestMatchToken!=null) {
>      if (longestMatchToken.termLength()<j) {
>        longestMatchToken=createToken(i,j,token);
>  }
>} else {
>  longestMatchToken=createToken(i,j,token);
>}
> } else {
>tokens.add(createToken(i,j,token));
> }
> }
> }
> }
> if (this.onlyLongestMatch && longestMatchToken!=null) {
> tokens.add(longestMatchToken);
> }
>   }
> [/code]
> This way only the longest token is really indexed and the onlyLongestMatch 
> flag has an effect.
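
The effect of hoisting longestMatchToken out of the loops, as the reporter proposes, can be tried on plain strings. This is a simplified sketch and not the filter's code: LongestMatchSketch and decompose() are hypothetical names, a Set<String> stands in for the dictionary, and subword lengths run from minSub to maxSub inclusive rather than following the original index arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch working on strings instead of Lucene Token objects.
class LongestMatchSketch {
    static List<String> decompose(String word, Set<String> dict,
                                  int minSub, int maxSub, boolean onlyLongestMatch) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        String longest = null; // declared once, outside both loops
        for (int i = 0; i + minSub <= lower.length(); i++) {
            for (int len = minSub; len <= maxSub && i + len <= lower.length(); len++) {
                String candidate = lower.substring(i, i + len);
                if (!dict.contains(candidate)) {
                    continue;
                }
                if (!onlyLongestMatch) {
                    tokens.add(candidate); // emit every dictionary hit
                } else if (longest == null || longest.length() < len) {
                    longest = candidate;   // remember only the longest hit so far
                }
            }
        }
        if (onlyLongestMatch && longest != null) {
            tokens.add(longest);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("streifen", "reifen", "bluse"));
        System.out.println(decompose("streifenbluse", dict, 2, 15, false));
        System.out.println(decompose("streifenbluse", dict, 2, 15, true));
    }
}
```

Note that with a single variable for the whole word, only one subword survives: "reifen" is dropped, but so is "bluse". Whether that is the desired semantics of onlyLongestMatch is a separate design question.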




[jira] [Updated] (LUCENE-2906) Filter to process output of ICUTokenizer and create overlapping bigrams for CJK

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2906:
---

Fix Version/s: (was: 3.4)
   3.5

> Filter to process output of ICUTokenizer and create overlapping bigrams for 
> CJK 
> 
>
> Key: LUCENE-2906
> URL: https://issues.apache.org/jira/browse/LUCENE-2906
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Tom Burton-West
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2906.patch
>
>
> The ICUTokenizer produces unigrams for CJK. We would like to use the 
> ICUTokenizer but have overlapping bigrams created for CJK as in the CJK 
> Analyzer.  This filter would take the output of the ICUTokenizer, read the 
> ScriptAttribute and for selected scripts (Han, Kana), would produce 
> overlapping bigrams.
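
The proposed behaviour can be sketched on an array of (text, script) pairs. This is not the attached patch: CjkBigramSketch and bigrams() are hypothetical names, the script is passed as a plain string instead of being read from ScriptAttribute, and consecutive tokens are assumed to be adjacent in the text (a real filter would also need to check offsets). A lone CJK unigram is passed through unchanged, which is one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each token is {text, script name}.
class CjkBigramSketch {
    static final Set<String> BIGRAM_SCRIPTS = new HashSet<>(Arrays.asList("Han", "Kana"));

    static List<String> bigrams(String[][] tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (!BIGRAM_SCRIPTS.contains(tokens[i][1])) {
                out.add(tokens[i][0]); // other scripts pass through untouched
                i++;
                continue;
            }
            // Find the end of the run of consecutive CJK unigrams.
            int end = i;
            while (end < tokens.length && BIGRAM_SCRIPTS.contains(tokens[end][1])) {
                end++;
            }
            if (end - i == 1) {
                out.add(tokens[i][0]); // a single unigram has no partner
            } else {
                for (int k = i; k + 1 < end; k++) {
                    out.add(tokens[k][0] + tokens[k + 1][0]); // overlapping pairs
                }
            }
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] tokens = {
            {"hello", "Latin"}, {"\u4e2d", "Han"}, {"\u83ef", "Han"}, {"\u4eba", "Han"},
        };
        System.out.println(bigrams(tokens));
    }
}
```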




[jira] [Updated] (LUCENE-2971) Auto Generate our LICENSE.txt and NOTICE.txt files

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2971:
---

Fix Version/s: (was: 3.4)
   3.5

> Auto Generate our LICENSE.txt and NOTICE.txt files
> --
>
> Key: LUCENE-2971
> URL: https://issues.apache.org/jira/browse/LUCENE-2971
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 3.5, 4.0
>
>
> Once LUCENE-2952 is in place, we should be able to automatically generate 
> Lucene and Solr's LICENSE.txt and NOTICE.txt files (without massive 
> duplication).




[jira] [Updated] (LUCENE-2974) the hudson nightly for lucene should check out lucene by itself

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2974:
---

Fix Version/s: (was: 3.4)
   3.5

> the hudson nightly for lucene should check out lucene by itself
> ---
>
> Key: LUCENE-2974
> URL: https://issues.apache.org/jira/browse/LUCENE-2974
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: general/build
>Reporter: Robert Muir
> Fix For: 3.5, 4.0
>
>
> Currently it's too easy to break the Lucene-only packaging and build.
> The Hudson job for Lucene should check out Lucene by itself; this will
> prevent it from being broken.




[jira] [Updated] (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2949:
---

Fix Version/s: (was: 3.4)
   3.5

> FastVectorHighlighter FieldTermStack could likely benefit from using 
> TermVectorMapper
> -
>
> Key: LUCENE-2949
> URL: https://issues.apache.org/jira/browse/LUCENE-2949
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 4.0
>Reporter: Grant Ingersoll
>Assignee: Koji Sekiguchi
>Priority: Minor
>  Labels: FastVectorHighlighter, Highlighter
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-2949.patch
>
>
> Based on my reading of the FieldTermStack constructor that loads the vector 
> from disk, we could probably save a bunch of time and memory by using the 
> TermVectorMapper callback mechanism instead of materializing the full array 
> of terms into memory and then throwing most of them out.




[jira] [Updated] (LUCENE-2921) Now that we track the code version at the segment level, we can stop tracking it also in each file level

2011-09-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2921:
---

Fix Version/s: (was: 3.4)
   3.5

> Now that we track the code version at the segment level, we can stop tracking 
> it also in each file level
> 
>
> Key: LUCENE-2921
> URL: https://issues.apache.org/jira/browse/LUCENE-2921
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
> Fix For: 3.5, 4.0
>
>
> Now that we track the code version that created the segment at the segment 
> level, we can stop tracking versions in each file. This has several major 
> benefits:
> # Today the constant names that we use to track versions are confusing - they do 
> not state which version they apply from, and so it's harder to determine 
> which formats we can stop supporting when working on the next major release.
> # Those format numbers are usually negative, but in some cases positive 
> (inconsistency) -- we need to remember to increase it "one down" for the 
> negative ones, which I always find confusing.
> # It will remove the format tracking from all the *Writers, and the *Reader 
> will receive the code format (String) and work w/ the appropriate constant 
> (e.g. Constants.LUCENE_30). Centralizing version tracking to SegmentInfo is 
> an advantage IMO.
> It's not urgent that we do it for 3.1 (though it requires an index format 
> change), because starting from 3.1 all segments track their version number 
> anyway (or are migrated to track it), so we can safely release it in a 
> follow-on 3.x release.



