[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported

2012-01-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195719#comment-13195719
 ] 

Tommaso Teofili commented on SOLR-3049:
---

Good catch! If you can provide that patch, I will take care of reviewing and 
committing it, if that's OK.


 UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types 
 supported
 -

 Key: SOLR-3049
 URL: https://issues.apache.org/jira/browse/SOLR-3049
 Project: Solr
  Issue Type: Bug
  Components: update
Reporter: Harsh P
Priority: Minor
  Labels: uima, update_request_handler

 The solrconfig.xml file has an option to override certain UIMA runtime
 parameters in the UpdateRequestProcessorChain section.
 Certain UIMA annotators, such as RegexAnnotator, declare a
 runtimeParameters value of type Array, which is not currently supported
 by the Solr-UIMA interface.
 In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java,
 the private Object getRuntimeValue(AnalysisEngineDescription desc, String
 attributeName) method applies the runtimeParameters overrides that are
 passed to the UIMA Analysis Engine.
 The runtimeParameters types currently supported by the Solr-UIMA
 interface are:
  String
  Integer
  Boolean
  Float
 I have made a hack that adds Array support to fix this issue. I would
 like to submit it as a patch if no one else is working on fixing
 this issue.
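The missing Array case can be illustrated with a small, hypothetical sketch (plain Java; this is neither the actual Solr patch nor the UIMA API): dispatch an override value to the declared parameter type, treating multi-valued parameters as comma-separated lists whose elements are converted recursively.

```java
// Hypothetical sketch (not the actual Solr patch): dispatching a runtime
// parameter override to the declared UIMA type, including arrays.
import java.util.Arrays;

public class RuntimeParamSketch {

  /** Convert a raw override value to the declared parameter type.
   *  The type names mirror the four currently supported scalar types. */
  static Object toTypedValue(String declaredType, boolean multiValued, Object raw) {
    if (multiValued) {
      // The missing case: split a comma-separated override into an array,
      // converting each element with the scalar branch below.
      String[] parts = String.valueOf(raw).split(",");
      Object[] out = new Object[parts.length];
      for (int i = 0; i < parts.length; i++) {
        out[i] = toTypedValue(declaredType, false, parts[i].trim());
      }
      return out;
    }
    String s = String.valueOf(raw);
    switch (declaredType) {
      case "String":  return s;
      case "Integer": return Integer.valueOf(s);
      case "Boolean": return Boolean.valueOf(s);
      case "Float":   return Float.valueOf(s);
      default: throw new IllegalArgumentException("unsupported type: " + declaredType);
    }
  }

  public static void main(String[] args) {
    Object single = toTypedValue("Integer", false, "42");
    Object array = toTypedValue("String", true, "a, b, c");
    System.out.println(single + " " + Arrays.toString((Object[]) array));
  }
}
```

The real patch would dispatch on the declared type of the UIMA configuration parameter rather than on a string name; the sketch only shows the shape of the array branch.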

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Solr-3.x - Build # 584 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-3.x/584/

1 tests failed.
REGRESSION:  org.apache.solr.search.TestSort.testRandomFieldNameSorts

Error Message:
Over 0.2% oddities in test: 12/5900 have func/query parsing semenatics gotten 
broader?

Stack Trace:
junit.framework.AssertionFailedError: Over 0.2% oddities in test: 12/5900 have 
func/query parsing semenatics gotten broader?
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.solr.search.TestSort.__CLR2_6_3ug45jo16xn(TestSort.java:140)
at 
org.apache.solr.search.TestSort.testRandomFieldNameSorts(TestSort.java:62)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




Build Log (for compile errors):
[...truncated 30216 lines...]






RE: Welcome Tommaso Teofili as Lucene/Solr committer

2012-01-29 Thread Uwe Schindler
Hi Tommaso,

 

Changing the HTML on the Apache server is not enough: you should change the
XML files, then run forrest (have fun!) locally on your computer, and then
svn export the files to people.apache.org (as far as I know, the cronjob is
not running anymore):

http://wiki.apache.org/lucene-java/HowToUpdateTheWebsite

 

The same applies to Solr, where the webpage is in the trunk/3.x Solr
checkout directly. The root webpage for both projects currently only
contains the PMC, no change needed.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Sunday, January 29, 2012 6:05 AM
To: dev@lucene.apache.org
Subject: Re: Welcome Tommaso Teofili as Lucene/Solr committer

 

Thank you all for this warm welcome.

Cheers,

Tommaso

 

p.s.

I did update http://lucene.apache.org/java/docs/whoweare.html, but I didn't
have permission to change /lucene/cms/trunk/content/whoweare.mdtext
for the (not yet released) CMS-based Lucene website.

 

2012/1/28 Martijn v Groningen martijn.v.gronin...@gmail.com

Welcome!

 

On 27 January 2012 05:49, Shai Erera ser...@gmail.com wrote:

Welcome !

Shai

 

On Fri, Jan 27, 2012 at 12:42 AM, Michael McCandless
luc...@mikemccandless.com wrote:

Welcome Tommaso!

 





 

-- 
Met vriendelijke groet,

Martijn van Groningen

 



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1641 - Still Failing

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1641/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:24750/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:24750/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 10232 lines...]






Re: Welcome Tommaso Teofili as Lucene/Solr committer

2012-01-29 Thread Tommaso Teofili
Hi Uwe,

I already followed the instructions on that wiki page: I changed the XML
file, ran 'forrest run/site', and then committed both the XML and the
generated HTML files (see [1][2]).
I just didn't know the cronjob was not running anymore; however, I can see
the page published as if the cronjob had run regularly.
Maybe I missed something; if that's the case, please let me know.

Tommaso

p.s.:
I just realized I didn't commit the updated whoweare.pdf; I will do that
shortly.

[1] :
http://svn.apache.org/repos/asf/lucene/java/site/src/documentation/content/xdocs/whoweare.xml
[2] : http://svn.apache.org/repos/asf/lucene/java/site/docs/whoweare.html






RE: Welcome Tommaso Teofili as Lucene/Solr committer

2012-01-29 Thread Uwe Schindler
Hi,

 

Ok, the cronjob seems to work again then! I just missed your commits
somehow; to me it looked like you had made the change directly on Apache's
servers. The PDF was also committed a few minutes ago, thanks.

You should also add yourself to Solr; that's in the source checkout, same
procedure. In that case you should make the change on trunk and also merge
it back to the 3.x branch. Solr unfortunately keeps its web page in the
versioned source checkout, so you have to keep everything in sync.

 

About the new CMS: I have no idea about its status; maybe it's not yet
available to all committers. Grant?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12296 - Still Failing

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12296/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:20690/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:20690/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 8721 lines...]






[JENKINS] Solr-trunk - Build # 1748 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-trunk/1748/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:33651/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:33651/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 9929 lines...]






Re: Welcome Tommaso Teofili as Lucene/Solr committer

2012-01-29 Thread Simon Willnauer
On Sun, Jan 29, 2012 at 11:26 AM, Uwe Schindler u...@thetaphi.de wrote:
 Hi,



 Ok, then the cronjob seems to work again! I just missed your commits
 somehow, for me it looked like you did the change directly on apache’s
 servers. The PDF is also committed since a few minutes, thanks.



 You should also add yourself to Solr, that’s in the source checkout, same
 procedure. In that case you should do the change on trunk and merge also
 back to 3.x branch. Solr is unfortunately having the web page in its
 versioned source checkout, so you have to keep all in sync.



 About the new CMS: I have no idea about the status, maybe it’s currently not
 yet made available to all committers. Grant?

/lucene/cms was PMC-only in the auth file. I opened that path up for all
Lucene/Solr committers. Tommaso, you should be able to change that
path now.

simon







Re: Welcome Tommaso Teofili as Lucene/Solr committer

2012-01-29 Thread Tommaso Teofili
Uwe, Simon,
thanks, I'll add myself to the Solr and CMS websites as well.
Tommaso






[jira] [Commented] (SOLR-2358) Distributing Indexing

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195732#comment-13195732
 ] 

Robert Muir commented on SOLR-2358:
---

{quote}
I can't currently get into the hudson machine - used the wrong username the 
other day and seemed to get ip banned pretty much right away. Looking into 
getting that undone.
{quote}

Yeah, that's probably the best way to move forward. Otherwise you have to 
wait about an hour just to see whether one tweak to a single test worked.

{quote}
Which tricks? This could be part of it by the sound of things.
{quote}

It depends on what the test is doing, but here are a few ideas:
* Any client operations in tests should have a low connect timeout and
so_timeout. If you always set these, the test will never hang for long
periods of time.
* If you absolutely need to test the case where you don't get a timeout but
some other exception, use an IPv6 test address (e.g. [ff01::114]). Because
Jenkins has no IPv6, it always fails fast. This won't work forever, though...
* In a situation where you have A talking to B and you want to test a
condition where B goes down, instead of just bringing B down, consider
mocking up a remote node to test failures: bring up a mock downed server
(e.g. just a ServerSocket on that same port with reuseAddress=true).
This one can return whatever error you want, or just disconnect, and you
can even assert that A tried to connect to it. Instead of using real remote
Jettys at all, most tests could even be implemented entirely this way: it
would be faster and simpler than spinning up so many Jettys in all the
tests.
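The last idea can be sketched with plain JDK sockets (illustrative names, not Solr test code): a "downed" server that accepts one connection and drops it immediately, while the client uses low connect and read timeouts so the failure is observed quickly instead of hanging.

```java
// Hedged sketch of the "mock downed server" idea: the server accepts a
// connection and closes it at once; the client sees a fast failure.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class MockDownedServer {

  /** Returns true if the client failed fast (EOF, reset, or timeout). */
  static boolean clientFailsFast() throws IOException, InterruptedException {
    ServerSocket server = new ServerSocket();
    // reuseAddress must be set before bind; useful when re-binding a port
    // a "real" server just vacated. Port 0 lets the OS pick a free port.
    server.setReuseAddress(true);
    server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 1);
    Thread t = new Thread(() -> {
      try (Socket s = server.accept()) {
        // Disconnect immediately: simulates a node going down mid-request.
      } catch (IOException ignored) {}
    });
    t.start();
    try (Socket client = new Socket()) {
      client.connect(server.getLocalSocketAddress(), 1000); // low connect timeout
      client.setSoTimeout(500);                             // low read timeout
      int b = client.getInputStream().read();               // EOF expected
      return b == -1;
    } catch (IOException e) {
      return true; // timeout or connection reset: also a fast failure
    } finally {
      server.close(); // unblocks accept() if the client never connected
      t.join();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(clientFailsFast());
  }
}
```

A richer mock could write a canned HTTP error line before closing, or record the peer address to assert that A actually attempted the connection.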


 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, 
 apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar


 The indexing side of SolrCloud - the goal of this issue is to provide 
 durable, fault tolerant indexing to an elastic cluster of Solr instances.

--






[jira] [Commented] (LUCENE-3661) move deletes under codec

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195733#comment-13195733
 ] 

Robert Muir commented on LUCENE-3661:
-

Merging this to trunk now...

We can use the LUCENE-3613 issue for any remaining splitting of 4.x/3.x 
codec impls (stored fields, deletes).


 move deletes under codec
 

 Key: LUCENE-3661
 URL: https://issues.apache.org/jira/browse/LUCENE-3661
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3661.patch


 After LUCENE-3631, this should be easier, I think.
 I haven't looked at it much myself, but I'll play around a bit; at a 
 glance:
 * SegmentReader to have Bits liveDocs instead of BitVector
 * address the TODO in the IW-using ctors so that SegmentReader doesn't take a 
 parent but just an existing core.
 * we need some type of minimal MutableBits or similar subinterface of Bits. 
 BitVector and maybe Fixed/OpenBitSet could implement it
 * BitVector becomes an impl detail and moves to the codec (maybe we have a 
 shared base class and split the 3.x/4.x up rather than the conditional 
 backwards)
 * I think invertAll should not be used by IndexWriter; instead we define 
 the codec interface to say "give me a new MutableBits, by default all are 
 set"?
 * redundant internally-consistent checks in checkLiveCounts should be done in 
 the codec impl instead of in SegmentReader.
 * plain text impl in SimpleText.
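The MutableBits idea from the list above can be sketched as follows (interface and method names are illustrative; the real Lucene API may differ): a read-only Bits view plus a minimal mutable sub-interface, backed by a long[] where, by default, all bits are set, i.e. all docs start out live.

```java
// Hedged sketch of a minimal MutableBits sub-interface of a read-only
// Bits view, with a trivial long[]-backed "all live" implementation.
public class MutableBitsSketch {

  interface Bits {
    boolean get(int index);
    int length();
  }

  /** The proposed extension point: the codec hands out a mutable view. */
  interface MutableBits extends Bits {
    void clear(int index); // mark one doc as deleted (no longer live)
  }

  /** "By default all are set": every doc starts out live. */
  static class AllLiveBits implements MutableBits {
    private final long[] words;
    private final int numBits;

    AllLiveBits(int numBits) {
      this.numBits = numBits;
      this.words = new long[(numBits + 63) / 64];
      java.util.Arrays.fill(words, -1L); // all bits set = all docs live
    }

    public boolean get(int i) { return (words[i >> 6] & (1L << (i & 63))) != 0; }
    public int length()       { return numBits; }
    public void clear(int i)  { words[i >> 6] &= ~(1L << (i & 63)); }
  }

  public static void main(String[] args) {
    MutableBits live = new AllLiveBits(100);
    live.clear(7);
    System.out.println(live.get(7) + " " + live.get(8)); // deleted vs. still live
  }
}
```

With such an interface, IndexWriter would ask the codec for a fresh all-set MutableBits instead of inverting a deletes bitset itself, which is the point of the invertAll bullet above.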

--






[jira] [Resolved] (LUCENE-3661) move deletes under codec

2012-01-29 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3661.
-

   Resolution: Fixed
Fix Version/s: 4.0

 move deletes under codec
 

 Key: LUCENE-3661
 URL: https://issues.apache.org/jira/browse/LUCENE-3661
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3661.patch


 After LUCENE-3631, this should be easier I think.
 I haven't looked at it much myself, but I'll play around a bit. At a glance:
 * SegmentReader should hold a Bits liveDocs instead of a BitVector.
 * Address the TODO in the IW-using ctors so that SegmentReader doesn't take a
 parent but just an existing core.
 * We need some type of minimal MutableBits or similar subinterface of Bits;
 BitVector and maybe Fixed/OpenBitSet could implement it.
 * BitVector becomes an impl detail and moves to the codec (maybe we have a
 shared base class and split the 3.x/4.x impls rather than the backwards
 conditional).
 * I think invertAll should not be used by IndexWriter; instead we define the
 codec interface to say "give me a new MutableBits, with all bits set by
 default".
 * Redundant internal consistency checks in checkLiveCounts should be done in
 the codec impl instead of in SegmentReader.
 * Plain-text impl in SimpleText.




[jira] [Created] (LUCENE-3728) better handling of files inside/outside CFS by codec

2012-01-29 Thread Robert Muir (Created) (JIRA)
better handling of files inside/outside CFS by codec


 Key: LUCENE-3728
 URL: https://issues.apache.org/jira/browse/LUCENE-3728
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir


Since norms and deletes were moved under Codec (LUCENE-3606, LUCENE-3661),
we never really properly addressed the issue of how Codec.files() should work,
considering these files are always stored outside of CFS.

LUCENE-3606 added a hack, LUCENE-3661 cleaned up the hack a little bit more,
but it's still a hack.

Currently the logic in SegmentInfo.files() is:
{code}
clearCache()

if (compoundFile) {
  // don't call Codec.files(), hardcoded CFS extensions, etc
} else {
  Codec.files()
}

// always add files stored outside CFS regardless of CFS setting
Codec.separateFiles()

if (sharedDocStores) {
  // hardcoded shared doc store extensions, etc
}
{code}

Also, various codec methods take a Directory parameter, but it's inconsistent
what this Directory is in the case of CFS: for some parts of the index it's
the CFS directory, for others (deletes, separate norms) it's not.

I wonder if instead we could restructure this so that SegmentInfo.files() logic 
is:
{code}
clearCache()
Codec.files()
{code}

and so that the Codec is responsible instead.

The default Codec.files() logic would then do the if (compoundFile) check, the
Lucene3x codec itself would be the only one with the if (sharedDocStores)
check, and any part of a codec that wants to put files outside of CFS
unconditionally (e.g. Lucene3x separate norms, deletes) could just use
SegmentInfo.dir. Directory parameters in the case of CFS would then always
consistently be the CFSDirectory.

I haven't fully tested whether this will work, but there are definitely some
cleanups we can do either way, and I think it would be a good step toward
cleaning this up and simplifying it.
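The proposed restructuring can be sketched with stub classes: SegmentInfo.files() becomes a plain delegation, the default Codec handles the compound-file branch, and only a Lucene3x-style codec knows about shared doc stores. The class names and file names below are illustrative assumptions, not actual Lucene code:

```java
import java.util.HashSet;
import java.util.Set;

// Stub segment metadata: just the two flags the files() logic branches on.
class SegmentInfoSketch {
  final boolean compoundFile;
  final boolean sharedDocStores;

  SegmentInfoSketch(boolean compoundFile, boolean sharedDocStores) {
    this.compoundFile = compoundFile;
    this.sharedDocStores = sharedDocStores;
  }
}

// Default codec: does the if (compoundFile) check itself, so
// SegmentInfo.files() can simply call clearCache() then Codec.files().
class CodecSketch {
  Set<String> files(SegmentInfoSketch info) {
    Set<String> files = new HashSet<>();
    if (info.compoundFile) {
      files.add("_0.cfs");   // everything bundled into the compound file
    } else {
      files.add("_0.tim");   // illustrative per-format files
      files.add("_0.frq");
    }
    files.add("_0_1.del");   // deletes always live outside CFS
    return files;
  }
}

// Only the Lucene3x-style codec knows about shared doc stores.
class Lucene3xCodecSketch extends CodecSketch {
  @Override
  Set<String> files(SegmentInfoSketch info) {
    Set<String> files = super.files(info);
    if (info.sharedDocStores) {
      files.add("_0.fdt");
      files.add("_0.fdx");
    }
    return files;
  }
}
```

The hardcoded extension lists disappear from SegmentInfo; each codec owns the full decision of what it stores where.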





[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12297 - Still Failing

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12297/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:17075/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:17075/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 8169 lines...]






[jira] [Commented] (LUCENE-3728) better handling of files inside/outside CFS by codec

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195741#comment-13195741
 ] 

Robert Muir commented on LUCENE-3728:
-

I'm going to iterate slowly on cleaning this up on the branch for
LUCENE-3661 (branches/lucene3661; I recreated it), in case anyone
wants to jump in and help or test out some ideas.


 better handling of files inside/outside CFS by codec
 

 Key: LUCENE-3728
 URL: https://issues.apache.org/jira/browse/LUCENE-3728
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir

 Since norms and deletes were moved under Codec (LUCENE-3606, LUCENE-3661),
 we never really properly addressed the issue of how Codec.files() should work,
 considering these files are always stored outside of CFS.
 LUCENE-3606 added a hack, LUCENE-3661 cleaned up the hack a little bit more,
 but it's still a hack.
 Currently the logic in SegmentInfo.files() is:
 {code}
 clearCache()
 if (compoundFile) {
   // don't call Codec.files(), hardcoded CFS extensions, etc
 } else {
   Codec.files()
 }
 // always add files stored outside CFS regardless of CFS setting
 Codec.separateFiles()
 if (sharedDocStores) {
   // hardcoded shared doc store extensions, etc
 }
 {code}
 Also, various codec methods take a Directory parameter, but it's inconsistent
 what this Directory is in the case of CFS: for some parts of the index it's
 the CFS directory, for others (deletes, separate norms) it's not.
 I wonder if instead we could restructure this so that SegmentInfo.files() 
 logic is:
 {code}
 clearCache()
 Codec.files()
 {code}
 and so that the Codec is responsible instead.
 The default Codec.files() logic would then do the if (compoundFile) check, the
 Lucene3x codec itself would be the only one with the if (sharedDocStores)
 check, and any part of a codec that wants to put files outside of CFS
 unconditionally (e.g. Lucene3x separate norms, deletes) could just use
 SegmentInfo.dir. Directory parameters in the case of CFS would then always
 consistently be the CFSDirectory.
 I haven't fully tested whether this will work, but there are definitely some
 cleanups we can do either way, and I think it would be a good step toward
 cleaning this up and simplifying it.




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1642 - Still Failing

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1642/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:31200/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:31200/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 10465 lines...]






[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12298 - Still Failing

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12298/

1 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

Error Message:
http://localhost:19819/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: 
http://localhost:19819/solr/collection1
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
at 
org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
at 
org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
at 
org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)




Build Log (for compile errors):
[...truncated 8165 lines...]






[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195757#comment-13195757
 ] 

Uwe Schindler commented on LUCENE-2858:
---

I now fixed the branch's test-framework and all remaining TODOs about the API.

Now the tedious work of porting all the tests starts. I assume the API is
now stable, as nobody complained after one week.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get
 back a DirectoryReader, which is a composite reader. The IndexReader
 interface now has lots of methods that simply throw UOE (in fact more than
 50% of the commonly used methods are unusable now). This confuses users and
 makes the API hard to understand.
 This issue should split atomic readers from reader collections with a
 separate API. After that, you are no longer able to get a TermsEnum from
 those composite readers without wrapping. We currently have helper classes
 for wrapping (SlowMultiReaderWrapper - please rename, the name is really
 ugly; or Multi*); those should be retrofitted to implement the correct
 classes (SlowMultiReaderWrapper would be an atomic reader but take a
 composite reader as ctor param; maybe it could also simply take a
 List<AtomicReader>).
 In my opinion, composite readers could maybe implement some collection APIs
 and also have the ReaderUtil methods built in directly (possibly as a "view"
 in the util.Collection sense). In general, composite readers do not really
 need to look like the previous IndexReaders; they could simply be a
 collection of SegmentReaders with some functionality like reopen.
 On the other side, do atomic readers need reopen logic at all anymore? When
 a segment changes, you need a new atomic reader? Maybe because of deletions
 that's not the best idea, but we should investigate. Maybe make the whole
 reopen logic simpler to use (at least on the collection reader level).
 We should decide on good names; I have no preference at the moment.
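The wrapping idea in the description can be sketched with minimal stand-in types: an atomic reader exposes terms directly, a composite reader is only a collection of sub-readers, and the wrapper presents a composite as one atomic view. Everything below (the names, and a List of strings standing in for TermsEnum) is an illustrative assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// An atomic reader exposes its terms directly (stand-in for TermsEnum).
abstract class AtomicReaderSketch {
  abstract List<String> terms();
}

// A composite reader is only a collection of sub-readers.
abstract class CompositeReaderSketch {
  abstract List<AtomicReaderSketch> getSequentialSubReaders();
}

// Simple leaf reader over a fixed, sorted term list.
class ListAtomicSketch extends AtomicReaderSketch {
  private final List<String> terms;
  ListAtomicSketch(String... terms) { this.terms = Arrays.asList(terms); }
  @Override List<String> terms() { return terms; }
}

class ListCompositeSketch extends CompositeReaderSketch {
  private final List<AtomicReaderSketch> subs;
  ListCompositeSketch(AtomicReaderSketch... subs) { this.subs = Arrays.asList(subs); }
  @Override List<AtomicReaderSketch> getSequentialSubReaders() { return subs; }
}

// "SlowMultiReaderWrapper would be an atomic reader but takes a composite
// reader as ctor param": present the merged term view of all sub-readers.
class SlowWrapperSketch extends AtomicReaderSketch {
  private final CompositeReaderSketch in;
  SlowWrapperSketch(CompositeReaderSketch in) { this.in = in; }

  @Override List<String> terms() {
    SortedSet<String> merged = new TreeSet<>();
    for (AtomicReaderSketch sub : in.getSequentialSubReaders()) {
      merged.addAll(sub.terms());
    }
    return new ArrayList<>(merged);
  }
}
```

The merge-on-the-fly in terms() is what makes such a wrapper "slow" relative to reading a single segment directly.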




[jira] [Resolved] (LUCENE-3716) Discussion topic: Move all Commit/VersionReopen stuff from abstract IR to DirectoryReader

2012-01-29 Thread Uwe Schindler (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-3716.
---

Resolution: Not A Problem

Nobody complained after one week, so I will proceed with the branch and close
this sub-task.

 Discussion topic: Move all Commit/VersionReopen stuff from abstract IR to 
 DirectoryReader
 --

 Key: LUCENE-3716
 URL: https://issues.apache.org/jira/browse/LUCENE-3716
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


 When implementing the parent issue, I noticed a lot of other stuff in
 IndexReader that's only implemented in DirectoryReader/SegmentReader and is
 not really related to IndexReader at all:
 - getVersion (maybe also isCurrent) only affects DirectoryReaders; because of
 the commit stuff, there is no easy way for e.g. MultiReader to implement this
 - reopen/openIfChanged cannot be implemented easily by most
 AtomicIndexReaders, but CompositeIndexReader is also the wrong place to
 define those methods
 - all methods returning/opening IndexCommits
 In the parent issue, I already let IndexReader.open() return DirectoryReader
 and I made this class public. We should move the whole stuff (including
 IR.open) to DirectoryReader. Reopening outside DirectoryReader is not really
 needed.
 If some people think it should stay abstract (this affects only the
 reopen/version stuff), there are ways for other readers to implement it, but
 it's certainly not specific to IRs in general. In that case I would declare
 an interface that DirectoryReader implements. Code like SearcherManager/Solr
 could then instanceof the IR instance and find out if it's worth
 reopening/version checking.
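The interface-plus-instanceof option at the end can be sketched as follows. The interface name and the stub reader classes are hypothetical, chosen only to illustrate the caller-side pattern:

```java
// Hypothetical interface that only commit-backed readers would implement.
interface VersionTrackingReader {
  long getVersion();
}

// Base reader: no commit/version API at all, so nothing throws UOE.
class IndexReaderSketch {
}

// Directory-backed reader: opts in to version tracking.
class DirectoryReaderSketch extends IndexReaderSketch implements VersionTrackingReader {
  private final long version;
  DirectoryReaderSketch(long version) { this.version = version; }
  @Override public long getVersion() { return version; }
}

// SearcherManager/Solr-style caller: check the capability with instanceof
// instead of calling a method that may throw UnsupportedOperationException.
class VersionCheck {
  static long versionOf(IndexReaderSketch reader) {
    if (reader instanceof VersionTrackingReader) {
      return ((VersionTrackingReader) reader).getVersion();
    }
    return -1L; // no commit-backed version to compare; skip reopen checks
  }
}
```

The capability check replaces the "50% of methods throw UOE" problem with a compile-time-visible contract.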




[jira] [Commented] (SOLR-2358) Distributing Indexing

2012-01-29 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195759#comment-13195759
 ] 

Mark Miller commented on SOLR-2358:
---

These tests really need to be done with real Jetty instances (at least some of
them). I'll try adding some timeouts where we are not currently using them
(generally they are used in test code but not always in non-test code).

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, 
 apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar


 The indexing side of SolrCloud - the goal of this issue is to provide 
 durable, fault tolerant indexing to an elastic cluster of Solr instances.




[jira] [Created] (SOLR-3072) FAILED: org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch

2012-01-29 Thread Mark Miller (Created) (JIRA)
FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch
--

 Key: SOLR-3072
 URL: https://issues.apache.org/jira/browse/SOLR-3072
 Project: Solr
  Issue Type: Test
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.0


Another test that seems to fail often on Jenkins but not on other systems. We
take down a Jetty instance and then try to query a still-up Jetty index with
the load-balancing Solr client (within the SolrCloud client). We get a socket
read timeout on the request. I saw this once in early dev - I couldn't figure
out what to blame other than HttpClient: the other server was down, and
somehow that was affecting the request to the server that was still up. I
tried not sharing the HttpClient instance between requests, making a new one
each time, and it started working - I reverted that and it was still working
though, and it has worked since. It seems to fail consistently on Jenkins
though.




Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1642 - Still Failing

2012-01-29 Thread Mark Miller
I've created SOLR-3072 to track this: a read timeout on an available,
up-and-running Jetty instance on Jenkins.

On Jan 29, 2012, at 7:30 AM, Apache Jenkins Server wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1642/
 
 1 tests failed.
 FAILED:  org.apache.solr.cloud.FullSolrCloudTest.testDistribSearch
 
 Error Message:
 http://localhost:31200/solr/collection1
 
 Stack Trace:
 org.apache.solr.client.solrj.SolrServerException: 
 http://localhost:31200/solr/collection1
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
   at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:104)
   at 
 org.apache.solr.cloud.FullSolrCloudTest.index_specific(FullSolrCloudTest.java:493)
   at 
 org.apache.solr.cloud.FullSolrCloudTest.brindDownShardIndexSomeDocsAndRecover(FullSolrCloudTest.java:720)
   at 
 org.apache.solr.cloud.FullSolrCloudTest.doTest(FullSolrCloudTest.java:545)
   at 
 org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:663)
   at 
 org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
   at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
   at 
 org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
 Caused by: java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:150)
   at java.net.SocketInputStream.read(SocketInputStream.java:121)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
   at 
 org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
   at 
 org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
   at 
 org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
   at 
 org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
   at 
 org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
   at 
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:425)
 
 
 
 
 Build Log (for compile errors):
 [...truncated 10465 lines...]
 
 
 
 

- Mark Miller
lucidimagination.com















[jira] [Commented] (SOLR-2358) Distributing Indexing

2012-01-29 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195761#comment-13195761
 ] 

Yonik Seeley commented on SOLR-2358:


We should be careful about using socket read timeouts in non-test code for
operations that could potentially take a long time: commit, optimize, and
even query requests (depending on what the request is). By default, Solr does
not currently time out requests because we don't know what the upper bound is.
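The failure mode being discussed can be reproduced with a plain socket: once a read timeout is set, a server that is healthy but slow (say, mid-commit) is indistinguishable from a dead one. The sketch below uses raw java.net sockets rather than SolrJ (where the corresponding knob would be the client's socket/read timeout setting); the "server" simply never answers, standing in for a long-running request:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// A client with a read timeout talking to a "server" that accepts the
// connection but never answers, standing in for a long-running request.
public class ReadTimeoutDemo {
  static String request(int port, int soTimeoutMillis) throws IOException {
    try (Socket socket = new Socket("localhost", port)) {
      socket.setSoTimeout(soTimeoutMillis); // read timeout on the connection
      socket.getOutputStream().write("GET / HTTP/1.0\r\n\r\n".getBytes("US-ASCII"));
      try {
        socket.getInputStream().read();     // blocks until data or timeout
        return "got response";
      } catch (SocketTimeoutException e) {
        return "timed out";                 // slow-but-alive looks dead
      }
    }
  }

  public static void main(String[] args) throws Exception {
    try (ServerSocket server = new ServerSocket(0)) {
      Thread silent = new Thread(() -> {
        try {
          Socket accepted = server.accept(); // accept, then stay silent
          Thread.sleep(2000);                // "long commit" outlives the timeout
          accepted.close();
        } catch (Exception ignored) {
        }
      });
      silent.setDaemon(true);
      silent.start();
      System.out.println(request(server.getLocalPort(), 200)); // prints "timed out"
    }
  }
}
```

This is why a blanket read timeout is safe only for requests with a known upper bound, as the comment above argues.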

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, 
 apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar


 The indexing side of SolrCloud - the goal of this issue is to provide 
 durable, fault tolerant indexing to an elastic cluster of Solr instances.




[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195768#comment-13195768
 ] 

Yonik Seeley commented on LUCENE-2858:
--

bq.  SlowMultiReaderWrapper - please rename, the name is really ugly; or Multi*

+1, the Slow* prefix is misleading, as it makes it seem like there's a faster
way you should be doing it.
CompositeReaderWrapper should be fine. And no, it doesn't sound too cool for
the hypothetical developers who use that as a criterion when coding ;-)

Other possibilities include AtomicReaderEmulator, AtomicEmulatorReader,
CompositeAsAtomicReader, etc.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get
 back a DirectoryReader, which is a composite reader. The IndexReader
 interface now has lots of methods that simply throw UOE (in fact more than
 50% of the commonly used methods are unusable now). This confuses users and
 makes the API hard to understand.
 This issue should split atomic readers from reader collections with a
 separate API. After that, you are no longer able to get a TermsEnum from
 those composite readers without wrapping. We currently have helper classes
 for wrapping (SlowMultiReaderWrapper - please rename, the name is really
 ugly; or Multi*); those should be retrofitted to implement the correct
 classes (SlowMultiReaderWrapper would be an atomic reader but take a
 composite reader as ctor param; maybe it could also simply take a
 List<AtomicReader>).
 In my opinion, composite readers could maybe implement some collection APIs
 and also have the ReaderUtil methods built in directly (possibly as a "view"
 in the util.Collection sense). In general, composite readers do not really
 need to look like the previous IndexReaders; they could simply be a
 collection of SegmentReaders with some functionality like reopen.
 On the other side, do atomic readers need reopen logic at all anymore? When
 a segment changes, you need a new atomic reader? Maybe because of deletions
 that's not the best idea, but we should investigate. Maybe make the whole
 reopen logic simpler to use (at least on the collection reader level).
 We should decide on good names; I have no preference at the moment.




[jira] [Commented] (SOLR-2358) Distributing Indexing

2012-01-29 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195783#comment-13195783
 ] 

Mark Miller commented on SOLR-2358:
---

Yup, I agree - in general, in non-test code we don't want to time out by default 
- that is why I've stuck to only using them in the tests until now. I've tried 
adding one to the Solr cmd distributor for a bit though - just to see if that 
helps on Jenkins any. I'd like to narrow in and at least know whether this is the 
problem or not (blackhole hangups). For some things, like a request to recover, 
timeouts may be fine, I think.

Once I am able to log into jenkins again, I can hopefully narrow down what is 
happening a lot faster.
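For context, the kind of socket-level timeout being discussed can be sketched with plain JDK classes; this is only an illustration of connect vs. read timeouts (the class name, URL, and values here are made up, not the actual SolrCmdDistributor change):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
  // Configure (but do not open) a connection with explicit timeouts, so a
  // blackholed node fails fast instead of hanging the sending thread forever.
  static HttpURLConnection configure(URL url) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(5000);   // ms allowed to establish the TCP connection
    conn.setReadTimeout(30000);     // ms of silence allowed while reading the response
    return conn;
  }

  public static void main(String[] args) throws Exception {
    // openConnection() does not touch the network, so this runs offline.
    HttpURLConnection c = configure(new URL("http://localhost:8983/solr/update"));
    System.out.println(c.getConnectTimeout() + " " + c.getReadTimeout());  // prints 5000 30000
  }
}
```

A read timeout like this only fires when no bytes arrive at all, which matches the observation below that long transfers are fine as long as packets keep flowing.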

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, 
 apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar


 The indexing side of SolrCloud - the goal of this issue is to provide 
 durable, fault tolerant indexing to an elastic cluster of Solr instances.




[jira] [Commented] (SOLR-2358) Distributing Indexing

2012-01-29 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195787#comment-13195787
 ] 

Yonik Seeley commented on SOLR-2358:


bq. For some things, like a request to recover, timeouts may be fine I think.

Definitely - we have a much better handle on Solr-created requests.  Replication 
(although it can take a long time to send a big file, there shouldn't be long 
periods where no packets are sent), PeerSync, etc.

Although IIRC, a new cloud-style replication request involves the recipient 
doing a commit?

 Distributing Indexing
 -

 Key: SOLR-2358
 URL: https://issues.apache.org/jira/browse/SOLR-2358
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud, update
Reporter: William Mayor
Priority: Minor
 Fix For: 4.0

 Attachments: 2shard4server.jpg, SOLR-2358.patch, SOLR-2358.patch, 
 apache-solr-noggit-r1211150.jar, zookeeper-3.3.4.jar


 The indexing side of SolrCloud - the goal of this issue is to provide 
 durable, fault tolerant indexing to an elastic cluster of Solr instances.




[jira] [Resolved] (LUCENE-2795) Genericize DirectIOLinuxDir -> UnixDir

2012-01-29 Thread Michael McCandless (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2795.


   Resolution: Fixed
Fix Version/s: 4.0

Thanks Varun!

 Genericize DirectIOLinuxDir -> UnixDir
 --

 Key: LUCENE-2795
 URL: https://issues.apache.org/jira/browse/LUCENE-2795
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, 
 LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch, 
 LUCENE-2795.patch, LUCENE-2795.patch, LUCENE-2795.patch


 Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to 
 use it for IndexWriter and not IndexReader (searching).  It's a trap.
 But, once we do LUCENE-2793, we can make it fully general-purpose, because 
 then a single native Dir impl can be used.
 I'd also like to make it generic to other Unices, if we can, so that it 
 becomes UnixDirectory.




[jira] [Created] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED

2012-01-29 Thread Michael McCandless (Created) (JIRA)
Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
--

 Key: LUCENE-3729
 URL: https://issues.apache.org/jira/browse/LUCENE-3729
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless







[jira] [Commented] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified

2012-01-29 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195810#comment-13195810
 ] 

Erick Erickson commented on SOLR-3017:
--

A couple of questions:

1. I notice that guava is in here. The only other place I see imports for 
google.common is in the carrot code. Does anyone object to guava getting used 
in core? I only ask because it's used in so few places; do we prefer Apache 
Commons StringUtils for this kind of stuff, or do we just not care?

2. In ExtendedDismaxQParserPlugin, around line 1140 (in 
noStopwordFilterAnalyzer) there are a couple of tests like:
  if (stopwordFilterFactoryClass.isInstance(tf)) {

Scanning the code, it seems like stopwordFilterFactoryClass could be null; an 
NPE here seems questionable.

Otherwise, this seems fine to me from a tactical perspective, anyone want to 
weigh in on whether this is a good thing overall?
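A defensive variant of that lookup could resolve the configured class with a fallback, so the later isInstance test never sees a null. This is only a sketch: the method names and fallback behavior are hypothetical, not what the attached patch does.

```java
// Sketch: resolve a configured filter-factory class name defensively,
// falling back to a default class so isInstance(...) checks cannot NPE.
public class StopFactoryResolver {
  static Class<?> resolve(String className, Class<?> fallback) {
    if (className == null) return fallback;  // param not supplied in config
    try {
      return Class.forName(className);       // user-configured factory class
    } catch (ClassNotFoundException e) {
      return fallback;                       // unknown class: use the default
    }
  }

  public static void main(String[] args) {
    // Stand-in classes just to exercise the logic.
    System.out.println(resolve(null, String.class).getName());
    System.out.println(resolve("no.such.Clazz", String.class) == String.class);
  }
}
```

With that shape, `stopwordFilterFactoryClass.isInstance(tf)` is guaranteed a non-null receiver regardless of what the user configures.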

 Allow edismax stopword filter factory implementation to be specified
 

 Key: SOLR-3017
 URL: https://issues.apache.org/jira/browse/SOLR-3017
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3017.patch, edismax_stop_filter_factory.patch


 Currently, the edismax query parser assumes that stopword filtering is being 
 done by StopFilter: the removal of the stop filter is performed by looking 
 for an instance of 'StopFilterFactory' (hard-coded) within the associated 
 field's analysis chain.
 We'd like to be able to use our own stop filters whilst keeping the edismax 
 stopword removal goodness. The supplied patch allows the stopword filter 
 factory class to be supplied as a param, stopwordFilterClassName. If no 
 value is given, the default (StopFilterFactory) is used.
 Another option I looked into was to extend StopFilterFactory to create our 
 own filter. Unfortunately, StopFilterFactory's 'create' method returns 
 StopFilter, not TokenStream. StopFilter is also final.




[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED

2012-01-29 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3729:
---

Attachment: LUCENE-3729.patch

Prototype patch just for testing...

As a quick test for viability here... I hacked
FieldCacheImpl.DocTermsIndexImpl to build an FST to map term -> ord,
and changed the lookup method to use the new Util.getByOutput method.

Then I tested perf on 10M docs from Wikipedia:

{noformat}
Task               QPS base  StdDev base  QPS fstfc  StdDev fstfc      Pct diff
TermGroup1M           47.75         1.59      25.75          0.36   -48% - -43%
TermBGroup1M          17.10         0.58      14.20          0.37   -21% - -11%
PKLookup             158.73         6.07     155.84          3.00    -7% -   4%
TermTitleSort         43.49         2.54      42.73          1.84   -11% -   8%
Respell               81.13         3.24      80.67          3.83    -8% -   8%
Term                 106.13         3.59     106.03          1.28    -4% -   4%
TermBGroup1M1P        25.31         0.44      25.37          0.54    -3% -   4%
Fuzzy2                55.32         1.21      55.76          2.55    -5% -   7%
Fuzzy1                74.06         1.21      74.88          2.80    -4% -   6%
SloppyPhrase           9.82         0.61       9.95          0.42    -8% -  12%
SpanNear               3.39         0.16       3.47          0.15    -6% -  12%
Phrase                 9.29         0.69       9.66          0.69   -10% -  20%
Wildcard              20.15         0.66      21.23          0.46     0% -  11%
AndHighHigh           13.43         0.55      14.24          0.70    -3% -  15%
Prefix3               10.05         0.53      10.70          0.19     0% -  14%
AndHighMed            56.62         3.36      60.54          4.28    -6% -  21%
OrHighMed             25.78         0.98      27.75          1.51    -1% -  17%
OrHighHigh            10.97         0.41      11.82          0.63    -1% -  17%
IntNRQ                 9.74         0.81      10.83          0.26     0% -  24%
{noformat}

Two-pass grouping took a big hit... and single-pass grouping a moderate
hit... but TermTitleSort saw only a minor slowdown, which is good news.

The net RAM required across all segs for the title field FST was 30.2
MB, vs 46.5 MB for the current FieldCache terms storage (PagedBytes +
PackedInts), which is ~35% less.

The FST for the group-by fields used quite a bit more RAM (~60% more)
than PagedBytes + PackedInts, because those fields are actually
randomly generated Unicode strings...

I didn't make the change to use the FST for term -> ord lookup (have
to fix the binarySearchLookup method), but we really should do this
for real because it's doing an unnecessary binary search (repeated
ord -> term lookup) now.  I.e., perf should be better than
above... grouping is a heavier user of binarySearchLookup than sorting,
so it should help recover some of that slowdown.

Also, Util.getByOutput currently doesn't optimize for array'd
arcs... so if we fix that we should get some small perf gain.

To do this for real I think we should do it only with DocValues,
because the FST build time is relatively costly.


 Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
 --

 Key: LUCENE-3729
 URL: https://issues.apache.org/jira/browse/LUCENE-3729
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3729.patch







[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED

2012-01-29 Thread Michael McCandless (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3729:
---

Attachment: LUCENE-3729.patch

Updated patch to use FST for term -> ord lookup during 2-pass grouping... much 
better results:

{noformat}
Task               QPS base  StdDev base  QPS fstfc  StdDev fstfc      Pct diff
TermGroup1M           45.69         1.25      43.43          0.90    -9% -   0%
PKLookup             164.02         9.00     157.12          6.23   -12% -   5%
Respell               64.28         2.75      61.96          2.48   -11% -   4%
SloppyPhrase           5.99         0.26       5.79          0.35   -13% -   7%
Fuzzy2                54.07         2.47      52.56          1.83   -10% -   5%
Phrase                14.97         0.16      14.61          0.44    -6% -   1%
TermTitleSort         39.53         0.56      38.71          1.93    -8% -   4%
OrHighMed             32.91         1.33      32.27          1.48   -10% -   6%
OrHighHigh            15.10         0.62      14.83          0.68    -9% -   7%
AndHighMed            53.49         0.56      52.53          2.02    -6% -   3%
SpanNear              14.28         0.21      14.04          0.28    -5% -   1%
Fuzzy1                60.37         1.25      59.39          1.65    -6% -   3%
AndHighHigh            9.15         0.12       9.01          0.25    -5% -   2%
TermBGroup1M          33.53         0.61      33.07          0.77    -5% -   2%
IntNRQ                10.16         0.47      10.04          0.90   -14% -  12%
Prefix3               40.44         0.59      40.44          1.99    -6% -   6%
Wildcard              35.38         0.63      35.55          1.49    -5% -   6%
TermBGroup1M1P        16.54         0.34      16.78          0.54    -3% -   6%
Term                 100.39         2.64     103.13          3.69    -3% -   9%
{noformat}


 Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
 --

 Key: LUCENE-3729
 URL: https://issues.apache.org/jira/browse/LUCENE-3729
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-3729.patch, LUCENE-3729.patch







[jira] [Updated] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer

2012-01-29 Thread Jan Høydahl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-2764:
--

Attachment: SOLR-2764.patch

Thanks Christian. I further refined stuff:

- For MinimalStemmer, we now do two-pass removal for the -dom and -het endings. 
This means that the word kristendom will first be stemmed to kristen, and then 
all the general rules apply so it will be further stemmed to krist. The effect 
of this is that both kristen,kristendom,kristendommen,kristendommens will all 
be stemmed to krist (due to in this case incorrect interpretation of -en as 
plural ending), but when stopping at -dom removal, kristendom would not match 
inflections of kristen.

What do you think, is this a reasonable improvement or could there be side 
effects? I've not added these rules to the MinimalStemmer, to keep it simpler.
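The two-pass removal described above can be sketched roughly as follows. This is a simplification under assumed rules (the real stemmer works on char buffers inside the analysis chain, and its actual rule set differs); only the kristendom -> kristen -> krist chain from the comment is taken as given:

```java
public class TwoPassStemSketch {
  // Pass 1: strip a derivational ending such as -dom or -het.
  static String stripDerivational(String s) {
    if (s.endsWith("dom") || s.endsWith("het")) {
      return s.substring(0, s.length() - 3);
    }
    return s;
  }

  // Pass 2: a stand-in for the general rules, here stripping a trailing
  // -ens / -en (the ending that gets misread as inflection on "kristen").
  static String stripInflectional(String s) {
    if (s.endsWith("ens")) return s.substring(0, s.length() - 3);
    if (s.endsWith("en")) return s.substring(0, s.length() - 2);
    return s;
  }

  static String stem(String s) {
    return stripInflectional(stripDerivational(s));
  }

  public static void main(String[] args) {
    // "kristendom" -> pass 1 -> "kristen" -> pass 2 -> "krist"
    System.out.println(stem("kristendom"));  // prints krist
    System.out.println(stem("kristen"));     // prints krist
  }
}
```

The point of the two passes is exactly the conflation shown in main: the derivational form and the base word end up on the same stem.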

 Create a NorwegianLightStemmer and NorwegianMinimalStemmer
 --

 Key: SOLR-2764
 URL: https://issues.apache.org/jira/browse/SOLR-2764
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Jan Høydahl
 Fix For: 3.6, 4.0

 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, 
 SOLR-2764.patch


 We need a simple light-weight stemmer and a minimal stemmer for 
 plural/singular only in Norwegian




[jira] [Assigned] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Tommaso Teofili (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili reassigned SOLR-3013:
-

Assignee: Tommaso Teofili

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Issue Comment Edited] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer

2012-01-29 Thread Jan Høydahl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195820#comment-13195820
 ] 

Jan Høydahl edited comment on SOLR-2764 at 1/29/12 10:09 PM:
-

Thanks Christian. I further refined stuff:

- I think the MinimalStemmer is more or less good to go, it seems to do what 
it's supposed to
- For LightStemmer, we now do two-pass removal for the -dom and -het endings. 
This means that the word kristendom will first be stemmed to kristen, and 
then all the general rules apply so it will be further stemmed to krist. The 
effect of this is that both kristen,kristendom,kristendommen,kristendommens 
will all be stemmed to krist (due to in this case incorrect interpretation of 
-en as singular definite ending).
- Added some more tests to highlight this

What do you think, is this -dom -het thing a reasonable improvement or could 
there be side effects?

Are there some other general rules that could easily be incorporated to catch 
semi-regular conjugations for the light stemmer?

  was (Author: janhoy):
Thanks Christian. I further refined stuff:

- For MinimalStemmer, we now do two-pass removal for the -dom and -het endings. 
This means that the word kristendom will first be stemmed to kristen, and then 
all the general rules apply so it will be further stemmed to krist. The effect 
of this is that both kristen,kristendom,kristendommen,kristendommens will all 
be stemmed to krist (due to in this case incorrect interpretation of -en as 
plural ending), but when stopping at -dom removal, kristendom would not match 
inflections of kristen.

What do you think, is this a reasonable improvement or could there be side 
effects? I've not added these rules to the MinimalStemmer, to keep it simpler.
  
 Create a NorwegianLightStemmer and NorwegianMinimalStemmer
 --

 Key: SOLR-2764
 URL: https://issues.apache.org/jira/browse/SOLR-2764
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Jan Høydahl
 Fix For: 3.6, 4.0

 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, 
 SOLR-2764.patch


 We need a simple light-weight stemmer and a minimal stemmer for 
 plural/singular only in Norwegian




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195861#comment-13195861
 ] 

Tommaso Teofili commented on SOLR-3013:
---

If no one objects I'll commit this shortly.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195863#comment-13195863
 ] 

Robert Muir commented on SOLR-2764:
---

just some general suggestions:

in a light stemmer, i would be wary of derivational endings. 
it seems in the case of dom/het, because it's dealing with adj/noun, that it's
on the edge (maybe ok here), but if possible it would be more ideal to
avoid multiple passes... this is the kind of thing that causes snowball 
problems.

Can you think of examples for dom/het where the meaning would be changed?

for example: freedom is used the same way in english, but stemming this 
to free is very lossy, since free has a variety of meanings (such as costs 
nothing), some of which are incompatible with freedom. This is the danger of 
stripping derivational suffixes...


 Create a NorwegianLightStemmer and NorwegianMinimalStemmer
 --

 Key: SOLR-2764
 URL: https://issues.apache.org/jira/browse/SOLR-2764
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Jan Høydahl
 Fix For: 3.6, 4.0

 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, 
 SOLR-2764.patch


 We need a simple light-weight stemmer and a minimal stemmer for 
 plural/singular only in Norwegian




[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195868#comment-13195868
 ] 

Robert Muir commented on LUCENE-2858:
-

Can we please do some eclipse-renames like:

AtomicIndexReader -> AtomicReader
AtomicIndexReader.AtomicReaderContext -> AtomicReader.Context

The verbosity of the api is killing me :)


 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil methods directly built in (possibly as a view 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore: when a 
 segment changes, you need a new atomic reader. Maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.




[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195873#comment-13195873
 ] 

Michael McCandless commented on LUCENE-2858:


+1 for those names.

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil methods directly built in (possibly as a view 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore: when a 
 segment changes, you need a new atomic reader. Maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.




[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195876#comment-13195876
 ] 

Uwe Schindler commented on LUCENE-2858:
---

Jaja, will fix this...

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil methods directly built in (possibly as a view 
 in the util.Collection sense). In general, composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers do not need reopen logic anymore: when a 
 segment changes, you need a new atomic reader. Maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1646 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1646/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.ZkControllerTest.testUploadToCloud

Error Message:
KeeperErrorCode = NodeExists for /configs/config1/schema-reversed.xml

Stack Trace:
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /configs/config1/schema-reversed.xml
at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
at 
org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:486)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:483)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:369)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:896)
at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:689)
at 
org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:121)
at 
org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
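The failure above is a classic non-idempotent create: a retried upload runs into a node that the first attempt already created. A small Python sketch of the create-if-absent pattern that avoids it (illustrative only; the real code path is SolrZkClient.makePath against a live ZooKeeper ensemble, modeled here with an in-memory store):

```python
# Sketch of an idempotent makePath (assumption: this mirrors the intent of
# SolrZkClient.makePath, not its real implementation), using an in-memory
# set of paths instead of a real ZooKeeper client.

class NodeExistsError(Exception):
    """Stand-in for KeeperException.NodeExistsException."""

class FakeZk:
    def __init__(self):
        self.nodes = {"/"}

    def create(self, path):
        # Like ZooKeeper's create(): fails if the node already exists.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes.add(path)

def make_path(zk, path):
    """Create each ancestor in turn, treating 'already exists' as success."""
    parts = [p for p in path.split("/") if p]
    current = ""
    for part in parts:
        current += "/" + part
        try:
            zk.create(current)
        except NodeExistsError:
            pass  # already there: fine for an idempotent config upload

zk = FakeZk()
make_path(zk, "/configs/config1/schema-reversed.xml")
make_path(zk, "/configs/config1/schema-reversed.xml")  # a retry must not fail
```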




Build Log (for compile errors):
[...truncated 9809 lines...]






[jira] [Resolved] (LUCENE-3725) Add optional packing to FST building

2012-01-29 Thread Michael McCandless (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3725.


Resolution: Fixed

 Add optional packing to FST building
 

 Key: LUCENE-3725
 URL: https://issues.apache.org/jira/browse/LUCENE-3725
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, 
 Perf.java


 The FSTs produced by Builder can be further shrunk if you are willing
 to spend highish transient RAM to do so... our Builder today tries
 hard not to use much RAM (and has options to tweak down the RAM usage,
 in exchange for a somewhat larger FST), even when building immense FSTs.
 But for apps that can afford highish transient RAM to get a smaller
 net FST, I think we should offer packing.




[jira] [Updated] (LUCENE-3714) add suggester that uses shortest path/wFST instead of buckets

2012-01-29 Thread Robert Muir (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3714:


Attachment: LUCENE-3714.patch

I've been wanting to work on this... haven't found the time.

This just syncs the patch up to trunk's FST API changes.

 add suggester that uses shortest path/wFST instead of buckets
 -

 Key: LUCENE-3714
 URL: https://issues.apache.org/jira/browse/LUCENE-3714
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/spellchecker
Reporter: Robert Muir
 Attachments: LUCENE-3714.patch, LUCENE-3714.patch, LUCENE-3714.patch, 
 LUCENE-3714.patch, LUCENE-3714.patch, TestMe.java, out.png


 Currently the FST suggester (really an FSA) quantizes weights into buckets 
 (e.g. a single byte) and puts them in front of the word.
 This makes it fast, but you lose granularity in your suggestions.
 Lately the question was raised: if you build Lucene's FST with 
 PositiveIntOutputs, does it behave the same as a tropical-semiring wFST?
 In other words, after completing the word, we instead traverse min(output) at 
 each node to find the 'shortest path' to the 
 best suggestion (the one with the highest score).
 This means we wouldn't need to quantize weights at all, and it might make some 
 operations (e.g. adding fuzzy matching etc.) a lot easier.
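The min(output) traversal described above can be sketched in a few lines of Python. This is a toy model, not Lucene's FST API: weights are stored as costs (lower is better, as in a tropical semiring), each trie node remembers the minimum cost in its subtree, and completion greedily descends toward that minimum:

```python
# Toy sketch (hypothetical, not Lucene's actual FST classes): exact-weight
# suggestions by following the minimum cost at each node, instead of
# quantizing weights into buckets.

class Node:
    def __init__(self):
        self.children = {}
        self.cost = None              # cost of the word ending here, if any
        self.min_cost = float("inf")  # best (lowest) cost in this subtree

def build(words):
    """words: dict of word -> cost (lower cost = better suggestion)."""
    root = Node()
    for word, cost in words.items():
        node = root
        node.min_cost = min(node.min_cost, cost)
        for ch in word:
            node = node.children.setdefault(ch, Node())
            node.min_cost = min(node.min_cost, cost)
        node.cost = cost
    return root

def best_suggestion(root, prefix):
    node = root
    for ch in prefix:                 # walk down the prefix first
        if ch not in node.children:
            return None
        node = node.children[ch]
    out = list(prefix)
    # Greedy 'shortest path': always descend toward the subtree minimum.
    while node.cost != node.min_cost:
        ch, node = min(node.children.items(), key=lambda kv: kv[1].min_cost)
        out.append(ch)
    return "".join(out)

root = build({"spell": 3, "spark": 1, "spot": 2})
print(best_suggestion(root, "sp"))    # 'spark' has the lowest cost
```

Because no quantization happens, the returned word is always the exact best-weighted completion for the prefix.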




[JENKINS] Lucene-3.x - Build # 627 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-3.x/627/

No tests ran.

Build Log (for compile errors):
[...truncated 10841 lines...]
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/java/overview.html
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene/util
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/test-framework/resources/org/apache/lucene/util/europarl.lines.txt.gz
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/index.html
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/README.txt
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/JRE_VERSION_MIGRATION.txt
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/BUILD.txt
 [exec] A
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/build.xml
 [exec] Exported revision 1237511.
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/tools/javadoc/java5
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/docs/changes
 [exec] --2012-01-30 00:39:15--  
https://issues.apache.org/jira/rest/api/2.0.alpha1/project/LUCENE
 [exec] Resolving issues.apache.org (issues.apache.org)... 140.211.11.121
 [exec] Connecting to issues.apache.org 
(issues.apache.org)|140.211.11.121|:443... connected.
 [exec] WARNING: cannot verify issues.apache.org's certificate, issued by 
`/C=US/O=Thawte, Inc./CN=Thawte SSL CA':
 [exec]   Unable to locally verify the issuer's authority.
 [exec] HTTP request sent, awaiting response... 200 OK
 [exec] Length: unspecified [application/json]
 [exec] Saving to: `STDOUT'
 [exec] 
 [exec]  0K .. ..  
46.4M=0s
 [exec] 
 [exec] 2012-01-30 00:39:17 (46.4 MB/s) - written to stdout [16744]
 [exec] 
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 line 384,  line 5949.
 [exec] Use of uninitialized value $heading in lc at 
/usr/home/hudson/hudson-slave/workspace/Lucene-3.x/checkout/lucene/build/svn-export/src/site/changes/changes2html.pl
 

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12319 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12319/

No tests ran.

Build Log (for compile errors):
[...truncated 16 lines...]
U lucene/CHANGES.txt
U lucene/src/test/org/apache/lucene/util/fst/TestFSTs.java
U lucene/src/java/org/apache/lucene/util/FixedBitSet.java
U lucene/src/java/org/apache/lucene/util/fst/PairOutputs.java
U lucene/src/java/org/apache/lucene/util/fst/Util.java
U lucene/src/java/org/apache/lucene/util/fst/FSTEnum.java
U lucene/src/java/org/apache/lucene/util/fst/PositiveIntOutputs.java
U lucene/src/java/org/apache/lucene/util/fst/Outputs.java
U lucene/src/java/org/apache/lucene/util/fst/Builder.java
U lucene/src/java/org/apache/lucene/util/fst/NodeHash.java
U lucene/src/java/org/apache/lucene/util/fst/FST.java
U lucene/src/java/org/apache/lucene/util/UnicodeUtil.java
 U lucene
 U .
At revision 1237512
Reverting /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/nightly
Updating http://svn.apache.org/repos/asf/lucene/dev/nightly
At revision 1237512
no change for http://svn.apache.org/repos/asf/lucene/dev/nightly since the 
previous build
No emails were triggered.
[Lucene-Solr-tests-only-3.x] $ /bin/bash -xe 
/var/tmp/hudson5884032007436283436.sh
+ sh 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/nightly/hudson-lusolr-tests-3.x.sh
+ ANT_HOME=/home/hudson/tools/ant/latest1.7
+ SVNVERSION_EXE=svnversion
+ SVN_EXE=svn
+ CLOVER=/home/hudson/tools/clover/clover2latest
+ JAVA_HOME_15=/home/hudson/tools/java/latest1.5
+ [ -z '' ]
+ JAVA_HOME_16=/home/hudson/tools/java/latest1.6
+ JAVADOC_HOME_15=/usr/local/share/doc/jdk1.5/api
+ JAVADOC_HOME_16=/usr/local/share/doc/jdk1.6/api
+ ROOT_DIR=checkout
+ CORE_DIR=checkout/lucene
+ MODULES_DIR=checkout/modules
+ SOLR_DIR=checkout/solr
+ 
ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/artifacts
+ 
JAVADOCS_ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/javadocs
+ 
DUMP_DIR=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps
+ TESTS_MULTIPLIER=3
+ TEST_LINE_DOCS_FILE=/home/hudson/lucene-data/enwiki.random.lines.txt.gz
+ TEST_JVM_ARGS='-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps/
 '
+ set +x
+ mkdir -p 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps
+ rm -rf 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/heapdumps/README.txt
+ echo 'This directory contains heap dumps that may be generated by test runs 
when OOM occurred.'
+ TESTS_MULTIPLIER=5
+ cd /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant clean
Buildfile: build.xml

clean:

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build
 [echo] Building solr...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build
 [echo] TODO: fix tests to not write files to 
'core/src/test-files/solr/data'!
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/test-files/solr/data

BUILD SUCCESSFUL
Total time: 41 seconds
+ cd 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib 
compile-backwards
Buildfile: build.xml

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/classes/java
[javac] Compiling 498 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34:
 warning: [dep-ann] deprecated name isn't annotated with @Deprecated
[javac]   int getColumn();
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41:
 warning: [dep-ann] deprecated name isn't annotated with @Deprecated
[javac]   int getLine();
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/src/java/org/apache/lucene/util/fst/FST.java:1764:
 method does not override a method from its superclass
[javac] @Override
[javac]  ^
   

[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1673 - Failure

2012-01-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1673/

No tests ran.

Build Log (for compile errors):
[...truncated 13 lines...]
U lucene/CHANGES.txt
U lucene/src/test/org/apache/lucene/util/fst/TestFSTs.java
U lucene/src/java/org/apache/lucene/util/FixedBitSet.java
U lucene/src/java/org/apache/lucene/util/fst/PairOutputs.java
U lucene/src/java/org/apache/lucene/util/fst/Util.java
U lucene/src/java/org/apache/lucene/util/fst/FSTEnum.java
U lucene/src/java/org/apache/lucene/util/fst/PositiveIntOutputs.java
U lucene/src/java/org/apache/lucene/util/fst/Outputs.java
U lucene/src/java/org/apache/lucene/util/fst/Builder.java
U lucene/src/java/org/apache/lucene/util/fst/NodeHash.java
U lucene/src/java/org/apache/lucene/util/fst/FST.java
U lucene/src/java/org/apache/lucene/util/UnicodeUtil.java
 U lucene
 U .
At revision 1237513
Reverting 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/nightly
Updating http://svn.apache.org/repos/asf/lucene/dev/nightly
At revision 1237513
no change for http://svn.apache.org/repos/asf/lucene/dev/nightly since the 
previous build
No emails were triggered.
[Lucene-Solr-tests-only-3.x-java7] $ /bin/bash -xe 
/var/tmp/hudson2486050198597281484.sh
+ sh 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/nightly/hudson-lusolr-tests-3.x.sh
+ ANT_HOME=/home/hudson/tools/ant/latest1.7
+ SVNVERSION_EXE=svnversion
+ SVN_EXE=svn
+ CLOVER=/home/hudson/tools/clover/clover2latest
+ JAVA_HOME_15=/home/hudson/tools/java/latest1.5
+ [ -z yes ]
+ JAVA_HOME_16=/home/hudson/tools/java/latest1.7
+ JAVADOC_HOME_15=/usr/local/share/doc/jdk1.5/api
+ JAVADOC_HOME_16=/usr/local/share/doc/jdk1.6/api
+ ROOT_DIR=checkout
+ CORE_DIR=checkout/lucene
+ MODULES_DIR=checkout/modules
+ SOLR_DIR=checkout/solr
+ 
ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/artifacts
+ 
JAVADOCS_ARTIFACTS=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/javadocs
+ 
DUMP_DIR=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps
+ TESTS_MULTIPLIER=3
+ TEST_LINE_DOCS_FILE=/home/hudson/lucene-data/enwiki.random.lines.txt.gz
+ TEST_JVM_ARGS='-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps/
 '
+ set +x
+ mkdir -p 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps
+ rm -rf 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/heapdumps/README.txt
+ echo 'This directory contains heap dumps that may be generated by test runs 
when OOM occurred.'
+ TESTS_MULTIPLIER=5
+ cd 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant clean
Buildfile: build.xml

clean:

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build
 [echo] Building solr...

clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build
 [echo] TODO: fix tests to not write files to 
'core/src/test-files/solr/data'!
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/core/src/test-files/solr/data

BUILD SUCCESSFUL
Total time: 28 seconds
+ cd 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene
+ JAVA_HOME=/home/hudson/tools/java/latest1.5 
/home/hudson/tools/ant/latest1.7/bin/ant compile compile-test build-contrib 
compile-backwards
Buildfile: build.xml

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/classes/java
[javac] Compiling 498 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:34:
 warning: [dep-ann] deprecated name isn't annotated with @Deprecated
[javac]   int getColumn();
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/src/java/org/apache/lucene/queryParser/CharStream.java:41:
 warning: [dep-ann] deprecated name isn't annotated with @Deprecated
[javac]   int getLine();
[javac]   ^
[javac] 

[jira] [Created] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID

2012-01-29 Thread Devon Krisman (Created) (JIRA)
Distributed Grouping fails if the uniqueKey is a UUID
-

 Key: SOLR-3073
 URL: https://issues.apache.org/jira/browse/SOLR-3073
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Devon Krisman
Priority: Minor
 Fix For: 3.6
 Attachments: SOLR-3073-3x.patch

Attempting use distributed grouping (using a StrField as the group.fieldname) 
with a UUID as the uniqueKey results in an error because the classname 
(java.util.UUID) is prepended to the field value during the second phase of the 
grouping.




[jira] [Updated] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID

2012-01-29 Thread Devon Krisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devon Krisman updated SOLR-3073:


Attachment: SOLR-3073-3x.patch

 Distributed Grouping fails if the uniqueKey is a UUID
 -

 Key: SOLR-3073
 URL: https://issues.apache.org/jira/browse/SOLR-3073
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Devon Krisman
Priority: Minor
 Fix For: 3.6

 Attachments: SOLR-3073-3x.patch


 Attempting use distributed grouping (using a StrField as the group.fieldname) 
 with a UUID as the uniqueKey results in an error because the classname 
 (java.util.UUID) is prepended to the field value during the second phase of 
 the grouping.




[jira] [Commented] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID

2012-01-29 Thread Devon Krisman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195898#comment-13195898
 ] 

Devon Krisman commented on SOLR-3073:
-

This is the error that happens when you try to run a distributed grouping 
search with a UUID as the uniqueKey:

SEVERE: org.apache.solr.common.SolrException: Invalid UUID String: 
'java.util.UUID:317db1e1-b778-ec66-ef68-ddd00b096632'
at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:85)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The request handlers append the field Object's classname to its string value if 
it is from an unrecognized class; the attached patch should add java.util.UUID 
to the recognized classtypes for Solr's response handlers.
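The prefixing behavior described above can be modeled in a few lines of Python (illustrative only; the names `KNOWN_TYPES` and `write_value` are hypothetical stand-ins for Solr's response-writer logic, not its real API). A writer that only "knows" a few types prepends the type name to everything else, which the UUID parser then rejects; registering the UUID type makes the raw string come through:

```python
import uuid

# Hypothetical model of a response writer: values of unknown types get
# their type name prepended (stand-in for the Solr behavior described above).
KNOWN_TYPES = (str, int, float, bool)

def write_value(val, known=KNOWN_TYPES):
    if isinstance(val, known):
        return str(val)
    # Unknown type: prepend the fully qualified type name, like
    # 'java.util.UUID:317db1e1-...' in the Solr stack trace.
    return f"{type(val).__module__}.{type(val).__qualname__}:{val}"

u = uuid.UUID("317db1e1-b778-ec66-ef68-ddd00b096632")
prefixed = write_value(u)                          # type name prepended
fixed = write_value(u, KNOWN_TYPES + (uuid.UUID,)) # fix: register the type
print(prefixed)
print(fixed)
```

Parsing `prefixed` back with `uuid.UUID(...)` raises `ValueError`, the Python analogue of the "Invalid UUID String" SolrException above, while `fixed` round-trips cleanly.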

 Distributed Grouping fails if the uniqueKey is a UUID
 -

 Key: SOLR-3073
 URL: https://issues.apache.org/jira/browse/SOLR-3073
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Devon Krisman
Priority: Minor
 Fix For: 3.6

 Attachments: SOLR-3073-3x.patch


 Attempting use distributed grouping (using a StrField as the group.fieldname) 
 with a UUID as the uniqueKey results in an error because the classname 
 (java.util.UUID) is prepended to the field value during the second phase of 
 the grouping.




[jira] [Resolved] (LUCENE-3727) fix assertions/checks that use File.length() to use getFilePointer()

2012-01-29 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3727.
-

   Resolution: Fixed
Fix Version/s: 3.6

 fix assertions/checks that use File.length() to use getFilePointer()
 

 Key: LUCENE-3727
 URL: https://issues.apache.org/jira/browse/LUCENE-3727
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.6
Reporter: Robert Muir
 Fix For: 3.6

 Attachments: LUCENE-3727.patch, LUCENE-3727.patch


 This came up on the thread Getting RuntimeException: after flush: fdx size 
 mismatch while Indexing 
 (http://www.lucidimagination.com/search/document/a8db01a220f0a126).
 In trunk, a side effect of the codec refactoring is that these assertions 
 were pushed into codecs as finish() before close().
 They check getFilePointer() instead in this computation, which checks that 
 Lucene did its part (instead of falsely tripping if directory metadata is 
 stale).
 I think we should fix these checks/asserts on 3.x too.




[jira] [Updated] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID

2012-01-29 Thread Devon Krisman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devon Krisman updated SOLR-3073:


Description: Attempting to use distributed grouping (using a StrField as 
the group.fieldname) with a UUID as the uniqueKey results in an error because 
the classname (java.util.UUID) is prepended to the field value during the 
second phase of the grouping.  (was: Attempting use distributed grouping (using 
a StrField as the group.fieldname) with a UUID as the uniqueKey results in an 
error because the classname (java.util.UUID) is prepended to the field value 
during the second phase of the grouping.)

 Distributed Grouping fails if the uniqueKey is a UUID
 -

 Key: SOLR-3073
 URL: https://issues.apache.org/jira/browse/SOLR-3073
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Devon Krisman
Priority: Minor
 Fix For: 3.6

 Attachments: SOLR-3073-3x.patch


 Attempting to use distributed grouping (using a StrField as the 
 group.fieldname) with a UUID as the uniqueKey results in an error because the 
 classname (java.util.UUID) is prepended to the field value during the second 
 phase of the grouping.




[jira] [Commented] (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2012-01-29 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195904#comment-13195904
 ] 

Uwe Schindler commented on LUCENE-2858:
---

I renamed the enclosing classes and also removed the public ctors from 
ReaderContexts (to prevent stupid things already reported on mailing lists).

The renaming of the ReaderContexts, all to the same name Context but with 
different enclosing classes, is a refactoring Eclipse cannot do (it creates 
invalid code). It seems only NetBeans can do this; I will try to find a 
solution. The problem is that Eclipse always tries to import the inner class, 
which causes conflicts.

Finally, e.g. the method getDocIdSet should look like 
getDocIdSet(AtomicReader.Context, ...) [only importing AtomicReader], but 
Eclipse always tries to use Context [and import oal.AtomicReader.Context]. At 
the end we should have abstract IndexReader.Context, AtomicReader.Context, and 
CompositeReader.Context.

Will go to bed now.
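The class layout being discussed can be sketched roughly as follows. This is an illustrative Python model of the proposed split (not the actual Lucene 4.0 classes): each reader type nests its own Context, composites hold sub-readers, and per-segment access means flattening to atomic leaves instead of calling methods that would throw UOE:

```python
# Illustrative sketch of the atomic/composite reader split discussed above
# (Python stand-ins; names follow the comment, not real Lucene signatures).

class IndexReader:
    class Context:                    # abstract IndexReader.Context
        pass

    def leaves(self):
        raise NotImplementedError

class AtomicReader(IndexReader):
    class Context(IndexReader.Context):
        def __init__(self, reader):
            self.reader = reader

    def __init__(self, terms):
        self.terms = terms            # a single segment's term dictionary

    def leaves(self):
        return [self]                 # an atomic reader is its own leaf

class CompositeReader(IndexReader):
    class Context(IndexReader.Context):
        def __init__(self, children):
            self.children = children

    def __init__(self, subreaders):
        self.subreaders = list(subreaders)   # e.g. SegmentReaders

    def leaves(self):
        # Per-segment access: flatten to atomic readers rather than
        # exposing term-level methods that only make sense per segment.
        return [leaf for r in self.subreaders for leaf in r.leaves()]

seg1 = AtomicReader({"foo", "bar"})
seg2 = AtomicReader({"baz"})
composite = CompositeReader([seg1, seg2])
print([sorted(leaf.terms) for leaf in composite.leaves()])
```

The point of the split is visible in the model: the composite is little more than a collection of atomic readers, and code that needs TermsEnum-style access works on the leaves explicitly.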

 Separate SegmentReaders (and other atomic readers) from composite IndexReaders
 --

 Key: LUCENE-2858
 URL: https://issues.apache.org/jira/browse/LUCENE-2858
 Project: Lucene - Java
  Issue Type: Task
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Blocker
 Fix For: 4.0


 With current trunk, whenever you open an IndexReader on a directory you get 
 back a DirectoryReader, which is a composite reader. The interface of 
 IndexReader now has lots of methods that simply throw UOE (in fact more than 
 50% of the commonly used methods are unusable now). This 
 confuses users and makes the API hard to understand.
 This issue should split atomic readers from reader collections with a 
 separate API. After that, you are no longer able to get a TermsEnum without 
 wrapping from those composite readers. We currently have helper classes for 
 wrapping (SlowMultiReaderWrapper - please rename, the name is really ugly; or 
 Multi*); those should be retrofitted to implement the correct classes 
 (SlowMultiReaderWrapper would be an atomic reader but takes a composite 
 reader as ctor param; maybe it could also simply take a List<AtomicReader>). 
 In my opinion, maybe composite readers could implement some collection APIs 
 and also have the ReaderUtil methods directly built in (possibly as a view 
 in the util.Collection sense). In general composite readers do not really 
 need to look like the previous IndexReaders; they could simply be a 
 collection of SegmentReaders with some functionality like reopen.
 On the other side, atomic readers may not need reopen logic anymore: when a 
 segment changes, you need a new atomic reader. Maybe because of deletions 
 that's not the best idea, but we should investigate. Maybe make the whole 
 reopen logic simpler to use (at least on the collection reader level).
 We should decide about good names; I have no preference at the moment.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195906#comment-13195906
 ] 

Chris Male commented on SOLR-3013:
--

Hey Tommaso,

Did a quick glance over the patch.  Couple of things:

- Could UIMATypeAwareAnalyzerTest (and any other Analyzer/Tokenizer tests) use 
BaseTokenStreamTestCase? It has some useful utility methods to verify that your 
Analyzer works as expected
- UIMABaseAnalyzerTest could do the same, and could probably make use of 
newDirectory() etc to handle some of the boilerplate


 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.


[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195908#comment-13195908
 ] 

Robert Muir commented on SOLR-3013:
---

In addition to what Chris said:

* it looks like some correctOffset() etc. calls are missing (these would likely
be detected by BaseTokenStreamTestCase.checkRandomData)
* the analysis components look as if they might be able to work with Lucene
too... maybe we could refactor the Tokenizer/Analyzer/etc. into a new
modules/analysis/uima that depends on UIMA? And the Solr uima module would
provide the factories to integrate
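The correctOffset() point matters because a Tokenizer reading through a filtered character stream computes token offsets against the filtered text, and those must be mapped back to the original input or highlighting breaks. A standalone sketch of that idea (a toy filter and mapping, not Lucene's CharFilter API):

```java
import java.util.ArrayList;
import java.util.List;

public class CorrectOffsetSketch {
    final String original;
    final StringBuilder filtered = new StringBuilder();
    // filtered index -> original index
    final List<Integer> origIndex = new ArrayList<>();

    CorrectOffsetSketch(String original) {
        this.original = original;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (c != '-') {            // the "filter": drop hyphens
                filtered.append(c);
                origIndex.add(i);      // remember where this char came from
            }
        }
    }

    // Without this mapping, offsets computed on the filtered text would
    // point at the wrong characters of the original input.
    int correctOffset(int filteredOffset) {
        return filteredOffset < origIndex.size()
            ? origIndex.get(filteredOffset)
            : original.length();       // end offset maps to end of input
    }

    public static void main(String[] args) {
        CorrectOffsetSketch s = new CorrectOffsetSketch("foo-bar");
        // token "foobar" spans [0,6) in filtered text, [0,7) in the original
        System.out.println(s.correctOffset(0) + "," + s.correctOffset(6)); // prints 0,7
    }
}
```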


[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Chris Male (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195909#comment-13195909
 ] 

Chris Male commented on SOLR-3013:
--

{quote}
the analysis components look as if they might be able to work with lucene 
too... maybe we could refactor the
Tokenizer/Analyzer/etc in a new modules/analysis/uima that depends on uima? And 
Solr uima module would 
provide the factories to integrate
{quote}

I absolutely agree.


[jira] [Resolved] (LUCENE-3719) FVH: slow performance on very large queries

2012-01-29 Thread Koji Sekiguchi (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-3719.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6
 Assignee: Koji Sekiguchi

trunk: Committed revision 1237528.
3x: Committed revision 1237531.

 FVH: slow performance on very large queries
 ---

 Key: LUCENE-3719
 URL: https://issues.apache.org/jira/browse/LUCENE-3719
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 3.5, 4.0
Reporter: Igor Motov
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3719.patch


 The change from HashSet to ArrayList for flatQueries in LUCENE-3019 resulted
 in a very significant slowdown in some of our e-discovery queries after upgrading
 from 3.4.0 to 3.5.0. Our queries sometimes contain tens of thousands of terms.
 As a result, a major portion of execution time for such queries is now spent in
 the flatQueries.contains( sourceQuery ) method calls.
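The cliff comes from ArrayList.contains being a linear scan, so deduplicating N queries costs O(N^2). A LinkedHashSet keeps the insertion order that LUCENE-3019 wanted while restoring O(1) membership checks; a minimal standalone illustration (not the actual FastVectorHighlighter code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FlatQueriesSketch {
    public static void main(String[] args) {
        // ArrayList: contains() scans the whole list -> O(n) per lookup,
        // O(n^2) overall when flattening tens of thousands of terms.
        List<String> list = new ArrayList<>();
        // LinkedHashSet: iterates in insertion order like the list,
        // but contains()/add() are hash lookups -> O(1) per term.
        Set<String> set = new LinkedHashSet<>();
        for (int i = 0; i < 10_000; i++) {
            String term = "term" + (i % 1_000);   // lots of duplicates
            if (!list.contains(term)) list.add(term); // slow path
            set.add(term);                            // fast path, dedupes itself
        }
        // both hold the same terms in the same order
        System.out.println(list.equals(new ArrayList<>(set))); // prints true
    }
}
```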


[jira] [Commented] (LUCENE-3719) FVH: slow performance on very large queries

2012-01-29 Thread Koji Sekiguchi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195913#comment-13195913
 ] 

Koji Sekiguchi commented on LUCENE-3719:


Thanks Igor for reporting the issue and providing the patch!


[jira] [Commented] (LUCENE-3725) Add optional packing to FST building

2012-01-29 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195914#comment-13195914
 ] 

Robert Muir commented on LUCENE-3725:
-

Just some numbers with another (CJK) FST I have been playing with; this one
uses BYTE2 + SingleByteOutput.
Before:
  Finished: 326915 words, 77222 nodes, 358677 arcs, 2617255 bytes...
  Zipped: 1812629 bytes
Packed:
  Finished: 326915 words, 77222 nodes, 358677 arcs, 2027763 bytes...
  Zipped: 1735486 bytes

 Add optional packing to FST building
 

 Key: LUCENE-3725
 URL: https://issues.apache.org/jira/browse/LUCENE-3725
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3725.patch, LUCENE-3725.patch, LUCENE-3725.patch, 
 Perf.java


 The FSTs produced by Builder can be further shrunk if you are willing
 to spend highish transient RAM to do so... our Builder today tries
 hard not to use much RAM (and has options to tweak down the RAM usage,
 in exchange for a somewhat larger FST), even when building immense FSTs.
 But for apps that can afford highish transient RAM to get a smaller
 net FST, I think we should offer packing.


[jira] [Created] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding

2012-01-29 Thread Christian Moen (Created) (JIRA)
Improved Kuromoji search mode segmentation/decompounding


 Key: LUCENE-3730
 URL: https://issues.apache.org/jira/browse/LUCENE-3730
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen


Kuromoji has a segmentation mode for search that uses a heuristic to promote 
additional segmentation of long candidate tokens to get a decompounding effect. 
 This heuristic has been improved.  Patch is coming up.


[jira] [Resolved] (SOLR-1198) confine all solrconfig.xml parsing to SolrConfig.java

2012-01-29 Thread Noble Paul (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul resolved SOLR-1198.
--

   Resolution: Fixed
Fix Version/s: (was: 3.6)
   (was: 4.0)
   1.4

This was resolved in 1.4

 confine all solrconfig.xml parsing to SolrConfig.java
 -

 Key: SOLR-1198
 URL: https://issues.apache.org/jira/browse/SOLR-1198
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, 
 SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, 
 SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, SOLR-1198.patch, 
 SOLR-1198.patch


 Currently, xpath evaluations are spread across the Solr code. It would be
 cleaner if we can do it all in one place. All the parsing can be done in
 SolrConfig.java.
 Another problem with the current design is that we are not able to benefit
 from re-use of the solrconfig object across cores.


[jira] [Updated] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding

2012-01-29 Thread Christian Moen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen updated LUCENE-3730:
---

Attachment: LUCENE-3730_trunk.patch


[jira] [Commented] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding

2012-01-29 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195927#comment-13195927
 ] 

Christian Moen commented on LUCENE-3730:


Attached is a patch for {{trunk}} that improves the heuristic.  Search
segmentation tests/examples are in {{search-segmentation-tests.txt}} and are
validated by {{TestSearchMode}}.

Note that both the tests and the heuristic are tuned for IPADIC.  Hence, we need
to revisit this when we add support for other dictionaries/models.

I've also moved the ASF license header in {{TestExtendedMode.java}} to the
right place.



[jira] [Commented] (LUCENE-3730) Improved Kuromoji search mode segmentation/decompounding

2012-01-29 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195929#comment-13195929
 ] 

Christian Moen commented on LUCENE-3730:


If you want to try the new search mode, there's a simple Kuromoji web interface
available at http://atilika.org/kuromoji that perhaps is useful.  After
inputting some text and pressing enter, click "normal mode" to switch to search
mode and test the various segmentation modes for the given input.


[jira] [Commented] (LUCENE-3726) Default KuromojiAnalyzer to use search mode

2012-01-29 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195932#comment-13195932
 ] 

Christian Moen commented on LUCENE-3726:


I've improved the heuristic and submitted a patch to LUCENE-3730, which covers 
the issue.

We can now deal with cases such as コニカミノルタホールディングス and many others just fine.  
The former becomes コニカ  ミノルタ  ホールディングス as we'd like.

I think we should apply LUCENE-3730 before changing any defaults -- and also 
independently of changing any defaults.  I think we should also make sure that 
the default we use for Lucene is consistent with Solr's default in 
{{schema.xml}} for {{text_ja}}.

I'll do additional tests on a Japanese corpus and provide feedback, and we can 
use this as a basis for how to follow up.  Hopefully, we'll have sufficient and 
good data to conclude on this.


 Default KuromojiAnalyzer to use search mode
 ---

 Key: LUCENE-3726
 URL: https://issues.apache.org/jira/browse/LUCENE-3726
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.6, 4.0
Reporter: Robert Muir

 Kuromoji supports an option to segment text in a way more suitable for search,
 by preventing long compound nouns as indexing terms.
 In general, 'how you segment' can be important depending on the application
 (see http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf for some studies on this
 in Chinese).
 The current algorithm penalizes the cost based on some parameters
 (SEARCH_MODE_PENALTY, SEARCH_MODE_LENGTH, etc.)
 for long runs of kanji.
 Some questions (these can be separate future issues if any useful ideas come
 out):
 * should these parameters continue to be static-final, or configurable?
 * should POS also play a role in the algorithm (can/should we refine exactly
 what we decompound)?
 * is the Tokenizer the best place to do this, or should we do it in a
 tokenfilter? or both?
   with a tokenfilter, one idea would be to also preserve the original
 indexing term, overlapping it: e.g. ABCD -> AB, CD, ABCD(posInc=0)
   from my understanding this tends to help with noun compounds in other
 languages, because IDF of the original term boosts 'exact' compound matches.
   but does a tokenfilter provide the segmenter enough 'context' to do this
 properly?
 Either way, I think as a start we should turn on what we have by default: it's
 likely a very easy win.
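The overlapping-token idea (ABCD -> AB, CD, ABCD with posInc=0) can be sketched independently of Kuromoji. The sketch below uses a hypothetical Token record rather than Lucene's attribute API:

```java
import java.util.ArrayList;
import java.util.List;

public class DecompoundSketch {
    // Hypothetical token: surface form plus position increment vs. the previous token.
    record Token(String text, int posInc) {}

    // Emit the compound's parts, then re-emit the original compound with
    // posInc=0 so it overlaps the last part. Phrase queries still see the
    // parts in sequence, while the IDF of the rare full compound can boost
    // 'exact' compound matches.
    static List<Token> decompound(String compound, List<String> parts) {
        List<Token> out = new ArrayList<>();
        for (String part : parts) {
            out.add(new Token(part, 1));     // each part advances one position
        }
        out.add(new Token(compound, 0));     // original term, overlapped
        return out;
    }

    public static void main(String[] args) {
        for (Token t : decompound("ABCD", List.of("AB", "CD"))) {
            System.out.println(t.text() + " posInc=" + t.posInc());
        }
    }
}
```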


[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml

2012-01-29 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195966#comment-13195966
 ] 

Christian Moen commented on SOLR-3056:
--

Robert, I've improved the search mode heuristic (see LUCENE-3730 with patch) 
and I've also provided some feedback on LUCENE-3726.  Before providing a patch 
to use search mode as our default, I'd like to do some corpus-based testing to 
make sure overall segmentation quality is where I'd like it to be.

As for this JIRA, I guess it has branched out into the following topics:

# Introduce field type for Japanese in {{schema.xml}}
# Move Kuromoji to core to make it generally available in Solr 
# Get rid of contrib altogether

There seems to be consensus to move Kuromoji to core from at least three people 
(excluding myself).

Do you prefer that we conclude on LUCENE-3726 before we follow up on getting 
Japanese support for Solr and Lucene working out-of-the-box -- or can we 
conclude on default search mode separately?

I'm happy to start JIRAs for moving Kuromoji to get Japanese support in place 
if that's the best next course of action.  Please advise.  Many thanks.

 Introduce Japanese field type in schema.xml
 ---

 Key: SOLR-3056
 URL: https://issues.apache.org/jira/browse/SOLR-3056
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen

 Kuromoji (LUCENE-3305) is now on both trunk and branch_3x (thanks again
 Robert, Uwe and Simon). It would be very good to get a default field type
 defined for Japanese in {{schema.xml}} so we can get good Japanese
 out-of-the-box support in Solr.
 I've been playing with the below configuration today, which I think is a
 reasonable starting point for Japanese.  There's a lot to be said about the
 various considerations necessary when searching Japanese, but perhaps a wiki
 page is more suitable to cover the wider topic?
 In order to make the below {{text_ja}} field type work, Kuromoji itself and
 its analyzers need to be seen by the Solr classloader.  However, these are
 currently in contrib and I'm wondering if we should consider moving them to
 core to make them directly available.  If there are concerns with additional
 memory usage, etc. for non-Japanese users, we can make sure resources are
 loaded lazily and only when needed in factory-land.
 Any thoughts?
 {code:xml}
 <!-- Text field type suitable for Japanese text using morphological analysis
      NOTE: Please copy the files
        contrib/analysis-extras/lucene-libs/lucene-kuromoji-x.y.z.jar
        dist/apache-solr-analysis-extras-x.y.z.jar
      to your Solr lib directory (i.e. example/solr/lib) before starting Solr.
      (x.y.z refers to a version number)
      If you would like to optimize for precision, default the operator to AND with
        <solrQueryParser defaultOperator="AND"/>
      below (this file).  Use OR if you would like to optimize for recall (default).
 -->
 <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100"
            autoGeneratePhraseQueries="false">
   <analyzer>
     <!-- Kuromoji Japanese morphological analyzer/tokenizer
          Use search mode to get a noun-decompounding effect useful for search.
          Example:
            関西国際空港 (Kansai International Airport) becomes 関西 (Kansai)
            国際 (international) 空港 (airport), so we get a match for 空港
            (airport) as we would expect from a good search engine.
          Valid values for mode are:
            normal: default segmentation
            search: segmentation useful for search (extra compound splitting)
            extended: search mode with unigramming of unknown words (experimental)
          NOTE: Search mode improves segmentation for search at the expense of
          part-of-speech accuracy
     -->
     <tokenizer class="solr.KuromojiTokenizerFactory" mode="search"/>
     <!-- Reduces inflected verbs and adjectives to their base/dictionary
          forms (辞書形) -->
     <filter class="solr.KuromojiBaseFormFilterFactory"/>
     <!-- Optionally remove tokens with certain parts-of-speech
     <filter class="solr.KuromojiPartOfSpeechStopFilterFactory"
             tags="stopTags.txt" enablePositionIncrements="true"/> -->
     <!-- Normalizes full-width romaji to half-width and half-width kana to
          full-width (Unicode NFKC subset) -->
     <filter class="solr.CJKWidthFilterFactory"/>
     <!-- Lower-cases romaji characters -->
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 {code}


[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin

2012-01-29 Thread abhimanyu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195968#comment-13195968
 ] 

abhimanyu commented on SOLR-3060:
-

Thanks for your patch, but I am not able to apply it. I am not using any svn
checkout; please tell me how to apply this patch.

 add highlighter support to  SurroundQParserPlugin
 -

 Key: SOLR-3060
 URL: https://issues.apache.org/jira/browse/SOLR-3060
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3060.patch, SOLR-3060.patch


 Highlighter does not recognize SrndQuery family.
 http://search-lucene.com/m/FuDsU1sTjgM
 http://search-lucene.com/m/wD8c11gNTb61


[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin

2012-01-29 Thread abhimanyu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195969#comment-13195969
 ] 

abhimanyu commented on SOLR-3060:
-

I am using {{patch -p0 -i SOLR-3060.patch --dry-run}} as mentioned in the docs,
but I get an error saying the -p option is not correct. Please tell me how to
apply your patch.
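JIRA patches are normally applied from the root of a source checkout so that the paths inside the patch header resolve. A throwaway demonstration of what {{-p0}} (strip zero leading path components) does, using a generated diff rather than the real SOLR-3060.patch:

```shell
# Create a file and a changed copy, turn the difference into a unified diff,
# then apply it with -p0, i.e. use the file names in the patch header
# exactly as written.
printf 'hello\n' > demo.txt
printf 'goodbye\n' > demo.v2
diff -u demo.txt demo.v2 > demo.patch || true  # diff exits 1 when files differ
rm demo.v2                                     # leave only the target file
patch -p0 --dry-run < demo.patch               # check first; changes nothing
patch -p0 < demo.patch                         # now apply for real
cat demo.txt                                   # prints: goodbye
```

Running the same {{patch -p0 --dry-run < SOLR-3060.patch}} from the top-level checkout directory should report which files would be patched before anything is modified.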


[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin

2012-01-29 Thread Shalu Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195976#comment-13195976
 ] 

Shalu Singh commented on SOLR-3060:
---

I am facing the same problem. I don't know how to apply SOLR-3060.patch.
