[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

2011-07-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9221/

11 tests failed.
FAILED:  org.apache.lucene.util.TestFieldCacheSanityChecker.testIndexAndMerge

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)


REGRESSION:  org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers

Error Message:
Only one compound segment should exist expected:<3> but was:<4>

Stack Trace:
junit.framework.AssertionFailedError: Only one compound segment should exist expected:<3> but was:<4>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
at org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers(TestAddIndexes.java:952)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testSingleFile

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testSingleFile(TestCompoundFile.java:203)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.index.CompoundFileWriter
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testTwoFiles

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testTwoFiles(TestCompoundFile.java:226)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomFiles

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testRandomFiles(TestCompoundFile.java:276)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing(TestCompoundFile.java:371)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomAccess

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testRandomAccess(TestCompoundFile.java:428)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomAccessClones

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testRandomAccessClones(TestCompoundFile.java:507)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneT

RE: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

2011-07-01 Thread Uwe Schindler
It's fixed now; it was a problem during the cutover to the 3.3 backwards tests.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
> Sent: Friday, July 01, 2011 9:17 AM
> To: dev@lucene.apache.org
> Subject: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing

2011-07-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9222/

11 tests failed.
FAILED:  org.apache.lucene.util.TestFieldCacheSanityChecker.testIndexAndMerge

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)


REGRESSION:  org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers

Error Message:
Only one compound segment should exist expected:<3> but was:<4>

Stack Trace:
junit.framework.AssertionFailedError: Only one compound segment should exist expected:<3> but was:<4>
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
at org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers(TestAddIndexes.java:952)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testSingleFile

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testSingleFile(TestCompoundFile.java:203)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.index.CompoundFileWriter
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testTwoFiles

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testTwoFiles(TestCompoundFile.java:226)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomFiles

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.testRandomFiles(TestCompoundFile.java:276)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing(TestCompoundFile.java:371)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomAccess

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testRandomAccess(TestCompoundFile.java:428)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)


REGRESSION:  org.apache.lucene.index.TestCompoundFile.testRandomAccessClones

Error Message:
org/apache/lucene/index/CompoundFileWriter

Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
at org.apache.lucene.index.TestCompoundFile.testRandomAccessClones(TestCompoundFile.java:507)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
at org.apache.lucene.util.LuceneTestCase$LuceneT

[jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Bernd Fehling (JIRA)
use of FST for SynonymsFilterFactory and synonyms.txt
-

 Key: SOLR-2628
 URL: https://issues.apache.org/jira/browse/SOLR-2628
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 3.4, 4.0
 Environment: Linux
Reporter: Bernd Fehling
Priority: Minor


Currently the SynonymFilterFactory builds up an in-memory SynonymMap.
This can generate huge maps because of the permutations of synonyms.

Now that FSTs (finite state transducers) have been introduced to Lucene, they
could also be used for synonyms. A tool could compile the synonyms.txt file
into a binary automaton file, which could then be used with the
SynonymFilterFactory.

Advantages:
- faster Solr startup, with no need to build the SynonymMap
- faster lookup
- lower memory usage
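
To make the "permutations" point concrete: with the factory's default expand="true", an equivalence line such as "tv, television, telly" in synonyms.txt maps every term of the group to every term of the group, so a group of n synonyms yields n x n input/output pairs. The sketch below is a hypothetical helper written only for illustration, not Solr's actual parser; it ignores explicit "=>" mappings, escaping and multi-word synonyms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SynonymExpansion {
    // Hypothetical helper: expands one comma-separated equivalence line the way
    // expand="true" treats it, i.e. every term maps to every term of the group,
    // itself included.
    static Map<String, List<String>> expand(String line) {
        List<String> terms = new ArrayList<>();
        for (String t : line.split(",")) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty() && !terms.contains(trimmed)) {
                terms.add(trimmed);
            }
        }
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String from : terms) {
            map.put(from, new ArrayList<>(terms));
        }
        return map;
    }

    // Total number of (input -> output) pairs the in-memory map has to hold.
    static int pairCount(Map<String, List<String>> map) {
        int n = 0;
        for (List<String> outputs : map.values()) {
            n += outputs.size();
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, List<String>> m = expand("tv, television, telly");
        System.out.println(m.keySet());   // [tv, television, telly]
        System.out.println(pairCount(m)); // 9
    }
}
```

A group of 10 equivalent terms already needs 100 pairs, which is the growth a compiled automaton would avoid materializing.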


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned SOLR-2628:
-

Assignee: Dawid Weiss

> use of FST for SynonymsFilterFactory and synonyms.txt
> -
>
> Key: SOLR-2628
> URL: https://issues.apache.org/jira/browse/SOLR-2628
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 3.4, 4.0
> Environment: Linux
>Reporter: Bernd Fehling
>Assignee: Dawid Weiss
>Priority: Minor
>  Labels: suggestion
>
> Currently the SynonymFilterFactory builds up an in-memory SynonymMap.
> This can generate huge maps because of the permutations of synonyms.
> Now that FSTs (finite state transducers) have been introduced to Lucene, they
> could also be used for synonyms. A tool could compile the synonyms.txt file
> into a binary automaton file, which could then be used with the
> SynonymFilterFactory.
> Advantages:
> - faster Solr startup, with no need to build the SynonymMap
> - faster lookup
> - lower memory usage



Re: [jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread yazhini.k vini
I don't need information about Solr projects.

yazhini


[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058241#comment-13058241 ]

Dawid Weiss commented on SOLR-2628:
---

I've talked about it a little bit with Bernd and indeed, it seems possible to 
reduce the size of in-memory data structures by an order of magnitude (or even 
two orders of magnitude, we shall see). I'm on vacation for the next week and 
on a business trip for another one after that, but I'll be on it once I come 
back home.



Re: [jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss
> I don't need information about solr projects.

I'm afraid you don't have much choice, because Solr and Lucene share
the same mailing list and JIRA space.

Dawid



Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing

2011-07-01 Thread yazhini.k vini
I don't need messages from your mailing list, sir.


Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

2011-07-01 Thread yazhini.k vini
I don't need messages from your mailing list, sir.


[jira] [Created] (SOLR-2629) warning about org.apache.solr.request.SolrQueryResponse is deprecated

2011-07-01 Thread Bernd Fehling (JIRA)
warning about org.apache.solr.request.SolrQueryResponse is deprecated
-

 Key: SOLR-2629
 URL: https://issues.apache.org/jira/browse/SOLR-2629
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 3.2, 3.1, 3.3
 Environment: Linux
Reporter: Bernd Fehling
Priority: Trivial


The web admin interface uses the deprecated class
org.apache.solr.request.SolrQueryResponse in the following files:
- solr/src/webapp/web/admin/replication/header.jsp
- solr/src/webapp/web/admin/ping.jsp

These should be changed to use org.apache.solr.response.SolrQueryResponse.



RE: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

2011-07-01 Thread Uwe Schindler
Hi,

It would be nice if you could stop sending these eMails. If they are
generated by an auto-mailer, please stop it.

If this doesn't stop, the moderator will remove your address from the
mailing list.

Uwe

From: yazhini.k vini [mailto:yazhini@gmail.com] 
Sent: Friday, July 01, 2011 9:49 AM
To: dev@lucene.apache.org
Subject: Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

I don't need messages from your mailing list, sir.



[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-07-01 Thread Shalin Shekhar Mangar (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058461#comment-13058461 ]

Shalin Shekhar Mangar commented on SOLR-2623:
-

Hoss, I wish there were a way to do just that. I looked and looked but couldn't 
find it. The JMX API is really screwed up: once you register an MBean, apparently 
you can't get it out again. I'd be interested if anyone knew of a way to do 
that.

> Solr JMX MBeans do not survive core reloads
> ---
>
> Key: SOLR-2623
> URL: https://issues.apache.org/jira/browse/SOLR-2623
> Project: Solr
>  Issue Type: Bug
>  Components: multicore
>Affects Versions: 1.4, 1.4.1, 3.1, 3.2
>Reporter: Alexey Serba
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch
>
>
> Solr JMX MBeans do not survive core reloads
> {noformat:title="Steps to reproduce"}
> sh> cd example
> sh> vi multicore/core0/conf/solrconfig.xml # enable jmx
> sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> 
> solr/core0:id=core0,type=core
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
> ...
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
> solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
> sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> # there's only one bean left after Solr core reload
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
> {noformat}
> The root cause of this is Solr core reload behavior:
> # create new core (which overwrites existing registered MBeans)
> # register new core and close old one (we remove/un-register MBeans on oldCore.close)
> The correct sequence is:
> # unregister MBeans from old core
> # create and register new core
> # close old core without touching MBeans
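
The corrected ordering described above can be sketched with the standard javax.management API. This is a minimal sketch under stated assumptions: Core, InfoMBean and reload() are hypothetical stand-ins, not Solr's SolrCore or its JMX classes, and each core registers a single MBean under a fixed ObjectName loosely modeled on the names in the jmxterm output. It shows that unregistering the old core's beans first lets the new core register under the same names, and that closing the old core afterwards no longer removes anything.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class JmxReloadSketch {
    // Hypothetical management interface; JMX introspection expects it to be public.
    public interface InfoMBean {
        String getCoreName();
    }

    // Hypothetical stand-in for a core that exposes one MBean under a fixed name.
    static class Core implements InfoMBean {
        final String name;
        private final List<ObjectName> registered = new ArrayList<>();

        Core(String name) { this.name = name; }

        @Override
        public String getCoreName() { return name; }

        void registerMBeans(MBeanServer server) throws Exception {
            ObjectName on = new ObjectName("solr/" + name + ":type=core,id=" + name);
            // Throws InstanceAlreadyExistsException if the old core still holds the name.
            server.registerMBean(new StandardMBean(this, InfoMBean.class), on);
            registered.add(on);
        }

        void unregisterMBeans(MBeanServer server) throws Exception {
            for (ObjectName on : registered) {
                if (server.isRegistered(on)) server.unregisterMBean(on);
            }
            registered.clear();
        }

        // In the corrected sequence, close() does not touch JMX at all.
        void close() { }
    }

    // 1. unregister the old core's MBeans, 2. create and register the new core,
    // 3. close the old core without touching MBeans.
    static Core reload(Core oldCore, MBeanServer server) throws Exception {
        oldCore.unregisterMBeans(server);
        Core newCore = new Core(oldCore.name);
        newCore.registerMBeans(server);
        oldCore.close();
        return newCore;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Core core = new Core("core0");
        core.registerMBeans(server);
        reload(core, server);
        ObjectName on = new ObjectName("solr/core0:type=core,id=core0");
        System.out.println(server.isRegistered(on));             // true
        System.out.println(server.getAttribute(on, "CoreName")); // core0
    }
}
```

Because the old core's close() no longer unregisters anything, it cannot remove the beans that now belong to the new core, which is the failure shown in the reproduction steps.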



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058468#comment-13058468 ]

Michael McCandless commented on SOLR-2628:
--

Dawid, have a look at LUCENE-3233 -- we have a [very very rough] start at this.



[jira] [Resolved] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2628.
---

Resolution: Duplicate

Duplicate of LUCENE-3233



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058470#comment-13058470 ]

Dawid Weiss commented on SOLR-2628:
---

Yep, this is a duplicate. Thanks, Mike. Like I said, I won't be able to work 
on this for the next two weeks (I also have that FST refactoring open in 
the background... it's progressing slowly), but it's definitely low-hanging 
fruit to pick, because it shouldn't be very difficult and the gains should be huge.



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058475#comment-13058475 ]

Michael McCandless commented on SOLR-2628:
--

I think the RAM reduction should be huge, but lookup might be slower 
(i.e., the usual FST tradeoff), since we are going char by char in the FST.  If 
we go word by word instead (i.e., the FST's labels are word ords and we separately 
resolve word -> ord via a "normal" hash lookup), then that might be a good middle 
ground... but this is all speculation for now!
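
The word-by-word middle ground can be sketched as follows, under stated assumptions: each distinct input word gets an integer ordinal through an ordinary HashMap, and multi-word synonym inputs are stored as sequences of those ordinals. A plain trie keyed by ordinals stands in for the FST here; unlike a real FST it shares only prefixes, not suffixes, and the class and method names are hypothetical, not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordOrdSynonyms {
    // word -> ordinal, resolved with a "normal" hash lookup
    private final Map<String, Integer> ords = new HashMap<>();
    // trie whose labels are word ordinals; a simple stand-in for the FST
    private final Node root = new Node();

    private static final class Node {
        final Map<Integer, Node> next = new HashMap<>();
        List<String> outputs; // non-null if a synonym input ends at this node
    }

    private int ordOf(String word) {
        Integer ord = ords.get(word);
        if (ord == null) {
            ord = ords.size();
            ords.put(word, ord);
        }
        return ord;
    }

    // Adds a whitespace-separated (possibly multi-word) input and its outputs.
    void add(String input, List<String> outputs) {
        Node n = root;
        for (String w : input.split(" ")) {
            n = n.next.computeIfAbsent(ordOf(w), k -> new Node());
        }
        n.outputs = outputs;
    }

    // Returns the outputs for an exact multi-word input, or null if none.
    List<String> lookup(String input) {
        Node n = root;
        for (String w : input.split(" ")) {
            Integer ord = ords.get(w);
            if (ord == null) return null; // unknown word: cannot match
            n = n.next.get(ord);
            if (n == null) return null;
        }
        return n.outputs;
    }

    int vocabularySize() { return ords.size(); }

    public static void main(String[] args) {
        WordOrdSynonyms s = new WordOrdSynonyms();
        s.add("new york", Arrays.asList("ny", "nyc"));
        s.add("new york city", Arrays.asList("nyc"));
        System.out.println(s.lookup("new york"));   // [ny, nyc]
        System.out.println(s.lookup("new jersey")); // null
        System.out.println(s.vocabularySize());     // 3
    }
}
```

Each lookup step costs one hash probe per word instead of one transition per character, at the price of keeping the word-to-ordinal table in memory.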



[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module

2011-07-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058476#comment-13058476 ]

Michael McCandless commented on LUCENE-3272:


Big +1!  We've needed query parsing factored out for a long time.  And 
cutting tests over to a new MockQP, and then simply moving (but not merging) 
all QPs together into a module, sounds like great first steps.

Note that the FieldType work (at least as currently planned/targeted) isn't a 
schema: it's really just a nicer API for working with documents.  I.e., nothing 
is persisted, and nothing checks that two docs have the same fields/types, etc.

Still, it would be great to pull Solr's QP in and somehow abstract the parts 
that require access to Solr's schema.
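
One way to picture the proposed MockQP is a deliberately tiny, test-only parser that turns strings like "+lucene -body:solr title:fst" into a list of clauses, so that tests no longer depend on the full core QueryParser. The sketch below is hypothetical: the class, its syntax and its Clause type are invented for illustration (the Occur names merely mirror Lucene's BooleanClause.Occur), and a real version would build Lucene Query objects rather than plain clauses.

```java
import java.util.ArrayList;
import java.util.List;

class MockQueryParser {
    // Hypothetical; the names only mirror Lucene's BooleanClause.Occur.
    enum Occur { MUST, MUST_NOT, SHOULD }

    static final class Clause {
        final Occur occur;
        final String field;
        final String term;

        Clause(Occur occur, String field, String term) {
            this.occur = occur;
            this.field = field;
            this.term = term;
        }

        @Override
        public String toString() {
            String prefix = occur == Occur.MUST ? "+" : occur == Occur.MUST_NOT ? "-" : "";
            return prefix + field + ":" + term;
        }
    }

    private final String defaultField;

    MockQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    // Parses whitespace-separated clauses of the form [+|-][field:]term.
    // No phrases, grouping, boosts or escaping: just enough for simple tests.
    List<Clause> parse(String text) {
        List<Clause> clauses = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            Occur occur = Occur.SHOULD;
            if (tok.startsWith("+")) {
                occur = Occur.MUST;
                tok = tok.substring(1);
            } else if (tok.startsWith("-")) {
                occur = Occur.MUST_NOT;
                tok = tok.substring(1);
            }
            if (tok.isEmpty()) continue; // skips blank input and a bare "+" or "-"
            int colon = tok.indexOf(':');
            String field = colon > 0 ? tok.substring(0, colon) : defaultField;
            String term = colon > 0 ? tok.substring(colon + 1) : tok;
            clauses.add(new Clause(occur, field, term));
        }
        return clauses;
    }

    public static void main(String[] args) {
        MockQueryParser qp = new MockQueryParser("body");
        System.out.println(qp.parse("+lucene -body:solr title:fst"));
        // prints [+body:lucene, -body:solr, title:fst]
    }
}
```

Keeping the syntax this small is the point: tests only need a stable way to express simple queries, while the full-featured parsers can move into the module independently.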

> Consolidate Lucene's QueryParsers into a module
> ---
>
> Key: LUCENE-3272
> URL: https://issues.apache.org/jira/browse/LUCENE-3272
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/queryparser
>Reporter: Chris Male
>
> Lucene has a lot of QueryParsers and we should have them all in a single 
> consistent place.  
> The following are QueryParsers I can find that warrant moving to the new 
> module:
> - Lucene Core's QueryParser
> - AnalyzingQueryParser
> - ComplexPhraseQueryParser
> - ExtendableQueryParser
> - Surround's QueryParser
> - PrecedenceQueryParser
> - StandardQueryParser
> - XML-Query-Parser's CoreParser
> All seem to do a good job at their kind of parsing with extensive tests.
> One challenge of consolidating these is that many tests use Lucene Core's 
> QueryParser.  One option is to just replicate this class in src/test and call 
> it TestingQueryParser.  Another option is to convert all tests over to 
> programmatically building their queries (seems like a lot of work).




[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058489#comment-13058489
 ] 

Dawid Weiss commented on SOLR-2628:
---

Yes, this may be the case. It'd need to be investigated because storing words 
in a hashtable will also bump memory requirements, whereas an FST can at least 
reuse some prefixes and suffixes.
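To make the prefix-sharing point concrete, here is a small self-contained Java sketch (not Lucene's FST code) that compares the characters a hash-based map holds as independent keys with the number of nodes a prefix trie needs for the same words. A trie shares prefixes only; an FST can additionally share suffixes, as noted above, so it would collapse more. The word list is invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: characters stored as separate hash keys vs. nodes in a prefix trie. */
class PrefixSharingSketch {

    /** One trie node; children keyed by the next character. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    /** Inserts every word into a fresh trie and returns the number of nodes created (root excluded). */
    static int trieNodeCount(List<String> words) {
        Node root = new Node();
        int created = 0;
        for (String w : words) {
            Node cur = root;
            for (int i = 0; i < w.length(); i++) {
                char c = w.charAt(i);
                Node next = cur.children.get(c);
                if (next == null) { // only characters past the shared prefix cost a node
                    next = new Node();
                    cur.children.put(c, next);
                    created++;
                }
                cur = next;
            }
        }
        return created;
    }

    /** Total characters when every word is stored as an independent hash key. */
    static int hashKeyChars(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = List.of("television", "televisions", "telephone", "telephones");
        System.out.println("hash key chars: " + hashKeyChars(words));
        System.out.println("trie nodes:     " + trieNodeCount(words));
    }
}
```

Per-entry hash table overhead is not counted here, so the sketch only illustrates the sharing effect, not real memory numbers.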

> use of FST for SynonymsFilterFactory and synonyms.txt
> -
>
> Key: SOLR-2628
> URL: https://issues.apache.org/jira/browse/SOLR-2628
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis
>Affects Versions: 3.4, 4.0
> Environment: Linux
>Reporter: Bernd Fehling
>Assignee: Dawid Weiss
>Priority: Minor
>  Labels: suggestion
>
> Currently the SynonymsFilterFactory builds up a memory based SynonymsMap. 
> This can generate huge maps because of the permutations for synonyms.
> Now that FST (finite state transducer) has been introduced to Lucene, it could 
> also be used for synonyms.
> A tool can compile the synonyms.txt file into a binary automaton file, which can 
> then be used with SynonymsFilterFactory.
> Advantage:
> - faster start of solr, no need to generate SynonymsMap
> - faster lookup
> - memory saving




Re: revisit naming for grouping/join?

2011-07-01 Thread Michael McCandless
I think joining and grouping are two different functions, and we
should keep different modules for them...

On Thu, Jun 30, 2011 at 10:30 PM, Robert Muir  wrote:
> Hi,
>
> when looking at just a very quick glance at some of the newer
> grouping/join features, I found myself a little confused about what is
> exactly what, and I think users might too.

They are confusing!

> I discussed some of this with hossman, and it only seemed to make me
> even more totally confused about:
> * difference between field collapsing and grouping

I like the name grouping better here: I think field collapsing
undersells (it's only one specific way to use grouping).  EG, grouping
w/o collapsing is useful (eg, Best Buy grouping hits by product
category and showing the top 5 in each).

> * difference between nested documents and the index-time join

Similarly I think nested docs undersells index-time join: you can
join (either during indexing or during searching) in many different
ways, and nested docs is just one use case.

EG, maybe your docs are doctors but during indexing you join to a city
table with facts about that city (each doctor's office is in a
specific city) and then you want to run queries like "city's avg
annual temp > 60 and doctor has good bedside manner" or something.

> * difference between index-time-join/nested documents and single-pass
> index-time grouping. Is the former only a more general case of the
> latter?

Grouping is purely a presentation concern -- you are not altering
which docs hit; you are simply changing how you pick which hits to
display ("top N by group").  So we only have collectors here.

The "generic" (requires 2 passes) collectors can group on anything at
search time; the "doc block" collector requires that you indexed all
docs in each group as a block.
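The "generic" two-pass idea can be sketched in plain Java over an in-memory list of hits. This is an illustration under simplifying assumptions (scores and group keys are given up front, and everything is sorted rather than kept in bounded priority queues), not the grouping module's collectors. The data loosely mirrors the "top N per product category" example, with invented values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy two-pass "top N groups, top K docs per group" over an in-memory hit list. */
class TwoPassGroupingSketch {

    /** A scored hit; in a real index the group key would be read from a field. */
    static final class Hit {
        final int doc;
        final float score;
        final String group;

        Hit(int doc, float score, String group) {
            this.doc = doc;
            this.score = score;
            this.group = group;
        }
    }

    /** Pass 1: rank groups by their best-scoring hit and keep the top {@code topNGroups}. */
    static List<String> firstPass(List<Hit> hits, int topNGroups) {
        Map<String, Float> best = new HashMap<>();
        for (Hit h : hits) {
            best.merge(h.group, h.score, Float::max);
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
    }

    /** Pass 2: re-scan the hits and keep the top {@code docsPerGroup} docs of each selected group. */
    static Map<String, List<Integer>> secondPass(List<Hit> hits, List<String> groups, int docsPerGroup) {
        Map<String, List<Hit>> buckets = new LinkedHashMap<>();
        for (String g : groups) {
            buckets.put(g, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> bucket = buckets.get(h.group);
            if (bucket != null) { // hits from groups that lost in pass 1 are ignored
                bucket.add(h);
            }
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Hit>> e : buckets.entrySet()) {
            List<Hit> bucket = e.getValue();
            bucket.sort((a, b) -> Float.compare(b.score, a.score));
            List<Integer> docs = new ArrayList<>();
            for (int i = 0; i < Math.min(docsPerGroup, bucket.size()); i++) {
                docs.add(bucket.get(i).doc);
            }
            result.put(e.getKey(), docs);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(0, 0.9f, "tv"), new Hit(1, 0.8f, "phone"), new Hit(2, 0.7f, "tv"),
                new Hit(3, 0.5f, "laptop"), new Hit(4, 0.6f, "phone"), new Hit(5, 0.4f, "tv"));
        List<String> top = firstPass(hits, 2);
        System.out.println(top + " -> " + secondPass(hits, top, 2));
    }
}
```

The matching set is never changed: both passes see the same hits, and grouping only decides which of them to present.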

Join is both about restricting matches and also presentation of hits,
because your query needs to match fields from different [logical]
tables (so, the module has a Query and a Collector).  When you get the
results back, you may or may not be interested in retaining the table
structure in your result set (ie, you may not have selected fields
from the child table).

Similarly, "generic" joining (in Solr/ElasticSearch today but I'd like
to factor into the join module) can do any join at search time, while
the "doc block" collector requires that you did the necessary join(s)
during indexing.
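For the "doc block" case, a common layout (and, as far as I know, the one the join module's block-join query assumes) is that each parent is indexed immediately after its children, with a separate filter marking which doc IDs are parents. Under that assumption, mapping a matching child to its parent is just "the next parent at or after this doc", as in this plain-Java sketch using java.util.BitSet. It is an illustration, not the join module's code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BlockJoinSketch {

    /**
     * Groups matching child docs under their parent, assuming each block is laid out as
     * child, child, ..., parent and that {@code parents} has a bit set for every parent doc.
     */
    static Map<Integer, List<Integer>> childrenByParent(int[] matchingChildren, BitSet parents) {
        Map<Integer, List<Integer>> byParent = new TreeMap<>();
        for (int child : matchingChildren) {
            if (parents.get(child)) {
                throw new IllegalArgumentException("doc " + child + " is a parent, not a child");
            }
            int parent = parents.nextSetBit(child); // first parent at or after the child
            if (parent < 0) {
                throw new IllegalStateException("doc " + child + " is not followed by a parent");
            }
            byParent.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return byParent;
    }

    public static void main(String[] args) {
        // Block 1: docs 0-1 are children, doc 2 is their parent. Block 2: docs 3-5, parent 6.
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(6);
        System.out.println(childrenByParent(new int[] {1, 3, 5}, parents));
    }
}
```

Because the join was "done" at indexing time by the block layout, no extra per-query memory is needed, but the layout has to be chosen up front.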

> * difference between the above joinish capabilities and solr's join
> impl... other than the single-pass/index-time limitation (which is
> really an implementation detail), I'm talking about use cases.

Solr's/ElasticSearch's join is more general because you can join
anything at search time (even, across 2 different indexes), vs doc
block join where you must pick which joins you will ever want to use
and then build the index accordingly.
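The doctor/city example can be sketched in its "generic" search-time form: run the "from" query, collect the join-key values of its matches, then keep the "to" docs whose key is in that set. This is a minimal in-memory illustration, not Solr's or ElasticSearch's implementation; documents are plain maps and the field names (id, avgTemp, city, bedside) are hypothetical. Since the two sides are independent lists, they could just as well come from two different indexes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class SearchTimeJoinSketch {

    /**
     * "From" side: run fromQuery and collect the fromField values of the matches.
     * "To" side: return docs matching toQuery whose toField value is one of those keys.
     */
    static List<Map<String, String>> join(
            List<Map<String, String>> fromDocs, Predicate<Map<String, String>> fromQuery, String fromField,
            List<Map<String, String>> toDocs, Predicate<Map<String, String>> toQuery, String toField) {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> d : fromDocs) {
            if (fromQuery.test(d) && d.get(fromField) != null) {
                keys.add(d.get(fromField));
            }
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> d : toDocs) {
            if (toQuery.test(d) && keys.contains(d.get(toField))) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> cities = List.of(
                Map.of("id", "austin", "avgTemp", "70"),
                Map.of("id", "oslo", "avgTemp", "45"));
        List<Map<String, String>> doctors = List.of(
                Map.of("name", "a", "city", "austin", "bedside", "good"),
                Map.of("name", "b", "city", "oslo", "bedside", "good"),
                Map.of("name", "c", "city", "austin", "bedside", "poor"));
        System.out.println(join(
                cities, c -> Integer.parseInt(c.get("avgTemp")) > 60, "id",
                doctors, d -> "good".equals(d.get("bedside")), "city"));
    }
}
```

The key set lives in memory per query, which is the search-time cost that the block layout avoids.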

You can also mix the two.  Maybe you do certain joins while indexing,
but then at search time you do other joins "generically".  That's
fine.  (Same is true for grouping).

> I think it's especially interesting since the join module depends on
> the grouping module.

The join module does currently depend on the grouping module, but for
a silly reason: just for the TopGroups, to represent the returned
hits.  We could move TopGroups/GroupDocs into common (thus justifying
its generic name!)?  Then both join and grouping modules depend on
common.

Really TopGroups is just a TopDocs that allows some recursion (ie,
each hit may in turn be another TopDocs).  But TopGroups is limited
now to only depth 2 recursion... we need to fix this for nested
grouping.  Really we just need a recursive TopDocs here.
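A recursive result type could look roughly like the following Java sketch. The class and field names are hypothetical, not Lucene's TopDocs/TopGroups API: each entry (a document hit or a group value) may carry its own nested result list, so depth is no longer fixed at 2.

```java
import java.util.List;

/** Hypothetical recursive result type: each entry may carry its own nested result list. */
class RecursiveTopDocsSketch {

    static final class Entry {
        final String key;           // doc id or group value
        final float score;
        final ResultList children;  // null for a leaf (plain document hit)

        Entry(String key, float score, ResultList children) {
            this.key = key;
            this.score = score;
            this.children = children;
        }
    }

    static final class ResultList {
        final int totalHits;
        final List<Entry> entries;

        ResultList(int totalHits, List<Entry> entries) {
            this.totalHits = totalHits;
            this.entries = entries;
        }

        /** Nesting depth: 1 for a flat list of docs, 2 for TopGroups-style groups of docs, etc. */
        int depth() {
            int deepestChild = 0;
            for (Entry e : entries) {
                if (e.children != null) {
                    deepestChild = Math.max(deepestChild, e.children.depth());
                }
            }
            return 1 + deepestChild;
        }
    }

    static Entry leaf(String doc, float score) {
        return new Entry(doc, score, null);
    }

    public static void main(String[] args) {
        // category -> brand -> product: three levels, one more than TopGroups can express.
        ResultList brandA = new ResultList(2, List.of(leaf("doc1", 0.9f), leaf("doc7", 0.4f)));
        ResultList tvs = new ResultList(2, List.of(new Entry("brandA", 0.9f, brandA)));
        ResultList root = new ResultList(2, List.of(new Entry("tv", 0.9f, tvs)));
        System.out.println("depth = " + root.depth());
    }
}
```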

> So I am curious if we should:
> * add docs (maybe with simple examples) in the package.html or
> otherwise that differentiate what these guys are, or at least agree on
> some consistent terminology and define it somewhere? I feel like
> people have explained to me the differences in all these things
> before, but then it's easy to forget.

Well, each module's package.html has a start here, but I agree we
should do more.

I think what would be best is a smallish but feature complete demo, ie
pull together some easy-to-understand sample content and then build a
small demo app around it.  We could then show how to use grouping for
field collapsing (and for other use cases), joining for nested docs
(and for other use cases), etc.

Mike McCandless

http://blog.mikemccandless.com




[jira] [Created] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Upayavira (JIRA)
XsltUpdateRequestHandler


 Key: SOLR-2630
 URL: https://issues.apache.org/jira/browse/SOLR-2630
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 4.0
Reporter: Upayavira
Priority: Minor
 Fix For: 4.0
 Attachments: xslt-update-handler.patch

An update request handler that can accept a tr param, allowing the indexing of 
any XML content that is passed to solr, so long as there is an XSLT stylesheet 
in solr/conf/xslt that can transform it to the  format.

Could be used, for example, to allow Solr to ingest docbook directly, without 
any preprocessing.
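The core transformation step can be sketched with the JDK's own javax.xml.transform API. This is not the attached patch; it assumes the target is Solr's standard XML update format (add/doc/field elements), and the stylesheet and the DocBook-like input are invented for illustration. Returning a String here is only for readability.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToSolrSketch {

    /** Hypothetical stylesheet: maps a DocBook-like <article> onto one Solr <doc>. */
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/article'>"
          + "<add><doc>"
          + "<field name='title'><xsl:value-of select='title'/></field>"
          + "<field name='text'><xsl:value-of select='para'/></field>"
          + "</doc></add>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Compiles the stylesheet, applies it to the input XML and returns the serialized result. */
    static String transform(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String docbook = "<article><title>Hello</title><para>World</para></article>";
        System.out.println(transform(STYLESHEET, docbook));
    }
}
```

In a handler, the compiled stylesheet would presumably be cached (e.g. as a javax.xml.transform.Templates) rather than recompiled per request.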
 




[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Upayavira (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Upayavira updated SOLR-2630:


Attachment: xslt-update-handler.patch

Patch for XsltUpdateRequestHandler, along with a test case for it

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  




Re: Issues with Grouping

2011-07-01 Thread Yonik Seeley
On Thu, Jun 30, 2011 at 11:58 PM, Bill Bell  wrote:
> I meant FC insanity. It does not appear to be an NPE.

That's natural, and not a bug.  Grouping always uses per-segment field
cache entries, where faceting sometimes uses top level field caches.
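To illustrate why both kinds of entries can exist side by side: a per-segment cache is one array per segment, indexed by the segment-local doc ID, while a top-level cache is one array over the whole index. Reading a per-segment entry for a top-level doc ID means first finding the segment by binary search over the segments' doc bases. A minimal sketch with made-up arrays, not Lucene's FieldCache API:

```java
class PerSegmentLookupSketch {

    /** Index of the segment containing top-level {@code doc}; docBases ascending, docBases[0] == 0. */
    static int segmentFor(int doc, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) { // find the last segment whose docBase <= doc (skips empty segments)
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= doc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Reads a value from per-segment "field cache" arrays using a top-level doc id. */
    static int valueFor(int doc, int[] docBases, int[][] perSegmentValues) {
        int seg = segmentFor(doc, docBases);
        return perSegmentValues[seg][doc - docBases[seg]];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 5};                              // three segments: 3, 2 and 3 docs
        int[][] values = {{10, 11, 12}, {20, 21}, {30, 31, 32}};
        System.out.println(valueFor(4, docBases, values));
    }
}
```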

-Yonik
http://www.lucidimagination.com




[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058540#comment-13058540
 ] 

Uwe Schindler commented on SOLR-2630:
-

XML is binary data, so you should not convert it to Strings. Ideally the 
already transformed DOM tree or SAX stream would be passed directly to the 
importer. I know this is not easily possible, so the most correct way would be 
to pass the binary byte[] directly and reparse it.

I will investigate passing the SAX events / XSL DOM tree around directly, which 
is possible, as the transformer API can also pipe directly to StAX, which is 
used by the underlying XMLImporter.
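The byte[] fallback described above can be sketched with the JDK's javax.xml.transform and javax.xml.stream APIs: the transformer writes bytes, and StAX re-parses those bytes, detecting the encoding itself, so no intermediate String is built. This is an illustration, not the patch's code; the identity transformer stands in for a real user-supplied stylesheet, and readFields is a hypothetical helper that only collects field name/value pairs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ByteReparseSketch {

    /** Runs a transformer from bytes to bytes, never building an intermediate String. */
    static byte[] transformToBytes(Transformer t, byte[] input) throws TransformerException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new ByteArrayInputStream(input)), new StreamResult(out));
        return out.toByteArray();
    }

    /** Re-parses the bytes with StAX (the parser detects the encoding) and collects field name -> text. */
    static Map<String, String> readFields(byte[] xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new ByteArrayInputStream(xml));
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "field".equals(r.getLocalName())) {
                    fields.put(r.getAttributeValue(null, "name"), r.getElementText());
                }
            }
        } finally {
            r.close();
        }
        return fields;
    }

    static Map<String, String> roundTrip(byte[] input) {
        try {
            // The identity transformer stands in for a real user-supplied stylesheet here.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            return readFields(transformToBytes(identity, input));
        } catch (TransformerException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = "<add><doc><field name=\"id\">1</field><field name=\"title\">Caf\u00e9</field></doc></add>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrip(input));
    }
}
```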

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  




[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058541#comment-13058541
 ] 

Uwe Schindler commented on SOLR-2630:
-

Also, you don't pass the content-type charset to the StreamSource. I will post 
an improved patch fixing both problems soon.

Thanks for the patch!
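One way to honor the content-type charset when building the StreamSource is sketched below in Java. The helper names (charsetOf, sourceFor) are hypothetical and this is not Solr's XMLLoader code: if the Content-Type carries an explicit charset, the bytes are decoded with it; otherwise the raw bytes go to the XML parser so it can use the BOM or XML declaration itself.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Locale;
import javax.xml.transform.stream.StreamSource;

class ContentTypeSourceSketch {

    /** Extracts the charset parameter from a Content-Type value, or returns null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip optional quotes
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Explicit charset: decode with it. No charset: pass raw bytes and let the parser detect. */
    static StreamSource sourceFor(InputStream body, String contentType) {
        String charset = charsetOf(contentType);
        return charset == null
                ? new StreamSource(body)
                : new StreamSource(new InputStreamReader(body, Charset.forName(charset)));
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1"));
        System.out.println(charsetOf("application/xml"));
    }
}
```

Charset.forName throws for unknown charset names, so a real handler would likely want to map that to a client error.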

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  




Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

2011-07-01 Thread Michael McCandless
On Fri, Jul 1, 2011 at 1:47 AM, Simon Willnauer
 wrote:

> Mike, I think we should do that, but the real issue here is: what if
> somebody comes up with some arbitrary method in the future, claiming it's
> slow, and we back out and use the "we think it's safe" way? What if it is
> actually the other way around, and copyOf is optimized by new VMs and
> arraycopy is slightly slower?

I think we take it case by case?  We do need to be careful when using
low level ops in Java.

In this specific case, we have been almost burned already by
Arrays.copyOf, and never by System.arraycopy (that I'm aware of), so
we should not cutover until we have some confidence that burning will
not occur.  Other cases will be different, but this one has enough
history that the bias seems clear.

> I just want
> to prevent the "we should not do this, it might be a problem in the
> big picture" argument when the microbenchmark doesn't show a difference. At
> some point we have to rely on the JVM.

I agree, but generally the burden of proof is on the "new one".  Just
because we can use Java 1.6 code now doesn't mean we should blindly
cutover to new stuff.

System.arraycopy is tried & true and we've never hit a perf issue with
it, in my memory.

> Even if we benchmark on all OS with various JVMs we can't prevent jvm
> bug based perf hits.

Right, nothing is perfect here; this is just about mitigating risk.
We were lucky to have tracked this slowdown down last time, I
think only because Robert was using a -client JVM at the time?

> While there was no bug reported for primitives
> here, we don't have to be afraid of it either. I don't think it's safer
> to use arraycopy at all; it's just one level of indirection safer, but it
> makes development more of a pain IMO.

Yes it's a slight hassle (2 extra lines), but if this mitigates risk,
it's worth it.

That said, if we can convince ourselves that in the primitives only
case, Arrays.copyOf has very little risk, then I'm OK w/ using it for
those cases.

Mike




[jira] [Assigned] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned SOLR-2630:
---

Assignee: Uwe Schindler

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  




Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

2011-07-01 Thread Michael McCandless
On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler  wrote:
> Hi,
>
> I don't understand the whole discussion here, so please compare these two 
> implementations and tell me which one is faster. Please don't hurt me, if you 
> don't want to see src.jar code from OpenJDK Java6 - just delete this mail if 
> you don’t want to (the code here is licensed under GPL):

This is the source code for a specific version of one specific Java
impl.  If we knew all Java impls simply implemented the primitive case
using System.arraycopy (admittedly it's hard to imagine that they
wouldn't!) then we are fine.

> This is our implementation, simon replaced and Robert reverted 
> (UnsafeByteArrayOutputStream):
>
>  private void grow(int newLength) {
>    // It actually should be: (Java 1.7, when it's intrinsic on all machines)
>    // buffer = Arrays.copyOf(buffer, newLength);
>    byte[] newBuffer = new byte[newLength];
>    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
>    buffer = newBuffer;
>  }
>
> So please look at the code: where is there a difference that could slow it down, 
> except the Math.min() call, which is an intrinsic in almost every JDK on earth?

Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
sure about other cases...
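For reference, the two ways of growing a primitive byte[] discussed here can be put side by side in a runnable Java sketch. They are functionally equivalent for primitives (Arrays.copyOf zero-pads when growing and truncates when shrinking); the sketch says nothing about performance, which is the open question in this thread. Unlike the quoted grow(), the arraycopy variant copies Math.min(old, new) elements so that it also matches copyOf when shrinking; that change is mine.

```java
import java.util.Arrays;

class GrowSketch {

    /** Allocate-then-copy, as in the quoted grow(); Math.min also makes shrinking safe. */
    static byte[] growWithArraycopy(byte[] buffer, int newLength) {
        byte[] newBuffer = new byte[newLength];
        System.arraycopy(buffer, 0, newBuffer, 0, Math.min(buffer.length, newLength));
        return newBuffer;
    }

    /** Java 6+ one-liner: copies, then zero-pads (or truncates) to newLength. */
    static byte[] growWithCopyOf(byte[] buffer, int newLength) {
        return Arrays.copyOf(buffer, newLength);
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3};
        System.out.println(Arrays.toString(growWithArraycopy(buf, 5)));
        System.out.println(Arrays.toString(growWithCopyOf(buf, 5)));
    }
}
```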

> The problem we are talking about here is only about the generic Object[] 
> copyOf method, which also affects e.g. *all* Collection.toArray() methods - they 
> all use this code, so whenever you use ArrayList.toArray() or similar, the 
> slow code is executed. This is why we replaced Collections.sort() by 
> CollectionUtil.sort, which does no array copy. Simon & I were not willing to 
> replace the reallocations in FST code (Mike, you remember, we reverted that on 
> your Git repo when we did perf tests) and other parts in Lucene (there are 
> only a few of them). The idea was only to replace primitive-type code to make 
> it more readable. And with later JDK code it could even get faster (not 
> slower), if Oracle starts to add intrinsics for those new methods (and that's 
> Dawid's and my reason to change to copyOf for primitive types). In general, 
> if you use Java SDK methods that are as fast as ours, they always have a 
> chance to get faster in later JDKs. So we should always prefer Java SDK 
> methods, unless they are slower because their default impl is too generic, 
> has too many safety checks, or uses reflection.

OK I'm convinced (I think!) that for primitive types only, let's use
Arrays.copyOf!

> To come back to UnsafeByteArrayOutputStream:
>
> I would change the whole code, as I don't like the allocation strategy in it 
> (it's exponential: on every grow it doubles its size). We should change that 
> to use ArrayUtils.grow() and ArrayUtils.oversize(), to have an allocation 
> strategy similar to trunk's. Then we can discuss this problem again when 
> Simon & I want to change the ArrayUtils.grow methods to use 
> Arrays.copyOf... *g* [just joking, I will never ask again, because this 
> discussion here is endless and does not get us anywhere].

Well, it sounds like for primitive types, we can cutover
ArrayUtils.grow methods.  Then we can look @ the nightly bench the
next day ;)

But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
(almost) a dup of ByteArrayDataOutput?
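The difference between doubling and an "oversize" strategy can be sketched as follows. The rule here (request minSize plus about 1/8 extra, at least 3 slots) is a hypothetical stand-in chosen for illustration, not Lucene's actual ArrayUtil formula; the copy itself uses Arrays.copyOf, in line with the conclusion above for primitive types.

```java
import java.util.Arrays;

class OversizeSketch {

    /**
     * Hypothetical "oversize" rule: minSize plus about 1/8 extra (at least 3 slots) instead of
     * doubling. Capped at Integer.MAX_VALUE; not Lucene's actual ArrayUtil formula.
     */
    static int oversize(int minSize) {
        if (minSize < 0) {
            throw new IllegalArgumentException("minSize must be >= 0: " + minSize);
        }
        long extra = Math.max(minSize >> 3, 3);
        return (int) Math.min((long) minSize + extra, Integer.MAX_VALUE);
    }

    /** Returns the same array if it is already big enough, else a larger zero-padded copy. */
    static byte[] grow(byte[] array, int minSize) {
        if (array.length >= minSize) {
            return array;
        }
        return Arrays.copyOf(array, oversize(minSize));
    }

    public static void main(String[] args) {
        byte[] buf = new byte[0];
        int reallocations = 0;
        for (int needed = 1; needed <= 1000; needed++) {
            byte[] next = grow(buf, needed);
            if (next != buf) {
                reallocations++;
                buf = next;
            }
        }
        System.out.println(reallocations + " reallocations, final capacity " + buf.length);
    }
}
```

The trade-off is more reallocations than doubling in exchange for much less slack at large sizes.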

> The other thing I don’t like in the new faceting module is the duplication of 
> vint code. Why don’t we change it to use DataInput/DataOutput and use Dawid's 
> new In/OutStream wrapper for DataOutput everywhere? This would be much more 
> consistent with all the code we currently have. Then we can encode the 
> payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, 
> wrapped with an OutputStreamDataOutput wrapper? Or maybe add a 
> ByteArrayDataOutput class.

That sounds good!

Uwe can you commit TODOs to the code w/ these ideas?
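The duplicated "vint" code is the variable-length int format that Lucene's DataOutput.writeVInt / DataInput.readVInt use: 7 payload bits per byte, lowest 7-bit group first, high bit set on every byte except the last. A standalone Java sketch of that format, written from the format description rather than copied from either code base, could serve as a reference when consolidating:

```java
import java.io.ByteArrayOutputStream;

class VIntSketch {

    /** Writes i as a vInt: 7 bits per byte, low-order group first, high bit = "more bytes follow". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift, so negative ints take 5 bytes instead of looping forever
        }
        out.write(i);
    }

    /** Reads one vInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, 300);
        byte[] bytes = out.toByteArray();
        System.out.println(bytes.length + " bytes, decoded " + readVInt(bytes, new int[] {0}));
    }
}
```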

Mike




[jira] [Commented] (LUCENE-2309) Fully decouple IndexWriter from analyzers

2011-07-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058548#comment-13058548
 ] 

Michael McCandless commented on LUCENE-2309:


Great!

This will overlap w/ the field type work (we have a branch for this now), where 
we already have decoupled indexer from concrete Field/Document impls, by adding 
a minimal IndexableField.

I think this issue should further that, ie pare back IndexableField so that 
there's only a getTokenStream for indexing (ie indexer will no longer try for 
String then Reader then tokenStream), and Analyzer must move to the FieldType 
and not be passed to IndexWriterConfig.  Multi-valued fields will be tricky, 
since IW now asks analyzer for the gaps...
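The decoupling can be illustrated with a toy Java sketch in which the indexer has no analyzer at all and only consumes tokens supplied by each field. Every name here (TokenField, preAnalyzed, simpleText, Indexer) is hypothetical and not Lucene's IndexableField API; tokens are plain Strings rather than a TokenStream/AttributeSource, and position-increment gaps for multi-valued fields are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class DecoupledIndexerSketch {

    /** Hypothetical minimal field: all the indexer may ask for is a name and its tokens. */
    interface TokenField {
        String name();

        Iterator<String> tokens();
    }

    /** Tokens produced elsewhere (e.g. an external pipeline); the indexer cannot tell the difference. */
    static TokenField preAnalyzed(String name, List<String> tokens) {
        return new TokenField() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public Iterator<String> tokens() {
                return tokens.iterator();
            }
        };
    }

    /** A field that carries its own toy "analyzer": lowercase, split on whitespace. */
    static TokenField simpleText(String name, String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return preAnalyzed(name, tokens);
    }

    /** Toy indexer with no analyzer of its own; postings keyed by "field:term". */
    static final class Indexer {
        final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

        void addDocument(int docId, List<TokenField> fields) {
            for (TokenField f : fields) {
                Iterator<String> it = f.tokens();
                while (it.hasNext()) {
                    postings.computeIfAbsent(f.name() + ":" + it.next(), k -> new TreeSet<>()).add(docId);
                }
            }
        }
    }

    public static void main(String[] args) {
        Indexer indexer = new Indexer();
        indexer.addDocument(0, List.of(simpleText("body", "Hello World")));
        indexer.addDocument(1, List.of(preAnalyzed("body", List.of("hello"))));
        System.out.println(indexer.postings);
    }
}
```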

> Fully decouple IndexWriter from analyzers
> -
>
> Key: LUCENE-2309
> URL: https://issues.apache.org/jira/browse/LUCENE-2309
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
>
> IndexWriter only needs an AttributeSource to do indexing.
> Yet, today, it interacts with Field instances, holds a private
> analyzer, invokes analyzer.reusableTokenStream, and has to deal with a
> wide variety of cases (it's not analyzed; it is analyzed but it's a Reader or a
> String; it's pre-analyzed).
> I'd like to have IW only interact with attr sources that already
> arrived with the fields.  This would be a powerful decoupling -- it
> means others are free to make their own attr sources.
> They need not even use any of Lucene's analysis impls; eg they can
> integrate to other things like [OpenPipeline|http://www.openpipeline.org].
> Or make something completely custom.
> LUCENE-2302 is already a big step towards this: it makes IW agnostic
> about which attr is "the term", and only requires that it provide a
> BytesRef (for flex).
> Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the
> FieldType knows the analyzer to use, then we could simply create a
> getAttrSource() method (say) on it and move all the logic IW has today
> onto there.  (We'd still need existing IW code for back-compat).




[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-07-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058549#comment-13058549
 ] 

Michael McCandless commented on LUCENE-2883:


+1, this is great Chris!

> Consolidate Solr & Lucene FunctionQuery into modules
> -
>
> Key: LUCENE-2883
> URL: https://issues.apache.org/jira/browse/LUCENE-2883
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/search
>Affects Versions: 4.0
>Reporter: Simon Willnauer
>Assignee: Chris Male
>  Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: LUCENE-2883.patch
>
>
> Spin-off from the [dev list | 
> http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  




Re: revisit naming for grouping/join?

2011-07-01 Thread mark harwood
>> I think what would be best is a smallish but feature complete demo,

For the nested stuff I had a reasonable demo on LUCENE-2454 that was based 
around resumes - that use case has the one-to-many characteristics that lend 
themselves to nesting, e.g. a person has many different qualifications and records of 
employment.
This scenario was illustrated here:
http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene

I also had the "book search" type scenario where a book has many sections and 
for the purposes of efficient highlighting/summarisation these sections were 
treated as child docs which could be read quickly (rather than highlighting a 
whole book).

I'm not sure what the "parent" was in your doctor and cities example, Mike. If a 
doctor is in only one city then there is no point making city a child doc, as the 
one city info can happily be combined with the doctor info into a single 
document with no conflict (doctors have different properties to cities).
If the city is the parent with many child doctor docs, that makes more sense but 
feels like a less likely use case, e.g. "find me a city with doctor x and a 
different doctor y".
Searching for a person with excellent Java and preferably good Lucene skills 
feels like a more real-world example.

It feels like documenting some of the trade-offs behind index design choices is 
useful too, e.g. nesting is not too great for very volatile content with 
constantly changing children, while search-time join is more costly in RAM and 
needs 2-pass processing.

Cheers
Mark



- Original Message 
From: Michael McCandless 
To: dev@lucene.apache.org
Sent: Fri, 1 July, 2011 13:51:04
Subject: Re: revisit naming for grouping/join?


[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058552#comment-13058552
 ] 

Upayavira commented on SOLR-2630:
-

Great! I was sure I'd missed stuff. Happy to improve stuff here too (e.g. port 
to 3.x).

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  




Re: revisit naming for grouping/join?

2011-07-01 Thread Robert Muir
On Fri, Jul 1, 2011 at 8:51 AM, Michael McCandless
 wrote:

> The join module does currently depend on the grouping module, but for
> a silly reason: just for the TopGroups, to represent the returned
> hits.  We could move TopGroups/GroupDocs into common (thus justifying
> its generic name!)?  Then both join and grouping modules depend on
> common.

Just a suggestion: maybe they belong in the lucene core? And maybe the
stuff in the common module belongs in lucene core's util package?

I guess I'm suggesting we try to keep our modules as flat as possible,
with as few dependencies as possible. I think we really already
have a 'common' module: that's the lucene core. If multiple modules end
up relying upon the same functionality, especially if it's something
simple like an abstract class (Analyzer) or a utility thing (these
mutable integers, etc), then that's a good sign it belongs in core
apis.

I think we really need to try to nuke all these dependencies between
modules: its great to add them as a way to get refactoring started,
but ultimately we should try to clean up: because we don't want a
complex 'graph' of dependencies but instead something dead-simple. I
made a total mess with the analyzers module at first; I think
everything depended on it! But now we have nuked almost all
dependencies on this thing, except where it makes sense to have
that concrete dependency (benchmark, demo, solr).

>
> I think what would be best is a smallish but feature-complete demo, i.e.
> pull together some easy-to-understand sample content and then build a
> small demo app around it.  We could then show how to use grouping for
> field collapsing (and for other use cases), joining for nested docs
> (and for other use cases), etc.
>

For the same reason listed above, I think we should take our
contrib/demo and consolidate 'examples' from various places into
this demo module. The reasons are:
* examples typically depend upon 'concrete' stuff, but in general core
stuff should work against interfaces/abstract classes: e.g. the
faceting module has an analyzers dependency only because of its
examples.
* examples might want to integrate modules, e.g. an example of how to
integrate faceting and grouping or something like that.
* examples are important: I think if the same question comes up on the
user list often, we should consider adding an example.




[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #168: POMs out of sync

2011-07-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/168/

No tests ran.

Build Log (for compile errors):
[...truncated 7442 lines...]






[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2630:


Attachment: xslt-update-handler.patch

Here is an improved patch. This implementation no longer serializes the XML
internally to a stream and reads it back using StAX; instead it takes the XSL
result tree fragment (RTF), which XSL transformers always build as a DOM tree,
and feeds that to StAX. So we don't need any stupid serialize/deserialize step
in between. This patch also respects the content-type parameter of the input,
like XMLLoader does. The intermediate buffering is needed because we need to
change from push to pull APIs.

This patch also fixes a small issue in XSLTResponseWriter, which also fails to
correctly log transformation warning/error events to slf4j.
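A minimal sketch of the "keep the transformer output as a DOM tree" idea, using only the JDK's javax.xml.transform API: the stylesheet (what the tr param would select) is applied and the result is captured in a DOMResult, so nothing is serialized and re-parsed. The class and method names are hypothetical, not Solr's XsltUpdateRequestHandler. Unlike the patch, which feeds the tree into StAX, this sketch reads the fields of Solr's add/doc/field update format directly from the DOM.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not Solr code): apply a stylesheet and keep the output
// as a DOM tree instead of serializing it to bytes and parsing those again.
final class XsltToAddDoc {
    static Document transform(String stylesheet, String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            DOMResult result = new DOMResult(); // no node given: a new Document is built
            t.transform(new StreamSource(new StringReader(inputXml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    // Collects name -> text for each <field name="..."> under the first <doc>.
    static Map<String, String> firstDocFields(Document addDoc) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        NodeList docs = addDoc.getElementsByTagName("doc");
        if (docs.getLength() == 0) {
            return fields;
        }
        NodeList fieldNodes = ((Element) docs.item(0)).getElementsByTagName("field");
        for (int i = 0; i < fieldNodes.getLength(); i++) {
            Element field = (Element) fieldNodes.item(i);
            fields.put(field.getAttribute("name"), field.getTextContent());
        }
        return fields;
    }
}
```

For example, a stylesheet whose template matches /article and emits an add/doc with a field named title (filled from the article's title element) turns <article><title>Hello</title></article> into one document whose title field is Hello.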

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?

2011-07-01 Thread Michael Herndon
@Rory, @All,

The only tickets I currently have for those are LUCENE-419 and LUCENE-418.

418 I should be able to push into the 2.9.4g branch tonight. 419 is a
long-term goal and not as important as getting the tests fixed, or having the
tests broken down into what is actually a unit test, functional test, perf
test or long-running test. I can go into more detail about why that needs to be done.

I'll also need to document what the build script currently does on the
wiki, and make a few notes about testing, like using the RAMDirectory,
etc.

Things that need to get done, or at least be discussed:
 * There needs to be a running list of things to do/not to do with testing.
I don't know if this goes in JIRA or if we keep a running list on the wiki
or site for people to pick up and help with.
 * Tests need to run on Mono and not fail (a good number of tests fail
on Mono, mostly due to the temp directory having C:\ in the path).
 * Assert.Throws() needs to be used instead of try/catch with
Assert.Fail.  **
 * File & Path combines to the temp directory need helper methods,
e.g. having this in a hundred places is bad:
new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
 * We should still be testing deprecated methods, but we need to use #pragma
warning disable/enable 0618 for testing those; otherwise compiler warnings
are too numerous to be anywhere near helpful.
 * We should only be using deprecated methods in places where they are being
explicitly tested; other tests that need that functionality in order to
validate those tests should be refactored to use methods that are not
deprecated.
 * Identify code that could be abstracted into test utility classes.
 * Infrastructure validation tests need to be written for anything that seems
like infrastructure, e.g. does the temp directory exist, do the folders that
the tests use inside the temp directory exist, can we read/write to those
folders? (If a ton of tests fail due to the file system, we should be able
to point out that it was due to permissions or missing folders, files,
etc.)
 * Identify which classes need an interface, an abstract class, or a subclass
in order to create testing mocks. (Once those classes are created, they should
be documented in the wiki.)



** Assert.Throws needs to replace stuff like the following. We should also be
checking the exception messages and making sure they make sense and can
help users fix issues if the exceptions are aimed at library users.
try
{
    d = DateTools.StringToDate("97"); // no date
    Assert.Fail();
}
catch (System.FormatException e)
{
    /* expected exception */
}
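For comparison, here is a hypothetical expectThrows helper that plays the role of NUnit's Assert.Throws, sketched in Java (the upstream Lucene language) rather than C#. It fails if nothing, or the wrong type, is thrown, and it returns the caught exception so the test can also check its message. The names are illustrative, not an existing Lucene or NUnit API.

```java
// Hypothetical test helper: replaces the try { ...; fail(); } catch (...) { } idiom.
final class Expect {
    // Like Runnable, but the body may throw checked exceptions.
    interface ThrowingRunnable {
        void run() throws Exception;
    }

    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable code) {
        try {
            code.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t); // hand it back so the message can be checked
            }
            throw new AssertionError("expected " + expected.getName() + " but got " + t, t);
        }
        throw new AssertionError("expected " + expected.getName() + " but nothing was thrown");
    }
}
```

The DateTools example then becomes a single call whose return value can be inspected, which makes checking that the message is helpful to library users a one-line assertion.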

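The helper-method and infrastructure-validation items in the list might look like the following. Lucene.Net itself is C#; this is a Java sketch of the same pattern, and TestPaths, its methods, and the tempDir system property are hypothetical stand-ins for the Support.AppSettings lookup. All path building goes through one helper that resolves against a configurable root instead of a hard-coded C:\ prefix, and one check reports missing folders or permission problems up front.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical test helper (illustrative names): the one place that decides
// where test files live, so no test hard-codes a "C:\..." path.
final class TestPaths {
    // Optional override (analogous to the "tempDir" app setting); otherwise
    // fall back to the platform's temp directory.
    static Path tempRoot() {
        String configured = System.getProperty("tempDir", "");
        return Paths.get(configured.isEmpty() ? System.getProperty("java.io.tmpdir") : configured);
    }

    // Replaces Path.Combine(tempDir, "testIndex")-style code scattered across tests.
    static Path testDir(String name) {
        return tempRoot().resolve(name);
    }

    // Infrastructure check: can we create the folder and write a file in it?
    // A failure here points at permissions or missing folders, not at the code under test.
    static boolean isWritable(Path dir) {
        try {
            Files.createDirectories(dir);
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```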
On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire  wrote:

> So, veering towards action - are there concrete tasks written up anywhere
> for the unit tests? If a poor schlep like me wanted to dig in and start to
> improve them, where would I get the understanding of what is good and what
> needs help?
>
> -r
>
> On Thu, Jun 30, 2011 at 3:29 PM, Digy  wrote:
>
> > I cannot say I like this approach, but until we find an automated way
> > (with good results), it seems to be the only way we can use.
> >
> > DIGY
> >
> > -Original Message-
> > From: Troy Howard [mailto:thowar...@gmail.com]
> > Sent: Friday, July 01, 2011 12:43 AM
> > To: lucene-net-...@lucene.apache.org
> > Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?
> >
> > Scott -
> >
> > The idea of the automated port is still worth doing. Perhaps it makes sense
> > for someone more passionate about the line-by-line idea to do that work?
> >
> > I would say, focus on what makes sense to you. Being productive, regardless
> > of the specific direction, is what will be most valuable. Once you start,
> > others will join and momentum will build. That is how these things work.
> >
> > I like DIGY's approach too, but the problem with it is that it is a
> > never-ending manual task. The theory behind the automated port is that it
> > may reduce the manual work. It is complicated, but once it's built and
> > works, it will save a lot of future development hours. If it's built in a
> > sufficiently general manner, it could be useful for other projects like
> > Lucene.Net that want to automate a port from Java to C#.
> >
> > It might make sense for that to be a separate project from Lucene.Net
> > though.
> >
> > -T
> >
> >
> > On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard wrote:
> >
> > > Ok, I think I asked the wrong question.  I am trying to figure out where
> > > to put my time.  I was thinking about working on the automated porting
> > > system, but when I saw the response to the .NET 4.0 discussions I started
> > > to question if that is the right direction.  The community seemed to be
> > > more interested in the .NET features.
> > >
> > > The complexity of the automated tool is going to become very high and
> > > will probably end up with a line-for-line style port.  So I keep asking
> > > myself is the automated

[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2630:


Affects Version/s: 3.3
Fix Version/s: 3.4

Merging to 3.x should be simple, too!

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?

2011-07-01 Thread Michael Herndon
* need to document what the build script does.  whut grammerz?

On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon <
mhern...@wickedsoftware.net> wrote:

> @Rory, @All,
>
> The only tickets I currently have for those is LUCENE-419, LUCENE-418
>
> 418, I should be able to push into the 2.9.4g branch tonight.419 is a
> long term goal and not as important as getting the tests fixed, of have the
> tests broken down into what is actually a unit test, functional test, perf
> or long running test. I can get into more why it needs to be done.
>
> I'll also need to make document the what build script currently does on the
> wiki & and make a few notes about testing, like using the RAMDirectory,
> etc.
>
> Things that need to get done or even be discussed.
>  * There needs to be a running list of things to do/not to do with testing.
> I don't know if this goes in a jira or do we keep a running list on the wiki
> or site for people to pick up and  help with.
>  * Tests need to run on mono and not Fail (there is a good deal of failing
> tests on mono, mostly due to the temp directory have the C:\ in the path).
>  * Assert.Throw() needs to be used instead of Try/Catch
> Assert.Fail.  **
>  * File & Path combines to the temp directory need helper methods,
>  * e,g, having this in a hundred places is bad   new
> System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir",
> ""), "testIndex"));
>  * We should still be testing deprecated methods, but we need to use #pragma
> warning disable/enable 0618  for testing those. otherwise compiler warnings
> are too numerous to be anywhere near helpful.
>  * We should only be using deprecated methods in places where they are
> being explicitly tested, other tests that need that functionality in order
> to validate those tests should be re factored to use methods that are not
> deprecated.
>  * Identify code that could be abstracted into test utility classes.
>  * Infrastructure Validation tests need to be made, anything that seems
> like infrastructure.  e.g. does the temp directory exist, does the folders
> that the tests use inside the temp directory exist, can we read/write to
> those folders. (if a ton of tests fail due to the file system, we should be
> able to point out that it was due to permissions or missing folders, files,
> etc).
>  * Identify what classes need an interface, abstract class or inherited in
> order to create testing mocks. (once those classes are created, they should
> be documented in the wiki).
>
>
>
> ** Asset.Throws needs to replace stuff like the following. We should also
> be checking the messages for exceptions and make sure they make sense and
> can help users fix isses if the exceptions are aimed at the library users.
> try
> {
> d = DateTools.StringToDate("97"); // no date
>  Assert.Fail();
> }
> catch (System.FormatException e)
>  {
> /* expected exception */
> }
>
> On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire wrote:
>
>> So, veering towards action - are there concrete tasks written up anywhere
>> for the unit tests? If a poor schlep like me wanted to dig in and start to
>> improve them, where would I get the understanding of what is good and what
>> needs help?
>>
>> -r
>>
>> On Thu, Jun 30, 2011 at 3:29 PM, Digy  wrote:
>>
>> > I can not say I like this approach, but till we find an automated
>> way(with
>> > good results), it seems to be the only way we can use.
>> >
>> > DIGY
>> >
>> > -Original Message-
>> > From: Troy Howard [mailto:thowar...@gmail.com]
>> > Sent: Friday, July 01, 2011 12:43 AM
>> > To: lucene-net-...@lucene.apache.org
>> > Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?
>> >
>> > Scott -
>> >
>> > The idea of the automated port is still worth doing. Perhaps it makes
>> sense
>> > for someone more passionate about the line-by-line idea to do that work?
>> >
>> > I would say, focus on what makes sense to you. Being productive,
>> regardless
>> > of the specific direction, is what will be most valuable. Once you
>> start,
>> > others will join and momentum will build. That is how these things work.
>> >
>> > I like DIGY's approach too, but the problem with it is that it is a
>> > never-ending manual task. The theory behind the automated port is that
>> it
>> > may reduce the manual work. It is complicated, but once it's built and
>> > works, it will save a lot of future development hours. If it's built in
>> a
>> > sufficiently general manner, it could be useful for other project like
>> > Lucene.Net that want to automate a port from Java to C#.
>> >
>> > It might make sense for that to be a separate project from Lucene.Net
>> > though.
>> >
>> > -T
>> >
>> >
>> > On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard > > >wrote:
>> >
>> > > Ok I think I asked the wrong question.  I am trying to figure out
>> where
>> > to
>> > > put my time.  I was thinking about working on the automated porting
>> > system,
>> > > but when I saw the response to the .NET 4.0 discussions I s

RE: revisit naming for grouping/join?

2011-07-01 Thread Steven A Rowe
On 7/1/2011 at 10:02 AM, Robert Muir wrote:
> [...] I think we should take our contrib/demo and consolidate 'examples'
> across various places into this demo module. The reason is:
>
> * examples typically depend upon 'concrete' stuff, but in general core
>   stuff should work around interfaces/abstract classes: e.g. the faceting
>   module has an analyzers dependency only because of its examples.
>
> * examples might want to integrate modules, e.g. an example of how to
>   integrate faceting and grouping or something like that.
>
> * examples are important: i think if the same question comes up on the
>   user list often, we should consider adding an example.

+1


Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

2011-07-01 Thread Shai Erera
About the encoders package - there are several encoders there besides
VInt, so I wouldn't dispose of it so quickly. That said, I think we
should definitely explore consolidating VInt with the core classes,
and maybe write an encoder which delegates to them.

Or, come up w/ a different approach for plugging in different
Encoders. I don't rule out anything, as long as we preserve
functionality and capabilities.

Shai

On Friday, July 1, 2011, Michael McCandless  wrote:
> On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler  wrote:
>> Hi,
>>
>> I don't understand the whole discussion here, so please compare these two 
>> implementations and tell me which one is faster. Please don't hurt me: if 
>> you don't want to see src.jar code from OpenJDK Java 6, just delete this 
>> mail (the code here is licensed under GPL):
>
> This is the source code for a specific version of one specific Java
> impl.  If we knew all Java impls simply implemented the primitive case
> using System.arraycopy (admittedly it's hard to imagine that they
> wouldn't!) then we'd be fine.
>
>> This is our implementation, which Simon replaced and Robert reverted 
>> (UnsafeByteArrayOutputStream):
>>
>>  private void grow(int newLength) {
>>    // It actually should be (Java 1.7, once it's an intrinsic on all machines):
>>    // buffer = Arrays.copyOf(buffer, newLength);
>>    byte[] newBuffer = new byte[newLength];
>>    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
>>    buffer = newBuffer;
>>  }
>>
>> So please look at the code: where is there a difference that could slow it 
>> down, except the Math.min() call, which is an intrinsic in almost every JDK 
>> on earth?
>
> Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
> sure about other cases...
>
>> The problem we are talking about here is only about the generic Object[] 
>> copyOf method, which also affects e.g. *all* Collection.toArray() methods - 
>> they all use this code, so whenever you use ArrayList.toArray() or similar, 
>> the slow code is executed. This is why we replaced Collections.sort() with 
>> CollectionUtil.sort, which does no array copy. Simon and I were not willing to 
>> replace the reallocations in FST code (Mike, you remember, we reverted that 
>> on your Git repo when we did perf tests) and other parts in Lucene (there 
>> are only a few of them). The idea was only to replace primitive-type code to 
>> make it easier to read. And with later JDK code it could even get faster 
>> (not slower), if Oracle starts to add intrinsics for those new methods (and 
>> that's Dawid's and my reason to change to copyOf for primitive types). In 
>> general, if you use Java SDK methods that are as fast as ours, they always 
>> have a chance to get faster in later JDKs. So we should always prefer Java 
>> SDK methods, unless they are slower because their default impl is too 
>> generic, has too many safety checks, or uses reflection.
>
> OK I'm convinced (I think!) that for primitive types only, let's use
> Arrays.copyOf!
>
>> To come back to UnsafeByteArrayOutputStream:
>>
>> I would change the whole code, as I don't like the allocation strategy in it 
>> (it's exponential: on every grow it doubles its size). We should change that 
>> to use ArrayUtil.grow() and ArrayUtil.oversize(), to have the same 
>> allocation strategy as the rest of trunk. Then we can discuss this problem 
>> again when Simon and I want to change the ArrayUtil.grow methods to use 
>> Arrays.copyOf... *g* [just joking, I will never ask again, because this 
>> discussion here is endless and does not bring us forward].
>
> Well, it sounds like for primitive types, we can cut over the
> ArrayUtil.grow methods.  Then we can look @ the nightly bench the
> next day ;)
>
> But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
> (almost) a dup of ByteArrayDataOutput?
>
>> The other thing I don't like in the new faceting module is the duplication of 
>> vint code. Why don't we change it to use DataInput/DataOutput and use 
>> Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be 
>> much more in line with all the code we currently have. Then we can 
>> encode the payloads (or later docvalues) using the new 
>> UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? 
>> Or maybe add a ByteArrayDataOutput class.
>
> That sounds good!
>
> Uwe can you commit TODOs to the code w/ these ideas?
>
> Mike
>

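To illustrate two points from the exchange above (Arrays.copyOf for primitive arrays, and modest over-allocation instead of doubling), here is a simplified sketch. It is not Lucene's ArrayUtil: the growth rule (requested size plus one eighth, at least 3 slots) is an assumption loosely modeled on the idea behind ArrayUtil.oversize(), which also takes the element size into account.

```java
import java.util.Arrays;

// Simplified sketch, not Lucene's ArrayUtil.
final class GrowSketch {
    // Assumed policy: over-allocate by 1/8 of the requested size (at least 3),
    // clamped so the result never overflows an int.
    static int oversize(int minTargetSize) {
        int extra = Math.max(3, minTargetSize >> 3);
        long newSize = (long) minTargetSize + extra;
        return newSize > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) newSize;
    }

    // Returns the same array if it is already big enough; otherwise a larger copy.
    static byte[] grow(byte[] buffer, int minSize) {
        if (buffer.length >= minSize) {
            return buffer;
        }
        // Same effect as new byte[n] followed by System.arraycopy(...) for primitives.
        return Arrays.copyOf(buffer, oversize(minSize));
    }
}
```

Under this rule a request for 80 bytes allocates 90, where doubling from a 64-byte buffer would allocate 128.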

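Shai's suggestion to consolidate VInt with the core classes and Uwe's point about duplicated vint code concern the same encoding. As a reference point, here is a standalone sketch of the variable-length int format that Lucene's DataOutput.writeVInt produces: 7 data bits per byte, with the high bit set on every byte except the last. The class name is hypothetical; a facet encoder could delegate to DataOutput/DataInput instead of carrying its own copy.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical standalone copy of the VInt format, for illustration only.
final class VIntSketch {
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80); // low 7 bits, continuation bit set
            i >>>= 7;                     // unsigned shift so negatives terminate
        }
        out.write(i);                     // last byte, high bit clear
    }

    // Reads one VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }
}
```

Small values take one byte (0 to 127), 300 takes two, and negative ints always take five, which is why VInt suits non-negative deltas and counts.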


[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads

2011-07-01 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058626#comment-13058626
 ] 

Hoss Man commented on SOLR-2623:


Grr... right, right.   ObjectInstance != MBean.

> Solr JMX MBeans do not survive core reloads
> ---
>
> Key: SOLR-2623
> URL: https://issues.apache.org/jira/browse/SOLR-2623
> Project: Solr
>  Issue Type: Bug
>  Components: multicore
>Affects Versions: 1.4, 1.4.1, 3.1, 3.2
>Reporter: Alexey Serba
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch
>
>
> Solr JMX MBeans do not survive core reloads
> {noformat:title="Steps to reproduce"}
> sh> cd example
> sh> vi multicore/core0/conf/solrconfig.xml # enable jmx
> sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar 
> start.jar
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> 
> solr/core0:id=core0,type=core
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
> solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
> solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
> ...
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
> solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
> sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
> sh> echo 'open 8842 # 8842 is java pid
> > domain solr/core0
> > beans
> > ' | java -jar jmxterm-1.0-alpha-4-uber.jar
> # there's only one bean left after Solr core reload
> solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 
> main
> {noformat}
> The root cause of this is Solr core reload behavior:
> # create new core (which overwrites existing registered MBeans)
> # register new core and close old one (we remove/un-register MBeans on 
> oldCore.close)
> The correct sequence is:
> # unregister MBeans from old core
> # create and register new core
> # close old core without touching MBeans
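The ordering problem can be reproduced with a simplified model (not Solr code, and without real JMX): a name-keyed registry in which registering a name overwrites any existing entry, and closing a core unregisters every name that core registered. The bean names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of the reload ordering described in SOLR-2623.
final class ReloadModel {
    private final Map<String, String> registry = new TreeMap<String, String>();

    private void register(String core, List<String> names) {
        for (String name : names) {
            registry.put(name, core); // same name: overwrite the existing entry
        }
    }

    private void unregister(List<String> names) {
        for (String name : names) {
            registry.remove(name);
        }
    }

    // Current behavior: create/register the new core, then close the old one.
    static Map<String, String> buggyReload(List<String> oldNames, List<String> newNames) {
        ReloadModel m = new ReloadModel();
        m.register("old", oldNames);
        m.register("new", newNames); // new core overwrites the shared names
        m.unregister(oldNames);      // oldCore.close() then removes those names
        return m.registry;
    }

    // Proposed sequence: unregister old beans, register the new core,
    // then close the old core without touching the registry.
    static Map<String, String> fixedReload(List<String> oldNames, List<String> newNames) {
        ReloadModel m = new ReloadModel();
        m.register("old", oldNames);
        m.unregister(oldNames);
        m.register("new", newNames);
        return m.registry;
    }
}
```

With core and updateHandler names shared by the old and new cores and a uniquely named searcher for each, the current order leaves only the new searcher registered, which mirrors the single surviving bean in the report; the proposed order keeps all three.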







[jira] [Resolved] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved SOLR-2630.
-

Resolution: Fixed

Committed trunk revision: 1141999
Committed 3.x revision: 1142003

Thanks Upayavira, the idea is great and also of use for myself (if 
PANGAEA/panFMP moves to Solr, but since we have faceting now in Lucene I don't 
think we will take this step)!

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058679#comment-13058679
 ] 

Hoss Man commented on SOLR-2630:


Hmmm... from a user perspective does it really make sense for this to be an 
entirely new RequestHandler?

wouldn't it make more sense if users could just continue to use 
XmlUpdateRequestHandler along with a tr param indicating the transform to apply 
first?

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058681#comment-13058681
 ] 

Uwe Schindler commented on SOLR-2630:
-

I was thinking about that; it would be easy to implement, as the current code 
would simply be moved to XMLLoader?

Should I add a patch relative to what's currently committed?

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058690#comment-13058690
 ] 

Uwe Schindler commented on SOLR-2630:
-

On the other hand, this one is similar to XSLTResponseWriter, which is also 
separate from XMLResponseWriter. Should XMLResponseWriter also take an optional 
tr param and then transform? So the current solution is more consistent.

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler

2011-07-01 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058705#comment-13058705
 ] 

Upayavira commented on SOLR-2630:
-

I considered the same thing, making the XmlUpdateRequestHandler accept tr, but 
opted not to for the same reason as Uwe. Whichever way, consistency is a good 
thing!

> XsltUpdateRequestHandler
> 
>
> Key: SOLR-2630
> URL: https://issues.apache.org/jira/browse/SOLR-2630
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 3.3, 4.0
>Reporter: Upayavira
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 3.4, 4.0
>
> Attachments: xslt-update-handler.patch, xslt-update-handler.patch
>
>
> An update request handler that can accept a tr param, allowing the indexing 
> of any XML content that is passed to solr, so long as there is an XSLT 
> stylesheet in solr/conf/xslt that can transform it to the  
> format.
> Could be used, for example, to allow Solr to ingest docbook directly, without 
> any preprocessing.
>  







[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module

2011-07-01 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058730#comment-13058730
 ] 

Hoss Man commented on LUCENE-3272:
--

single module != single jar ... correct?

someone writing a small form factor app that wants to use the basic Lucene 
QueryParser shouldn't have to load a jar containing every query parser provided 
by solr (and all of the dependencies they have)

> Consolidate Lucene's QueryParsers into a module
> ---
>
> Key: LUCENE-3272
> URL: https://issues.apache.org/jira/browse/LUCENE-3272
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/queryparser
>Reporter: Chris Male
>
> Lucene has a lot of QueryParsers and we should have them all in a single 
> consistent place.  
> The following are QueryParsers I can find that warrant moving to the new 
> module:
> - Lucene Core's QueryParser
> - AnalyzingQueryParser
> - ComplexPhraseQueryParser
> - ExtendableQueryParser
> - Surround's QueryParser
> - PrecedenceQueryParser
> - StandardQueryParser
> - XML-Query-Parser's CoreParser
> All seem to do a good job at their kind of parsing with extensive tests.
> One challenge of consolidating these is that many tests use Lucene Core's 
> QueryParser.  One option is to just replicate this class in src/test and call 
> it TestingQueryParser.  Another option is to convert all tests over to 
> programmatically building their queries (seems like a lot of work).







Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?

2011-07-01 Thread Rory Plaire
@Michael -

Should that list be in JIRA? It would be easier to manage, I think...

If yes, I'll happily do it.

-r

On Fri, Jul 1, 2011 at 8:04 AM, Michael Herndon  wrote:

> * need to document what the build script does.  whut grammerz?
>
> On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon <
> mhern...@wickedsoftware.net> wrote:
>
> > @Rory, @All,
> >
> > The only tickets I currently have for those are LUCENENET-419 and LUCENENET-418.
> >
> > 418 I should be able to push into the 2.9.4g branch tonight. 419 is a
> > long-term goal and not as important as getting the tests fixed, or having
> > the tests broken down into what is actually a unit test, functional test,
> > perf or long-running test. I can go into more detail on why it needs to be
> > done.
> >
> > I'll also need to document what the build script currently does on the
> > wiki and make a few notes about testing, like using the RAMDirectory, etc.
> >
> > Things that need to get done, or at least be discussed:
> >  * There needs to be a running list of things to do/not to do with testing.
> > I don't know if this goes in a JIRA issue or if we keep a running list on
> > the wiki or site for people to pick up and help with.
> >  * Tests need to run on Mono without failing (a good number of tests fail
> > on Mono, mostly because the temp directory path contains C:\).
> >  * Assert.Throws() needs to be used instead of try/catch with
> > Assert.Fail().  **
> >  * File & Path combines to the temp directory need helper methods;
> > e.g. having this in a hundred places is bad:
> > new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
> >  * We should still be testing deprecated methods, but we need to use
> > #pragma warning disable/enable 0618 when testing those; otherwise the
> > compiler warnings are too numerous to be anywhere near helpful.
> >  * We should only be using deprecated methods in places where they are
> > being explicitly tested; other tests that need that functionality in order
> > to validate what they test should be refactored to use methods that are not
> > deprecated.
> >  * Identify code that could be abstracted into test utility classes.
> >  * Infrastructure validation tests need to be written for anything that
> > seems like infrastructure, e.g. does the temp directory exist, do the
> > folders that the tests use inside the temp directory exist, can we
> > read/write to those folders? (If a ton of tests fail due to the file
> > system, we should be able to point out that it was due to permissions or
> > missing folders, files, etc.)
> >  * Identify which classes need an interface or abstract class, or need to
> > be inheritable, in order to create testing mocks. (Once those classes are
> > created, they should be documented in the wiki.)
> >
> >
> >
> > ** Assert.Throws needs to replace code like the following. We should also
> > be checking the exception messages and making sure they make sense and
> > can help users fix issues, if the exceptions are aimed at library users.
> > try
> > {
> >     d = DateTools.StringToDate("97"); // no date
> >     Assert.Fail();
> > }
> > catch (System.FormatException e)
> > {
> >     /* expected exception */
> > }
> >
> > On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire wrote:
> >
> >> So, veering towards action - are there concrete tasks written up anywhere
> >> for the unit tests? If a poor schlep like me wanted to dig in and start to
> >> improve them, where would I get the understanding of what is good and what
> >> needs help?
> >>
> >> -r
> >>
> >> On Thu, Jun 30, 2011 at 3:29 PM, Digy  wrote:
> >>
> >> > I cannot say I like this approach, but until we find an automated way
> >> > (with good results), it seems to be the only way we can use.
> >> >
> >> > DIGY
> >> >
> >> > -Original Message-
> >> > From: Troy Howard [mailto:thowar...@gmail.com]
> >> > Sent: Friday, July 01, 2011 12:43 AM
> >> > To: lucene-net-...@lucene.apache.org
> >> > Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?
> >> >
> >> > Scott -
> >> >
> >> > The idea of the automated port is still worth doing. Perhaps it makes sense
> >> > for someone more passionate about the line-by-line idea to do that work?
> >> >
> >> > I would say, focus on what makes sense to you. Being productive, regardless
> >> > of the specific direction, is what will be most valuable. Once you start,
> >> > others will join and momentum will build. That is how these things work.
> >> >
> >> > I like DIGY's approach too, but the problem with it is that it is a
> >> > never-ending manual task. The theory behind the automated port is that it
> >> > may reduce the manual work. It is complicated, but once it's built and
> >> > works, it will save a lot of future development hours. If it's built in a
> >> > sufficiently general manner, it could be useful for other projects like
> >> > Lucene.Net that want to automate a port from Java to 
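The Assert.Throws point in Michael's list can be illustrated with a small Java analogue (Lucene.Net itself is C#, where NUnit's Assert.Throws already provides this). This is a sketch, not code from either project: `expectThrows` and `stringToDate` are hypothetical names, and `stringToDate` only stands in for DateTools.StringToDate. Returning the exception, rather than swallowing it in an empty catch block, is what lets a test also check that the message is useful to library users.

```java
/**
 * Java analogue of replacing "try { ...; Assert.Fail(); } catch (...) { }"
 * with one helper call. expectThrows and stringToDate are hypothetical names
 * for this sketch; in Lucene.Net (C#/NUnit) the built-in is Assert.Throws.
 */
public final class ExpectThrowsDemo {

    /** Like Runnable, but the code under test may throw checked exceptions. */
    @FunctionalInterface
    public interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /**
     * Runs the code and returns the exception if it has the expected type.
     * Throws AssertionError if nothing is thrown or a different type is thrown.
     */
    public static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable code) {
        try {
            code.run();
        } catch (Throwable actual) {
            if (expected.isInstance(actual)) {
                return expected.cast(actual);
            }
            throw new AssertionError("Expected " + expected.getName()
                    + " but got " + actual.getClass().getName(), actual);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    /** Hypothetical stand-in for DateTools.StringToDate: rejects input too short to be a date. */
    public static long stringToDate(String s) {
        if (s.length() < 4) {
            throw new IllegalArgumentException("Not a valid date string: \"" + s + "\"");
        }
        return 0L; // parsing omitted; only the failure path matters here
    }

    public static void main(String[] args) {
        IllegalArgumentException e =
                expectThrows(IllegalArgumentException.class, () -> stringToDate("97"));
        // Because the exception is returned, the test can also inspect its message.
        System.out.println(e.getMessage());
    }
}
```

The same helper also fails loudly when no exception is thrown at all, which is the case the old `Assert.Fail()` line was guarding against.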

Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed?

2011-07-01 Thread Michael Herndon
I think whatever makes sense to do.

Possibly create one JIRA issue for now with a running list that can be modified,
and as people pull from that list, they can cross things off or create a
separate ticket that links back to the main one.




On Fri, Jul 1, 2011 at 3:35 PM, Rory Plaire  wrote:

> @Michael -
>
> Should that list be in JIRA? It would be easier to manage, I think...
>
> If yes, I'll happily do it.
>
> -r

[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module

2011-07-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058769#comment-13058769
 ] 

Robert Muir commented on LUCENE-3272:
-

single jar, but you can customize: it's open source.

Hoss I think you are looking at this the wrong way: this actually makes it way 
easier for someone writing a small form factor app that uses no query parser at 
all, or their own queryparser, or whatever.

we should do this to make the lucene core smaller, and then you plug in the 
modules you need (and maybe only selected parts from them, but that's your call).

I don't think we need to provide X * Y * Z possibilities, nor do we need to 
provide 87 jar files.


But, this is just rehashing LUCENE-2323, where we already had this 
conversation. I think at the least we should put all these QPs into one place 
to make refactoring between them easier. Then we can reduce the amount of code 
for these small form factor apps you are so concerned about; with the messy 
duplication, this is not possible now.

I still stand by my comments in LUCENE-2323, and guess what, turns out I think 
I was right.
LUCENE-1938 then refactored one of these queryparsers, removing 4000 lines of 
code but keeping the same functionality.





[Lucene.Net] [jira] [Updated] (LUCENENET-400) Evaluate tooling for continuous integration server

2011-07-01 Thread michael herndon (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael herndon updated LUCENENET-400:
--

Due Date: 30/Sep/11  (was: 28/Feb/11)

> Evaluate tooling for continuous integration server
> --
>
> Key: LUCENENET-400
> URL: https://issues.apache.org/jira/browse/LUCENENET-400
> Project: Lucene.Net
>  Issue Type: Task
>  Components: Build Automation, Project Infrastructure
>Reporter: Troy Howard
>Assignee: michael herndon
>
> We would like to have a CI server setup for Lucene.Net.
> It has been suggested to do this outside of the ASF infrastructure, but this 
> would not work for ASF. 
> Please review the available options at http://ci.apache.org/ and evaluate 
> which CI server system would be preferred for our setup. 





[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method could throw exceptions.

2011-07-01 Thread michael herndon (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058781#comment-13058781
 ] 

michael herndon commented on LUCENENET-418:
---

r1132085 under the Lucene.Net_2_9_4g branch.  The exception was removed. The 
static constructor still exists, but will be re-factored out at a later date.   
The paths for the TestBackwardsCompatability tests were also fixed.  

> LuceneTestCase should not have a static method could throw exceptions.  
> 
>
> Key: LUCENENET-418
> URL: https://issues.apache.org/jira/browse/LUCENENET-418
> Project: Lucene.Net
>  Issue Type: Bug
>  Components: Lucene.Net Test
>Affects Versions: Lucene.Net 3.x
> Environment: Linux, OSX, etc 
>Reporter: michael herndon
>Assignee: michael herndon
>  Labels: test
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> Throwing an exception from a static method in a base class used by 90% of the 
> tests makes it hard to debug the issue in NUnit.
> The test results came back saying that TestFixtureSetup was causing an issue, 
> even though it was the static constructor causing problems, and this then 
> propagates to all the tests that stem from LuceneTestCase. 
> The TEMP_DIR needs to be moved to a static util class as a property or even a 
> mixin method.  This took me hours to debug and figure out the real issue, as 
> the underlying exception never bubbled up.  
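A minimal sketch of the suggested direction, written in Java for illustration and assuming nothing about Lucene.Net's actual code: the temp directory becomes a lazily resolved property of a small utility class (`TestTempDirs`, `tempDir` and `tempPath` are hypothetical names), so a misconfigured or unwritable directory fails the calling test with a clear message instead of breaking the base class's static initialization. In Java, an exception thrown from a static initializer surfaces as ExceptionInInitializerError and then as NoClassDefFoundError on later uses of the class, which is similarly hard to trace. The `tempDir` system property is an assumption modelled on the "tempDir" app setting mentioned on the list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class TestTempDirs {
    // Resolved on first use instead of in a static initializer, so a bad setting fails
    // the calling test with a clear message rather than breaking class loading.
    private static volatile Path tempDir;

    private TestTempDirs() {}

    /** Validated temp root; resolved and checked on the first call only. */
    public static Path tempDir() {
        Path dir = tempDir;
        if (dir == null) {
            synchronized (TestTempDirs.class) {
                dir = tempDir;
                if (dir == null) {
                    dir = resolveAndValidate();
                    tempDir = dir;
                }
            }
        }
        return dir;
    }

    /** One helper instead of the same Path.Combine(...) expression in a hundred tests. */
    public static Path tempPath(String name) {
        return tempDir().resolve(name);
    }

    private static Path resolveAndValidate() {
        // Hypothetical "tempDir" system property; falls back to the JVM's temp directory.
        String configured = System.getProperty("tempDir", "");
        Path dir = configured.isEmpty()
                ? Paths.get(System.getProperty("java.io.tmpdir"))
                : Paths.get(configured);
        try {
            if (!Files.isDirectory(dir)) {
                Files.createDirectories(dir);
            }
            // Write and delete a probe file so permission problems are reported up front.
            Path probe = Files.createTempFile(dir, "probe", ".tmp");
            Files.delete(probe);
        } catch (IOException | SecurityException e) {
            throw new IllegalStateException("Test temp directory is not usable: " + dir, e);
        }
        return dir;
    }

    public static void main(String[] args) {
        System.out.println(tempPath("testIndex"));
    }
}
```

Building paths with `Path.resolve` rather than hard-coded strings also sidesteps the "C:\ in the path" failures on Mono mentioned earlier in the thread, since separators come from the platform.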





[jira] [Created] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itsself

2011-07-01 Thread Uwe Schindler (JIRA)
PingRequestHandler can infinite loop if called with a qt that points to itsself
---

 Key: SOLR-2631
 URL: https://issues.apache.org/jira/browse/SOLR-2631
 Project: Solr
  Issue Type: Bug
  Components: search, web gui
Affects Versions: 3.2, 3.1, 1.4, 3.3
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.4, 4.0


We got a security report to priv...@lucene.apache.org that Solr can loop 
infinitely, use 100% CPU and overflow the stack if you execute the following 
HTTP requests: 

- http://localhost:8983/solr/select?qt=/admin/ping
- http://localhost:8983/admin/ping?qt=/admin/ping

The qt parameter instructs PingRequestHandler to call the given request handler; 
when it names the ping handler itself, this leads to an infinite loop. This is 
not a security issue, but for an unprotected Solr server with an unprotected 
/solr/select path it makes the server stop working.

The fix is to prevent the infinite loop by disallowing the handler from calling itself.




[jira] [Commented] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itsself

2011-07-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058785#comment-13058785
 ] 

Uwe Schindler commented on SOLR-2631:
-

Edoardo Tosca, who reported the issue, gave the following workaround for 
solrconfig.xml to fix this by configuration:

{quote}
Ok,
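to solve the Ping problem you can add an invariant:
<lst name="defaults">
  <str name="q">solrpingquery</str>
  <str name="echoParams">all</str>
</lst>
<lst name="invariants">
  <str name="qt">search</str>
</lst>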

in this case you avoid generating recursive calls to /admin/ping handler

Edo
{quote}




[jira] [Updated] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itsself

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2631:


Attachment: SOLR-2631.patch

This patch fixes the bug.

Hoss said we could also simply check the qt param, but I decided to do an 
instanceof check: if the PingRequestHandler is registered multiple times in 
solrconfig.xml (e.g. under different URI paths or different names), a check of 
the qt param alone would not prevent the infinite loop. The PingRequestHandler 
should generally disallow calling itself.
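To illustrate why an instanceof check is more robust than comparing the qt string, here is a toy model in Java. It is not Solr's API: `Handler`, `SearchHandler` and `PingHandler` are hypothetical types, and the registry is a plain map from names to handlers. Because the same `PingHandler` instance is registered under two names, a check like `"/admin/ping".equals(qt)` would miss `/admin/ping2`, while the type check rejects both.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Solr's API) of a ping handler that forwards to the handler named by qt. */
public final class PingGuardDemo {

    public interface Handler {
        String handle(Map<String, String> params);
    }

    public static final class SearchHandler implements Handler {
        public String handle(Map<String, String> params) {
            return "results for q=" + params.get("q");
        }
    }

    public static final class PingHandler implements Handler {
        private final Map<String, Handler> registry;

        public PingHandler(Map<String, Handler> registry) {
            this.registry = registry;
        }

        public String handle(Map<String, String> params) {
            String qt = params.getOrDefault("qt", "search");
            Handler target = registry.get(qt);
            if (target == null) {
                throw new IllegalArgumentException("Unknown handler: " + qt);
            }
            // Type check instead of comparing qt strings: the ping handler may be
            // registered under several names, and any of them would recurse.
            if (target instanceof PingHandler) {
                throw new IllegalArgumentException("Cannot ping a PingHandler (qt=" + qt + ")");
            }
            return "OK: " + target.handle(params);
        }
    }

    public static void main(String[] args) {
        Map<String, Handler> registry = new HashMap<>();
        registry.put("search", new SearchHandler());
        PingHandler ping = new PingHandler(registry);
        registry.put("/admin/ping", ping);
        registry.put("/admin/ping2", ping); // same handler registered under a second name

        Map<String, String> ok = new HashMap<>();
        ok.put("q", "solrpingquery");
        System.out.println(ping.handle(ok));

        Map<String, String> loop = new HashMap<>();
        loop.put("qt", "/admin/ping2");
        try {
            ping.handle(loop);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```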




[jira] [Updated] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itsself

2011-07-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2631:


Description: 
We got a security report to priv...@lucene.apache.org that Solr can loop 
infinitely, use 100% CPU and overflow the stack if you execute the following 
HTTP requests: 

- http://localhost:8983/solr/select?qt=/admin/ping
- http://localhost:8983/solr/admin/ping?qt=/admin/ping

The qt parameter instructs PingRequestHandler to call the given request handler; 
when it names the ping handler itself, this leads to an infinite loop. This is 
not a security issue, but for an unprotected Solr server with an unprotected 
/solr/select path it makes the server stop working.

The fix is to prevent the infinite loop by disallowing the handler from calling itself.

  was:
We got a security report to priv...@lucene.apache.org that Solr can loop 
infinitely, use 100% CPU and overflow the stack if you execute the following 
HTTP requests: 

- http://localhost:8983/solr/select?qt=/admin/ping
- http://localhost:8983/admin/ping?qt=/admin/ping

The qt parameter instructs PingRequestHandler to call the given request handler; 
when it names the ping handler itself, this leads to an infinite loop. This is 
not a security issue, but for an unprotected Solr server with an unprotected 
/solr/select path it makes the server stop working.

The fix is to prevent the infinite loop by disallowing the handler from calling itself.


> PingRequestHandler can infinite loop if called with a qt that points to 
> itsself
> ---
>
> Key: SOLR-2631
> URL: https://issues.apache.org/jira/browse/SOLR-2631
> Project: Solr
>  Issue Type: Bug
>  Components: search, web gui
>Affects Versions: 1.4, 3.1, 3.2, 3.3
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>  Labels: security
> Fix For: 3.4, 4.0
>
> Attachments: SOLR-2631.patch
>
>
> We got a security report to priv...@lucene.apache.org that Solr can loop 
> infinitely, use 100% CPU and overflow the stack if you execute the following 
> HTTP requests: 
> - http://localhost:8983/solr/select?qt=/admin/ping
> - http://localhost:8983/solr/admin/ping?qt=/admin/ping
> The qt parameter instructs PingRequestHandler to call the given request 
> handler; when it names the ping handler itself, this leads to an infinite 
> loop. This is not a security issue, but for an unprotected Solr server with 
> an unprotected /solr/select path it makes the server stop working.
> The fix is to prevent the infinite loop by disallowing the handler from calling itself.




[jira] [Resolved] (SOLR-2429) ability to not cache a filter

2011-07-01 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2429.


   Resolution: Fixed
Fix Version/s: 3.4

> ability to not cache a filter
> -
>
> Key: SOLR-2429
> URL: https://issues.apache.org/jira/browse/SOLR-2429
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Fix For: 3.4
>
> Attachments: SOLR-2429.patch
>
>
> A user should be able to add {!cache=false} to a query or filter query.




[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-07-01 Thread Mitsu Hadeishi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058836#comment-13058836
 ] 

Mitsu Hadeishi commented on SOLR-2462:
--

Oh now you tell us. :) Well, we already built the patched 3.2 so we're going 
with that for now :)

> Using spellcheck.collate can result in extremely high memory usage
> --
>
> Key: SOLR-2462
> URL: https://issues.apache.org/jira/browse/SOLR-2462
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 3.1
>Reporter: James Dyer
>Assignee: Robert Muir
>Priority: Critical
> Fix For: 3.3, 4.0
>
> Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered any time 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in production with Solr for 1 1/2 days when this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen, as occasionally a user will accidentally paste the URL into the 
> search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 
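Some rough arithmetic shows why materializing every combination cannot work here: if each of ~12 misspelled words keeps up to 15 suggestions (spellcheck.count=15), the number of collations is up to 15^12, roughly 1.3 × 10^14. The Java sketch below computes that count and then shows one standard way to avoid building the full list: a best-first (k-best) enumeration over per-term ranked suggestions using a priority queue. It is an illustration under stated assumptions (integer costs, each term's suggestions sorted cheapest first), not the change actually made in SOLR-2462.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

public final class CollationDemo {

    /** Collations if each of `terms` misspelled words has `perTerm` candidates; throws on overflow. */
    public static long combinationCount(int perTerm, int terms) {
        long total = 1;
        for (int i = 0; i < terms; i++) {
            total = Math.multiplyExact(total, (long) perTerm);
        }
        return total;
    }

    /** Total cost of one combination: v[t] picks the v[t]-th suggestion for term t. */
    public static int cost(int[][] costs, int[] v) {
        int sum = 0;
        for (int t = 0; t < v.length; t++) {
            sum += costs[t][v[t]];
        }
        return sum;
    }

    /**
     * Best-first enumeration of the k cheapest combinations. Each row of costs must be
     * sorted ascending. Each pop adds at most one successor per term, so the queue holds
     * about k * terms vectors instead of the full product.
     */
    public static List<int[]> topK(int[][] costs, int k) {
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] v) -> cost(costs, v)));
        Set<String> seen = new HashSet<>();
        int[] start = new int[costs.length]; // best-ranked suggestion for every term
        heap.add(start);
        seen.add(Arrays.toString(start));
        List<int[]> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < k) {
            int[] best = heap.poll();
            out.add(best);
            // Successors: move exactly one term to its next-ranked suggestion.
            for (int t = 0; t < costs.length; t++) {
                if (best[t] + 1 < costs[t].length) {
                    int[] next = best.clone();
                    next[t]++;
                    if (seen.add(Arrays.toString(next))) {
                        heap.add(next);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The scenario from the report: ~12 misspelled words, spellcheck.count=15.
        System.out.println(combinationCount(15, 12)); // 129746337890625
        int[][] costs = { {0, 1, 3}, {0, 2}, {1, 2} };
        for (int[] v : topK(costs, 4)) {
            System.out.println(Arrays.toString(v) + " cost=" + cost(costs, v));
        }
    }
}
```

Because every row is sorted, a successor never costs less than the combination it came from, so combinations come out of the queue in non-decreasing cost order and the loop can stop after k of them.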




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-01 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058844#comment-13058844
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

So, after a few hours of hacking, it's hopefully a step in the right direction 
for the Analysis page! :>

Please have a look and let me know what you think. I've changed various 
things:
* Vertical separation should be clearer now (Index- vs. Query-Time)
* Filter- & Tokenizer-Names are placed on the left side (so it should be easier 
to follow each token through all the steps; full name on mouseover)
* Property-Names are no longer abbreviated
* All Properties (except {{match}} and {{positionHistory}}) are displayed
* If the Property-Name contains a #-Sign, only the latter part is displayed 
(full name on mouseover)
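The label rules can be restated as two small functions. This Java sketch is only an illustration: the admin UI itself is JavaScript, and the method names are hypothetical. The '#' rule follows the last bullet above. The capital-letter abbreviation for tokenizer and filter names is an assumption based on the guess in the #solr feedback quoted later in this thread, not on the actual UI code.

```java
public final class AnalysisLabels {

    /** Keeps only the part after the last '#', e.g. "...TypeAttribute#type" -> "type". */
    public static String propertyLabel(String propertyName) {
        int hash = propertyName.lastIndexOf('#');
        return hash >= 0 ? propertyName.substring(hash + 1) : propertyName;
    }

    /**
     * Assumed abbreviation rule: the capital letters of the simple class name,
     * e.g. "org.apache.lucene.analysis.WhitespaceTokenizer" -> "WT".
     * Falls back to the simple name when it has no capitals.
     */
    public static String classAbbreviation(String className) {
        String simple = className.substring(className.lastIndexOf('.') + 1);
        StringBuilder sb = new StringBuilder();
        for (char c : simple.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append(c);
            }
        }
        return sb.length() > 0 ? sb.toString() : simple;
    }

    public static void main(String[] args) {
        System.out.println(classAbbreviation("org.apache.lucene.analysis.LowerCaseFilter"));
        System.out.println(propertyLabel("org.apache.lucene.analysis.tokenattributes.TypeAttribute#type"));
    }
}
```

In both cases the full name stays available for the mouseover, so the shortening only affects what is drawn in the table.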

Uwe, maybe you could give it a try w/ lucene-gosen? These are the [required 
changes|https://github.com/steffkes/solr-admin/commit/ddb1e0098efc2ef48082e43ed57e4b62b23ba6d7]
 (since the last svn-commit).

|| ||Version 1980||First Try||Current Page||
||Normal|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis.png]*|
||Verbose|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose.png]*|

Stefan

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
> SOLR-2399-110606.patch, SOLR-2399-110622.patch, 
> SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
> SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
> SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
> SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin




[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-07-01 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058848#comment-13058848
 ] 

Lance Norskog commented on SOLR-1499:
-

Ahmet- are you still using this? 

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
> Fix For: 3.3
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string used with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request.
> ** The SolrEntityProcessor always fetches every document that matches the 
> request.
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as <field> elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 30 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.
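The description says `rows` sets the page size while the processor "always fetches every document that matches the request". A minimal sketch of that paging loop follows, assuming offset paging with Solr's standard `start`/`rows` parameters and assuming the `query` attribute supplies `q` plus any extra parameters. `PagingSketch`, `Page`, `PageSource` and `requestParams` are hypothetical names standing in for the real SolrJ request; this is not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: PagingSketch, Page, PageSource and requestParams are
// illustrative names, not SolrJ or DIH classes.
final class PagingSketch {

    /** One page of results plus the total hit count reported by the server. */
    static final class Page {
        final List<String> ids;
        final long numFound;

        Page(List<String> ids, long numFound) {
            this.ids = ids;
            this.numFound = numFound;
        }
    }

    /** Stand-in for a single request to the target Solr (assumption). */
    interface PageSource {
        Page fetch(int start, int rows);
    }

    /** Assumes the query attribute supplies q plus extra params such as sort. */
    static String requestParams(String query, String fields, int rows, int start) {
        return "q=" + query + "&fl=" + fields + "&rows=" + rows + "&start=" + start;
    }

    /** Requests pages of `rows` documents until every match has been read. */
    static List<String> fetchAll(PageSource source, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        List<String> all = new ArrayList<>();
        int start = 0;
        while (true) {
            Page page = source.fetch(start, rows);
            all.addAll(page.ids);
            start += page.ids.size();
            // Stop once numFound documents are read; an empty page also stops
            // the loop so a misbehaving server cannot cause an endless loop.
            if (page.ids.isEmpty() || start >= page.numFound) {
                break;
            }
        }
        return all;
    }
}
```

With `rows='10'` and 25 matching documents, this sketch issues three requests (start 0, 10 and 20). The 20-offset request would carry `q=Jefferson&sort=id+asc&fl=id,tag&rows=10&start=20`. A stable sort such as `id asc` matters for offset paging, because it keeps pages from overlapping or skipping documents between requests.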

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-01 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058865#comment-13058865
 ] 

Stefan Matheis (steffkes) commented on SOLR-2399:
-

First Feedback from #solr:

{quote} in the new ones, it's easy to overlook that "sonic" and 
"viewsonic" are at the same position
 steffkes: i think what i would suggest is to keep your new layout, 
treat position as special and put some sort of visual indicator on terms that 
are at the same position{quote}
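The suggestion above amounts to grouping the analysis output by token position and flagging positions that hold more than one term. A minimal sketch of that grouping follows. `PositionGrouping`, `Tok` and the `*` marker are hypothetical choices for illustration, not part of the admin UI or of Lucene's API, and "monitor" below is made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group analysis output by position and mark positions
// that hold several terms (e.g. "sonic" and "viewsonic").
final class PositionGrouping {

    /** Minimal stand-in for one analyzed token; not a Lucene class. */
    static final class Tok {
        final String term;
        final int position;

        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Position -> terms at that position, in ascending position order. */
    static Map<Integer, List<String>> byPosition(List<Tok> tokens) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (Tok t : tokens) {
            grouped.computeIfAbsent(t.position, p -> new ArrayList<>()).add(t.term);
        }
        return grouped;
    }

    /** Renders one cell per position; "*" marks positions with stacked terms. */
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, List<String>> e : byPosition(tokens).entrySet()) {
            List<String> terms = e.getValue();
            String marker = terms.size() > 1 ? "*" : "";
            if (sb.length() > 0) {
                sb.append(" | ");
            }
            sb.append(e.getKey()).append(marker).append(": ").append(String.join(",", terms));
        }
        return sb.toString();
    }
}
```

For tokens `sonic@1`, `viewsonic@1`, `monitor@2` this renders `1*: sonic,viewsonic | 2: monitor`, so the shared position stands out without changing the layout itself.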

{quote} oh yeah ... i meant to ask about that ... i'm assuming you 
look at the capital letters in the class name?
 i think it's an awesome way to save space, definitely a good idea for 
verbose==false (as long as you can mouse over it or something to see the full 
name)
 for verbose==true ... not sure{quote}
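The question above assumes class names are shortened to their capital letters, with the full name available on mouse-over. A minimal sketch of that idea follows, assuming the abbreviation keeps only upper-case letters of the simple class name and the full name goes into an HTML `<abbr title="...">` tooltip. `ClassNameAbbrev` is a hypothetical helper, not code from the patch.

```java
// Hypothetical sketch: abbreviate a class name to its capital letters and keep
// the full name as a hover tooltip.
final class ClassNameAbbrev {

    /** "org.apache.lucene.analysis.WhitespaceTokenizer" -> "WT". */
    static String abbreviate(String className) {
        // Drop any package prefix; lastIndexOf returns -1 when there is none.
        String simple = className.substring(className.lastIndexOf('.') + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < simple.length(); i++) {
            char c = simple.charAt(i);
            if (Character.isUpperCase(c)) {
                sb.append(c);
            }
        }
        // Fall back to the simple name if it contains no capitals at all.
        return sb.length() > 0 ? sb.toString() : simple;
    }

    /** Short label with the full name on mouse-over (no HTML escaping done). */
    static String label(String className) {
        return "<abbr title=\"" + className + "\">" + abbreviate(className) + "</abbr>";
    }
}
```

This covers the verbose==false case. How verbose==true should look was left open in the discussion.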

{quote}_regarding the two-column-layout / Index- vs. Query-Time_:
 I see now.  I think they might need headers to indicate which is 
which.  Not strictly required if your screen is wide enough, but if it wraps 
below, it may not be immediately apparent.{quote}

> Solr Admin Interface, reworked
> --
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Stefan Matheis (steffkes)
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, 
> SOLR-2399-110606.patch, SOLR-2399-110622.patch, 
> SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, 
> SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, 
> SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin 
> Interface.* [Based on this 
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, 
> SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github-Repository (Just for me, to keep track of the 
> changes)
> » https://github.com/steffkes/solr-admin




[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked

2011-07-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058879#comment-13058879
 ] 

Robert Muir commented on SOLR-2399:
---

{quote}
Uwe, maybe you could give it a try w/ lucene-gosen? These are the required 
changes (since the last svn-commit).
{quote}

If Uwe doesn't have the time, I'll try to investigate this in the next few 
days, once I stop laughing about "Version 1980".

We have a version that works with trunk here, 
https://lucene-gosen.googlecode.com/svn/branches/4x




[JENKINS] Lucene-trunk - Build # 1612 - Still Failing

2011-07-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1612/

No tests ran.

Build Log (for compile errors):
[...truncated 9445 lines...]


