Re: [Why] some PyLucene tests fail on Windows

2012-05-08 Thread Andi Vajda


On Tue, 8 May 2012, Thomas Koch wrote:


There's a known issue with some PyLucene tests that fail on Windows. It has
been reported and discussed before, see
http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201104.mbox/000d01cbfa7b$c60ca530$5225ef90$@de

While some tests have been fixed, others still fail or show errors on
Windows. Currently these are:

ERROR: test_removeDocument (__main__.PythonDirectoryTests)
FAIL: testTiming (lia.indexing.CompoundVersusMultiFileIndexTest.CompoundVersusMultiFileIndexTest)

I had a look at this and tried to figure out why the PythonDirectoryTests
fail. A possible cause is a Windows issue with os.unlink: in certain
situations (a timing issue!) a file that is about to be deleted is still
locked by Windows. This can happen when some other process that is notified
about the file removal, such as an indexer or a virus checker, still holds a
lock on it. Another reason could simply be that some tests keep files open.
Either way, the Python documentation for os.remove(path) and unlink(path)
says: "On Windows, attempting to remove a file that is in use causes an
exception to be raised."

The traceback below shows that the test in PythonDirectoryTests fails in
deleteFile, where the lock is on an index file like '_0.tis' or '_0.fdt'.
A similar problem with the Python unit tests on Windows is discussed at
http://bugs.python.org/issue7443 (I can't see that they came to a solution,
though; it's still open).

I've added a special hack for Windows to test_PythonDirectory: a
try-delete-wait-retry loop. It shows that this is probably not a timing
issue, because after 10 seconds of retries the file still cannot be deleted.
So it rather looks like the index files are still open! (I couldn't find a
bug in the Python code, though.)
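
A try-delete-wait-retry loop of the kind described above could look roughly like the following. This is only a sketch, not the code from the actual test hack: the function name, the default retry count and delay, and the injectable `remove` argument are all assumptions made for illustration.

```python
import os
import tempfile
import time


def unlink_with_retry(path, retries=10, delay=1.0, remove=os.unlink):
    """Try to delete path, sleeping between attempts.

    Returns True once the file is gone, False if every attempt failed.
    The `remove` argument is hypothetical and only exists so the loop
    can be exercised without a real locked file.
    """
    for attempt in range(retries):
        try:
            remove(path)
            return True
        except OSError:
            # WindowsError is a subclass of OSError; on Windows a file
            # that is still in use typically fails with error 32.
            if attempt < retries - 1:
                time.sleep(delay)
    return False
```

If the function still returns False after all retries, a short-lived lock held by another process becomes an unlikely explanation, and a file handle that is never closed becomes the more plausible one.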

Thus I tried to simply ignore the exception - the files are then *not*
removed, but the test succeeds:
...
indexing  98
indexing  99
failed to delete: testpyrepo\_0.tis
failed to delete: testpyrepo\_0.nrm
failed to delete: testpyrepo\_0.frq
failed to delete: testpyrepo\_0.fdx
failed to delete: testpyrepo\_0.prx
failed to delete: testpyrepo\_0.fdt
failed to delete: testpyrepo\_0.tis
failed to delete: testpyrepo\_0.nrm
failed to delete: testpyrepo\_0.frq
failed to delete: testpyrepo\_0.fdx
failed to delete: testpyrepo\_0.prx
failed to delete: testpyrepo\_0.fdt
...

Ran 10 tests in 3.659s

(The "failed to delete:" line is a debug statement, to be removed.)

I could provide a patch that allows test_PythonDirectory to pass on
Windows, if that's an accepted solution. Still, I see a possibility that
PythonDirectory has a bug (i.e. index files are not always closed). On the
other hand, it could be a timing issue or something related to my
environment (?)


I don't think that hiding failures under the rug is the right thing to do.


I wonder if anyone has used PythonDirectory yet (in production) or if
someone else came across these issues on Windows? (I'm not using
PythonDirectory and haven't had similar lock issues except when running
the tests...) If someone could confirm that test_PythonDirectory passes
in his/her Windows environment, that would be good to know ,-)


PythonDirectory is meant as an example of a Python extension and tests that
the extension is functional. It's not intended as a production-level
implementation.

Andi..




Haven't had a look at testTiming yet.

Regards,
Thomas
--
Details :

==
ERROR: test_removeDocument (__main__.PythonDirectoryTests)
--
Traceback (most recent call last):
  File "F:\Devel\workspaces\workspace.pylucene\pylucene-3.6.0-2\test\test_PyLucene.py", line 178, in test_removeDocument
    self.closeStore(store, searcher, reader)
  File "test_PythonDirectory.py", line 241, in closeStore
    arg.close()
JavaError: org.apache.jcc.PythonException: (32, 'The process cannot access the file because it is being used by another process', u'testpyrepo_0.tis')
Traceback (most recent call last):
  File "test_PythonDirectory.py", line 181, in deleteFile
    os.unlink(os.path.join(self.path, name))
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'testpyrepo\\_0.tis'

    Java stacktrace:
...
    at org.apache.pylucene.store.PythonDirectory.deleteFile(Native Method)
    at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:578)
    at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:517)
    at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:504)
    at org.apache.lucene.index.IndexFileDeleter.close(IndexFileDeleter.java:377)
    at org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:854)
    at org.apache.lucene.index.IndexReader.commit(IndexReader.java:1520)
   at 

Easy way to find JAVA_HOME

2012-05-08 Thread Christian Heimes
Hello,

I found a much easier way to detect the path to JAVA_HOME on Unix-like
platforms where the java command is in the search path: java -verbose
prints out the paths of all loaded JAR files.

Christian

---
import subprocess
import re
import os

PATH_RE = re.compile(r"Loaded\ .*\ from\ (.*)/jre/lib/rt.jar")

def find_java():
    """Find java home by running java -verbose

    In verbose mode, java prints lines like

    [Loaded java.lang.Object from /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/rt.jar]

    to stdout.
    """
    try:
        # universal_newlines=True yields text instead of bytes on Python 3
        proc = subprocess.Popen(["java", "-verbose"],
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
    except OSError:
        # the java command is not in the search path
        return None
    out, err = proc.communicate()
    for line in out.split("\n"):
        mo = PATH_RE.search(line)
        if mo is None:
            continue
        javahome = mo.group(1)
        if os.path.isdir(javahome):
            return javahome

if __name__ == "__main__":
    print(find_java())
---
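
To see what the pattern captures without having java installed, the matching step can be tried on the sample line from the docstring. This is a small sketch; the helper name java_home_from_line is made up here, and it assumes a JDK layout that still has jre/lib/rt.jar, as in the example line from the post.

```python
import re

# Same idea as the pattern in the script above, written as a raw string.
PATH_RE = re.compile(r"Loaded .* from (.*)/jre/lib/rt\.jar")


def java_home_from_line(line):
    """Return the directory preceding /jre/lib/rt.jar, or None."""
    mo = PATH_RE.search(line)
    return mo.group(1) if mo else None


sample = ("[Loaded java.lang.Object from "
          "/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/rt.jar]")
print(java_home_from_line(sample))
```

Lines that do not mention rt.jar give None, which is why the script simply skips them.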


[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #480: POMs out of sync

2012-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/480/

No tests ran.

Build Log (for compile errors):
[...truncated 6847 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 13874 - Failure

2012-05-08 Thread Chris Male
I'll fix.

On Tue, May 8, 2012 at 5:36 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13874/

 All tests passed

 Build Log (for compile errors):
 [...truncated 24376 lines...]








-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13875 - Still Failing

2012-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13875/

All tests passed

Build Log (for compile errors):
[...truncated 24129 lines...]




Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 13875 - Still Failing

2012-05-08 Thread Chris Male
Fixed in r1335354.

On Tue, May 8, 2012 at 6:32 PM, Apache Jenkins Server 
jenk...@builds.apache.org wrote:

 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13875/

 All tests passed

 Build Log (for compile errors):
 [...truncated 24129 lines...]








-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


Lucene Index Size and Performance

2012-05-08 Thread parkhekishor
Hi, 
   I have an index of 1 GB. Each document consists of five fields which
are used for search. A single result takes 30 to 40 milliseconds. I want to
reduce this time. How can I do this? Does search performance depend on the
index size? What is the maximum number of documents that can be added to an
index?
  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lucene-Index-Size-and-Performance-tp3970551.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13876 - Still Failing

2012-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13876/

All tests passed

Build Log (for compile errors):
[...truncated 24092 lines...]




[jira] [Resolved] (LUCENE-4039) Add AddIndexesTask to Benchmark

2012-05-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-4039.


Resolution: Fixed

Committed revision 1335363.

 Add AddIndexesTask to Benchmark
 ---

 Key: LUCENE-4039
 URL: https://issues.apache.org/jira/browse/LUCENE-4039
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-4039.patch


 I was interested in measuring the performance of 
 IndexWriter.addIndexes(Directory) vs. IndexWriter.addIndexes(IndexReader). I 
 wrote an AddIndexesTask and a matching .alg. The task takes a parameter 
 whether to use the IndexReader or Directory variants. I'll upload the patch 
 and describe the perf results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Commented] (LUCENE-4038) some testcases not executed by 'ant test'

2012-05-08 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270307#comment-13270307
 ] 

Dawid Weiss commented on LUCENE-4038:
-

That's the way it's always been -- I didn't change it when switching to junit4 
(I think).

 some testcases not executed by 'ant test'
 -

 Key: LUCENE-4038
 URL: https://issues.apache.org/jira/browse/LUCENE-4038
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.0


 Look under 'spatial', RecursivePrefixTreeStrategyTestCase and 
 TwoDoublesStrategyTestCase
 don't get invoked.
 I suspect this is something in junit4 that doesnt like the fact that these 
 classes extend
 a base class that takes a generic type? 
 Because if i just click 'run tests' from this folder in my IDE, then they run.




Re: Lucene Index Size and Performance

2012-05-08 Thread Li Li
What's the hardware configuration of your machines?
If you have enough RAM, you could use RAMDirectory.

On Tue, May 8, 2012 at 2:52 PM, parkhekishor kishor.par...@highmark.in wrote:
 Hi,
   I have an index of 1 GB. Each document consists of five fields which
 are used for search. A single result takes 30 to 40 milliseconds. I want to
 reduce this time. How can I do this? Does search performance depend on the
 index size? What is the maximum number of documents that can be added to an
 index?





Re: Multi-content-type /update handler

2012-05-08 Thread Erik Hatcher
+1 !!


On May 7, 2012, at 20:28 , Ryan McKinley wrote:

 I'd like to commit SOLR-2857 soon -- it would be great for 4.0 to
 assume XML/JSON/CSV/JAVABIN at the same endpoint rather than 4
 configured RequestHandlers.
 
 The bulk of the patch is refactoring the tests to all point to the same 
 handler.
 
 Any objections?  If so, should we improve things on trunk or with more patches?
 
 ryan
 



[jira] [Updated] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

2012-05-08 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4022:


Attachment: LUCENE-4022.patch

Here is a patch with a slightly changed algorithm. It still takes free/2 as the 
base buffer size but checks if it is reasonable to grow the heap if the total 
available mem is 10x larger than the free memory or if the free memory is 
smaller than MIN_BUFFER_SIZE_MB. If we run into small heaps, like on mobile 
phones where you only have up to 3MB, this falls back to half of the free 
memory or to ABSOLUTE_MIN_SORT_BUFFER_SIZE. 

The actual buffer size is bounded by Integer.MAX_VALUE.

 Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available
 -

 Key: LUCENE-4022
 URL: https://issues.apache.org/jira/browse/LUCENE-4022
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Affects Versions: 3.6, 4.0
Reporter: Simon Willnauer
 Fix For: 4.0, 3.6.1

 Attachments: LUCENE-4022.patch


 The Sorter we use for offline sorting seems to use the MIN_BUFFER_SIZE as an 
 upper bound even if there is more memory available. See this snippet:
 {code}
 long half = free/2;
 if (half >= ABSOLUTE_MIN_SORT_BUFFER_SIZE) { 
   return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
 }
   
 // by max mem (heap will grow)
 half = (max - total) / 2;
 return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
 {code}
 We should use Math.max instead of min here.
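
The effect of min versus max in the snippet above can be checked with a small Python transliteration. This is only a sketch: the constant values are illustrative and not taken from the Lucene source, the comparison is assumed to be `>=`, and the `pick` parameter is a hypothetical knob for switching between the reported and the proposed behaviour.

```python
MB = 1024 * 1024
# Illustrative values only, not taken from the Lucene source.
MIN_BUFFER_SIZE_MB = 32
ABSOLUTE_MIN_SORT_BUFFER_SIZE = MB // 2


def automatic_buffer_size(free, total, max_mem, pick=min):
    """Mirror the snippet; pick=min is the reported code, pick=max the proposal."""
    half = free // 2
    if half >= ABSOLUTE_MIN_SORT_BUFFER_SIZE:
        return pick(MIN_BUFFER_SIZE_MB * MB, half)
    # by max mem (heap will grow)
    half = (max_mem - total) // 2
    return pick(MIN_BUFFER_SIZE_MB * MB, half)
```

With 1024 MB free, the min variant stays at MIN_BUFFER_SIZE_MB no matter how much memory is available, while the max variant uses half of the free memory.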




[jira] [Assigned] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

2012-05-08 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4022:
---

Assignee: Simon Willnauer

 Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available
 -

 Key: LUCENE-4022
 URL: https://issues.apache.org/jira/browse/LUCENE-4022
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Affects Versions: 3.6, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0, 3.6.1

 Attachments: LUCENE-4022.patch






[jira] [Created] (LUCENE-4042) New snowball stemmers (Irish gaelic and Czech)

2012-05-08 Thread Dawid Weiss (JIRA)
Dawid Weiss created LUCENE-4042:
---

 Summary: New snowball stemmers (Irish gaelic and Czech)
 Key: LUCENE-4042
 URL: https://issues.apache.org/jira/browse/LUCENE-4042
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Dawid Weiss
Priority: Trivial
 Fix For: 4.0


New stemmers have been added to snowball (Irish gaelic and Czech).





[jira] [Updated] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter

2012-05-08 Thread Shane (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane updated SOLR-2834:


Affects Version/s: 3.6

 AnalysisResponseBase.java doesn't handle 
 org.apache.solr.analysis.HTMLStripCharFilter
 -

 Key: SOLR-2834
 URL: https://issues.apache.org/jira/browse/SOLR-2834
 Project: Solr
  Issue Type: Bug
  Components: clients - java, Schema and Analysis
Affects Versions: 3.4, 3.6
Reporter: Shane

 When using FieldAnalysisRequest.java to analyze a field, a 
 ClassCastException is thrown if the schema defines the filter 
 org.apache.solr.analysis.HTMLStripCharFilter.  The exception is:
 java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
    at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
    at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
    at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)
 My schema definition is:
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.HTMLStripCharFilterFactory" />
     <tokenizer class="solr.StandardTokenizerFactory" />
     <filter class="solr.StandardFilterFactory" />
     <filter class="solr.TrimFilterFactory" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 Part of the response is:
 <lst name="query">
   <str name="org.apache.solr.analysis.HTMLStripCharFilter">testing analysis</str>
   <arr name="org.apache.lucene.analysis.standard.StandardTokenizer">
     <lst>...
 A simplistic fix would be to test if the Entry value is an instance of List.
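
The idea of the suggested fix, shown as a Python analogy rather than the actual SolrJ code: when walking the analysis phases, check the type of each value before treating it as a list of tokens. The function name, the tuple layout, and the sample data are assumptions for illustration only.

```python
def build_phases(entries):
    """Collect analysis phases from (class name, value) pairs.

    A value is either a list of token dicts (tokenizers, token filters)
    or a plain string (the text produced by a char filter).
    Returns (class name, tokens, text) tuples.
    """
    phases = []
    for name, value in entries:
        if isinstance(value, list):
            phases.append((name, value, None))
        else:
            # A char filter reports filtered text, not tokens, so there
            # is nothing to treat as a list here.
            phases.append((name, [], value))
    return phases


response = [
    ("org.apache.solr.analysis.HTMLStripCharFilter", "testing analysis"),
    ("org.apache.lucene.analysis.standard.StandardTokenizer",
     [{"text": "testing"}, {"text": "analysis"}]),
]
for phase in build_phases(response):
    print(phase)
```

Without the isinstance check, the string value of the char filter phase would be handled as if it were a token list, which is the Python counterpart of the ClassCastException above.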




[jira] [Updated] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter

2012-05-08 Thread Shane (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane updated SOLR-2834:


Attachment: AnalysisResponseBase.patch

Patch file for fix to check if the Entry value is an instance of List.

 AnalysisResponseBase.java doesn't handle 
 org.apache.solr.analysis.HTMLStripCharFilter
 -

 Key: SOLR-2834
 URL: https://issues.apache.org/jira/browse/SOLR-2834
 Project: Solr
  Issue Type: Bug
  Components: clients - java, Schema and Analysis
Affects Versions: 3.4, 3.6
Reporter: Shane
 Attachments: AnalysisResponseBase.patch






AW: [VOTE] Release PyLucene 3.6.0 rc2

2012-05-08 Thread Thomas Koch
I could build JCC and PyLucene on Win7-32 with Python27 and Java16. The ivy
thing gets installed automatically. All tests pass except for the
PythonDirectoryTests and testTiming. However, there's a known issue with
some tests that fail on Windows, so this shouldn't be a release blocker.
I've investigated the test issue a bit further and will send a separate
email to the list about it.

+1 for PyLucene 3.6.0 rc2

Regards,
Thomas 



 -Original Message-
 From: Andi Vajda [mailto:va...@apache.org]
 Sent: Tuesday, 8 May 2012 02:20
 To: pylucene-...@lucene.apache.org
 Cc: gene...@lucene.apache.org
 Subject: [VOTE] Release PyLucene 3.6.0 rc2
 
 
 The ivy requirement for building Lucene Java is now handled by PyLucene's
 Makefile. A new release candidate is available for review.
 
 The PyLucene 3.6.0-2 release tracking the recent release of Apache Lucene
 3.6.0 is ready.
 
 A release candidate is available from:
 http://people.apache.org/~vajda/staging_area/
 
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_6/CHANGES
 
 PyLucene 3.6.0 is built with JCC 2.13 included in these release artifacts:
 http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
 
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/CHANGES.txt
 
 Please vote to release these artifacts as PyLucene 3.6.0-2.
 
 Thanks !
 
 Andi..
 
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 
 pps: here is my +1




[jira] [Commented] (SOLR-3221) Make Shard handler threadpool configurable

2012-05-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270498#comment-13270498
 ] 

Mark Miller commented on SOLR-3221:
---

bq. I am loathe to submit a patch for changing the CHANGES.txt

Usually the committer handles this - no rules about it though. 

I'd go with something a bit shorter - no need to get into the gritty details - 
that's why the JIRA issue number is there. I'd stick to something closer to 
"Make shard handler threadpool configurable." or "Added the ability to directly 
configure aspects of the concurrency and thread-pooling used within distributed 
search in solr."

 Make Shard handler threadpool configurable
 --

 Key: SOLR-3221
 URL: https://issues.apache.org/jira/browse/SOLR-3221
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.6, 4.0
Reporter: Greg Bowyer
Assignee: Erick Erickson
  Labels: distributed, http, shard
 Fix For: 3.6, 4.0

 Attachments: SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, 
 SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, 
 SOLR-3221-3x_branch.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, 
 SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch


 From profiling of monitor contention, as well as observations of the
 95th and 99th percentile response times for nodes that perform distributed
 search (or "aggregator" nodes), it would appear that the HttpShardHandler
 code currently does a suboptimal job of managing outgoing shard level
 requests.
 Presently the code contained within lucene 3.5's SearchHandler and
 Lucene trunk / 3x's ShardHandlerFactory creates arbitrary threads in
 order to service distributed search requests. This is done presently to
 limit the size of the threadpool such that it does not consume resources
 in deployment configurations that do not use distributed search.
 This unfortunately has two impacts on the response time if the node
 coordinating the distribution is under high load.
 The usage of the MaxConnectionsPerHost configuration option results in
 aggressive activity on semaphores within HttpCommons; it has been
 observed that the aggregator can have a response time far greater than
 that of the searchers. The above monitor contention would appear to
 suggest that in some cases it's possible for liveness issues to occur and
 for simple queries to be starved of resources simply due to a lack of
 attention from the viewpoint of context switching.
 With the http commons connection being hotly contended, as mentioned
 above, the fair, queue based configuration eliminates this, at the cost of
 throughput.
 This patch aims to make the threadpool largely configurable, allowing
 those using solr to choose the throughput vs latency balance they
 desire.
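
As a rough Python analogy of a configurable pool for shard requests (this is not Solr's HttpShardHandler, and every name below is hypothetical), the pool size becomes a setting chosen by the deployment rather than something fixed in code:

```python
from concurrent.futures import ThreadPoolExecutor


def make_shard_pool(max_threads=8):
    """Bounded pool; max_threads is the knob a deployment would tune."""
    return ThreadPoolExecutor(max_workers=max_threads)


def query_shards(pool, shards, send_request):
    """Send one request per shard and return the responses in shard order."""
    futures = [pool.submit(send_request, shard) for shard in shards]
    return [future.result() for future in futures]
```

A small pool keeps resource use low but makes requests queue up under load; a larger one trades memory and context switches for lower latency. That is the kind of balance the issue wants to leave to the user.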




[jira] [Commented] (SOLR-3221) Make Shard handler threadpool configurable

2012-05-08 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270504#comment-13270504
 ] 

Greg Bowyer commented on SOLR-3221:
---

Sorry, the CHANGES.txt change was already done, so I don't think there is
anything left for this JIRA ticket. I think Erick added the CHANGES.txt
entry for the 3.6 release.

 Make Shard handler threadpool configurable
 --

 Key: SOLR-3221
 URL: https://issues.apache.org/jira/browse/SOLR-3221
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.6, 4.0
Reporter: Greg Bowyer
Assignee: Erick Erickson
  Labels: distributed, http, shard
 Fix For: 3.6, 4.0

 Attachments: SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, 
 SOLR-3221-3x_branch.patch, SOLR-3221-3x_branch.patch, 
 SOLR-3221-3x_branch.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, 
 SOLR-3221-trunk.patch, SOLR-3221-trunk.patch, SOLR-3221-trunk.patch






[jira] [Commented] (LUCENE-4042) New snowball stemmers (Irish gaelic and Czech)

2012-05-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270511#comment-13270511
 ] 

Robert Muir commented on LUCENE-4042:
-

We have the irish one already, Jim contributed that on LUCENE-3883.

I verified the .sbl is the same and already removed our local copy.

We should add the Czech one imo. It differs from cz.CzechStemmer.java in
that it implements the more aggressive variant, stemming derivational
endings etc, so it gives users a choice.


 New snowball stemmers (Irish gaelic and Czech)
 --

 Key: LUCENE-4042
 URL: https://issues.apache.org/jira/browse/LUCENE-4042
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Dawid Weiss
Priority: Trivial
 Fix For: 4.0


 New stemmers have been added to snowball (Irish gaelic and Czech).




document search returning no results

2012-05-08 Thread Ryan Langton
I have a search that is coming up empty despite a document existing with the 
search text.  Is the / an illegal character?

Here's the field when I'm creating the document:

[5] = 
{indexed,tokenized<AssignedAreasWithId:3-Genetics,404-AnnalsofFamilyMedicine-July/August2009,60-Obesity/WeightManagement>}

Here's my lucene search query:

{+(AssignedAreasWithId:*404-annalsoffamilymedicine-july/august2009*)}

Thanks,

Ryan Langton
Engineer
Digital Evolution Group
913.951.3175 x155 (office)
913.498.9985 (fax)
langt...@digitalev.com
www.digitalev.com



[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-05-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270612#comment-13270612
 ] 

Yonik Seeley commented on SOLR-139:
---

Committed (5 years after the issue was opened!)

I'll keep this issue open and we can add follow-on patches to implement 
increment and other set operations.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Ryan McKinley
 Attachments: Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
 Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, SOLR-139.patch, SOLR-139.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, 
 getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
 getStoredFields.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




[jira] [Created] (LUCENE-4043) Add scoring support for query time join

2012-05-08 Thread Martijn van Groningen (JIRA)
Martijn van Groningen created LUCENE-4043:
-

 Summary: Add scoring support for query time join
 Key: LUCENE-4043
 URL: https://issues.apache.org/jira/browse/LUCENE-4043
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/join
Reporter: Martijn van Groningen


Have similar scoring for query time joining just like the index time block join 
(with the score mode).




Re: [VOTE] Release PyLucene 3.6.0 rc2

2012-05-08 Thread Michael McCandless
+1 to release.

I built/installed successfully on OS X 10.6.8, and ran my usual smoke
test (index/search first 100 K docs from Wikipedia).

Was the added 'print "setup args = %s" % args' intentional, in
jcc/jcc/python.py?  Just prints a lot of stuff out while building
PyLucene...

Mike McCandless

http://blog.mikemccandless.com

On Mon, May 7, 2012 at 8:20 PM, Andi Vajda <va...@apache.org> wrote:

> The ivy requirement for building Lucene Java is now handled by PyLucene's
> Makefile. A new release candidate is available for review.
>
> The PyLucene 3.6.0-2 release tracking the recent release of
> Apache Lucene 3.6.0 is ready.
>
> A release candidate is available from:
> http://people.apache.org/~vajda/staging_area/
>
> A list of changes in this release can be seen at:
> http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_6/CHANGES
>
> PyLucene 3.6.0 is built with JCC 2.13 included in these release artifacts:
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
>
> A list of Lucene Java changes can be seen at:
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/CHANGES.txt
>
> Please vote to release these artifacts as PyLucene 3.6.0-2.
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
> http://people.apache.org/~vajda/staging_area/KEYS
>
> pps: here is my +1


[jira] [Updated] (LUCENE-4043) Add scoring support for query time join

2012-05-08 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated LUCENE-4043:
--

Attachment: LUCENE-4043.patch

Draft patch. Added ScoreMode as parameter to JoinUtil#createJoinQuery(...).

Maybe ScoreMode should be a public enum inside the join package.

 Add scoring support for query time join
 ---

 Key: LUCENE-4043
 URL: https://issues.apache.org/jira/browse/LUCENE-4043
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/join
Reporter: Martijn van Groningen
 Attachments: LUCENE-4043.patch


 Have similar scoring for query time joining just like the index time block 
 join (with the score mode).




Re: [VOTE] Release PyLucene 3.6.0 rc2

2012-05-08 Thread Andi Vajda

On May 8, 2012, at 10:18, Michael McCandless <luc...@mikemccandless.com> wrote:

> +1 to release.
>
> I built/installed successfully on OS X 10.6.8, and ran my usual smoke
> test (index/search first 100 K docs from Wikipedia).
>
> Was the added 'print "setup args = %s" % args' intentional, in
> jcc/jcc/python.py?  Just prints a lot of stuff out while building
> PyLucene...

Yes, that was submitted by a user to make more explicit what was fed to 
setup(). An aid for debugging.

Andi..

 
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, May 7, 2012 at 8:20 PM, Andi Vajda <va...@apache.org> wrote:
>>
>> The ivy requirement for building Lucene Java is now handled by PyLucene's
>> Makefile. A new release candidate is available for review.
>>
>> The PyLucene 3.6.0-2 release tracking the recent release of
>> Apache Lucene 3.6.0 is ready.
>>
>> A release candidate is available from:
>> http://people.apache.org/~vajda/staging_area/
>>
>> A list of changes in this release can be seen at:
>> http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_6/CHANGES
>>
>> PyLucene 3.6.0 is built with JCC 2.13 included in these release artifacts:
>> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
>>
>> A list of Lucene Java changes can be seen at:
>> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_0/lucene/CHANGES.txt
>>
>> Please vote to release these artifacts as PyLucene 3.6.0-2.
>>
>> Thanks !
>>
>> Andi..
>>
>> ps: the KEYS file for PyLucene release signing is at:
>> http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
>> http://people.apache.org/~vajda/staging_area/KEYS
>>
>> pps: here is my +1


[MAVEN] Heads up: build changes

2012-05-08 Thread Steven A Rowe
If you use the Lucene/Solr Maven POMs to drive the build, I committed a major 
change last night (see https://issues.apache.org/jira/browse/LUCENE-3948 for 
more details):

* 'ant get-maven-poms' no longer places pom.xml files under the lucene/ and 
solr/ directories.  Instead, they are placed in a new top-level directory: 
maven-build/.

* When you run 'mvn whatever' under maven-build/, build and test output now 
goes under the conventional Maven target/ directories associated with each 
module's POM under the top-level maven-build/ directory.  Maven build and test 
outputs are now completely separate from those produced by the Ant build.

The above changes don't affect the 'ant generate-maven-artifacts' process - the 
top-level maven-build/ directory is not involved.  (Instead, the 
'generate-maven-artifacts' target calls a separate target - 
'filter-pom-templates' - to copy the POMs to lucene/build/poms/ and interpolate 
their versions.)

Please let me know if you run into problems with the new setup.

Thanks,
Steve





[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2012-05-08 Thread Mike Bria (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270648#comment-13270648
 ] 

Mike Bria commented on SOLR-1604:
-

Hi everyone,

Sorry, but I'm green to this stuff.  How do I apply/install a *.patch file?

I downloaded and successfully built (well, packaged, via mvn) the 
ComplexPhrase.zip from Jul-2011.

I then downloaded the SOLR-1604-alternative.patch from Feb-2012. I can open and 
view it via a text editor...but I have no idea what to do to apply it?  I'm 
working on a RH Linux box at the moment.

Can anyone guide me and/or point me in the right direction please?

Thanks!
Mike

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.0

 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




Re: [VOTE] Release PyLucene 3.6.0 rc2

2012-05-08 Thread Michael McCandless
On Tue, May 8, 2012 at 1:24 PM, Andi Vajda <va...@apache.org> wrote:

> On May 8, 2012, at 10:18, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> Was the added 'print "setup args = %s" % args' intentional, in
>> jcc/jcc/python.py?  Just prints a lot of stuff out while building
>> PyLucene...
>
> Yes, that was submitted by a user to make more explicit what was fed to
> setup(). An aid for debugging.

OK, makes sense!

Mike McCandless

http://blog.mikemccandless.com


[jira] [Issue Comment Edited] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2012-05-08 Thread Mike Bria (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270648#comment-13270648
 ] 

Mike Bria edited comment on SOLR-1604 at 5/8/12 6:00 PM:
-

Hi everyone,

Sorry, but I'm green to this stuff.  How do I apply/install a *.patch file?

I downloaded and successfully built (well, packaged, via mvn) the 
ComplexPhrase.zip from Jul-2011.

I then downloaded the 'SOLR-1604-alternative.patch' from Feb-2012. I can open 
and view it via a text editor...but I have no idea what to do to apply it?  
I'm working on a RH Linux box at the moment.

Can anyone guide me and/or point me in the right direction please?

Thanks!
Mike

  was (Author: mbria):
Hi everyone,

Sorry, but I'm green to this stuff.  How do I apply/install a *.patch file?

I downloaded and successfully built (well, packaged, via mvn) the 
ComplexPhrase.zip from Jul-2011.

I then downloaded the SOLR-1604-alternative.patch from Feb-2012. I can open and 
view it via a text editor...but I have no idea what to do to apply it?  I'm 
working on a RH Linux box at the moment.

Can anyone guide me and/or point me in the right direction please?

Thanks!
Mike
  
 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.0

 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




[jira] [Created] (SOLR-3445) SOLR Stored field in ASCII

2012-05-08 Thread Bill Bell (JIRA)
Bill Bell created SOLR-3445:
---

 Summary: SOLR Stored field in ASCII
 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell


In order to reduce the size of the stored fields and increase performance of 
SOLR by limiting the payload, we should consider adding a parameter for stored 
that will store the information in ASCII format instead of UTF-8.




[jira] [Commented] (SOLR-3445) SOLR Stored field in ASCII

2012-05-08 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270712#comment-13270712
 ] 

Steven Rowe commented on SOLR-3445:
---

For ASCII characters, UTF-8 has the same footprint as ASCII itself, so there is 
no space savings available here.

But maybe you are thinking of a lossy conversion from UTF-8 to ASCII?
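
The footprint claim can be checked directly. This is a minimal Python sketch, not Solr code; the helper name encoded_size and the sample text are made up for illustration.

```python
def encoded_size(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# Hypothetical sample value that contains only ASCII characters.
sample = "SOLR stored field"

# For ASCII-only text both encodings produce one byte per character.
print(encoded_size(sample, "ascii"), encoded_size(sample, "utf-8"))
```

Both numbers are equal for any ASCII-only input, so switching the stored encoding from UTF-8 to ASCII would not shrink such fields.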

 SOLR Stored field in ASCII
 --

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in ASCII format instead of UTF-8.




[jira] [Updated] (SOLR-3445) SOLR Stored field in byte format

2012-05-08 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-3445:


Description: In order to reduce the size of the stored fields and increase 
performance of SOLR by limiting the payload, we should consider adding a 
parameter for stored that will store the information in byte format instead of 
UTF-8.  (was: In order to reduce the size of the stored fields and increase 
performance of SOLR by limiting the payload, we should consider adding a 
parameter for stored that will store the information in ASCII format instead of 
UTF-8.)
Summary: SOLR Stored field in byte format  (was: SOLR Stored field in 
ASCII)

 SOLR Stored field in byte format
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




[jira] [Commented] (SOLR-3445) SOLR Stored field in byte format

2012-05-08 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270726#comment-13270726
 ] 

Steven Rowe commented on SOLR-3445:
---

What is byte format?

 SOLR Stored field in byte format
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




[jira] [Commented] (SOLR-3445) SOLR Stored field in byte format

2012-05-08 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270727#comment-13270727
 ] 

Bill Bell commented on SOLR-3445:
-

Well for most of my use cases I am okay with the 256 characters and don't need 
the overhead of UTF-8. So instead of converting to UTF-8 just store as a normal 
String. I would also be good with Lossy versions, but I am unaware of these 
algorithms. The goal: get the index smaller since I don't need the data in 
there in UTF-8 format.

String x = new String("Store this into a field in solr");

Instead of something like:

String original = new String("A" + "\u00ea" + "\u00f1" + "\u00fc" + "C");
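
To put numbers on the two options discussed here, the sketch below measures the accented sample in UTF-8 and in a single-byte encoding, and shows one possible lossy conversion to ASCII. It is Python rather than Solr code. It assumes that "the 256 characters" means ISO-8859-1 (Latin-1), which is a reading of the comment and not something it states, and the helper to_ascii_lossy is a hypothetical name.

```python
import unicodedata

# The accented sample from the comment above.
sample = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C"

# UTF-8 needs two bytes for each accented letter; Latin-1 needs one.
utf8_bytes = sample.encode("utf-8")
latin1_bytes = sample.encode("latin-1")
print(len(utf8_bytes), len(latin1_bytes))


def to_ascii_lossy(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii_lossy(sample))
```

The saving only applies to the non-ASCII characters, so how much smaller the stored data gets depends on how much accented text the fields actually hold.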




 SOLR Stored field in byte format
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




[jira] [Commented] (SOLR-3445) SOLR Stored field in byte format

2012-05-08 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270728#comment-13270728
 ] 

Bill Bell commented on SOLR-3445:
-

Non-Unicoded format? 

 SOLR Stored field in byte format
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




[jira] [Updated] (SOLR-3445) SOLR Stored field in non UTF-8 (non-unicoded format)

2012-05-08 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-3445:


Summary: SOLR Stored field in non UTF-8 (non-unicoded format)  (was: SOLR 
Stored field in byte format)

 SOLR Stored field in non UTF-8 (non-unicoded format)
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




[jira] [Commented] (SOLR-3445) SOLR Stored field in non UTF-8 (non-unicoded format)

2012-05-08 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270752#comment-13270752
 ] 

Bill Bell commented on SOLR-3445:
-

Do Codecs help with this?

 SOLR Stored field in non UTF-8 (non-unicoded format)
 

 Key: SOLR-3445
 URL: https://issues.apache.org/jira/browse/SOLR-3445
 Project: Solr
  Issue Type: Improvement
Reporter: Bill Bell

 In order to reduce the size of the stored fields and increase performance of 
 SOLR by limiting the payload, we should consider adding a parameter for 
 stored that will store the information in byte format instead of UTF-8.




Re: [VOTE] Release PyLucene 3.6.0 rc2

2012-05-08 Thread Christian Heimes
On 08.05.2012 02:20, Andi Vajda wrote:
> Please vote to release these artifacts as PyLucene 3.6.0-2.

All tests are passing on Ubuntu 12.04 AMD64.

This time I'm unable to test PyLucene 3.6 with our application since
bobo browse is incompatible with Lucene 3.6.

Here is my +1

Christian


Re: document search returning no results

2012-05-08 Thread Jack Krupansky
Even with “multi-term aware” (in 3.6 and trunk) analysis, you can’t have a 
single query term that analyzes (tokenizes) into multiple index terms AND has 
wildcards. In other words, if you want to use wildcard, the query term has to 
analyze (tokenize) into a single term.

Three strategies:

1. Split the query into multiple terms that are ANDed together and then use 
wildcards on the specific terms (words or tokens.)

2. Consider whether the field should be tokenized at all. Maybe it should be a 
string or keyword field that you always query with wildcards.

3. Have two fields: one that is tokenized and lets you query by individual 
words embedded in the field values, and a second that is a string or keyword 
field, is not tokenized, and is queried with wildcards on the full field 
value, with a copyField to populate one field from the stored value of the other.
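
Strategy 1 can be sketched as follows, using the value from the original question. This is a rough Python illustration, not Lucene code. It assumes the field's analyzer lowercases the text and splits it on non-alphanumeric characters, which may not match the analyzer actually in use, and the function name split_wildcard_query is made up. The leading wildcards are kept from the original query, so the query parser has to allow them.

```python
import re


def split_wildcard_query(field, value):
    """Build an ANDed query with one wildcard clause per token.

    Assumption: tokens are lowercased runs of letters and digits;
    the real analyzer for the field may tokenize differently.
    """
    tokens = [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]
    return " ".join("+%s:*%s*" % (field, t) for t in tokens)


query = split_wildcard_query(
    "AssignedAreasWithId", "404-AnnalsofFamilyMedicine-July/August2009"
)
print(query)
```

Each clause then contains a single term, which is what wildcard queries need.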

-- Jack Krupansky

From: Ryan Langton 
Sent: Tuesday, May 08, 2012 12:49 PM
To: dev@lucene.apache.org
Subject: document search returning no results

I have a search that is coming up empty despite a document existing with the 
search text.  Is the / an illegal character?

Here’s the field when I’m creating the document:

[5] = 
{indexed,tokenized<AssignedAreasWithId:3-Genetics,404-AnnalsofFamilyMedicine-July/August2009,60-Obesity/WeightManagement>}

Here’s my lucene search query:

{+(AssignedAreasWithId:*404-annalsoffamilymedicine-july/august2009*)}

Thanks,

Ryan Langton
Engineer
Digital Evolution Group
913.951.3175 x155 (office)
913.498.9985 (fax)
langt...@digitalev.com
www.digitalev.com


[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2012-05-08 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270809#comment-13270809
 ] 

Ahmet Arslan commented on SOLR-1604:


There are two separate ways to enable this functionality.

* You can consume the *.zip attachments as a Solr plugin, which does not 
require source code modification, although this particular case requires 
re-creating solr.war. http://wiki.apache.org/solr/SolrPlugins

* *.patch files contain source code modifications. 
http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.0

 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13892 - Failure

2012-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13892/

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
ERROR: SolrIndexSearcher opens=80 closes=78

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=80 closes=78
at __randomizedtesting.SeedInfo.seed([F35879832CC30BE2]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:212)
at org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:101)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1961)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:742)
at org.apache.lucene.util.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:63)
at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:75)
at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:38)
at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:69)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log (for compile errors):
[...truncated 11159 lines...]




[jira] [Resolved] (SOLR-2857) Multi-content-type /update handler

2012-05-08 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-2857.
-

Resolution: Fixed
  Assignee: Ryan McKinley

I added this in #1335768, and have put rough docs on the wiki -- I'm sure there 
are more links/references that should be updated

 Multi-content-type /update handler
 --

 Key: SOLR-2857
 URL: https://issues.apache.org/jira/browse/SOLR-2857
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
Assignee: Ryan McKinley
 Fix For: 4.0

 Attachments: SOLR-2857-content-type-refactor.patch, 
 SOLR-2857-content-type-refactor.patch, SOLR-2857-content-type-refactor.patch, 
 SOLR-2857-update-content-type.patch, SOLR-2857-update-content-type.patch, 
 SOLR-2857-update-content-type.patch


 Something I've been thinking about lately... it'd be great to get rid of all 
 the specific update handlers like /update/csv, /update/extract, and 
 /update/json and collapse them all into a single /update that underneath uses 
 the content-type(s) to hand off to specific content handlers.  This would 
 make it much easier to toss content at Solr and provide a single entry point 
 for updates.




[jira] [Created] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)
Eric Spishak created SOLR-3446:
--

 Summary: PatternSyntaxException Crash from Unvalidated Regular 
Expression Usage
 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch

Solr sometimes crashes with an unhelpful stack trace.  If the 
PatternTokenizerFactory's pattern attribute is set to an invalid regular 
expression, a PatternSyntaxException is thrown and Solr fails to start. The 
PatternSyntaxException is not useful to users in diagnosing the error. I think 
it would be better to report a detailed error message. The attached patch makes 
this change.

Note that the patch adds a small RegexUtil class with helper methods to 
determine whether a String is a valid regular expression and to generate error 
messages for invalid regular expressions. I feel that these helper methods are 
more readable than catching the PatternSyntaxException. Furthermore, they can 
be re-used if more bugs like this one are found.
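
For readers without the patch at hand, helpers of the kind described above might look like the following sketch. The class and method names here are hypothetical and are not taken from SOLR-3446.patch. Note also that the JDK offers no way to validate a pattern other than compiling it, so this version still catches PatternSyntaxException once, inside the helper, which keeps the try/catch out of the calling code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical stand-in for the RegexUtil helper described above; not the patch's code. */
final class RegexUtilSketch {
  private RegexUtilSketch() {}

  /** Returns true if the string compiles as a java.util.regex pattern. */
  static boolean isValidRegex(String s) {
    return describeError(s) == null;
  }

  /** Returns a one-line explanation of why the string is not a valid regex, or null if it is valid. */
  static String describeError(String s) {
    if (s == null) {
      return "regular expression is null";
    }
    try {
      Pattern.compile(s);
      return null;
    } catch (PatternSyntaxException e) {
      // getDescription() and getIndex() are standard PatternSyntaxException accessors.
      return "invalid regular expression \"" + s + "\": "
          + e.getDescription() + " near index " + e.getIndex();
    }
  }
}
```

A factory's init method could then check describeError(pattern) and, if it is non-null, fail with a message that names the offending attribute instead of surfacing the raw stack trace.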

Steps to reproduce:

# Patch in bug.patch
#* Note that this sets PatternTokenizerFactory's pattern attribute to an 
invalid regular expression.
# Run 'ant run-example' from the solr folder
# See exception in console output on startup:

{code}
Apr 3, 2012 2:07:29 PM org.apache.solr.common.SolrException log
SEVERE: java.util.regex.PatternSyntaxException: Unclosed group near index 1
(
 ^
   at java.util.regex.Pattern.error(Pattern.java:1713)
   at java.util.regex.Pattern.accept(Pattern.java:1571)
   at java.util.regex.Pattern.group0(Pattern.java:2533)
   at java.util.regex.Pattern.sequence(Pattern.java:1806)
   at java.util.regex.Pattern.expr(Pattern.java:1752)
   at java.util.regex.Pattern.compile(Pattern.java:1460)
   at java.util.regex.Pattern.<init>(Pattern.java:1133)
   at java.util.regex.Pattern.compile(Pattern.java:847)
   at org.apache.solr.analysis.PatternTokenizerFactory.init(PatternTokenizerFactory.java:90)
   at org.apache.solr.schema.IndexSchema$5.init(IndexSchema.java:901)
   at org.apache.solr.schema.IndexSchema$5.init(IndexSchema.java:890)
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:148)
   at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:910)
   at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:62)
   at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
   at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
   at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
   at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:480)
   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:125)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:461)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
   at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
   at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
   at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
   at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
   at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
   at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 

[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: SOLR-3446.patch

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch



[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: SOLR-3446.patch

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch, bug.patch



[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: bug.patch

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch, bug.patch



[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: (was: SOLR-3446.patch)

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch, bug.patch



[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: (was: bug.patch)

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch, bug.patch



[jira] [Updated] (SOLR-3446) PatternSyntaxException Crash from Unvalidated Regular Expression Usage

2012-05-08 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated SOLR-3446:
---

Attachment: bug.patch

 PatternSyntaxException Crash from Unvalidated Regular Expression Usage
 --

 Key: SOLR-3446
 URL: https://issues.apache.org/jira/browse/SOLR-3446
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5
Reporter: Eric Spishak
 Attachments: SOLR-3446.patch, bug.patch



[jira] [Created] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory

2012-05-08 Thread Chris Male (JIRA)
Chris Male created LUCENE-4044:
--

 Summary: Add NamedSPILoader support to TokenizerFactory, 
TokenFilterFactory and CharFilterFactory
 Key: LUCENE-4044
 URL: https://issues.apache.org/jira/browse/LUCENE-4044
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/analysis
Reporter: Chris Male


In LUCENE-2510 I want to move all the analysis factories out of Solr and into 
the directories with what they create.  This is going to hamper Solr's existing 
strategy for supporting {{solr.*}} package names, where it replaces {{solr}} 
with various pre-defined package names.  One way to tackle this is to use 
NamedSPILoader so we simply look up {{StandardTokenizerFactory}} for example, 
and find it wherever it is, as long as it is defined as a service.  This is 
similar to how we support Codecs currently.

As noted by Robert in LUCENE-2510, this would also have the benefit of meaning 
configurations could be less verbose, would aid in fully decoupling the 
analysis module from Solr, and make the analysis factories easier to interact 
with.
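
The name-based lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not Lucene's NamedSPILoader: the NamedFactory interface, the two *Sketch factory classes and NamedLookupSketch are hypothetical names invented for this example. In a real setup the providers would typically come from java.util.ServiceLoader.load(...) driven by META-INF/services entries; here they are passed in directly so the example runs on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an analysis factory type (not a Lucene class).
interface NamedFactory {
    String name();
}

// Hypothetical providers; real ones would be listed in META-INF/services.
final class StandardTokenizerFactorySketch implements NamedFactory {
    public String name() { return "StandardTokenizerFactory"; }
}

final class WhitespaceTokenizerFactorySketch implements NamedFactory {
    public String name() { return "WhitespaceTokenizerFactory"; }
}

// Builds a name -> provider map once, then answers lookups by simple name,
// regardless of which package the provider class lives in.
final class NamedLookupSketch {
    private final Map<String, NamedFactory> byName = new LinkedHashMap<>();

    // Assumption: in production the argument would be
    // java.util.ServiceLoader.load(NamedFactory.class).
    NamedLookupSketch(Iterable<? extends NamedFactory> providers) {
        for (NamedFactory f : providers) {
            // The first provider registered under a name wins.
            if (!byName.containsKey(f.name())) {
                byName.put(f.name(), f);
            }
        }
    }

    NamedFactory lookup(String name) {
        NamedFactory f = byName.get(name);
        if (f == null) {
            throw new IllegalArgumentException("No provider named '" + name
                    + "'; available: " + byName.keySet());
        }
        return f;
    }

    public static void main(String[] args) {
        NamedLookupSketch loader = new NamedLookupSketch(Arrays.<NamedFactory>asList(
                new StandardTokenizerFactorySketch(),
                new WhitespaceTokenizerFactorySketch()));
        System.out.println(loader.lookup("StandardTokenizerFactory").getClass().getName());
    }
}
```

Note that a map like this hands back the same provider object on every lookup.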

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory

2012-05-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271017#comment-13271017
 ] 

Yonik Seeley commented on LUCENE-4044:
--

bq. This is going to hamper Solr's existing strategy for supporting solr.* 
package names

Why is that?  Why can't solr.WhitespaceTokenizerFactory also check the package 
that you're planning on moving WhitespaceTokenizerFactory to?
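
For illustration, the alternative asked about here (keep the solr.* alias and try each known package in turn) might look like the sketch below. It is not Solr's actual resource-loader code: PackageCyclingSketch and its candidate list are hypothetical, and JDK packages stand in for the analysis packages so that the example runs without Lucene or Solr on the classpath.

```java
import java.util.Arrays;
import java.util.List;

final class PackageCyclingSketch {
    // Hypothetical candidate packages. A real list would name the packages
    // the factories are moved to; JDK packages are used here only so the
    // sketch is runnable on its own.
    static final List<String> CANDIDATE_PACKAGES =
            Arrays.asList("java.lang.", "java.util.", "java.util.concurrent.");

    // Resolves "solr.Foo" by trying each candidate package in order.
    // Names without the "solr." prefix are treated as fully qualified.
    static Class<?> resolve(String alias) {
        if (!alias.startsWith("solr.")) {
            try {
                return Class.forName(alias);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown class: " + alias, e);
            }
        }
        String simpleName = alias.substring("solr.".length());
        for (String pkg : CANDIDATE_PACKAGES) {
            try {
                return Class.forName(pkg + simpleName);
            } catch (ClassNotFoundException e) {
                // Not in this package; fall through to the next candidate.
            }
        }
        throw new IllegalArgumentException("Could not resolve " + alias
                + " in any of " + CANDIDATE_PACKAGES);
    }

    public static void main(String[] args) {
        System.out.println(resolve("solr.ArrayList").getName());
    }
}
```

Each miss costs one failed class-loading attempt, so the cost of a lookup grows with the number of candidate packages.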

 Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and 
 CharFilterFactory
 

 Key: LUCENE-4044
 URL: https://issues.apache.org/jira/browse/LUCENE-4044
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: modules/analysis
Reporter: Chris Male
 Fix For: 4.0


 In LUCENE-2510 I want to move all the analysis factories out of Solr and into 
 the directories with what they create.  This is going to hamper Solr's 
 existing strategy for supporting {{solr.*}} package names, where it replaces 
 {{solr}} with various pre-defined package names.  One way to tackle this is 
 to use NamedSPILoader so we simply look up {{StandardTokenizerFactory}} for 
 example, and find it wherever it is, as long as it is defined as a service.  
 This is similar to how we support Codecs currently.
 As noted by Robert in LUCENE-2510, this would also have the benefit of 
 meaning configurations could be less verbose, would aid in fully decoupling 
 the analysis module from Solr, and make the analysis factories easier to 
 interact with.




[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory

2012-05-08 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271020#comment-13271020
 ] 

Chris Male commented on LUCENE-4044:


There will be a lot of different packages; I assumed that cycling through them 
all would be undesirable.





[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory

2012-05-08 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271024#comment-13271024
 ] 

Chris Male commented on LUCENE-4044:


With that said, I'm open to suggestions, since I don't think this is going to do 
what I want it to do.





[jira] [Commented] (LUCENE-4044) Add NamedSPILoader support to TokenizerFactory, TokenFilterFactory and CharFilterFactory

2012-05-08 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271055#comment-13271055
 ] 

Chris Male commented on LUCENE-4044:


Hmm, it seems that this process only supports singletons, which isn't much use 
to us.
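
One way to get around a singleton-only lookup, sketched here purely as an assumption and not as what was eventually done in Lucene, is to register classes by name and create a fresh instance on every lookup. One plausible reading of the problem is that each configured use of a factory needs its own instance with its own arguments. ConfigurableFactory, LowerCaseFilterFactorySketch and PerLookupInstanceSketch are hypothetical names for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory contract: each instance carries its own configuration.
interface ConfigurableFactory {
    void init(Map<String, String> args);
    Map<String, String> args();
}

// Hypothetical provider with the implicit no-arg constructor.
final class LowerCaseFilterFactorySketch implements ConfigurableFactory {
    private Map<String, String> args = new HashMap<>();

    public void init(Map<String, String> args) {
        // Copy so that later changes to the caller's map do not leak in.
        this.args = new HashMap<>(args);
    }

    public Map<String, String> args() {
        return args;
    }
}

// Maps a name to a Class rather than to a shared instance, so each lookup
// yields a new, separately configured object.
final class PerLookupInstanceSketch {
    private final Map<String, Class<? extends ConfigurableFactory>> classes = new HashMap<>();

    void register(String name, Class<? extends ConfigurableFactory> clazz) {
        classes.put(name, clazz);
    }

    ConfigurableFactory newInstance(String name, Map<String, String> args) {
        Class<? extends ConfigurableFactory> clazz = classes.get(name);
        if (clazz == null) {
            throw new IllegalArgumentException("No factory class registered as '" + name + "'");
        }
        try {
            // Requires an accessible no-arg constructor on the provider class.
            ConfigurableFactory factory = clazz.getDeclaredConstructor().newInstance();
            factory.init(args);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        PerLookupInstanceSketch registry = new PerLookupInstanceSketch();
        registry.register("LowerCaseFilterFactory", LowerCaseFilterFactorySketch.class);
        ConfigurableFactory a = registry.newInstance("LowerCaseFilterFactory", new HashMap<String, String>());
        ConfigurableFactory b = registry.newInstance("LowerCaseFilterFactory", new HashMap<String, String>());
        System.out.println("distinct instances: " + (a != b));
    }
}
```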

