[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753391#comment-13753391
 ] 

Robert Muir commented on LUCENE-5189:
-

In the case of old codecs: what we do is pretty tricky for testing:

* we make them read-only officially for the user (so that new segments are 
written in the latest format, but old segments can still be read).
* this has the additional caveat that they are not purely read-only, because 
we actually allow liveDocs updates (deletes) against the old formats. So they 
are mostly read-only.
* tests have read-write versions (like in branch4x: PreFlexRWCodec). These 
allow in tests for us to override the read-only-ness, and write like the old 
formats did and read them in transparently in tests. 
* Of course they cannot support the newest features with this impersonator 
testing we do, but in general we get a lot more test coverage than if we relied 
solely upon TestBackwardsCompatibility.
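The read-only / test-only read-write split can be pictured with plain Java. This is a minimal sketch under stated assumptions, not Lucene's codec API: `LegacyFormat`, `ReadOnlyLegacyFormat` and `LegacyFormatRW` are hypothetical names, and the only write the user-facing class allows is a liveDocs (deletes) update.

```java
import java.util.BitSet;

// Hypothetical stand-in for an old on-disk format; not a real Lucene class.
abstract class LegacyFormat {
    // Reading old segments must always work.
    abstract String read(String segment);

    // Writing whole segments in the old format is only possible in tests.
    abstract String writeSegment(String segment);

    // Deletes (liveDocs) may still be written against old segments,
    // which is why such formats are only "mostly" read-only.
    BitSet writeLiveDocs(BitSet liveDocs, int deletedDoc) {
        BitSet copy = (BitSet) liveDocs.clone();
        copy.clear(deletedDoc);
        return copy;
    }
}

// What users get: new segments must be written in the latest format.
class ReadOnlyLegacyFormat extends LegacyFormat {
    @Override
    String read(String segment) {
        return "read:" + segment;
    }

    @Override
    String writeSegment(String segment) {
        throw new UnsupportedOperationException("this format is read-only");
    }
}

// Test-only impersonator (in the spirit of PreFlexRWCodec): it overrides the
// read-only-ness so tests can write old-format segments and read them back.
class LegacyFormatRW extends ReadOnlyLegacyFormat {
    @Override
    String writeSegment(String segment) {
        return "legacy:" + segment;
    }
}
```

Tests would instantiate `LegacyFormatRW` where production code gets `ReadOnlyLegacyFormat`; since the subclass only changes the write path, reading goes through exactly the same code either way.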


 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of changes is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I already have a working patch, which I'll upload next, explaining the 
 changes.
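The "write all the values of all the documents in a segment for the updated field, similar to liveDocs" idea can be modelled as generation-stamped full copies of a field. This is a toy sketch, not Lucene's implementation: `NumericFieldGens` is a hypothetical class, and keeping every generation in an in-memory map stands in for one file per generation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one segment's numeric doc-values field. Instead of
// modifying values in place, each round of updates writes a complete new copy
// of the field's values under a higher generation, the way liveDocs files are
// generation-stamped.
class NumericFieldGens {
    private final Map<Long, long[]> gens = new HashMap<>();
    private long currentGen = 0;

    NumericFieldGens(long[] initialValues) {
        gens.put(0L, initialValues.clone());
    }

    // Apply buffered updates (docId -> new value) and write all values of the
    // field for this segment as a new generation; returns that generation.
    long applyUpdates(Map<Integer, Long> updates) {
        long[] next = gens.get(currentGen).clone();
        for (Map.Entry<Integer, Long> e : updates.entrySet()) {
            next[e.getKey()] = e.getValue();
        }
        currentGen++;
        gens.put(currentGen, next);
        return currentGen;
    }

    // Value as seen by a reader opened on the latest generation.
    long get(int docId) {
        return gens.get(currentGen)[docId];
    }

    // A reader opened on an older generation keeps seeing the old values.
    long getAtGen(long gen, int docId) {
        return gens.get(gen)[docId];
    }
}
```

Rewriting the whole field per generation trades write cost for simplicity: readers never have to merge partial update files, which is what makes numeric DV a good first data type to support.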

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5201) UIMAUpdateRequestProcessor should cache the AnalysisEngine

2013-08-29 Thread Tommaso Teofili (JIRA)
Tommaso Teofili created SOLR-5201:
-

 Summary: UIMAUpdateRequestProcessor should cache the AnalysisEngine
 Key: SOLR-5201
 URL: https://issues.apache.org/jira/browse/SOLR-5201
 Project: Solr
  Issue Type: Improvement
  Components: contrib - UIMA
Affects Versions: 4.4
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 4.5, 5.0


As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
which is bad for performance; therefore it should be cached in the URP.




[jira] [Updated] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-08-29 Thread Tommaso Teofili (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-5201:
--

Description: As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
which is bad for performance; therefore it'd be nice if such AEs could be reused 
whenever possible.  (was: As reported in 
http://markmail.org/thread/2psiyl4ukaejl4fx UIMAUpdateRequestProcessor 
instantiates an AnalysisEngine for each request, which is bad for performance; 
therefore it should be cached in the URP.)
Summary: UIMAUpdateRequestProcessor should reuse the AnalysisEngine  
(was: UIMAUpdateRequestProcessor should cache the AnalysisEngine)

 UIMAUpdateRequestProcessor should reuse the AnalysisEngine
 --

 Key: SOLR-5201
 URL: https://issues.apache.org/jira/browse/SOLR-5201
 Project: Solr
  Issue Type: Improvement
  Components: contrib - UIMA
Affects Versions: 4.4
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 4.5, 5.0


 As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
 UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
 which is bad for performance; therefore it'd be nice if such AEs could be 
 reused whenever possible.




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753455#comment-13753455
 ] 

Uwe Schindler commented on LUCENE-5191:
---

We have a variant of this code, recently added by Robert Muir into 
PostingsHighlighter's DefaultPassageFormatter.

This escapes a few more chars, with a reference to OWASP: 
[https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content]
 and 
[https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.232_-_Attribute_Escape_Before_Inserting_Untrusted_Data_into_HTML_Common_Attributes]

The code used here escapes any char between 127 and 255 according to the second 
rule, which is not needed here, because the escaped data is not included in 
HTML attributes which may be unquoted. So only the first rule applies here, 
for which it is enough to escape the 4 well-known characters plus the 
forward slash and single quote ('). The latter two do not need to be escaped 
if used in text, but for safety we could include them.

In any case I would like to unify the different approaches of HTML escaping. As 
we are not working in unquoted attributes (we just encode floating HTML text), 
I would use Robert's code without the extra numeric escapes.

The official HTML4 spec (I used HTML4; the passage is the same for other HTML 
versions, see [http://www.w3.org/TR/REC-html40/charset.html#h-5.3.2]): 

{quote}
Four character entity references deserve special mention since they are 
frequently used to escape special characters:

&lt; represents the < sign.
&gt; represents the > sign.
&amp; represents the & sign.
&quot; represents the " mark.
Authors wishing to put the "<" character in text should use "&lt;" (ASCII 
decimal 60) to avoid possible confusion with the beginning of a tag (start tag 
open delimiter). Similarly, authors should use "&gt;" (ASCII decimal 62) in 
text instead of ">" to avoid problems with older user agents that incorrectly 
perceive this as the end of a tag (tag close delimiter) when it appears in 
quoted attribute values.

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion 
with the beginning of a character reference (entity reference open delimiter). 
Authors should also use "&amp;" in attribute values since character references 
are allowed within CDATA attribute values.

Some authors use the character entity reference "&quot;" to encode instances of 
the double quote mark (") since that character may be used to delimit attribute 
values.
{quote}

Any comments?
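A minimal sketch of the escaping proposed above for floating HTML text (OWASP rule #1): the four standard entities plus the single quote and forward slash, with everything else left alone. `HtmlTextEscaper` is a hypothetical class, not an existing Lucene API; the `&#x27;` and `&#x2F;` forms are the ones the OWASP cheat sheet suggests.

```java
// Escapes text for HTML element content only; not safe for unquoted attributes.
final class HtmlTextEscaper {
    private HtmlTextEscaper() {}

    static String escape(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                // Not required in element content, but harmless and safer.
                case '\'': sb.append("&#x27;"); break;
                case '/': sb.append("&#x2F;"); break;
                // Everything else, including non-ASCII text and surrogate
                // pairs, is copied unchanged; the page encoding handles it.
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

Because no numeric escapes are emitted for non-ASCII chars, supplementary characters pass through as intact surrogate pairs and eastern-language text does not balloon into entities.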

 SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
 --

 Key: LUCENE-5191
 URL: https://issues.apache.org/jira/browse/LUCENE-5191
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5191.patch


 The highlighter provides a function to escape HTML, which does too much. To 
 create valid HTML only ", <, >, & must be escaped; everything else can be kept 
 unescaped. Unfortunately, the escaper additionally escapes everything 
 > 127, which is unneeded if your web site has the correct encoding. It also 
 produces huge amounts of HTML entities if used with eastern languages.
 This would not be a bug if the escaping were correct, but it isn't; it 
 escapes like this:
 {{result.append("&\#").append((int)ch).append(";");}}
 So it does not escape the Unicode code point (as HTML needs); instead it escapes 
 the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
 U+10400 (deseret capital letter long i) would be escaped as 
 {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
 So we should remove the stupid encoding of chars > 127, which is simply 
 useless :-)
 See also: https://github.com/elasticsearch/elasticsearch/issues/3587
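The difference between escaping UTF-16 chars and escaping code points can be shown in a few lines. `NumericEscapes` is a hypothetical helper used only for illustration; it relies on the standard `String.charAt` and `String.codePoints` methods.

```java
// Contrasts the buggy per-UTF-16-char numeric escape described in the issue
// with a code-point-aware one.
final class NumericEscapes {
    private NumericEscapes() {}

    // Buggy: escapes each UTF-16 code unit, so a supplementary character
    // becomes two entities that point at lone surrogates.
    static String perChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    // Correct: iterate by code point, so U+10400 becomes a single entity.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }
}
```

For U+10400 the surrogate pair is 0xD801 0xDC00, i.e. 55297 and 56320, while the code point itself is 66560. As the issue argues, dropping numeric escapes for chars > 127 altogether is simpler still; a code-point loop only matters if numeric escaping is kept.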




[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Patch adds some nocommits and tests that expose some problems:

+Problem 1+
If you run the test with {{-Dtests.method=testSegmentMerges 
-Dtests.seed=7651E2AEEBC55BDF}}, you'll hit an exception:

{noformat}
NOTE: reproduce with: ant test  -Dtestcase=TestNumericDocValuesUpdates 
-Dtests.method=testSegmentMerges -Dtests.seed=7651E2AEEBC55BDF 
-Dtests.locale=en_AU -Dtests.timezone=Etc/GMT+11 -Dtests.file.encoding=UTF-8
Aug 29, 2013 11:57:35 AM com.carrotsearch.randomizedtesting.ThreadLeakControl 
checkThreadLeaks
WARNING: Will linger awaiting termination of 1 leaked thread(s).
Aug 29, 2013 11:57:35 AM 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread 
#0,6,TGRP-TestNumericDocValuesUpdates]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.AssertionError: 
formatName=Lucene45 prevValue=Memory
at __randomizedtesting.SeedInfo.seed([7651E2AEEBC55BDF]:0)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.AssertionError: formatName=Lucene45 prevValue=Memory
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:133)
at 
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:105)
at 
org.apache.lucene.index.ReadersAndLiveDocs.writeLiveDocs(ReadersAndLiveDocs.java:389)
at 
org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:178)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3732)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3401)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
{noformat}

What happens is that the test uses RandomCodec and picks MemoryDVF for writing 
that field. Later, when ReadersAndLiveDocs applies updates to that field, it uses 
SI.codec, which is not RandomCodec anymore, but Lucene45Codec (or in this case 
Facet45Codec, based on Codec.forName("Lucene45")), and its DVF returns 
Lucene45DVF for that field, because Lucene45Codec always returns that. During 
search, PerFieldDVF.FieldsReader does not rely on the Codec at all; rather, it 
looks up an attribute in FieldInfo which tells it the DVFormat.name, and then it 
calls DVF.forName. But for writing, it relies on the 
Codec.
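The mismatch can be reduced to two lookup strategies. This is a simplified, hypothetical model (`PerFieldFormatLookup`, the `"dvFormat"` attribute key, and the fixed `"Lucene45"` default are placeholders, not Lucene's actual names): the reader resolves the format from what was recorded for the field, while the update path asks the codec and may get a different answer.

```java
import java.util.Map;

// Hypothetical model of Problem 1, not Lucene's PerFieldDocValuesFormat.
final class PerFieldFormatLookup {
    // Placeholder attribute key; the real attribute name may differ.
    static final String FORMAT_KEY = "dvFormat";

    private PerFieldFormatLookup() {}

    // What a codec like Lucene45Codec hands out for any field.
    static String codecDefault(String field) {
        return "Lucene45";
    }

    // Current update path: ignore what was recorded and ask the codec.
    static String formatFromCodec(String field, Map<String, String> attrs) {
        return codecDefault(field);
    }

    // Reader-style lookup applied to updates: reuse the recorded format name,
    // and only fall back to the codec for fields that have none.
    static String formatForUpdate(String field, Map<String, String> attrs) {
        String recorded = attrs.get(FORMAT_KEY);
        return recorded != null ? recorded : codecDefault(field);
    }
}
```

With a field originally written by MemoryDVF, `formatFromCodec` yields "Lucene45" (the assertion in the trace) while `formatForUpdate` yields "Memory".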

I am not sure how to resolve this. I don't think ReadersAndLiveDocs is doing 
anything wrong: per-field is not exposed on the Codec API, therefore it shouldn't 
assume it should do any per-field stuff. But on the other hand, Lucene45Codec 
instances return a per-field DVF based on what the instance says, and don't look 
at the FieldInfo attributes, as PerFieldDVF.FieldsReader does. Any ideas?

+Problem 2+
Robert thought of this use case: if you have a sparse DocValues field 'f', such 
that, say, in segment 1 only doc1 has a value, but in segment 2 none of the 
documents have values, you cannot really update documents in segment 2, because 
the FieldInfos for that segment won't list the field as having DocValues at 
all. For now, I catch that case in ReadersAndLiveDocs and throw an exception. 
The workaround is to make sure you always have values for a field in a segment, 
e.g. by always setting some default value. But this is ugly and exposes 
internal stuff (e.g. segments) to users. Also, it's bad because e.g. if 
segments 1+2 are merged, you suddenly *can* update documents that were in 
segment 2 before.

One way to solve it is to gen FieldInfos as well. That would allow us to 
additionally support adding new fields through field updates, though that's 
optional and we can still choose to forbid it. If we gen FieldInfos, though, the 
changes I've made to SegmentInfos (recording per-field dvGen) need to be 
reverted. So it's important that we come to a resolution about this in this 
issue. This is somewhat of a corner case (sparse fields), but I don't like the 
fact that users can trip on exceptions that depend on whether or not the segment 
was merged...
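The two behaviours discussed for sparse fields, rejecting the update versus writing a new FieldInfos generation, can be contrasted in a small model. `SegmentFieldInfos` and `prepareUpdate` are hypothetical names for illustration only; they do not correspond to Lucene classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Problem 2: a segment's field infos are fixed at flush,
// so a field that had no doc values in that segment cannot be updated unless
// the field infos themselves get a new generation.
final class SegmentFieldInfos {
    private final Map<String, Boolean> hasDocValues = new HashMap<>();
    private final boolean genFieldInfos; // true = allow writing a new FieldInfos gen
    private long fieldInfosGen = 0;

    SegmentFieldInfos(boolean genFieldInfos) {
        this.genFieldInfos = genFieldInfos;
    }

    void flushedWithDocValues(String field) {
        hasDocValues.put(field, true);
    }

    // Called before applying a numeric update for 'field' to this segment.
    void prepareUpdate(String field) {
        if (hasDocValues.getOrDefault(field, false)) {
            return; // field already known as DV in this segment
        }
        if (!genFieldInfos) {
            // Behaviour of the current patch: reject the update.
            throw new IllegalArgumentException(
                "segment has no doc values for field '" + field + "'");
        }
        // Alternative: record the field and bump the FieldInfos generation.
        hasDocValues.put(field, true);
        fieldInfosGen++;
    }

    long fieldInfosGen() {
        return fieldInfosGen;
    }
}
```

With FieldInfos generations, whether an update succeeds no longer depends on whether the segment happened to be merged with one that had values for the field.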

+Problem 3+
FieldInfos.Builder neglects to update the globalFieldNumbers.docValuesType map if 
it updates a FieldInfo's DocValuesType. It's an easy fix, and I added a test to 
numeric updates. If someone has an idea how to reproduce this outside of the 
numeric updates scope, I'll be happy to handle this in a separate issue. The 

[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-08-29 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753475#comment-13753475
 ] 

Tommaso Teofili commented on SOLR-5201:
---

ok, now I recall why the caching logic was put in the AEProvider. Basically a new 
UpdateRequestProcessor is instantiated on each update request (it's not reused), 
and therefore caching the AE locally in the processor wouldn't help.
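Since the processor itself is per-request, any reuse has to live in something longer-lived and shared. A minimal sketch, assuming engines can be keyed by their configuration: `EngineCache` is hypothetical and `E` stands in for a UIMA AnalysisEngine. Whether a shared engine is safe to use from concurrent requests is a separate question this sketch does not settle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Long-lived cache (e.g. owned by a factory or provider, not by the
// per-request processor) that creates one engine per configuration key.
final class EngineCache<K, E> {
    private final ConcurrentMap<K, E> engines = new ConcurrentHashMap<>();
    private final Function<K, E> factory;

    EngineCache(Function<K, E> factory) {
        this.factory = factory;
    }

    // ConcurrentHashMap.computeIfAbsent applies the factory at most once per
    // key, so concurrent requests do not build duplicate engines.
    E get(K configKey) {
        return engines.computeIfAbsent(configKey, factory);
    }
}
```

A processor created for each request would then look up its engine from such a shared cache instead of constructing one.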




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753476#comment-13753476
 ] 

Shai Erera commented on LUCENE-5189:


Regarding problem 1, I don't know if it's a valid solution, but maybe if we 
recorded a per-field-format map for each SegInfo, Lucene45Codec could 
initialize its dvFormat accordingly? This is not generic though... it's as if we 
need a Codec.serialize() method which dumps stuff to SegInfo (or 
returns a BytesRef/String from which it can later initialize itself). We'd then 
not need the attributes on FieldInfo. Somehow we have to employ the same logic 
in PerFieldDVF.FieldsWriter, when updating existing segments, as we do in 
PerFieldDVF.FieldsReader. Whatever solution we choose here will help us when we 
come to implement field updates for postings.
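Recording a per-field format map per segment needs some serialized form. The sketch below is only an illustration: `PerFieldFormatMap` is hypothetical, and the `field=format;...` encoding is an assumption that only works if names contain neither `=` nor `;`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Encodes field -> DV format name so it could be stored with a segment and
// used to pick the same per-field formats when writing updates later.
final class PerFieldFormatMap {
    private PerFieldFormatMap() {}

    static String encode(Map<String, String> formats) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : formats.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    static Map<String, String> decode(String encoded) {
        Map<String, String> formats = new LinkedHashMap<>();
        if (encoded.isEmpty()) return formats;
        for (String entry : encoded.split(";")) {
            int eq = entry.indexOf('=');
            formats.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return formats;
    }
}
```

A codec reading this back could route each field to the recorded format, mirroring what the per-field reader already does from FieldInfo attributes.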

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch





[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753482#comment-13753482
 ] 

Shai Erera commented on LUCENE-5189:


BTW, this may generally not be a bad idea: letting the Codec serialize some 
stuff which is later given to it in Codec.init(BytesRef). E.g. if a Codec is 
initialized with some parameters that are also important during search (e.g. 
FacetsCodec can be initialized with FacetIndexingParams, which get lost during 
search because the Codec is initialized with the default ctor), this could be a 
way for it to serialize/deserialize itself. The name would be used for 
newInstance(), and the rest to initialize the Codec.
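The proposed name-plus-state round trip might look like the following. None of these types are Lucene APIs: `SerializableCodec`, `ParamCodec` and `CodecRegistry` are hypothetical, and the single integer parameter stands in for something like an indexing parameter that the no-arg constructor would otherwise lose.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical codec contract: the name selects the class, serialize() is
// written at index time, init() restores the state at search time.
interface SerializableCodec {
    String name();
    String serialize();
    void init(String params);
}

// Hypothetical codec with one indexing-time parameter.
final class ParamCodec implements SerializableCodec {
    int partitionSize = 10; // default used by the no-arg constructor

    public String name() { return "ParamCodec"; }
    public String serialize() { return Integer.toString(partitionSize); }
    public void init(String params) { partitionSize = Integer.parseInt(params); }
}

final class CodecRegistry {
    private static final Map<String, Supplier<SerializableCodec>> BY_NAME = new HashMap<>();
    static {
        BY_NAME.put("ParamCodec", ParamCodec::new);
    }

    private CodecRegistry() {}

    // The name picks the implementation (like Codec.forName); the rest
    // initializes it.
    static SerializableCodec restore(String name, String params) {
        Supplier<SerializableCodec> ctor = BY_NAME.get(name);
        if (ctor == null) throw new IllegalArgumentException("unknown codec: " + name);
        SerializableCodec codec = ctor.get();
        codec.init(params);
        return codec;
    }
}
```

Keeping construction (by name) separate from initialization (from serialized state) preserves the existing no-arg lookup while letting parameters survive from indexing to search.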




solr performance testing

2013-08-29 Thread Mikhail Khludnev
Hello,

afaik http://code.google.com/a/apache-extras.org/p/luceneutil/ is used for
testing Lucene performance. What about Solr? Is it also supported, or is there
a separate well-known facility?

Thanks in advance

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753500#comment-13753500
 ] 

Shai Erera commented on LUCENE-5189:


Regarding problem 3, Mike helped me construct a simple test which reproduces 
the bug - I opened LUCENE-5192 to fix.




[jira] [Created] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5192:
--

 Summary: FieldInfos.Builder failed to catch adding field with 
different DV type under some circumstances
 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5


I found it while working on LUCENE-5189. I'll attach a patch with a simple 
testcase which reproduces the problem and a fix.




[jira] [Updated] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5192:
---

Attachment: LUCENE-5192.patch

Patch adds a testcase and fixes the bug. The bug only happens if you add the same 
field name as both an indexed field and a DV field, and then change its DV type in 
another segment. I'll commit it shortly.
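The failure mode can be reproduced in a simplified model: the global field-to-DV-type map must be told about a DV type no matter which code path sets it. `GlobalDvTypes` and `SegmentBuilder` are hypothetical classes, not Lucene's FieldInfos code, and the `buggy` flag exists only to show both behaviours side by side.

```java
import java.util.HashMap;
import java.util.Map;

// Index-wide record of each field's DocValues type.
final class GlobalDvTypes {
    private final Map<String, String> dvTypeByField = new HashMap<>();

    void record(String field, String dvType) {
        String prev = dvTypeByField.putIfAbsent(field, dvType);
        if (prev != null && !prev.equals(dvType)) {
            throw new IllegalArgumentException("cannot change DocValues type from "
                + prev + " to " + dvType + " for field \"" + field + "\"");
        }
    }
}

// Per-segment builder; field -> DV type, or null for an indexed-only field.
final class SegmentBuilder {
    private final GlobalDvTypes global;
    private final Map<String, String> local = new HashMap<>();
    private final boolean buggy;

    SegmentBuilder(GlobalDvTypes global, boolean buggy) {
        this.global = global;
        this.buggy = buggy;
    }

    void addIndexed(String field) {
        local.putIfAbsent(field, null);
    }

    void addDocValues(String field, String dvType) {
        boolean existed = local.containsKey(field);
        local.put(field, dvType);
        // The bug: when the field already existed (added first as an indexed
        // field), the type was set locally but never recorded globally.
        if (!(buggy && existed)) {
            global.record(field, dvType);
        }
    }
}
```

In the buggy variant a later segment can register a different DV type for the same field without any error; with the fix the conflict is detected.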

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch





[jira] [Created] (SOLR-5202) Support easier overrides of Carrot2 clustering attributes via XML data sets exported from the Workbench.

2013-08-29 Thread Dawid Weiss (JIRA)
Dawid Weiss created SOLR-5202:
-

 Summary: Support easier overrides of Carrot2 clustering attributes 
via XML data sets exported from the Workbench.
 Key: SOLR-5202
 URL: https://issues.apache.org/jira/browse/SOLR-5202
 Project: Solr
  Issue Type: New Feature
Reporter: Dawid Weiss
Assignee: Dawid Weiss
 Fix For: 4.5, 5.0







[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753510#comment-13753510
 ] 

Michael McCandless commented on LUCENE-5192:


+1, sneaky!




[jira] [Updated] (LUCENE-5123) invert the codec postings API

2013-08-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5123:
---

Attachment: LUCENE-5123.patch

New patch, adding a test case that exercises this API a bit...

 invert the codec postings API
 -

 Key: LUCENE-5123
 URL: https://issues.apache.org/jira/browse/LUCENE-5123
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Robert Muir
Assignee: Michael McCandless
 Attachments: LUCENE-5123.patch, LUCENE-5123.patch, LUCENE-5123.patch


 Currently FieldsConsumer/PostingsConsumer/etc is a push-oriented API, e.g. 
 FreqProxTermsWriter streams the postings at flush, and the default merge() 
 takes the incoming codec API, filters out deleted docs and pushes via the 
 same API (but that can be overridden).
 It could be cleaner if we allowed for a pull model instead (like 
 DocValues). For example, maybe FreqProxTermsWriter could expose a Terms of 
 itself and just pass this to the codec consumer.
 This would give the codec more flexibility to e.g. do multiple passes if it 
 wanted to do things like encode high-frequency terms more efficiently with a 
 bitset-like encoding or other things...
 A codec can try to do things like this to some extent today, but it's very 
 difficult (look at buffering in Pulsing). We made this change with DV and it 
 made a lot of interesting optimizations easy to implement...
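What a pull model buys can be shown with a consumer that is handed the whole term-to-docs view and iterates it twice. This is a hypothetical sketch, not Lucene's FieldsConsumer API: `PullPostingsWriter` is invented for illustration, and the "bitset when docFreq * 32 > maxDoc" rule is an assumed cost heuristic (a maxDoc-bit bitset versus docFreq 32-bit ints).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

final class PullPostingsWriter {
    private PullPostingsWriter() {}

    // Assumed rule: a bitset of maxDoc bits is cheaper than docFreq ints.
    static boolean useBitSet(int docFreq, int maxDoc) {
        return (long) docFreq * 32 > maxDoc;
    }

    // Returns a readable "file": a header line followed by one line per term.
    static List<String> write(SortedMap<String, List<Integer>> postings, int maxDoc) {
        // Pass 1: count bitset-encoded terms so the header can precede the data.
        // A push API would have to buffer everything to do this.
        int bitsetTerms = 0;
        for (List<Integer> docs : postings.values()) {
            if (useBitSet(docs.size(), maxDoc)) bitsetTerms++;
        }
        List<String> out = new ArrayList<>();
        out.add("bitsetTerms=" + bitsetTerms);

        // Pass 2: iterate the same postings again and encode each term.
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            List<Integer> docs = e.getValue();
            if (useBitSet(docs.size(), maxDoc)) {
                BitSet bits = new BitSet(maxDoc);
                for (int doc : docs) bits.set(doc);
                out.add(e.getKey() + " bitset " + bits);
            } else {
                out.add(e.getKey() + " list " + docs);
            }
        }
        return out;
    }
}
```

For example, with maxDoc = 64, a term in docs 0, 1, 2 gets a bitset and a term in doc 5 alone stays a list.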




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753515#comment-13753515
 ] 

ASF subversion and git services commented on LUCENE-5192:
-

Commit 1518591 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1518591 ]

LUCENE-5192: FieldInfos.Builder failed to catch adding field with different DV 
type under some circumstances




[jira] [Created] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query

2013-08-29 Thread HeXin (JIRA)
HeXin created SOLR-5203:
---

 Summary: Strengthen the function of Min should match, making it 
select BooleanClause as Occur.MUST according to the weight of query
 Key: SOLR-5203
 URL: https://issues.apache.org/jira/browse/SOLR-5203
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0
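One reading of this proposal: mm decides how many clauses become required, the heaviest clauses are picked first, and only clauses whose weight exceeds a configurable threshold (minimum integer by default) are eligible. This interpretation is an assumption. `WeightedClauseSelector` is hypothetical and its `Occur` enum only mirrors the names of Lucene's `BooleanClause.Occur`.

```java
import java.util.Arrays;
import java.util.List;

final class WeightedClauseSelector {
    enum Occur { MUST, SHOULD }

    static final class Clause {
        final String term;
        final float weight;
        Clause(String term, float weight) { this.term = term; this.weight = weight; }
    }

    // Default threshold: the minimum integer, so every clause is eligible.
    static final float DEFAULT_THRESHOLD = Integer.MIN_VALUE;

    private WeightedClauseSelector() {}

    // Marks up to mm clauses as MUST, heaviest first, skipping any whose
    // weight is not above the threshold; all others stay SHOULD.
    static Occur[] select(List<Clause> clauses, int mm, float threshold) {
        Occur[] occurs = new Occur[clauses.size()];
        Arrays.fill(occurs, Occur.SHOULD);
        Integer[] order = new Integer[clauses.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort clause indexes by descending weight.
        Arrays.sort(order, (a, b) -> Float.compare(clauses.get(b).weight, clauses.get(a).weight));
        int required = 0;
        for (int idx : order) {
            if (required == mm) break;
            if (clauses.get(idx).weight > threshold) {
                occurs[idx] = Occur.MUST;
                required++;
            }
        }
        return occurs;
    }
}
```

For clauses weighted 1, 3 and 2 with mm = 2, the clauses weighted 3 and 2 become MUST; raising the threshold to 2.5 leaves only the clause weighted 3 as MUST.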







[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753540#comment-13753540
 ] 

Adrien Grand commented on LUCENE-5192:
--

Wow, good catch!

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Updated] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query

2013-08-29 Thread HeXin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeXin updated SOLR-5203:


Description: 
In some cases, we want the value of mm to select BooleanClauses as Occur.MUST 
according to the weight of the query. 

Only if a clause's weight is larger than the threshold can it be selected as 
Occur.MUST. The threshold is configurable and defaults to the minimum integer value. 

Any comments are welcome.

  was:In some cases, we want the value of mm to select BooleanClauses as 
Occur.MUST according to the weight of the query. Only if a clause's weight is 
larger than the threshold can it be selected as Occur.MUST. The threshold is 
configurable and defaults to the minimum integer value. 


 Strengthen the function of Min should match, making it select BooleanClause 
 as Occur.MUST according to the weight of query
 --

 Key: SOLR-5203
 URL: https://issues.apache.org/jira/browse/SOLR-5203
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0


 In some cases, we want the value of mm to select BooleanClauses as Occur.MUST 
 according to the weight of the query. 
 Only if a clause's weight is larger than the threshold can it be selected as 
 Occur.MUST. The threshold is configurable and defaults to the minimum integer 
 value. 
 Any comments are welcome.
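One way to read the proposal: among the clauses, only those whose weight exceeds the threshold are eligible to be promoted to MUST, and mm then decides how many of them are promoted. The Java sketch below is a hypothetical model of that reading only. `WeightedMinShouldMatch`, `Clause` and `apply` are invented names that do not use Lucene's real `BooleanClause` or `BooleanQuery`, and promoting the heaviest eligible clauses first is an assumption, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the SOLR-5203 idea: a clause may become MUST only if
// its weight is above a configurable threshold.
class WeightedMinShouldMatch {

    enum Occur { MUST, SHOULD }

    // Stand-in for a BooleanClause: a term, its query-time weight and its Occur.
    static final class Clause {
        final String term;
        final float weight;
        Occur occur = Occur.SHOULD;

        Clause(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    // Default mirrors the proposal: the minimum integer value (every clause eligible).
    static final float DEFAULT_THRESHOLD = Integer.MIN_VALUE;

    // Marks up to 'mm' of the heaviest eligible clauses as MUST; the rest stay SHOULD.
    static void apply(List<Clause> clauses, int mm, float threshold) {
        List<Clause> eligible = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.weight > threshold) {
                eligible.add(c);
            }
        }
        // Assumption: heavier clauses are promoted first.
        eligible.sort(Comparator.comparingDouble((Clause c) -> c.weight).reversed());
        for (int i = 0; i < Math.min(mm, eligible.size()); i++) {
            eligible.get(i).occur = Occur.MUST;
        }
    }
}
```

With the default threshold this degenerates to "the mm heaviest clauses become MUST"; raising the threshold keeps low-weight clauses optional no matter what mm says.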




[jira] [Updated] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query

2013-08-29 Thread HeXin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeXin updated SOLR-5203:


Description: In some cases, we want the value of mm to select BooleanClauses 
as Occur.MUST according to the weight of the query. Only if a clause's weight is 
larger than the threshold can it be selected as Occur.MUST. The threshold is 
configurable and defaults to the minimum integer value. 
 Issue Type: Improvement  (was: Bug)

 Strengthen the function of Min should match, making it select BooleanClause 
 as Occur.MUST according to the weight of query
 --

 Key: SOLR-5203
 URL: https://issues.apache.org/jira/browse/SOLR-5203
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0


 In some cases, we want the value of mm to select BooleanClauses as Occur.MUST 
 according to the weight of the query. Only if a clause's weight is larger than 
 the threshold can it be selected as Occur.MUST. The threshold is configurable 
 and defaults to the minimum integer value. 




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753546#comment-13753546
 ] 

ASF subversion and git services commented on LUCENE-5192:
-

Commit 1518616 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518616 ]

LUCENE-5192: FieldInfos.Builder failed to catch adding field with different DV 
type under some circumstances

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Resolved] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5192.


Resolution: Fixed

Committed to trunk and 4x. On 4x I also had to fix DocFieldProcessor to call 
FieldInfos.addOrUpdate even when the field has already been encountered. That's 
because the logic has changed in trunk: DV fields are now processed the same way 
as stored fields, so FIS.addOrUpdate is called for both the postings and the NDV. 
In 4x that isn't the case, and only the FI was updated when you added the same 
field with two types (and FIS didn't know about it at all!).
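The essence of the fix is that the builder has to see every occurrence of a field, including repeats, so it can compare the DocValues type it already recorded with the new one. The Java sketch below is a simplified, hypothetical model of that check; `FieldInfosBuilderSketch`, its `DocValuesType` enum and the exception message are invented for illustration and are not Lucene's actual `FieldInfos.Builder` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the LUCENE-5192 check: once a field has a
// DocValues type, a later occurrence with a different type must be rejected.
class FieldInfosBuilderSketch {

    enum DocValuesType { NUMERIC, BINARY, SORTED }

    // field name -> DocValues type recorded so far (null = no DocValues yet)
    private final Map<String, DocValuesType> dvTypes = new HashMap<>();

    // Must be called for every occurrence of a field, even one already seen;
    // skipping repeats is what let the conflicting type slip through.
    void addOrUpdate(String field, DocValuesType dvType) {
        if (!dvTypes.containsKey(field)) {
            dvTypes.put(field, dvType);
            return;
        }
        if (dvType == null) {
            return; // e.g. a postings-only occurrence: nothing to compare
        }
        DocValuesType existing = dvTypes.get(field);
        if (existing == null) {
            dvTypes.put(field, dvType); // first DocValues occurrence of this field
        } else if (existing != dvType) {
            throw new IllegalArgumentException("cannot change DocValues type from "
                + existing + " to " + dvType + " for field \"" + field + "\"");
        }
    }

    DocValuesType docValuesType(String field) {
        return dvTypes.get(field);
    }
}
```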

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753557#comment-13753557
 ] 

Shai Erera commented on LUCENE-5189:


Regarding problem 1, I hardwired the test to use Lucene45Codec for now so that 
I'm not blocked. I thought about Codec.serialize/attributes, and now I realize 
it's not a good idea, since those attributes must be recorded per-segment, yet 
the Codec is a single instance shared by all segments. We can, however, record 
these in SegmentInfo.attributes(); the documentation suggests this is where the 
Codec should record per-segment information. Would it work if PerFieldDVF 
recorded the per-field format in SegWriteState.si.attributes() and read it back 
later, instead of using FieldInfo.attributes?
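The distinction being drawn is between state held by the single, shared Codec instance and state stored with each segment. A minimal Java sketch, assuming a plain map stands in for `SegmentInfo.attributes()`; the class, the `KEY_PREFIX` key scheme and the format names used with it are hypothetical and are not what `PerFieldDocValuesFormat` actually writes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember which DocValues format wrote a field in a
// per-segment attribute map, so different segments can disagree.
class PerSegmentFormatAttributes {

    // Invented key scheme for illustration only.
    static final String KEY_PREFIX = "perfield.dvformat.";

    // Stand-in for a SegmentInfo and its attributes() map: one map per segment.
    static final class Segment {
        final String name;
        final Map<String, String> attributes = new HashMap<>();

        Segment(String name) {
            this.name = name;
        }
    }

    // Write side: record the format used for this field in this segment.
    static void recordFormat(Segment si, String field, String formatName) {
        si.attributes.put(KEY_PREFIX + field, formatName);
    }

    // Read side: look the format up again from the same segment.
    static String lookupFormat(Segment si, String field) {
        String format = si.attributes.get(KEY_PREFIX + field);
        if (format == null) {
            throw new IllegalStateException(
                "no format recorded for field " + field + " in segment " + si.name);
        }
        return format;
    }
}
```

Because the map lives on the segment rather than on the codec, two segments can record different formats for the same field, which a single codec-level attribute could not express.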

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of change is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, e.g. if we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I already have a working patch, which I'll upload next, explaining the 
 changes.
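The "write all the values" approach can be pictured as copy-on-write per segment: each round of updates produces a complete new copy of the field's values under a new generation, much as liveDocs are rewritten on deletes. The following Java sketch is a hypothetical in-memory model of that idea only; `NumericDVGenerations`, `SegmentField` and `applyUpdates` are invented names, and a real implementation would have the codec write the values to disk rather than keep them in arrays.

```java
import java.util.Arrays;

// Hypothetical model: updating a numeric DV field rewrites every value of the
// segment into a new generation, leaving the previous generation untouched.
class NumericDVGenerations {

    static final class SegmentField {
        final long gen;       // generation of this copy of the values
        final long[] values;  // one value per document in the segment

        SegmentField(long gen, long[] values) {
            this.gen = gen;
            this.values = values;
        }
    }

    // Sets every selected document to newValue and returns the next generation.
    static SegmentField applyUpdates(SegmentField current, int[] docIds, long newValue) {
        long[] next = Arrays.copyOf(current.values, current.values.length);
        for (int doc : docIds) {
            next[doc] = newValue;
        }
        return new SegmentField(current.gen + 1, next);
    }
}
```

Since the old generation is not modified, a reader opened before the update would keep seeing the old values; this point-in-time behaviour is assumed here by analogy with deletes.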




[jira] [Created] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5193:
--

 Summary: Add jar-src to build.xml
 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor


I think it's useful if we have a top-level jar-src which generates source jars 
for all modules. One can basically do that by iterating through the directories 
and calling 'ant jar-src' already, so this is just a convenient way to do it. 
Will attach a patch shortly.




[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5193:
---

Attachment: LUCENE-5193.patch

Simple patch for Lucene modules only, since they already support jar-src.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch


 I think it's useful if we have a top-level jar-src which generates source 
 jars for all modules. One can basically do that by iterating through the 
 directories and calling 'ant jar-src' already, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753579#comment-13753579
 ] 

Shai Erera commented on LUCENE-5193:


If there are no objections, I'll commit it later today.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch


 I think it's useful if we have a top-level jar-src which generates source 
 jars for all modules. One can basically do that by iterating through the 
 directories and calling 'ant jar-src' already, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5193:
---

Attachment: LUCENE-5193.patch

Previous patch did not jar-src core and test-framework.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch, LUCENE-5193.patch


 I think it's useful if we have a top-level jar-src which generates source 
 jars for all modules. One can basically do that by iterating through the 
 directories and calling 'ant jar-src' already, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753586#comment-13753586
 ] 

Uwe Schindler commented on LUCENE-5189:
---

Hi,
I had an idea yesterday when thinking about this. Currently (like for deletes) 
we can update DocValues based on an ID term (updating by docid is not easily 
possible with IndexWriter). As the ID term can be anything, you could also use 
some (group) key that updates lots of documents (like you can delete all 
documents with a specific term). The current code updates the given field for 
all those documents to a fixed value. My two ideas are:

- we could also support update by query (meaning that, like for deletes, you 
provide a query that selects the documents to update)
- we could make modifications possible: instead of giving a value that is set 
for all selected documents, we could provide a callback interface that is 
used to modify the current docvalue (numeric or String) of the document to 
update and returns a changed value. This would be a one-method interface, so it 
could be used as a closure in Java 8, like {{writer.updateDocValues(term, value 
-> value+1);}} (in Java 6/7 this would be {{writer.updateDocValues(term, new 
NumericDocValuesUpdater() \{ public long update(long value) \{ return value+1; 
\}\});}}). Servers like Solr or ElasticSearch could implement this 
interface/closure using e.g. JavaScript, so one could execute a docvalues 
update and pass a JavaScript function applied to every value. We just have to 
think about concurrency: what happens if 2 threads are updating the same value 
at the same time? Maybe this is already handled by the BufferedDeletesQueue!?

I just wanted to write this down in this issue, so we could think about 
allowing this to be implemented. Of course, the current patch is more important, 
to get the whole game running! Updating by term/query is just one thing that 
is often requested by users. The typical example is a webapp where you 
can vote for a document. In that case one would execute the closure {{value -> 
value+1}}. If we implement this at such a low level, the whole concurrency 
handling should be easier than how it is currently implemented, e.g. in Solr or ES.
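The callback idea hinges on the updater being a single-method interface. The Java sketch below models it without Lucene: the interface name `NumericDocValuesUpdater` and its `update(long)` method come from the comment (a proposal, not an existing API), while `CallbackUpdateSketch.updateDocValues` is a hypothetical helper that works on a plain `long[]` of per-document values and a precomputed array of selected docids, standing in for whatever the term or query would match. It shows the same callback as a Java 6/7 anonymous class and as a Java 8 lambda.

```java
// The proposed one-method callback (a proposal from the comment, not a Lucene API).
interface NumericDocValuesUpdater {
    long update(long value);
}

class CallbackUpdateSketch {

    // Hypothetical stand-in for writer.updateDocValues(term, updater): applies
    // the callback to the current value of every selected document.
    static void updateDocValues(long[] values, int[] selectedDocs, NumericDocValuesUpdater updater) {
        for (int doc : selectedDocs) {
            values[doc] = updater.update(values[doc]);
        }
    }

    // Java 6/7 style: an anonymous inner class.
    static final NumericDocValuesUpdater INCREMENT_ANON = new NumericDocValuesUpdater() {
        @Override
        public long update(long value) {
            return value + 1;
        }
    };

    // Java 8 style: the same callback as a lambda, possible because the
    // interface has exactly one abstract method.
    static final NumericDocValuesUpdater INCREMENT_LAMBDA = value -> value + 1;
}
```

The sketch deliberately ignores the concurrency question raised in the comment: two threads applying `value -> value+1` to the same array without coordination could lose updates.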

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of change is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, e.g. if we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I already have a working patch, which I'll upload next, explaining the 
 changes.




Test failures the other day

2013-08-29 Thread Erick Erickson
Just in case this ever helps track this down: the other day I had a
situation in which I could NOT run a successful test end-to-end (while
trying to proof SOLR-4817). Usually one of the distrib tests would fail,
though not always the same one, and re-running with the failing seed wouldn't
reproduce it. It only failed when running the full suite.

Rebooted my machine and all was well, no failures at all.

So how my environment is getting whacked such that running the full
test suite fails is a mystery...

FWIW,
Erick


[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753600#comment-13753600
 ] 

Shai Erera commented on LUCENE-5189:


I definitely want to add update by query, but in a separate issue. And the 
callback idea is interesting. This callback would need to also get the docid I 
guess (it's missing in your API example)?

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of change is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, e.g. if we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I already have a working patch, which I'll upload next, explaining the 
 changes.




[jira] [Created] (SOLR-5204) Queries with shards.tolerant=true and stats=true or spellcheck=on do not work

2013-08-29 Thread Anca Kopetz (JIRA)
Anca Kopetz created SOLR-5204:
-

 Summary: Queries with shards.tolerant=true and stats=true or 
spellcheck=on do not work
 Key: SOLR-5204
 URL: https://issues.apache.org/jira/browse/SOLR-5204
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Anca Kopetz


In a SolrCloud environment with 2 shards, if one server is down:
* when we execute queries with shards.tolerant=true&stats=true, a 
NullPointerException is thrown

{code} 
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:105)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Thread.java:722)
{code} 

* when we execute queries with shards.tolerant=true&spellcheck=on, a 
NullPointerException is thrown
{code}
2013-08-26 13:51:42,347 [http-8080-8] ERROR org.apache.solr.servlet.SolrDispatchFilter:log:119  - null:java.lang.NullPointerException
	at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
{code}
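One defensive pattern for a tolerant distributed merge is to skip shard responses that are missing, or that lack the component's section, instead of dereferencing them. The following Java sketch works on plain maps and assumes, without confirmation from the report, that the NPE comes from a down shard contributing a null or incomplete response. `TolerantMergeSketch`, `mergeCounts` and the `"count"` key are invented for illustration and do not reflect Solr's actual `StatsComponent` or `SpellCheckComponent` code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sum a numeric "count" from per-shard responses while
// tolerating shards that failed (null response) or returned no section.
class TolerantMergeSketch {

    static long mergeCounts(List<Map<String, Object>> shardResponses, String section) {
        long total = 0;
        for (Map<String, Object> rsp : shardResponses) {
            if (rsp == null) {
                continue; // shard was down; with shards.tolerant=true, skip it
            }
            Object sec = rsp.get(section);
            if (!(sec instanceof Map)) {
                continue; // section missing: nothing to merge from this shard
            }
            Object count = ((Map<?, ?>) sec).get("count");
            if (count instanceof Number) {
                total += ((Number) count).longValue();
            }
        }
        return total;
    }
}
```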




Re: Bug or backcompat example: Solr example/multicore/solr.xml in legacy format?

2013-08-29 Thread Jan Høydahl
+1 for nuking the multicore example. And schemaless should become the new 
default too, nuking yet another set of parallel configs!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28 Aug 2013, at 16:47, Mark Miller markrmil...@gmail.com wrote:

 I have an old JIRA where I started working on this, but I cannot find it.
 
 There has been no need for the multicore example for years now. I did a 
 bunch of work taking it out at one point, but I'm sure that work is old 
 enough to be useless now. Never got around to committing it.
 
 A few tests tie into those configs, and I think there was some other flotsam 
 and jetsam to clean up.
 
 - Mark
 
 On Aug 27, 2013, at 6:10 PM, Erick Erickson erickerick...@gmail.com wrote:
 
 bq:  I think we should just get rid of it entirely
 
 +1, especially since we're going to core discovery, the collections API, etc.
 
 FWIW,
 Erick
 
 
 On Tue, Aug 27, 2013 at 3:42 PM, Shawn Heisey s...@elyograg.org wrote:
 On 8/27/2013 11:24 AM, Jack Krupansky wrote:
 I just happened to notice that the solr.xml file in the Solr
 example/multicore in branch_4x (and 4.4 as well) is still in the old
 legacy format (with <cores>/<core>). Is that merely an oversight or
 intentional for demonstrating backwards compatibility?
 
 The example/multicore directory seems to be generally very out of date. The 
 schema uses an ancient version, and doesn't have any good examples of how to 
 use analyzers effectively. I'm fairly sure that all the examples use 
 solr.xml and are therefore inherently multicore.
 
 Unless we plan to thoroughly update the multicore example so it's as modern 
 as the main example, I think we should just get rid of it entirely.
 
 If we need an example that uses legacy config methods, I think we should 
 make a new subdirectory.  It should come with an extensive README and the 
 solrconfig/schema should be more heavily commented than the standard example.
 
 Thanks,
 Shawn
 
 
 
 
 



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753622#comment-13753622
 ] 

Uwe Schindler commented on LUCENE-5189:
---

bq. This callback would need to also get the docid I guess (it's missing in 
your API example)?

Of course we could add this. Java 8 would also support this cool syntax, 
something like: {{writer.updateDocValues(term, (docid, value) -> value+1);}}

The Java 8 example here was just syntactic sugar: for all this, it's only 
important that it is an {{interface}} with only one method that gets as many 
parameters as needed and returns one value. We automatically get the cool Java 
8 syntax for users if we design the callback interface to these guidelines. 
One common example is the Comparator interface in Java. Every Comparator<T> 
can be written in this cool syntax :-)
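Adding the docid only changes the shape of the single abstract method, so the lambda syntax still applies, and the same holds for any other one-method interface such as `Comparator<T>`. A small Java sketch of both; `DocAwareUpdater` and `DocAwareUpdateSketch` are hypothetical names, not a Lucene API, and the per-document values are a plain `long[]`.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical two-argument callback that also receives the docid.
interface DocAwareUpdater {
    long update(int docid, long value);
}

class DocAwareUpdateSketch {

    // Applies the callback to each selected document, passing its docid.
    static void apply(long[] values, int[] docs, DocAwareUpdater updater) {
        for (int doc : docs) {
            values[doc] = updater.update(doc, values[doc]);
        }
    }

    // Comparator<T> has a single abstract method (compare), so a lambda fits too.
    static String[] sortByLength(String[] words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Comparator<String> byLength = (a, b) -> Integer.compare(a.length(), b.length());
        Arrays.sort(copy, byLength);
        return copy;
    }
}
```

A caller could then write `DocAwareUpdateSketch.apply(values, docs, (docid, value) -> value + 1)`, which matches the shape of the syntax suggested in the comment.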

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch


 In LUCENE-4258 we started to work on incremental field updates; however, the 
 amount of change is immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get-go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons for that:
 * NumericDV fields should be easier to update, e.g. if we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet it requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and of itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I already have a working patch, which I'll upload next, explaining the 
 changes.




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753720#comment-13753720
 ] 

Robert Muir commented on LUCENE-5191:
-

{quote}
As we are not working in unquoted attributes
{quote}

You cannot make this determination. If you want to copy this method and put a 
less secure version in SimpleHTMLEncoder, that's cool with me.

But don't make PostingsHighlighter less secure: -1 to that.

 SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
 --

 Key: LUCENE-5191
 URL: https://issues.apache.org/jira/browse/LUCENE-5191
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5191.patch


 The highlighter provides a function to escape HTML, which does too much. To 
 create valid HTML, only ", <, >, & must be escaped; everything else can be kept 
 unescaped. Unfortunately, the escaper additionally escapes everything 
 above 127, which is unneeded if your web site has the correct encoding. It also 
 produces huge amounts of HTML entities if used with eastern languages.
 This would not be a bug if the escaping were correct, but it isn't; it 
 escapes like this:
 {{result.append("&\#").append((int)ch).append(";");}}
 So it does not escape the Unicode code point (as HTML needs); instead it escapes 
 the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
 U+10400 (deseret capital letter long i) would be escaped as 
 {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
 So we should remove the stupid encoding of chars above 127, which is simply 
 useless :-)
 See also: https://github.com/elasticsearch/elasticsearch/issues/3587
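To make the surrogate-pair problem concrete, here is a small self-contained Java sketch contrasting the reported per-`char` numeric escaping with per-code-point escaping, plus a minimal escaper that only touches `&`, `<`, `>` and `"` and passes everything else through. `HtmlEscapeSketch` and its methods are hypothetical names, not the highlighter's actual `SimpleHTMLEncoder` code. The minimal escaper assumes the output lands in element text or quoted attribute values, which, per the objection in the comment above, an encoder shared with PostingsHighlighter cannot assume.

```java
// Hypothetical sketch contrasting per-char and per-code-point numeric escaping.
class HtmlEscapeSketch {

    // Escapes only the characters HTML text needs; everything else is copied.
    static String escapeMinimal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    // The reported behaviour: one reference per UTF-16 char, which splits a
    // supplementary character into two surrogate references.
    static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    // Correct numeric references: one per Unicode code point.
    static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

For U+10400, `escapePerChar` yields `&#55297;&#56320;` while `escapePerCodePoint` yields `&#66560;`. Iterating by `char` in `escapeMinimal` is safe because all four escaped characters are ASCII, so surrogate halves are copied through untouched.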




[jira] [Updated] (SOLR-5084) new field type - EnumField

2013-08-29 Thread Elran Dvir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elran Dvir updated SOLR-5084:
-

Attachment: Solr-5084.trunk.patch

 new field type - EnumField
 --

 Key: SOLR-5084
 URL: https://issues.apache.org/jira/browse/SOLR-5084
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
 Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
 Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch


 We have encountered a use case in our system where we have a few fields 
 (Severity, Risk, etc.) with a closed set of values, where the sort order for 
 these values is pre-determined but not lexicographic (Critical is higher than 
 High). Generically this is very close to how enums work.
 To implement, I have prototyped a new type of field: EnumField where the 
 inputs are a closed predefined set of strings in a special configuration 
 file (similar to currency.xml).
 The code is based on 4.2.1.




[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-29 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753736#comment-13753736
 ] 

Elran Dvir commented on SOLR-5084:
--

Hi all,

I attached a new patch.
The patch is based on trunk.
It contains changes regarding the issues Robert mentioned (Thanks Robert):
1. fixed the bug where string inputs weren't mapped into their numeric values 
in ValueSourceScorer.getRangeScorer and getRangeQuery
2. removed analysis chain.

In the following days, I will attach fixes for the remaining issues:
1. Verify value strictness on startup (numeric values start at 0 and increment by 1).
2. Throw an exception when an indexed value is not in the configuration (either 
number or string).

Thank you all.
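The two remaining checks listed above (numeric values start at 0 and increment by 1; an indexed value not in the configuration is rejected) could look roughly like the following Java sketch. It assumes the configuration has already been parsed into an ordered label-to-int map; the class `StrictEnumValues` and that input shape are hypothetical, not the patch's code or the actual enumsConfig.xml format:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; the ordered label -> int input map is an assumption. */
final class StrictEnumValues {
    private final Map<String, Integer> labelToInt = new HashMap<>();
    private final List<String> intToLabel = new ArrayList<>();

    /** Startup check: the configured ints must be exactly 0, 1, 2, ... in order. */
    StrictEnumValues(LinkedHashMap<String, Integer> configured) {
        int expected = 0;
        for (Map.Entry<String, Integer> e : configured.entrySet()) {
            Integer value = e.getValue();
            if (value == null || value != expected) {
                throw new IllegalArgumentException("Enum value '" + e.getKey()
                        + "' has int " + value + ", expected " + expected);
            }
            labelToInt.put(e.getKey(), value);
            intToLabel.add(e.getKey());
            expected++;
        }
    }

    /** Maps an indexed value (a label, or its int as a string); throws if it is not configured. */
    int toInt(String indexed) {
        Integer byLabel = labelToInt.get(indexed);
        if (byLabel != null) {
            return byLabel;
        }
        try {
            int n = Integer.parseInt(indexed);
            if (n >= 0 && n < intToLabel.size()) {
                return n;
            }
        } catch (NumberFormatException notANumber) {
            // neither a label nor a number: fall through to the error below
        }
        throw new IllegalArgumentException("Value not in enum configuration: " + indexed);
    }

    /** Reverse lookup from int to configured label. */
    String toLabel(int n) {
        return intToLabel.get(n);
    }
}
```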

 new field type - EnumField
 --

 Key: SOLR-5084
 URL: https://issues.apache.org/jira/browse/SOLR-5084
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
 Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
 Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch


 We have encountered a use case in our system where we have a few fields 
 (Severity, Risk, etc.) with a closed set of values, where the sort order for 
 these values is pre-determined but not lexicographic (Critical is higher than 
 High). Generically this is very close to how enums work.
 To implement, I have prototyped a new type of field: EnumField where the 
 inputs are a closed predefined set of strings in a special configuration 
 file (similar to currency.xml).
 The code is based on 4.2.1.




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753774#comment-13753774
 ] 

Uwe Schindler commented on LUCENE-5191:
---

I did not want to modify yours although I disagree.

I will commit the current patch and remove the useless extra encoding.

 SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
 --

 Key: LUCENE-5191
 URL: https://issues.apache.org/jira/browse/LUCENE-5191
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5191.patch


 The highlighter provides a function to escape HTML, which does too much. To 
 create valid HTML, only ", &, <, > must be escaped; everything else can be kept 
 unescaped. Unfortunately, the escaper additionally escapes every char > 127, 
 which is unneeded if your web site uses the correct encoding. It also 
 produces huge amounts of HTML entities when used with Eastern languages.
 This would not be a bug if the escaping were correct, but it isn't; it 
 escapes like this:
 {{result.append("&#").append((int)ch).append(";");}}
 So instead of escaping the Unicode code point (as HTML requires), it escapes 
 the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
 U+10400 (DESERET CAPITAL LETTER LONG I) would be escaped as 
 {{&#55297;&#56320;}} and not as {{&#66560;}}.
 So we should remove the stupid encoding of chars > 127, which is simply 
 useless :-)
 See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Commented] (SOLR-5200) Add REST support for reading and modifying Solr configuration

2013-08-29 Thread Michael Della Bitta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753781#comment-13753781
 ] 

Michael Della Bitta commented on SOLR-5200:
---

For a while now, we've wanted the ability to tune commit properties for bulk 
indexing and then switch to a more incremental-indexing-friendly setup on the fly. +1.

 Add REST support for reading and modifying Solr configuration
 -

 Key: SOLR-5200
 URL: https://issues.apache.org/jira/browse/SOLR-5200
 Project: Solr
  Issue Type: New Feature
Reporter: Steve Rowe
Assignee: Steve Rowe

 There should be a REST API to allow full read access to, and write access to 
 some elements of, Solr's per-core and per-node configuration not already 
 covered by the Schema REST API: 
 {{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and 
 {{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of 
 {{solr.properties}}).
 Use cases for runtime configuration modification include scripted setup, 
 troubleshooting, and tuning.
 Tentative rules-of-thumb about configuration items that should not be 
 modifiable at runtime:
 # Startup-only items, e.g. where to start core discovery
 # Items that are deprecated in 4.X and will be removed in 5.0
 # Items that if modified should be followed by a full re-index
 Some issues to consider:
 Persistence: How (and even whether) to handle persistence for configuration 
 modifications via REST API is not clear - e.g. persisting the entire config 
 file or having one or more sidecar config files that get persisted.  The 
 extent of what should be modifiable will likely affect how persistence is 
 implemented.  For example, if the only {{solrconfig.xml}} modifiable items 
 turn out to be plugin configurations, an alternative to 
 full-{{solrconfig.xml}} persistence could be individual plugin registration 
 of runtime config modifiable items, along with per-plugin sidecar config 
 persistence.
 Live reload: Most (if not all) per-core configuration modifications will 
 require core reload, though it will be a live reload, so some things won't 
 be modifiable, e.g. {{dataDir}} and {{IndexWriter}} related settings in 
 {{indexConfig}} - see SOLR-3592.  (Should a full reload be supported to 
 handle changes in these places?)
 Interpolation aka property substitution: I think it would be useful on read 
 access to optionally return raw values in addition to the interpolated 
 values, e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. 
 interpolated value {{8983}}.   Modification requests will accept raw values - 
 property interpolation will be applied.  At present interpolation is done 
 once, at parsing time, but if property value modification is supported via 
 the REST API, an alternative could be to delay interpolation until values are 
 requested; in this way, property value modification would not trigger 
 re-parsing the affected configuration source.
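As a rough illustration of the delayed-interpolation alternative, here is a Java sketch that keeps the raw value (e.g. `${jetty.port:8983}`) and resolves `${name}` / `${name:default}` references only when the value is requested, so changing a property does not require re-parsing. The class `LazyPropertyValue` and the plain `Map` of properties are assumptions for illustration, not Solr's existing substitution code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: keep the raw value, interpolate only on request. */
final class LazyPropertyValue {
    // ${name} or ${name:default}; names may not contain ':' or '}'
    private static final Pattern REF = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    private final String raw;

    LazyPropertyValue(String raw) {
        this.raw = raw;
    }

    /** The value as written in the config file, e.g. "${jetty.port:8983}". */
    String raw() {
        return raw;
    }

    /** Resolves references against the current properties each time it is called. */
    String interpolated(Map<String, String> props) {
        Matcher m = REF.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2); // default after ':' (null if none was given)
            }
            if (value == null) {
                throw new IllegalStateException("No value for property " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Read access could then return both `raw()` and `interpolated(...)`, which matches the raw-vs-interpolated distinction described above.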
 Response format: Similarly to the schema REST API, results could be returned 
 in XML, JSON, or any other response writer's output format.
 Transient cores: How should non-loaded transient cores be handled?  Simplest 
 thing would be to load the transient core before handling the request, just 
 like other requests.
 Below I provide an exhaustive list of configuration items in the files in 
 question and indicate which ones I think could be modifiable at runtime.  I 
 don't mean to imply that these must all be made modifiable, or for those that 
 are made modifiable, that they must be made so at once - a piecemeal approach 
 will very likely be more appropriate.
 h2. {{solrconfig.xml}}
 Note that XIncludes and includes via Document Entities won't survive a 
 modification request (assuming persistence is via overwriting the original 
 file).
 ||XPath under {{/config/}}||Should be modifiable via REST API?||Rationale||Description||
 |{{luceneMatchVersion}}|No|Modifying this should be followed by a full re-index|Controls what version of Lucene various components of Solr adhere to|
 |{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available via classloader for {{solrconfig.xml}} and {{schema.xml}}|
 |{{dataDir}}|No|Not supported by live RELOAD|Holds all index data|
 |{{directoryFactory}}|No|Not supported by live RELOAD|index directory factory|
 |{{codecFactory}}|No|Modifying this should be followed by a full re-index|index codec factory, per-field SchemaCodecFactory by default|
 |{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it should be possible to modify an already Managed schema's mutability|Managed or Classic (non-mutable) schema factory|
 |{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by live RELOAD|low-level indexing behavior|

[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753786#comment-13753786
 ] 

Robert Muir commented on SOLR-5084:
---

{quote}
As long as the config forces them to be explicit about the values (and has 
error checking at startup that the values start at 0 and are monotonically 
increasing ints) then anyone who wants to insert values into their config is 
going to have to pause and think about the fact that there is a concrete int 
associated with the existing values – and is more likely to realize that 
changing those ints has consequences.
{quote}

If the values are implicitly 0, 1, 2, ... n, then why force the user to write 
that out? 

If you are worried about idiot users, add a comment around the field type to 
the example:

{code}
<!-- note: you cannot change the order/existing values without reindexing.
 but you can always add new values to the end. -->
{code}

Otherwise it just makes the configuration overly verbose to have them write 
0..n themselves.
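A minimal Java sketch of the implicit numbering suggested above, assuming the config simply lists the values in sort order (the class `ImplicitEnumOrdinals` is hypothetical). Each value's ordinal is its position in the list, sorting uses the ordinal rather than the string, and appending a value at the end leaves every existing ordinal unchanged, which is why only reordering or removing values requires reindexing:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: ordinals implied by position in the configured list. */
final class ImplicitEnumOrdinals {
    private final Map<String, Integer> ordinals = new HashMap<>();

    ImplicitEnumOrdinals(List<String> valuesInSortOrder) {
        for (String value : valuesInSortOrder) {
            // size() before insertion is the next ordinal: 0, 1, 2, ... n
            if (ordinals.putIfAbsent(value, ordinals.size()) != null) {
                throw new IllegalArgumentException("Duplicate enum value: " + value);
            }
        }
    }

    int ordinal(String value) {
        Integer o = ordinals.get(value);
        if (o == null) {
            throw new IllegalArgumentException("Unknown enum value: " + value);
        }
        return o;
    }

    /** Orders values by configured position, not lexicographically. */
    Comparator<String> comparator() {
        return Comparator.comparingInt(this::ordinal);
    }
}
```

For example, with Low, Medium, High, Critical configured, sorting [High, Critical, Low] with `comparator()` gives [Low, High, Critical], and adding Blocker at the end does not change High's ordinal.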

 new field type - EnumField
 --

 Key: SOLR-5084
 URL: https://issues.apache.org/jira/browse/SOLR-5084
 Project: Solr
  Issue Type: New Feature
Reporter: Elran Dvir
 Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
 Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch


 We have encountered a use case in our system where we have a few fields 
 (Severity, Risk, etc.) with a closed set of values, where the sort order for 
 these values is pre-determined but not lexicographic (Critical is higher than 
 High). Generically this is very close to how enums work.
 To implement, I have prototyped a new type of field: EnumField where the 
 inputs are a closed predefined set of strings in a special configuration 
 file (similar to currency.xml).
 The code is based on 4.2.1.




[jira] [Commented] (SOLR-3580) In ExtendedDismax, lowercase 'not' operator is not being treated as an operator when 'lowercaseOperators' is enabled

2013-08-29 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753788#comment-13753788
 ] 

Eric Pugh commented on SOLR-3580:
-

I was about to submit a patch for the fact that 'NOT' and 'not' don't work the 
same, when I stumbled across this issue. My patch file looks rather remarkably 
like [~mdodswo...@salesforce.com]'s first patch as well!

One thing is that the wiki needs an update: 
http://wiki.apache.org/solr/ExtendedDisMax#lowercaseOperators
I can add that, referring to the patch files as an option if you need not:NOT support.

I would like to see something committed, as my customer has the same need for 
NOT to work. Their users are sophisticated and know the syntax. The backup 
plan is to do something custom.



 In ExtendedDismax, lowercase 'not' operator is not being treated as an 
 operator when 'lowercaseOperators' is enabled
 

 Key: SOLR-3580
 URL: https://issues.apache.org/jira/browse/SOLR-3580
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 4.0-ALPHA
Reporter: Michael Dodsworth
Priority: Minor
 Attachments: SOLR-3580.patch, SOLR-3580-proposal.patch


 When lowercase operator support is enabled (for edismax), the lowercase 'not' 
 operator is being wrongly treated as a literal term (and not as an operator).
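To illustrate the expected behaviour only: with `lowercaseOperators` enabled, lowercase `and`, `or` and `not` should all be recognized as operators. The Java sketch below works on a pre-split token list and is hypothetical (edismax does not expose a `normalize` method like this); it just shows `not` being handled the same way as `and`/`or`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration; not how edismax is actually structured. */
final class LowercaseOperators {
    // "not" belongs here alongside "and" and "or"
    private static final Set<String> LOWERCASE_OPERATORS =
            new HashSet<>(Arrays.asList("and", "or", "not"));

    /** Rewrites lowercase operators to their uppercase form when the option is enabled. */
    static List<String> normalize(List<String> tokens, boolean lowercaseOperators) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            if (lowercaseOperators && LOWERCASE_OPERATORS.contains(token)) {
                out.add(token.toUpperCase(Locale.ROOT));
            } else {
                out.add(token); // literal term, or option disabled
            }
        }
        return out;
    }
}
```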




[jira] [Commented] (SOLR-5204) Queries with shards.tolerant=true and stats=true or spellcheck=on do not work

2013-08-29 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753817#comment-13753817
 ] 

Shalin Shekhar Mangar commented on SOLR-5204:
-

Yes, shards.tolerant is supported only in facet, query and grouping. Stats 
and spellcheck do not support this param yet.
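One way a distributed component could tolerate failed shards is to skip shards with no response, or with no stats section, when merging. The Java sketch below assumes a hypothetical `FieldStats` holder (count/sum/min/max) and a list of per-shard maps in which a failed shard is `null`; it does not use Solr's `ShardResponse` or `StatsComponent` classes:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; does not use Solr's ShardResponse/StatsComponent classes. */
final class TolerantStatsMerge {

    /** Minimal per-field statistics holder (assumed shape). */
    static final class FieldStats {
        long count;
        double sum;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        FieldStats() {}

        FieldStats(long count, double sum, double min, double max) {
            this.count = count;
            this.sum = sum;
            this.min = min;
            this.max = max;
        }

        void add(FieldStats other) {
            count += other.count;
            sum += other.sum;
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
        }
    }

    /** Merges one field's stats, skipping shards that failed or returned no stats for it. */
    static FieldStats merge(List<Map<String, FieldStats>> shardResponses, String field) {
        FieldStats total = new FieldStats();
        for (Map<String, FieldStats> shard : shardResponses) {
            if (shard == null) {
                continue; // shard was down; tolerated under shards.tolerant=true
            }
            FieldStats stats = shard.get(field);
            if (stats == null) {
                continue; // shard responded but has no stats for this field
            }
            total.add(stats);
        }
        return total;
    }
}
```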

 Queries with shards.tolerant=true and stats=true or spellcheck=on do not work
 -

 Key: SOLR-5204
 URL: https://issues.apache.org/jira/browse/SOLR-5204
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Anca Kopetz

 In a SolrCloud environment with 2 shards, if one server is down:
 * when we execute queries with shards.tolerant=true&stats=true, a 
 NullPointerException is thrown
 {code} 
 java.lang.NullPointerException
     at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:105)
     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
     at java.lang.Thread.run(Thread.java:722)
 {code} 
 * when we execute queries with shards.tolerant=true&spellcheck=on, a 
 NullPointerException is thrown
 {code}
 2013-08-26 13:51:42,347 [http-8080-8] ERROR org.apache.solr.servlet.SolrDispatchFilter:log:119  - null:java.lang.NullPointerException
     at org.apache.solr.handler.component.SpellCheckComponent.finishStage(SpellCheckComponent.java:323)
 {code}




[jira] [Commented] (SOLR-5201) UIMAUpdateRequestProcessor should reuse the AnalysisEngine

2013-08-29 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753823#comment-13753823
 ] 

Hoss Man commented on SOLR-5201:


[~teofili]: I'm not sure what kind of state the AnalysisEngine maintains that 
might carry over to (and pollute) subsequent requests, but there are two things 
you could do to cache an AnalysisEngine for various durations, depending on what 
you're looking for...

* you could create & cache the engine in the UIMAUpdateRequestProcessor object 
and then it will be re-used for each document included in a single update 
request
* you could create & cache the engine in the 
UIMAUpdateRequestProcessorFactory, passing it to each 
UIMAUpdateRequestProcessor it creates, and then it will be re-used for every 
document included in every request.
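The second option (build once in the factory, hand the instance to every processor) can be sketched generically in Java. Here `E` stands in for the UIMA `AnalysisEngine` type and the `Supplier` stands in for however the engine is actually built; both, and the class names, are hypothetical. Since processors may run concurrently for different requests, sharing is only safe if the engine can be used across threads, which ties back to the state question above:

```java
import java.util.function.Supplier;

/** Hypothetical sketch; E stands in for the UIMA AnalysisEngine type. */
final class SharedEngineProcessorFactory<E> {
    private final Supplier<E> engineBuilder;
    private volatile E engine; // built lazily, then shared by every processor

    SharedEngineProcessorFactory(Supplier<E> engineBuilder) {
        this.engineBuilder = engineBuilder;
    }

    /** Double-checked locking so concurrent first requests build only one engine. */
    private E engine() {
        E e = engine;
        if (e == null) {
            synchronized (this) {
                e = engine;
                if (e == null) {
                    e = engineBuilder.get();
                    engine = e;
                }
            }
        }
        return e;
    }

    /** Called per request: each processor gets the shared engine instead of building its own. */
    Processor<E> getInstance() {
        return new Processor<>(engine());
    }

    static final class Processor<E> {
        private final E engine;

        Processor(E engine) {
            this.engine = engine;
        }

        E engine() {
            return engine;
        }
    }
}
```

The first option corresponds to building the engine inside each `Processor` instead, so it lives only for one update request.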

 UIMAUpdateRequestProcessor should reuse the AnalysisEngine
 --

 Key: SOLR-5201
 URL: https://issues.apache.org/jira/browse/SOLR-5201
 Project: Solr
  Issue Type: Improvement
  Components: contrib - UIMA
Affects Versions: 4.4
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 4.5, 5.0


 As reported in http://markmail.org/thread/2psiyl4ukaejl4fx 
 UIMAUpdateRequestProcessor instantiates an AnalysisEngine for each request, 
 which is bad for performance; therefore it'd be nice if such AEs could be 
 reused whenever possible.




[jira] [Created] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41

2013-08-29 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5194:
--

 Summary: TestBackwardsCompatibility should not test Pulsing41
 Key: LUCENE-5194
 URL: https://issues.apache.org/jira/browse/LUCENE-5194
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.5


Spinoff from LUCENE-3069, where Billy discovered this ...

For some reason it's currently testing a Pulsing41 index (at least 
index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.




[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753840#comment-13753840
 ] 

ASF subversion and git services commented on SOLR-4249:
---

Commit 1518717 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1518717 ]

SOLR-4249: UniqFieldsUpdateProcessorFactory now extends 
FieldMutatingUpdateProcessorFactory and supports all of its selector options

 change UniqFieldsUpdateProcessorFactory to subclass 
 FieldValueSubsetUpdateProcessorFactory
 --

 Key: SOLR-4249
 URL: https://issues.apache.org/jira/browse/SOLR-4249
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
 Attachments: SOLR-4249.patch


 UniqFieldsUpdateProcessorFactory has been around for a while, but if we 
 change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of 
 redundant code could be eliminated from that class, and the factory could be 
 made more configurable by supporting all of the field matching logic in 
 FieldMutatingUpdateProcessorFactory, not just a list of field names.
 (the only new code that would be needed is handling the legacy config case 
 currently supported by UniqFieldsUpdateProcessorFactory)
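Under the subclassing approach, the only per-class logic left is choosing which of a field's values to keep. For the uniq case that is the distinct values; a minimal Java sketch of that selection, assuming first-occurrence order should be preserved (the `UniqValues` helper is hypothetical, not Solr's API):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper, not Solr's API. */
final class UniqValues {
    /** Keeps the first occurrence of each value, in original order. */
    static List<Object> uniq(Collection<?> values) {
        // LinkedHashSet drops duplicates (by equals/hashCode) but keeps insertion order
        return new ArrayList<Object>(new LinkedHashSet<Object>(values));
    }
}
```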




[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753854#comment-13753854
 ] 

ASF subversion and git services commented on LUCENE-5194:
-

Commit 1518720 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1518720 ]

LUCENE-5194: fix 41 test indices to not use PulsingPostingsFormat

 TestBackwardsCompatibility should not test Pulsing41
 

 Key: LUCENE-5194
 URL: https://issues.apache.org/jira/browse/LUCENE-5194
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.5


 Spinoff from LUCENE-3069, where Billy discovered this ...
 For some reason it's currently testing a Pulsing41 index (at least 
 index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.




[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753857#comment-13753857
 ] 

ASF subversion and git services commented on LUCENE-5194:
-

Commit 1518721 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518721 ]

LUCENE-5194: fix 41 test indices to not use PulsingPostingsFormat

 TestBackwardsCompatibility should not test Pulsing41
 

 Key: LUCENE-5194
 URL: https://issues.apache.org/jira/browse/LUCENE-5194
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.5


 Spinoff from LUCENE-3069, where Billy discovered this ...
 For some reason it's currently testing a Pulsing41 index (at least 
 index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.




[jira] [Resolved] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41

2013-08-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5194.


Resolution: Fixed

 TestBackwardsCompatibility should not test Pulsing41
 

 Key: LUCENE-5194
 URL: https://issues.apache.org/jira/browse/LUCENE-5194
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.5


 Spinoff from LUCENE-3069, where Billy discovered this ...
 For some reason it's currently testing a Pulsing41 index (at least 
 index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.




[jira] [Updated] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5192:
---

Attachment: LUCENE-5192.patch

Maybe something like this?  (for trunk)

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch, LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753907#comment-13753907
 ] 

ASF subversion and git services commented on SOLR-4249:
---

Commit 1518746 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518746 ]

SOLR-4249: UniqFieldsUpdateProcessorFactory now extends 
FieldMutatingUpdateProcessorFactory and supports all of its selector options 
(merge r1518717)

 change UniqFieldsUpdateProcessorFactory to subclass 
 FieldValueSubsetUpdateProcessorFactory
 --

 Key: SOLR-4249
 URL: https://issues.apache.org/jira/browse/SOLR-4249
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
 Attachments: SOLR-4249.patch


 UniqFieldsUpdateProcessorFactory has been around for a while, but if we 
 change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of 
 redundant code could be eliminated from that class, and the factory could be 
 made more configurable by supporting all of the field matching logic in 
 FieldMutatingUpdateProcessorFactory, not just a list of field names.
 (the only new code that would be needed is handling the legacy config case 
 currently supported by UniqFieldsUpdateProcessorFactory)




[jira] [Reopened] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-5192:



Hmm, that fix wasn't thread safe (the map inside FieldInfos.FieldNumbers is an 
ordinary HashMap).

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch, LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753958#comment-13753958
 ] 

Steve Rowe commented on LUCENE-5193:


+1

I was worried that lucene-codecs src jar wouldn't be built -- in my mind it's 
in the same category as core and test-framework: an internal module -- but it's 
pulled in by the {{modules-crawl}} macro, which runs over all sub-directories 
with {{build.xml}} except {{build/}}, {{core/}}, {{test-framework/}}, and 
{{tools/}}.

I'll make another patch for Solr and the top-level {{build.xml}}.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch, LUCENE-5193.patch


 I think it's useful if we have a top-level jar-src which generates source 
 jars for all modules. One can basically do that by iterating through the 
 directories and calling 'ant jar-src' already, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754013#comment-13754013
 ] 

Shai Erera commented on LUCENE-5192:


Ahh, good catch. I didn't notice FieldNumbers is sync'd. But I think this _if_ 
is wrong/problematic:

{noformat}
-if (docValues != null) {
+if (!fi.hasDocValues() && docValues != null) {
+  // First time we are seeing doc values type for
+  // this field:
{noformat}

With this fix, if somebody tries to add a field 'f' as NUMERIC and then BINARY, 
we won't catch it? This is caught today by FI.setDVType, but with this fix, 
that won't be called? Am I missing something? Perhaps you can add an 'else if' and 
compare the given type with fi.getDVType(), but that's just duplicating code 
from FI.setDVType.

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch, LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754021#comment-13754021
 ] 

Robert Muir commented on LUCENE-5192:
-

I don't care about code duplication here. We should not invoke the global 
synced FieldNumbers shit for every element, only when the setting actually 
changes.




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754040#comment-13754040
 ] 

Shai Erera commented on LUCENE-5192:


In that case it should change to:

{code:java}
if (docValues != null) {
  if (!fi.hasDocValues()) {
    // First time we are seeing doc values type for
    // this field:
    fi.setDocValuesType(docValues);

    // must also update docValuesType map so it's
    // aware of this field's DocValueType
    globalFieldNumbers.setDocValuesType(fi.number, name, docValues);
  } else if (docValues != fi.getDocValuesType()) {
    // THROW EX
  }
}
{code}

Or, we do this:

{code:java}
if (docValues != null) {
  // only pay the synchronization cost if fi does not already have a DVType
  boolean updateGlobal = !fi.hasDocValues();
  fi.setDocValuesType(docValues); // this will also perform the consistency check.
  if (updateGlobal) {
    globalFieldNumbers.set(...);
  }
}
{code}

Since FieldInfo.setDVType is also called from DocFieldsProcessor, I prefer to 
try and keep the consistency check in one place.
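A self-contained sketch of the second suggestion, using simplified stand-ins: {{FieldInfoStub}}, {{GlobalNumbersStub}} and {{addOrUpdate}} are hypothetical names, not Lucene's classes. The stub's {{setDocValuesType}} owns the only consistency check, and the synchronized global update is paid only the first time a field gets a DV type.

```java
import java.util.HashMap;
import java.util.Map;

public class DvTypeSketch {
  enum DocValuesType { NUMERIC, BINARY, SORTED }

  /** Stand-in for FieldInfo: holds the single consistency check. */
  static final class FieldInfoStub {
    final String name;
    private DocValuesType dvType; // null until first set

    FieldInfoStub(String name) { this.name = name; }

    boolean hasDocValues() { return dvType != null; }

    void setDocValuesType(DocValuesType type) {
      if (dvType != null && dvType != type) {
        throw new IllegalArgumentException("cannot change DocValues type from "
            + dvType + " to " + type + " for field \"" + name + "\"");
      }
      dvType = type;
    }
  }

  /** Stand-in for the global, synchronized field-number registry. */
  static final class GlobalNumbersStub {
    final Map<String, DocValuesType> types = new HashMap<>();
    int syncCalls = 0; // how many times the synchronized path was taken

    synchronized void setDocValuesType(String name, DocValuesType type) {
      syncCalls++;
      types.put(name, type);
    }
  }

  /** Always validate via the field, but touch the global map only on the first set. */
  static void addOrUpdate(FieldInfoStub fi, DocValuesType docValues, GlobalNumbersStub global) {
    if (docValues != null) {
      boolean updateGlobal = !fi.hasDocValues();
      fi.setDocValuesType(docValues); // throws on a conflicting type
      if (updateGlobal) {
        global.setDocValuesType(fi.name, docValues);
      }
    }
  }

  public static void main(String[] args) {
    GlobalNumbersStub global = new GlobalNumbersStub();
    FieldInfoStub f = new FieldInfoStub("f");
    addOrUpdate(f, DocValuesType.NUMERIC, global);
    addOrUpdate(f, DocValuesType.NUMERIC, global); // no second synchronized call
    System.out.println(global.syncCalls); // 1
    try {
      addOrUpdate(f, DocValuesType.BINARY, global);
    } catch (IllegalArgumentException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Keeping the check inside the field's setter means any other caller of it (the comment mentions DocFieldsProcessor) gets the same validation for free.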




[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754069#comment-13754069
 ] 

Michael McCandless commented on LUCENE-5192:


bq. With this fix, if somebody tries to add a field 'f' as NUMERIC and then 
BINARY, we won't catch it? 

Actually, we still catch it, because in DocValuesProcessor.addField we always 
call fieldInfo.setDocValuesType(), so the exception will be thrown from there.

Still, I think addOrUpdate *should* fold in the docValues type ... so I'll just 
go with Shai's 2nd suggestion ...





[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set

2013-08-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754089#comment-13754089
 ] 

Erick Erickson commented on SOLR-4478:
--

I got to thinking about this while trying to take it out of mothballs, and I'm 
starting to think it's a terrible idea for 4.x. It should be postponed or 
abandoned unless and until we do something like what has been discussed 
elsewhere: having one source of truth (ZooKeeper, for instance, has been 
discussed). So I'll list the issues I've thought about, and if there are 
straightforward answers to them I'll be happy to reconsider.

Each issue is probably technically doable, but the sum (plus the ones I haven't 
seen yet) totally scares me.

1. Traditional master/slave architectures. Let's say we change the schema (it'd 
have to be on the master?). How do we get that to the slaves? Currently the 
confFiles directive has an explicit test and will not copy a directory. I'm not 
convinced it'd even work with relative paths, and listing _every_ file in the 
configset dir would be kludgy at best. And I think the confFiles directive 
doesn't work outside the conf directory of the core it's replicating anyway. 
I suppose the user could copy the configset directory to all the nodes in the 
farm, but...

2. The new REST API for modifying the schema. In non-SolrCloud mode, how does 
that work? Is it only allowed on the master (assuming we can solve #1)? How do 
we enforce that?

3. Sharing the solrConfig object is also fraught with issues, as discussed 
above. There's already the shareSchema option, so at least it's possible to 
have one shared schema.

4. How do we get changes reloaded in a master/slave environment for all the 
affected cores on all the machines? You'd need some kind of manual process of 
going to each one and issuing a new ReloadAllCores command, or we'd need to build 
in some kind of notification system. Or we'd need to require the user to keep a 
list of all the nodes and all the cores and script reloading them all. Nobody 
should be re-inventing ZooKeeper.

5. How do we get changes reloaded for all the affected cores even in a 
non-master/slave environment? A new command? Periodic polling? A check on every 
query/update request?

6. Sticky wickets I haven't thought of yet. I'm afraid, very afraid... Each of 
these is solvable, but considering the effort involved, it doesn't seem worth 
pursuing right now; at the very least, my interest is disappearing.

And wrapped around all this is the fact that SolrCloud already handles most of 
the things I'm worried about, especially getting changes propagated to all the 
right places in the cluster. SolrCloud already has a way to reload all the nodes 
that take part in a collection. SolrCloud already has notifications of changes 
to the config set built in (at least I think so; if not, it will). 

My feeling at this point is that supporting this well would turn into a huge 
amount of work _that would then be thrown away_ if we go to a one source of 
truth model in Solr5 (or even 6). And that actually _using_ the capability 
would be fragile and complex. So unless I can be convinced otherwise, I'm going 
to assign this back to nobody and forget about it.
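The straw-man in the issue description below resolves {{configSet}} either as an absolute path or as a name under {{solr_home/configsets}}. A minimal sketch of that lookup, assuming a relative value is always resolved against {{solr_home/configsets}}; the class and method names are hypothetical, and the sketch does nothing about the distribution and reload problems listed above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigSetResolverSketch {
  /**
   * Hypothetical resolver for the straw-man "configSet" core property:
   * an absolute value is used as-is; a relative value is resolved
   * against the assumed default base directory solr_home/configsets.
   */
  static Path resolveConfigSetDir(Path solrHome, String configSet) {
    Path p = Paths.get(configSet);
    if (p.isAbsolute()) {
      return p.normalize();
    }
    return solrHome.resolve("configsets").resolve(p).normalize();
  }

  public static void main(String[] args) {
    Path home = Paths.get("/var/solr"); // example solr_home, an assumption
    // Every core declaring configSet=myconf would read the same files from this directory.
    System.out.println(resolveConfigSetDir(home, "myconf").resolve("schema.xml"));
  }
}
```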


 Allow cores to specify a named config set
 -

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for 

[jira] [Commented] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754094#comment-13754094
 ] 

ASF subversion and git services commented on SOLR-4249:
---

Commit 1518836 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1518836 ]

SOLR-4249: remove legacy UniqFieldsUpdateProcessorFactory init param syntax 
from trunk for 5.0

 change UniqFieldsUpdateProcessorFactory to subclass 
 FieldValueSubsetUpdateProcessorFactory
 --

 Key: SOLR-4249
 URL: https://issues.apache.org/jira/browse/SOLR-4249
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
 Attachments: SOLR-4249.patch


 UniqFieldsUpdateProcessorFactory has been around for a while, but if we 
 change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of 
 redundant code could be eliminated from that class, and the factory could be 
 made more configurable by supporting all of the field matching logic in 
 FieldMutatingUpdateProcessorFactory, not just a list of field names.
 (the only new code that would be needed is handling the legacy config case 
 currently supported by UniqFieldsUpdateProcessorFactory)
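Roughly, the subset a "uniq" processor keeps can be sketched as below, assuming it retains the first occurrence of each value in its original order. The helper name {{pickUniqueSubset}} is hypothetical and is not presented as the FieldValueSubsetUpdateProcessorFactory API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

public class UniqValuesSketch {
  /** Drops duplicate values, keeping the first occurrence of each in its original order. */
  static <T> List<T> pickUniqueSubset(Collection<T> values) {
    // LinkedHashSet removes duplicates while preserving insertion order.
    return new ArrayList<>(new LinkedHashSet<>(values));
  }

  public static void main(String[] args) {
    System.out.println(pickUniqueSubset(Arrays.asList("a", "b", "a", "c", "b"))); // [a, b, c]
  }
}
```

With this shape, the only processor-specific logic is the subset selection itself; deciding which fields it applies to would come from the shared field-matching configuration.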




[jira] [Resolved] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4249.


Resolution: Fixed

 change UniqFieldsUpdateProcessorFactory to subclass 
 FieldValueSubsetUpdateProcessorFactory
 --

 Key: SOLR-4249
 URL: https://issues.apache.org/jira/browse/SOLR-4249
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-4249.patch


 UniqFieldsUpdateProcessorFactory has been around for a while, but if we 
 change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of 
 redundant code could be eliminated from that class, and the factory could be 
 made more configurable by supporting all of the field matching logic in 
 FieldMutatingUpdateProcessorFactory, not just a list of field names.
 (the only new code that would be needed is handling the legacy config case 
 currently supported by UniqFieldsUpdateProcessorFactory)
 ---
 For users of 4.x starting with 4.5, the existing init param syntax will still 
 be supported, but a warning will be logged recommending they switch to using 
 {{<arr name="fieldName">...</arr>}} instead of {{<lst 
 name="fields">...</lst>}}. Starting with 5.0, the fields option won't be 
 recognized at all.




[jira] [Updated] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4249:
---

  Description: 
UniqFieldsUpdateProcessorFactory has been around for a while, but if we change 
it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code 
could be eliminated from that class, and the factory could be made more 
configurable by supporting all of the field matching logic in 
FieldMutatingUpdateProcessorFactory, not just a list of field names.

(the only new code that would be needed is handling the legacy config case 
currently supported by UniqFieldsUpdateProcessorFactory)

---

For users of 4.x starting with 4.5, the existing init param syntax will still 
be supported, but a warning will be logged recommending they switch to using 
{{<arr name="fieldName">...</arr>}} instead of {{<lst name="fields">...</lst>}}. 
Starting with 5.0, the fields option won't be recognized at all.


  was:
UniqFieldsUpdateProcessorFactory has been around for a while, but if we change 
it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of redundant code 
could be eliminated from that class, and the factory could be made more 
configurable by supporting all of the field matching logic in 
FieldMutatingUpdateProcessorFactory, not just a list of field names.

(the only new code that would be needed is handling the legacy config case 
currently supported by UniqFieldsUpdateProcessorFactory)

Fix Version/s: 5.0
   4.5




[jira] [Updated] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5191:
--

Attachment: LUCENE-5191.patch

Attached is a new patch that also escapes the single quote (') and the forward 
slash (although the latter is not really required; I did it to make Robert 
happy). I refuse to encode the Latin-1 chars.

I will commit this in a minute.

 SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
 --

 Key: LUCENE-5191
 URL: https://issues.apache.org/jira/browse/LUCENE-5191
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5191.patch, LUCENE-5191.patch


 The highlighter provides a function to escape HTML, which does too much. To 
 create valid HTML, only ", <, > and & must be escaped; everything else can be 
 kept unescaped. Unfortunately, the escaper also escapes every char 
 > 127, which is unneeded if your web site uses the correct encoding. It also 
 produces huge amounts of HTML entities when used with Eastern languages.
 This would not be a bug if the escaping were correct, but it isn't; it 
 escapes like this:
 {{result.append("&#").append((int)ch).append(";");}}
 So instead of escaping the Unicode code point (as HTML needs), it escapes 
 each UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
 U+10400 (DESERET CAPITAL LETTER LONG I) would be escaped as 
 {{&#55297;&#56320;}} and not as {{&#66560;}}.
 So we should remove the stupid encoding of chars > 127, which is simply 
 useless :-)
 See also: https://github.com/elasticsearch/elasticsearch/issues/3587
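To make the surrogate-pair problem concrete, here is a small standalone Java sketch; it is not the Lucene SimpleHTMLEncoder code. {{perCharEntities}} mimics the per-{{char}} escaping quoted above, {{perCodePointEntities}} shows what code-point-based numeric escaping would produce, and {{escapeMinimal}} escapes only the markup-significant characters, including the ' and / mentioned in the patch comment. The method names and the entity spellings for ' and / are illustrative assumptions.

```java
public class HtmlEscapeSketch {

  /** Reproduces the reported bug: one numeric entity per UTF-16 char above 127. */
  static String perCharEntities(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch > 127) {
        sb.append("&#").append((int) ch).append(';');
      } else {
        sb.append(ch);
      }
    }
    return sb.toString();
  }

  /** Code-point-aware variant: one numeric entity per Unicode code point above 127. */
  static String perCodePointEntities(String s) {
    StringBuilder sb = new StringBuilder();
    s.codePoints().forEach(cp -> {
      if (cp > 127) {
        sb.append("&#").append(cp).append(';');
      } else {
        sb.appendCodePoint(cp);
      }
    });
    return sb.toString();
  }

  /** Minimal escaper: only markup-significant ASCII chars; all other chars pass through. */
  static String escapeMinimal(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      switch (ch) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '\'': sb.append("&#x27;"); break;
        case '/': sb.append("&#x2F;"); break;
        default: sb.append(ch); // surrogate halves are copied unchanged, so pairs stay intact
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String deseret = new String(Character.toChars(0x10400)); // U+10400
    System.out.println(perCharEntities(deseret));      // &#55297;&#56320;
    System.out.println(perCodePointEntities(deseret)); // &#66560;
    System.out.println(escapeMinimal("<b>" + deseret + "</b>"));
  }
}
```

The minimal escaper never needs to know about surrogates at all, which is why dropping the > 127 encoding removes the bug rather than just patching it.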




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754110#comment-13754110
 ] 

ASF subversion and git services commented on LUCENE-5191:
-

Commit 1518839 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1518839 ]

LUCENE-5191: Fix Unicode corruption in HTML escaping of Standard Highlighter 
and Fast Vector Highlighter.




[jira] [Updated] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-5193:
---

Attachment: LUCENE-5193.patch

This patch incorporates Shai's Lucene patch, and adds Solr and top-level 
{{jar-src}} targets.

I also took the opportunity to fix up Solr's {{jar-src}} specialization (needed 
for Solr-specific manifest entries) to be like Lucene's: the {{$\{build.dir}}} 
is created, and the module's {{src/resources/**}} are included (only solr-uima 
and solr-langid have these at this point).

I think it's ready to go - if you like, Shai, I can commit.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch, LUCENE-5193.patch, LUCENE-5193.patch


 I think it's useful if we have a top-level jar-src which generates source 
 jars for all modules. One can basically do that by iterating through the 
 directories and calling 'ant jar-src' already, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754112#comment-13754112
 ] 

ASF subversion and git services commented on LUCENE-5191:
-

Commit 1518840 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518840 ]

Merged revision(s) 1518839 from lucene/dev/trunk:
LUCENE-5191: Fix Unicode corruption in HTML escaping of Standard Highlighter 
and Fast Vector Highlighter.




[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754118#comment-13754118
 ] 

Uwe Schindler commented on LUCENE-5193:
---

Thanks, looks good! Especially that the resources are now in the source JAR, 
which the Maven archiver plugin does, too.

Thanks also for adding the info text to the top-level build, so {{ant}} prints it 
in the usage help.




[jira] [Resolved] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5191.
---

Resolution: Fixed




[jira] [Commented] (SOLR-5194) Annecdotal reports of what smells like thread safety issues with concurrent partial updates?

2013-08-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754144#comment-13754144
 ] 

Yonik Seeley commented on SOLR-5194:


After reviewing all the issues, I don't think this is due to any thread safety 
issues, but rather to partial support for BigDecimal.

 Annecdotal reports of what smells like thread safety issues with concurrent 
 partial updates?
 

 Key: SOLR-5194
 URL: https://issues.apache.org/jira/browse/SOLR-5194
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man

 In SOLR-4021, two users reported seeing errors similar to the crux of that 
 issue (i.e. JavaBinCodec errors) only when doing bulk document adds while 
 concurrently using partial updates.
 This smells like a thread safety issue around the transaction log -- opening 
 a new issue in the hope that they can post specific stack traces here, since 
 it seems to be a distinct problem.




[jira] [Commented] (SOLR-4021) JavaBinCodec has poor default behavior for unrecognized classes of objects

2013-08-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754152#comment-13754152
 ] 

Yonik Seeley commented on SOLR-4021:


It looks like DIH can produce BigDecimal values, which historically were not 
supported in Solr and are currently only partially supported.
Either DIH needs to be changed to avoid BigDecimal, or we need to add better 
BigDecimal support (at a minimum to the JavaBin format, and perhaps to atomic 
updates too).

 JavaBinCodec has poor default behavior for unrecognized classes of objects
 --

 Key: SOLR-4021
 URL: https://issues.apache.org/jira/browse/SOLR-4021
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 4.0
Reporter: Hoss Man

 It seems that JavaBinCodec has inconsistent serialize/deserialize behavior 
 when dealing with objects of classes that it doesn't recognize. In 
 particular, unrecognized objects seem to be serialized with the full 
 class name prepended to the toString() value, and the resulting 
 concatenated string is then left as-is during deserialization.
 As a concrete example: serializing and deserializing a BigDecimal value results 
 in a final value like "java.math.BigDecimal:1848.66", even though for most 
 users the simple toString() value would have worked as intended.
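To make the asymmetry concrete, here is a small model of the round trip described above. It is not JavaBinCodec's real code: UnknownTypeFallback and its methods are hypothetical, and decodeStripPrefix is only one possible symmetric alternative, assuming the decoder already knows (via some type tag) that the value came from the unknown-class path.

```java
// Hypothetical model of the reported behavior; not JavaBinCodec's actual API.
final class UnknownTypeFallback {
    private UnknownTypeFallback() {}

    // Reported write path: full class name, a colon, then toString().
    static String encode(Object value) {
        return value.getClass().getName() + ":" + value;
    }

    // Reported read path: the concatenated string is returned unchanged.
    static String decodeAsIs(String wire) {
        return wire;
    }

    // A symmetric alternative: drop the "className:" prefix added on write.
    // Java class names never contain ':', so the first ':' ends the prefix.
    static String decodeStripPrefix(String wire) {
        int sep = wire.indexOf(':');
        return sep < 0 ? wire : wire.substring(sep + 1);
    }
}
```

With a BigDecimal of 1848.66, encode yields "java.math.BigDecimal:1848.66"; decodeAsIs hands that string back to the user, while decodeStripPrefix returns "1848.66", the plain toString() value the report says most users expect.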




[jira] [Commented] (SOLR-5198) Make default similarty configurable

2013-08-29 Thread Feihong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754236#comment-13754236
 ] 

Feihong Huang commented on SOLR-5198:
-

Making the default similarity configurable may make sense.
For example, we could use BM25Similarity instead of TFIDFSimilarity just by 
modifying the configuration, rather than writing a custom replacement for 
SchemaSimilarityFactory.

 Make default similarty configurable
 ---

 Key: SOLR-5198
 URL: https://issues.apache.org/jira/browse/SOLR-5198
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0


   Although the code already supports customizing scoring on a per-field basis 
 by using <similarity/> in a schema's fieldType, and 
 we can configure our own custom similarity factory in the schema, we can't configure 
 the default similarity: it is hardcoded in SchemaSimilarityFactory. 
   If we want to use another similarity as the default instead of the 
 DefaultSimilarity provided by Lucene, we have to write another similarity 
 factory to do this. Therefore, it would be useful to make the default similarity 
 configurable. 
   Any comments are welcome. 




[jira] [Commented] (SOLR-5198) Make default similarty configurable

2013-08-29 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754271#comment-13754271
 ] 

Shawn Heisey commented on SOLR-5198:


I am using BM25 without any custom code.  Here's the top of my schema.xml:

{noformat}
<schema name="ncdat" version="1.5">

  <similarity class="solr.BM25SimilarityFactory"/>
{noformat}







[jira] [Resolved] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo

2013-08-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5058.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5
 Assignee: Hoss Man

Thanks for reporting this, Thomas.

 org.apache.solr.update.PeerSync Logging Warning Typo
 

 Key: SOLR-5058
 URL: https://issues.apache.org/jira/browse/SOLR-5058
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Thomas Murphy
Assignee: Hoss Man
Priority: Trivial
  Labels: easyfix
 Fix For: 4.5, 5.0

   Original Estimate: 5m
  Remaining Estimate: 5m

 The following log entry appears in the Solr Admin Logging interface:
 WARN  PeerSync  no frame of reference to tell of we've missed updates
 There is a typo: this looks like it should read "to tell if we've".
 PeerSync expands to org.apache.solr.update.PeerSync.




[jira] [Commented] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754282#comment-13754282
 ] 

ASF subversion and git services commented on SOLR-5058:
---

Commit 1518874 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518874 ]

SOLR-5058: log msg typo (merge r1518872)

 org.apache.solr.update.PeerSync Logging Warning Typo
 

 Key: SOLR-5058
 URL: https://issues.apache.org/jira/browse/SOLR-5058
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Thomas Murphy
Priority: Trivial
  Labels: easyfix
   Original Estimate: 5m
  Remaining Estimate: 5m

 The following log entry appears in the Solr Admin Logging interface:
 WARN  PeerSync  no frame of reference to tell of we've missed updates
 There is a typo: this looks like it should read "to tell if we've".
 PeerSync expands to org.apache.solr.update.PeerSync.




[jira] [Commented] (SOLR-5058) org.apache.solr.update.PeerSync Logging Warning Typo

2013-08-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754277#comment-13754277
 ] 

ASF subversion and git services commented on SOLR-5058:
---

Commit 1518872 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1518872 ]

SOLR-5058: log msg typo





[jira] [Commented] (LUCENE-5194) TestBackwardsCompatibility should not test Pulsing41

2013-08-29 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754307#comment-13754307
 ] 

Han Jiang commented on LUCENE-5194:
---

Thanks Mike!

 TestBackwardsCompatibility should not test Pulsing41
 

 Key: LUCENE-5194
 URL: https://issues.apache.org/jira/browse/LUCENE-5194
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 5.0, 4.5


 Spinoff from LUCENE-3069, where Billy discovered this ...
 For some reason it's currently testing a Pulsing41 index (at least 
 index.41.cfs.zip), but we do not guarantee back compat for PulsingPF.




[jira] [Commented] (LUCENE-5193) Add jar-src to build.xml

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754347#comment-13754347
 ] 

Shai Erera commented on LUCENE-5193:


Looks good Steve. Feel free to commit. I'm not sure I'll be able to today.

 Add jar-src to build.xml
 

 Key: LUCENE-5193
 URL: https://issues.apache.org/jira/browse/LUCENE-5193
 Project: Lucene - Core
  Issue Type: New Feature
  Components: general/build
Reporter: Shai Erera
Priority: Minor
 Attachments: LUCENE-5193.patch, LUCENE-5193.patch, LUCENE-5193.patch


 I think it would be useful to have a top-level jar-src target which generates 
 source jars for all modules. One can already do that by iterating through the 
 directories and calling 'ant jar-src' in each, so this is just a convenient 
 way to do it. Will attach a patch shortly.




[jira] [Commented] (SOLR-5203) Strengthen the function of Min should match, making it select BooleanClause as Occur.MUST according to the weight of query

2013-08-29 Thread yinyue (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754360#comment-13754360
 ] 

yinyue commented on SOLR-5203:
--

Good feature; it's useful when we have weighted terms.

 Strengthen the function of Min should match, making it select BooleanClause 
 as Occur.MUST according to the weight of query
 --

 Key: SOLR-5203
 URL: https://issues.apache.org/jira/browse/SOLR-5203
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0


 In some cases, we want the mm value to select which BooleanClauses become 
 Occur.MUST according to the weight of the query. 
 A clause can be selected as Occur.MUST only if its weight is larger than the 
 threshold. The threshold can be configured, and defaults to the minimum 
 integer value. 
 Any comments are welcome.
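One possible reading of this proposal, sketched under assumptions: among the optional clauses, up to mm of the heaviest are promoted to MUST, but only those whose weight is strictly above the threshold (default Integer.MIN_VALUE, as suggested). Clause, Occur and promote are hypothetical stand-ins, not Lucene's BooleanClause API or Solr's mm parsing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the proposal; Clause and Occur are stand-ins,
// not Lucene's BooleanClause / BooleanClause.Occur.
final class WeightedMinShouldMatch {
    private WeightedMinShouldMatch() {}

    enum Occur { MUST, SHOULD }

    // Default from the proposal: the minimum integer, so effectively every clause qualifies.
    static final float DEFAULT_THRESHOLD = Integer.MIN_VALUE;

    static final class Clause {
        final String term;
        final float weight;
        Occur occur = Occur.SHOULD;

        Clause(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    // Promotes up to mm clauses to MUST, heaviest first, but only those whose
    // weight is strictly greater than the threshold. All others stay SHOULD.
    static void promote(List<Clause> clauses, int mm, float threshold) {
        List<Clause> byWeight = new ArrayList<>(clauses);
        byWeight.sort(Comparator.comparingDouble((Clause c) -> c.weight).reversed());
        int promoted = 0;
        for (Clause c : byWeight) {
            // Sorted descending, so once one clause fails the threshold the rest do too.
            if (promoted >= mm || c.weight <= threshold) {
                break;
            }
            c.occur = Occur.MUST;
            promoted++;
        }
    }
}
```

Sorting a copy leaves the caller's clause order intact, so the query is rebuilt in its original order with only the occur flags changed.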




[jira] [Commented] (SOLR-5198) Make default similarty configurable

2013-08-29 Thread HeXin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754372#comment-13754372
 ] 

HeXin commented on SOLR-5198:
-

Hi Shawn, you are right. 
But I think you wrote the BM25SimilarityFactory class first, and perhaps its 
only function is to provide BM25Similarity as the default similarity.

Maybe I have not described the feature clearly. I just want the two scenarios 
below to be achievable just by modifying schema.xml:

1. Using a default similarity other than TFIDFSimilarity.
2. Keeping per-field similarity support while making BM25Similarity the default 
similarity for fields that do not configure their own similarity.

I think we can support this without any custom code.
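Scenario 2 boils down to a lookup with a configurable fallback. The sketch below is hypothetical and is not SchemaSimilarityFactory's code: PerFieldSimilarityConfig and its methods are invented names, and similarity class names stand in for real Similarity instances.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of scenario 2: similarity names stand in for real
// Similarity instances; this is not SchemaSimilarityFactory's code.
final class PerFieldSimilarityConfig {
    private final Map<String, String> perField = new HashMap<>();
    private final String defaultSimilarity;

    // The default comes from configuration instead of being hardcoded.
    PerFieldSimilarityConfig(String defaultSimilarity) {
        this.defaultSimilarity = defaultSimilarity;
    }

    // Registers an explicit similarity for one field, mirroring a fieldType
    // that declares its own similarity element.
    PerFieldSimilarityConfig withField(String field, String similarity) {
        perField.put(field, similarity);
        return this;
    }

    // Fields without their own similarity fall back to the configured default.
    String similarityFor(String field) {
        return perField.getOrDefault(field, defaultSimilarity);
    }
}
```

For example, new PerFieldSimilarityConfig("BM25Similarity").withField("title", "DefaultSimilarity") keeps the explicit choice for title while every other field falls back to BM25Similarity.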






[jira] [Commented] (LUCENE-5192) FieldInfos.Builder failed to catch adding field with different DV type under some circumstances

2013-08-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754381#comment-13754381
 ] 

Shai Erera commented on LUCENE-5192:


Ahh that explains it. +1 to commit the synchronization fix!

 FieldInfos.Builder failed to catch adding field with different DV type under 
 some circumstances
 ---

 Key: LUCENE-5192
 URL: https://issues.apache.org/jira/browse/LUCENE-5192
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5192.patch, LUCENE-5192.patch


 I found it while working on LUCENE-5189. I'll attach a patch with a simple 
 testcase which reproduces the problem and a fix.
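The patch itself is not reproduced here. As a hypothetical illustration of the invariant in the title (a field keeps a single DocValues type) and of the synchronization fix Shai mentions, the sketch below makes the check and the insert one synchronized step. DocValuesTypeRegistry and DvType are invented names, not FieldInfos.Builder's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: DocValuesTypeRegistry and DvType are invented names,
// not Lucene's FieldInfos.Builder or its DocValues type enum.
final class DocValuesTypeRegistry {
    enum DvType { NUMERIC, BINARY, SORTED }

    private final Map<String, DvType> types = new HashMap<>();

    // Records the DocValues type of a field and rejects any later attempt to
    // change it. Synchronized so the lookup and the insert happen as one step:
    // two threads cannot both see "no type yet" and register different types.
    synchronized void register(String field, DvType type) {
        DvType existing = types.putIfAbsent(field, type);
        if (existing != null && existing != type) {
            throw new IllegalArgumentException("cannot change DocValues type from "
                + existing + " to " + type + " for field \"" + field + "\"");
        }
    }

    synchronized DvType typeOf(String field) {
        return types.get(field);
    }
}
```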
