[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753312#comment-13753312
 ] 

Shai Erera commented on LUCENE-5189:


Thanks Rob. I forgot about SuppressCodecs :). I guess I was confused about why 
Lucene40 was picked in the first place, as I thought we didn't test writing 
indexes with old codecs?

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> number of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have a working patch already which I'll upload next, explaining the 
> changes.
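The column-rewrite approach from the first bullet above (rewrite all values of a field for a whole segment on update, the way livedocs generations work) can be sketched outside Lucene as plain Java. The class and method names below are illustrative and are not Lucene APIs:

```java
import java.util.HashMap;
import java.util.Map;

public class NumericColumnSketch {
    // field name -> current generation of the full per-segment value column
    private final Map<String, long[]> columns = new HashMap<>();

    // Initial flush: write the complete column for a field.
    public void writeColumn(String field, long[] values) {
        columns.put(field, values.clone());
    }

    // Updating one document's value rewrites the entire column as a new
    // "generation", analogous to writing a new livedocs file; the rest of
    // the segment's files are untouched.
    public void updateValue(String field, int docId, long newValue) {
        long[] nextGen = columns.get(field).clone();
        nextGen[docId] = newValue;
        columns.put(field, nextGen);
    }

    public long value(String field, int docId) {
        return columns.get(field)[docId];
    }
}
```

The trade-off this illustrates: updates are O(maxDoc) per field rather than O(1), but no in-place mutation of existing index files is ever needed.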

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5200) Add REST support for reading and modifying Solr configuration

2013-08-28 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-5200:
-

Description: 
There should be a REST API to allow full read access to, and write access to 
some elements of, Solr's per-core and per-node configuration not already 
covered by the Schema REST API: 
{{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and 
{{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of 
{{solr.properties}}).

Use cases for runtime configuration modification include scripted setup, 
troubleshooting, and tuning.

Tentative rules-of-thumb about configuration items that should not be 
modifiable at runtime:

# Startup-only items, e.g. where to start core discovery
# Items that are deprecated in 4.X and will be removed in 5.0
# Items that if modified should be followed by a full re-index

Some issues to consider:

Persistence: How (and even whether) to handle persistence for configuration 
modifications via REST API is not clear - e.g. persisting the entire config 
file or having one or more sidecar config files that get persisted.  The extent 
of what should be modifiable will likely affect how persistence is implemented. 
 For example, if the only {{solrconfig.xml}} modifiable items turn out to be 
plugin configurations, an alternative to full-{{solrconfig.xml}} persistence 
could be individual plugin registration of runtime config modifiable items, 
along with per-plugin sidecar config persistence.

"Live" reload: Most (if not all) per-core configuration modifications will 
require a core reload, though it will be a "live" reload, so some things won't be 
modifiable, e.g. {{<dataDir>}} and {{IndexWriter}}-related settings in 
{{<indexConfig>}} - see SOLR-3592.  (Should a full reload be supported to 
handle changes in these places?)

Interpolation aka property substitution: I think it would be useful on read 
access to optionally return raw values in addition to the interpolated values, 
e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. 
interpolated value {{8983}}.   Modification requests will accept raw values - 
property interpolation will be applied.  At present interpolation is done once, 
at parsing time, but if property value modification is supported via the REST 
API, an alternative could be to delay interpolation until values are requested; 
in this way, property value modification would not trigger re-parsing the 
affected configuration source.
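The ${name:default} substitution behavior described above (raw value kept, interpolation applied on demand against current properties) can be sketched as follows; the resolver below is illustrative only and is not Solr's actual implementation:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyInterpolation {
    // Matches ${name} or ${name:default}, in Solr's substitution syntax.
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Resolve every ${...} in the raw value against the property map,
    // falling back to the default after ':' when the property is unset.
    public static String interpolate(String raw, Map<String, String> props) {
        Matcher m = VAR.matcher(raw);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = props.getOrDefault(m.group(1), fallback);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Keeping the raw string and calling a resolver like this at read time is what would allow property modifications without re-parsing the configuration source.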

Response format: Similarly to the schema REST API, results could be returned in 
XML, JSON, or any other response writer's output format.

Transient cores: How should non-loaded transient cores be handled?  Simplest 
thing would be to load the transient core before handling the request, just 
like other requests.

Below I provide an exhaustive list of configuration items in the files in 
question and indicate which ones I think could be modifiable at runtime.  I 
don't mean to imply that these must all be made modifiable, or for those that 
are made modifiable, that they must be made so at once - a piecemeal approach 
will very likely be more appropriate.

h2. {{solrconfig.xml}}

Note that XIncludes and includes via Document Entities won't survive a 
modification request (assuming persistence is via overwriting the original 
file).

||XPath under {{/config/}}||Should be modifiable via REST 
API?||Rationale||Description||
|{{luceneMatchVersion}}|No|Modifying this should be followed by a full 
re-index|Controls what version of Lucene various components of Solr adhere to|
|{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available 
via classloader for {{solrconfig.xml}} and {{schema.xml}}| 
|{{dataDir}}|No|Not supported by "live" RELOAD|Holds all index data|
|{{directoryFactory}}|No|Not supported by "live" RELOAD|index directory factory|
|{{codecFactory}}|No|Modifying this should be followed by a full re-index|index 
codec factory, per-field SchemaCodecFactory by default|
|{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it 
should be possible to modify an already Managed schema's mutability|Managed or 
Classic (non-mutable) schema factory|
|{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by "live" 
RELOAD|low-level indexing behavior|
|{{jmx}}|Yes| |Enables JMX if an MBeanServer is found|
|{{updateHandler@class}}|No| |Defaults to DirectUpdateHandler2|
|{{updateHandler/updateLog}}|No| |Enables a transaction log, configures its 
directory and synchronization|
|{{updateHandler/autoCommit}}|Yes| |Durability: enables hard autocommit, 
configures max interval and whether to open a searcher afterward| 
|{{updateHandler/autoSoftCommit}}|Yes| |Visibility: enables soft autocommit, 
configures max interval|
|{{updateHandler/commitWithin/softCommit}}|Yes| |Whether commitWithin update 
request param should trigger a soft commit instead of hard commit|
|{{updateHandler/listener}}|Yes| |Upda

[jira] [Created] (SOLR-5200) Add REST support for reading and modifying Solr configuration

2013-08-28 Thread Steve Rowe (JIRA)
Steve Rowe created SOLR-5200:


 Summary: Add REST support for reading and modifying Solr 
configuration
 Key: SOLR-5200
 URL: https://issues.apache.org/jira/browse/SOLR-5200
 Project: Solr
  Issue Type: New Feature
Reporter: Steve Rowe
Assignee: Steve Rowe


There should be a REST API to allow full read access to, and write access to 
some elements of, Solr's per-core and per-node configuration not already 
covered by the Schema REST API: 
{{solrconfig.xml}}/{{core.properties}}/{{solrcore.properties}} and 
{{solr.xml}}/{{solr.properties}} (SOLR-4718 discusses addition of 
{{solr.properties}}).

Use cases for runtime configuration modification include scripted setup, 
troubleshooting, and tuning.

Tentative rules-of-thumb about configuration items that should not be 
modifiable at runtime:

# Startup-only items, e.g. where to start core discovery
# Items that are deprecated in 4.X and will be removed in 5.0
# Items that if modified should be followed by a full re-index

Some issues to consider:

Persistence: How (and even whether) to handle persistence for configuration 
modifications via REST API is not clear - e.g. persisting the entire config 
file or having one or more sidecar config files that get persisted.  The extent 
of what should be modifiable will likely affect how persistence is implemented. 
 For example, if the only {{solrconfig.xml}} modifiable items turn out to be 
plugin configurations, an alternative to full-{{solrconfig.xml}} persistence 
could be individual plugin registration of runtime config modifiable items, 
along with per-plugin sidecar config persistence.

"Live" reload: Most (if not all) per-core configuration modifications will 
require a core reload, though it will be a "live" reload, so some things won't be 
modifiable, e.g. {{<dataDir>}} and {{IndexWriter}}-related settings in 
{{<indexConfig>}} - see SOLR-3592.  (Should a full reload be supported to 
handle changes in these places?)

Interpolation aka property substitution: I think it would be useful on read 
access to optionally return raw values in addition to the interpolated values, 
e.g. {{solr.xml}} {{hostPort}} raw value {{$\{jetty.port:8983}}} vs. 
interpolated value {{8983}}.   Modification requests will accept raw values - 
property interpolation will be applied.  At present interpolation is done once, 
at parsing time, but if property value modification is supported via the REST 
API, an alternative could be to delay interpolation until values are requested; 
in this way, property value modification would not trigger re-parsing the 
affected configuration source.

Response format: Similarly to the schema REST API, results could be returned in 
XML, JSON, or any other response writer's output format.

Transient cores: How should non-loaded transient cores be handled?  Simplest 
thing would be to load the transient core before handling the request, just 
like other requests.

Below I provide an exhaustive list of configuration items in the files in 
question and indicate which ones I think could be modifiable at runtime.  I 
don't mean to imply that these must all be made modifiable, or for those that 
are made modifiable, that they must be made so at once - a piecemeal approach 
will very likely be more appropriate.

h2. {{solrconfig.xml}}

Note that XIncludes and includes via Document Entities won't survive a 
modification request (assuming persistence is via overwriting the original 
file).

||XPath under {{/config/}}||Should be modifiable via REST 
API?||Rationale||Description||
|{{luceneMatchVersion}}|No|Modifying this should be followed by a full 
re-index|Controls what version of Lucene various components of Solr adhere to|
|{{lib}}|Yes|Required for adding plugins at runtime|Contained jars available 
via classloader for {{solrconfig.xml}} and {{schema.xml}}| 
|{{dataDir}}|No|Not supported by "live" RELOAD|Holds all index data|
|{{directoryFactory}}|No|Not supported by "live" RELOAD|index directory factory|
|{{codecFactory}}|No|Modifying this should be followed by a full re-index|index 
codec factory, per-field SchemaCodecFactory by default|
|{{schemaFactory}}|Partial|Although the class shouldn't be modifiable, it 
should be possible to modify an already Managed schema's mutability|Managed or 
Classic (non-mutable) schema factory|
|{{indexConfig}}|No|{{IndexWriter}}-related settings not supported by "live" 
RELOAD|low-level indexing behavior|
|{{jmx}}|Yes||Enables JMX if an MBeanServer is found|
|{{updateHandler@class}}|No||Defaults to DirectUpdateHandler2|
|{{updateHandler/updateLog}}|No||Enables a transaction log, configures its 
directory and synchronization|
|{{updateHandler/autoCommit}}|Yes||Durability: enables hard autocommit, 
configures max interval and whether to open a searcher afterward| 
|{{updateHandler/autoSoftCommit}}|Yes||Visibility: enables soft autocommit, 
configures max interval|
|{{updateHandl

[jira] [Updated] (SOLR-4249) change UniqFieldsUpdateProcessorFactory to subclass FieldValueSubsetUpdateProcessorFactory

2013-08-28 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4249:
---

Attachment: SOLR-4249.patch

A "phase 1" patch that switches UniqFieldsUpdateProcessorFactory to be a 
subclass of FieldValueSubsetUpdateProcessorFactory, with some custom init logic 
to deal with the previous "fields" config syntax and log a warning that it is 
deprecated.  Includes a new test of the FieldMutatingUpdateProcessor selector 
syntax, but leaves the other existing tests alone to prove that it still works.

The plan is to commit & backport this, then commit a trunk-only change removing 
the backcompat support for the hackish syntax and updating the test configs 
accordingly.

> change UniqFieldsUpdateProcessorFactory to subclass 
> FieldValueSubsetUpdateProcessorFactory
> --
>
> Key: SOLR-4249
> URL: https://issues.apache.org/jira/browse/SOLR-4249
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Minor
> Attachments: SOLR-4249.patch
>
>
> UniqFieldsUpdateProcessorFactory has been around for a while, but if we 
> change it to subclass FieldValueSubsetUpdateProcessorFactory, a lot of 
> redundant code could be eliminated from that class, and the factory could be 
> made more configurable by supporting all of the field-matching logic in 
> FieldMutatingUpdateProcessorFactory, not just a list of field names.
> (The only new code needed is handling the legacy config case 
> currently supported by UniqFieldsUpdateProcessorFactory.)




[jira] [Comment Edited] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753009#comment-13753009
 ] 

Uwe Schindler edited comment on LUCENE-5191 at 8/28/13 11:53 PM:
-

Hi Walter,

I agree this could be used as a fix, but it's not useful here! There is no need to 
escape codepoints > 127. It just produces huge chunks of escapes for all eastern 
languages! Escaping chars > 127 was done in the 1990s, when web pages could not 
use charsets other than ISO-8859-1 or US-ASCII (and HTTP version 0.9 
was not binary safe).

  was (Author: thetaphi):
Hi Walter,

I agree this could be used as a fix, but it's not useful here! There is no need to 
escape codepoints > 127. It just produces huge chunks of escapes for all eastern 
languages! Escaping chars > 127 was done in the 1990s, when web pages could not 
use charsets other than ISO-8859-1 or US-ASCII.
  
> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5191.patch
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Comment Edited] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753009#comment-13753009
 ] 

Uwe Schindler edited comment on LUCENE-5191 at 8/28/13 11:52 PM:
-

Hi Walter,

I agree this could be used as a fix, but it's not useful here! There is no need to 
escape codepoints > 127. It just produces huge chunks of escapes for all eastern 
languages! Escaping chars > 127 was done in the 1990s, when web pages could not 
use charsets other than ISO-8859-1 or US-ASCII.

  was (Author: thetaphi):
Hi Walter,

I agree this could be used as a fix, but it's useless! There is no need to escape 
codepoints > 127. It just produces huge chunks of escapes for all eastern 
languages! Escaping chars > 127 was done in the 1990s, when web pages could not 
use charsets other than ISO-8859-1.
  
> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5191.patch
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753009#comment-13753009
 ] 

Uwe Schindler commented on LUCENE-5191:
---

Hi Walter,

I agree this could be used as a fix, but it's useless! There is no need to escape 
codepoints > 127. It just produces huge chunks of escapes for all eastern 
languages! Escaping chars > 127 was done in the 1990s, when web pages could not 
use charsets other than ISO-8859-1.

> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5191.patch
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Commented] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753002#comment-13753002
 ] 

Walter Underwood commented on LUCENE-5191:
--

A different fix would be to iterate on codepoints instead of characters. That 
would fix the botch with high and low surrogates.
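Walter's suggestion, iterating over codepoints rather than UTF-16 chars, might look like the sketch below. This is illustrative, not the actual patch; escaping of ", <, >, & is omitted for brevity to isolate the surrogate-pair handling:

```java
public class CodepointEscaper {
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins a high/low surrogate pair into one codepoint,
            // so U+10400 yields 66560 rather than two surrogate values.
            int cp = s.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp); // ASCII passes through
            }
            i += Character.charCount(cp); // advance 2 for supplementary chars
        }
        return out.toString();
    }
}
```

With this iteration, U+10400 is emitted as the single valid entity &#66560; instead of the two invalid surrogate entities the char-based loop produces.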

> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5191.patch
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Updated] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5191:
--

Attachment: LUCENE-5191.patch

Simple patch.

> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5191.patch
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Updated] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5191:
--

Description: 
The highlighter provides a function to escape HTML, which does too much. To 
create valid HTML only ", <, >, & must be escaped; everything else can be kept 
unescaped. The escaper unfortunately also escapes everything > 
127, which is unneeded if your web site has the correct encoding. It also 
produces huge amounts of HTML entities if used with eastern languages.

This would not be a bug if the escaping were correct, but it isn't; it 
escapes like this:

{{result.append("\&#").append((int)ch).append(";");}}

So it escapes not the Unicode codepoint (as HTML needs) but the 
UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:

U+10400 (deseret capital letter long i) would be escaped as 
{{&\#55297;&\#56320;}} and not as {{&\#66560;}}.

So we should remove the stupid encoding of chars > 127, which is simply useless 
:-)


See also: https://github.com/elasticsearch/elasticsearch/issues/3587

  was:
The highlighter provides a function to escape HTML, which does too much. To 
create valid HTML only ", <, >, & must be escaped; everything else can be kept 
unescaped. The escaper unfortunately also escapes everything > 
127, which is unneeded if your web site has the correct encoding. It also 
produces huge amounts of HTML entities if used with eastern languages.

This would not be a bug if the escaping were correct, but it isn't; it 
escapes like this:

{{result.append("&#").append((int)ch).append(";");}}

So it escapes not the Unicode codepoint (as HTML needs) but the 
UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:

U+10400 (deseret capital letter long i) would be escaped as �� 
and not as 𐐀

So we should remove the stupid encoding of chars > 127, which is simply useless 
:-)


See also: https://github.com/elasticsearch/elasticsearch/issues/3587


> SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP
> --
>
> Key: LUCENE-5191
> URL: https://issues.apache.org/jira/browse/LUCENE-5191
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
>
> The highlighter provides a function to escape HTML, which does too much. To 
> create valid HTML only ", <, >, & must be escaped; everything else can be kept 
> unescaped. The escaper unfortunately also escapes everything 
> > 127, which is unneeded if your web site has the correct encoding. It also 
> produces huge amounts of HTML entities if used with eastern languages.
> This would not be a bug if the escaping were correct, but it isn't; it 
> escapes like this:
> {{result.append("\&#").append((int)ch).append(";");}}
> So it escapes not the Unicode codepoint (as HTML needs) but 
> the UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:
> U+10400 (deseret capital letter long i) would be escaped as 
> {{&\#55297;&\#56320;}} and not as {{&\#66560;}}.
> So we should remove the stupid encoding of chars > 127, which is simply 
> useless :-)
> See also: https://github.com/elasticsearch/elasticsearch/issues/3587




[jira] [Created] (LUCENE-5191) SimpleHTMLEncoder in Highlighter module breaks Unicode outside BMP

2013-08-28 Thread Uwe Schindler (JIRA)
Uwe Schindler created LUCENE-5191:
-

 Summary: SimpleHTMLEncoder in Highlighter module breaks Unicode 
outside BMP
 Key: LUCENE-5191
 URL: https://issues.apache.org/jira/browse/LUCENE-5191
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5


The highlighter provides a function to escape HTML, which does too much. To 
create valid HTML only ", <, >, & must be escaped; everything else can be kept 
unescaped. The escaper unfortunately also escapes everything > 
127, which is unneeded if your web site has the correct encoding. It also 
produces huge amounts of HTML entities if used with eastern languages.

This would not be a bug if the escaping were correct, but it isn't; it 
escapes like this:

{{result.append("&#").append((int)ch).append(";");}}

So it escapes not the Unicode codepoint (as HTML needs) but the 
UTF-16 char, which is incorrect, e.g. for our all-time favourite Deseret:

U+10400 (deseret capital letter long i) would be escaped as �� 
and not as 𐐀

So we should remove the stupid encoding of chars > 127, which is simply useless 
:-)


See also: https://github.com/elasticsearch/elasticsearch/issues/3587
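A minimal sketch of the fix the description argues for: escape only the four characters valid HTML requires and pass everything else, including non-ASCII, through untouched. This is illustrative, not the committed patch:

```java
public class MinimalHtmlEscaper {
    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '&': out.append("&amp;");  break;
                // chars > 127 (including surrogates) pass through unchanged,
                // so supplementary characters can never be mangled
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because non-ASCII characters are never rewritten, the surrogate-pair bug disappears entirely; the page's declared charset (e.g. UTF-8) carries them as-is.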




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752977#comment-13752977
 ] 

Robert Muir commented on LUCENE-5189:
-

You can add \@SuppressCodecs(\{"Lucene40", "SomethingElse", ...\}) annotation 
to the top of your test for this.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5118) imrove testing of indexConfig parsing

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752827#comment-13752827
 ] 

ASF subversion and git services commented on SOLR-5118:
---

Commit 1518379 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518379 ]

SOLR-5118: more testing of edge case and some error conditions (merge r1518352)

> imrove testing of indexConfig parsing
> -
>
> Key: SOLR-5118
> URL: https://issues.apache.org/jira/browse/SOLR-5118
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> there are a lot of checks sprinkled around in unrelated tests to ensure 
> that indexConfig option parsing picks up the correct merge policy, merge 
> scheduler, and so on.
> as part of switching all of these tests to use randomized indexConfig 
> options, we need to ensure that this kind of testing for explicitly specified 
> config is rock solid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-08-28 Thread Jessica Cheng (JIRA)
Jessica Cheng created SOLR-5199:
---

 Summary: Restarting zookeeper makes the overseer stop processing 
queue events
 Key: SOLR-5199
 URL: https://issues.apache.org/jira/browse/SOLR-5199
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Jessica Cheng
 Attachments: 5199-log

Taking the external zookeeper down (I'm just testing, so I only have one 
external zookeeper instance running) and then bringing it back up seems to have 
caused the overseer to stop processing queue events.

I tried to issue the delete collection command (curl 
'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and each 
time it just timed out. Looking at the zookeeper data, I see
... 
/overseer
   collection-queue-work
 qn-02
 qn-04
 qn-06
...
and the qn-xxx are not being processed.

Attached please find the log from the overseer (according to 
/overseer_elect/leader).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-08-28 Thread Jessica Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jessica Cheng updated SOLR-5199:


Attachment: 5199-log

> Restarting zookeeper makes the overseer stop processing queue events
> 
>
> Key: SOLR-5199
> URL: https://issues.apache.org/jira/browse/SOLR-5199
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Jessica Cheng
>  Labels: overseer, zookeeper
> Attachments: 5199-log
>
>
> Taking the external zookeeper down (I'm just testing, so I only have one 
> external zookeeper instance running) and then bringing it back up seems to 
> have caused the overseer to stop processing queue events.
> I tried to issue the delete collection command (curl 
> 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
> each time it just timed out. Looking at the zookeeper data, I see
> ... 
> /overseer
>collection-queue-work
>  qn-02
>  qn-04
>  qn-06
> ...
> and the qn-xxx are not being processed.
> Attached please find the log from the overseer (according to 
> /overseer_elect/leader).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752814#comment-13752814
 ] 

Erick Erickson commented on SOLR-5081:
--

Mike:

Thanks for letting us know! This is a tricky one

> Highly parallel document insertion hangs SolrCloud
> --
>
> Key: SOLR-5081
> URL: https://issues.apache.org/jira/browse/SOLR-5081
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3.1
>Reporter: Mike Schrag
> Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18 
> node solrcloud cluster, I can deadlock solr every time.
> The ulimits on the nodes are:
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 1031181
> max locked memory   (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files  (-n) 32768
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 10240
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 515590
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think 
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752804#comment-13752804
 ] 

ASF subversion and git services commented on LUCENE-5190:
-

Commit 1518354 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1518354 ]

LUCENE-5190: Fix failure of TestCheckIndex.testLuceneConstantVersion in Jenkins 
trunk clover build and other builds using -Ddev.version.suffix

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5118) imrove testing of indexConfig parsing

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752800#comment-13752800
 ] 

ASF subversion and git services commented on SOLR-5118:
---

Commit 1518352 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1518352 ]

SOLR-5118: more testing of edge case and some error conditions

> imrove testing of indexConfig parsing
> -
>
> Key: SOLR-5118
> URL: https://issues.apache.org/jira/browse/SOLR-5118
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Hoss Man
>Assignee: Hoss Man
>
> there are a lot of checks sprinkled around in unrelated tests to ensure 
> that indexConfig option parsing picks up the correct merge policy, merge 
> scheduler, and so on.
> as part of switching all of these tests to use randomized indexConfig 
> options, we need to ensure that this kind of testing for explicitly specified 
> config is rock solid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5190.
---

Resolution: Fixed

I committed this with just a minor mod: I removed the escaping of "\-" in the 
regex.

Thanks Hoss & Steve

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5190:
--

Fix Version/s: 4.5
   5.0

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752806#comment-13752806
 ] 

ASF subversion and git services commented on LUCENE-5190:
-

Commit 1518361 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518361 ]

Merged revision(s) 1518354 from lucene/dev/trunk:
LUCENE-5190: Fix failure of TestCheckIndex.testLuceneConstantVersion in Jenkins 
trunk clover build and other builds using -Ddev.version.suffix

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752797#comment-13752797
 ] 

Steve Rowe commented on LUCENE-5190:


+1 to commit

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5190:
--

Attachment: LUCENE-5190.patch

This patch fixes the issue now for all cases.

This just removes anything after "\-" from the version string before 
comparing. Suffixes added by a package maintainer or by Jenkins thus 
disappear before checking; everything in a version string after the dash is 
just a suffix.
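The stripping the patch describes can be sketched like this (a minimal 
illustration of the idea, not the committed patch; class and method names 
are made up):

```java
public class VersionBase {
    // Drop everything after the first '-' so suffixes like "-SNAPSHOT"
    // or "-2013-08-11_15-22-48" don't affect the version comparison.
    static String baseVersion(String version) {
        int dash = version.indexOf('-');
        return dash < 0 ? version : version.substring(0, dash);
    }

    public static void main(String[] args) {
        System.out.println(baseVersion("5.0-2013-08-11_15-22-48")); // prints "5.0"
        System.out.println(baseVersion("5.0-SNAPSHOT"));            // prints "5.0"
    }
}
```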

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch, LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752784#comment-13752784
 ] 

Uwe Schindler commented on LUCENE-5190:
---

Sorry, the previous patch does not solve this issue.

The reason this fails is that Jenkins only sets the version suffix while the 
original version is preserved (and $dev.version changed). The patch fixes 
the first assert, but not the last one.

I think the best fix is to not parse anything after the "\-" in the version, 
so all special cases like "\-SNAPSHOT" can be ignored.

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752753#comment-13752753
 ] 

Steve Rowe commented on LUCENE-5190:


bq. the assert following the one you changed in that test will also fail, won't 
it?

I'm wrong: local variable {{version}} is set to the value of the 
{{lucene.version}} sysprop set in the {{common-build.xml}} macro 
{{test-macro}}, and {{lucene.version}} is defined as {{dev.version}}, which is 
never overridden by the {{-Dversion=...}} sysprop cmdline override.  

So this assert will compare {{Constants.LUCENE_VERSION}} "5.0-SNAPSHOT" to 
{{version}}/{{lucene.version}}/{{dev.version}} "5.0-SNAPSHOT" and succeed.


> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752741#comment-13752741
 ] 

Steve Rowe commented on LUCENE-5190:


[~thetaphi‍], the assert following the one you changed in that test will also 
fail, won't it?

{code:java}
assertTrue(Constants.LUCENE_VERSION + " should start with: "+version,
   Constants.LUCENE_VERSION.startsWith(version));
{code}

{{Constants.LUCENE_VERSION}} in trunk will be "5.0-SNAPSHOT", and under clover 
{{version}} will be like "5.0-2013-08-11_15-22-48".  Maybe strip trailing stuff 
after "-" from {{version}} before making this comparison?

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752733#comment-13752733
 ] 

Uwe Schindler edited comment on LUCENE-5190 at 8/28/13 7:01 PM:


This should fix the bug. This only affects Clover builds because of the 
following:
- The hourly Jenkins builds don't change the version string, because artifacts 
are not archived and Javadocs are not archived either (it's just heavy testing).
- The nightly artifact builds change the version to contain the Jenkins build 
timestamp/build number. But when building nightly artifacts we don't run tests, 
so this is not triggered.
- The Clover builds run tests, but they have to change the build number like 
the nightly artifact builds, because the version number is part of the archived 
artifacts (in that case the Clover HTML pages). So those should contain (like 
the Javadocs) a build number.

We should change this to not only allow "\-SNAPSHOT", but instead accept any 
version that is equal to the expected build version or has an arbitrary 
suffix starting with "\-" appended.

This would also allow others to build and run artifacts with customized 
versions, like "Lucene 4.5.0-dfsg-ubuntu1" (for the Debian guys among us).

The patch changes the assert to allow this.

  was (Author: thetaphi):
This should fix the bug. This only affects clover builds because the 
following:
- The hourly Jenkins builds don't change the version string, because artifacts 
are not archived, Javadocs are also not archived (its just heavy testing)
- The nightly Artifact builds change the version to contain the Jenkins Build 
Timestamp/Build Number. But when building nightly artifacts we don't run tests, 
so this is not triggered
- The Clover builds run tests, but they have to change the build number like 
the nightly artifact builds, because the version number is part of the archived 
artifacts (in that case the Clover HTML pages). So those should contain (like 
Javadocs) a build number.

We should change this to not only allow "-SNAPSHOT", but instead allow any 
version that is equal to expected build version or has anything starting with 
"-" appended.

This would allow others also to build and run artifacts with customized 
versions, like "Lucene 4.5.0-dfsg-ubuntu1" (for the Debian guys among us).

The patch changes the assert to allow this.
  
> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports I noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5190:
--

Attachment: LUCENE-5190.patch

This should fix the bug. It only affects the Clover builds because of the following:
- The hourly Jenkins builds don't change the version string, because artifacts 
are not archived and Javadocs are also not archived (it's just heavy testing)
- The nightly Artifact builds change the version to contain the Jenkins Build 
Timestamp/Build Number. But when building nightly artifacts we don't run tests, 
so this is not triggered.
- The Clover builds run tests, but they have to change the build number like 
the nightly artifact builds, because the version number is part of the archived 
artifacts (in that case the Clover HTML pages). So those should contain (like 
the Javadocs) a build number.

We should change this to not only allow "-SNAPSHOT", but instead allow any 
version that is equal to the expected build version or has any suffix starting 
with "-" appended.

This would also allow others to build and run artifacts with customized 
versions, like "Lucene 4.5.0-dfsg-ubuntu1" (for the Debian guys among us).

The patch changes the assert to allow this.
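The relaxed check described above can be sketched as follows (a minimal illustration with hypothetical class and method names, not the actual TestCheckIndex/Constants code):

```java
// Hedged sketch of the relaxed version assertion: accept either the exact
// expected version, or the expected version followed by any "-" suffix,
// e.g. "5.0-SNAPSHOT", "5.0-2013-08-11_15-22-48", or "4.5.0-dfsg-ubuntu1".
public final class VersionCheck {
    public static boolean isValidVersion(String actual, String expected) {
        return actual.equals(expected) || actual.startsWith(expected + "-");
    }
}
```

With this shape, the assert passes for Jenkins timestamped versions and Debian-style suffixes alike, while still rejecting a genuinely different version number.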

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
> Attachments: LUCENE-5190.patch
>
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports i noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5190:
-

Assignee: Uwe Schindler

> Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins 
> trunk clover build
> 
>
> Key: LUCENE-5190
> URL: https://issues.apache.org/jira/browse/LUCENE-5190
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Uwe Schindler
>
> I'm out of the loop on how clover is run, and how the build system sets up the 
> version params, but looking at the coverage reports i noticed that the trunk 
> clover build seems to have been failing consistently for a while -- some 
> sporadic test failures, but one consistent failure smells like it has to do 
> with a build configuration problem...
> {noformat}
> java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
>   at 
> __randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at org.junit.Assert.assertTrue(Assert.java:43)
>   at 
> org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
>   at 
> org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-28 Thread Mike Schrag (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752699#comment-13752699
 ] 

Mike Schrag commented on SOLR-5081:
---

I think we tracked this down on our side. While testing another part of the 
system, we noticed that we had SYN flood warnings in the system logs. I believe 
the kernel was blocking traffic to the Solr port once it decided that Hadoop 
was attacking it. After turning off net.ipv4.tcp_syncookies and increasing 
net.ipv4.tcp_max_syn_backlog, the problem seems to have gone away. This also 
explains why I was still able to connect to Solr and insert from another 
machine even when access from the Hadoop cluster died.
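For reference, the kernel tuning described above amounts to something like the following (the backlog value here is illustrative; the report doesn't state the exact numbers used):

```shell
# Disable SYN cookies so the kernel stops treating the Hadoop indexing
# burst as a SYN flood attack (trade-off: real SYN floods are no longer
# mitigated on this host).
sysctl -w net.ipv4.tcp_syncookies=0

# Allow more half-open connections to queue on the Solr port before the
# kernel starts dropping new connection attempts.
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
```

To persist across reboots, these settings would typically go in /etc/sysctl.conf.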

> Highly parallel document insertion hangs SolrCloud
> --
>
> Key: SOLR-5081
> URL: https://issues.apache.org/jira/browse/SOLR-5081
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3.1
>Reporter: Mike Schrag
> Attachments: threads.txt
>
>
> If I do a highly parallel document load using a Hadoop cluster into an 18 
> node solrcloud cluster, I can deadlock solr every time.
> The ulimits on the nodes are:
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 1031181
> max locked memory   (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files  (-n) 32768
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 10240
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 515590
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> The open file count is only around 4000 when this happens.
> If I bounce all the servers, things start working again, which makes me think 
> this is Solr and not ZK.
> I'll attach the stack trace from one of the servers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5190) Consistent failure of TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build

2013-08-28 Thread Hoss Man (JIRA)
Hoss Man created LUCENE-5190:


 Summary: Consistent failure of 
TestCheckIndex.testLuceneConstantVersion in jenkins trunk clover build
 Key: LUCENE-5190
 URL: https://issues.apache.org/jira/browse/LUCENE-5190
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Hoss Man


I'm out of the loop on how clover is run, and how the build system sets up the 
version params, but looking at the coverage reports i noticed that the trunk 
clover build seems to have been failing consistently for a while -- some 
sporadic test failures, but one consistent failure smells like it has to do 
with a build configuration problem...

{noformat}
java.lang.AssertionError: Invalid version: 5.0-2013-08-11_15-22-48
at 
__randomizedtesting.SeedInfo.seed([648EC34D8642C547:A7103483A05D2588]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.lucene.index.TestCheckIndex.__CLR3_1_10l79zdz2ior(TestCheckIndex.java:132)
at 
org.apache.lucene.index.TestCheckIndex.testLuceneConstantVersion(TestCheckIndex.java:118)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-28 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752659#comment-13752659
 ] 

Hoss Man commented on SOLR-5084:


bq. And I still would really like it if we didn't need a separate XML file for 
each enumerated type: its like a parallel schema.xml: I think it would be much 
better if we could nest this underneath the fieldtype.

it would be nice, but as far as I know there is no way for a FieldType to do 
this -- making this FieldType use an attribute to refer to another file (just 
like ExternalFileField does, or StopWordsFilterFactory, or 
SynonymFilterFactory, etc...) seems like a suitable approach for now, and 
if/when someone enhances FieldType configuration in general, it can be 
revisited.  (ie: it doesn't seem fair to Elran to object to this patch/feature 
given that he's working with the APIs available)

bq. Finally, I still think the ordinals should be implicit in the list (as i 
mentioned before). This way the thing can actually be efficient.

I agree that it makes sense to require that the ordinals be "dense" (ie: start 
at 0, no gaps allowed).

But in my opinion, from a usability standpoint, it's actually better to force 
the Solr admin writing the config to be explicit about the numeric mappings, so 
that they *have* to be aware of the fact that a specific numeric value is used 
under the covers (ie: in the indexed/docValues fields) for each value that the 
end users see.  It seems like this will help minimize the risk of someone 
assuming that only the "labels" matter in the configs and that they can insert 
new ones to get the sorting they want.

Example:

If the config looked like this...

{noformat}
<enum name="severity">
  <value>LOW</value>
  <value>HIGH</value>
</enum>
{noformat}

...then a user might not realize there is anything wrong with making the 
following additions w/o re-indexing...

{noformat}
<enum name="severity">
  <value>NONE</value>
  <value>LOW</value>
  <value>MEDIUM</value>
  <value>HIGH</value>
</enum>
{noformat}

...and if they did that they would silently get bogus results -- no obvious 
error at runtime.

As long as the config forces them to be explicit about the values (and has 
error checking at startup that the values start at "0" and are monotonically 
increasing ints), then anyone who wants to "insert" values into their config is 
going to have to pause and think about the fact that there is a concrete int 
associated with the existing values -- and is more likely to realize that 
changing those ints has consequences.
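The startup check proposed above could look roughly like this (a hypothetical sketch with made-up names, not the SOLR-5084 patch):

```java
import java.util.List;

// Hedged sketch of the "dense ordinals" validation: explicit enum ordinals
// must start at 0 and increase by exactly 1, so inserting a new label
// without re-indexing fails loudly at startup instead of silently
// producing bogus sort results.
public final class EnumConfigCheck {
    /** ordinals.get(i) is the explicit int configured for the i-th label. */
    public static void validate(List<Integer> ordinals) {
        for (int i = 0; i < ordinals.size(); i++) {
            if (ordinals.get(i) != i) {
                throw new IllegalArgumentException(
                    "enum ordinals must be dense and start at 0: expected "
                        + i + " but got " + ordinals.get(i));
            }
        }
    }
}
```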


> new field type - EnumField
> --
>
> Key: SOLR-5084
> URL: https://issues.apache.org/jira/browse/SOLR-5084
> Project: Solr
>  Issue Type: New Feature
>Reporter: Elran Dvir
> Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
> Solr-5084.patch, Solr-5084.patch, Solr-5084.patch
>
>
> We have encountered a use case in our system where we have a few fields 
> (Severity. Risk etc) with a closed set of values, where the sort order for 
> these values is pre-determined but not lexicographic (Critical is higher than 
> High). Generically this is very close to how enums work.
> To implement, I have prototyped a new type of field: EnumField where the 
> inputs are a closed predefined  set of strings in a special configuration 
> file (similar to currency.xml).
> The code is based on 4.2.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0-ea-b102) - Build # 3201 - Failure!

2013-08-28 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3201/
Java: 32bit/jdk1.8.0-ea-b102 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.store.TestRateLimiter.testPause

Error Message:
we should sleep less than 2 seconds but did: 2563 millis

Stack Trace:
java.lang.AssertionError: we should sleep less than 2 seconds but did: 2563 
millis
at 
__randomizedtesting.SeedInfo.seed([953CB120BDE76C26:F39CF51E36CA3520]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.lucene.store.TestRateLimiter.testPause(TestRateLimiter.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:491)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:724)




Build Log:
[...truncated 510 lines...]
   [junit4] Suite: org.apache.lucene.store.TestRateLimiter
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRateLimiter 
-Dtests.method=testPause -Dtests.seed=953CB120BDE76C26 -Dtests.slow=true 
-Dtests.locale=uk_UA -Dtests.timezone=Australia/North 
-Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 2.61s | TestRateLimiter.testPause <<<
   [junit4]> Throwable #1: java.lang.AssertionError: we should sleep less 

[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Patch addresses Rob's idea:

* ReaderAndLiveDocs and SegCoreReaders set segmentSuffix to docValuesGen and 
also set SegReadState.directory accordingly (CFS or si.info.dir if dvGen != -1).

* All the changes to DVFormat were removed (including the 45Producer/Consumer). 
I had to fix a bug in PerFieldDVF which ignored state.segmentSuffix (and also 
resolved a TODO on the way, since it now respects it).

* Removed the nocommit in ReaderAndLiveDocs about letting 
TrackingDirWrapper forbid createOutput on a file which is referenced by a 
commit, since now Codecs are not aware of dvGen at all. As long as they don't 
ignore segmentSuffix (which they'd better not, otherwise they're broken), they 
can be safely upgraded to support DVUpdate. We can still do that, though, under 
a separate issue, as another safety mechanism.

I wanted to get rid of the nocommit in TestNumericDocValuesUpdates which sets 
the default Codec to Lucene45, since now presumably all Codecs should support 
dv-update. But when the test ran with Lucene40 (I haven't tried other codecs; 
it's the first one that failed), I hit an exception as if it was trying to 
write to the same CFS file. Looking at Lucene40DVF.fieldsProducer, I see that 
it defaults to the CFS extension, and Lucene40DVWriter uses a hard-coded 
segmentSuffix="dv" and ignores state.segmentSuffix. I guess that the actual 
Codec that was used is Lucene40RWDocValuesFormat, otherwise fieldsProducer 
should have hit an exception. I didn't know our tests pick "old" codecs at 
random too :). How can I avoid picking the "old" Codecs (40, 42)? I still want 
to test other codecs, such as Asserting, and maybe MemoryDVF (if it's chosen at 
random).

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-08-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752528#comment-13752528
 ] 

Michael McCandless commented on LUCENE-3069:


PostingsReaderBase.pulsed is quite crazy ... really the terms dict
should not need this information, ideally.

Pulsing has no back-compat guarantees, so it's fine to only support
writing the "new" format and being able to read it.  Ie, if this
change is only for impersonation then we shouldn't need to do it, I
think?

Also, this is spooky:

{code}
int start = (int)in.getFilePointer();
{code}

Isn't that unsafe in general?  Ie it could overflow int...
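The concern is easy to demonstrate: a file pointer is a long, and the narrowing cast silently wraps once it exceeds Integer.MAX_VALUE (an illustration of the hazard, not Lucene code):

```java
// Hedged illustration of the overflow concern: once a file grows past
// 2^31 - 1 bytes, casting its file pointer to int wraps around to a
// negative value instead of failing.
public final class FilePointerOverflow {
    public static void main(String[] args) {
        long fp = 2_147_483_648L;   // 2 GiB: one past Integer.MAX_VALUE
        int start = (int) fp;       // narrowing cast wraps silently
        System.out.println(start);  // prints -2147483648
    }
}
```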


> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752462#comment-13752462
 ] 

Robert Muir commented on LUCENE-5189:
-

I think it's true we can tackle this in a separate issue, but I'd rather 
have SR/SCR just pass the correct directory always in the 
segmentreadstate/segmentwritestate to the different codec components (e.g. 
segmentreadstate.dir is always the 'correct' directory the codec component 
should use, and even when CFS is enabled, livedocsformat always gets the inner 
one, and so on).

It's ok if we want to have the 'inner dir' accessible in SegmentInfo for SR/SCR 
to do this: like we could make it package private and everything would then 
just work?

This would greatly reduce the possibility of mistakes. I think having CFS fall 
back on its inner directory on FileNotFoundException is less desirable.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752456#comment-13752456
 ] 

Shai Erera commented on LUCENE-5189:


The problem is that there are two Directories, and the logic of where the file 
is read from depends on whether it's gen'd or not (so far it has been only 
livedocs). Maybe what we can do is have CFS fall back to directory.openInput if 
the file does not exist? We can do that in a separate issue. If we fix that, 
then I think we might really be able to "hide" the gen from the Codec cleanly. 
Actually, if the fix is that simple (CFD.openInput falling back to 
dir.openInput), I can do it as part of this issue; it's small enough?
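A minimal sketch of that fallback (hypothetical interface and names, not the real Lucene CompoundFileDirectory API):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hedged sketch: try the compound file first, and fall back to the outer
// directory when the file lives outside the CFS (e.g. a gen'd docvalues
// update file written after the segment was packed).
public final class FallbackInput {
    public interface Dir {
        byte[] openInput(String name) throws IOException;
    }

    public static byte[] openWithFallback(Dir cfs, Dir outer, String name)
            throws IOException {
        try {
            return cfs.openInput(name);
        } catch (FileNotFoundException e) {
            return outer.openInput(name);  // not in the CFS: read from outer dir
        }
    }
}
```

The downside, as discussed, is that a genuine missing-file bug is silently retried against the outer directory instead of failing fast.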

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Bug or backcompat example: Solr example/multicore/solr.xml in legacy format?

2013-08-28 Thread Mark Miller
I have an old JIRA where I started working on this, but I cannot find it.

There has been no need for the multicore example for years now. I did a bunch 
of work taking it out at one point, but I'm sure that work is old enough to be 
useless now. Never got around to committing it.

A few tests tie into those configs, and I think there was some other flotsam 
and jetsam to clean up.

- Mark

On Aug 27, 2013, at 6:10 PM, Erick Erickson  wrote:

> bq:  I think we should just get rid of it entirely
> 
> +1, especially since we're going to core discovery, the collections API, etc.
> 
> FWIW,
> Erick
> 
> 
> On Tue, Aug 27, 2013 at 3:42 PM, Shawn Heisey  wrote:
> On 8/27/2013 11:24 AM, Jack Krupansky wrote:
> I just happened to notice that the solr.xml file in the Solr
> example/multicore in branch_4x (and 4.4 as well) is still in the old
> legacy format (with <cores>/<core>). Is that merely an oversight or
> intentional for demonstrating backwards compatibility?
> 
> The example/multicore directory seems to be generally very out of date. The 
> schema uses an ancient version, and doesn't have any good examples of how to 
> use analyzers effectively.  I'm fairly sure that all the examples use 
> solr.xml and are therefore inherently multicore.
> 
> Unless we plan to thoroughly update the multicore example so it's as modern 
> as the main example, I think we should just get rid of it entirely.
> 
> If we need an example that uses legacy config methods, I think we should make 
> a new subdirectory.  It should come with an extensive README and the 
> solrconfig/schema should be more heavily commented than the standard example.
> 
> Thanks,
> Shawn
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 



[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Renamed fieldInfosGen to docValuesGen. I think I got all the places fixed.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes are immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons to that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and on itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752438#comment-13752438
 ] 

Robert Muir commented on LUCENE-5189:
-

By the way, if we want to fix the state.directory vs. state.segmentInfo.dir 
issue, I would love to help with this; it's bothered me forever.





[jira] [Commented] (SOLR-5198) Make default similarty configurable

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752433#comment-13752433
 ] 

Robert Muir commented on SOLR-5198:
---

This isn't really true.

If you want to use a different default similarity, use it instead of 
SchemaSimilarityFactory.

If you want per-field support but done differently, just write your own 
core-aware SimilarityFactory (SchemaSimilarityFactory is not special; it's just 
an example we provide).

There are too many possibilities for things people want to do, and too many 
traps. Plugging in your own factory for expert use cases is really the way to 
go here.
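
For illustration, both hooks mentioned above plug in through schema.xml. This is a sketch against the Solr 4.x schema syntax; the field type is an invented example:

```xml
<!-- Per-field scoring: declare a similarity inside the fieldType; this takes
     effect when the global factory (below) consults the schema. -->
<fieldType name="text_bm25" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory"/>
</fieldType>

<!-- Global similarity: use solr.SchemaSimilarityFactory for per-field support,
     or name your own core-aware factory here to control the default. -->
<similarity class="solr.SchemaSimilarityFactory"/>
```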

> Make default similarity configurable
> ---
>
> Key: SOLR-5198
> URL: https://issues.apache.org/jira/browse/SOLR-5198
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.4
>Reporter: HeXin
>Priority: Minor
> Fix For: 4.5, 5.0
>
>
>   Though the code supports customizing scoring on a per-field basis 
> using a <similarity> element in a schema's fieldType, and 
> we can configure our custom similarity factory in the schema, we can't configure 
> the default similarity: it is hardcoded in SchemaSimilarityFactory. 
>   If we want to use another similarity as the default instead of the 
> DefaultSimilarity provided by Lucene, we must write another similarity 
> factory to do this. Therefore, it is necessary to make the default similarity 
> configurable. 
>   Any comments are welcome. 




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752436#comment-13752436
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
Maybe for writing it can work, but the producer needs to know from which 
Directory to read the file, e.g. if it's CFS, the gen'd files are written 
outside. 
{quote}

Wait... we shouldn't design around this bug, though (and it is an API bug). The 
problem you point out is definitely existing bogusness: I think we should fix 
it instead, so that a codec gets a single "directory" and doesn't need to know 
or care what its impl is, or whether it's got TrackingDirectoryWrapper or CFS 
or whatever around it.





[jira] [Created] (SOLR-5198) Make default similarty configurable

2013-08-28 Thread HeXin (JIRA)
HeXin created SOLR-5198:
---

 Summary: Make default similarity configurable
 Key: SOLR-5198
 URL: https://issues.apache.org/jira/browse/SOLR-5198
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.4
Reporter: HeXin
Priority: Minor
 Fix For: 4.5, 5.0


  Though the code supports customizing scoring on a per-field basis 
using a <similarity> element in a schema's fieldType, and 
we can configure our custom similarity factory in the schema, we can't configure 
the default similarity: it is hardcoded in SchemaSimilarityFactory. 

  If we want to use another similarity as the default instead of the 
DefaultSimilarity provided by Lucene, we must write another similarity 
factory to do this. Therefore, it is necessary to make the default similarity 
configurable. 

  Any comments are welcome. 




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752424#comment-13752424
 ] 

Shai Erera commented on LUCENE-5189:


bq. Can we refer to this consistently as docValuesGen

Yes, I think that makes sense. At some point I supported this by gen'ing 
FieldInfos, hence the name, but things have changed since. I'll rename.

bq. Maybe we shouldnt pass this parameter to the codec at all. Instead 
IndexWriter can just put this into the segment suffix and the codec can be 
blissfully unaware? 

Maybe for writing it can work, but the producer needs to know from which 
Directory to read the file; e.g. if it's a CFS, the gen'd files are written 
outside it. I have this code in Lucene45DVProducer:

{code}
final Directory dir;
if (fieldInfosGen != -1) {
  // gen'd files are written outside the CFS, so use the SegmentInfo directory
  dir = state.segmentInfo.dir;
} else {
  dir = state.directory;
}
{code}

I think that if we want to mask this away from the Codec entirely, we should 
somehow tell the Codec the segmentSuffix and the Directory from which to read 
the file. Would another Directory parameter be confusing (since we also have 
one in SegmentReadState)?

bq. I hope we can do this in a cleaner way than 3.x did it for setNorm, that 
was really crazy

Well ... I don't really know how setNorm worked in 3.x, so I'll do what I think 
is right, and you tell me if it's crazy or not? :)





[jira] [Commented] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752426#comment-13752426
 ] 

Michael McCandless commented on LUCENE-5183:


I rather like the "x == false" instead of "!x" as well: it's more explicit / 
readable.

> BinaryDocValues inconsistencies
> ---
>
> Key: LUCENE-5183
> URL: https://issues.apache.org/jira/browse/LUCENE-5183
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5183.patch
>
>
> Some current inconsistencies:
> * Binary/SortedDocValues.EMPTY_BYTES should be removed (BytesRef.EMPTY_BYTES 
> should be used in its place): FieldCache.getDocsWithField should be used to 
> determine missing values. It's fine if FC wants to "back" its Bits by some 
> special placeholder value, but that's its impl detail, not part of the API.
> * The sorting comparator for Binary should either be removed (is this REALLY 
> useful?) or should support missingValue(), and it should support this for 
> SortedDocValues in any case: Solr does it, but Lucene won't allow it except 
> for numerics?!




[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752419#comment-13752419
 ] 

Robert Muir commented on LUCENE-5189:
-

{quote}
That way you can end up with e.g. _0_Lucene45_0_1.dvd and *.dvm for field 'f'
...
I put a nocommit in DVFormat.fieldsConsumer/Producer by adding another variant 
which takes fieldInfosGen.
...
I want to have only one variant of that method, thereby breaking the API. This 
is important IMO cause we need to ensure that whatever custom DVFormats out 
there pay attention to the new fieldInfosGen parameter, or otherwise they might 
overwrite previously created files.
{quote}

Maybe we shouldn't pass this parameter to the codec at all. Instead, IndexWriter 
can just put it into the segment suffix, and the codec can be blissfully 
unaware?
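
As a rough sketch of that idea (illustrative only, not Lucene's actual file-naming code), if IndexWriter folds the gen into the segment suffix it hands the codec, the gen'd file names fall out of ordinary suffix handling:

```java
public class DvFileNames {
  // Sketch of gen-in-suffix naming; not Lucene's actual API. A gen of -1
  // means "no updates yet", yielding the base file name for the format.
  static String dvFileName(String segment, String formatName, int formatId,
                           long gen, String ext) {
    String suffix = formatId + (gen == -1 ? "" : "_" + gen);
    return segment + "_" + formatName + "_" + suffix + "." + ext;
  }

  public static void main(String[] args) {
    // Matches the example elsewhere in the thread: field updated once in _0.
    System.out.println(dvFileName("_0", "Lucene45", 0, 1, "dvd")); // _0_Lucene45_0_1.dvd
  }
}
```

The codec just concatenates the suffix it was given; only IndexWriter would know a generation exists.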

{quote}
SegmentCoreReaders no longer has a single DVConsumer it uses, but rather per 
field it uses the appropriate DVConsumer (depends on the 'gen').

I put a nocommit to remove DVConsumers from SegCoreReaders into a 
RefCount'd object in SegmentReader so that we can keep SegCoreReaders manage 
the 'readers' that are shared between all SegReaders, and also make sure to 
reuse same DVConsumer by multiple SegReaders. I'll do that later.
{quote}

I hope we can do this in a cleaner way than 3.x did it for setNorm; that was 
really crazy :)

{quote}
I put a nocommit in DVFormat.fieldsConsumer/Producer by adding another variant 
which takes fieldInfosGen.
{quote}

Can we refer to this consistently as docValuesGen or something else? (I see the 
patch already does this in some places, but in other places it's called 
fieldInfosGen.) I don't think this should ever be referred to as fieldInfosGen, 
because it's not a generation for the fieldinfos data, and that would be 
horribly scary!






[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752409#comment-13752409
 ] 

Shai Erera commented on LUCENE-5189:


I forgot to mention -- this work started from a simple patch Rob sent me a few 
weeks ago, so thanks Rob! And also, thanks Mike for helping me find my way 
around Lucene core code! At times I felt like I was literally hammering through 
the code to get the thing working ;).





[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189.patch

Patch adds numeric-dv field update capabilities:

* IndexWriter.updateNumericDocValue(term, field, value) updates the value of 
'field' for all documents associated with 'term' to the new 'value'.

* When you update the value of field 'f' for a few documents, a new pair of 
.dvd/.dvm files is created, holding the values of all documents for that field.
** That way you can end up with e.g. _0_Lucene45_0_1.dvd and *.dvm for field 
'f', and the _0.cfs for other fields which were not updated.
** SegmentInfoPerCommit tracks, for each field, in which 'gen' it's recorded, 
and SegmentCoreReaders uses that map to read the values of the field from the 
respective gen.

* TestNumericDocValuesUpdates contains a dozen or so test cases which cover 
different angles, from simple updates to unsetting values, merging segments, 
deletes etc. During development I ran into many interesting scenarios :).

* ReaderAndLiveDocs.writeLiveDocs applies the field updates in addition to the 
deletes. BufferedDeletes tracks the updates, similar to how it tracks deletes.

* SegmentCoreReaders no longer has a single DVConsumer it uses; rather, per 
field it uses the appropriate DVConsumer (depending on the 'gen').
** I put a nocommit to move the DVConsumers out of SegCoreReaders into a 
RefCount'd object in SegmentReader, so that we can keep SegCoreReaders managing 
the 'readers' that are shared between all SegReaders, and also make sure the 
same DVConsumer is reused by multiple SegReaders. I'll do that later.

* Segment merging is supported, in that when a segment with updates is merged, 
the correct values are written to the merged segment and the resulting segment 
has no 'gen' .dvd.

* I put a nocommit in DVFormat.fieldsConsumer/Producer by adding another 
variant which takes fieldInfosGen. The default impl throws 
UnsupportedOperationException, while Lucene45 implements it.
** I want to have only one variant of that method, thereby breaking the API. 
This is important IMO because we need to ensure that whatever custom DVFormats 
are out there pay attention to the new fieldInfosGen parameter; otherwise they 
might overwrite previously created files.
** There is also a nocommit touching that, with a suggestion to forbid the 
createOutput call in TrackingDir if the file is already referenced by an 
IndexCommit.
** It is important that we break something here so that users/apps pay 
attention to the new feature -- suggestions are welcome!

A few remarks:

* For now, only updating by a single term is supported (simplicity).
* You cannot add a new field through a field update, only update existing 
fields. That is a 'schema' change, and there are other ways to do it, e.g. 
through addIndexes and FilterAtomicReader. Attempting to support it means that 
we would need to create a gen'd FieldInfosFormat too, which complicates matters.
* I dropped some nocommits about renaming classes/methods. I didn't want to do 
that yet, because it creates an unnecessarily bloated patch. Feel free to 
comment; we can take care of the renames later.
* I will probably create a branch for this feature, because there are some 
things that still need to be taken care of (add some tests, finish Codecs 
support etc.).
* Also, I haven't yet benchmarked the effect of field updates on 
indexing/search ... I will get to it at some point, but if someone wants to 
help, I promise not to say no :).

I may have forgotten to describe some changes; feel free to ask for 
clarification!
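
The update model described above (rewrite the field's whole per-segment column under a new gen, the way livedocs are rewritten, instead of patching files in place) can be sketched without Lucene. Everything below is a toy stand-in; only the method name mirrors the patch's IndexWriter.updateNumericDocValue:

```java
import java.util.*;

// Toy model of per-field generational numeric doc values (not Lucene code).
public class ToyNumericDVUpdates {
  final int maxDoc;
  final Map<String, List<Integer>> postings;              // term -> doc ids
  final Map<String, Integer> fieldGen = new HashMap<>();  // field -> latest gen
  final Map<String, long[]> genColumns = new HashMap<>(); // "field/gen" -> values

  ToyNumericDVUpdates(int maxDoc, Map<String, List<Integer>> postings,
                      Map<String, long[]> initial) {
    this.maxDoc = maxDoc;
    this.postings = postings;
    for (Map.Entry<String, long[]> e : initial.entrySet()) {
      fieldGen.put(e.getKey(), 0);
      genColumns.put(e.getKey() + "/0", e.getValue().clone());
    }
  }

  // All docs matching 'term' get 'value'; every other doc keeps its old value,
  // and the full column is rewritten under a fresh gen -- the previous gen's
  // column is left untouched, like a livedocs generation.
  void updateNumericDocValue(String term, String field, long value) {
    int oldGen = fieldGen.get(field);
    long[] column = genColumns.get(field + "/" + oldGen).clone();
    for (int doc : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
      column[doc] = value;
    }
    genColumns.put(field + "/" + (oldGen + 1), column);
    fieldGen.put(field, oldGen + 1);
  }

  // Readers resolve a field through its latest gen, as SegmentCoreReaders
  // does with the per-field gen map described above.
  long getValue(String field, int doc) {
    return genColumns.get(field + "/" + fieldGen.get(field))[doc];
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> postings = new HashMap<>();
    postings.put("id:1", Arrays.asList(1));
    Map<String, long[]> initial = new HashMap<>();
    initial.put("price", new long[] {10, 20, 30});
    ToyNumericDVUpdates seg = new ToyNumericDVUpdates(3, postings, initial);
    seg.updateNumericDocValue("id:1", "price", 99);
    System.out.println(seg.getValue("price", 1)); // value for doc 1 after update
  }
}
```

The cost trade-off is the same one noted for livedocs: one update to one document rewrites the full column for that field, but old generations stay immutable, which keeps open readers and commits consistent.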


[jira] [Commented] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752376#comment-13752376
 ] 

Robert Muir commented on LUCENE-5183:
-

That's intentional: when there is a complex boolean expression, I do this on 
purpose to make it more readable and the intent and precedence clear.

I don't see a benefit to using ! here; it only makes code more difficult to 
read. I generally avoid it entirely these days.
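
A minimal illustration of the style point, using java.util.BitSet as a stand-in for Lucene's Bits:

```java
import java.util.BitSet;

public class NegationStyle {
  // The two forms are equivalent; the explicit comparison just keeps the
  // negation visible inside a longer compound condition.
  static boolean missing(BitSet docsWithField, int doc) {
    return docsWithField.get(doc) == false;
  }

  public static void main(String[] args) {
    BitSet docsWithField = new BitSet();
    docsWithField.set(1); // doc 1 has a value; doc 0 does not
    int doc = 0;
    // Explicit comparison: the negation is hard to miss.
    boolean missingExplicit = docsWithField.get(doc) == false && doc < 100;
    // Bang operator: same result, but easy to overlook as the expression grows.
    boolean missingBang = !docsWithField.get(doc) && doc < 100;
    System.out.println(missingExplicit && missingBang); // prints "true"
  }
}
```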






[jira] [Commented] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752369#comment-13752369
 ] 

Adrien Grand commented on LUCENE-5183:
--

Patch looks good to me too. Can we just replace the occurrences of 
{{docsWithField.get(doc) == false}} with {{!docsWithField.get(doc)}}?





[jira] [Commented] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752364#comment-13752364
 ] 

Jack Krupansky commented on LUCENE-5183:


What's the Fix Version here? 4.5 as well as 5.0?

Is there any backcompat issue with a 4.4 index that has 
BinaryDocValues.EMPTY_BYTES?






Make default similarity configurable

2013-08-28 Thread HeXin
hi, 
  Though the code supports customizing scoring on a per-field basis 
using a <similarity> element in a schema's fieldType, and 
we can configure our custom similarity factory in the schema, we can't configure 
the default similarity: it is hardcoded in SchemaSimilarityFactory. 


If we want to use another similarity as the default instead of the 
DefaultSimilarity provided by Lucene, 
we must write another similarity factory to do this. Therefore, I think we can 
make the default similarity configurable. 


Any comments are welcome. 


HeXin

[jira] [Created] (SOLR-5197) SolrCloud: 500 error with combination of debug and group in distributed search

2013-08-28 Thread Sannier Elodie (JIRA)
Sannier Elodie created SOLR-5197:


 Summary: SolrCloud: 500 error with combination of debug and group 
in distributed search
 Key: SOLR-5197
 URL: https://issues.apache.org/jira/browse/SOLR-5197
 Project: Solr
  Issue Type: Bug
Reporter: Sannier Elodie
Priority: Minor


With SolrCloud 4.4.0 and two shards, grouping on a field while using the 
"debug" parameter in distributed mode produces a 500 error.

http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity&debug=true
(the same happens with debug=timing, debug=query or debug=results)



Response: status 500, QTime 109; echoed params: q=*:*, 
group.field=popularity, group=true, debug=true. The reported error:

Server at http://10.76.76.157:8983/solr/collection1 returned non ok status:500, 
message:Server Error

org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at 
http://10.76.76.157:8983/solr/collection1 returned non ok status:500, 
message:Server Error at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)
 at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at 
java.util.concurrent.FutureTask.run(FutureTask.java:166) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at 
java.util.concurrent.FutureTask.run(FutureTask.java:166) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:679)




see http://markmail.org/thread/gauat2zdkxm6ldjx




[jira] [Created] (LUCENE-5189) Numeric DocValues Updates

2013-08-28 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5189:
--

 Summary: Numeric DocValues Updates
 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera


In LUCENE-4258 we started to work on incremental field updates; however, the 
amount of changes is immense and hard to follow/consume. The reason is that we 
targeted postings, stored fields, DV etc., all from the get-go.

I'd like to start afresh here, with numeric-dv-field updates only. There are a 
couple of reasons for that:

* NumericDV fields should be easier to update if, e.g., we write all the values 
of all the documents in a segment for the updated field (similar to how 
livedocs work, and previously norms).

* It's a fairly contained issue, attempting to handle just one data type to 
update, yet it requires many changes to core code which will also be useful for 
updating other data types.

* It has value in and of itself, and we don't need to allow updating all the 
data types in Lucene at once ... we can do that gradually.

I have a working patch already, which I'll upload next, explaining the 
changes.




[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 57376 - Failure!

2013-08-28 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/57376/

No tests ran.

Build Log:
[...truncated 14 lines...]
ERROR: Failed to update http://svn.apache.org/repos/asf/lucene/dev/trunk
org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS 
/repos/asf/lucene/dev/trunk failed
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:379)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:364)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:352)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:708)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:628)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:103)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1018)
at 
org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getRepositoryUUID(DAVRepository.java:148)
at 
org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:339)
at 
org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:328)
at 
org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.update(SVNUpdateClient16.java:482)
at 
org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:364)
at 
org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:274)
at 
org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:27)
at 
org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:11)
at 
org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:20)
at 
org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1235)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:291)
at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:311)
at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:291)
at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:387)
at 
hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:157)
at 
hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:153)
at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:903)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:884)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:867)
at hudson.FilePath.act(FilePath.java:906)
at hudson.FilePath.act(FilePath.java:879)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:843)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:781)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1394)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:676)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:581)
at hudson.model.Run.execute(Run.java:1593)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:247)
Caused by: svn: E175002: OPTIONS /repos/asf/lucene/dev/trunk failed
at 
org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
at 
org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:154)
at 
org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:97)
... 38 more
Caused by: org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS request 
failed on '/repos/asf/lucene/dev/trunk'
svn: E175002: unknown host
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:64)
at 
org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:51)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:754)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373)
... 37 more
Caused by: svn: E175002: OPTIONS request failed on '/repos/asf/lucene/dev/trunk'
at 
org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
at 
org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection._request(HTTPConnection.java:752)
... 38 more
Caused by: svn: E175002: unknown host
at 
org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208)
at 
org.tmatesoft.svn.

[jira] [Created] (SOLR-5196) SolrCloud: no "timing" when no result in distributed mode

2013-08-28 Thread Sannier Elodie (JIRA)
Sannier Elodie created SOLR-5196:


 Summary: SolrCloud: no "timing" when no result in distributed mode
 Key: SOLR-5196
 URL: https://issues.apache.org/jira/browse/SOLR-5196
 Project: Solr
  Issue Type: Bug
Reporter: Sannier Elodie
Priority: Minor


With SolrCloud 4.4.0 and two shards, when a query with the "debugQuery=true" 
parameter does not return documents, the "timing" debug information is not 
returned:
curl -sS "http://localhost:8983/solr/select?q=dummy&debugQuery=true" | grep -o 
'.*'

see http://markmail.org/thread/ckgc64ypo3p76gkc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752321#comment-13752321
 ] 

Erick Erickson commented on SOLR-5084:
--

Yeah, if you would. Actually, 4x (eventually 4.5) would be
better, and against trunk would be best. If/when we
apply it we'll merge it into the 4x branch.

But also take a look at Robert's latest comments; he's one
of the people with deep Lucene knowledge.

Best,
Erick

> new field type - EnumField
> --
>
> Key: SOLR-5084
> URL: https://issues.apache.org/jira/browse/SOLR-5084
> Project: Solr
>  Issue Type: New Feature
>Reporter: Elran Dvir
> Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
> Solr-5084.patch, Solr-5084.patch, Solr-5084.patch
>
>
> We have encountered a use case in our system where we have a few fields 
> (Severity, Risk, etc.) with a closed set of values, where the sort order for 
> these values is pre-determined but not lexicographic (Critical is higher than 
> High). Generically, this is very close to how enums work.
> To implement this, I have prototyped a new type of field: EnumField, where the 
> inputs are a closed, predefined set of strings in a special configuration 
> file (similar to currency.xml).
> The code is based on 4.2.1.
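For illustration only, a minimal self-contained Java sketch of the idea (not the actual patch; names are made up): load the closed set of strings in their configured order, and index/sort on the resulting ordinal instead of the raw string, so "Critical" can sort above "High" even though it is lexicographically smaller.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the EnumField idea: a closed set of strings whose sort
 *  order is the position in the config file, not lexicographic order. */
class EnumMapping {
    private final Map<String, Integer> toOrdinal = new HashMap<>();
    private final List<String> byOrdinal;

    EnumMapping(List<String> orderedValues) {  // e.g. loaded from enumsConfig.xml
        this.byOrdinal = orderedValues;
        for (int i = 0; i < orderedValues.size(); i++) {
            toOrdinal.put(orderedValues.get(i), i);
        }
    }

    /** Index and sort on this integer instead of the raw string. */
    int ordinal(String value) {
        Integer ord = toOrdinal.get(value);
        if (ord == null) {
            throw new IllegalArgumentException("unknown enum value: " + value);
        }
        return ord;
    }

    /** Map the stored ordinal back to the display string at query time. */
    String value(int ordinal) {
        return byOrdinal.get(ordinal);
    }
}
```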

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Make default similarity configurable

2013-08-28 Thread HeXin
hi, 
Though the code supports customizing scoring on a per-field basis via the 
similarity configuration in a schema's fieldType, and we can configure our 
custom similarity factory in the schema, we can't configure the default 
similarity; it is hardcoded in SchemaSimilarityFactory. 


If we want to use another similarity as the default instead of the 
DefaultSimilarity provided by Lucene, 
we must write another similarity factory to do this. Therefore, I think we 
can make the default similarity configurable. 


Any comments are welcome. 


HeXin





[jira] [Commented] (SOLR-5084) new field type - EnumField

2013-08-28 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752306#comment-13752306
 ] 

Elran Dvir commented on SOLR-5084:
--

Hi Erick,

I developed the feature with Solr 4.2.1 source code.
I can create a similar patch based on Solr 4.4. Do you want me to create and 
attach it?

Thanks.

> new field type - EnumField
> --
>
> Key: SOLR-5084
> URL: https://issues.apache.org/jira/browse/SOLR-5084
> Project: Solr
>  Issue Type: New Feature
>Reporter: Elran Dvir
> Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, 
> Solr-5084.patch, Solr-5084.patch, Solr-5084.patch
>
>
> We have encountered a use case in our system where we have a few fields 
> (Severity, Risk, etc.) with a closed set of values, where the sort order for 
> these values is pre-determined but not lexicographic (Critical is higher than 
> High). Generically, this is very close to how enums work.
> To implement this, I have prototyped a new type of field: EnumField, where the 
> inputs are a closed, predefined set of strings in a special configuration 
> file (similar to currency.xml).
> The code is based on 4.2.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5190) SolrEntityProcessor substitutes variables only once in child entities

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752305#comment-13752305
 ] 

ASF subversion and git services commented on SOLR-5190:
---

Commit 1518165 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1518165 ]

SOLR-5190: SolrEntityProcessor substitutes variables only once in child entities

> SolrEntityProcessor substitutes variables only once in child entities
> -
>
> Key: SOLR-5190
> URL: https://issues.apache.org/jira/browse/SOLR-5190
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.4
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5190.patch
>
>
> As noted by users on the mailing list and elsewhere, SolrEntityProcessor 
> cannot be used in a child entity because it substitutes variables only once.
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg88002.html
> http://stackoverflow.com/questions/15734308/solrentityprocessor-is-called-only-once-for-sub-entities?lq=1
> SOLR-3336 attempted to fix the problem by moving variable substitution to the 
> doQuery method but that fix is not complete because the doQuery method is 
> called only once.
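A toy Java sketch of the intended behavior (illustrative only, not DataImportHandler code; names are invented): a child entity's query template must be re-resolved for every parent row, which is exactly what breaks when substitution happens only once.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of the SOLR-5190 fix: a child-entity query template such as
 *  "id:${parent.id}" must be re-substituted for each parent row. */
class ChildEntity {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");
    private final String queryTemplate;

    ChildEntity(String queryTemplate) {
        this.queryTemplate = queryTemplate;
    }

    /** Correct behavior: resolve variables on every invocation. */
    String buildQuery(Map<String, String> parentRow) {
        Matcher m = VAR.matcher(queryTemplate);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // look up the variable in the *current* parent row
            m.appendReplacement(sb, Matcher.quoteReplacement(parentRow.get(m.group(1))));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

The bug is equivalent to calling buildQuery once and reusing its result for every subsequent parent row.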

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5190) SolrEntityProcessor substitutes variables only once in child entities

2013-08-28 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-5190.
-

Resolution: Fixed

> SolrEntityProcessor substitutes variables only once in child entities
> -
>
> Key: SOLR-5190
> URL: https://issues.apache.org/jira/browse/SOLR-5190
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.4
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5190.patch
>
>
> As noted by users on the mailing list and elsewhere, SolrEntityProcessor 
> cannot be used in a child entity because it substitutes variables only once.
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg88002.html
> http://stackoverflow.com/questions/15734308/solrentityprocessor-is-called-only-once-for-sub-entities?lq=1
> SOLR-3336 attempted to fix the problem by moving variable substitution to the 
> doQuery method but that fix is not complete because the doQuery method is 
> called only once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5190) SolrEntityProcessor substitutes variables only once in child entities

2013-08-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752302#comment-13752302
 ] 

ASF subversion and git services commented on SOLR-5190:
---

Commit 1518161 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1518161 ]

SOLR-5190: SolrEntityProcessor substitutes variables only once in child entities

> SolrEntityProcessor substitutes variables only once in child entities
> -
>
> Key: SOLR-5190
> URL: https://issues.apache.org/jira/browse/SOLR-5190
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.4
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5190.patch
>
>
> As noted by users on the mailing list and elsewhere, SolrEntityProcessor 
> cannot be used in a child entity because it substitutes variables only once.
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg88002.html
> http://stackoverflow.com/questions/15734308/solrentityprocessor-is-called-only-once-for-sub-entities?lq=1
> SOLR-3336 attempted to fix the problem by moving variable substitution to the 
> doQuery method but that fix is not complete because the doQuery method is 
> called only once.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752274#comment-13752274
 ] 

Michael McCandless commented on LUCENE-5183:


+1, patch looks good!

> BinaryDocValues inconsistencies
> ---
>
> Key: LUCENE-5183
> URL: https://issues.apache.org/jira/browse/LUCENE-5183
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5183.patch
>
>
> Some current inconsistencies:
> * Binary/SortedDocValues.EMPTY_BYTES should be removed (BytesRef.EMPTY_BYTES 
> should be used in its place): FieldCache.getDocsWithField should be used to 
> determine missing. That's fine if FC wants to "back" its Bits by some special 
> placeholder value, but that's its impl detail, not part of the API.
> * The sorting comparator for Binary should either be removed (is this REALLY 
> useful?) or should support missingValue(), and it should support this for 
> SortedDocValues in any case: Solr does it, but Lucene won't allow it except 
> for numerics?!
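A self-contained toy sketch of the missing-value point (illustrative, not the patch): "missing" is answered by a docsWithField bit set, so an explicitly empty value and an absent field stay distinguishable without any sentinel like EMPTY_BYTES.

```java
import java.util.BitSet;

/** Toy model of the LUCENE-5183 argument: whether a document has a field
 *  is a question for a docsWithField bit set, not for a sentinel value. */
class BinaryField {
    private final byte[][] values;
    private final BitSet docsWithField = new BitSet();

    BinaryField(int maxDoc) {
        values = new byte[maxDoc][];
    }

    void set(int docId, byte[] value) {
        values[docId] = value;
        docsWithField.set(docId);
    }

    /** A doc with an empty byte[] has the field; an unset doc does not. */
    boolean hasField(int docId) {
        return docsWithField.get(docId);
    }

    byte[] get(int docId) {
        // Returning an empty array for absent docs is an impl detail,
        // not part of the API; callers must consult hasField().
        return hasField(docId) ? values[docId] : new byte[0];
    }
}
```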

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5183) BinaryDocValues inconsistencies

2013-08-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5183:


Attachment: LUCENE-5183.patch

Here is a patch removing the EMPTY_BYTES. I don't care about BINARY at all, but 
this part of the API is bogus and must be removed.

> BinaryDocValues inconsistencies
> ---
>
> Key: LUCENE-5183
> URL: https://issues.apache.org/jira/browse/LUCENE-5183
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5183.patch
>
>
> Some current inconsistencies:
> * Binary/SortedDocValues.EMPTY_BYTES should be removed (BytesRef.EMPTY_BYTES 
> should be used in its place): FieldCache.getDocsWithField should be used to 
> determine missing. That's fine if FC wants to "back" its Bits by some special 
> placeholder value, but that's its impl detail, not part of the API.
> * The sorting comparator for Binary should either be removed (is this REALLY 
> useful?) or should support missingValue(), and it should support this for 
> SortedDocValues in any case: Solr does it, but Lucene won't allow it except 
> for numerics?!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3550) Create example code for core

2013-08-28 Thread Manpreet (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752252#comment-13752252
 ] 

Manpreet commented on LUCENE-3550:
--

Hi Aleksandra -

I have been away from it for a while.

I am resuming my work this week. Sure, I will do that.

Thanks
-Manpreet

> Create example code for core
> 
>
> Key: LUCENE-3550
> URL: https://issues.apache.org/jira/browse/LUCENE-3550
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Shai Erera
>  Labels: newdev
> Attachments: LUCENE-3550.patch, LUCENE-3550-sort.patch
>
>
> Trunk has undergone lots of API changes, some of which are not trivial, and 
> the migration path from 3.x to 4.0 seems hard. I'd like to propose a way 
> to tackle this, by means of live example code.
> The facet module implements this approach. There is live Java code under 
> src/examples that demonstrates some well-documented scenarios. The code itself 
> is documented, in addition to javadoc. Also, the code is unit 
> tested regularly.
> We found it very difficult to keep documentation up-to-date -- javadocs 
> always lag behind, Wiki pages get old, etc. However, when you have live Java 
> code, you're *forced* to keep it up-to-date. It doesn't compile if you break 
> the API, and it fails to run if you change internal impl behavior. If you keep it 
> simple enough, its documentation stays simple too.
> And if we are successful at maintaining it (which we must be, otherwise the 
> build should fail), then people should have an easy experience migrating 
> between releases. So say you take the simple scenario "I'd like to index 
> documents which have the fields ID, date and body". Then you create an 
> example class/method that accomplishes that. And between releases, this code 
> gets updated, and people can follow the changes required to implement that 
> scenario.
> I'm not saying the examples code should always stay optimized. We can aim at 
> that, but I don't try to fool myself thinking that we'll succeed. But at 
> least we can get it compiled and regularly unit tested.
> I think that it would be good if we introduce the concept of examples such 
> that if a module (core, contrib, modules) have an src/examples, we package it 
> in a .jar and include it with the binary distribution. That's for a first 
> step. We can also have meta examples, under their own module/contrib, that 
> show how to combine several modules together (this might even uncover API 
> problems), but that's definitely a second phase.
> At first, let's do the "unit examples" (ala unit tests) and better start with 
> core. Whatever we succeed at writing for 4.0 will only help users. So let's 
> use this issue to:
> # List example scenarios that we want to demonstrate for core
> # Building the infrastructure in our build system to package and distribute a 
> module's examples.
> Please feel free to list here example scenarios that come to mind. We can 
> then track what's been done and what's not. The more we do the better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3550) Create example code for core

2013-08-28 Thread Aleksandra Wozniak (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandra Wozniak updated LUCENE-3550:
---

Attachment: LUCENE-3550-sort.patch

Hi,

recently I started learning the Lucene API, and along the way I created a few 
snippets showing different Lucene features. I found this issue by coincidence 
and decided to rework one of them to fit into the examples implementation – 
I'm sending a patch with my sort example plus a corresponding unit test.

Manpreet, I see that you started working on this issue a while ago – I don't 
want to interfere with your work. You can incorporate my example in your code 
or use it in any other way, if you find it useful.

Cheers,
Aleksandra

> Create example code for core
> 
>
> Key: LUCENE-3550
> URL: https://issues.apache.org/jira/browse/LUCENE-3550
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Shai Erera
>  Labels: newdev
> Attachments: LUCENE-3550.patch, LUCENE-3550-sort.patch
>
>
> Trunk has undergone lots of API changes, some of which are not trivial, and 
> the migration path from 3.x to 4.0 seems hard. I'd like to propose a way 
> to tackle this, by means of live example code.
> The facet module implements this approach. There is live Java code under 
> src/examples that demonstrates some well-documented scenarios. The code itself 
> is documented, in addition to javadoc. Also, the code is unit 
> tested regularly.
> We found it very difficult to keep documentation up-to-date -- javadocs 
> always lag behind, Wiki pages get old, etc. However, when you have live Java 
> code, you're *forced* to keep it up-to-date. It doesn't compile if you break 
> the API, and it fails to run if you change internal impl behavior. If you keep it 
> simple enough, its documentation stays simple too.
> And if we are successful at maintaining it (which we must be, otherwise the 
> build should fail), then people should have an easy experience migrating 
> between releases. So say you take the simple scenario "I'd like to index 
> documents which have the fields ID, date and body". Then you create an 
> example class/method that accomplishes that. And between releases, this code 
> gets updated, and people can follow the changes required to implement that 
> scenario.
> I'm not saying the examples code should always stay optimized. We can aim at 
> that, but I don't try to fool myself thinking that we'll succeed. But at 
> least we can get it compiled and regularly unit tested.
> I think that it would be good if we introduce the concept of examples such 
> that if a module (core, contrib, modules) have an src/examples, we package it 
> in a .jar and include it with the binary distribution. That's for a first 
> step. We can also have meta examples, under their own module/contrib, that 
> show how to combine several modules together (this might even uncover API 
> problems), but that's definitely a second phase.
> At first, let's do the "unit examples" (ala unit tests) and better start with 
> core. Whatever we succeed at writing for 4.0 will only help users. So let's 
> use this issue to:
> # List example scenarios that we want to demonstrate for core
> # Building the infrastructure in our build system to package and distribute a 
> module's examples.
> Please feel free to list here example scenarios that come to mind. We can 
> then track what's been done and what's not. The more we do the better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-SmokeRelease-4.x - Build # 103 - Still Failing

2013-08-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/103/

No tests ran.

Build Log:
[...truncated 34252 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease
 [copy] Copying 416 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/lucene
 [copy] Copying 194 files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/solr
 [exec] JAVA6_HOME is /home/hudson/tools/java/latest1.6
 [exec] JAVA7_HOME is /home/hudson/tools/java/latest1.7
 [exec] NOTE: output encoding is US-ASCII
 [exec] 
 [exec] Load release URL 
"file:/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeRelease/"...
 [exec] 
 [exec] Test Lucene...
 [exec]   test basics...
 [exec]   get KEYS
 [exec] 0.1 MB in 0.01 sec (11.2 MB/sec)
 [exec]   check changes HTML...
 [exec]   download lucene-4.5.0-src.tgz...
 [exec] 27.1 MB in 0.04 sec (679.6 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.5.0.tgz...
 [exec] 49.0 MB in 0.07 sec (687.8 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   download lucene-4.5.0.zip...
 [exec] 58.8 MB in 0.11 sec (547.4 MB/sec)
 [exec] verify md5/sha1 digests
 [exec]   unpack lucene-4.5.0.tgz...
 [exec] verify JAR/WAR metadata...
 [exec] test demo with 1.6...
 [exec]   got 5717 hits for query "lucene"
 [exec] test demo with 1.7...
 [exec]   got 5717 hits for query "lucene"
 [exec] check Lucene's javadoc JAR
 [exec] 
 [exec] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/lucene/build/fakeReleaseTmp/unpack/lucene-4.5.0/docs/core/org/apache/lucene/util/AttributeSource.html
 [exec]   broken details HTML: Method Detail: addAttributeImpl: closing 
"" does not match opening ""
 [exec]   broken details HTML: Method Detail: getAttribute: closing 
"" does not match opening ""
 [exec] Traceback (most recent call last):
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1450, in 
 [exec] main()
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1394, in main
 [exec] smokeTest(baseURL, svnRevision, version, tmpDir, isSigned, 
testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 1431, in smokeTest
 [exec] unpackAndVerify('lucene', tmpDir, artifact, svnRevision, 
version, testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 607, in unpackAndVerify
 [exec] verifyUnpacked(project, artifact, unpackPath, svnRevision, 
version, testArgs)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 786, in verifyUnpacked
 [exec] checkJavadocpath('%s/docs' % unpackPath)
 [exec]   File 
"/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/dev-tools/scripts/smokeTestRelease.py",
 line 904, in checkJavadocpath
 [exec] raise RuntimeError('missing javadocs package summaries!')
 [exec] RuntimeError: missing javadocs package summaries!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-SmokeRelease-4.x/build.xml:314:
 exec returned: 1

Total time: 19 minutes 52 seconds
Build step 'Invoke Ant' marked build as failure
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-08-28 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

Patch, to show the impersonation hack for the Pulsing format. 

We cannot perfectly impersonate the old Pulsing format yet: the old format divided 
the metadata block into inlined bytes and wrapped bytes, so when the term dict reader 
reads the length of the metadata block, it is actually the length of the 'inlined 
block'... And the 'wrapped block' won't be loaded for the wrapped PF.

However, introducing a new method in PostingsReaderBase doesn't seem to be a 
good way to do it...

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.
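A toy, self-contained sketch of the idea (a sorted map stands in for the FST with custom outputs; all names are invented, and the real work is designing the fst.Output encoding): each term maps directly to its complete metadata, so a lookup never has to scan an on-disk delta-codec file.

```java
import java.util.TreeMap;

/** Toy model of the LUCENE-3069 goal: keep every term's complete postings
 *  metadata in the in-memory term dictionary itself (here a TreeMap stands
 *  in for an FST with custom outputs), so seeks need no extra file access. */
class MemoryTermDict {
    static class TermMeta {
        final long docFreq;
        final long postingsFP;  // file pointer into the postings file
        TermMeta(long docFreq, long postingsFP) {
            this.docFreq = docFreq;
            this.postingsFP = postingsFP;
        }
    }

    private final TreeMap<String, TermMeta> dict = new TreeMap<>();

    void add(String term, long docFreq, long postingsFP) {
        dict.put(term, new TermMeta(docFreq, postingsFP));
    }

    /** Returns the term's complete metadata, or null if absent. */
    TermMeta seekExact(String term) {
        return dict.get(term);
    }
}
```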

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4877) SolrIndexSearcher#getDocSetNC should check for null return in AtomicReader#fields()

2013-08-28 Thread Feihong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752179#comment-13752179
 ] 

Feihong Huang commented on SOLR-4877:
-

Thanks for the comments; it makes sense.

> SolrIndexSearcher#getDocSetNC should check for null return in 
> AtomicReader#fields()
> ---
>
> Key: SOLR-4877
> URL: https://issues.apache.org/jira/browse/SOLR-4877
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2, 4.3
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.3.1, 4.4, 5.0
>
> Attachments: SOLR-4877-nospecialcase.patch, SOLR-4877.patch
>
>
> In LUCENE-5023 it was reported that composite reader contexts should not 
> contain null fields() readers. But this is wrong, as a null-fields() reader 
> may contain documents, just no fields.
> fields() and terms() are documented to return null, so DocSets should check 
> for null (like all queries do in Lucene). It seems that DocSetNC does not 
> correctly check for null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org