[jira] Commented: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007345#comment-13007345
 ] 

Bill Bell commented on SOLR-2242:
-

OK, I did the required work; can we get more feedback, or can it be committed? 
What else is needed?

> Get distinct count of names for a facet field
> -
>
> Key: SOLR-2242
> URL: https://issues.apache.org/jira/browse/SOLR-2242
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
>Affects Versions: 4.0
>Reporter: Bill Bell
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR.2242.v2.patch
>
>
> When returning facet.field= results, you get a list of matches for 
> distinct values. This is normal behavior. This patch additionally tells you how 
> many distinct values you have (the number of rows). Use it with facet.limit=-1 
> and facet.mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
> Here is an example on field "hgid" (without namedistinct):
> {code}
> <lst name="facet_fields">
>   <lst name="hgid">
>     <int name="HGPY045FD36D4000A">1</int>
>     <int name="HGPY0FBC6690453A9">1</int>
>     <int name="HGPY1E44ED6C4FB3B">1</int>
>     <int name="HGPY1FA631034A1B8">1</int>
>     <int name="HGPY3317ABAC43B48">1</int>
>     <int name="HGPY3A17B2294CB5A">5</int>
>     <int name="HGPY3ADD2B3D48C39">1</int>
>   </lst>
> </lst>
> {code}
> With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
> HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
> HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows 
> (7), not the number of values (11):
> {code}
> <lst name="facet_fields">
>   <lst name="hgid">
>     7
>   </lst>
> </lst>
> {code}
> This actually works really well for getting the total number of groups for 
> group.field=hgid. Enjoy!
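The rows-versus-values distinction above can be checked client-side. A minimal Python sketch using the seven hgid names from the example output (which count belongs to which name is not recoverable from the thread, so the pairing below is illustrative):

```python
# Facet counts for field "hgid" from the example above:
# seven distinct names whose counts sum to 11.
# NOTE: the name-to-count pairing is illustrative only.
facet_counts = {
    "HGPY045FD36D4000A": 1,
    "HGPY0FBC6690453A9": 1,
    "HGPY1E44ED6C4FB3B": 1,
    "HGPY1FA631034A1B8": 1,
    "HGPY3317ABAC43B48": 1,
    "HGPY3A17B2294CB5A": 5,
    "HGPY3ADD2B3D48C39": 1,
}

# namedistinct reports the number of rows (distinct names)...
name_distinct = len(facet_counts)          # 7
# ...not the total of the per-name counts.
total_values = sum(facet_counts.values())  # 11
```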

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] Resolved: (SOLR-2426) Build failing

2011-03-15 Thread Bill Bell
This started working when I did the following:

#cd C:\Users\bbell\solr
#ant compile
#cd solr
#ant example

If I ran a direct "ant example" it gave the errors below. I'll
double-check my Java version too.

On 3/15/11 5:53 AM, "Robert Muir (JIRA)"  wrote:

>
> [ 
>https://issues.apache.org/jira/browse/SOLR-2426?page=com.atlassian.jira.pl
>ugin.system.issuetabpanels:all-tabpanel ]
>
>Robert Muir resolved SOLR-2426.
>---
>
>Resolution: Not A Problem
>
>Trunk requires java 6.
>
>> Build failing
>> -
>>
>> Key: SOLR-2426
>> URL: https://issues.apache.org/jira/browse/SOLR-2426
>> Project: Solr
>>  Issue Type: Bug
>>Reporter: Bill Bell
>>
>> ant clean
>> ant example
>> trunk
>> [javac]  ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:77: incompatible types
>> [javac] found   : org.apache.solr.search.BitDocSet
>> [javac] required: org.apache.solr.search.DocSet
>> [javac]   return new BitDocSet(bits,pos);
>> [javac]  ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:132: incompatible types
>> [javac] found   : org.apache.solr.search.SortedIntDocSet
>> [javac] required: org.apache.solr.search.DocSet
>> [javac]   return new SortedIntDocSet(scratch, pos);
>> [javac]  ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:136: incompatible types
>> [javac] found   : org.apache.solr.search.BitDocSet
>> [javac] required: org.apache.solr.search.DocSet
>> [javac]   return new BitDocSet(bits,pos);
>> [javac]  ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:26: org.apache.solr.search.DocSlice is not abstract and does not override abstract method getTopFilter() in org.apache.solr.search.DocSet
>> [javac] public class DocSlice extends DocSetBase implements DocList {
>> [javac]        ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:54: incompatible types
>> [javac] found   : org.apache.solr.search.DocSlice
>> [javac] required: org.apache.solr.search.DocList
>> [javac]     if (this.offset == offset && this.len==len) return this;
>> [javac]     ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:62: incompatible types
>> [javac] found   : org.apache.solr.search.DocSlice
>> [javac] required: org.apache.solr.search.DocList
>> [javac]     if (this.offset == offset && this.len == realLen) return this;
>> [javac]     ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:63: incompatible types
>> [javac] found   : org.apache.solr.search.DocSlice
>> [javac] required: org.apache.solr.search.DocList
>> [javac]     return new DocSlice(offset, realLen, docs, scores, matches, maxScore);
>> [javac]     ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:130: intersection(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
>> [javac]   return other.intersection(this);
>> [javac]   ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:139: intersectionSize(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
>> [javac]   return other.intersectionSize(this);
>> [javac]   ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
>> [javac] found   : java.util.List
>> [javac] required: java.util.List
>> [javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
>> [javac]   ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
>> [javac] found   : java.util.List
>> [javac] required: java.util.List
>> [javac]   super.addClause(clauses, conj, mods, q);
>> [javac]   ^
>> [javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:107: warning: [unchecked] unchecked cast
>> [javac] found   : java.lang.Object
>> [javac] required: java.util.List<org.apache.solr.search.FastLRUCache.Stats>
>> [javac]   statsList = (List) persistence;
>> [javac]  ^
>> [javac] C:\Users\bbell\solr\solr\src\java\or

[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007340#comment-13007340
 ] 

David Smiley commented on SOLR-2429:


Heh, me too!  I was pondering this last night; I know specific queries will 
needlessly pollute the cache.  I was imagining a syntax such as this:  
fq={!cache=no}queryhere

> ability to not cache a filter
> -
>
> Key: SOLR-2429
> URL: https://issues.apache.org/jira/browse/SOLR-2429
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> A user should be able to add {!cache=false} to a query or filter query.
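The proposed flag rides on Solr's existing local-params syntax. A hedged sketch of how a client might attach it (make_fq is a hypothetical helper; the server-side caching behavior did not yet exist at the time of this thread):

```python
def make_fq(query: str, cache: bool = True) -> str:
    """Prefix the proposed {!cache=false} local param when caching
    is disabled; otherwise return the filter query unchanged."""
    return query if cache else "{!cache=false}" + query

# Request parameters for a query whose filter should skip the filterCache:
params = {
    "q": "*:*",
    "fq": make_fq("price:[10 TO 100]", cache=False),
}
# params["fq"] == "{!cache=false}price:[10 TO 100]"
```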

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-15 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007314#comment-13007314
 ] 

Lance Norskog edited comment on SOLR-1499 at 3/16/11 3:52 AM:
--

Yes you can!

* The source index has to store all of the fields.
* I would do a series of short queries rather than one long one.

Thank you for thinking of this.

It could also be used to recombine cores: you can change your partitioning 
strategy, for example.

  was (Author: lancenorskog):
Yes you can!

* The source index has to store all of the fields.
* I would do a series of short queries rather than one long one.

Thank you for thinking of this.
  
> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -
>
> Key: SOLR-1499
> URL: https://issues.apache.org/jira/browse/SOLR-1499
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Reporter: Lance Norskog
>Assignee: Erik Hatcher
> Fix For: Next
>
> Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, 
> SOLR-1499.patch, SOLR-1499.patch
>
>
> The SolrEntityProcessor queries an external Solr instance. The Solr documents 
> returned are unpacked and emitted as DIH fields.
> The SolrEntityProcessor uses the following attributes:
> * solr='http://localhost:8983/solr/sms'
> ** This gives the URL of the target Solr instance.
> *** Note: the connection to the target Solr uses the binary SolrJ format.
> * query='Jefferson&sort=id+asc'
> ** This gives the base query string used with Solr. It can include any 
> standard Solr request parameter. This attribute is processed under the 
> variable resolution rules and can be driven in an inner stage of the indexing 
> pipeline.
> * rows='10'
> ** This gives the number of rows to fetch per request.
> ** The SolrEntityProcessor always fetches every document that matches the 
> request.
> * fields='id,tag'
> ** This selects the fields to be returned from the Solr request.
> ** These must also be declared as <field> elements.
> ** As with all fields, template processors can be used to alter the contents 
> to be passed downwards.
> * timeout='30'
> ** This limits the query to 30 seconds. This can be used as a fail-safe to 
> prevent the indexing session from freezing up. By default the timeout is 5 
> minutes.
> Limitations:
> * Solr errors are not handled correctly.
> * Loop control constructs have not been tested.
> * Multi-valued returned fields have not been tested.
> The unit tests give examples of how to use it as the root entity and an inner 
> entity.
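The rows behavior described above (a fixed page size, but every matching document is fetched) amounts to a paging loop. A client-side Python sketch under that assumption; solr_query stands in for the actual Solr round trip, which the processor performs via SolrJ:

```python
def fetch_all(solr_query, rows=10):
    """Collect every matching document, `rows` at a time,
    mirroring the paging behavior described above."""
    docs, start = [], 0
    while True:
        page = solr_query(start=start, rows=rows)  # one request per page
        docs.extend(page)
        if len(page) < rows:   # a short page means no more results
            return docs
        start += rows

# Demo with a stand-in for the Solr round trip:
data = list(range(25))

def fake_query(start, rows):
    return data[start:start + rows]

all_docs = fetch_all(fake_query, rows=10)  # three requests: 10 + 10 + 5
```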




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007324#comment-13007324
 ] 

Otis Gospodnetic commented on SOLR-2429:


I'm with Hoss.  For many months now, I've been dreaming about the possibility 
of telling Solr to execute a query without caching the results.

> ability to not cache a filter
> -




[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-15 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007314#comment-13007314
 ] 

Lance Norskog commented on SOLR-1499:
-

Yes you can!

* The source index has to store all of the fields.
* I would do a series of short queries rather than one long one.

Thank you for thinking of this.

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -




[jira] Created: (LUCENE-2969) fix two stopwords typos

2011-03-15 Thread Robert Muir (JIRA)
fix two stopwords typos
---

 Key: LUCENE-2969
 URL: https://issues.apache.org/jira/browse/LUCENE-2969
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-2969.patch

See:

http://svn.tartarus.org/snowball?view=rev&revision=543
http://permalink.gmane.org/gmane.comp.search.snowball/1249





[jira] Updated: (LUCENE-2969) fix two stopwords typos

2011-03-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2969:


Attachment: LUCENE-2969.patch

> fix two stopwords typos
> ---




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007265#comment-13007265
 ] 

Ryan McKinley commented on SOLR-2429:
-

I'm not sure this is related -- it could be -- I'm looking at writing a custom 
query from:
{code:java}
  @Override
  public Query getFieldQuery(QParser parser, SchemaField field, String 
externalVal)
{code}

and it would be great to know whether this is used as a filter or not -- should 
it include scoring?  Are there ways to build the query where some parts are 
cached and some are not?



> ability to not cache a filter
> -




[jira] Commented: (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ

2011-03-15 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007236#comment-13007236
 ] 

Ahmet Arslan commented on SOLR-1499:


Hi,

Can I use this to upgrade a Solr version where the Lucene/Solr indices are not 
compatible?

Thanks,
Ahmet

> SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via 
> SolrJ
> -




[jira] Commented: (LUCENE-2749) Co-occurrence filter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007229#comment-13007229
 ] 

Steven Rowe commented on LUCENE-2749:
-

bq. this filter would definitely something that i could use

What use case(s) are you thinking of?

> Co-occurrence filter
> 
>
> Key: LUCENE-2749
> URL: https://issues.apache.org/jira/browse/LUCENE-2749
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 3.1, 4.0
>Reporter: Steven Rowe
>Priority: Minor
> Fix For: 4.0
>
>
> The co-occurrence filter to be developed here will output sets of tokens that 
> co-occur within a given window onto a token stream.  
> These token sets can be ordered either lexically (to allow order-independent 
> matching/counting) or positionally (e.g. sliding windows of positionally 
> ordered co-occurring terms that include all terms in the window are called 
> n-grams or shingles). 
> The parameters to this filter will be: 
> * window size: this can be a fixed sequence length, sentence/paragraph 
> context (these will require sentence/paragraph segmentation, which is not in 
> Lucene yet), or over the entire token stream (full field width)
> * minimum number of co-occurring terms: >= 2
> * maximum number of co-occurring terms: <= window size
> * token set ordering (lexical or positional)
> One use case for co-occurring token sets is as candidates for collocations.




[jira] Created: (LUCENE-2968) SurroundQuery doesn't support SpanNot

2011-03-15 Thread Grant Ingersoll (JIRA)
SurroundQuery doesn't support SpanNot
-

 Key: LUCENE-2968
 URL: https://issues.apache.org/jira/browse/LUCENE-2968
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


It would be nice if we could do SpanNot in the surround query, as it is quite 
useful for keeping searches within a boundary (say, a sentence).




RE: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 5964 - Failure

2011-03-15 Thread Steven A Rowe
The build never made it past the initial pre-build "ant clean":

---
clean:
   [delete] Deleting directory 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/build.xml:114:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/common-build.xml:191:
 Unable to delete file 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/6/index.TestLockFactory6.-6904310916879757798/_2b.fdx
  
---


> -Original Message-
> From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
> Sent: Tuesday, March 15, 2011 5:56 PM
> To: dev@lucene.apache.org
> Subject: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 5964 - Failure
> 
> Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-
> 3.x/5964/
> 
> 6 tests failed.
> FAILED:  TEST-org.apache.lucene.index.TestIndexWriter.xml.
> 
> Error Message:
> 
> 
> Stack Trace:
> Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-
> only-3.x/checkout/lucene/build/backwards/test/TEST-
> org.apache.lucene.index.TestIndexWriter.xml was length 0
> 
> FAILED:  TEST-org.apache.lucene.search.TestBoolean2.xml.
> 
> Error Message:
> 
> 
> Stack Trace:
> Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-
> only-3.x/checkout/lucene/build/backwards/test/TEST-
> org.apache.lucene.search.TestBoolean2.xml was length 0
> 
> REGRESSION:  org.apache.lucene.store.TestLockFactory.testStressLocks
> 
> Error Message:
> IndexWriter hit unexpected exceptions
> 
> Stack Trace:
> junit.framework.AssertionFailedError: IndexWriter hit unexpected
> exceptions
>   at
> org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.j
> ava:172)
>   at
> org.apache.lucene.store.TestLockFactory.testStressLocks(TestLockFactory.ja
> va:142)
>   at
> org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255)
> 
> 
> FAILED:  .org.apache.lucene.store.TestRAMDirectory
> 
> Error Message:
> org.apache.lucene.store.TestRAMDirectory
> 
> Stack Trace:
> java.lang.ClassNotFoundException: org.apache.lucene.store.TestRAMDirectory
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:186)
> 
> 
> FAILED:  .org.apache.lucene.util.TestNumericUtils
> 
> Error Message:
> org.apache.lucene.util.TestNumericUtils
> 
> Stack Trace:
> java.lang.ClassNotFoundException: org.apache.lucene.util.TestNumericUtils
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:186)
> 
> 
> FAILED:  .org.apache.lucene.util.TestSmallFloat
> 
> Error Message:
> org.apache.lucene.util.TestSmallFloat
> 
> Stack Trace:
> java.lang.ClassNotFoundException: org.apache.lucene.util.TestSmallFloat
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:186)
> 
> 
> 
> 
> Build Log (for compile errors):
> [...truncated 47 lines...]
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007217#comment-13007217
 ] 

Hoss Man commented on SOLR-2429:


why not extend Query? ... it could actually rewrite to the Query it wraps, 
giving us the best of both worlds.

FWIW: it also seems like it would make sense for this type of syntax/decoration 
to work with the "q" param (skipping the queryResultCache)
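Hoss's decorator idea can be sketched abstractly. A toy Python version (invented names, not Lucene's actual Query API): the wrapper carries the no-cache flag, then rewrites to the query it wraps so execution and scoring are untouched.

```python
class NoCacheQuery:
    """Wrapper that marks a query as non-cacheable but rewrites to
    the query it wraps, so execution and scoring are unaffected."""
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.cache = False  # consulted by the cache check

    def rewrite(self):
        # Once the cache decision is made, the wrapper disappears.
        return self.wrapped

q = NoCacheQuery("text:foo")
```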

> ability to not cache a filter
> -




[HUDSON] Lucene-Solr-tests-only-3.x - Build # 5964 - Failure

2011-03-15 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/5964/

6 tests failed.
FAILED:  TEST-org.apache.lucene.index.TestIndexWriter.xml.

Error Message:


Stack Trace:
Test report file 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/TEST-org.apache.lucene.index.TestIndexWriter.xml
 was length 0

FAILED:  TEST-org.apache.lucene.search.TestBoolean2.xml.

Error Message:


Stack Trace:
Test report file 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/backwards/test/TEST-org.apache.lucene.search.TestBoolean2.xml
 was length 0

REGRESSION:  org.apache.lucene.store.TestLockFactory.testStressLocks

Error Message:
IndexWriter hit unexpected exceptions

Stack Trace:
junit.framework.AssertionFailedError: IndexWriter hit unexpected exceptions
at 
org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:172)
at 
org.apache.lucene.store.TestLockFactory.testStressLocks(TestLockFactory.java:142)
at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255)


FAILED:  .org.apache.lucene.store.TestRAMDirectory

Error Message:
org.apache.lucene.store.TestRAMDirectory

Stack Trace:
java.lang.ClassNotFoundException: org.apache.lucene.store.TestRAMDirectory
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)


FAILED:  .org.apache.lucene.util.TestNumericUtils

Error Message:
org.apache.lucene.util.TestNumericUtils

Stack Trace:
java.lang.ClassNotFoundException: org.apache.lucene.util.TestNumericUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)


FAILED:  .org.apache.lucene.util.TestSmallFloat

Error Message:
org.apache.lucene.util.TestSmallFloat

Stack Trace:
java.lang.ClassNotFoundException: org.apache.lucene.util.TestSmallFloat
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)




Build Log (for compile errors):
[...truncated 47 lines...]






[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This minimizes the number of calls to validate (there is still one extra call 
via the benchmark module, since it invokes the common Lucene compile target). 
It also splits the check out into Lucene, Solr, and Modules.

I'd consider it close to good enough at this point.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007205#comment-13007205
 ] 

Steven Rowe commented on LUCENE-2960:
-

bq. How about an IWC base class, extended by IWCinit and IWClive. IWCinit has 
setters for everything, and IW.getConfig() returns IWClive, which has no 
setters for things you can't set on the fly.

I tried to implement this, but couldn't figure out a way to avoid code and 
javadoc duplication and/or separation for the live setters, which need to be on 
both the init and live versions.  Duplication/separation of this sort would be 
begging for trouble.  (The live setters can't be on the base class because the 
init and live versions would have to return different types to allow for proper 
chaining.)
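For what it's worth, one workaround sometimes used for exactly this chaining problem is a recursive "self type" generic. The sketch below uses hypothetical names (BaseConfig/InitConfig/LiveConfig as stand-ins for the IWC classes discussed above); it keeps the live setters in one place at the cost of some generics noise:

```java
// Sketch (hypothetical names): a recursive generic "self type" lets the
// shared live setters sit on the base class yet still return the
// concrete subtype, so chaining works on both init and live configs.
abstract class BaseConfig<T extends BaseConfig<T>> {
    double ramBufferSizeMB = 16.0;

    @SuppressWarnings("unchecked")
    T setRAMBufferSizeMB(double mb) {  // live setter, defined once
        this.ramBufferSizeMB = mb;
        return (T) this;               // returns the concrete subtype
    }
}

class InitConfig extends BaseConfig<InitConfig> {
    int maxThreadStates = 8;

    InitConfig setMaxThreadStates(int n) {  // init-time-only setter
        this.maxThreadStates = n;
        return this;
    }
}

class LiveConfig extends BaseConfig<LiveConfig> {
    // exposes only the inherited live setters
}
```

With this shape, `new InitConfig().setRAMBufferSizeMB(32).setMaxThreadStates(4)` chains correctly even though the first setter is inherited.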

> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007164#comment-13007164
 ] 

Dawid Weiss commented on LUCENE-2967:
-

Yes, now I see this difference on the 38M too:

trunk:
{noformat}
56.462
55.725
55.544
55.522
{noformat}
w/patch:
{noformat}
59.9
59.6
{noformat}

I'll see if I can find out the problem here; I assume the collision ratio 
should be nearly identical... but who knows. This is low priority, but 
interesting stuff. I'll close the issue if I can't get it better than the trunk 
version.
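For reference, the scheme under discussion (linear probing plus a bit-avalanching mix function) looks roughly like the generic sketch below; this is an illustration, not the actual NodeHash code:

```java
// Sketch of linear probing with a bit-avalanching mix function.
// Uses the well-known MurmurHash3 64-bit finalizer for mixing.
final class LinearProbeSet {
    private long[] slots = new long[16];  // 0 marks an empty slot
    private int count;

    // MurmurHash3 finalizer: strong avalanche, cheap to compute
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** Adds a non-zero key; returns false if it was already present. */
    boolean add(long key) {
        if (count * 2 >= slots.length) rehash();  // load factor 0.5
        int mask = slots.length - 1;
        int pos = (int) (mix(key) & mask);
        while (slots[pos] != 0) {       // linear probe: step by 1, wrap
            if (slots[pos] == key) return false;
            pos = (pos + 1) & mask;
        }
        slots[pos] = key;
        count++;
        return true;
    }

    private void rehash() {
        long[] old = slots;
        slots = new long[old.length * 2];
        count = 0;
        for (long k : old) if (k != 0) add(k);
    }
}
```

Linear probing's sequential scans are cache-friendly on modern CPUs, which is the speedup HPPC and fastutil observed; the FST construction workload above apparently behaves differently.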

> Use linear probing with an additional good bit avalanching function in FST's 
> NodeHash.
> --
>
> Key: LUCENE-2967
> URL: https://issues.apache.org/jira/browse/LUCENE-2967
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2967.patch
>
>
> I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
> who suggested that linear probing, given a hash mixing function with good 
> avalanche properties, is a way better method of constructing lookups in 
> associative arrays compared to quadratic probing. Indeed, with linear probing 
> you can implement removals from a hash map without removed slot markers and 
> linear probing has nice properties with respect to modern CPUs (caches). I've 
> reimplemented HPPC's hash maps to use linear probing and we observed a nice 
> speedup (the same applies for fastutils of course).
> This patch changes NodeHash's implementation to use linear probing. The code 
> is a bit simpler (I think :). I also moved the load factor to a constant -- 
> 0.5 seems like a generous load factor, especially if we allow large FSTs to 
> be built. I don't see any significant speedup in constructing large automata, 
> but there is no slowdown either (I checked on one machine only for now, but 
> will verify on other machines too).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007139#comment-13007139
 ] 

Robert Muir commented on LUCENE-2960:
-

What you win is that this is such an expert thing that it should not confuse 
the 99% of users who won't need to change these settings in a live way.

This is a central API for using Lucene; sorry, I would rather see IWConfig 
reverted completely than see this deprecation/undeprecation loop. It would just 
cause too much confusion.


> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007136#comment-13007136
 ] 

Earwin Burrfoot commented on LUCENE-2960:
-

You avoid deprecation/undeprecation and binary incompatibility, while 
incompatibly changing semantics. What do you win?

> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007123#comment-13007123
 ] 

Robert Muir commented on LUCENE-2960:
-

It's exactly the lack of consensus we see here; that's why I am 100% against 
the setter approach.

I'm totally against a deprecation/undeprecation loop every time a future 
release needs another setting to become "live".

It seems the only way we can avoid this is for the javadoc to be the sole 
specification of whether a setting does or does not take effect "live".


> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007122#comment-13007122
 ] 

Yonik Seeley commented on SOLR-2429:


The annoying part here is that we need more metadata than just the Query we 
currently use for a filter. Unfortunately, SolrIndexSearcher uses List<Query> 
everywhere.

We could create something like a SolrQuery that extends Query, wrapping a 
normal query and adding the extra metadata (like cache options). That's a bit 
messier, though, since we'd have instanceof checks and casts everywhere.

Another option is to create a SolrQuery class that does not extend Query; 
methods taking List<Query> would then need to take List<SolrQuery>:

{code}
class SolrQuery {
  Query q;          // the wrapped query
  QParser qparser;  // the parser that produced it
  boolean cache;    // whether this filter may be cached
  ...
}
{code}

Thoughts?

> ability to not cache a filter
> -
>
> Key: SOLR-2429
> URL: https://issues.apache.org/jira/browse/SOLR-2429
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>
> A user should be able to add {!cache=false} to a query or filter query.




[jira] Created: (SOLR-2429) ability to not cache a filter

2011-03-15 Thread Yonik Seeley (JIRA)
ability to not cache a filter
-

 Key: SOLR-2429
 URL: https://issues.apache.org/jira/browse/SOLR-2429
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley


A user should be able to add {!cache=false} to a query or filter query.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007085#comment-13007085
 ] 

Steven Rowe commented on SOLR-2427:
---

bq. I found the problem: a (damn) silent JVM update on Mac OS X symlinked the 
1.5 Java version to 1.6

Apple rocks!

bq. However the uima-core version had to be switched to 2.3.1 release (the 
snapshot one was the first jar I uploaded just some days before the release).

The manifest in {{solr/contrib/uima/lib/uima-core.jar}} listed the version as 
2.3.1-SNAPSHOT, and when I did a diff with the jar from the maven central repo, 
all of the .class files were different.  So I'm not sure what happened here, 
but the jar in Solr's source tree was definitely not the same as the released 
jar.  Maybe the released 2.3.1 jar you posted was never committed?  I don't 
know.

Anyway, it's fixed now.

bq. Thanks for taking care.

No problem.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Blocker
> Fix For: 3.1, 3.2, 4.0
>
>
> We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007078#comment-13007078
 ] 

Tommaso Teofili commented on SOLR-2427:
---

Hello Steven,
I found the problem: a (damn) silent JVM update on Mac OS X symlinked the 1.5 
Java version to 1.6 :(
However, the uima-core version had to be switched to the 2.3.1 release (the 
snapshot was the first jar I uploaded, just some days before the release).
Thanks for taking care.


> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Blocker
> Fix For: 3.1, 3.2, 4.0
>
>
> We should have version numbers on the UIMA jar files.




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007048#comment-13007048
 ] 

Earwin Burrfoot commented on LUCENE-2960:
-

bq. Oh yeah. But then we'd clone the full IWC on every set... this seems like 
overkill in the name of "purity".
So what? What exactly is overkill? A few wasted bytes and CPU nanoseconds for 
an object that's created a couple of times during the application's lifetime?
There are also builders, which are very similar to what Steven is proposing.

bq. Another thought is to offer all settings on the IWC for init convenience 
and exposure and then add javadoc about updaters on IW for those settings that 
can be changed on the fly
That's exactly how I'd like to see it.

> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Resolved: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved SOLR-2427.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Committed:
- lucene_solr_3_1 revision 1081856
- branch_3x revision 1081860
- trunk revision 1081880

Ant build & tests succeed.  Maven build & tests succeed.  {{ant -Dversion=... 
-Dspecversion=... prepare-release sign-artifacts}} works and the generated 
Maven artifacts look good.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Blocker
> Fix For: 3.1, 3.2, 4.0
>
>
> We should have version numbers on the UIMA jar files.




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007043#comment-13007043
 ] 

Steven Rowe commented on LUCENE-2960:
-

How about an IWC base class, extended by IWCinit and IWClive.  IWCinit has 
setters for everything, and IW.getConfig() returns IWClive, which has no 
setters for things you can't set on the fly.

> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007036#comment-13007036
 ] 

Mark Miller commented on LUCENE-2960:
-

{quote}I really don't like that this approach would split IW configuration
into two places.  Like you look at the javadocs for IWC and think that
you cannot change the RAM buffer size.

IWC should be the one place you go to see which settings you can
change about the IW.  That some of these settings take effect "live"
while others do not is really an orthogonal (and I think, secondary,
ie handled fine w/ jdocs) aspect/concern.{quote}

You can just as easily argue that the javadocs for IWC could explain that live 
settings are on the IW.

That pattern just smells wrong. 

{quote}
But, if you want to change something live, you can
IW.getConfig().setFoo(...). The config instance is a private clone to
that IW.
{quote}

This is better than nothing.

Another thought is to offer all settings on the IWC for init convenience and 
exposure and then add javadoc about updaters on IW for those settings that can 
be changed on the fly - or one update method and enums...
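The "one update method and enums" idea could look something like the sketch below (purely hypothetical names; the enum constants and Writer class are illustrations, not a proposed API):

```java
// Hypothetical sketch of "one update method and enums": a single entry
// point on the writer for the few settings that may change live.
enum LiveSetting { RAM_BUFFER_SIZE_MB, TERM_INDEX_INTERVAL }

class Writer {
    private double ramBufferSizeMB = 16.0;
    private int termIndexInterval = 128;

    /** One live-update method; value is cast for integer settings. */
    synchronized void update(LiveSetting s, double value) {
        switch (s) {
            case RAM_BUFFER_SIZE_MB: ramBufferSizeMB = value; break;
            case TERM_INDEX_INTERVAL: termIndexInterval = (int) value; break;
        }
    }

    double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    int getTermIndexInterval() { return termIndexInterval; }
}
```

The enum makes the set of live-updatable settings explicit in one place, rather than scattering that knowledge across javadoc on individual setters.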

> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007031#comment-13007031
 ] 

Michael McCandless commented on LUCENE-2967:


Hmm, unfortunately, I'm seeing the patch make FST building slower, at
least in my env/test set.  I built FST for the 38M wikipedia terms.

I ran 6 times each, alternating trunk & patch.

I also turned off saving the FST, and ran -noverify, so I'm only
measuring time to build it.  I run java -Xmx2g -Xms2g -Xbatch, and
measure wall clock time.

Times on trunk (seconds):

{noformat}
  43.795
  43.493
  44.343
  44.045
  43.645
  43.846
{noformat}

Times w/ patch:

{noformat}
  46.595
  47.751
  47.901
  47.901
  47.901
  47.700
{noformat}

We could also try less generous load factors...


> Use linear probing with an additional good bit avalanching function in FST's 
> NodeHash.
> --
>
> Key: LUCENE-2967
> URL: https://issues.apache.org/jira/browse/LUCENE-2967
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2967.patch
>
>
> I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
> who suggested that linear probing, given a hash mixing function with good 
> avalanche properties, is a way better method of constructing lookups in 
> associative arrays compared to quadratic probing. Indeed, with linear probing 
> you can implement removals from a hash map without removed slot markers and 
> linear probing has nice properties with respect to modern CPUs (caches). I've 
> reimplemented HPPC's hash maps to use linear probing and we observed a nice 
> speedup (the same applies for fastutils of course).
> This patch changes NodeHash's implementation to use linear probing. The code 
> is a bit simpler (I think :). I also moved the load factor to a constant -- 
> 0.5 seems like a generous load factor, especially if we allow large FSTs to 
> be built. I don't see any significant speedup in constructing large automata, 
> but there is no slowdown either (I checked on one machine only for now, but 
> will verify on other machines too).




[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007011#comment-13007011
 ] 

Michael McCandless commented on LUCENE-2960:


bq. Hmmm, infoStream is just for debugging... should we really make it volatile?

I'll remove its volatile...

{quote}
bq. IWC cannot be made immutable – you build it up incrementally (new 
IWC(...).setThis(...).setThat(...)). Its fields cannot be final.

Setters can return modified immutable copy of 'this'. So you get both 
incremental building and immutability.
{quote}

Oh yeah.  But then we'd clone the full IWC on every set... this seems
like overkill in the name of "purity".
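The copy-on-set style being weighed here amounts to the following sketch (hypothetical fields; every setter returns a fresh instance, so the config itself stays immutable while still supporting incremental building):

```java
// Sketch of copy-on-set (persistent) configuration: setters never
// mutate; each returns a modified full copy of the config.
final class ImmutableConfig {
    final double ramBufferSizeMB;
    final int termIndexInterval;

    ImmutableConfig(double ram, int interval) {
        this.ramBufferSizeMB = ram;
        this.termIndexInterval = interval;
    }

    ImmutableConfig setRAMBufferSizeMB(double mb) {
        return new ImmutableConfig(mb, termIndexInterval);  // full copy per set
    }

    ImmutableConfig setTermIndexInterval(int i) {
        return new ImmutableConfig(ramBufferSizeMB, i);
    }
}
```

The cost is one short-lived allocation per setter call, which is the "clone the full IWC on every set" overhead being debated.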

{quote}
What about earlier compromise mentioned by Shay, Mark, me? Keep setters for 
'live' properties on IW.
This clearly draws the line, and you don't have to consult Javadocs for each 
and every setting to know if you can change it live or not.
{quote}

I really don't like that this approach would split IW configuration
into two places.  Like you look at the javadocs for IWC and think that
you cannot change the RAM buffer size.

IWC should be the one place you go to see which settings you can
change about the IW.  That some of these settings take effect "live"
while others do not is really an orthogonal (and I think, secondary,
ie handled fine w/ jdocs) aspect/concern.


> Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter
> --
>
> Key: LUCENE-2960
> URL: https://issues.apache.org/jira/browse/LUCENE-2960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Shay Banon
>Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2960.patch
>
>
> In 3.1 the ability to setRAMBufferSizeMB is deprecated, and removed in trunk. 
> It would be great to be able to control that on a live IndexWriter. Other 
> possible two methods that would be great to bring back are 
> setTermIndexInterval and setReaderTermsIndexDivisor. Most of the other 
> setters can actually be set on the MergePolicy itself, so no need for setters 
> for those (I think).




[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007003#comment-13007003
 ] 

Michael McCandless commented on LUCENE-2573:


bq. it currently holds the ram usage for that DWPT when it was checked out so 
that I can reduce the flushBytes accordingly. We can maybe get rid of it 
entirely but I don't want to rely on the DWPT bytesUsed() though.

Hmm, but, once a DWPT is pulled from production, its bytesUsed()
should not be changing anymore?  Like why can't we use it to hold its
bytesUsed?

bq. I generally don't like cluttering DocWriter and let it grow like IW. 
DocWriterSession might not be the ideal name for this class but its really a 
ram tracker for this DW. Yet, we can move out some parts that do not directly 
relate to mem tracking. Maybe DocWriterBytes?

Well, DocWriter is quite small now :) (on the RT branch).  And adding
another class means we have to be careful about proper sync'ing (lock
order, to avoid deadlock)... I also think it should get smaller if we
can remove the state[] array, the FlushState enum, etc.  But OK, I guess we can
leave it separate for now.  How about DocumentsWriterRAMUsage?
RAMTracker?

{quote}
bq. Instead of FlushPolicy.message, can't the policy call DW.message?

I don't want to couple that API to DW. What would be the benefit beside from 
saving a single method?
{quote}

Hmm, good point.  Though, it already has a SetOnce --
how come?  Can the policy call IW.message?  I just think FlushPolicy
ought to be very lean, ie show you exactly what you need to
implement...

{quote}
bq. On the by-RAM flush policies... when you hit the high water mark, we
should 
1) flush all DWPTs and 2) stall any other threads.

Well I am not sure if we should do that. I don't really see why we should 
forcefully stop the world here. Incoming threads will pick up a flush 
immediately and if we have enough resources to index further why should we wait 
until all DWPT are flushed. if we stall I fear that we could queue up threads 
that could help flushing while stalling would simply stop them doing anything, 
right? You can still control this with the healthiness though. We currently do 
flush all DWPT btw. once we hit the HW.
{quote}

As long as we default the high mark to something "generous" (2X low
mark), I think this approach should work well.

Ie, we "begin" flushing as soon as the low mark is crossed on active RAM.
We pick the biggest DWPT, take it out of rotation, and immediately
deduct its RAM usage from the active pool.  If, while we are still
flushing, active RAM again grows above the low mark, we pull
another DWPT, etc.  But if the total flushing + active RAM ever
exceeds the high mark, we stall.

BTW why do we track flushPending RAM vs flushing RAM?  Is that
distinction necessary?  (Can't we just track "flushing" RAM?).
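Sketched with assumed names (not the actual DocumentsWriter code), the low/high watermark scheme described above amounts to:

```java
// Sketch of the tiered flush policy: crossing the low mark triggers
// flushing the largest DWPT; crossing the high mark (active + flushing
// RAM combined) stalls incoming indexing threads.
class TieredFlushControl {
    final long lowMark, highMark;   // bytes; e.g. highMark = 2 * lowMark
    long activeBytes, flushingBytes;

    TieredFlushControl(long lowMark, long highMark) {
        this.lowMark = lowMark;
        this.highMark = highMark;
    }

    /** True once active RAM crosses the low mark: pick the biggest DWPT. */
    synchronized boolean shouldFlushBiggest() {
        return activeBytes > lowMark;
    }

    /** Take a DWPT out of rotation: move its bytes to the flushing pool. */
    synchronized void checkout(long dwptBytes) {
        activeBytes -= dwptBytes;
        flushingBytes += dwptBytes;
    }

    /** Stall indexing threads when total RAM exceeds the high mark. */
    synchronized boolean shouldStall() {
        return activeBytes + flushingBytes > highMark;
    }

    synchronized void flushDone(long dwptBytes) {
        flushingBytes -= dwptBytes;
    }
}
```

With a generous high mark (say 2x the low mark), indexing threads rarely stall: checked-out DWPTs leave the active pool immediately, so new flushes trigger only if active RAM keeps growing during a flush.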


> Tiered flushing of DWPTs by RAM with low/high water marks
> -
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water mark 
> at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
> config method and use something like 90% and 110% for the water marks?




[jira] Updated: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2427:
--

 Priority: Blocker  (was: Trivial)
Affects Version/s: 3.1
Fix Version/s: 3.1

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Blocker
> Fix For: 3.1
>
>
> We should have version numbers on the UIMA jar files.




[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006960#comment-13006960
 ] 

Steven Rowe commented on SOLR-2427:
---

It looks to me like the UIMA contrib was committed before uima-core 2.3.1 was 
released, using a 2.3.1-SNAPSHOT version of the jar, and then never upgraded 
after the release.

I think it makes sense to switch the version of the uima-core jar in Solr's 
source tree to the released 2.3.1 version, and then stop publishing a 
Solr-specific uima-core jar.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006952#comment-13006952
 ] 

Steven Rowe commented on SOLR-2427:
---

Crap, I got the uima-core situation exactly backward.

The version in {{solr/contrib/uima/lib/}} was compiled, by you, Tommaso, using 
Java 1.6 (according to {{META-INF/MANIFEST.MF}}).  However, since the 
clustering contrib tests succeed under Java 1.5, I assume that although the jar 
was compiled using Java 1.6, the target version was 1.5.

The version in the maven central repository was actually compiled with 1.5 
(again, according to {{META-INF/MANIFEST.MF}}).

Tommaso, why is the version in Solr's source tree different from the maven 
version of the jar?

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006951#comment-13006951
 ] 

Tommaso Teofili commented on SOLR-2427:
---

That is unexpected as UIMA should've been deployed with 1.5. I'll check this 
out as soon as I can.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006950#comment-13006950
 ] 

Steven Rowe commented on SOLR-2427:
---

Hmm, [uimaj-core-2.3.1.jar in the maven 
repository|http://repo1.maven.org/maven2/org/apache/uima/uimaj-core/2.3.1/] was 
compiled with Java 1.6, while the version in {{solr/contrib/uima/lib/}} was 
compiled with Java 1.5.  Tommaso, do you know of a maven-hosted 
Java-1.5-compiled version of the uima-core jar?  If not, I will leave things as 
they are now, continuing to publish a Solr-specific Java-1.5-compiled uima-core 
jar.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006946#comment-13006946
 ] 

Tommaso Teofili commented on SOLR-2427:
---

bq.  That makes little sense, though, now that I have reconsidered it, so I'll 
drop maven publishing of the Solr-specific uima-core jar. The other UIMA 
SNAPSHOT dependencies, however, will need to be published as Solr-specific 
versions, since the maven central repository rejects POMs with SNAPSHOT 
dependencies.

+1 :)

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This hooks it into compile-core, but has the unfortunate side-effect of being 
called a whole bunch of times, which is not good.  Need to read up on how to 
avoid that in ant (or if anyone has suggestions, that would be great).

Otherwise, I think the baseline functionality is ready to go.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006942#comment-13006942
 ] 

Steven Rowe commented on SOLR-2427:
---

Thanks Tommaso, I will rename them.

Separately, although you previously said that uima-core.jar is the released 
2.3.1 version, I still had been thinking that along with the other UIMA jars, 
its maven artifact should be published under the Apache Solr project.  That 
makes little sense, though, now that I have reconsidered it, so I'll drop maven 
publishing of the Solr-specific uima-core jar.  The other UIMA SNAPSHOT 
dependencies, however, will need to be published as Solr-specific versions, 
since the maven central repository rejects POMs with SNAPSHOT dependencies.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-2428:
--

Description: As of not-yet-released version 3.4.4, the carrot2-core jar 
will be published as a retrowoven 1.5 version (in addition to a 
Java-1.6-compiled version) - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]
  (was: As of not-yet-released version 3.4.4, the carrot2-core will publish a 
retrowoven 1.5 version of the jar - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878])

> Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
> -
>
> Key: SOLR-2428
> URL: https://issues.apache.org/jira/browse/SOLR-2428
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Clustering
>Affects Versions: 3.1.1, 3.2
>Reporter: Steven Rowe
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.1.1, 3.2
>
>
> As of not-yet-released version 3.4.4, the carrot2-core jar will be published 
> as a retrowoven 1.5 version (in addition to a Java-1.6-compiled version) - 
> see Dawid Weiss's comment on 
> [LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned SOLR-2428:
-

Assignee: Dawid Weiss

> Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
> -
>
> Key: SOLR-2428
> URL: https://issues.apache.org/jira/browse/SOLR-2428
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - Clustering
>Affects Versions: 3.1.1, 3.2
>Reporter: Steven Rowe
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 3.1.1, 3.2
>
>
> As of not-yet-released version 3.4.4, the carrot2-core will publish a 
> retrowoven 1.5 version of the jar - see Dawid Weiss's comment on 
> [LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned SOLR-2427:
-

Assignee: Steven Rowe

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Steven Rowe
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2957) generate-maven-artifacts target should include all non-Mavenized Lucene & Solr dependencies

2011-03-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006937#comment-13006937
 ] 

Steven Rowe commented on LUCENE-2957:
-

Thanks Dawid - I've created SOLR-2428 to track upgrading once 3.4.4 has been 
released.

> generate-maven-artifacts target should include all non-Mavenized Lucene & 
> Solr dependencies
> ---
>
> Key: LUCENE-2957
> URL: https://issues.apache.org/jira/browse/LUCENE-2957
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1, 3.2, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 3.2, 4.0
>
> Attachments: LUCENE-2923-part3.patch, LUCENE-2957-part2.patch, 
> LUCENE-2957.patch
>
>
> Currently, in addition to deploying artifacts for all of the Lucene and Solr 
> modules to a repository (by default local), the {{generate-maven-artifacts}} 
> target also deploys artifacts for the following non-Mavenized Solr 
> dependencies (lucene_solr_3_1 version given here):
> # {{solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar}} as 
> org.apache.solr:solr-commons-csv:3.1
> # {{solr/lib/apache-solr-noggit-r944541.jar}} as 
> org.apache.solr:solr-noggit:3.1
> \\ \\
> The following {{.jar}}'s should be added to the above list (lucene_solr_3_1 
> version given here):
> \\ \\
> # {{lucene/contrib/icu/lib/icu4j-4_6.jar}}
> # {{lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ-1257.jar}}
> # {{solr/contrib/clustering/lib/carrot2-core-3.4.2.jar}}**
> # {{solr/contrib/uima/lib/uima-an-alchemy.jar}}
> # {{solr/contrib/uima/lib/uima-an-calais.jar}}
> # {{solr/contrib/uima/lib/uima-an-tagger.jar}}
> # {{solr/contrib/uima/lib/uima-an-wst.jar}}
> # {{solr/contrib/uima/lib/uima-core.jar}}
> \\ \\
> I think it makes sense to follow the same model as the current non-Mavenized 
> dependencies:
> \\ \\
> * {{groupId}} = {{org.apache.solr}} / {{org.apache.lucene}}
> * {{artifactId}} = {{solr-<jar-name>}} / {{lucene-<jar-name>}}
> * {{version}} = <release version> (e.g. {{3.1}})
> **The carrot2-core jar doesn't need to be included in trunk's release 
> artifacts, since there already is a Mavenized Java6-compiled jar.  branch_3x 
> and lucene_solr_3_1 will need this Solr-specific Java5-compiled maven 
> artifact, though.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006936#comment-13006936
 ] 

Tommaso Teofili commented on SOLR-2427:
---

The mentioned jars have the following versions and revisions:
- uima-core.jar is 2.3.1 (released)
- uima-an-alchemy.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-calais.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-tagger.jar is 2.3.1-SNAPSHOT revision 1062868
- uima-an-wst.jar is 2.3.1-SNAPSHOT revision 1076132
Hope this helps.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2428) Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar

2011-03-15 Thread Steven Rowe (JIRA)
Upgrade carrot2-core dependency to a version with a Java 1.5-compiled jar
-

 Key: SOLR-2428
 URL: https://issues.apache.org/jira/browse/SOLR-2428
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Clustering
Affects Versions: 3.1.1, 3.2
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1.1, 3.2


As of not-yet-released version 3.4.4, the carrot2-core will publish a retrowoven 
1.5 version of the jar - see Dawid Weiss's comment on 
[LUCENE-2957|https://issues.apache.org/jira/browse/LUCENE-2957?focusedCommentId=13006878&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13006878]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006930#comment-13006930
 ] 

Robert Muir commented on SOLR-2427:
---

I agree, I think it would be best to format them like the others in Solr, for 
example commons-csv-1.0-SNAPSHOT-r966014.jar.

> UIMA jars are missing version numbers
> -
>
> Key: SOLR-2427
> URL: https://issues.apache.org/jira/browse/SOLR-2427
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Priority: Trivial
>
> We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

Pretty close to standalone completion.  Next step to hook it in.  I'm going to 
commit the license naming normalization now but not the validation code yet.

Also, renamed LicenseChecker to DependencyChecker as it might be useful for 
checking other things like that all jars have version numbers.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Grant Ingersoll (JIRA)
UIMA jars are missing version numbers
-

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Priority: Trivial


We should have version numbers on the UIMA jar files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-2967:


Attachment: LUCENE-2967.patch

Linear probing in NodeHash.

> Use linear probing with an additional good bit avalanching function in FST's 
> NodeHash.
> --
>
> Key: LUCENE-2967
> URL: https://issues.apache.org/jira/browse/LUCENE-2967
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: 4.0
>
> Attachments: LUCENE-2967.patch
>
>
> I recently had an interesting discussion with Sebastiano Vigna (fastutil), 
> who suggested that linear probing, given a hash mixing function with good 
> avalanche properties, is a way better method of constructing lookups in 
> associative arrays compared to quadratic probing. Indeed, with linear probing 
> you can implement removals from a hash map without removed slot markers and 
> linear probing has nice properties with respect to modern CPUs (caches). I've 
> reimplemented HPPC's hash maps to use linear probing and we observed a nice 
> speedup (the same applies for fastutils of course).
> This patch changes NodeHash's implementation to use linear probing. The code 
> is a bit simpler (I think :). I also moved the load factor to a constant -- 
> 0.5 seems like a generous load factor, especially if we allow large FSTs to 
> be built. I don't see any significant speedup in constructing large automata, 
> but there is no slowdown either (I checked on one machine only for now, but 
> will verify on other machines too).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.

2011-03-15 Thread Dawid Weiss (JIRA)
Use linear probing with an additional good bit avalanching function in FST's 
NodeHash.
--

 Key: LUCENE-2967
 URL: https://issues.apache.org/jira/browse/LUCENE-2967
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Trivial
 Fix For: 4.0


I recently had an interesting discussion with Sebastiano Vigna (fastutil), who 
suggested that linear probing, given a hash mixing function with good avalanche 
properties, is a way better method of constructing lookups in associative 
arrays compared to quadratic probing. Indeed, with linear probing you can 
implement removals from a hash map without removed slot markers and linear 
probing has nice properties with respect to modern CPUs (caches). I've 
reimplemented HPPC's hash maps to use linear probing and we observed a nice 
speedup (the same applies for fastutils of course).

This patch changes NodeHash's implementation to use linear probing. The code is 
a bit simpler (I think :). I also moved the load factor to a constant -- 0.5 
seems like a generous load factor, especially if we allow large FSTs to be 
built. I don't see any significant speedup in constructing large automata, but 
there is no slowdown either (I checked on one machine only for now, but will 
verify on other machines too).
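The combination described above (a hash function with strong bit avalanching, linear probing, and a 0.5 load factor) can be sketched as a minimal open-addressing set over ints. This is an illustrative toy using MurmurHash3's fmix32 finalizer as the mixing step, not Lucene's actual NodeHash implementation.

```java
// Minimal linear-probing hash set. A good avalanching mix keeps probe
// clusters short; removal-marker bookkeeping is unnecessary with linear
// probing (not shown here). Slot value 0 marks "empty", so keys must be
// non-zero in this sketch.
public class LinearProbeSet {
    private int[] keys = new int[16]; // capacity is always a power of two
    private int size;

    // MurmurHash3 fmix32 finalizer: strong bit avalanche.
    static int mix(int h) {
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public boolean add(int key) {
        if (size + 1 > keys.length / 2) rehash(); // enforce 0.5 load factor
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (keys[slot] != 0) {
            if (keys[slot] == key) return false; // already present
            slot = (slot + 1) & mask;            // linear step
        }
        keys[slot] = key;
        size++;
        return true;
    }

    public boolean contains(int key) {
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (keys[slot] != 0) {
            if (keys[slot] == key) return true;
            slot = (slot + 1) & mask;
        }
        return false;
    }

    private void rehash() {
        int[] old = keys;
        keys = new int[old.length * 2];
        size = 0;
        for (int k : old) if (k != 0) add(k);
    }

    public int size() { return size; }
}
```

The generous 0.5 load factor trades memory for short probe sequences, which matters when the structure is hot during FST construction.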

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1081745 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java

2011-03-15 Thread Dawid Weiss
Thanks Mike :)
Dawid

On Tue, Mar 15, 2011 at 1:22 PM, Michael McCandless
 wrote:
> Looks good Dawid!
>
> On Tue, Mar 15, 2011 at 8:20 AM,   wrote:
>> Author: dweiss
>> Date: Tue Mar 15 12:20:03 2011
>> New Revision: 1081745
>>
>> URL: http://svn.apache.org/viewvc?rev=1081745&view=rev
>> Log:
>> Adding -noverify and a little bit nicer output to TestFSTs. These are 
>> debugging/analysis utils that are not used anywhere, so I commit them 
>> without the patch.
>>
>> Modified:
>>    
>> lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
>>
>> Modified: 
>> lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java?rev=1081745&r1=1081744&r2=1081745&view=diff
>> ==
>> --- 
>> lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
>>  (original)
>> +++ 
>> lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java
>>  Tue Mar 15 12:20:03 2011
>> @@ -25,16 +25,7 @@ import java.io.IOException;
>>  import java.io.InputStreamReader;
>>  import java.io.OutputStreamWriter;
>>  import java.io.Writer;
>> -import java.util.ArrayList;
>> -import java.util.Arrays;
>> -import java.util.Collections;
>> -import java.util.HashMap;
>> -import java.util.HashSet;
>> -import java.util.Iterator;
>> -import java.util.List;
>> -import java.util.Map;
>> -import java.util.Random;
>> -import java.util.Set;
>> +import java.util.*;
>>
>>  import org.apache.lucene.analysis.MockAnalyzer;
>>  import org.apache.lucene.document.Document;
>> @@ -1098,7 +1089,7 @@ public class TestFSTs extends LuceneTest
>>
>>     protected abstract T getOutput(IntsRef input, int ord) throws 
>> IOException;
>>
>> -    public void run(int limit) throws IOException {
>> +    public void run(int limit, boolean verify) throws IOException {
>>       BufferedReader is = new BufferedReader(new InputStreamReader(new 
>> FileInputStream(wordsFileIn), "UTF-8"), 65536);
>>       try {
>>         final IntsRef intsRef = new IntsRef(10);
>> @@ -1115,7 +1106,9 @@ public class TestFSTs extends LuceneTest
>>
>>           ord++;
>>           if (ord % 50 == 0) {
>> -            System.out.println(((System.currentTimeMillis()-tStart)/1000.0) 
>> + "s: " + ord + "...");
>> +            System.out.println(
>> +                String.format(Locale.ENGLISH,
>> +                    "%6.2fs: %9d...", ((System.currentTimeMillis() - 
>> tStart) / 1000.0), ord));
>>           }
>>           if (ord >= limit) {
>>             break;
>> @@ -1144,6 +1137,10 @@ public class TestFSTs extends LuceneTest
>>
>>         System.out.println("Saved FST to fst.bin.");
>>
>> +        if (!verify) {
>> +          System.exit(0);
>> +        }
>> +
>>         System.out.println("\nNow verify...");
>>
>>         is.close();
>> @@ -1194,6 +1191,7 @@ public class TestFSTs extends LuceneTest
>>     int inputMode = 0;                             // utf8
>>     boolean storeOrds = false;
>>     boolean storeDocFreqs = false;
>> +    boolean verify = true;
>>     while(idx < args.length) {
>>       if (args[idx].equals("-prune")) {
>>         prune = Integer.valueOf(args[1+idx]);
>> @@ -1215,6 +1213,9 @@ public class TestFSTs extends LuceneTest
>>       if (args[idx].equals("-ords")) {
>>         storeOrds = true;
>>       }
>> +      if (args[idx].equals("-noverify")) {
>> +        verify = false;
>> +      }
>>       idx++;
>>     }
>>
>> @@ -1235,7 +1236,7 @@ public class TestFSTs extends LuceneTest
>>           return new PairOutputs.Pair(o1.get(ord),
>>                                                  
>> o2.get(_TestUtil.nextInt(rand, 1, 5000)));
>>         }
>> -      }.run(limit);
>> +      }.run(limit, verify);
>>     } else if (storeOrds) {
>>       // Store only ords
>>       final PositiveIntOutputs outputs = 
>> PositiveIntOutputs.getSingleton(true);
>> @@ -1244,7 +1245,7 @@ public class TestFSTs extends LuceneTest
>>         public Long getOutput(IntsRef input, int ord) {
>>           return outputs.get(ord);
>>         }
>> -      }.run(limit);
>> +      }.run(limit, verify);
>>     } else if (storeDocFreqs) {
>>       // Store only docFreq
>>       final PositiveIntOutputs outputs = 
>> PositiveIntOutputs.getSingleton(false);
>> @@ -1257,7 +1258,7 @@ public class TestFSTs extends LuceneTest
>>           }
>>           return outputs.get(_TestUtil.nextInt(rand, 1, 5000));
>>         }
>> -      }.run(limit);
>> +      }.run(limit, verify);
>>     } else {
>>       // Store nothing
>>       final NoOutputs outputs = NoOutputs.getSingleton();
>> @@ -1267,7 +1268,7 @@ public class TestFSTs extends LuceneTest
>>         public Object getOutput(IntsRef input, int ord) {
>>           return NO_OUTPUT;
>>         }
>> -      }.run(limit);
>> +      }.run(limit, verify);
>>     }
>>   }
>>
>

Re: svn commit: r1081745 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/util/automaton/fst/TestFSTs.java

2011-03-15 Thread Michael McCandless
Looks good Dawid!




-- 
Mike

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucen

[jira] Resolved: (SOLR-2426) Build failing

2011-03-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-2426.
---

Resolution: Not A Problem

Trunk requires Java 6.

> Build failing
> -
>
> Key: SOLR-2426
> URL: https://issues.apache.org/jira/browse/SOLR-2426
> Project: Solr
>  Issue Type: Bug
>Reporter: Bill Bell
>
> ant clean
> ant example
> trunk
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCo
> llector.java:77: incompatible types
> [javac] found   : org.apache.solr.search.BitDocSet
> [javac] required: org.apache.solr.search.DocSet
> [javac]   return new BitDocSet(bits,pos);
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCo
> llector.java:132: incompatible types
> [javac] found   : org.apache.solr.search.SortedIntDocSet
> [javac] required: org.apache.solr.search.DocSet
> [javac]   return new SortedIntDocSet(scratch, pos);
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCo
> llector.java:136: incompatible types
> [javac] found   : org.apache.solr.search.BitDocSet
> [javac] required: org.apache.solr.search.DocSet
> [javac]   return new BitDocSet(bits,pos);
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:26: org.apache.solr.search.DocSlice is not abstract and does not override 
> abs
> tract method getTopFilter() in org.apache.solr.search.DocSet
> [javac] public class DocSlice extends DocSetBase implements DocList {
> [javac]^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:54: incompatible types
> [javac] found   : org.apache.solr.search.DocSlice
> [javac] required: org.apache.solr.search.DocList
> [javac] if (this.offset == offset && this.len==len) return this;
> [javac]^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:62: incompatible types
> [javac] found   : org.apache.solr.search.DocSlice
> [javac] required: org.apache.solr.search.DocList
> [javac] if (this.offset == offset && this.len == realLen) return this;
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:63: incompatible types
> [javac] found   : org.apache.solr.search.DocSlice
> [javac] required: org.apache.solr.search.DocList
> [javac] return new DocSlice(offset, realLen, docs, scores, matches, 
> maxS
> core);
> [javac]^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:130: intersection(org.apache.solr.search.DocSet) in 
> org.apache.solr.search.Do
> cSet cannot be applied to (org.apache.solr.search.DocSlice)
> [javac]   return other.intersection(this);
> [javac]   ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.ja
> va:139: intersectionSize(org.apache.solr.search.DocSet) in 
> org.apache.solr.searc
> h.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
> [javac]   return other.intersectionSize(this);
> [javac]   ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDis
> maxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
> [javac] found   : java.util.List
> [javac] required: java.util.List
> [javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
> [javac]   ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDis
> maxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
> [javac] found   : java.util.List
> [javac] required: java.util.List
> [javac]   super.addClause(clauses, conj, mods, q);
> [javac]   ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCach
> e.java:107: warning: [unchecked] unchecked cast
> [javac] found   : java.lang.Object
> [javac] required: 
> java.util.List che.Stats>
> [javac] statsList = (List) persistence;
> [javac]  ^
> [javac] 
> C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCach
> e.java:263: warning: [unchecked] unchecked cast
> [javac] found   : java.util.Set
> [javac] required: java.util.Set
> [javac]   for (Map.Entry e : (Set )items.entrySet()) {
> [javac] ^
> [javac] 
> 

[jira] Commented: (LUCENE-2957) generate-maven-artifacts target should include all non-Mavenized Lucene & Solr dependencies

2011-03-15 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006878#comment-13006878
 ] 

Dawid Weiss commented on LUCENE-2957:
-

Hi Steven. This issue is closed, but just to mark it for the future: I've added 
a retrowoven version of Carrot2-core, it will be part of maintenance release 
3.4.4:
https://oss.sonatype.org/content/repositories/snapshots/org/carrot2/carrot2-core/3.4.4-SNAPSHOT/

The -jdk15 classifier is the one working with Java 1.5 (I checked with our 
examples and they work fine, so there should be no problems with it in SOLR).
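
For reference, a sketch of how one might depend on that Java 1.5 build from a Maven POM. The coordinates are inferred from the snapshot repository URL above and are an assumption, not taken from the actual build files:

```xml
<!-- Assumed coordinates, inferred from the snapshot URL above -->
<dependency>
  <groupId>org.carrot2</groupId>
  <artifactId>carrot2-core</artifactId>
  <version>3.4.4-SNAPSHOT</version>
  <classifier>jdk15</classifier>
</dependency>
```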

> generate-maven-artifacts target should include all non-Mavenized Lucene & 
> Solr dependencies
> ---
>
> Key: LUCENE-2957
> URL: https://issues.apache.org/jira/browse/LUCENE-2957
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1, 3.2, 4.0
>Reporter: Steven Rowe
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 3.1, 3.2, 4.0
>
> Attachments: LUCENE-2923-part3.patch, LUCENE-2957-part2.patch, 
> LUCENE-2957.patch
>
>
> Currently, in addition to deploying artifacts for all of the Lucene and Solr 
> modules to a repository (by default local), the {{generate-maven-artifacts}} 
> target also deploys artifacts for the following non-Mavenized Solr 
> dependencies (lucene_solr_3_1 version given here):
> # {{solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar}} as 
> org.apache.solr:solr-commons-csv:3.1
> # {{solr/lib/apache-solr-noggit-r944541.jar}} as 
> org.apache.solr:solr-noggit:3.1
> \\ \\
> The following {{.jar}}'s should be added to the above list (lucene_solr_3_1 
> version given here):
> \\ \\
> # {{lucene/contrib/icu/lib/icu4j-4_6.jar}}
> # 
> {{lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ}}{{-1257.jar}}
> # {{solr/contrib/clustering/lib/carrot2-core-3.4.2.jar}}**
> # {{solr/contrib/uima/lib/uima-an-alchemy.jar}}
> # {{solr/contrib/uima/lib/uima-an-calais.jar}}
> # {{solr/contrib/uima/lib/uima-an-tagger.jar}}
> # {{solr/contrib/uima/lib/uima-an-wst.jar}}
> # {{solr/contrib/uima/lib/uima-core.jar}}
> \\ \\
> I think it makes sense to follow the same model as the current non-Mavenized 
> dependencies:
> \\ \\
> * {{groupId}} = {{org.apache.solr/.lucene}}
> * {{artifactId}} = {{solr-/lucene-}},
> * {{version}} = .
> **The carrot2-core jar doesn't need to be included in trunk's release 
> artifacts, since there already is a Mavenized Java6-compiled jar.  branch_3x 
> and lucene_solr_3_1 will need this Solr-specific Java5-compiled maven 
> artifact, though.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr query POST and not in GET

2011-03-15 Thread Gastone Penzo
Hi,
Is it possible to change the method Solr uses to send queries from GET to
POST? My query contains many OR clauses, and the log reports "Request URI
too large".
Where can I change this?
Thanks
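
At the HTTP level, switching to POST simply moves the url-encoded parameters out of the request URI and into the request body, which is what avoids the URI length limit. Below is a minimal, hedged sketch of building such a body with only the JDK; the class and method names are illustrative, not Solr API. (If I recall correctly, SolrJ's query call also accepts a method argument for this.)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PostBody {
    // Illustrative only: builds the application/x-www-form-urlencoded
    // body that a POST request would carry instead of a long query URI.
    static String formBody(String q) {
        try {
            return "q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is guaranteed by the JDK
        }
    }

    public static void main(String[] args) {
        // Many OR clauses fit comfortably in a POST body.
        System.out.println(formBody("id:1 OR id:2 OR id:3"));
    }
}
```

With curl, this corresponds to sending the same parameters via --data instead of on the URL.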




-- 
Gastone Penzo

www.solr-italia.it
The first italian blog about SOLR


[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006871#comment-13006871
 ] 

Simon Willnauer commented on LUCENE-2573:
-

bq. I still see a healtiness (mis-spelled) in DW.
ugh. I will fix
{quote}
I'd rather not have the stalling/healthiness be baked into the API, at
all. Can we put the hijack logic entirely private in the flush-by-ram
policies? (Ie remove isStalled()/hijackThreadsForFlush()).
{quote}

I agree on the hijack part, but isStalled is something I might want to 
control. We can still open it up eventually, so let's make it private for now 
but keep a note on it. 

{quote}
Can we move FlushSpecification out of FlushPolicy? Ie, it's a private
impl detail of DW right? (Not part of FlushPolicy's API). Actually
why do we need it? Can't we just return the DWPT?
{quote}

It currently holds the RAM usage for that DWPT when it was checked out, so that 
I can reduce flushBytes accordingly. We can maybe get rid of it entirely, 
but I don't want to rely on the DWPT's bytesUsed().
We can certainly move it out; this inner class is a relic anyway.

bq. Why do we have a separate DocWriterSession? Can't this be absorbed
into DocWriter?

I generally don't like cluttering DocWriter and letting it grow like IW. 
DocWriterSession might not be the ideal name for this class, but it really is a 
RAM tracker for this DW. Still, we can move out the parts that do not directly 
relate to memory tracking. Maybe DocWriterBytes?

bq. Be careful defaulting TermsHash.trackAllocations to true – eg term
vectors wants this to be false.

I need to go through the IndexingChain and check carefully where to track 
memory anyway. I haven't gotten to that yet, but it's good that you mention it, 
since one could easily get lost there.





bq. Instead of FlushPolicy.message, can't the policy call DW.message?
I don't want to couple that API to DW. What would be the benefit besides 
saving a single method?
{quote}
On the by-RAM flush policies... when you hit the high water mark, we
should 1) 
flush all DWPTs and 2) stall any other threads.
{quote}
Well, I am not sure we should do that. I don't really see why we should 
forcefully stop the world here. Incoming threads will pick up a flush 
immediately, and if we have enough resources to keep indexing, why should we 
wait until all DWPTs are flushed? If we stall, I fear we could queue up threads 
that could help flushing, while stalling would simply stop them from doing 
anything, right? You can still control this with the healthiness, though. Btw, 
we currently do flush all DWPTs once we hit the HW. 

{quote}
Why do we dereference the DWPTs with their ord? EG, can't we just
store their 
"state" (active or flushPending) on the DWPT instead of in
a separate states[]?
{quote}
That is definitely an option. I will give that a go.
{quote}
Do we really need FlushState.Aborted? And if not... do we really need

FlushState (since it just becomes 2 states, ie, Active or Flushing,
which I 
think is then redundant w/ flushPending boolean?).
{quote}
this needs some more refactoring I will attach another iteration
{quote}
I think the default low water should be 1X of your RAM buffer? And
high water 
maybe 2X? (For both flush-by-RAM policies).
{quote}
Hmm, I think we need to revise the maxRAMBufferMB Javadoc anyway, so we have all 
the freedom to do whatever we want. Still, I think we should try to keep RAM 
consumption similar to what it would have been in a previous release: if we 
say the HW is 2x, then suddenly some apps might run out of memory. I am not 
sure we should do that rather than stick with 90% to 110% for now. We need to 
find good defaults for this anyway.


> Tiered flushing of DWPTs by RAM with low/high water marks
> -
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, 
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed 
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
> tiered approach:  
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values 
> explicitly using total values (e.g. low water mark at 120MB, high water m
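
The linear-steps idea described above can be sketched numerically. This is only an illustration of the arithmetic under the example figures quoted (90% low water mark, 110% high water mark); the class and method names are made up, not code from the patch:

```java
import java.util.Arrays;

public class WaterMarks {
    // With n DWPTs, flush triggers are spaced linearly between the
    // low water mark and the high water mark (both as % of allowed RAM).
    static double[] thresholds(int dwpts, double low, double high) {
        double[] t = new double[dwpts];
        double step = (dwpts == 1) ? 0 : (high - low) / (dwpts - 1);
        for (int i = 0; i < dwpts; i++) {
            t[i] = low + i * step;
        }
        return t;
    }

    public static void main(String[] args) {
        // For 5 DWPTs between 90% and 110%: 90, 95, 100, 105, 110
        System.out.println(Arrays.toString(thresholds(5, 90.0, 110.0)));
    }
}
```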

Re: I want to take part in Google Summer Code 2011

2011-03-15 Thread Anurag
I did a project where I crawled data with Nutch-1.0 and indexed it into
Apache Solr, building a search engine with a proper UI (autosuggest,
spellcheck) running on a Tomcat server.

Now we are extending the project to include novel fuzzy queries using OWA
operators such as "at least half" and "as many as possible"; this is different
from the usual boolean search. We are referring to a paper by our respected
Prof. M. M. Sufyan Beg. This will be implemented in Apache Solr.

-
Kumar Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-want-to-take-part-in-Google-Summer-Code-2011-tp2668316p2680987.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: v2 of the release based on feedback.

Note: SOLR-2242-distinctFacet.patch not needed (left for history))

> Get distinct count of names for a facet field
> -
>
> Key: SOLR-2242
> URL: https://issues.apache.org/jira/browse/SOLR-2242
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
>Affects Versions: 4.0
>Reporter: Bill Bell
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR.2242.v2.patch
>
>
> When returning facet.field= you will get a list of matches for 
> distinct values. This is normal behavior. This patch tells you how many 
> distinct values you have (# of rows). Use with limit=-1 and mincount=1.
> The feature is called "namedistinct". Here is an example:
> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1
> Here is an example on field "hgid" (without namedistinct):
> {code}
> - 
> - 
>   1 
>   1 
>   1 
>   1 
>   1 
>   5 
>   1 
>   
>   
> {code}
> With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, 
> HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, 
> HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39). This returns number of rows 
> (7), not the number of values (11).
> {code}
> - 
> - 
>   7 
>   
>   
> {code}
> This works actually really good to get total number of fields for a 
> group.field=hgid. Enjoy!




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: New ver)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Comment: was deleted

(was: Maybe, but I thought all params were supposed to be lower case?

I can easily change that ??)




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: (was: SOLR-2242-distinctFacet.patch)




[jira] Issue Comment Edited: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006792#comment-13006792
 ] 

Bill Bell edited comment on SOLR-2242 at 3/15/11 8:22 AM:
--

I am going to use your suggestion. You will not have to set the limit. Getting 
the numFacetTerms will be optional, and you will also be able to omit the 
hgid facet counts entirely. I propose this (please comment):

This will ONLY output the numFacetTerms (no hgid facet counts):
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&f.hgid.facet.numFacetTerms=1

This assumes limit=-1 for the count:

{code}

  
   7  
  

{code}

This will output the numFacetTerms AND hgid:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetTerms=2

{code}

  
   7  
   
1
1
1
1
1
5
1
   
  

{code}

  was (Author: billnbell):
I am going to use your suggestion. You will not have to set the limit. 
Getting the numFacetTerms will be optional, and you also will be able to NOT 
get the hgids as well. I propose this (please comment):

This will ONLY output the numFacetTerms (no hgid facet counts):
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&f.hgid.facet.numfacetterms=1

This assumes the count will be limit=-1

{code}

  
   7  
  

{code}

This will output the numFacetTerms AND hgid:
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numfacetterms=2

{code}

  
   7  
   
1
1
1
1
1
5
1
   
  

{code}
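
The distinction between rows and values described in this issue can be illustrated with a tiny sketch (the names here are made up for illustration, not the patch's actual code): the proposed count reports the number of distinct facet buckets, while summing the per-bucket counts gives the number of values.

```java
public class DistinctFacet {
    // Number of distinct facet terms (rows), as numFacetTerms reports it.
    static int numFacetTerms(int[] counts) {
        return counts.length;
    }

    // Total number of values: the sum of the per-term counts.
    static int totalValues(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The hgid example from this issue: counts 1,1,1,1,1,5,1
        int[] hgid = {1, 1, 1, 1, 1, 5, 1};
        // 7 rows, 11 values, matching the issue description
        System.out.println(numFacetTerms(hgid) + " rows, "
            + totalValues(hgid) + " values");
    }
}
```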
  



[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: SOLR.2242.v2.patch

New ver

> Get distinct count of names for a facet field
> -
>
> Key: SOLR-2242
> URL: https://issues.apache.org/jira/browse/SOLR-2242
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
>Affects Versions: 4.0
>Reporter: Bill Bell
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2242-distinctFacet.patch, SOLR.2242.v2.patch
>
>




[jira] Updated: (SOLR-2242) Get distinct count of names for a facet field

2011-03-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2242:


Attachment: (was: SOLR-2242.v2.patch)

> Get distinct count of names for a facet field
> -
>
> Key: SOLR-2242
> URL: https://issues.apache.org/jira/browse/SOLR-2242
> Project: Solr
>  Issue Type: New Feature
>  Components: Response Writers
>Affects Versions: 4.0
>Reporter: Bill Bell
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2242-distinctFacet.patch, SOLR.2242.v2.patch
>
>




[jira] Created: (SOLR-2426) Build failing

2011-03-15 Thread Bill Bell (JIRA)
Build failing
-

 Key: SOLR-2426
 URL: https://issues.apache.org/jira/browse/SOLR-2426
 Project: Solr
  Issue Type: Bug
Reporter: Bill Bell


ant clean
ant example
trunk
[javac]  ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:77: incompatible types
[javac] found   : org.apache.solr.search.BitDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new BitDocSet(bits,pos);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:132: incompatible types
[javac] found   : org.apache.solr.search.SortedIntDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new SortedIntDocSet(scratch, pos);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSetHitCollector.java:136: incompatible types
[javac] found   : org.apache.solr.search.BitDocSet
[javac] required: org.apache.solr.search.DocSet
[javac]   return new BitDocSet(bits,pos);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:26: org.apache.solr.search.DocSlice is not abstract and does not override abstract method getTopFilter() in org.apache.solr.search.DocSet
[javac] public class DocSlice extends DocSetBase implements DocList {
[javac]        ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:54: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac]   if (this.offset == offset && this.len==len) return this;
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:62: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac]   if (this.offset == offset && this.len == realLen) return this;
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:63: incompatible types
[javac] found   : org.apache.solr.search.DocSlice
[javac] required: org.apache.solr.search.DocList
[javac]   return new DocSlice(offset, realLen, docs, scores, matches, maxScore);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:130: intersection(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
[javac]   return other.intersection(this);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\DocSlice.java:139: intersectionSize(org.apache.solr.search.DocSet) in org.apache.solr.search.DocSet cannot be applied to (org.apache.solr.search.DocSlice)
[javac]   return other.intersectionSize(this);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:829: warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: java.util.List
[javac]   Query q = super.getBooleanQuery(clauses, disableCoord);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\ExtendedDismaxQParserPlugin.java:845: warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: java.util.List
[javac]   super.addClause(clauses, conj, mods, q);
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:107: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: java.util.List
[javac]   statsList = (List) persistence;
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\FastLRUCache.java:263: warning: [unchecked] unchecked cast
[javac] found   : java.util.Set
[javac] required: java.util.Set
[javac]   for (Map.Entry e : (Set )items.entrySet()) {
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\Grouping.java:61: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac]   grouped.add(key, groupResult);  // grouped={ key={
[javac]   ^
[javac] C:\Users\bbell\solr\solr\src\java\org\apache\solr\search\Grouping.java:64: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type o
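The first group of hard errors above all stem from the DocSet class hierarchy: once an abstract method such as getTopFilter() is added to the base type, every concrete subclass must override it or compilation fails with "is not abstract and does not override abstract method", and subtype relationships the patch broke surface as "incompatible types". A minimal sketch of that pattern and its fix, using simplified stand-in classes (not the real Solr types):

```java
// Simplified stand-ins illustrating the build failure above: when the base
// class gains an abstract method, any subclass lacking an override stops
// compiling with "is not abstract and does not override abstract method".
abstract class DocSetBase {
    // Newly added abstract method, analogous to getTopFilter() in the patch.
    abstract String getTopFilter();
}

class DocSlice extends DocSetBase {
    // The fix: every concrete subclass provides an implementation.
    @Override
    String getTopFilter() {
        return "filter-over-slice";
    }
}

public class BuildFailureSketch {
    public static void main(String[] args) {
        // Works again once the override exists; without it, this file
        // would not compile at all.
        DocSetBase d = new DocSlice();
        System.out.println(d.getTopFilter());
    }
}
```

The "found: java.util.List / required: java.util.List" warnings are a separate, cosmetic issue: the archived log has stripped the generic type parameters, so the real diagnostics compared two differently parameterized List types.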

ClassCastException with SOLR-1709 Distributed Date Faceting

2011-03-15 Thread Viswa S

Folks,

I applied the 4.x patch onto trunk and compiled it. However, there seems to be a
runtime exception, as below:

Thanks
Viswa

type: Status report

message: java.util.Date cannot be cast to java.lang.Integer

java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Integer
	at org.apache.solr.handler.component.FacetComponent.countFacets(FacetComponent.java:294)
	at org.apache.solr.handler.component.FacetComponent.handleResponses(FacetComponent.java:232)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:326)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)

description: The server encountered an internal error (java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Integer, with the same stack trace as above) that prevented it from fulfilling this request.
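The failure in the trace is the classic raw-container pattern: facet values travel between shards in an untyped structure, distributed date faceting stores them as java.util.Date, and the merging code casts the retrieved Object to Integer. A minimal reproduction under that assumption (a raw Map stands in for the untyped NamedList; this is not the actual FacetComponent code):

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the exception in the report above: a value stored as
// java.util.Date is later cast to Integer. Because the container is typed
// as Object, the compiler cannot catch this; it fails only at runtime.
public class CastFailureSketch {
    public static void main(String[] args) {
        Map<String, Object> shardResponse = new HashMap<>();
        // Date faceting puts a Date where the merge code expects a count.
        shardResponse.put("facet_value", new Date());

        try {
            Integer count = (Integer) shardResponse.get("facet_value"); // throws here
            System.out.println(count);
        } catch (ClassCastException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The fix in such cases is either to read the value as the type actually stored or to branch on the field type before casting, rather than assuming every facet value is an Integer.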