[jira] [Commented] (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264570#comment-13264570 ]

Peter Sturge commented on SOLR-1861:

Would a Solrj client be able to intrinsically handle a distributed shard request? It could make separate requests for each shard, but you wouldn't have the nice advantage of distributed searches, with aggregated facets, ranges etc., that's built in on the server side. Or perhaps I've misunderstood your Solrj suggestion?

HTTP Authentication for sharded queries
---------------------------------------
                Key: SOLR-1861
                URL: https://issues.apache.org/jira/browse/SOLR-1861
            Project: Solr
         Issue Type: Improvement
         Components: search
   Affects Versions: 1.4
        Environment: Solr 1.4
           Reporter: Peter Sturge
           Priority: Minor
             Labels: authentication, distributed, http, shard
        Attachments: SearchHandler.java, SearchHandler.java

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards.

Here's how the patch works: A final constant String called {{shardcredentials}} acts as the name of the SolrParams parameter key. The format for the value associated with this key is a comma-delimited list of colon-separated tokens:

{{shard0:port0:username0:password0,shard1:port1:username1:password1,...,shardN:portN:usernameN:passwordN}}

A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures backward compatibility.

When SearchHandler receives the request, it passes the {{shardcredentials}} parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials and sends the request using this client.

Note: Because the match comparison is a string compare (as opposed to a dns compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter.

Impl Notes: This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so it should be fine on other trunks, but I've only tested with the 1.4 release code base.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
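As an illustration of the format described above, the {{shardcredentials}} value can be parsed into per-shard credentials along these lines (a minimal sketch in Java; the class and method names are invented here, not taken from the attached SearchHandler.java):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardCredentialsParser {
    /**
     * Parses "host0:port0:user0:pass0,host1:port1:user1:pass1,..." into a map
     * keyed by "host:port", with a {username, password} pair as the value.
     */
    public static Map<String, String[]> parse(String shardCredentials) {
        Map<String, String[]> creds = new HashMap<>();
        if (shardCredentials == null || shardCredentials.isEmpty()) {
            return creds; // no credentials: caller falls back to the default http client
        }
        for (String entry : shardCredentials.split(",")) {
            String[] tokens = entry.split(":");
            if (tokens.length != 4) {
                continue; // malformed entry: skip it, preserving default behaviour
            }
            creds.put(tokens[0] + ":" + tokens[1],
                      new String[] { tokens[2], tokens[3] });
        }
        return creds;
    }
}
```

A lookup for shard "shard0:8983" then either yields credentials for building an authenticated HttpClient, or misses and falls through to the default client.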
[jira] [Commented] (SOLR-3421) Distributed Search doesn't allow for HTTP Authentication
[ https://issues.apache.org/jira/browse/SOLR-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264296#comment-13264296 ]

Peter Sturge commented on SOLR-3421:

There is an existing patch for this behaviour - see: issues.apache.org/jira/browse/SOLR-1861
This patch allows distributed credentials to be passed inside the url, where SearchHandler then parses this and creates HttpConnections for each shard in the distributed search.
Some useful extensions to this approach would be the use of certificates (instead of explicit credentials), and/or acl lists stored on the server side, with pre-authentication (e.g. via passing hash values instead of explicit credentials). The base mechanism provided in this patch can be used in both cases.
HTH!
Peter

Distributed Search doesn't allow for HTTP Authentication
--------------------------------------------------------
                Key: SOLR-3421
                URL: https://issues.apache.org/jira/browse/SOLR-3421
            Project: Solr
         Issue Type: New Feature
         Components: SearchComponents - other
   Affects Versions: 3.6, 4.0
        Environment: Sharded solr cluster
           Reporter: Michael Della Bitta
           Priority: Minor
             Labels: auth, distributed_search, ssl

The distributed search feature allows one to configure the list of shards the SearchHandler should query and aggregate results from using the shards parameter. Unfortunately, there is no way to configure any sort of authentication between shards and a distributed search-enabled SearchHandler. It'd be good to be able to specify an authentication type, auth credentials, and transport security to allow installations that don't have the benefit of being protected by a firewall some measure of security.
[jira] [Commented] (SOLR-3421) Distributed Search doesn't allow for HTTP Authentication
[ https://issues.apache.org/jira/browse/SOLR-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264297#comment-13264297 ]

Peter Sturge commented on SOLR-3421:

It's also worth noting that one of the advantages of this approach is that it allows for partial results to be returned (with error details in the response) if one or more shards are unavailable, but others are ok. An optional flag can switch this feature on or off.

Distributed Search doesn't allow for HTTP Authentication
--------------------------------------------------------
                Key: SOLR-3421
[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index
[ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049727#comment-13049727 ]

Peter Sturge commented on SOLR-2593:

This is a really great idea, thanks! If it's possible, it would be cool to have config parameters to:
* create a new core
* overwrite an existing core
* rename an existing core, then create (rolling backup)
* merge with an existing core (ever-growing, but kind of an accessible 'archive' index)

A new core admin command 'split' for splitting index
----------------------------------------------------
                Key: SOLR-2593
                URL: https://issues.apache.org/jira/browse/SOLR-2593
            Project: Solr
         Issue Type: New Feature
           Reporter: Noble Paul
            Fix For: 4.0

If an index is too large/hot it would be desirable to split it out to another core. This core may eventually be replicated out to another host. There could be multiple strategies:
* random split of x or x%
* fq=user:johndoe

example: command=split&split=20percent&newcore=my_new_index
or: command=split&fq=user:johndoe&newcore=john_doe_index
[jira] [Commented] (SOLR-1709) Distributed Date and Range Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022697#comment-13022697 ]

Peter Sturge commented on SOLR-1709:

Yes, the deprecation story makes sense.
Regarding SOLR-1729, I'm pretty sure this already works for 3x (it was originally created on/for the 3x branch). I guess Yonik's NOW changes were destined for trunk, but I've been using the current SOLR-1729 patch on the 3x branch and it is working fine in production environments.
Thanks
Peter

Distributed Date and Range Faceting
-----------------------------------
                Key: SOLR-1709
                URL: https://issues.apache.org/jira/browse/SOLR-1709
            Project: Solr
         Issue Type: Improvement
         Components: SearchComponents - other
   Affects Versions: 1.4
           Reporter: Peter Sturge
           Assignee: Hoss Man
           Priority: Minor
            Fix For: 4.0
        Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, SOLR-1709.patch, SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
* Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
* The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in.

There are several reasons for this:
* Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)

This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.

Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).
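The first-shard-wins merge behaviour described above can be sketched roughly as follows (illustrative Java only, not the attached FacetComponent code):

```java
import java.util.Map;

public class DateFacetMerger {
    /**
     * Merges one shard's facet_date counts into the counts accumulated from
     * the first-encountered shard. Buckets the first shard didn't report
     * (skewed 'earlier'/'later' dates) are intentionally dropped, mirroring
     * the behaviour described in the issue text.
     */
    public static void merge(Map<String, Integer> accumulated,
                             Map<String, Integer> shardCounts) {
        for (Map.Entry<String, Integer> e : shardCounts.entrySet()) {
            Integer existing = accumulated.get(e.getKey());
            if (existing != null) {
                accumulated.put(e.getKey(), existing + e.getValue());
            }
            // keys outside the first shard's buckets fall through untouched
        }
    }
}
```

This is why a shard skewed by one 'gap' silently loses its edge buckets: they simply never match a key in the first shard's map.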
[jira] [Commented] (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020802#comment-13020802 ]

Peter Sturge commented on SOLR-1709:

Updating ResponseBuilder rather than FacetInfo really came from tracing the references through the hierarchy - so, I don't think anything is missed by moving this to FacetInfo props, and it should provide better encapsulation.
Deprecating date faceting in favour of generic range faceting should be fine, as long as there exists a clear path to easily move from 'the way we were' with date facets, to 'the way it will be' (range faceting). It would be a shame to break clients that rely on the existing date facet parameters/syntax, so I guess if they're mapped to range (I think some of this is in 3.x already?), that would be good.
Thanks

Distributed Date Faceting
-------------------------
                Key: SOLR-1709
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016770#comment-13016770 ]

Peter Sturge commented on SOLR-2438:

As I mentioned above, the approach is a little bit different from SOLR-219, and its scope is [perhaps] more targeted at case-insensitive wildcards only. It's also a completely self-contained patch. I've found that when a JIRA issue contains lots of 'non-evolutionary' patches, it becomes difficult to know which patch is which. I agree that a new issue means commenters on SOLR-219 would need to look at this issue. I've added a link on SOLR-219 to relate it to this issue so it's easier to track.
Hope this helps clarify.

Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
                URL: https://issues.apache.org/jira/browse/SOLR-2438
            Project: Solr
         Issue Type: Improvement
           Reporter: Peter Sturge
        Attachments: SOLR-2438.patch

This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
                URL: https://issues.apache.org/jira/browse/SOLR-2438
            Project: Solr
         Issue Type: Improvement
           Reporter: Peter Sturge
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-2438:
-------------------------------
    Attachment: SOLR-2438.patch

Attached patch file

Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268 ]

Peter Sturge commented on SOLR-2438:

If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return different, and sometimes empty, results. This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField entries in schema.xml. The new attribute is called: {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100" ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}

It's worth noting that this will lower-case text for ALL terms that match the field type - including synonyms and stemmers. For backward compatibility, the default behaviour is as before - i.e. a case-sensitive wildcard search ({{ignoreCaseForWildcards=false}}).

The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community members may see use cases where this approach could pose problems. I'm all for feedback to enhance the functionality... The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr - in line with the 'it just works' Solr philosophy. Enjoy!
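The essence of the approach - lower-casing the raw wildcard term at query time when the field type opts in, so it lines up with index-time LowerCaseFilterFactory output - can be sketched as follows (illustrative Java; the class and method names are invented, not the patch's API):

```java
import java.util.Locale;

public class WildcardCase {
    /**
     * If the field type sets ignoreCaseForWildcards, lower-case the wildcard
     * term before query construction; otherwise leave it untouched
     * (the backward-compatible default).
     */
    public static String normalizeWildcardTerm(String term, boolean ignoreCaseForWildcards) {
        return ignoreCaseForWildcards ? term.toLowerCase(Locale.ROOT) : term;
    }
}
```

With this in place, MyTer* and myter* produce the same query term against a lower-cased index, which is exactly the "it just works" behaviour the comment above is after.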
[jira] Commented: (SOLR-2026) Need infrastructure support in Solr for requests that perform multiple sequential queries
[ https://issues.apache.org/jira/browse/SOLR-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002701#comment-13002701 ]

Peter Sturge commented on SOLR-2026:

Hi Karl,
This patch is a really good idea - many thanks for coming up with this!
I've tried applying this on trunk, but I get a few compile errors from the patch, and I'm not quite sure how to use it in a query.

The compile errors have to do with:
* SearchHandler.java (~line 267): ResponseBuilder rb = new ResponseBuilder(); - ResponseBuilder doesn't have a no-arg ctor
* ResponseBuilder.java (~line 141, copyFrom()): debug = rb.debug; - there is no 'debug' parameter

I've fixed these up locally, but as I've only just looked at this, I thought I'd run it by you before patching it up. There's also an NPE thrown if debugQuery=true (@ DebugComponent.java:56).

I haven't been able to build a query that seems to work. Do you have any example query urls you use for testing? http://127.0.0.1:9000/solr/select?qt=multiquery&blahblah etc...
Many thanks!
Peter

Need infrastructure support in Solr for requests that perform multiple sequential queries
-----------------------------------------------------------------------------------------
                Key: SOLR-2026
                URL: https://issues.apache.org/jira/browse/SOLR-2026
            Project: Solr
         Issue Type: New Feature
         Components: SearchComponents - other
           Reporter: Karl Wright
           Priority: Minor
            Fix For: 4.0
        Attachments: SOLR-2026.patch, SOLR-2026.patch

Several known cases exist where multiple index searches need to be performed in order to arrive at the final result. Typically, these have the constraint that the results from one search query are required in order to form a subsequent search query. While it is possible to write a custom QueryComponent or search handler to perform this task, an extension to the SearchHandler base class would readily permit such query sequences to be configured using solrconfig.xml. I will therefore be writing and attaching a patch tomorrow morning which supports this extended functionality in a backwards-compatible manner.

The tricky part, which is figuring out how to funnel the output of the previous search result into the next query, can be readily achieved by use of the SolrRequestObject.getContext() functionality. The stipulation will therefore be that the SolrRequestObject's lifetime will be that of the entire request, which makes complete sense. (The SolrResponseObject's lifetime will, on the other hand, be limited to a single query, and the last response so formed will be what gets actually returned by SearchHandler.)
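The context-funnelling idea can be sketched with a plain map standing in for the per-request context (illustrative only; the real patch would use the request object's context map, and the key name here is invented):

```java
import java.util.Map;

public class SequentialQueries {
    /**
     * Stage 1 stores its output in the shared per-request context;
     * stage 2 reads it back to build the follow-up query string.
     */
    public static String runSequence(Map<Object, Object> requestContext) {
        // First "query": pretend it returned a set of document ids
        requestContext.put("stage1.ids", "1 2 3");
        // Second "query": its filter is built from the previous stage's output
        return "id:(" + requestContext.get("stage1.ids") + ")";
    }
}
```

Because the context map lives as long as the request, each configured query in the sequence can see what every earlier query produced.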
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994716#comment-12994716 ]

Peter Sturge commented on SOLR-1709:

Hi David,
Thank you thank you thank you for working on this and providing tests - your efforts are very much appreciated!
For deprecation of facet.date, I suspect it probably shouldn't be deprecated until a fully-fledged replacement is ready, ported and committed, but if SOLR-1240 can functionally slot in (including the 'NOW' stuff in SOLR-1729), that's great.
Many thanks,
Peter

Distributed Date Faceting
-------------------------
                Key: SOLR-1709
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994847#comment-12994847 ]

Peter Sturge commented on SOLR-2245:

I've been meaning to get back to this, as I have made some local updates that help performance. Could you give me some feedback on these 2 questions please - it would be really useful:
* Is there a committer's standard or similar spec that describes what tests should be included, and if so, could you point me to it please? I can then make sure I include appropriate tests
* Is there a time-frame for committing for this or the next release? I have a product release of my own coming up for beg-March, so if I know the time-scales, I can plan accordingly.
Thanks!
Peter

MailEntityProcessor Update
--------------------------
                Key: SOLR-2245
                URL: https://issues.apache.org/jira/browse/SOLR-2245
            Project: Solr
         Issue Type: Improvement
         Components: contrib - DataImportHandler
   Affects Versions: 1.4, 1.4.1
           Reporter: Peter Sturge
           Priority: Minor
            Fix For: 1.4.2
        Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip

This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here:
* Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments, e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content
* Added a synonym called 'processAttachments', which is synonymous with the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default = 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag.
* Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string "none"). Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'not read'. I'm calculating this would be extremely rare, and it is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent.
* The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing
* The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data

Tested on the 3.x trunk with a number of popular imap servers.
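For reference, the new attributes from the bullet list above might appear in a DIH data-config.xml entity like this (a hedged example: the host, user, and folder values are placeholders, and only includeContent/processAttachments come from this patch - the remaining attributes are standard MailEntityProcessor configuration):

```xml
<entity name="mail"
        processor="MailEntityProcessor"
        user="someone@example.com"
        password="..."
        host="imap.example.com"
        protocol="imaps"
        folders="INBOX"
        includeContent="true"
        processAttachments="false"/>
```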
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988013#action_12988013 ] Peter Sturge commented on SOLR-1709: Hi David, Yes, at the time my patching wasn't working (Windows env for my sins), so I thought it would be better to make the source available than not. Thomas H. kindly did turn it into a udiff patch last year. I agree it would be good to include this functionality (along with SOLR-1729 + Yonik's recent 'NOW' changes). I have a product release coming up in a few weeks, so I won't have many cycles before then. Of course it would be great if you have any time to invest making this more 'commitable'. I admit because I'm not a Solr commiter, I'm not as familiar with the requirements. If you can let me know the 'missing elements', I'm happy to look at contributing what's needed, or if you prefer, divide up the tasks that need doing. Many thanks, Peter Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. 
This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information were added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
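The first-shard-as-basis merge described above can be sketched roughly as follows. This is a simplified illustration, not the actual FacetComponent code; the class and method names are hypothetical, and real facet_dates responses carry more than plain count maps:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetMerge {
    // Merge a subsequent shard's facet_date counts into the first shard's
    // map. Buckets the later shard reports that are absent from the first
    // shard's map (i.e. skewed by a 'gap') are deliberately dropped,
    // matching the behaviour described in the comment above.
    public static Map<String, Integer> merge(Map<String, Integer> first,
                                             Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            // Only aggregate buckets the first shard already knows about.
            if (first.containsKey(e.getKey())) {
                first.put(e.getKey(), first.get(e.getKey()) + e.getValue());
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("2010-01-01T00:00:00Z", 3);
        first.put("2010-01-01T01:00:00Z", 2);

        Map<String, Integer> skewed = new LinkedHashMap<>();
        skewed.put("2010-01-01T01:00:00Z", 4); // overlapping bucket: merged
        skewed.put("2010-01-01T02:00:00Z", 7); // outside first shard's range: dropped

        merge(first, skewed);
        System.out.println(first);
    }
}
```

This also makes the performance point above concrete: each shard is checked against one map rather than against every other shard's list.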
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970785#action_12970785 ] Peter Sturge commented on SOLR-1729: Many thanks for finishing off this patch. Sorry I didn't get time to fix this, been swamped with so many projects at the moment. That's great you got the thread local NOW included as well. Thanks! Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Fix For: 4.0 Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, SOLR-1729.patch, SOLR-1729.patch, SOLR-1729.patch, UnInvertedField.java This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.). This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. 
The new parameter is called 'facet.date.now', and takes as a parameter a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set value, but as that is where the use-case is for this type of parameter, this should be ok. NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share. Source files affected: FacetParams.java (holds the new constant FACET_DATE_NOW) SimpleFacets.java getFacetDateCounts() NOW parameter modified This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some rfc acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route.
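For illustration, a caller pinning the remote server's idea of NOW might build its request like this. This is a stdlib-only sketch: the host, core path, field name and date-math ranges are placeholders, and only the facet.date.now parameter itself comes from this patch:

```java
public class FacetDateNowDemo {
    // Build a query URL that tells the remote server what millisecond
    // timestamp to treat as NOW when computing date facets.
    static String buildQuery(String baseUrl, long nowMillis) {
        return baseUrl + "?q=*:*&facet=true&facet.date=timestamp"
             + "&facet.date.start=NOW/HOUR-1HOUR&facet.date.end=NOW/HOUR"
             + "&facet.date.gap=%2B1MINUTE"
             + "&facet.date.now=" + nowMillis;
    }

    public static void main(String[] args) {
        // In real code the second argument would be System.currentTimeMillis(),
        // so every shard computes NOW-relative ranges from the caller's clock.
        String url = buildQuery("http://shard1:8983/solr/select", 1291380300000L);
        System.out.println(url);
    }
}
```

Because every shard then resolves NOW/HOUR-1HOUR etc. from the same instant, the per-shard date buckets line up even when the shards' clocks disagree.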
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966494#action_12966494 ] Peter Sturge commented on SOLR-1729: Hi Peter, Not sure why it would work, then not... Both these patches were submitted just before all the version name changes (which I'm still getting to grips with). At the time, I think 1.4.1 was the latest release train. For 3.x recently we've done some manual merging due to some other changes (forwarding http credentials to remote shards). I'll have a look at building a separate 'branch3x' patch version, as there may have been some separate back-porting changes in the affected files that are breaking the current patch. Are you using the latest release, or the latest trunk version? Thanks, Peter Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966533#action_12966533 ] Peter Sturge commented on SOLR-1729: Hi Peter, So, the patches are clean (for 1.4.1), but the tests are failing for 1.4.1? Or is the failure in 3.x? Sorry, but I'm a bit confused about which bit isn't working now. Thanks, Peter On Fri, Dec 3, 2010 at 1:05 PM, Peter Karich (JIRA) j...@apache.org wrote: Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966031#action_12966031 ] Peter Sturge commented on SOLR-1709: It's a good idea to apply SOLR-1729 in any case, as it caters for any time skew in documents and between machines. Without it, result counts 'on the edges' could be incorrect. 1729 is quite 'passive', in that if you don't specify a 'FACET_DATE_NOW' parameter in the request, it runs as without the patch. In terms of readiness, we've been using these patches in production environments for months now. (We use it with the 3.x trunk branch.) Yonik, et al. were talking about a more general update with regard to how NOW is configured on a machine (since it is used in places other than just date facets), and this is the 'extra' work to be done, but things work fine as they are for distributed date faceting. Thanks, Peter Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966238#action_12966238 ] Peter Sturge commented on SOLR-1729: So is 1709 ok, but 1729 isn't? Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245: --- Attachment: SOLR-2245.zip This patch update provides a proper delta-import implementation, rather than the kludge used in the previous version. MailEntityProcessor with this patch is useful for importing emails 'en-masse' the first time 'round, then only new mails after that. Behaviour: * If you send a full-import command, then the 'fetchMailsSince' property specified in data-config.xml will always be used. * If you send a delta-import command, the 'fetchMailsSince' property specified in data-config.xml is used for the first call only. Subsequent delta-import commands will use the time since the last index update. There are significant code changes in this version. So much so, that I've included the complete MailEntityProcessor source as well as a PATCH file. This version doesn't use the persistent last_index_time functionality of dataimport.properties (i.e. it's delta only for the life of the Solr process). If I get some free cycles, I'll try to put this in. MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here: * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content * Added a synonym called 'processAttachments', which is synonymous with the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default = 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag. * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string none). Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I'm calculating this would be extremely rare, and it is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent. * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing. * The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data. Tested on the 3.x trunk with a number of popular IMAP servers.
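The FLAGS.NONE behaviour above can be sketched as follows. This is an illustrative stdlib-only fragment, not the patch's actual API: an email with no flags still yields an explicit 'flags' value ("none") rather than the field being absent:

```java
import java.util.List;
import java.util.Set;

public class FlagsNoneDemo {
    // Map an email's set of flag names to the 'flags' field values.
    // An empty set becomes the explicit marker "none", so clients can
    // rely on the field always being present.
    static List<String> flagsField(Set<String> mailFlags) {
        return mailFlags.isEmpty() ? List.of("none") : List.copyOf(mailFlags);
    }

    public static void main(String[] args) {
        System.out.println(flagsField(Set.of()));       // prints [none]
        System.out.println(flagsField(Set.of("seen"))); // prints [seen]
    }
}
```

A client checking for unread mail would then test for the value "none" rather than for the absence of the field, which is the backward-compat point raised above.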
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935898#action_12935898 ] Peter Sturge commented on SOLR-2245: Forgot to mention... Because this now supports delta-import commands, the 'deltaFetch' attribute is no longer needed and is not used. MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245: --- Attachment: SOLR-2245.patch MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934800#action_12934800 ] Peter Sturge commented on SOLR-2245: This latest version of the updated MailEntityProcessor adds a few new features: 1. Incorporated SOLR-1958 (exception if fetchMailsSince isn't specified) into this patch 2. Added a hacky version of delta mail retrieval for scheduled import runs: The new property is called 'deltaFetch'. If 'true', the first time the import is run, it will read the 'fetchMailsSince' property and import as normal. On subsequent runs (within the same process session), the import will only fetch mail since the last run. Because it uses a runtime system property to hold the last_index_time, and there is currently no persistence, if/when the server is restarted the last_index_time is not saved and the original fetchMailsSince value is used. As I couldn't find exposed APIs for the dataimport.properties file (all the methods are private or package-protected), persistence is not included in this patch version. 3. Added support for including shared folders in the import 4.
Added support for including personal folders (other folders) in the import A typical {{entity}} element in data-config.xml might look something like this: {code:xml} <entity name="email" user="u...@mydomain.com" password="userpwd" host="imap.mydomain.com" fetchMailsSince="2010-08-01 00:00:00" deltaFetch="true" include="" exclude="" recurse="false" folders="INBOX,Inbox,inbox" includeContent="true" processAttachments="true" includeOtherUserFolders="true" includeSharedFolders="true" batchSize="100" processor="MailEntityProcessor" protocol="imap"/> {code} MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch
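The deltaFetch bookkeeping described in the comment above can be sketched roughly as follows. This is an illustrative fragment with hypothetical names, not the patch's actual code: the first run uses the configured fetchMailsSince date, and later runs in the same process use the time recorded by the previous run, held in a runtime system property so nothing survives a server restart:

```java
import java.util.Date;

public class DeltaFetchDemo {
    static final String PROP = "mail.last_index_time"; // hypothetical property key

    // Decide how far back to fetch mail for this import run.
    static Date fetchSince(Date fetchMailsSince) {
        String last = System.getProperty(PROP);
        // First run in this process: no recorded time, use the configured date.
        Date since = (last == null) ? fetchMailsSince : new Date(Long.parseLong(last));
        // Record this run's time for the next delta run (in-memory only,
        // so a restart falls back to fetchMailsSince).
        System.setProperty(PROP, Long.toString(System.currentTimeMillis()));
        return since;
    }

    public static void main(String[] args) {
        Date configured = new Date(0L);
        System.out.println(fetchSince(configured)); // first run: configured date
        System.out.println(fetchSince(configured)); // delta run: previous run's time
    }
}
```

Persisting PROP to dataimport.properties is exactly the missing piece the comment mentions, since those APIs are private or package-protected.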
[jira] Issue Comment Edited: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934800#action_12934800 ] Peter Sturge edited comment on SOLR-2245 at 11/23/10 5:58 AM:

This latest version of the updated MailEntityProcessor adds a few new features:

1. Incorporated SOLR-1958 (exception if fetchMailsSince isn't specified) into this patch.
2. Added a hacky version of delta mail retrieval for scheduled import runs. The new property is called 'deltaFetch'. If 'true', the first time the import is run it will read the 'fetchMailsSince' property and import as normal. On subsequent runs (within the same process session), the import will only fetch mail received since the last run. Because it uses a runtime system property to hold the last_index_time, and there is currently no persistence, if/when the server is restarted the last_index_time is lost and the original fetchMailsSince value is used. I couldn't find exposed APIs for the dataimport.properties file (all the methods are private or package-protected), so persistence is not included in this patch version.
3. Added support for including shared folders in the import.
4. Added support for including personal folders (other folders) in the import.

A typical entity element in data-config.xml might look something like this:

{code:xml}
<entity name="email"
        user="u...@mydomain.com"
        password="userpwd"
        host="imap.mydomain.com"
        fetchMailsSince="2010-08-01 00:00:00"
        deltaFetch="true"
        include=""
        exclude=""
        recurse="false"
        folders="INBOX,Inbox,inbox"
        includeContent="true"
        processAttachments="true"
        includeOtherUserFolders="true"
        includeSharedFolders="true"
        batchSize="100"
        processor="MailEntityProcessor"
        protocol="imap"/>
{code}

MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
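The deltaFetch bookkeeping described in point 2 can be sketched in miniature. This is a hypothetical illustration only (the property name and method are assumptions, not the patch's actual code): the 'since' timestamp falls back to fetchMailsSince on the first run, then tracks the previous run via a runtime system property, which is exactly why it is lost on restart.

```java
import java.time.Instant;

// Hypothetical sketch of the deltaFetch bookkeeping; the property name and
// method are illustrative assumptions, not the patch's actual internals.
class DeltaFetchSketch {

    static final String LAST_INDEX_TIME_PROP = "mep.last_index_time"; // assumed name

    // Decide which 'since' timestamp this import run should use.
    static Instant sinceFor(Instant fetchMailsSince, Instant now) {
        String last = System.getProperty(LAST_INDEX_TIME_PROP);
        // First run in this process: fall back to the configured fetchMailsSince.
        Instant since = (last == null) ? fetchMailsSince : Instant.parse(last);
        // Remember this run's time for the next run. Because this is only a
        // runtime system property, it is lost on restart - the persistence
        // gap noted in the comment above.
        System.setProperty(LAST_INDEX_TIME_PROP, now.toString());
        return since;
    }

    public static void main(String[] args) {
        Instant configured = Instant.parse("2010-08-01T00:00:00Z");
        // First run uses the configured value; the second run uses the first run's time.
        System.out.println(sinceFor(configured, Instant.parse("2010-11-23T05:00:00Z")));
        System.out.println(sinceFor(configured, Instant.parse("2010-11-23T06:00:00Z")));
    }
}
```

Persisting the property to dataimport.properties (as DIH does for last_index_time elsewhere) would close the restart gap, but as noted those APIs aren't exposed.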
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245:
---
Attachment: SOLR-2245.patch

MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
[jira] Created: (SOLR-2245) MailEntityProcessor Update
MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 1.4
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
Attachments: SOLR-2245.patch
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929786#action_12929786 ] Peter Sturge commented on SOLR-1709:

Hi Peter,

Thanks for your message. There's of course the issue of 'now' as described in some of the above comments. This is perhaps a little ancillary to this issue, but not totally irrelevant.

The issue of time zone/skew on distributed shards is currently handled by SOLR-1729 by passing a 'facet.date.now=epochtime' parameter in the search query, which the participating shards then use as 'now'. Of course, there are a number of ways to skin that one, but this is a straightforward solution that is backward compatible and still easy to implement in client code. Note that the facet.date.now change is not part of this patch - see SOLR-1729 for a separate patch for this parameter. (It's kept separate because it's, strictly speaking, a separate issue for distributed search generally.)

It's not that earlier/later aren't supported - the date facet 'edges' are fine; it's just that the patch will 'quantize the ends' of the start/end date facets if the time is skewed from the calling server. This is where SOLR-1729 comes into play, so that this doesn't happen.

As this is a pre-3x/4x branch patch, the testing is a bit limited on the latest trunk(s). Having said that, I have this (and SOLR-1729) building/running fine on my svn 3x branch copy.

Any other questions, or info you need, please do let me know.

Thanks!
Peter

Distributed Date Faceting
-------------------------
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch

This patch adds support for date facets when using distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time).

The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if a subsequent shard's facet_dates are skewed in relation to the first by one 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
* If 'earlier' and/or 'later' facet_dates were added in, this would make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).

This could be dealt with if timezone and skew information were added and the dates normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome.

As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
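The first-shard-as-basis merge rule described above can be shown in miniature. This is a simplified sketch with plain maps, not the actual FacetComponent code: a later shard's counts are added only for buckets the basis already has, so a skewed 'earlier' or 'later' bucket is dropped rather than widening the requested range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the merge rule: the first-encountered shard's
// facet_dates buckets form the basis; later shards' counts are added only
// for buckets the basis already contains.
class DateFacetMergeSketch {

    static void mergeInto(Map<String, Integer> basis, Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            // Buckets absent from the basis are silently ignored.
            basis.computeIfPresent(e.getKey(), (bucket, count) -> count + e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> basis = new LinkedHashMap<>();
        basis.put("2010-11-10T10:00:00Z", 3);
        basis.put("2010-11-10T11:00:00Z", 5);

        Map<String, Integer> skewedShard = new LinkedHashMap<>();
        skewedShard.put("2010-11-10T09:00:00Z", 2); // 'earlier' by one gap: dropped
        skewedShard.put("2010-11-10T10:00:00Z", 4); // merged into the basis

        mergeInto(basis, skewedShard);
        System.out.println(basis); // {2010-11-10T10:00:00Z=7, 2010-11-10T11:00:00Z=5}
    }
}
```

With the proposed 'timezone'/'now' additions to the facet_dates map, bucket keys could be normalized before this merge instead of being dropped.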
[jira] Commented: (SOLR-2100) Fix for saving commit points during java-based backups
[ https://issues.apache.org/jira/browse/SOLR-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906497#action_12906497 ] Peter Sturge commented on SOLR-2100:

I'm not really familiar with the reservation code for replication, but will it still save the commit point for replication even if another commit (or many commits) comes along during replication? By default it would probably be rare, as the data to be replicated is only a delta and would likely not take too long to complete. This was the problem with backups - a backup is a full file copy of everything, which typically takes minutes on large indexes - longer if writing to a remote volume. As the replication timing is configurable, you could have a scenario where the amount of data to be replicated is very significant, and generally remote, so could take some time to complete. Would the reservation mechanism still hold the commit point if 1, 2, 5 or 10 commits came along during the replication process?

ReplicationHandler.postCommit() calls saveCommitPoint()/releaseCommitPoint(), so as things stand this would preserve the commit point even if a separate reservation didn't, and there's no price to pay for holding the indexVersion in this way.

Not sure what the standard policy is for marking issues Resolved/Closed, so I'll leave this up to you. But do let me know if you'd like me to perform any additional testing.

Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4, 1.4.1
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
Attachments: SOLR-2100.PATCH
Original Estimate: 0h
Remaining Estimate: 0h

This patch fixes the saving of commit points during backup operations. It fixes the previously committed (for 1.4) SOLR-1475 patch.

1. In IndexDeletionPolicyWrapper.java, commit points are not saved to the 'savedCommits' map.
2. Also, the test for the presence of a commit point uses the contains() method instead of containsKey().

The result of this is that backups for anything but toy indexes fail, because the commit points are deleted (after 10s) before the full backup is completed. This patch addresses these 2 issues.

Tested with the 1.4.1 release trunk, but should also work fine with 1.4.
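The second bug is easy to reproduce in miniature. This hypothetical sketch does not reproduce the real IndexDeletionPolicyWrapper types; it just models why testing the wrong side of the savedCommits map (containsValue() stands in here for the misused contains() call) makes the "is this commit point saved?" check always fail, so the commit point gets reaped mid-backup:

```java
import java.util.HashMap;
import java.util.Map;

// Miniature model of the reported bug: commit points are keyed by index
// version, so presence must be tested with containsKey(). A value-side test
// never matches the version, so the commit point looks unsaved and is
// deleted before the backup's file copy finishes.
class CommitPointCheckSketch {

    static final Map<Long, String> savedCommits = new HashMap<>();

    static boolean isSavedBuggy(Long version) {
        return savedCommits.containsValue(version); // wrong side of the map
    }

    static boolean isSavedFixed(Long version) {
        return savedCommits.containsKey(version);
    }

    public static void main(String[] args) {
        savedCommits.put(42L, "commit-point-42"); // commit point saved for version 42
        System.out.println(isSavedBuggy(42L)); // false: commit point would be reaped
        System.out.println(isSavedFixed(42L)); // true
    }
}
```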
[jira] Created: (SOLR-2100) Fix for saving commit points during java-based backups
Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4.1, 1.4
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
[jira] Updated: (SOLR-2100) Fix for saving commit points during java-based backups
[ https://issues.apache.org/jira/browse/SOLR-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2100:
---
Attachment: SOLR-2100.PATCH

Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr
[ https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866674#action_12866674 ] Peter Sturge commented on SOLR-1163:

Hi Uri,

Really like what you've done here. +1 vote! I've had a go on your demo site and that looks cool.

When I download and try to connect to a core (I've tried my own core, and the Solr 'example'), I always get:
'Could not load solr core ('corename'): The JSON request failed or timed out'
If I turn on Firebug, the only msg I get is this:
reference to undefined property window[c + x$] [Break on this error] function DKd(h,d,e,b,f){var c=gM+CKd++;i...)}},5000);document.body.appendChild(g)}\n
There doesn't seem to be any log/debug of what the problem might be. Are there any logging options that can be enabled?

Many thanks,
Peter

Solr Explorer - A generic GWT client for Solr
---------------------------------------------
Key: SOLR-1163
URL: https://issues.apache.org/jira/browse/SOLR-1163
Project: Solr
Issue Type: New Feature
Components: web gui
Affects Versions: 1.3
Reporter: Uri Boness
Attachments: graphics.zip, SOLR-1163.zip, SOLR-1163.zip, solr-explorer.patch, solr-explorer.patch

The attached patch is a generic GWT client for Solr. It is currently standalone, meaning that once built, one can open the generated HTML file in a browser and communicate with any deployed Solr. It is configured with its own configuration file, where one can configure the Solr instance/core to connect to. Since it's currently standalone and completely client-side based, it uses JSON with padding (cross-site scripting) to connect to remote Solr servers. Some of the supported features:

- Simple query search
- Sorting - one can dynamically define new sort criteria
- Search results are rendered very much like Google search results are rendered. It is also possible to view all stored field values for every hit.
- Custom hit rendering - it is possible to show thumbnails (images) per hit and also customize a view for a hit based on HTML templates
- Faceting - one can dynamically define field and query facets via the UI. It is also possible to pre-configure these facets in the configuration file.
- Highlighting - you can dynamically configure highlighting. It can also be pre-configured in the configuration file.
- Spellchecking - you can dynamically configure spell checking. Can also be done in the configuration file. Supports collation. It is also possible to send build and reload commands.
- Data import handler - if used, it is possible to send full-import and status commands (delta-import is not implemented yet, but it's easy to add)
- Console - for development time, there's a small console which can help to better understand what's going on behind the scenes. One can use it to:
** view the client logs
** browse the Solr schema
** view a breakdown of the current search context
** view a breakdown of the query URL that is sent to Solr
** view the raw JSON response returning from Solr

This client is actually a platform that can be greatly extended for more things. The goal is to have a client where the explorer part is just one view of it. Other future views include: Monitoring, Administration, Query Builder, DataImportHandler configuration, and more...

To get a better view of what's currently possible, we've set up a public version of this client at: http://search.jteam.nl/explorer. This client is configured with one Solr instance where crawled YouTube movies were indexed. You can also check out a screencast for this deployed client: http://search.jteam.nl/help

The patch creates a new folder in the contrib directory. Since the patch doesn't contain binaries, an additional zip file is provided that needs to be extracted to add all the required graphics. This module is Maven 2 based and is configured in such a way that all GWT-related tools/libraries are automatically downloaded when the module is compiled. One of the artifacts of the build is a war file which can be deployed in any servlet container.

NOTE: this client works best on WebKit-based browsers (for performance reasons) but also works on Firefox and IE 7+. That said, it should be taken into account that it is still under development.
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862150#action_12862150 ] Peter Sturge commented on SOLR-1895:

It's worth bearing in mind that more than just a username is required in the input in order to ensure secure access. Otherwise, security is compromised simply by guessing (or already knowing) the username of someone with higher privileges. For example: User Dishwasher has low privileges; User Admin has high privileges. When Dishwasher logs in, all he/she has to do is put Admin's name in the input argument, and he/she has now assumed Admin's rights. User Admin doesn't even need to be logged in for this to happen.

LCF SearchComponent plugin for enforcing LCF security at search time
--------------------------------------------------------------------
Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Fix For: 1.5
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service.
The component requires you to configure the appropriate authority service URL base, e.g.:

{code:xml}
<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>
{code}

Also required are the following schema.xml additions:

{code:xml}
<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
{code}

Finally, to tie it into the standard request handler, it seems to need to run last:

{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...
</requestHandler>
{code}

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely.
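The attached LCFSecurityFilter.java isn't reproduced in this thread, but the general shape of such a component can be sketched. This is an illustrative assumption only, not Karl's actual code: it builds a Lucene-syntax filter over the allow/deny token fields so that only documents matching one of the user's access tokens (and none of the deny tokens) come back. The real component also handles the _share fields and fetches tokens from the authority service.

```java
import java.util.List;

// Illustrative sketch only: builds a filter restricting results to documents
// whose allow_token_document matches one of the user's access tokens and
// whose deny_token_document matches none of them.
class SecurityFilterSketch {

    static String buildFilterQuery(List<String> accessTokens) {
        StringBuilder allow = new StringBuilder();
        StringBuilder deny = new StringBuilder();
        for (String token : accessTokens) {
            if (allow.length() > 0) {
                allow.append(" OR ");
                deny.append(" OR ");
            }
            allow.append("allow_token_document:\"").append(token).append('"');
            deny.append("deny_token_document:\"").append(token).append('"');
        }
        return "(" + allow + ") AND NOT (" + deny + ")";
    }

    public static void main(String[] args) {
        // Hypothetical token values for one authenticated user.
        System.out.println(buildFilterQuery(List.of("DOMAIN\\hr-managers", "public")));
    }
}
```

In a real SearchComponent this string would be added as a filter query in prepare(), so it constrains every downstream component rather than post-filtering results.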
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862191#action_12862191 ] Peter Sturge commented on SOLR-1895:

{quote}
The presumption is that the Solr webapp is not the final user interface, and is indeed not accessible to the user at all.
{quote}
Given that search requests are http-based, how would this be done in, say, an intranet environment? I agree that a user interface wouldn't expose any means to change the http parameters, but if http is available to the UI, it'll also be available to a web browser's search bar at the same station (unless some tunnelling, proxy or similar is used).

Totally agree on the server lock-down - hopefully, everyone does this already as a matter of course!

There are a couple of ways to address the impersonator problem. Probably the most robust way is to use SSL authentication from client to container, then have the Solr app integrate with the container (like we talked about for the authentication piece) and use its session certificate to ensure that any requests coming from the remote station match those of the originally authenticated user. A somewhat easier method is to use the hash and session-id mechanism used in SOLR-1872. This provides pgp protection against impersonation (even gaining any access from a browser), but wouldn't be suitable outside of an intranet environment (for exposed internet access, it would really need to be SSL - for sensitive data, though, you wouldn't expect it to be exposed across a DMZ anyway).
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862206#action_12862206 ] Peter Sturge commented on SOLR-1895:

{quote}
The usual way is to configure the application server running solr to either use certificate authentication (which requires the connecting client to be able to identify themselves via a secure cert)
{quote}
Yes, cert authentication is a good way to go, but once you've got one (because you have at least some privileges), you can bypass the lower-layer doc security because you've already done the cert auth.

{quote}
configure the application server to not accept connections from (say) anything other than the localhost adapter.
{quote}
I don't understand how localhost-only would give you any access off the box. I guess what I meant was: your client is wherever your client is, and this client could (and probably would) have a web browser installed. If a bona-fide user was an IT Operator, it would be easy for him/her to 'pretend' to be an HR Manager, unless some kind of post-login identity check prevents it. One way 'round this is to encrypt part or all of the http parameters (essentially, this is what the hash mechanism does in SOLR-1872).

LCF SearchComponent plugin for enforcing LCF security at search time
--------------------------------------------------------------------
Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Fix For: 1.5
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service.
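The "encrypt part or all of the http parameters" idea can be sketched as a simple HMAC over the query string. This is an illustrative sketch with an assumed shared secret, not the actual SOLR-1872 mechanism (which isn't shown in this thread): a browser user who edits a parameter, such as swapping in another username, cannot produce a valid signature.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: sign the query string with a secret shared between
// the trusted UI and the server, which recomputes the HMAC and rejects any
// request whose signature doesn't match the parameters as sent.
class ParamSigningSketch {

    static String sign(String queryString, String sharedSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] sig = mac.doFinal(queryString.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : sig) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String qs = "q=*:*&user=hrmanager"; // hypothetical parameter names
        System.out.println(qs + "&sig=" + sign(qs, "shared-secret"));
    }
}
```

On its own this doesn't stop replaying a captured signed request within an intranet; adding a session id or timestamp into the signed string (as the comment's "hash and session id mechanism" suggests) narrows that window.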
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862225#action_12862225 ] Peter Sturge commented on SOLR-1895: That makes total sense to keep a proxy app separate. Why wouldn't users interact with Solr directly? There's a lot of client-side stuff available to do just that. I wouldn't have thought there are too many implementations out there that completely block Solr http read access, because this would break replication, distributed searching, spell checkers, custom handlers etc. Generally, web proxies and firewalls etc. do a good job on this side of things, which is one of the reasons doc-level security is such a tricky business - you have to let traffic through to solr.war that you would normally not let anywhere near Solr, and restrict it there. You're right that /update, /admin etc. need to be 'locked down', but this is quite straightforward, so as not to allow users access to write or change anything.
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java Updates a typo or two plus some misc tweaks.
{code}
<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming
       queries, otherwise those requests will be blocked. -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!-- User lockout:
       'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
       'lockoutTime' is the number of minutes to lock out the account
       If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
       If not specified, the default values are: lockoutThreshold=5 lockoutTime=15 -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>
{code}
Thanks, Peter Document-level Access Control in Solr - Key: SOLR-1872 URL: https://issues.apache.org/jira/browse/SOLR-1872 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: SolrACLSecurity.java, SolrACLSecurity.java, SolrACLSecurity.rar This issue relates to providing document-level access control for Solr index data. A related JIRA issue is: SOLR-1834. 
I thought it would be best if I created a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work. There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access Control is one of those subjects that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, it does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance of requirements. The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that this version doesn't currently deal with /update, /replication etc.; it's a /select thing at the moment (but it could be used for these). This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml. acl.xml defines a number of users and groups (and 1 global for 'everyone'), and assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements. When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list. When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed), and then added to the query request. Because the allow and deny elements are simply subsearch queries (e.g. 
{{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any stored data that can be queried, including already existing data. Authentication: One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses http parameters for this. For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client. If the request is wrong, the SearchComponent fails and will increment the user's failed login count (if a valid user was specified).
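The username/hash/uuid handshake described above can be sketched with in-memory user and session tables. This is a simplified stand-in, not the code from the attached SolrACLSecurity.java; the SHA-1 hash choice and all names here are assumptions for illustration (the issue doesn't specify the hash algorithm):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class AuthSketch {
    private final Map<String, String> userHashes = new HashMap<>(); // username -> stored password hash
    private final Map<String, String> sessions = new HashMap<>();   // issued uuid -> username

    public AuthSketch() {
        // In the real component, users come from acl.xml; this entry is illustrative.
        userHashes.put("hrmanager", sha1Hex("secret"));
    }

    // Hex-encoded SHA-1 of the password (an assumed hashing scheme).
    static String sha1Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // Initial search: username + hash parameters; returns a session uuid on success, null otherwise.
    public String login(String username, String hash) {
        String expected = userHashes.get(username);
        if (expected == null || !expected.equals(hash)) return null;
        String uuid = UUID.randomUUID().toString();
        sessions.put(uuid, username);
        return uuid;
    }

    // Subsequent searches present the uuid instead of the credentials.
    public boolean validate(String uuid) {
        return sessions.containsKey(uuid);
    }
}
```

As the comment notes, this only makes sense over container HTTPS; without transport security the hash and uuid are replayable.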
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, the sample searchComponent section from solrconfig.xml is the same as the one in the later update above, except that the key property was misspelled as SolrACLSecurity rather than SolrACLSecurityKey.
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:16 AM: (this edit only adjusted the wrapping of the solrconfig.xml sample; the comment text is otherwise unchanged from the update above)
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:18 AM: (this edit wrapped the solrconfig.xml sample in a {code} block; the comment text is otherwise unchanged from the update above)
[jira] Created: (SOLR-1872) Document-level Access Control in Solr
Document-level Access Control in Solr - Key: SOLR-1872 URL: https://issues.apache.org/jira/browse/SOLR-1872 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: SolrACLSecurity.rar This issue relates to providing document-level access control for Solr index data. A related JIRA issue is: SOLR-1834. I thought it would be best if I created a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work. There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access Control is one of those subjects that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, it does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance of requirements. The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that this version doesn't currently deal with /update, /replication etc.; it's a /select thing at the moment (but it could be used for these). This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml. acl.xml defines a number of users and groups (and 1 global for 'everyone'), and assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements. 
When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list. When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed), and then added to the query request. Because the allow and deny elements are simply subsearch queries (e.g. {{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any stored data that can be queried, including already existing data. Authentication: One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses http parameters for this. For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client. If the request is wrong, the SearchComponent fails and will increment the user's failed login count (if a valid user was specified). If this count exceeds the configured lockoutThreshold, no further requests are granted until the lockoutTime has elapsed. This mechanism protects against some types of attacks (e.g. CRLF, dictionary etc.), but it really needs container HTTPS as well (as would most other auth implementations). Incorporating SSL certificates for authentication and making the authentication mechanism pluggable would be a nice improvement (i.e. separate authentication from access control). Another issue is how internal searchers perform autowarming etc. The solution here is to use a local key called 'SolrACLSecurityKey'. This key is local and [should be] unique to that server. firstSearcher, newSearcher et al then include this key in their parameters so they can perform autowarming without constraint. 
Again, there are likely many ways to achieve this; this approach is but one. The attached rar holds the source and associated configuration. This has been tested on the 1.4 release codebase (search in the attached solrconfig.xml for SolrACLSecurity to find the relevant sections in this file). I hope this proves helpful for people who are looking for this sort of functionality in Solr, and more generally to address how such a mechanism could ultimately be integrated into a future Solr release. Many thanks, Peter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
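The lockoutThreshold/lockoutTime behaviour described above can be sketched as a small failure-count table. This is a simplified stand-in, not the code from the attachment; all names are illustrative:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class LockoutSketch {
    private final int lockoutThreshold;   // consecutive failures before lockout (<= 0 disables lockout)
    private final Duration lockoutTime;   // how long an account stays locked
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Instant> lockedUntil = new HashMap<>();

    public LockoutSketch(int lockoutThreshold, Duration lockoutTime) {
        this.lockoutThreshold = lockoutThreshold;
        this.lockoutTime = lockoutTime;
    }

    // A locked account rejects all requests until lockoutTime has elapsed.
    public boolean isLocked(String user, Instant now) {
        Instant until = lockedUntil.get(user);
        return until != null && now.isBefore(until);
    }

    // Called on each failed login; locks the account once the threshold is hit.
    public void recordFailure(String user, Instant now) {
        if (lockoutThreshold <= 0) return; // lockout disabled
        int n = failures.merge(user, 1, Integer::sum);
        if (n >= lockoutThreshold) lockedUntil.put(user, now.plus(lockoutTime));
    }

    // A successful login resets the consecutive-failure count.
    public void recordSuccess(String user) {
        failures.remove(user);
        lockedUntil.remove(user);
    }
}
```

With the defaults from the config sample (threshold 5, 15 minutes), the fifth consecutive failure locks the account until 15 minutes have passed.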
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.rar
[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused
[ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853168#action_12853168 ] Peter Sturge commented on SOLR-1143: This is a cool patch - yes, very useful. I've found a couple of issues with it, though: 1. When going through the 'waiting for shard replies' loop, because no exception is thrown on shard failure, the next block after the loop can throw a NullPointerException in {{SearchComponent.handleResponses()}} for any SearchComponent that checks shard responses. It could be that this doesn't always happen, but it certainly happens in FacetComponent when date_facets are turned on. 2. There's a bit of code that sets {{partialResults=true}} if there's at least one failure, but it doesn't set it to false if everything's ok. In order for the patch to operate, this parameter must have already been present and true, otherwise the patch is essentially 'disabled' anyway (problem of using the same parameter as input and result). I've made some modifications to the patch for these and a couple of other things: 1. FacetComponent modified to check for null shard response. Perhaps it would be better to check this in SearchHandler.handleResponses(), but then no SearchComponents would be contacted re failed shards, even if they don't care that it's failed (is that a good thing?). 2. Added a new CommonParams parameter called FAILED_SHARDS. {{partialResults}} is now only an input parameter to enable the feature (Note: {{partialResults}} is referenced in RequestHandlerBase, but it's not from the patch - is this an existing parameter that is used for something else?! If so, perhaps the name should be changed to something like {{allowPartialResults}} to avoid b/w compat and other potential conflicts). The output parameter that goes in the response header is now: {{failedShards=shard0;shard1;shardn}}. 
If everything succeeds, there will be no failedShards in the response header; otherwise, a list of failed shards is given. This is very useful for alerting someone/something that a server/network needs attention (e.g. a health-checker thread could run empty distributed searches solely for the purpose of checking status).

3. Changed the detection of a shard request error to be any Exception, rather than just ConnectException. This way, any failure is caught and can be actioned.

Possible TODO: it might be nice to include a short message (Exception class name?) in the FAILED_SHARDS parameter about what failed (e.g. ConnectException, IOException, etc.). If you like this idea, please say so, and I'll include it - i.e. something like: {{failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}

I'm currently testing these changes in our internal build. In the meantime, any comments are greatly appreciated. If there are no objections, I'll add a patch update when the dev test run is complete.

Return partial results when a connection to a shard is refused -- Key: SOLR-1143 URL: https://issues.apache.org/jira/browse/SOLR-1143 Project: Solr Issue Type: Improvement Components: search Reporter: Nicolas Dessaigne Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch

If any shard is down in a distributed search, a ConnectException is thrown. Here's a little patch that changes this behaviour: if we can't connect to a shard (ConnectException), we get partial results from the active shards. As for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we set the parameter partialResults to true. This patch also addresses a problem expressed in the mailing list about a year ago (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html) We have a use case that needs this behaviour and we would like to know your thoughts about such a behaviour.
Should it be the default behaviour for distributed search? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
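For illustration, a health-checker client could parse the proposed failedShards header value along these lines. This is a sketch under the two formats suggested in the comment above (plain "shard0;shard1" and the extended "shard|ExceptionName" variant); the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a client-side check on the proposed failedShards response header.
// Handles both "shard0;shard1" and the suggested "shard|ExceptionName" form.
// Illustrative only; not code from the SOLR-1143 patch.
public class FailedShardsParser {
    /** Splits the header value into shard names, dropping any "|Exception" suffix. */
    public static List<String> failedShards(String headerValue) {
        List<String> shards = new ArrayList<>();
        if (headerValue == null || headerValue.isEmpty()) {
            return shards; // no header means all shards responded
        }
        for (String token : headerValue.split(";")) {
            int bar = token.indexOf('|');
            shards.add(bar >= 0 ? token.substring(0, bar) : token);
        }
        return shards;
    }
}
```

An empty result would mean the distributed search completed against every shard, which is exactly what a monitoring thread running empty distributed searches would check for.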
[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries
HTTP Authentication for sharded queries --- Key: SOLR-1861 URL: https://issues.apache.org/jira/browse/SOLR-1861 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards.

Here's how the patch works: A final constant String called {{shardcredentials}} acts as the SolrParams parameter key name. The format for the value associated with this key is a comma-delimited list of colon-separated tokens: {{shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN}} A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures b/w compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials, and then sends the request using it. Note: Because the match comparison is a string compare (as opposed to a DNS compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter.

Impl Notes: This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so it should be fine on other trunks, but I've only tested with the 1.4 release codebase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
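The shardcredentials format described above could be parsed roughly as follows. This is a minimal sketch, not the attached SearchHandler code; the class and helper names are illustrative. Keying by "host:port" mirrors the string-compare matching against the shards parameter that the patch relies on.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing the proposed shardcredentials parameter value,
// "host0:port0:user0:pass0,host1:port1:user1:pass1,...".
// Illustrative only; names are not from the attached SearchHandler.java.
public class ShardCredentialsParser {
    /** Maps "host:port" to a {username, password} pair. */
    public static Map<String, String[]> parse(String shardCredentials) {
        Map<String, String[]> creds = new HashMap<>();
        if (shardCredentials == null || shardCredentials.isEmpty()) {
            return creds; // no credentials: fall back to the default HttpClient
        }
        for (String entry : shardCredentials.split(",")) {
            String[] tok = entry.split(":");
            if (tok.length == 4) { // expected shape: host:port:user:pass
                creds.put(tok[0] + ":" + tok[1], new String[] { tok[2], tok[3] });
            }
        }
        return creds;
    }
}
```

In the patch's flow, the HttpCommComponent would look up each shard's host:port in such a map and, on a match, build an HttpClient preloaded with those credentials; on no match it keeps the default, credential-free client.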
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java Apologies that this is the source file and not a diff'ed patch file. I've tried so many Win doze svn products, but I just can't get them to create a patch file (I'm sure this is more down to me not configuring them correctly, rather than rapidsvn, visualsvn, Tortoisesvn etc.). If someone would like to create a patch file from this source, that would be extraordinarily kind of you! In any case, the changes to this file are quite straightforward. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java A small update to this patch to support distributed searches with multiple cores. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850159#action_12850159 ] Peter Sturge commented on SOLR-1672: I agree there's some refactoring to do to bring it in line with current FacetParams conventions. At the same time, it would be good to look at wrapping up the functionality into a method, and covering all the code paths in the way you describe. I've been wanting to get to finishing off this patch, but I'm in the throes of a product release myself, so I've not had many spare cycles. You mention termenum, fieldcache, uninverted - presumably, these are among the code paths that need to cater for facet counts. If you know them, can you add a comment here that lists all the areas that need to be catered for, so that none are left out (if there are more than those 3)? Thanks! Peter

RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet<Long> in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
+ final BoundedTreeSet<Long> queue = (sort.equals("count") || sort.equals("true"))
+     ? (facetSortOrder.equals("dsc")
+         ? new BoundedTreeSet<Long>(maxsize, new Comparator() {
+             @Override
+             public int compare(Object o1, Object o2) {
+               if (o1 == null || o2 == null) return 0;
+               int result = ((Long) o1).compareTo((Long) o2);
+               return (result != 0 ? (result < 0 ? -1 : 1) : 0); // lowest number first sort
+             }
+           })
+         : new BoundedTreeSet<Long>(maxsize))
+     : null;

SimpleFacets.java [line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present; 'asc' is used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes: I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
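To illustrate the comparator change in isolation, here is a simplified stand-in for Solr's org.apache.solr.util.BoundedTreeSet with the patch's lowest-first ordering. This is a sketch, not the actual UnInvertedField code: the bounded-set implementation is reduced to the minimum needed to show the effect of flipping the comparator.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Standalone illustration of the SOLR-1672 idea: a size-bounded TreeSet whose
// comparator keeps the *lowest* packed count/term longs instead of the highest.
// BoundedTreeSet here is a simplified stand-in for Solr's own class.
public class ReverseFacetSort {
    static class BoundedTreeSet<E> extends TreeSet<E> {
        private final int maxSize;
        BoundedTreeSet(int maxSize, Comparator<? super E> c) {
            super(c);
            this.maxSize = maxSize;
        }
        @Override
        public boolean add(E e) {
            boolean added = super.add(e);
            if (size() > maxSize) {
                remove(last()); // evict the "largest" under the comparator
            }
            return added;
        }
    }

    /** Keeps the n smallest values: the ascending analogue of the patch's comparator. */
    static BoundedTreeSet<Long> lowestFirst(int n) {
        return new BoundedTreeSet<>(n, (o1, o2) -> {
            if (o1 == null || o2 == null) return 0;
            int result = o1.compareTo(o2);
            return result != 0 ? (result < 0 ? -1 : 1) : 0; // lowest number first
        });
    }
}
```

Because UnInvertedField packs counts into the long values it queues, ordering those longs lowest-first is what surfaces the rarest facet terms instead of the most frequent ones.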
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1729: --- Attachment: UnInvertedField.java Hi Thomas, Thanks for catching this. I thought I'd attached that one. *sigh* Honestly, that is really slack of me - many apologies. The attached UnInvertedField.java has the updated getCounts() method. Any troubles, let me know. Thanks! Peter Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, UnInvertedField.java This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.). This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. The new parameter is called 'facet.date.now', and takes as a parameter a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. 
the returned value from a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set value, but as that is where the use-case is for this type of parameter, this should be ok. NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share. Source files affected: FacetParams.java (holds the new constant FACET_DATE_NOW) SimpleFacets.java getFacetDateCounts() NOW parameter modified This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some rfc acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
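Both ends of the facet.date.now exchange can be sketched as follows. The parameter name and epoch-millis format are from the patch description; the surrounding class and method names are illustrative assumptions, not Solr API.

```java
import java.util.Date;

// Sketch of the proposed facet.date.now handshake: the client sends epoch
// milliseconds, and the server parses them back into the Date it uses as
// 'NOW' for date-facet math. Class/method names are illustrative.
public class FacetDateNow {
    /** Client side: the stringified epoch-millis value to send as facet.date.now. */
    public static String paramValue(long nowMillis) {
        return Long.toString(nowMillis);
    }

    /** Server side: parse the parameter, falling back to local time when absent. */
    public static Date resolveNow(String facetDateNow) {
        if (facetDateNow == null) {
            return new Date(); // default behaviour: the local server's current time
        }
        return new Date(Long.parseLong(facetDateNow));
    }
}
```

Since the value is milliseconds from the epoch rather than a formatted date, no timezone or date parsing is involved, which is exactly the rationale the description gives.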
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834222#action_12834222 ] Peter Sturge commented on SOLR-1709: Hi Thomas, Hmmm...TermsHelper is an inner class inside TermsComponent. In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent *doesn't* have this inner class. Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?). TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above with your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see. This does highlight the very good policy of posting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so I'm not able to make compatible diff files - sorry. If you do create a couple of diff files, it would be very kind of you if you could post them up on this issue for others. Thanks!

Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e.
merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
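The first-shard-wins merge policy described above can be sketched with plain maps. This is illustrative only: the real patch operates on SimpleOrderedMap inside FacetComponent, and the class name here is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the SOLR-1709 merge policy: the first shard's facet_dates define
// the buckets, and later shards only add into buckets that already exist, so
// skewed 'earlier'/'later' buckets from other shards are ignored.
// Simplified stand-in; the patch works on SimpleOrderedMap in FacetComponent.
public class DateFacetMerger {
    private final Map<String, Integer> merged = new LinkedHashMap<>();

    /** Merge one shard's date-facet counts into the running totals. */
    public void merge(Map<String, Integer> shardFacetDates) {
        if (merged.isEmpty()) {
            merged.putAll(shardFacetDates); // first shard defines the buckets
            return;
        }
        for (Map.Entry<String, Integer> e : shardFacetDates.entrySet()) {
            // only merge buckets the first shard also produced
            merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
        }
    }

    public Map<String, Integer> result() {
        return merged;
    }
}
```

Checking each shard's list against one basis map is a single pass per shard, which is the performance argument given above; the cost is that out-of-range buckets from skewed shards are silently dropped rather than widening the requested time range.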
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829043#action_12829043 ] Peter Sturge commented on SOLR-1729: Hi Chris, Thanks for your comments - I hope I didn't sound like your comments were taken wrongly - I absolutely count on comments from you and other experts to make sure I'm not missing some important functionality and/or side effect. You know the code base far better than I, so it's great that you take the time to point out all the different bits and pieces that need addressing. I can certainly understand the need to address the 'core-global' issues raised by you and Yonik for storing a ThreadLocal 'query-global' 'NOW'. I suppose the main issue in implementing the thread-local route is that we'd have to make sure we found every place in the query core that references 'now', and point those references to the new variable? If the 'code-at-large' [hopefully] always calls the date math routines for finding 'NOW', great, it should be relatively straightforward. If there are any stray e.g. System.currentTimeMillis() calls, then it's a bit more fiddly, but still do-able. ??it's all handled internally by DateField?? Sounds like DateField would be the best candidate for holding the ThreadLocal? The query handler code can set the variable of its DateField instance if it's set in a query parameter, otherwise it just defaults to its own local (UTC) time. Could be done similarly to DateField.ThreadLocalDateFormat, perhaps?
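The ThreadLocal 'query-global NOW' discussed here might look roughly like this, modelled on the DateField.ThreadLocalDateFormat pattern mentioned in the comment. The class and method names are hypothetical, not Solr's actual API.

```java
import java.util.Date;

// Hypothetical sketch of a query-global 'NOW' held in a ThreadLocal, so that
// date math and filter queries within one request all see the same instant.
// Names are illustrative; this is not Solr's DateField API.
public class QueryNow {
    private static final ThreadLocal<Date> NOW = new ThreadLocal<>();

    /** Set by the request handler, e.g. from a facet.date.now parameter. */
    public static void set(Date now) {
        NOW.set(now);
    }

    /** Date math code calls this instead of new Date()/currentTimeMillis(). */
    public static Date get() {
        Date d = NOW.get();
        return d != null ? d : new Date(); // default: the server's own clock
    }

    /** Clear at end of request so pooled threads don't leak a stale NOW. */
    public static void clear() {
        NOW.remove();
    }
}
```

The fiddly part the comment anticipates is exactly the migration: every stray System.currentTimeMillis() or new Date() on the query path would have to be routed through the single accessor for the guarantee to hold.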
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12805995#action_12805995 ] Peter Sturge commented on SOLR-1729: ??...they might not all get queried at the exact same time?? I suppose this is what the explicit 'NOW' is meant to resolve - staggered/lagged receipt/response and, in an ersatz fashion, discrepancies in local time sync. Since the passed-in 'NOW' is relative only to the epoch, network latency is handled, and time sync on any given server is assumed to be correct. ??...multiple requests might be made to a single server for different phases of the distributed request that expect to get the same answers.?? As long as the same code path is followed for such requests, it should honour the same (passed-in) 'NOW'. Are there scenarios where this is not the case? In which case, yes, these would need to be addressed. ??...unless filter queries that use date math also respect it the counts returned from date faceting will still potentially be non-sensical.?? Definitely filter queries will need to get/use/honour the same 'NOW' as their corresponding query, otherwise anarchy will quickly ensue. Can you point me toward the class(es) where filter queries' date math lives, and I'll have a look? As filter queries are cached separately, can you think of any potential caching issues relating to filter queries? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803858#action_12803858 ] Peter Sturge commented on SOLR-1672: Jan, you are absolutely correct that the parameter should (and will) be 'desc'. I have an update in my queue of things todo which changes this, but also removes the new 'facet.sortorder' parameter, and includes instead 'facet.sort desc' as a valid parameter for facet.sort. This keeps things nice and tidy and consistent. The 'facet.sortorder' parameter was really as POC to try out the behaviour before changing the core parameter syntax of the existing 'facet.sort' parameter. Not that's done, the parameter will be rolled into 'facet.sort'. Thanks, Peter RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h As suggested by Chris Hosstetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list. 
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden BoundedTreeSetLong(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
+ final BoundedTreeSetLong queue = (sort.equals("count") || sort.equals("true"))
+     ? (facetSortOrder.equals("dsc")
+         ? new BoundedTreeSetLong(maxsize, new Comparator() {
+               @Override
+               public int compare(Object o1, Object o2) {
+                 if (o1 == null || o2 == null) return 0;
+                 int result = ((Long) o1).compareTo((Long) o2);
+                 return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
+               }
+             })
+         : new BoundedTreeSetLong(maxsize))
+     : null;

SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present, with 'asc' used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other, as yet untested, circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts ('what have we run out of'). I was envisioning the facet.mincount parameter being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
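The inverted comparator in the diff above can be exercised in isolation. The sketch below (a hypothetical standalone class, not part of the patch) reproduces the compare logic and shows that it reverses Long's natural ordering:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical standalone demo of the patch's 'dsc' comparator:
// it inverts Long's natural ordering, so larger counts compare as smaller.
public class DscComparatorDemo {
    public static final Comparator<Long> INVERTED = (o1, o2) -> {
        if (o1 == null || o2 == null) return 0;
        int result = o1.compareTo(o2);
        return result != 0 ? (result > 0 ? -1 : 1) : 0;
    };

    public static void main(String[] args) {
        TreeSet<Long> queue = new TreeSet<>(INVERTED);
        queue.add(5L);
        queue.add(1L);
        queue.add(3L);
        // Iteration now runs from the largest count to the smallest
        System.out.println(queue); // prints "[5, 3, 1]"
    }
}
```

Feeding this comparator to a bounded set changes which end of the ordering gets trimmed, which is what flips the retained facet counts.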
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803860#action_12803860 ]

Peter Sturge commented on SOLR-1729:

I agree there are wider issues that relate to this -- this particular patch addresses the time sync issue needed to allow distributed date facets to happen. In this case, you must have multiple cores using the same NOW, so that your date facets are consistent. In fact, it doesn't really matter which NOW you use, as long as they're all the same -- the caller setting the NOW value makes the most sense. For other time-related queries this might not be the case, but as you rightly pointed out, these are not addressed here.

Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetParams.java, SimpleFacets.java

This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. The new parameter is called 'facet.date.now', and takes as its value a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. the return value of a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set one, but as that is where the use-case is for this type of parameter, this should be ok.

NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share.

Source files affected:
FacetParams.java (holds the new constant FACET_DATE_NOW)
SimpleFacets.java (getFacetDateCounts() NOW parameter modified)

This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some RFC acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route.
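The resolution rule the description implies (use facet.date.now when supplied, else fall back to the local clock) amounts to only a few lines. The helper below is a hypothetical JDK-only sketch of that behaviour, not the actual patch code:

```java
import java.util.Date;
import java.util.Map;

// Hypothetical sketch of resolving 'NOW' for date faceting:
// use the caller-supplied 'facet.date.now' epoch millis when present,
// otherwise fall back to the local server clock.
public class FacetNowResolver {
    public static Date resolveNow(Map<String, String> params) {
        String override = params.get("facet.date.now");
        return override != null
                ? new Date(Long.parseLong(override))     // the caller's idea of NOW
                : new Date(System.currentTimeMillis());  // default: local clock
    }

    public static void main(String[] args) {
        // A caller pins NOW so every shard facets against the same instant
        System.out.println(resolveNow(Map.of("facet.date.now", "1264982400000")).getTime());
    }
}
```

Because the value is an epoch long rather than a formatted date, no date parsing or time-zone handling is needed on the receiving side.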
[jira] Created: (SOLR-1729) Date Facet now override time parameter
Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1729:
---
Attachment: FacetParams.java
            SimpleFacets.java

These are the source files affected for this patch. Apologies for not creating a PATCH file - my TortoiseSVN is not working for creating patch files. If anyone would like to create a patch from these, that would be extraordinarily kind of you!

Diff: (trunk: 1.4 Release)

FacetParams.java: Add at line 179:

/**
 * String that tells the date facet counter what time to use as 'now'.
 *
 * The value of this parameter, if it exists, must be a stringified long
 * of the number of milliseconds since the epoch (milliseconds since 1 Jan 1970 00:00).
 * System.currentTimeMillis() provides this.
 *
 * The DateField and DateMathParser work out their times relative to 'now'.
 * By default, 'now' is the local machine's System.currentTimeMillis().
 * This parameter overrides the local value to use a different time.
 * This is very useful for remote server queries where the times on the querying
 * machine are skewed/different from those of the date faceting machine.
 * This is a facet.date global query parameter (i.e. not per field).
 * @see DateMathParser
 * @see DateField
 */
public static final String FACET_DATE_NOW = "facet.date.now";

SimpleFacets.java: Change at line 551:

- final Date NOW = new Date();
+ final Date NOW = new Date(params.get(FacetParams.FACET_DATE_NOW) != null ?
+     Long.parseLong(params.get(FacetParams.FACET_DATE_NOW)) : System.currentTimeMillis());

Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetParams.java, SimpleFacets.java
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
---
Attachment: FacetComponent.java

Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729). For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch).

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)
This could be dealt with if timezone and skew information was added, and the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome.

As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
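The merge rule described above (the first shard's facet_dates as the basis, skewed entries dropped) can be modelled in a few lines. The helper below is a hypothetical simplification over plain maps, not the actual FacetComponent code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simplification of the described merge rule: the first shard's
// facet_dates map is the basis, and later shards only contribute counts for
// date keys the first shard already reported; entries skewed outside the
// first shard's keys are dropped rather than merged.
public class FacetDatesMerge {
    public static Map<String, Integer> merge(Map<String, Integer> first,
                                             Map<String, Integer> next) {
        Map<String, Integer> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, Integer> e : next.entrySet()) {
            // sum counts only where the basis map already has the date key
            merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
        }
        return merged;
    }
}
```

Checking each later shard only against the basis map keeps the merge linear per shard, and guarantees the merged range never grows beyond what the caller requested.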
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411 ]

Peter Sturge commented on SOLR-1709:

Yonik, Yes, I can see what you mean that of course NOW will affect anything date-related in a given query. I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting. The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957 ]

Peter Sturge commented on SOLR-1709:

I've heard of Tortoise, I'll give that a try, thanks. On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than a 'pull' - i.e. requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time. Thanks, Peter

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
---
Attachment: ResponseBuilder.java
            FacetComponent.java

Sorry, guys, can't get svn to create a patch file correctly on Windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than have anyone wait for me to get the patch file created, I thought it best to get the source uploaded so people can start using it. Thanks, Peter

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233 ]

Peter Sturge commented on SOLR-1709:

Definitely true! -- messing about with Date strings isn't great for performance. As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTime() call if the time is stored locally as a Date).

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
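To illustrate the point about epoch longs in the comment above: when NOW is exchanged as a long, normalising a shard-local instant onto the caller's clock is pure integer arithmetic. A hypothetical sketch (class and method names are illustrative, not from any patch):

```java
// Hypothetical sketch of the adjustment math the comment argues for: with
// NOW passed as an epoch long, putting a shard-local time onto the caller's
// clock is a single subtraction and addition - no date parsing or formatting.
public class NowSkewAdjust {
    public static long normalize(long shardTime, long shardNow, long callerNow) {
        long skew = callerNow - shardNow; // how far ahead/behind the shard clock is
        return shardTime + skew;          // the shard instant on the caller's clock
    }
}
```

For example, a shard whose clock is 300 ms behind the caller's reports an event at 1000; normalized to the caller's clock that instant is 1300.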
[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge resolved SOLR-1672.
Resolution: Fixed

Marking as resolved.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 0h
Remaining Estimate: 0h

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.

This change affects 2 source files:

UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter to the end of the argument list.

DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create a BoundedTreeSetLong(int, Comparator) with an overriding comparator if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
+ final BoundedTreeSetLong queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ?
+     new BoundedTreeSetLong(maxsize, new Comparator() {
+         @Override
+         public int compare(Object o1, Object o2) {
+             if (o1 == null || o2 == null) return 0;
+             int result = ((Long) o1).compareTo((Long) o2);
+             return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
+         }
+     }) : new BoundedTreeSetLong(maxsize)) : null;

SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present. 'asc' is used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes:
In testing, I was not able to retrieve any '0' counts as I had expected. I believe this is because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. I could be wrong about this, and zero counts may appear under some other, as yet untested, circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts ('what have we run out of?'). I had envisioned the facet.mincount field as the preferred place to set where the 'lowest value' begins (e.g. 0, 1, or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
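The effect of the reversing comparator in the diff above can be sketched in isolation. The following is a minimal, self-contained illustration, using a plain java.util.TreeSet as a stand-in for Solr's BoundedTreeSetLong (whose internals are not shown in this issue); the comparator body matches the one in the patch:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ReverseSortDemo {
    // Sketch of the comparator added by the patch (raw types kept as in the
    // diff above). It inverts the natural Long ordering, so the packed
    // count/term longs used by UnInvertedField sort the other way round.
    static final Comparator REVERSED = new Comparator() {
        public int compare(Object o1, Object o2) {
            if (o1 == null || o2 == null) return 0;
            int result = ((Long) o1).compareTo((Long) o2);
            return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
        }
    };

    public static void main(String[] args) {
        // Stand-in for BoundedTreeSetLong(maxsize, REVERSED)
        TreeSet<Long> queue = new TreeSet<>(REVERSED);
        queue.add(1L);
        queue.add(5L);
        queue.add(3L);
        System.out.println(queue); // iteration order is reversed: [5, 3, 1]
    }
}
```

With the natural ordering the set would iterate [1, 3, 5]; the reversed comparator flips which end of the bounded set is kept and emitted first, which is what makes the 'dsc' facet sort order possible.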
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1709) Distributed Date Faceting
Distributed Date Faceting
-------------------------
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

This patch adds support for date facets when using distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time).

The implementation uses the first encountered shard's facet_dates as the basis into which subsequent shards' data are merged. This means that if a subsequent shard's facet_dates are skewed by one 'gap' or more relative to the first, these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
* If 'earlier' and/or 'later' facet_dates were added in, the time range would become larger than the one requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).

This could be dealt with if timezone and skew information were added and the dates normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage.

One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask: if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
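The merge rule described above (the first shard's facet_dates buckets form the basis; skewed 'earlier'/'later' buckets from later shards are dropped) can be sketched as follows. This is a simplified, hypothetical model using a plain map of date-bucket strings to counts, not the actual FacetComponent code; the class and method names are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetMergeSketch {
    // Merges one shard's facet_dates into the accumulated basis map. Only
    // buckets already present in the basis (taken from the first shard
    // encountered) are summed; buckets outside the basis range - i.e. from
    // shards skewed by a 'gap' or more - are ignored, matching the
    // behaviour described in the issue.
    static void mergeInto(Map<String, Integer> basis, Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            Integer existing = basis.get(e.getKey());
            if (existing != null) {
                basis.put(e.getKey(), existing + e.getValue());
            }
            // 'earlier'/'later' buckets are dropped, not added, so the
            // merged range never grows beyond the one requested
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> basis = new LinkedHashMap<>();
        basis.put("2010-01-14T10:00:00Z", 4);
        basis.put("2010-01-14T11:00:00Z", 7);

        Map<String, Integer> skewedShard = new LinkedHashMap<>();
        skewedShard.put("2010-01-14T09:00:00Z", 2); // one 'gap' earlier: dropped
        skewedShard.put("2010-01-14T10:00:00Z", 3); // matches a basis bucket: summed

        mergeInto(basis, skewedShard);
        System.out.println(basis); // {2010-01-14T10:00:00Z=7, 2010-01-14T11:00:00Z=7}
    }
}
```

The sketch also shows why the proposed 'timezone'/'now' parameters would help: with skew information, the dropped 09:00 bucket could first be normalized onto the basis range instead of being discarded.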
[jira] Created: (SOLR-1672) RFE: facet reverse sort count
RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1672:
---
Attachment: SOLR-1672.patch

Patch diff file for adding facet reverse sorting.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792424#action_12792424 ]

Peter Sturge commented on SOLR-1672:
Patch SOLR-1672.patch now included for review.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 24h
Remaining Estimate: 24h