[jira] [Commented] (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264570#comment-13264570 ]

Peter Sturge commented on SOLR-1861:

Would a Solrj client be able to intrinsically handle a distributed shard request? It could make separate requests for each shard, but you wouldn't have the nice advantage of distributed searches, with aggregated facets, ranges etc., that's built in on the server side. Or perhaps I've misunderstood your Solrj suggestion?

HTTP Authentication for sharded queries
---------------------------------------
                Key: SOLR-1861
                URL: https://issues.apache.org/jira/browse/SOLR-1861
            Project: Solr
         Issue Type: Improvement
         Components: search
   Affects Versions: 1.4
        Environment: Solr 1.4
           Reporter: Peter Sturge
           Priority: Minor
             Labels: authentication, distributed, http, shard
        Attachments: SearchHandler.java, SearchHandler.java

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards.

Here's how the patch works: A final constant String called {{shardcredentials}} acts as the name of the SolrParams parameter key. The format for the value associated with this key is a comma-delimited list of colon-separated tokens:

{{shard0:port0:username0:password0,shard1:port1:username1:password1,...,shardN:portN:usernameN:passwordN}}

A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures backward compatibility.

When SearchHandler receives the request, it passes the {{shardcredentials}} parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials and sends the request using this client.

Note: Because the match comparison is a string compare (as opposed to a dns compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter.

Impl Notes: This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so it should be fine on other trunks, but I've only tested with the 1.4 release code base.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
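As an illustration of the format described above, the {{shardcredentials}} value can be parsed into per-shard credentials along these lines (a minimal sketch in Java; the class and method names are invented here, not taken from the attached SearchHandler.java):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardCredentialsParser {
    /**
     * Parses "host0:port0:user0:pass0,host1:port1:user1:pass1,..." into a map
     * keyed by "host:port", with a {username, password} pair as the value.
     */
    public static Map<String, String[]> parse(String shardCredentials) {
        Map<String, String[]> creds = new HashMap<>();
        if (shardCredentials == null || shardCredentials.isEmpty()) {
            return creds; // no credentials: caller falls back to the default http client
        }
        for (String entry : shardCredentials.split(",")) {
            String[] tokens = entry.split(":");
            if (tokens.length != 4) {
                continue; // malformed entry: skip it, preserving default behaviour
            }
            creds.put(tokens[0] + ":" + tokens[1],
                      new String[] { tokens[2], tokens[3] });
        }
        return creds;
    }
}
```

A lookup for shard "shard0:8983" then either yields credentials for building an authenticated HttpClient, or misses and falls through to the default client.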
[jira] [Commented] (SOLR-3421) Distributed Search doesn't allow for HTTP Authentication
[ https://issues.apache.org/jira/browse/SOLR-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264296#comment-13264296 ]

Peter Sturge commented on SOLR-3421:

There is an existing patch for this behaviour - see: issues.apache.org/jira/browse/SOLR-1861
This patch allows distributed credentials to be passed inside the url, where SearchHandler then parses this and creates HttpConnections for each shard in the distributed search.
Some useful extensions to this approach would be the use of certificates (instead of explicit credentials), and/or acl lists stored on the server side, with pre-authentication (e.g. via passing hash values instead of explicit credentials). The base mechanism provided in this patch can be used in both cases.
HTH!
Peter

Distributed Search doesn't allow for HTTP Authentication
--------------------------------------------------------
                Key: SOLR-3421
                URL: https://issues.apache.org/jira/browse/SOLR-3421
            Project: Solr
         Issue Type: New Feature
         Components: SearchComponents - other
   Affects Versions: 3.6, 4.0
        Environment: Sharded solr cluster
           Reporter: Michael Della Bitta
           Priority: Minor
             Labels: auth, distributed_search, ssl

The distributed search feature allows one to configure the list of shards the SearchHandler should query and aggregate results from using the shards parameter. Unfortunately, there is no way to configure any sort of authentication between shards and a distributed search-enabled SearchHandler. It'd be good to be able to specify an authentication type, auth credentials, and transport security to allow installations that don't have the benefit of being protected by a firewall some measure of security.
[jira] [Commented] (SOLR-3421) Distributed Search doesn't allow for HTTP Authentication
[ https://issues.apache.org/jira/browse/SOLR-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264297#comment-13264297 ]

Peter Sturge commented on SOLR-3421:

It's also worth noting that one of the advantages of this approach is that it allows for partial results to be returned (with error details in the response) if one or more shards are unavailable, but others are ok. An optional flag can switch this feature on or off.

Distributed Search doesn't allow for HTTP Authentication
--------------------------------------------------------
                Key: SOLR-3421
[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index
[ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049727#comment-13049727 ]

Peter Sturge commented on SOLR-2593:

This is a really great idea, thanks! If it's possible, it would be cool to have config parameters to:
* create a new core
* overwrite an existing core
* rename an existing core, then create (rolling backup)
* merge with an existing core (ever-growing, but kind of an accessible 'archive' index)

A new core admin command 'split' for splitting index
----------------------------------------------------
                Key: SOLR-2593
                URL: https://issues.apache.org/jira/browse/SOLR-2593
            Project: Solr
         Issue Type: New Feature
           Reporter: Noble Paul
            Fix For: 4.0

If an index is too large/hot it would be desirable to split it out to another core. This core may eventually be replicated out to another host. There could be multiple strategies:
* random split of x or x%
* fq=user:johndoe

example: command=split&split=20percent&newcore=my_new_index
or: command=split&fq=user:johndoe&newcore=john_doe_index
[jira] [Commented] (SOLR-1709) Distributed Date and Range Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022697#comment-13022697 ]

Peter Sturge commented on SOLR-1709:

Yes, the deprecation story makes sense.
Regarding SOLR-1729, I'm pretty sure this already works for 3x (it was originally created on/for the 3x branch). I guess Yonik's NOW changes were destined for trunk, but I've been using the current SOLR-1729 patch on the 3x branch and it is working fine in production environments.
Thanks
Peter

Distributed Date and Range Faceting
-----------------------------------
                Key: SOLR-1709
                URL: https://issues.apache.org/jira/browse/SOLR-1709
            Project: Solr
         Issue Type: Improvement
         Components: SearchComponents - other
   Affects Versions: 1.4
           Reporter: Peter Sturge
           Assignee: Hoss Man
           Priority: Minor
            Fix For: 4.0
        Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, SOLR-1709.patch, SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
* Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
* The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in.

There are several reasons for this:
* Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)

This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.

Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).
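The first-shard-wins merge behaviour described above can be sketched roughly as follows (illustrative Java only, not the attached FacetComponent code):

```java
import java.util.Map;

public class DateFacetMerger {
    /**
     * Merges one shard's facet_date counts into the counts accumulated from
     * the first-encountered shard. Buckets the first shard didn't report
     * (skewed 'earlier'/'later' dates) are intentionally dropped, mirroring
     * the behaviour described in the issue text.
     */
    public static void merge(Map<String, Integer> accumulated,
                             Map<String, Integer> shardCounts) {
        for (Map.Entry<String, Integer> e : shardCounts.entrySet()) {
            Integer existing = accumulated.get(e.getKey());
            if (existing != null) {
                accumulated.put(e.getKey(), existing + e.getValue());
            }
            // keys outside the first shard's buckets fall through untouched
        }
    }
}
```

This is why a shard skewed by one 'gap' silently loses its edge buckets: they simply never match a key in the first shard's map.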
[jira] [Commented] (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020802#comment-13020802 ]

Peter Sturge commented on SOLR-1709:

Updating ResponseBuilder rather than FacetInfo really came from tracing the references through the hierarchy - so, I don't think anything is missed by moving this to FacetInfo props, and it should provide better encapsulation.
Deprecating date faceting in favour of generic range faceting should be fine, as long as there exists a clear path to easily move from 'the way we were' with date facets, to 'the way it will be' (range faceting). It would be a shame to break clients that rely on the existing date facet parameters/syntax, so I guess if they're mapped to range (I think some of this is in 3.x already?), that would be good.
Thanks

Distributed Date Faceting
-------------------------
                Key: SOLR-1709
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016770#comment-13016770 ]

Peter Sturge commented on SOLR-2438:

As I mentioned above, the approach is a little bit different from SOLR-219, and its scope is [perhaps] more targeted at case-insensitive wildcards only. It's also a completely self-contained patch. I've found that when a JIRA issue contains lots of 'non-evolutionary' patches, it becomes difficult to know which patch is which. I agree that a new issue means commenters on SOLR-219 would need to look at this issue. I've added a link on SOLR-219 to relate it to this issue so it's easier to track.
Hope this helps clarify.

Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
                URL: https://issues.apache.org/jira/browse/SOLR-2438
            Project: Solr
         Issue Type: Improvement
           Reporter: Peter Sturge
        Attachments: SOLR-2438.patch

This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types. This patch extends the excellent work done by Yonik and Michael in SOLR-219. The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
                URL: https://issues.apache.org/jira/browse/SOLR-2438
            Project: Solr
         Issue Type: Improvement
           Reporter: Peter Sturge
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-2438:
-------------------------------
    Attachment: SOLR-2438.patch

Attached patch file

Case Insensitive Search for Wildcard Queries
--------------------------------------------
                Key: SOLR-2438
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268 ]

Peter Sturge commented on SOLR-2438:

If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return different, and sometimes empty, results. This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField entries in schema.xml. The new attribute is called: {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
<fieldType name="text_lcws" class="solr.TextField" positionIncrementGap="100" ignoreCaseForWildcards="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
{code}

It's worth noting that this will lower-case text for ALL terms that match the field type - including synonyms and stemmers. For backward compatibility, the default behaviour is as before - i.e. a case-sensitive wildcard search ({{ignoreCaseForWildcards=false}}).

The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community members may see use cases where this approach could pose problems. I'm all for feedback to enhance the functionality... The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr - in line with the 'it just works' Solr philosophy. Enjoy!
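The essence of the approach - lower-casing the raw wildcard term at query time when the field type opts in, so it lines up with index-time LowerCaseFilterFactory output - can be sketched as follows (illustrative Java; the class and method names are invented, not the patch's API):

```java
import java.util.Locale;

public class WildcardCase {
    /**
     * If the field type sets ignoreCaseForWildcards, lower-case the wildcard
     * term before query construction; otherwise leave it untouched
     * (the backward-compatible default).
     */
    public static String normalizeWildcardTerm(String term, boolean ignoreCaseForWildcards) {
        return ignoreCaseForWildcards ? term.toLowerCase(Locale.ROOT) : term;
    }
}
```

With this in place, MyTer* and myter* produce the same query term against a lower-cased index, which is exactly the "it just works" behaviour the comment above is after.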
[jira] Commented: (SOLR-2026) Need infrastructure support in Solr for requests that perform multiple sequential queries
[ https://issues.apache.org/jira/browse/SOLR-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002701#comment-13002701 ]

Peter Sturge commented on SOLR-2026:

Hi Karl,
This patch is a really good idea - many thanks for coming up with this!
I've tried applying this on trunk, but I get a few compile errors from the patch, and I'm not quite sure how to use it in a query.

The compile errors have to do with:
* SearchHandler.java (~line 267): ResponseBuilder rb = new ResponseBuilder(); - ResponseBuilder doesn't have a no-arg ctor
* ResponseBuilder.java (~line 141, copyFrom()): debug = rb.debug; - there is no 'debug' parameter

I've fixed these up locally, but as I've only just looked at this, I thought I'd run it by you before patching it up. There's also an NPE thrown if debugQuery=true (@ DebugComponent.java:56).

I haven't been able to build a query that seems to work. Do you have any example query urls you use for testing? http://127.0.0.1:9000/solr/select?qt=multiquery&blahblah etc...
Many thanks!
Peter

Need infrastructure support in Solr for requests that perform multiple sequential queries
-----------------------------------------------------------------------------------------
                Key: SOLR-2026
                URL: https://issues.apache.org/jira/browse/SOLR-2026
            Project: Solr
         Issue Type: New Feature
         Components: SearchComponents - other
           Reporter: Karl Wright
           Priority: Minor
            Fix For: 4.0
        Attachments: SOLR-2026.patch, SOLR-2026.patch

Several known cases exist where multiple index searches need to be performed in order to arrive at the final result. Typically, these have the constraint that the results from one search query are required in order to form a subsequent search query. While it is possible to write a custom QueryComponent or search handler to perform this task, an extension to the SearchHandler base class would readily permit such query sequences to be configured using solrconfig.xml. I will therefore be writing and attaching a patch tomorrow morning which supports this extended functionality in a backwards-compatible manner.

The tricky part, which is figuring out how to funnel the output of the previous search result into the next query, can be readily achieved by use of the SolrRequestObject.getContext() functionality. The stipulation will therefore be that the SolrRequestObject's lifetime will be that of the entire request, which makes complete sense. (The SolrResponseObject's lifetime will, on the other hand, be limited to a single query, and the last response so formed will be what gets actually returned by SearchHandler.)
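The context-funnelling idea can be sketched with a plain map standing in for the per-request context (illustrative only; the real patch would use the request object's context map, and the key name here is invented):

```java
import java.util.Map;

public class SequentialQueries {
    /**
     * Stage 1 stores its output in the shared per-request context;
     * stage 2 reads it back to build the follow-up query string.
     */
    public static String runSequence(Map<Object, Object> requestContext) {
        // First "query": pretend it returned a set of document ids
        requestContext.put("stage1.ids", "1 2 3");
        // Second "query": its filter is built from the previous stage's output
        return "id:(" + requestContext.get("stage1.ids") + ")";
    }
}
```

Because the context map lives as long as the request, each configured query in the sequence can see what every earlier query produced.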
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994716#comment-12994716 ]

Peter Sturge commented on SOLR-1709:

Hi David,
Thank you thank you thank you for working on this and providing tests - your efforts are very much appreciated!
For deprecation of facet.date, I suspect it probably shouldn't be deprecated until a fully-fledged replacement is ready, ported and committed, but if SOLR-1240 can functionally slot in (including the 'NOW' stuff in SOLR-1729), that's great.
Many thanks,
Peter

Distributed Date Faceting
-------------------------
                Key: SOLR-1709
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994847#comment-12994847 ]

Peter Sturge commented on SOLR-2245:

I've been meaning to get back to this, as I have made some local updates that help performance. Could you give me some feedback on these 2 questions please - it would be really useful:
* Is there a committer's standard or similar spec that describes what tests should be included, and if so, could you point me to it please? I can then make sure I include appropriate tests
* Is there a time-frame for committing for this or the next release? I have a product release of my own coming up for beg-March, so if I know the time-scales, I can plan accordingly.
Thanks!
Peter

MailEntityProcessor Update
--------------------------
                Key: SOLR-2245
                URL: https://issues.apache.org/jira/browse/SOLR-2245
            Project: Solr
         Issue Type: Improvement
         Components: contrib - DataImportHandler
   Affects Versions: 1.4, 1.4.1
           Reporter: Peter Sturge
           Priority: Minor
            Fix For: 1.4.2
        Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip

This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here:
* Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments, e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content
* Added a synonym called 'processAttachments', which is synonymous with the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default = 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag.
* Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string "none"). Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'not read'. I'm calculating this would be extremely rare, and it is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent.
* The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing
* The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data

Tested on the 3.x trunk with a number of popular imap servers.
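For reference, the new attributes from the bullet list above might appear in a DIH data-config.xml entity like this (a hedged example: the host, user, and folder values are placeholders, and only includeContent/processAttachments come from this patch - the remaining attributes are standard MailEntityProcessor configuration):

```xml
<entity name="mail"
        processor="MailEntityProcessor"
        user="someone@example.com"
        password="..."
        host="imap.example.com"
        protocol="imaps"
        folders="INBOX"
        includeContent="true"
        processAttachments="false"/>
```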
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988013#action_12988013 ] Peter Sturge commented on SOLR-1709: Hi David, Yes, at the time my patching wasn't working (Windows env for my sins), so I thought it would be better to make the source available than not. Thomas H. kindly did turn it into a udiff patch last year. I agree it would be good to include this functionality (along with SOLR-1729 + Yonik's recent 'NOW' changes). I have a product release coming up in a few weeks, so I won't have many cycles before then. Of course it would be great if you have any time to invest making this more 'commitable'. I admit because I'm not a Solr commiter, I'm not as familiar with the requirements. If you can let me know the 'missing elements', I'm happy to look at contributing what's needed, or if you prefer, divide up the tasks that need doing. Many thanks, Peter Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. 
This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information were added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
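The first-shard-as-basis merge described above can be sketched roughly as follows. This is a simplified illustration, not the actual FacetComponent code; the class and method names are hypothetical, and real facet_dates responses carry more than plain count maps:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetMerge {
    // Merge a subsequent shard's facet_date counts into the first shard's
    // map. Buckets the later shard reports that are absent from the first
    // shard's map (i.e. skewed by a 'gap') are deliberately dropped,
    // matching the behaviour described in the comment above.
    public static Map<String, Integer> merge(Map<String, Integer> first,
                                             Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            // Only aggregate buckets the first shard already knows about.
            if (first.containsKey(e.getKey())) {
                first.put(e.getKey(), first.get(e.getKey()) + e.getValue());
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("2010-01-01T00:00:00Z", 3);
        first.put("2010-01-01T01:00:00Z", 2);

        Map<String, Integer> skewed = new LinkedHashMap<>();
        skewed.put("2010-01-01T01:00:00Z", 4); // overlapping bucket: merged
        skewed.put("2010-01-01T02:00:00Z", 7); // outside first shard's range: dropped

        merge(first, skewed);
        System.out.println(first);
    }
}
```

This also makes the performance point above concrete: each shard is checked against one map rather than against every other shard's list.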
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970785#action_12970785 ] Peter Sturge commented on SOLR-1729: Many thanks for finishing off this patch. Sorry I didn't get time to fix this, been swamped with so many projects at the moment. That's great you got the thread local NOW included as well. Thanks! Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Fix For: 4.0 Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, SOLR-1729.patch, SOLR-1729.patch, SOLR-1729.patch, UnInvertedField.java This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.). This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. 
The new parameter is called 'facet.date.now', and takes as a parameter a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set value, but as that is where the use-case is for this type of parameter, this should be ok. NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share. Source files affected: FacetParams.java (holds the new constant FACET_DATE_NOW) SimpleFacets.java getFacetDateCounts() NOW parameter modified This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some rfc acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route.
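For illustration, a caller pinning the remote server's idea of NOW might build its request like this. This is a stdlib-only sketch: the host, core path, field name and date-math ranges are placeholders, and only the facet.date.now parameter itself comes from this patch:

```java
public class FacetDateNowDemo {
    // Build a query URL that tells the remote server what millisecond
    // timestamp to treat as NOW when computing date facets.
    static String buildQuery(String baseUrl, long nowMillis) {
        return baseUrl + "?q=*:*&facet=true&facet.date=timestamp"
             + "&facet.date.start=NOW/HOUR-1HOUR&facet.date.end=NOW/HOUR"
             + "&facet.date.gap=%2B1MINUTE"
             + "&facet.date.now=" + nowMillis;
    }

    public static void main(String[] args) {
        // In real code the second argument would be System.currentTimeMillis(),
        // so every shard computes NOW-relative ranges from the caller's clock.
        String url = buildQuery("http://shard1:8983/solr/select", 1291380300000L);
        System.out.println(url);
    }
}
```

Because every shard then resolves NOW/HOUR-1HOUR etc. from the same instant, the per-shard date buckets line up even when the shards' clocks disagree.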
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966494#action_12966494 ] Peter Sturge commented on SOLR-1729: Hi Peter, Not sure why it would work, then not... Both these patches were submitted just before all the version name changes (which I'm still getting to grips with). At the time, I think 1.4.1 was the latest release train. For 3.x recently we've done some manual merging due to some other changes (forwarding http credentials to remote shards). I'll have a look at building a separate 'branch3x' patch version, as there may have been some separate back-porting changes in the affected files that are breaking the current patch. Are you using the latest release, or the latest trunk version? Thanks, Peter Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966533#action_12966533 ] Peter Sturge commented on SOLR-1729: Hi Peter, So, the patches are clean (for 1.4.1), but the tests are failing for 1.4.1? Or is the failure in 3.x? Sorry, but I'm a bit confused about which bit isn't working now. Thanks, Peter On Fri, Dec 3, 2010 at 1:05 PM, Peter Karich (JIRA) j...@apache.org wrote: Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966031#action_12966031 ] Peter Sturge commented on SOLR-1709: It's a good idea to apply SOLR-1729 in any case, as it caters for any time skew in documents and between machines. Without it, result counts 'on the edges' could be incorrect. 1729 is quite 'passive', in that if you don't specify a 'FACET_DATE_NOW' parameter in the request, it runs as without the patch. In terms of readiness, we've been using these patches in production environments for months now. (We use it with the 3.x trunk branch.) Yonik, et al. were talking about a more general update with regard to how NOW is configured on a machine (since it is used in places other than just date facets), and this is the 'extra' work to be done, but things work fine as they are for distributed date faceting. Thanks, Peter Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966238#action_12966238 ] Peter Sturge commented on SOLR-1729: So is 1709 ok, but 1729 isn't? Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, solr-1.4.0-solr-1729.patch, UnInvertedField.java
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245: --- Attachment: SOLR-2245.zip This patch update provides a proper delta-import implementation, rather than the kludge used in the previous version. MailEntityProcessor with this patch is useful for importing emails 'en-masse' the first time 'round, then only new mails after that. Behaviour: * If you send a full-import command, then the 'fetchMailsSince' property specified in data-config.xml will always be used. * If you send a delta-import command, the 'fetchMailsSince' property specified in data-config.xml is used for the first call only. Subsequent delta-import commands will use the time since the last index update. There are significant code changes in this version. So much so, that I've included the complete MailEntityProcessor source as well as a PATCH file. This version doesn't use the persistent last_index_time functionality of dataimport.properties (i.e. it's delta only for the life of the Solr process). If I get some free cycles, I'll try to put this in. MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip This patch addresses a number of issues in the MailEntityProcessor contrib-extras module. The changes are outlined here: * Added an 'includeContent' entity attribute to allow specifying content to be included independently of processing attachments e.g. <entity includeContent="true" processAttachments="false" . . . /> would include message content, but not attachment content * Added a synonym called 'processAttachments', which is synonymous with the mis-spelled (and singular) 'processAttachement' property. This property functions the same as processAttachement. Default = 'true' - if either is false, then attachments are not processed. Note that only one of these should really be specified in a given entity tag. * Added a FLAGS.NONE value, so that if an email has no flags (i.e. it is unread, not deleted etc.), there is still a property value stored in the 'flags' field (the value is the string none). Note: there is a potential backward compat issue with FLAGS.NONE for clients that expect the absence of the 'flags' field to mean 'Not read'. I'm calculating this would be extremely rare, and it is inadvisable in any case as user flags can be arbitrarily set, so fixing it up now will ensure future client access will be consistent. * The folder name of an email is now included as a field called 'folder' (e.g. folder=INBOX.Sent). This is quite handy in search/post-indexing processing. * The addPartToDocument() method that processes attachments is significantly re-written, as there looked to be no real way the existing code would ever actually process attachment content and add it to the row data. Tested on the 3.x trunk with a number of popular IMAP servers.
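The FLAGS.NONE behaviour above can be sketched as follows. This is an illustrative stdlib-only fragment, not the patch's actual API: an email with no flags still yields an explicit 'flags' value ("none") rather than the field being absent:

```java
import java.util.List;
import java.util.Set;

public class FlagsNoneDemo {
    // Map an email's set of flag names to the 'flags' field values.
    // An empty set becomes the explicit marker "none", so clients can
    // rely on the field always being present.
    static List<String> flagsField(Set<String> mailFlags) {
        return mailFlags.isEmpty() ? List.of("none") : List.copyOf(mailFlags);
    }

    public static void main(String[] args) {
        System.out.println(flagsField(Set.of()));       // prints [none]
        System.out.println(flagsField(Set.of("seen"))); // prints [seen]
    }
}
```

A client checking for unread mail would then test for the value "none" rather than for the absence of the field, which is the backward-compat point raised above.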
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12935898#action_12935898 ] Peter Sturge commented on SOLR-2245: Forgot to mention... Because this now supports delta-import commands, the 'deltaFetch' attribute is no longer needed and is not used. MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch, SOLR-2245.zip
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245: --- Attachment: SOLR-2245.patch MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch, SOLR-2245.patch
[jira] Commented: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934800#action_12934800 ] Peter Sturge commented on SOLR-2245: This latest version of the updated MailEntityProcessor adds a few new features: 1. Incorporated SOLR-1958 (exception if fetchMailsSince isn't specified) into this patch 2. Added a hacky version of delta mail retrieval for scheduled import runs: The new property is called 'deltaFetch'. If 'true', the first time the import is run, it will read the 'fetchMailsSince' property and import as normal. On subsequent runs (within the same process session), the import will only fetch mail since the last run. Because it uses a runtime system property to hold the last_index_time, and there is currently no persistence, if/when the server is restarted the last_index_time is not saved and the original fetchMailsSince value is used. As I couldn't find exposed APIs for the dataimport.properties file (all the methods are private or package-protected), persistence is not included in this patch version. 3. Added support for including shared folders in the import 4.
Added support for including personal folders (other folders) in the import A typical {{entity}} element in data-config.xml might look something like this: {code:xml} <entity name="email" user="u...@mydomain.com" password="userpwd" host="imap.mydomain.com" fetchMailsSince="2010-08-01 00:00:00" deltaFetch="true" include="" exclude="" recurse="false" folders="INBOX,Inbox,inbox" includeContent="true" processAttachments="true" includeOtherUserFolders="true" includeSharedFolders="true" batchSize="100" processor="MailEntityProcessor" protocol="imap"/> {code} MailEntityProcessor Update -- Key: SOLR-2245 URL: https://issues.apache.org/jira/browse/SOLR-2245 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 1.4, 1.4.1 Reporter: Peter Sturge Priority: Minor Fix For: 1.4.2 Attachments: SOLR-2245.patch
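The deltaFetch bookkeeping described in the comment above can be sketched roughly as follows. This is an illustrative fragment with hypothetical names, not the patch's actual code: the first run uses the configured fetchMailsSince date, and later runs in the same process use the time recorded by the previous run, held in a runtime system property so nothing survives a server restart:

```java
import java.util.Date;

public class DeltaFetchDemo {
    static final String PROP = "mail.last_index_time"; // hypothetical property key

    // Decide how far back to fetch mail for this import run.
    static Date fetchSince(Date fetchMailsSince) {
        String last = System.getProperty(PROP);
        // First run in this process: no recorded time, use the configured date.
        Date since = (last == null) ? fetchMailsSince : new Date(Long.parseLong(last));
        // Record this run's time for the next delta run (in-memory only,
        // so a restart falls back to fetchMailsSince).
        System.setProperty(PROP, Long.toString(System.currentTimeMillis()));
        return since;
    }

    public static void main(String[] args) {
        Date configured = new Date(0L);
        System.out.println(fetchSince(configured)); // first run: configured date
        System.out.println(fetchSince(configured)); // delta run: previous run's time
    }
}
```

Persisting PROP to dataimport.properties is exactly the missing piece the comment mentions, since those APIs are private or package-protected.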
[jira] Issue Comment Edited: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934800#action_12934800 ] Peter Sturge edited comment on SOLR-2245 at 11/23/10 5:58 AM:

This latest version of the updated MailEntityProcessor adds a few new features:

1. Incorporated SOLR-1958 (exception if fetchMailsSince isn't specified) into this patch.
2. Added a hacky version of delta mail retrieval for scheduled import runs. The new property is called 'deltaFetch'. If 'true', the first time the import is run it will read the 'fetchMailsSince' property and import as normal. On subsequent runs (within the same process session), the import will only fetch mail received since the last run. Because it uses a runtime system property to hold the last_index_time, and there is currently no persistence, if/when the server is restarted the last_index_time is lost and the original fetchMailsSince value is used. I couldn't find exposed APIs for the dataimport.properties file (all the methods are private or package-protected), so persistence is not included in this patch version.
3. Added support for including shared folders in the import.
4. Added support for including personal folders (other folders) in the import.

A typical entity element in data-config.xml might look something like this:

{code:xml}
<entity name="email"
        user="u...@mydomain.com"
        password="userpwd"
        host="imap.mydomain.com"
        fetchMailsSince="2010-08-01 00:00:00"
        deltaFetch="true"
        include=""
        exclude=""
        recurse="false"
        folders="INBOX,Inbox,inbox"
        includeContent="true"
        processAttachments="true"
        includeOtherUserFolders="true"
        includeSharedFolders="true"
        batchSize="100"
        processor="MailEntityProcessor"
        protocol="imap"/>
{code}

MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
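The deltaFetch bookkeeping described in point 2 can be sketched in miniature. This is a hypothetical illustration only (the property name and method are assumptions, not the patch's actual code): the 'since' timestamp falls back to fetchMailsSince on the first run, then tracks the previous run via a runtime system property, which is exactly why it is lost on restart.

```java
import java.time.Instant;

// Hypothetical sketch of the deltaFetch bookkeeping; the property name and
// method are illustrative assumptions, not the patch's actual internals.
class DeltaFetchSketch {

    static final String LAST_INDEX_TIME_PROP = "mep.last_index_time"; // assumed name

    // Decide which 'since' timestamp this import run should use.
    static Instant sinceFor(Instant fetchMailsSince, Instant now) {
        String last = System.getProperty(LAST_INDEX_TIME_PROP);
        // First run in this process: fall back to the configured fetchMailsSince.
        Instant since = (last == null) ? fetchMailsSince : Instant.parse(last);
        // Remember this run's time for the next run. Because this is only a
        // runtime system property, it is lost on restart - the persistence
        // gap noted in the comment above.
        System.setProperty(LAST_INDEX_TIME_PROP, now.toString());
        return since;
    }

    public static void main(String[] args) {
        Instant configured = Instant.parse("2010-08-01T00:00:00Z");
        // First run uses the configured value; the second run uses the first run's time.
        System.out.println(sinceFor(configured, Instant.parse("2010-11-23T05:00:00Z")));
        System.out.println(sinceFor(configured, Instant.parse("2010-11-23T06:00:00Z")));
    }
}
```

Persisting the property to dataimport.properties (as DIH does for last_index_time elsewhere) would close the restart gap, but as noted those APIs aren't exposed.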
[jira] Updated: (SOLR-2245) MailEntityProcessor Update
[ https://issues.apache.org/jira/browse/SOLR-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2245:
---
Attachment: SOLR-2245.patch

MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
[jira] Created: (SOLR-2245) MailEntityProcessor Update
MailEntityProcessor Update
--------------------------
Key: SOLR-2245
URL: https://issues.apache.org/jira/browse/SOLR-2245
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 1.4
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
Attachments: SOLR-2245.patch
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929786#action_12929786 ] Peter Sturge commented on SOLR-1709:

Hi Peter,

Thanks for your message. There's of course the issue of 'now' as described in some of the above comments. This is perhaps a little ancillary to this issue, but not totally irrelevant.

The issue of time zone/skew on distributed shards is currently handled by SOLR-1729 by passing a 'facet.date.now=epochtime' parameter in the search query, which the participating shards then use as 'now'. Of course, there are a number of ways to skin that one, but this is a straightforward solution that is backward compatible and still easy to implement in client code. Note that the facet.date.now change is not part of this patch - see SOLR-1729 for a separate patch for this parameter. (It's kept separate because it's, strictly speaking, a separate issue for distributed search generally.)

It's not that earlier/later aren't supported - the date facet 'edges' are fine; it's just that the patch will 'quantize the ends' of the start/end date facets if the time is skewed from the calling server. This is where SOLR-1729 comes into play, so that this doesn't happen.

As this is a pre-3x/4x branch patch, the testing is a bit limited on the latest trunk(s). Having said that, I have this (and SOLR-1729) building/running fine on my svn 3x branch copy.

Any other questions, or info you need, please do let me know.

Thanks!
Peter

Distributed Date Faceting
-------------------------
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, solr-1.4.0-solr-1709.patch

This patch adds support for date facets when using distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time).

The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if a subsequent shard's facet_dates are skewed in relation to the first by one 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
* If 'earlier' and/or 'later' facet_dates were added in, this would make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).

This could be dealt with if timezone and skew information were added and the dates normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome.

As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
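The first-shard-as-basis merge rule described above can be shown in miniature. This is a simplified sketch with plain maps, not the actual FacetComponent code: a later shard's counts are added only for buckets the basis already has, so a skewed 'earlier' or 'later' bucket is dropped rather than widening the requested range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the merge rule: the first-encountered shard's
// facet_dates buckets form the basis; later shards' counts are added only
// for buckets the basis already contains.
class DateFacetMergeSketch {

    static void mergeInto(Map<String, Integer> basis, Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            // Buckets absent from the basis are silently ignored.
            basis.computeIfPresent(e.getKey(), (bucket, count) -> count + e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> basis = new LinkedHashMap<>();
        basis.put("2010-11-10T10:00:00Z", 3);
        basis.put("2010-11-10T11:00:00Z", 5);

        Map<String, Integer> skewedShard = new LinkedHashMap<>();
        skewedShard.put("2010-11-10T09:00:00Z", 2); // 'earlier' by one gap: dropped
        skewedShard.put("2010-11-10T10:00:00Z", 4); // merged into the basis

        mergeInto(basis, skewedShard);
        System.out.println(basis); // {2010-11-10T10:00:00Z=7, 2010-11-10T11:00:00Z=5}
    }
}
```

With the proposed 'timezone'/'now' additions to the facet_dates map, bucket keys could be normalized before this merge instead of being dropped.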
[jira] Commented: (SOLR-2100) Fix for saving commit points during java-based backups
[ https://issues.apache.org/jira/browse/SOLR-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906497#action_12906497 ] Peter Sturge commented on SOLR-2100:

I'm not really familiar with the reservation code for replication, but will it still save the commit point for replication even if another commit (or many commits) comes along during replication? By default it would probably be rare, as the data to be replicated is only a delta and would likely not take too long to complete. This was the problem with backups - a backup is a full file copy of everything, which typically takes minutes on large indexes - longer if writing to a remote volume. As the replication timing is configurable, you could have a scenario where the amount of data to be replicated is very significant, and generally remote, so could take some time to complete. Would the reservation mechanism still hold the commit point if 1, 2, 5 or 10 commits came along during the replication process?

ReplicationHandler.postCommit() calls saveCommitPoint()/releaseCommitPoint(), so as things stand this would preserve the commit point even if a separate reservation didn't, and there's no price to pay for holding the indexVersion in this way.

Not sure what the standard policy is for marking issues Resolved/Closed, so I'll leave this up to you. But do let me know if you'd like me to perform any additional testing.

Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4, 1.4.1
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
Attachments: SOLR-2100.PATCH
Original Estimate: 0h
Remaining Estimate: 0h

This patch fixes the saving of commit points during backup operations. It fixes the previously committed (for 1.4) SOLR-1475 patch.

1. In IndexDeletionPolicyWrapper.java, commit points are not saved to the 'savedCommits' map.
2. Also, the test for the presence of a commit point uses the contains() method instead of containsKey().

The result of this is that backups for anything but toy indexes fail, because the commit points are deleted (after 10s) before the full backup is completed. This patch addresses these 2 issues.

Tested with the 1.4.1 release trunk, but should also work fine with 1.4.
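The second bug is easy to reproduce in miniature. This hypothetical sketch does not reproduce the real IndexDeletionPolicyWrapper types; it just models why testing the wrong side of the savedCommits map (containsValue() stands in here for the misused contains() call) makes the "is this commit point saved?" check always fail, so the commit point gets reaped mid-backup:

```java
import java.util.HashMap;
import java.util.Map;

// Miniature model of the reported bug: commit points are keyed by index
// version, so presence must be tested with containsKey(). A value-side test
// never matches the version, so the commit point looks unsaved and is
// deleted before the backup's file copy finishes.
class CommitPointCheckSketch {

    static final Map<Long, String> savedCommits = new HashMap<>();

    static boolean isSavedBuggy(Long version) {
        return savedCommits.containsValue(version); // wrong side of the map
    }

    static boolean isSavedFixed(Long version) {
        return savedCommits.containsKey(version);
    }

    public static void main(String[] args) {
        savedCommits.put(42L, "commit-point-42"); // commit point saved for version 42
        System.out.println(isSavedBuggy(42L)); // false: commit point would be reaped
        System.out.println(isSavedFixed(42L)); // true
    }
}
```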
[jira] Created: (SOLR-2100) Fix for saving commit points during java-based backups
Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 1.4.1, 1.4
Reporter: Peter Sturge
Priority: Minor
Fix For: 1.4.2
[jira] Updated: (SOLR-2100) Fix for saving commit points during java-based backups
[ https://issues.apache.org/jira/browse/SOLR-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-2100:
---
Attachment: SOLR-2100.PATCH

Fix for saving commit points during java-based backups
------------------------------------------------------
Key: SOLR-2100
URL: https://issues.apache.org/jira/browse/SOLR-2100
[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr
[ https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866674#action_12866674 ] Peter Sturge commented on SOLR-1163:

Hi Uri,

Really like what you've done here. +1 vote! I've had a go on your demo site and that looks cool.

When I download and try to connect to a core (I've tried my own core, and the Solr 'example'), I always get:
'Could not load solr core ('corename'): The JSON request failed or timed out'
If I turn on Firebug, the only msg I get is this:
reference to undefined property window[c + x$] [Break on this error] function DKd(h,d,e,b,f){var c=gM+CKd++;i...)}},5000);document.body.appendChild(g)}\n
There doesn't seem to be any log/debug of what the problem might be. Are there any logging options that can be enabled?

Many thanks,
Peter

Solr Explorer - A generic GWT client for Solr
---------------------------------------------
Key: SOLR-1163
URL: https://issues.apache.org/jira/browse/SOLR-1163
Project: Solr
Issue Type: New Feature
Components: web gui
Affects Versions: 1.3
Reporter: Uri Boness
Attachments: graphics.zip, SOLR-1163.zip, SOLR-1163.zip, solr-explorer.patch, solr-explorer.patch

The attached patch is a generic GWT client for Solr. It is currently standalone, meaning that once built, one can open the generated HTML file in a browser and communicate with any deployed Solr. It is configured with its own configuration file, where one can configure the Solr instance/core to connect to. Since it's currently standalone and completely client-side based, it uses JSON with padding (cross-site scripting) to connect to remote Solr servers. Some of the supported features:

- Simple query search
- Sorting - one can dynamically define new sort criteria
- Search results are rendered very much like Google search results are rendered. It is also possible to view all stored field values for every hit.
- Custom hit rendering - it is possible to show thumbnails (images) per hit and also customize a view for a hit based on HTML templates
- Faceting - one can dynamically define field and query facets via the UI. It is also possible to pre-configure these facets in the configuration file.
- Highlighting - you can dynamically configure highlighting. It can also be pre-configured in the configuration file.
- Spellchecking - you can dynamically configure spell checking. Can also be done in the configuration file. Supports collation. It is also possible to send build and reload commands.
- Data import handler - if used, it is possible to send full-import and status commands (delta-import is not implemented yet, but it's easy to add)
- Console - for development time, there's a small console which can help to better understand what's going on behind the scenes. One can use it to:
** view the client logs
** browse the Solr schema
** view a breakdown of the current search context
** view a breakdown of the query URL that is sent to Solr
** view the raw JSON response returning from Solr

This client is actually a platform that can be greatly extended for more things. The goal is to have a client where the explorer part is just one view of it. Other future views include: Monitoring, Administration, Query Builder, DataImportHandler configuration, and more...

To get a better view of what's currently possible, we've set up a public version of this client at: http://search.jteam.nl/explorer. This client is configured with one Solr instance where crawled YouTube movies were indexed. You can also check out a screencast for this deployed client: http://search.jteam.nl/help

The patch creates a new folder in the contrib directory. Since the patch doesn't contain binaries, an additional zip file is provided that needs to be extracted to add all the required graphics. This module is Maven 2 based and is configured in such a way that all GWT-related tools/libraries are automatically downloaded when the module is compiled. One of the artifacts of the build is a war file which can be deployed in any servlet container.

NOTE: this client works best on WebKit-based browsers (for performance reasons) but also works on Firefox and IE 7+. That said, it should be taken into account that it is still under development.
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862150#action_12862150 ] Peter Sturge commented on SOLR-1895:

It's worth bearing in mind that more than just a username is required in the input in order to ensure secure access. Otherwise, security is compromised simply by guessing (or already knowing) the username of someone with higher privileges. For example: User Dishwasher has low privileges; User Admin has high privileges. When Dishwasher logs in, all he/she has to do is put Admin's name in the input argument, and he/she has now assumed Admin's rights. User Admin doesn't even need to be logged in for this to happen.

LCF SearchComponent plugin for enforcing LCF security at search time
--------------------------------------------------------------------
Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Fix For: 1.5
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service.
The component requires you to configure the appropriate authority service URL base, e.g.:

{code:xml}
<!-- LCF document security enforcement component -->
<searchComponent name="lcfSecurity" class="LCFSecurityFilter">
  <str name="AuthorityServiceBaseURL">http://localhost:8080/lcf-authority-service</str>
</searchComponent>
{code}

Also required are the following schema.xml additions:

{code:xml}
<!-- Security fields -->
<field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
{code}

Finally, to tie it into the standard request handler, it seems to need to run last:

{code:xml}
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>lcfSecurity</str>
  </arr>
  ...
</requestHandler>
{code}

I have not set a package for this code. Nor have I been able to get it reviewed by someone as conversant with Solr as I would prefer. It is my hope, however, that this module will become part of the standard Solr 1.5 suite of search components, since that would tie it in with LCF nicely.
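The attached LCFSecurityFilter.java isn't reproduced in this thread, but the general shape of such a component can be sketched. This is an illustrative assumption only, not Karl's actual code: it builds a Lucene-syntax filter over the allow/deny token fields so that only documents matching one of the user's access tokens (and none of the deny tokens) come back. The real component also handles the _share fields and fetches tokens from the authority service.

```java
import java.util.List;

// Illustrative sketch only: builds a filter restricting results to documents
// whose allow_token_document matches one of the user's access tokens and
// whose deny_token_document matches none of them.
class SecurityFilterSketch {

    static String buildFilterQuery(List<String> accessTokens) {
        StringBuilder allow = new StringBuilder();
        StringBuilder deny = new StringBuilder();
        for (String token : accessTokens) {
            if (allow.length() > 0) {
                allow.append(" OR ");
                deny.append(" OR ");
            }
            allow.append("allow_token_document:\"").append(token).append('"');
            deny.append("deny_token_document:\"").append(token).append('"');
        }
        return "(" + allow + ") AND NOT (" + deny + ")";
    }

    public static void main(String[] args) {
        // Hypothetical token values for one authenticated user.
        System.out.println(buildFilterQuery(List.of("DOMAIN\\hr-managers", "public")));
    }
}
```

In a real SearchComponent this string would be added as a filter query in prepare(), so it constrains every downstream component rather than post-filtering results.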
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862191#action_12862191 ] Peter Sturge commented on SOLR-1895:

{quote}
The presumption is that the Solr webapp is not the final user interface, and is indeed not accessible to the user at all.
{quote}
Given that search requests are http-based, how would this be done in, say, an intranet environment? I agree that a user interface wouldn't expose any means to change the http parameters, but if http is available to the UI, it'll also be available to a web browser's search bar at the same station (unless some tunnelling, proxy or similar is used).

Totally agree on the server lock-down - hopefully, everyone does this already as a matter of course!

There are a couple of ways to address the impersonator problem. Probably the most robust way is to use SSL authentication from client to container, then have the Solr app integrate with the container (like we talked about for the authentication piece) and use its session certificate to ensure that any requests coming from the remote station match those of the originally authenticated user. A somewhat easier method is to use the hash and session-id mechanism used in SOLR-1872. This provides pgp protection against impersonation (even gaining any access from a browser), but wouldn't be suitable outside of an intranet environment (for exposed internet access, it would really need to be SSL - for sensitive data, though, you wouldn't expect it to be exposed across a DMZ anyway).
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862206#action_12862206 ] Peter Sturge commented on SOLR-1895:

{quote}
The usual way is to configure the application server running solr to either use certificate authentication (which requires the connecting client to be able to identify themselves via a secure cert)
{quote}
Yes, cert authentication is a good way to go, but once you've got one (because you have at least some privileges), you can bypass the lower-layer doc security because you've already done the cert auth.

{quote}
configure the application server to not accept connections from (say) anything other than the localhost adapter.
{quote}
I don't understand how localhost-only would give you any access off the box. I guess what I meant was: your client is wherever your client is, and this client could (and probably would) have a web browser installed. If a bona-fide user was an IT Operator, it would be easy for him/her to 'pretend' to be an HR Manager, unless some kind of post-login identity check prevents it. One way 'round this is to encrypt part or all of the http parameters (essentially, this is what the hash mechanism does in SOLR-1872).

LCF SearchComponent plugin for enforcing LCF security at search time
--------------------------------------------------------------------
Key: SOLR-1895
URL: https://issues.apache.org/jira/browse/SOLR-1895
Project: Solr
Issue Type: New Feature
Components: SearchComponents - other
Reporter: Karl Wright
Fix For: 1.5
Attachments: LCFSecurityFilter.java, LCFSecurityFilter.java, LCFSecurityFilter.java

I've written an LCF SearchComponent which filters returned results based on access tokens provided by LCF's authority service.
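The "encrypt part or all of the http parameters" idea can be sketched as a simple HMAC over the query string. This is an illustrative sketch with an assumed shared secret, not the actual SOLR-1872 mechanism (which isn't shown in this thread): a browser user who edits a parameter, such as swapping in another username, cannot produce a valid signature.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: sign the query string with a secret shared between
// the trusted UI and the server, which recomputes the HMAC and rejects any
// request whose signature doesn't match the parameters as sent.
class ParamSigningSketch {

    static String sign(String queryString, String sharedSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] sig = mac.doFinal(queryString.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : sig) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        String qs = "q=*:*&user=hrmanager"; // hypothetical parameter names
        System.out.println(qs + "&sig=" + sign(qs, "shared-secret"));
    }
}
```

On its own this doesn't stop replaying a captured signed request within an intranet; adding a session id or timestamp into the signed string (as the comment's "hash and session id mechanism" suggests) narrows that window.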
[jira] Commented: (SOLR-1895) LCF SearchComponent plugin for enforcing LCF security at search time
[ https://issues.apache.org/jira/browse/SOLR-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862225#action_12862225 ] Peter Sturge commented on SOLR-1895: That makes total sense to keep a proxy app separate. Why wouldn't users interact with Solr directly? There's a lot of client-side stuff available to do just that. I wouldn't have thought there are too many implementations out there that completely block Solr http read access, because this would break replication, distributed searching, spell checkers, custom handlers etc. Generally, web proxies and firewalls etc. do a good job on this side of things, which is one of the reasons doc-level security is such a tricky business - you have to let traffic through to solr.war that you would normally not let anywhere near Solr, and restrict it there. You're right that /update, /admin etc. need to be 'locked down', but this is quite straightforward, so as not to allow users access to write or change anything.
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java Updates a typo or two plus some misc tweaks.
{code}
<searchComponent name="SolrACLSecurity" class="org.apache.solr.handler.security.SolrACLSecurity">
  <!-- SolrACLSecurityKey can be any alphanumeric string, the more complex the better.
       For production environments, don't use the default value - create a new value.
       This property needs to be present in all firstSearcher and newSearcher warming
       queries, otherwise those requests will be blocked. -->
  <str name="SolrACLSecurityKey">zxb79j3g76A79N8N2AbR0K852976qr1klt86xv436j2</str>
  <str name="config-file">acl.xml</str>
  <!-- Auditing: Set audit to true to log all searches, including failed access attempts -->
  <bool name="audit">true</bool>
  <int name="maxFileSizeInMB">10</int>
  <int name="maxFileCount">1</int>
  <str name="auditFile">audit.log</str>
  <!-- User lockout:
       'lockoutThreshold' is the number of consecutive incorrect logins before locking out the account
       'lockoutTime' is the number of minutes to lock out the account
       If 'lockoutThreshold' is 0 or less, account lockout is disabled (no accounts are ever locked out)
       If not specified, the default values are: lockoutThreshold=5 lockoutTime=15 -->
  <str name="lockoutThreshold">5</str>
  <str name="lockoutTime">15</str>
</searchComponent>
{code}
Thanks, Peter Document-level Access Control in Solr - Key: SOLR-1872 URL: https://issues.apache.org/jira/browse/SOLR-1872 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: SolrACLSecurity.java, SolrACLSecurity.java, SolrACLSecurity.rar This issue relates to providing document-level access control for Solr index data. A related JIRA issue is: SOLR-1834. 
I thought it would be best if I created a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work. There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access Control is one of those subjects that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, it does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance of requirements. The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that this version doesn't currently deal with /update, /replication etc.; it's a /select thing at the moment (but it could be used for these). This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml. acl.xml defines a number of users and groups (and 1 global for 'everyone'), and assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements. When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list. When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed), and then added to the query request. Because the allow and deny elements are simply subsearch queries (e.g. 
{{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any stored data that can be queried, including already existing data. Authentication: One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses http parameters for this. For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client. If the request is wrong, the SearchComponent fails and will increment the user's failed login count (if a valid user was specified).
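The username/hash/uuid handshake described above can be sketched with in-memory user and session tables. This is a simplified stand-in, not the code from the attached SolrACLSecurity.java; the SHA-1 hash choice and all names here are assumptions for illustration (the issue doesn't specify the hash algorithm):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class AuthSketch {
    private final Map<String, String> userHashes = new HashMap<>(); // username -> stored password hash
    private final Map<String, String> sessions = new HashMap<>();   // issued uuid -> username

    public AuthSketch() {
        // In the real component, users come from acl.xml; this entry is illustrative.
        userHashes.put("hrmanager", sha1Hex("secret"));
    }

    // Hex-encoded SHA-1 of the password (an assumed hashing scheme).
    static String sha1Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    // Initial search: username + hash parameters; returns a session uuid on success, null otherwise.
    public String login(String username, String hash) {
        String expected = userHashes.get(username);
        if (expected == null || !expected.equals(hash)) return null;
        String uuid = UUID.randomUUID().toString();
        sessions.put(uuid, username);
        return uuid;
    }

    // Subsequent searches present the uuid instead of the credentials.
    public boolean validate(String uuid) {
        return sessions.containsKey(uuid);
    }
}
```

As the comment notes, this only makes sense over container HTTPS; without transport security the hash and uuid are replayable.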
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.java This update adds in optional auditing of searches by users and failed access attempts, plus a few minor tweaks. To configure auditing, the sample searchComponent section from solrconfig.xml is the same as the one in the later update above, except that the key property was misspelled as SolrACLSecurity rather than SolrACLSecurityKey.
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:16 AM: (this edit only adjusted the wrapping of the solrconfig.xml sample; the comment text is otherwise unchanged from the update above)
[jira] Issue Comment Edited: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855698#action_12855698 ] Peter Sturge edited comment on SOLR-1872 at 4/11/10 6:18 AM: (this edit wrapped the solrconfig.xml sample in a {code} block; the comment text is otherwise unchanged from the update above)
[jira] Created: (SOLR-1872) Document-level Access Control in Solr
Document-level Access Control in Solr - Key: SOLR-1872 URL: https://issues.apache.org/jira/browse/SOLR-1872 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: SolrACLSecurity.rar This issue relates to providing document-level access control for Solr index data. A related JIRA issue is: SOLR-1834. I thought it would be best if I created a separate JIRA issue, rather than tack on to SOLR-1834, as the approach here is somewhat different, and I didn't want to confuse things or step on Anders' good work. There have been lots of discussions about document-level access in Solr using LCF, custom components and the like. Access Control is one of those subjects that quickly spreads to lots of 'ratholes' to dive into. Even if not everyone agrees with the approaches taken here, it does, at the very least, highlight some of the salient issues surrounding access control in Solr, and will hopefully initiate a healthy discussion on the range of related requirements, with the aim of finding the optimum balance of requirements. The approach taken here is document and schema agnostic - i.e. the access control is independent of what is or will be in the index, and no schema changes are required. This version doesn't include LDAP/AD integration, but it could be added relatively easily (see Anders' very good work on this in SOLR-1834). Note that this version doesn't currently deal with /update, /replication etc.; it's a /select thing at the moment (but it could be used for these). This approach uses a SearchComponent subclass called SolrACLSecurity. Its configuration is read in from solrconfig.xml in the usual way, and the allow/deny configuration is split out into a config file called acl.xml. acl.xml defines a number of users and groups (and 1 global for 'everyone'), and assigns 0 or more {{acl-allow}} and/or {{acl-deny}} elements. 
When the SearchComponent is initialized, user objects are created and cached, including an 'allow' list and a 'deny' list. When a request comes in, these lists are used to build filter queries ('allows' are OR'ed and 'denies' are NAND'ed), and then added to the query request. Because the allow and deny elements are simply subsearch queries (e.g. {{<acl-allow>somefield:secret</acl-allow>}}), this mechanism will work on any stored data that can be queried, including already existing data. Authentication: One of the sticky problems with access control is how to determine who's asking for data. There are many approaches, and to stay in the generic vein the current mechanism uses http parameters for this. For an initial search, a client includes a {{username=somename}} parameter and a {{hash=pwdhash}} hash of its password. If the request sends the correct parameters, the search is granted and a uuid parameter is returned in the response header. This uuid can then be used in subsequent requests from the client. If the request is wrong, the SearchComponent fails and will increment the user's failed login count (if a valid user was specified). If this count exceeds the configured lockoutThreshold, no further requests are granted until the lockoutTime has elapsed. This mechanism protects against some types of attacks (e.g. CRLF, dictionary etc.), but it really needs container HTTPS as well (as would most other auth implementations). Incorporating SSL certificates for authentication and making the authentication mechanism pluggable would be a nice improvement (i.e. separate authentication from access control). Another issue is how internal searchers perform autowarming etc. The solution here is to use a local key called 'SolrACLSecurityKey'. This key is local and [should be] unique to that server. firstSearcher, newSearcher et al then include this key in their parameters so they can perform autowarming without constraint. 
Again, there are likely many ways to achieve this; this approach is but one. The attached rar holds the source and associated configuration. This has been tested on the 1.4 release codebase (search in the attached solrconfig.xml for SolrACLSecurity to find the relevant sections in this file). I hope this proves helpful for people who are looking for this sort of functionality in Solr, and more generally to address how such a mechanism could ultimately be integrated into a future Solr release. Many thanks, Peter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
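The lockoutThreshold/lockoutTime behaviour described above can be sketched as a small failure-count table. This is a simplified stand-in, not the code from the attachment; all names are illustrative:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class LockoutSketch {
    private final int lockoutThreshold;   // consecutive failures before lockout (<= 0 disables lockout)
    private final Duration lockoutTime;   // how long an account stays locked
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Instant> lockedUntil = new HashMap<>();

    public LockoutSketch(int lockoutThreshold, Duration lockoutTime) {
        this.lockoutThreshold = lockoutThreshold;
        this.lockoutTime = lockoutTime;
    }

    // A locked account rejects all requests until lockoutTime has elapsed.
    public boolean isLocked(String user, Instant now) {
        Instant until = lockedUntil.get(user);
        return until != null && now.isBefore(until);
    }

    // Called on each failed login; locks the account once the threshold is hit.
    public void recordFailure(String user, Instant now) {
        if (lockoutThreshold <= 0) return; // lockout disabled
        int n = failures.merge(user, 1, Integer::sum);
        if (n >= lockoutThreshold) lockedUntil.put(user, now.plus(lockoutTime));
    }

    // A successful login resets the consecutive-failure count.
    public void recordSuccess(String user) {
        failures.remove(user);
        lockedUntil.remove(user);
    }
}
```

With the defaults from the config sample (threshold 5, 15 minutes), the fifth consecutive failure locks the account until 15 minutes have passed.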
[jira] Updated: (SOLR-1872) Document-level Access Control in Solr
[ https://issues.apache.org/jira/browse/SOLR-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1872: --- Attachment: SolrACLSecurity.rar
[jira] Commented: (SOLR-1143) Return partial results when a connection to a shard is refused
[ https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853168#action_12853168 ] Peter Sturge commented on SOLR-1143: This is a cool patch - yes, very useful. I've found a couple of issues with it, though: 1. When going through the 'waiting for shard replies' loop, because no exception is thrown on shard failure, the next block after the loop can throw a NullPointerException in {{SearchComponent.handleResponses()}} for any SearchComponent that checks shard responses. It could be that this doesn't always happen, but it certainly happens in FacetComponent when date_facets are turned on. 2. There's a bit of code that sets {{partialResults=true}} if there's at least one failure, but it doesn't set it to false if everything's ok. In order for the patch to operate, this parameter must have already been present and true, otherwise the patch is essentially 'disabled' anyway (problem of using the same parameter as input and result). I've made some modifications to the patch for these and a couple of other things: 1. FacetComponent modified to check for null shard response. Perhaps it would be better to check this in SearchHandler.handleResponses(), but then no SearchComponents would be contacted re failed shards, even if they don't care that it's failed (is that a good thing?). 2. Added a new CommonParams parameter called FAILED_SHARDS. {{partialResults}} is now only an input parameter to enable the feature (Note: {{partialResults}} is referenced in RequestHandlerBase, but it's not from the patch - is this an existing parameter that is used for something else?! If so, perhaps the name should be changed to something like {{allowPartialResults}} to avoid b/w compat and other potential conflicts). The output parameter that goes in the response header is now: {{failedShards=shard0;shard1;shardn}}. 
If everything succeeds, there will be no failedShards in the response header; otherwise, a list of failed shards is given. This is very useful for alerting someone/something that a server/network needs attention (e.g. a health-checker thread could run empty distributed searches solely for the purpose of checking status).

3. Changed the detection of a shard request error to be any Exception, rather than just ConnectException. This way, any failure is caught and can be actioned.

Possible TODO: it might be nice to include a short message (Exception class name?) in the FAILED_SHARDS parameter about what failed (e.g. ConnectException, IOException, etc.). If you like this idea, please say so, and I'll include it - i.e. something like: {{failedShards=myshard:8983/solr/core0|ConnectException;myothershard:8983/solr/core0|IOException}}

I'm currently testing these changes in our internal build. In the meantime, any comments are greatly appreciated. If there are no objections, I'll add a patch update when the dev test run is complete.

Return partial results when a connection to a shard is refused -- Key: SOLR-1143 URL: https://issues.apache.org/jira/browse/SOLR-1143 Project: Solr Issue Type: Improvement Components: search Reporter: Nicolas Dessaigne Assignee: Grant Ingersoll Fix For: 1.5 Attachments: SOLR-1143-2.patch, SOLR-1143-3.patch, SOLR-1143.patch

If any shard is down in a distributed search, a ConnectException is thrown. Here's a little patch that changes this behaviour: if we can't connect to a shard (ConnectException), we get partial results from the active shards. As for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), we set the parameter partialResults to true. This patch also addresses a problem expressed in the mailing list about a year ago (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html) We have a use case that needs this behaviour and we would like to know your thoughts about such a behaviour.
Should it be the default behaviour for distributed search? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
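For illustration, a health-checker client could parse the proposed failedShards header value along these lines. This is a sketch under the two formats suggested in the comment above (plain "shard0;shard1" and the extended "shard|ExceptionName" variant); the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a client-side check on the proposed failedShards response header.
// Handles both "shard0;shard1" and the suggested "shard|ExceptionName" form.
// Illustrative only; not code from the SOLR-1143 patch.
public class FailedShardsParser {
    /** Splits the header value into shard names, dropping any "|Exception" suffix. */
    public static List<String> failedShards(String headerValue) {
        List<String> shards = new ArrayList<>();
        if (headerValue == null || headerValue.isEmpty()) {
            return shards; // no header means all shards responded
        }
        for (String token : headerValue.split(";")) {
            int bar = token.indexOf('|');
            shards.add(bar >= 0 ? token.substring(0, bar) : token);
        }
        return shards;
    }
}
```

An empty result would mean the distributed search completed against every shard, which is exactly what a monitoring thread running empty distributed searches would check for.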
[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries
HTTP Authentication for sharded queries --- Key: SOLR-1861 URL: https://issues.apache.org/jira/browse/SOLR-1861 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards.

Here's how the patch works: A final constant String called {{shardcredentials}} acts as the SolrParams parameter key name. The format for the value associated with this key is a comma-delimited list of colon-separated tokens: {{shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN}} A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures b/w compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials, and then sends the request using it. Note: Because the match comparison is a string compare (as opposed to a DNS compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter.

Impl Notes: This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so it should be fine on other trunks, but I've only tested with the 1.4 release codebase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
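The shardcredentials format described above could be parsed roughly as follows. This is a minimal sketch, not the attached SearchHandler code; the class and helper names are illustrative. Keying by "host:port" mirrors the string-compare matching against the shards parameter that the patch relies on.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing the proposed shardcredentials parameter value,
// "host0:port0:user0:pass0,host1:port1:user1:pass1,...".
// Illustrative only; names are not from the attached SearchHandler.java.
public class ShardCredentialsParser {
    /** Maps "host:port" to a {username, password} pair. */
    public static Map<String, String[]> parse(String shardCredentials) {
        Map<String, String[]> creds = new HashMap<>();
        if (shardCredentials == null || shardCredentials.isEmpty()) {
            return creds; // no credentials: fall back to the default HttpClient
        }
        for (String entry : shardCredentials.split(",")) {
            String[] tok = entry.split(":");
            if (tok.length == 4) { // expected shape: host:port:user:pass
                creds.put(tok[0] + ":" + tok[1], new String[] { tok[2], tok[3] });
            }
        }
        return creds;
    }
}
```

In the patch's flow, the HttpCommComponent would look up each shard's host:port in such a map and, on a match, build an HttpClient preloaded with those credentials; on no match it keeps the default, credential-free client.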
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java Apologies that this is the source file and not a diff'ed patch file. I've tried so many Win doze svn products, but I just can't get them to create a patch file (I'm sure this is more down to me not configuring them correctly, rather than rapidsvn, visualsvn, Tortoisesvn etc.). If someone would like to create a patch file from this source, that would be extraordinarily kind of you! In any case, the changes to this file are quite straightforward. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java A small update to this patch to support distributed searches with multiple cores. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850159#action_12850159 ] Peter Sturge commented on SOLR-1672: I agree there's some refactoring to do to bring it in line with current FacetParams conventions. At the same time, it would be good to look at wrapping up the functionality into a method, and covering all the code paths in the way you describe. I've been wanting to get to finishing off this patch, but I'm in the throes of a product release myself, so I've not had many spare cycles. You mention termenum, fieldcache, uninverted - presumably, these are among the code paths that need to cater for facet counts. If you know them, can you add a comment here that lists all the areas that need to be catered for, so that none are left out (if there are more than those 3)? Thanks! Peter

RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSet<Long> in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
+ final BoundedTreeSet<Long> queue = (sort.equals("count") || sort.equals("true"))
+     ? (facetSortOrder.equals("dsc")
+         ? new BoundedTreeSet<Long>(maxsize, new Comparator() {
+             @Override
+             public int compare(Object o1, Object o2) {
+               if (o1 == null || o2 == null) return 0;
+               int result = ((Long) o1).compareTo((Long) o2);
+               return (result != 0 ? (result < 0 ? -1 : 1) : 0); // lowest number first sort
+             }
+           })
+         : new BoundedTreeSet<Long>(maxsize))
+     : null;

SimpleFacets.java [line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present; 'asc' is used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes: I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of'). I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
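To illustrate the comparator change in isolation, here is a simplified stand-in for Solr's org.apache.solr.util.BoundedTreeSet with the patch's lowest-first ordering. This is a sketch, not the actual UnInvertedField code: the bounded-set implementation is reduced to the minimum needed to show the effect of flipping the comparator.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Standalone illustration of the SOLR-1672 idea: a size-bounded TreeSet whose
// comparator keeps the *lowest* packed count/term longs instead of the highest.
// BoundedTreeSet here is a simplified stand-in for Solr's own class.
public class ReverseFacetSort {
    static class BoundedTreeSet<E> extends TreeSet<E> {
        private final int maxSize;
        BoundedTreeSet(int maxSize, Comparator<? super E> c) {
            super(c);
            this.maxSize = maxSize;
        }
        @Override
        public boolean add(E e) {
            boolean added = super.add(e);
            if (size() > maxSize) {
                remove(last()); // evict the "largest" under the comparator
            }
            return added;
        }
    }

    /** Keeps the n smallest values: the ascending analogue of the patch's comparator. */
    static BoundedTreeSet<Long> lowestFirst(int n) {
        return new BoundedTreeSet<>(n, (o1, o2) -> {
            if (o1 == null || o2 == null) return 0;
            int result = o1.compareTo(o2);
            return result != 0 ? (result < 0 ? -1 : 1) : 0; // lowest number first
        });
    }
}
```

Because UnInvertedField packs counts into the long values it queues, ordering those longs lowest-first is what surfaces the rarest facet terms instead of the most frequent ones.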
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1729: --- Attachment: UnInvertedField.java Hi Thomas, Thanks for catching this. I thought I'd attached that one. *sigh* Honestly, that is really slack of me - many apologies. The attached UnInvertedField.java has the updated getCounts() method. Any troubles, let me know. Thanks! Peter Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java, UnInvertedField.java This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.). This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. The new parameter is called 'facet.date.now', and takes as a parameter a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. 
the returned value from a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set value, but as that is where the use-case is for this type of parameter, this should be ok. NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share. Source files affected: FacetParams.java (holds the new constant FACET_DATE_NOW) SimpleFacets.java getFacetDateCounts() NOW parameter modified This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some rfc acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
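Both ends of the facet.date.now exchange can be sketched as follows. The parameter name and epoch-millis format are from the patch description; the surrounding class and method names are illustrative assumptions, not Solr API.

```java
import java.util.Date;

// Sketch of the proposed facet.date.now handshake: the client sends epoch
// milliseconds, and the server parses them back into the Date it uses as
// 'NOW' for date-facet math. Class/method names are illustrative.
public class FacetDateNow {
    /** Client side: the stringified epoch-millis value to send as facet.date.now. */
    public static String paramValue(long nowMillis) {
        return Long.toString(nowMillis);
    }

    /** Server side: parse the parameter, falling back to local time when absent. */
    public static Date resolveNow(String facetDateNow) {
        if (facetDateNow == null) {
            return new Date(); // default behaviour: the local server's current time
        }
        return new Date(Long.parseLong(facetDateNow));
    }
}
```

Since the value is milliseconds from the epoch rather than a formatted date, no timezone or date parsing is involved, which is exactly the rationale the description gives.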
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834222#action_12834222 ] Peter Sturge commented on SOLR-1709: Hi Thomas, Hmmm...TermsHelper is an inner class inside TermsComponent. In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent *doesn't* have this inner class. Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?). TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above with your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see. This does highlight the very good policy of posting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so I'm not able to make compatible diff files - sorry. If you do create a couple of diff files, it would be very kind of you if you could post them up on this issue for others. Thanks!

Distributed Date Faceting - Key: SOLR-1709 URL: https://issues.apache.org/jira/browse/SOLR-1709 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e.
merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this: * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data) This could be dealt with if timezone and skew information was added, and the dates were normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized. The patch affects 2 files in the Solr core: org.apache.solr.handler.component.FacetComponent.java org.apache.solr.handler.component.ResponseBuilder.java The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
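The first-shard-wins merge policy described above can be sketched with plain maps. This is illustrative only: the real patch operates on SimpleOrderedMap inside FacetComponent, and the class name here is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the SOLR-1709 merge policy: the first shard's facet_dates define
// the buckets, and later shards only add into buckets that already exist, so
// skewed 'earlier'/'later' buckets from other shards are ignored.
// Simplified stand-in; the patch works on SimpleOrderedMap in FacetComponent.
public class DateFacetMerger {
    private final Map<String, Integer> merged = new LinkedHashMap<>();

    /** Merge one shard's date-facet counts into the running totals. */
    public void merge(Map<String, Integer> shardFacetDates) {
        if (merged.isEmpty()) {
            merged.putAll(shardFacetDates); // first shard defines the buckets
            return;
        }
        for (Map.Entry<String, Integer> e : shardFacetDates.entrySet()) {
            // only merge buckets the first shard also produced
            merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
        }
    }

    public Map<String, Integer> result() {
        return merged;
    }
}
```

Checking each shard's list against one basis map is a single pass per shard, which is the performance argument given above; the cost is that out-of-range buckets from skewed shards are silently dropped rather than widening the requested time range.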
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829043#action_12829043 ] Peter Sturge commented on SOLR-1729: Hi Chris, Thanks for your comments - I hope I didn't sound like your comments were taken wrongly - I absolutely count on comments from you and other experts to make sure I'm not missing some important functionality and/or side effect. You know the code base far better than I, so it's great that you take the time to point out all the different bits and pieces that need addressing. I can certainly understand the need to address the 'core-global' issues raised by you and Yonik for storing a ThreadLocal 'query-global' 'NOW'. I suppose the main issue in implementing the thread-local route is that we'd have to make sure we found every place in the query core that references 'now', and point those references to the new variable? If the 'code-at-large' [hopefully] always calls the date math routines for finding 'NOW', great, it should be relatively straightforward. If there are any stray e.g. System.currentTimeMillis() calls, then it's a bit more fiddly, but still do-able. ??it's all handled internally by DateField?? Sounds like DateField would be the best candidate for holding the ThreadLocal? The query handler code can set the variable of its DateField instance if it's set in a query parameter, otherwise it just defaults to its own local (UTC) time. Could be done similarly to DateField.ThreadLocalDateFormat, perhaps?
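The ThreadLocal 'query-global NOW' discussed here might look roughly like this, modelled on the DateField.ThreadLocalDateFormat pattern mentioned in the comment. The class and method names are hypothetical, not Solr's actual API.

```java
import java.util.Date;

// Hypothetical sketch of a query-global 'NOW' held in a ThreadLocal, so that
// date math and filter queries within one request all see the same instant.
// Names are illustrative; this is not Solr's DateField API.
public class QueryNow {
    private static final ThreadLocal<Date> NOW = new ThreadLocal<>();

    /** Set by the request handler, e.g. from a facet.date.now parameter. */
    public static void set(Date now) {
        NOW.set(now);
    }

    /** Date math code calls this instead of new Date()/currentTimeMillis(). */
    public static Date get() {
        Date d = NOW.get();
        return d != null ? d : new Date(); // default: the server's own clock
    }

    /** Clear at end of request so pooled threads don't leak a stale NOW. */
    public static void clear() {
        NOW.remove();
    }
}
```

The fiddly part the comment anticipates is exactly the migration: every stray System.currentTimeMillis() or new Date() on the query path would have to be routed through the single accessor for the guarantee to hold.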
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12805995#action_12805995 ] Peter Sturge commented on SOLR-1729: ??...they might not all get queried at the exact same time?? I suppose this is what the explicit 'NOW' is meant to resolve - staggered/lagged receipt/response and, in an ersatz fashion, discrepancies in local time sync. Since the passed-in 'NOW' is relative only to the epoch, network latency is handled, and time sync on any given server is assumed to be correct. ??...multiple requests might be made to a single server for different phases of the distributed request that expect to get the same answers.?? As long as the same code path is followed for such requests, it should honour the same (passed-in) 'NOW'. Are there scenarios where this is not the case? In which case, yes, these would need to be addressed. ??...unless filter queries that use date math also respect it the counts returned from date faceting will still potentially be non-sensical.?? Definitely filter queries will need to get/use/honour the same 'NOW' as their corresponding query, otherwise anarchy will quickly ensue. Can you point me toward the class(es) where filter queries' date math lives, and I'll have a look? As filter queries are cached separately, can you think of any potential caching issues relating to filter queries? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803858#action_12803858 ] Peter Sturge commented on SOLR-1672: Jan, you are absolutely correct that the parameter should (and will) be 'desc'. I have an update in my queue of things todo which changes this, but also removes the new 'facet.sortorder' parameter, and includes instead 'facet.sort desc' as a valid parameter for facet.sort. This keeps things nice and tidy and consistent. The 'facet.sortorder' parameter was really as POC to try out the behaviour before changing the core parameter syntax of the existing 'facet.sort' parameter. Not that's done, the parameter will be rolled into 'facet.sort'. Thanks, Peter RFE: facet reverse sort count - Key: SOLR-1672 URL: https://issues.apache.org/jira/browse/SOLR-1672 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Java, Solrj, http Reporter: Peter Sturge Priority: Minor Attachments: SOLR-1672.patch Original Estimate: 0h Remaining Estimate: 0h As suggested by Chris Hosstetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc for per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour. This change affects 2 source files: UnInvertedField.java [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list. 
DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create an overridden BoundedTreeSetLong(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
+ final BoundedTreeSetLong queue = (sort.equals("count") || sort.equals("true"))
+     ? (facetSortOrder.equals("dsc")
+         ? new BoundedTreeSetLong(maxsize, new Comparator() {
+               @Override
+               public int compare(Object o1, Object o2) {
+                 if (o1 == null || o2 == null) return 0;
+                 int result = ((Long) o1).compareTo((Long) o2);
+                 return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
+               }
+             })
+         : new BoundedTreeSetLong(maxsize))
+     : null;

SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present, with 'asc' used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes:
I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement.
I could be wrong about this, and zero counts may appear under some other, as yet untested, circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts ('what have we run out of'). I was envisioning the facet.mincount parameter being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
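The inverted comparator in the diff above can be exercised in isolation. The sketch below (a hypothetical standalone class, not part of the patch) reproduces the compare logic and shows that it reverses Long's natural ordering:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical standalone demo of the patch's 'dsc' comparator:
// it inverts Long's natural ordering, so larger counts compare as smaller.
public class DscComparatorDemo {
    public static final Comparator<Long> INVERTED = (o1, o2) -> {
        if (o1 == null || o2 == null) return 0;
        int result = o1.compareTo(o2);
        return result != 0 ? (result > 0 ? -1 : 1) : 0;
    };

    public static void main(String[] args) {
        TreeSet<Long> queue = new TreeSet<>(INVERTED);
        queue.add(5L);
        queue.add(1L);
        queue.add(3L);
        // Iteration now runs from the largest count to the smallest
        System.out.println(queue); // prints "[5, 3, 1]"
    }
}
```

Feeding this comparator to a bounded set changes which end of the ordering gets trimmed, which is what flips the retained facet counts.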
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803860#action_12803860 ]

Peter Sturge commented on SOLR-1729:

I agree there are wider issues that relate to this -- this particular patch addresses the time sync issue needed to allow distributed date facets to happen. In this case, you must have multiple cores using the same NOW, so that your date facets are consistent. In fact, it doesn't really matter which NOW you use, as long as they're all the same -- the caller setting the NOW value makes the most sense. For other time-related queries this might not be the case, but as you rightly pointed out, these are not addressed here.

Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetParams.java, SimpleFacets.java

This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range). Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.).
This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. The new parameter is called 'facet.date.now', and takes as its value a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. the return value of a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes the value generally a programmatically-set one, but as that is where the use-case is for this type of parameter, this should be ok.

NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share.

Source files affected:
FacetParams.java (holds the new constant FACET_DATE_NOW)
SimpleFacets.java (getFacetDateCounts() NOW parameter modified)

This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some RFC acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any. There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route.
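The resolution rule the description implies (use facet.date.now when supplied, else fall back to the local clock) amounts to only a few lines. The helper below is a hypothetical JDK-only sketch of that behaviour, not the actual patch code:

```java
import java.util.Date;
import java.util.Map;

// Hypothetical sketch of resolving 'NOW' for date faceting:
// use the caller-supplied 'facet.date.now' epoch millis when present,
// otherwise fall back to the local server clock.
public class FacetNowResolver {
    public static Date resolveNow(Map<String, String> params) {
        String override = params.get("facet.date.now");
        return override != null
                ? new Date(Long.parseLong(override))     // the caller's idea of NOW
                : new Date(System.currentTimeMillis());  // default: local clock
    }

    public static void main(String[] args) {
        // A caller pins NOW so every shard facets against the same instant
        System.out.println(resolveNow(Map.of("facet.date.now", "1264982400000")).getTime());
    }
}
```

Because the value is an epoch long rather than a formatted date, no date parsing or time-zone handling is needed on the receiving side.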
[jira] Created: (SOLR-1729) Date Facet now override time parameter
Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1729:
---
Attachment: FacetParams.java
            SimpleFacets.java

These are the source files affected for this patch. Apologies for not creating a PATCH file - my TortoiseSVN is not working for creating patch files. If anyone would like to create a patch from these, that would be extraordinarily kind of you!

Diff: (trunk: 1.4 Release)

FacetParams.java: Add at line 179:

/**
 * String that tells the date facet counter what time to use as 'now'.
 *
 * The value of this parameter, if it exists, must be a stringified long
 * of the number of milliseconds since the epoch (milliseconds since 1 Jan 1970 00:00).
 * System.currentTimeMillis() provides this.
 *
 * The DateField and DateMathParser work out their times relative to 'now'.
 * By default, 'now' is the local machine's System.currentTimeMillis().
 * This parameter overrides the local value to use a different time.
 * This is very useful for remote server queries where the times on the querying
 * machine are skewed/different from those of the date faceting machine.
 * This is a facet.date global query parameter (i.e. not per field).
 * @see DateMathParser
 * @see DateField
 */
public static final String FACET_DATE_NOW = "facet.date.now";

SimpleFacets.java: Change at line 551:

- final Date NOW = new Date();
+ final Date NOW = new Date(params.get(FacetParams.FACET_DATE_NOW) != null ?
+     Long.parseLong(params.get(FacetParams.FACET_DATE_NOW)) : System.currentTimeMillis());

Date Facet now override time parameter
---
Key: SOLR-1729
URL: https://issues.apache.org/jira/browse/SOLR-1729
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetParams.java, SimpleFacets.java
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
---
Attachment: FacetComponent.java

Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729). For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch).

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java

This patch is for adding support for date facets when using distributed searches. Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time). The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in. This means that if subsequent shards' facet_dates are skewed in relation to the first by 1 'gap', these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
* If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)
This could be dealt with if timezone and skew information was added, and the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage. One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome.

As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
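The merge rule described above (the first shard's facet_dates as the basis, skewed entries dropped) can be modelled in a few lines. The helper below is a hypothetical simplification over plain maps, not the actual FacetComponent code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simplification of the described merge rule: the first shard's
// facet_dates map is the basis, and later shards only contribute counts for
// date keys the first shard already reported; entries skewed outside the
// first shard's keys are dropped rather than merged.
public class FacetDatesMerge {
    public static Map<String, Integer> merge(Map<String, Integer> first,
                                             Map<String, Integer> next) {
        Map<String, Integer> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, Integer> e : next.entrySet()) {
            // sum counts only where the basis map already has the date key
            merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
        }
        return merged;
    }
}
```

Checking each later shard only against the basis map keeps the merge linear per shard, and guarantees the merged range never grows beyond what the caller requested.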
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411 ]

Peter Sturge commented on SOLR-1709:

Yonik, Yes, I can see what you mean that of course NOW will affect anything date-related in a given query. I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting. The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957 ]

Peter Sturge commented on SOLR-1709:

I've heard of Tortoise, I'll give that a try, thanks. On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than a 'pull' - i.e. requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time. Thanks, Peter

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
---
Attachment: ResponseBuilder.java
            FacetComponent.java

Sorry, guys, can't get svn to create a patch file correctly on Windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than have anyone wait for me to get the patch file created, I thought it best to get the source uploaded so people can start using it. Thanks, Peter

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
[jira] Commented: (SOLR-1709) Distributed Date Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233 ]

Peter Sturge commented on SOLR-1709:

Definitely true! -- messing about with Date strings isn't great for performance. As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTime() call if the time is stored locally as a Date).

Distributed Date Faceting
---
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor
Attachments: FacetComponent.java, ResponseBuilder.java
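To illustrate the point about epoch longs in the comment above: when NOW is exchanged as a long, normalising a shard-local instant onto the caller's clock is pure integer arithmetic. A hypothetical sketch (class and method names are illustrative, not from any patch):

```java
// Hypothetical sketch of the adjustment math the comment argues for: with
// NOW passed as an epoch long, putting a shard-local time onto the caller's
// clock is a single subtraction and addition - no date parsing or formatting.
public class NowSkewAdjust {
    public static long normalize(long shardTime, long shardNow, long callerNow) {
        long skew = callerNow - shardNow; // how far ahead/behind the shard clock is
        return shardTime + skew;          // the shard instant on the caller's clock
    }
}
```

For example, a shard whose clock is 300 ms behind the caller's reports an event at 1000; normalized to the caller's clock that instant is 1300.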
[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge resolved SOLR-1672.
Resolution: Fixed

Marking as resolved.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 0h
Remaining Estimate: 0h

As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class. This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum. Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.

This change affects 2 source files:

UnInvertedField.java
[line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter to the end of the argument list.

DIFF UnInvertedField.java:
- public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
+ public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {

[line 556] The getCounts() method is modified to create a BoundedTreeSetLong(int, Comparator) with an overriding comparator if the 'facetSortOrder' parameter equals 'dsc'.

DIFF UnInvertedField.java:
- final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
+ final BoundedTreeSetLong queue = (sort.equals("count") || sort.equals("true")) ? (facetSortOrder.equals("dsc") ?
+     new BoundedTreeSetLong(maxsize, new Comparator() {
+         @Override
+         public int compare(Object o1, Object o2) {
+             if (o1 == null || o2 == null) return 0;
+             int result = ((Long) o1).compareTo((Long) o2);
+             return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
+         }
+     }) : new BoundedTreeSetLong(maxsize)) : null;

SimpleFacets.java
[line 221] A getFieldParam(field, "facet.sortorder", "asc") call is added to retrieve the new parameter, if present. 'asc' is used as the default value.

DIFF SimpleFacets.java:
+ String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");

[line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' string.

DIFF SimpleFacets.java:
- counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
+ counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);

Implementation Notes:
In testing, I was not able to retrieve any '0' counts as I had expected. I believe this is because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. I could be wrong about this, and zero counts may appear under some other, as yet untested, circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts ('what have we run out of?'). I had envisioned the facet.mincount field as the preferred place to set where the 'lowest value' begins (e.g. 0, 1, or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected.
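The effect of the reversing comparator in the diff above can be sketched in isolation. The following is a minimal, self-contained illustration, using a plain java.util.TreeSet as a stand-in for Solr's BoundedTreeSetLong (whose internals are not shown in this issue); the comparator body matches the one in the patch:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ReverseSortDemo {
    // Sketch of the comparator added by the patch (raw types kept as in the
    // diff above). It inverts the natural Long ordering, so the packed
    // count/term longs used by UnInvertedField sort the other way round.
    static final Comparator REVERSED = new Comparator() {
        public int compare(Object o1, Object o2) {
            if (o1 == null || o2 == null) return 0;
            int result = ((Long) o1).compareTo((Long) o2);
            return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
        }
    };

    public static void main(String[] args) {
        // Stand-in for BoundedTreeSetLong(maxsize, REVERSED)
        TreeSet<Long> queue = new TreeSet<>(REVERSED);
        queue.add(1L);
        queue.add(5L);
        queue.add(3L);
        System.out.println(queue); // iteration order is reversed: [5, 3, 1]
    }
}
```

With the natural ordering the set would iterate [1, 3, 5]; the reversed comparator flips which end of the bounded set is kept and emitted first, which is what makes the 'dsc' facet sort order possible.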
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1709) Distributed Date Faceting
Distributed Date Faceting
-------------------------
Key: SOLR-1709
URL: https://issues.apache.org/jira/browse/SOLR-1709
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

This patch adds support for date facets when using distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of: any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to exactly the same time).

The implementation uses the first encountered shard's facet_dates as the basis into which subsequent shards' data are merged. This means that if a subsequent shard's facet_dates are skewed by one 'gap' or more relative to the first, these 'earlier' or 'later' facets will not be merged in. There are several reasons for this:
* Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
* If 'earlier' and/or 'later' facet_dates were added in, the time range would become larger than the one requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).

This could be dealt with if timezone and skew information were added and the dates normalized. One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.

The patch affects 2 files in the Solr core:
org.apache.solr.handler.component.FacetComponent.java
org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just used to hold the completed SimpleOrderedMap until the finishStage.

One possible enhancement is to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired. Comments/suggestions welcome. As a favour to ask: if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).
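The merge rule described above (the first shard's facet_dates buckets form the basis; skewed 'earlier'/'later' buckets from later shards are dropped) can be sketched as follows. This is a simplified, hypothetical model using a plain map of date-bucket strings to counts, not the actual FacetComponent code; the class and method names are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetMergeSketch {
    // Merges one shard's facet_dates into the accumulated basis map. Only
    // buckets already present in the basis (taken from the first shard
    // encountered) are summed; buckets outside the basis range - i.e. from
    // shards skewed by a 'gap' or more - are ignored, matching the
    // behaviour described in the issue.
    static void mergeInto(Map<String, Integer> basis, Map<String, Integer> shard) {
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            Integer existing = basis.get(e.getKey());
            if (existing != null) {
                basis.put(e.getKey(), existing + e.getValue());
            }
            // 'earlier'/'later' buckets are dropped, not added, so the
            // merged range never grows beyond the one requested
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> basis = new LinkedHashMap<>();
        basis.put("2010-01-14T10:00:00Z", 4);
        basis.put("2010-01-14T11:00:00Z", 7);

        Map<String, Integer> skewedShard = new LinkedHashMap<>();
        skewedShard.put("2010-01-14T09:00:00Z", 2); // one 'gap' earlier: dropped
        skewedShard.put("2010-01-14T10:00:00Z", 3); // matches a basis bucket: summed

        mergeInto(basis, skewedShard);
        System.out.println(basis); // {2010-01-14T10:00:00Z=7, 2010-01-14T11:00:00Z=7}
    }
}
```

The sketch also shows why the proposed 'timezone'/'now' parameters would help: with skew information, the dropped 09:00 bucket could first be normalized onto the basis range instead of being discarded.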
[jira] Created: (SOLR-1672) RFE: facet reverse sort count
RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
[jira] Updated: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1672:
---
Attachment: SOLR-1672.patch

Patch diff file for adding facet reverse sorting.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 24h
Remaining Estimate: 24h
[jira] Commented: (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792424#action_12792424 ]

Peter Sturge commented on SOLR-1672:
Patch SOLR-1672.patch now included for review.

RFE: facet reverse sort count
-----------------------------
Key: SOLR-1672
URL: https://issues.apache.org/jira/browse/SOLR-1672
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 1.4
Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
Attachments: SOLR-1672.patch
Original Estimate: 24h
Remaining Estimate: 24h