Solr nightly build failure
set-fsdir:

init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/solr/build
    [mkdir] Created dir: /tmp/apache-solr-nightly/solr/build/web

compile-lucene:
     [echo] Building analyzers...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/classes/java
    [javac] Compiling 409 source files to /tmp/apache-solr-nightly/lucene/build/classes/java
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile-test:
    [mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/classes/test
    [javac] Compiling 224 source files to /tmp/apache-solr-nightly/lucene/build/classes/test
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
     [copy] Copying 19 files to /tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java
    [javac] Compiling 127 source files to /tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
     [copy] Copying 19 files to /tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java

compile:
     [echo] Building highlighter...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

build-memory:
     [echo] Highlighter building dependency contrib/memory
     [echo] Building memory...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:

compile-test:
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/contrib/memory/classes/java
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/lucene/build/contrib/memory/classes/java
    [javac] Note: /tmp/apache-solr-nightly/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar-core:
      [jar] Building jar: /tmp/apache-solr-nightly/lucene/build/contrib/memory/lucene-memory-3.1-dev.jar

default:

build-queries:
     [echo] Highlighter building dependency contrib/queries
     [echo] Building queries...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:

compile-test:
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
     [echo]
     [echo] Clover not found. Code coverage reports disabled.
     [echo]

clover:

compile-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/contrib/queries/classes/java
    [javac] Compiling 17 source files to /tmp/apache-solr-nightly/lucene/build/contrib/queries/classes/java
    [javac] Note: /tmp/apache-solr-nightly/lucene/contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
HTTP Authentication for shards
Hi,

I've come across an interesting problem with regard to distributed searching, and thought I'd share it here to see if anyone else has come across it and/or can comment on the proposed solution:

*Requirement:*
A requirement of my particular Solr environment is that queries are subject to HTTP authentication (I currently use Jetty basic realm auth, but any HTTP auth is affected), i.e. if you don't have a username/password, you can't look at anything. For most use cases, I'm guessing that queries aren't generally subject to authentication, hence this post...

*Problem:*
Querying a single server is easy, because my client app creates/manages its own HttpClient object. When it comes to querying across shards, the default SearchHandler uses a 'plain-vanilla' HTTP client for the CommonsHttpSolrServer instance that makes the request to each shard (in HttpCommComponent.submit()). There is no provision to pass it any credentials. Perhaps document-level security might be a better way to handle access control for searching in general, but that's a different can of worms... :-)

*Proposed Solution:*
A proposed solution for searching across HTTP-authenticated shards is this:

1. Define parameter(s) syntax for shard credentials.

2. Modify (or subclass) SearchHandler, in particular the HttpCommComponent.submit() method, to optionally look for shard-specific credentials in its ModifiableSolrParams. If it finds credentials, it creates/reuses an HttpClient object with them and passes this to the SolrServer instance for the search request. Because the credentials parameter would be entirely optional, it should be fine to patch SearchHandler 'in-line' without subclassing, so that patches/updates will work without having to modify solrconfig.xml (feel free to disagree with me on this!).

3. This also requires a modification to SearchHandler.handleRequestBody() to extract the credentials parameter(s) and pass them on to the submit() request (similar to what it does now for SHARDS_QT).

4. Clients would populate their sharded query request with the defined parameter(s) for each shard (I'm using SolrJ, so there's app logic to do this, but it should be OK for other client types).

I admit I'm not an expert on SearchHandler's inner workings, so if there are other code paths that would be affected by this, or any other potential issues, any advice/insight is greatly appreciated! If anyone thinks this is a barmy idea, or has come up with a better solution, please say!

Many thanks,
Peter
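The client-side step of the proposal (populating the sharded request with per-shard credentials) can be sketched in plain Java. This is a hypothetical helper, not SolrJ API; it assembles a parameter value in the comma-delimited, colon-separated shard:port:username:password format later proposed in SOLR-1861:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical client-side helper: collects basic-auth credentials per
 * shard and renders them as a single request-parameter value in the
 * shard:port:user:password,... format from the SOLR-1861 proposal.
 */
public class ShardCredentialsBuilder {
    // Insertion-ordered so the rendered value is deterministic.
    private final Map<String, String> creds = new LinkedHashMap<>();

    /** Register basic-auth credentials for one shard (host + port). */
    public ShardCredentialsBuilder add(String host, int port, String user, String password) {
        creds.put(host + ":" + port, user + ":" + password);
        return this;
    }

    /** Render the comma-delimited parameter value. */
    public String build() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : creds.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }
}
```

A client would attach the built value to its query request under whatever parameter name the patched SearchHandler agrees to look for.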
[jira] Commented: (SOLR-1819) Upgrade to Tika 0.7
[ https://issues.apache.org/jira/browse/SOLR-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852801#action_12852801 ] Grant Ingersoll commented on SOLR-1819: --- Looks like Tika has an RC out. I'll give it a try.

Upgrade to Tika 0.7 --- Key: SOLR-1819 URL: https://issues.apache.org/jira/browse/SOLR-1819 Project: Solr Issue Type: Improvement Reporter: Tricia Williams Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5

See title.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-1568: -- Attachment: SOLR-1568.patch

Got some tests and most of the rework from Yonik's comments. Some of the tests explicitly fail due to bugs in the underlying tile stuff in Lucene. Added support for handling the poles and the prime and 180th meridians to the LatLonType. I think we're in pretty good shape now, assuming the underlying Lucene bits get fixed soon.

Implement Spatial Filter Key: SOLR-1568 URL: https://issues.apache.org/jira/browse/SOLR-1568 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: CartesianTierQParserPlugin.java, SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch

Given an index with spatial information (either as a geohash, SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be able to pass in a filter query that takes in the field name, lat, lon and distance and produces an appropriate Filter (i.e. one that is aware of the underlying field type) for use by Solr. The interface _could_ look like: {code} fq={!sfilt dist=20}location:49.32,-79.0 {code} or it could be: {code} fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20} {code} or: {code} fq={!sfilt p=49.32,-79.0 f=location dist=20} {code} or: {code} fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20} {code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
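Whatever interface wins, a filter like sfilt ultimately needs a great-circle distance between the query point and each document's lat/lon. A standard haversine computation (illustrative Java only, not code from any attached patch; miles, since a SOLR-773 comment in this digest notes sfilt works internally in miles) looks like:

```java
/**
 * Great-circle distance via the haversine formula, in miles.
 * Illustrative sketch only; class and constant names are hypothetical.
 */
public class Haversine {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    public static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // haversine of the central angle between the two points
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

A field-type-aware Filter would run this (or a cheaper bounding-box pre-check) against each candidate document's stored point.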
[jira] Resolved: (SOLR-1586) Create Spatial Point FieldTypes
[ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-1586. --- Resolution: Fixed

Create Spatial Point FieldTypes --- Key: SOLR-1586 URL: https://issues.apache.org/jira/browse/SOLR-1586 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: examplegeopointdoc.patch.txt, SOLR-1586-geohash.patch, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112209.geopointonly.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt, SOLR-1586.Mattmann.120709.geohashonly.patch.txt, SOLR-1586.Mattmann.121209.geohash.outarr.patch.txt, SOLR-1586.Mattmann.121209.geohash.outstr.patch.txt, SOLR-1586.Mattmann.122609.patch.txt, SOLR-1586.patch, SOLR-1586.patch

Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields. Fields should take in lat/lon points in a single form, as in: <field name="foo">lat lon</field>

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852808#action_12852808 ] Grant Ingersoll commented on SOLR-773: -- Dan, sfilt can take a units measurement, but internally it uses miles.

Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz

Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852809#action_12852809 ] Grant Ingersoll commented on SOLR-773: --

Status update: SOLR-1568, which is the last big piece, I think, is almost done. I added a new LatLonType which should make it super easy to do pure LatLon stuff (Point is more for a Rectangular Coordinate System. I guess maybe we should rename it?) and it should be easy to extend to use different distance methods. I will try to document some more on the wiki. There are some minor bugs related to sorting by function right now, but it should be usable for people just doing spatial stuff (SOLR-1297). Probably the next most important piece to get in place is SOLR-1298 and its related item SOLR-705. Help on those pieces would be most appreciated. As always, people kicking the tires on the trunk is appreciated too.

Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz

Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1622) Add aggregate Math capabilities to Solr above and beyond the StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852810#action_12852810 ] Grant Ingersoll commented on SOLR-1622: ---

Some notes from IRC:
- cool - if you go into the stats stuff, dump the silly string based numerics (that we had to do in the past) and also make it per-segment
- we need one capability in the lucene FieldCache, and we could dump the legacy SortedInt stuff for good
- that's simply the ability to tell if a document had a value or not

Add aggregate Math capabilities to Solr above and beyond the StatsComponent --- Key: SOLR-1622 URL: https://issues.apache.org/jira/browse/SOLR-1622 Project: Solr Issue Type: New Feature Components: search Reporter: Grant Ingersoll Priority: Minor

It would be really cool if we could have a QueryComponent that enabled doing aggregating calculations on search results similar to what the StatsComponent does, but in a more generic way. I also think it makes sense to reuse some of the function query capabilities (like the parser, etc.). I imagine the interface might look like: {code} math=true&func=recip(sum(A)) {code} This would calculate the reciprocal of the sum of the values in the field A. Then, you could go across fields, too: {code} math=true&func=recip(sum(A, B, C)) {code} Which would sum the values across fields A, B and C. It is important to make the functions pluggable and reusable. Might be also nice to see if we can share the core calculations between function queries and this capability such that if someone adds a new aggregating function, it can also be used as a new Function query. Of course, we'd want plugin functions, too, so that people can plug in their own functions. After this is implemented, I think StatsComponent becomes a derivative of the new MathComponent.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
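The composability the issue asks for (recip(sum(A, B, C)), with pluggable functions) can be sketched with a tiny function interface. All names below are illustrative, not Solr API; field values stand in as plain arrays:

```java
import java.util.Arrays;

/**
 * Toy sketch of composable aggregate functions in the spirit of
 * func=recip(sum(A)). Hypothetical names; not Solr's function-query API.
 */
public class AggregateSketch {

    /** An aggregate that can be evaluated to a single number. */
    interface AggFunc {
        double eval();
    }

    /** sum(A, B, C): total of all values across the given "fields". */
    static AggFunc sum(double[]... fields) {
        return () -> {
            double total = 0;
            for (double[] field : fields) {
                total += Arrays.stream(field).sum();
            }
            return total;
        };
    }

    /** recip(f): reciprocal of another aggregate's result. */
    static AggFunc recip(AggFunc inner) {
        return () -> 1.0 / inner.eval();
    }
}
```

Because each function is just a value-producing object, a parser could map a func= expression onto a tree of these, and user-supplied plugins would simply register new AggFunc factories.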
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852811#action_12852811 ] Robert Muir commented on SOLR-1852: --- Committed the test to trunk: revision 930262.

enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Robert Muir Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch

Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails.

test string: Identi.ca
queries that fail: IdentiCa, Identi.ca, Identi-ca
query that matches: Identi ca

schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:
analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match.
According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852813#action_12852813 ] Uri Boness commented on SOLR-773: -

Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done in a somewhat collaborative manner so everybody will be on the same page, also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate meta element). Is there some way to merge the issues? Or perhaps mark one of them as duplicate, so the discussion will be centralized.

Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz

Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852813#action_12852813 ] Uri Boness edited comment on SOLR-773 at 4/2/10 1:23 PM: -

Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done in a somewhat collaborative manner so everybody will be on the same page, also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate meta element). Is there some way to merge the issues? Or perhaps mark one of them as duplicate, so the discussion will be centralized. btw, the other duplicate issue is SOLR-1566

was (Author: uboness): Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done with somewhat collaborated manner so everybody will be on the same page here also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate meta element). Is there some way to merge the issues? or perhaps mark one of them as duplicate, so the discussion will be centralized.

Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz

Local Lucene has been donated to the Lucene project.
It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries
HTTP Authentication for sharded queries --- Key: SOLR-1861 URL: https://issues.apache.org/jira/browse/SOLR-1861 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards.

Here's how the patch works: A final constant String called {{shardcredentials}} acts as the SolrParams parameter key name. The format for the value associated with this key is a comma-delimited list of colon-separated tokens: {{ shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN }}

A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures b/w compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials, and then sends the request using this.

Note: Because the match comparison is a string compare (a.o.t. dns compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter.

Impl Notes: This patch is used and tested on the 1.4 release codebase.
There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so should be fine on other trunks, but I've only tested with the 1.4 release code base. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
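The parsing step described above (split the {{shardcredentials}} value and look up credentials per shard) can be sketched as a small standalone helper. Class and method names are hypothetical, not the attached patch:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of SOLR-1861's server-side parsing: turn the
 * comma-delimited shard:port:user:password list into a per-shard
 * credentials lookup keyed by "host:port" (a plain string compare,
 * matching the note above about string vs. dns comparison).
 */
public class ShardCredentialsParser {

    /** Maps "host:port" to a {username, password} pair. */
    public static Map<String, String[]> parse(String param) {
        Map<String, String[]> result = new HashMap<>();
        if (param == null || param.isEmpty()) {
            return result; // no credentials: caller falls back to the default http client
        }
        for (String entry : param.split(",")) {
            String[] tok = entry.split(":");
            if (tok.length != 4) {
                continue; // malformed entry: skip, preserving default behaviour
            }
            result.put(tok[0] + ":" + tok[1], new String[] { tok[2], tok[3] });
        }
        return result;
    }
}
```

A submit()-style method would consult the returned map with the shard's host:port string and only build an authenticated HttpClient when a match exists.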
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Sturge updated SOLR-1861: --- Attachment: SearchHandler.java Apologies that this is the source file and not a diff'ed patch file. I've tried so many Win doze svn products, but I just can't get them to create a patch file (I'm sure this is more down to me not configuring them correctly, rather than rapidsvn, visualsvn, Tortoisesvn etc.). If someone would like to create a patch file from this source, that would be extraordinarily kind of you! In any case, the changes to this file are quite straightforward. HTTP Authentication for sharded queries --- Key: SOLR-1861 URL: https://issues.apache.org/jira/browse/SOLR-1861 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: SearchHandler.java This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works for querying single servers, but it's not possible for distributed searches across multiple shards to receive authenticated http requests. This patch adds the option for Solr clients to pass shard-specific http credentials to SearchHandler, which can then use these credentials when making http requests to shards. Here's how the patch works: A final constant String called {{shardcredentials}} acts as the name of the SolrParams parameter key name. The format for the value associated with this key is a comma-delimited list of colon-separated tokens: {{ shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN }} A client adds these parameters to their sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default http client (i.e. no credentials). This ensures b/w compatibility. 
When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials, and then sends the request using this. Note: Because the match comparison is a string compare (a.o.t. dns compare), the host/ip names used in the shardcredentials parameters must match those used in the shards parameter. Impl Notes: This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so should be fine on other trunks, but I've only tested with the 1.4 release code base. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1740) ShingleFilterFactory improvements
[ https://issues.apache.org/jira/browse/SOLR-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852891#action_12852891 ] Steven Rowe commented on SOLR-1740: --- Thank you, Robert.

ShingleFilterFactory improvements - Key: SOLR-1740 URL: https://issues.apache.org/jira/browse/SOLR-1740 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.1 Reporter: Steven Rowe Assignee: Robert Muir Priority: Minor Fix For: 3.1 Attachments: SOLR-1740.patch, SOLR-1740.patch

ShingleFilterFactory should allow specification of minimum shingle size (in addition to maximum shingle size), as well as the separator to use between tokens. These are implemented at LUCENE-2218. The attached patch allows ShingleFilterFactory to accept configuration of these items, and includes tests against the new functionality in TestShingleFilterFactory. Solr will have to upgrade to lucene-analyzers-3.1-dev.jar before the attached patch will apply.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1862) CLONE -java.io.IOException: read past EOF
CLONE -java.io.IOException: read past EOF - Key: SOLR-1862 URL: https://issues.apache.org/jira/browse/SOLR-1862 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Alexander S Assignee: Yonik Seeley Priority: Critical Fix For: 1.5 A query with relevancy scores of all zeros produces an invalid doclist that includes sentinel values 2147483647 and causes Solr to request that invalid docid from Lucene which results in a java.io.IOException: read past EOF http://search.lucidimagination.com/search/document/2d5359c0e0d103be/java_io_ioexception_read_past_eof_after_solr_1_4_0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1860) improve stopwords list handling
[ https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852978#action_12852978 ] Robert Muir commented on SOLR-1860: ---

A third idea from Hoss Man: we should make it easy to edit these lists like english. So an idea is to create an intl/ folder or similar under the example with stopwords_fr.txt, stopwords_de.txt. Additionally we could have a schema-intl.xml with example types 'text_fr', 'text_de', etc. set up for various languages. I like this idea best.

improve stopwords list handling --- Key: SOLR-1860 URL: https://issues.apache.org/jira/browse/SOLR-1860 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.1 Reporter: Robert Muir Assignee: Robert Muir Priority: Minor

Currently Solr makes it easy to use english stopwords for StopFilter or CommonGramsFilter. Recently in lucene, we added stopwords lists (mostly, but not all, from snowball) to all the language analyzers. So it would be nice if a user can easily specify that they want to use a french stopword list, and use it for StopFilter or CommonGrams. The ones from snowball, however, are formatted in a different manner than the others (although in Lucene we have parsers to deal with this). Additionally, we abstract this from Lucene users by adding a static getDefaultStopSet to all analyzers. There are two approaches; the first one I think I prefer the most, but I'm not sure it matters as long as we have good examples (maybe a foreign language example schema?)

1. The user would specify something like: <filter class="solr.StopFilterFactory" fromAnalyzer="org.apache.lucene.analysis.FrenchAnalyzer" .../> This would just grab the CharArraySet from the FrenchAnalyzer's getDefaultStopSet method; who cares where it comes from or how it's loaded.

2.
We add support for snowball-formatted stopwords lists, and the user could do something like: <filter class="solr.StopFilterFactory" words="org/apache/lucene/analysis/snowball/french_stop.txt" format="snowball" ... /> The disadvantage to this is they have to know where the list is, what format it's in, etc. For example: snowball doesn't provide Romanian or Turkish stopword lists to go along with their stemmers, so we had to add our own.

Let me know what you guys think, and I will create a patch.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
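For illustration, the format=snowball handling from option 2 can be sketched as follows. Snowball stopword files use '|' to introduce comments and may carry several words per line; this is a hypothetical standalone helper, not Lucene's actual parser (Lucene's WordlistLoader handles this format):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parser for snowball-formatted stopword lists:
 * '|' starts a comment, whitespace separates words on a line.
 * Sketch only; not Lucene's WordlistLoader.
 */
public class SnowballStopwordParser {

    public static List<String> parse(String fileContents) {
        List<String> words = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar); // strip trailing comment
            }
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

A StopFilterFactory given format="snowball" would run something like this over the resource named by words= and feed the result into its stopword set, instead of the one-word-per-line parsing used for the english list.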
[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries
[ https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1861:
-------------------------------
    Attachment: SearchHandler.java

A small update to this patch to support distributed searches with multiple cores.

HTTP Authentication for sharded queries
---------------------------------------
                 Key: SOLR-1861
                 URL: https://issues.apache.org/jira/browse/SOLR-1861
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.4
         Environment: Solr 1.4
            Reporter: Peter Sturge
            Priority: Minor
         Attachments: SearchHandler.java, SearchHandler.java

This issue came out of a requirement to have HTTP authentication for queries. Currently, HTTP authentication works when querying single servers, but distributed searches across multiple shards cannot send authenticated HTTP requests. This patch adds the option for Solr clients to pass shard-specific HTTP credentials to SearchHandler, which can then use those credentials when making HTTP requests to the shards.

Here's how the patch works:

A final constant String called {{shardcredentials}} acts as the name of the SolrParams parameter key. The value associated with this key is a comma-delimited list of colon-separated tokens:
{{shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN}}

A client adds these parameters to its sharded request. In the absence of {{shardcredentials}} and/or matching credentials, the patch reverts to the existing behaviour of using a default HTTP client (i.e. no credentials), which ensures backward compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' parameter to the HttpCommComponent via the submit() method. The HttpCommComponent parses the parameter string, and when it finds matching credentials for a given shard, it creates an HttpClient object with those credentials and sends the request using that client. Note: because the match comparison is a string compare (as opposed to a DNS compare), the host/IP names used in the shardcredentials parameter must match those used in the shards parameter.

Impl Notes:
This patch is used and tested on the 1.4 release codebase. There weren't any significant diffs between the 1.4 release and the latest trunk for SearchHandler, so it should be fine on other trunks, but I've only tested against the 1.4 release code base.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
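The credential lookup described above can be sketched as a small parser. This is a hypothetical helper, not code from the attached patch; the class and method names are illustrative, and only the parameter format ({{host:port:user:pass}} entries joined by commas) is taken from the description:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how HttpCommComponent might parse the
// shardcredentials parameter described above. Names are illustrative,
// not taken from the SOLR-1861 patch.
public class ShardCredentialsParser {

    /** Parses "h0:p0:u0:pw0,h1:p1:u1:pw1,..." into a map of "host:port" -> "user:pass". */
    public static Map<String, String> parse(String shardCredentials) {
        Map<String, String> creds = new HashMap<String, String>();
        if (shardCredentials == null || shardCredentials.length() == 0) {
            return creds; // no credentials: caller falls back to the default HttpClient
        }
        for (String entry : shardCredentials.split(",")) {
            String[] tokens = entry.split(":");
            if (tokens.length != 4) {
                continue; // malformed entry: skip it, preserving default behaviour
            }
            creds.put(tokens[0] + ":" + tokens[1], tokens[2] + ":" + tokens[3]);
        }
        return creds;
    }

    /**
     * Returns "user:pass" for the given "host:port" shard key, or null if
     * nothing matches. Note this is a plain string compare, mirroring the
     * patch's requirement that shard names match the shards parameter exactly.
     */
    public static String lookup(Map<String, String> creds, String shard) {
        return creds.get(shard);
    }
}
```

A null result from lookup() would correspond to the backward-compatible path: the request for that shard goes out on the default, unauthenticated HTTP client.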
Re: Document level security in Apache Solr
Hi Anders- see comments below...

> Two weeks ago I created a JIRA issue (https://issues.apache.org/jira/browse/SOLR-1834) involving document level security in Apache Solr and submitted a patch containing a search component that can be seen as a starting point for making Solr handle document level security. I believe that document security is an essential part of an enterprise search engine and I hope that this contribution can start a discussion about how this should be handled in Solr (possibly in conjunction with the Lucene Connector Framework).

Thanks for posting the code -- at a quick pass it looks good. I agree some coordination with Lucene Connectors will make sense.

On the patch, it looks good, but to get into the dist, it will probably need some sort of tests. I'm not sure how that would work with Windows authentication (I don't know much about it, but it has been on my long-term TODO list for a while!) Perhaps we could have tests that would run on systems that have something to test against, but not fail when running on Linux (or something).

> As this contribution shows, I would like to help develop the security capabilities of Solr together with the community because I believe that it will improve Solr's appeal to large enterprises. Moreover, I think that most of us believe that a transparent security system will in the end give rise to the best security.

agree -- the more people to poke holes, the better

> I hope some of you can take the time to look at the patch, try it out and think about:
>
> 1. Should this be a contrib module in Solr? (And if so, what needs to be done to contribute it?)

I think a contrib module makes sense. For things to move forward, a committer needs to step up to the plate. I would love to, but don't have much time soon. To make it easier for people to feel comfortable with it, tests and docs help lots.

> 2. Should document level security be a core feature in Solr? (And if so, what is the best way to integrate it into Solr?)

I'm not quite sure what you mean by 'core' -- I think it makes sense to live as a contrib for a while and see how things develop.

> 3. How can this integrate with connectors like the Lucene Connector Framework? I.e. how do you create a uniform way to talk about Access Control Lists (http://en.wikipedia.org/wiki/Access_control_list)?

good question! That would be really powerful.

> P.s. (for the nerdy) I have some ideas about putting the security deeper into Solr, perhaps by creating a secure SolrIndexReader and a secure SolrIndexSearcher that are fed user credentials from a search component. What do you think about this?

What are you thinking here? To me, it seems like the index would need to contain all data, and a SearchComponent would take user credentials and augment the query (group:[a b c] or whatever). The advantage of keeping the same IndexSearcher across all users is that it can share a cache where appropriate.

> As I understand it, currently it's possible to declare your own SolrIndexReader but not your own SolrIndexSearcher.

not sure on this...

ryan
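The query-augmentation idea mentioned above ("group:[a b c] or whatever") could be sketched as a small helper that turns a user's group memberships into a filter clause. This is hypothetical illustration, not code from the SOLR-1834 patch; the field name "group" and the class name are assumptions:

```java
import java.util.List;

// Hypothetical sketch of ACL-style query augmentation, in the spirit of
// the "group:[a b c]" suggestion above. The "group" field name and this
// class are illustrative, not taken from the SOLR-1834 patch.
public class AclFilterBuilder {

    /**
     * Builds a Lucene-syntax filter clause restricting results to documents
     * whose "group" field matches one of the user's groups,
     * e.g. group:(admins OR staff).
     */
    public static String buildGroupFilter(List<String> userGroups) {
        if (userGroups == null || userGroups.isEmpty()) {
            return "group:__none__"; // a user with no groups matches nothing
        }
        StringBuilder sb = new StringBuilder("group:(");
        for (int i = 0; i < userGroups.size(); i++) {
            if (i > 0) sb.append(" OR ");
            sb.append(userGroups.get(i));
        }
        return sb.append(')').toString();
    }
}
```

In a SearchComponent, a clause like this would presumably be appended as a filter query (fq) rather than rewritten into the main query, so that the per-group filter can be reused from Solr's filterCache across users with the same groups -- which is the caching advantage of sharing one IndexSearcher that the reply mentions.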
[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853047#action_12853047 ]

Lance Norskog commented on SOLR-1858:
--------------------------------------

Also, Distributed Search by one core among other cores in the same instance must use the HTTP transport rather than direct internal access.

Embedded Solr does not support Distributed Search
-------------------------------------------------
                 Key: SOLR-1858
                 URL: https://issues.apache.org/jira/browse/SOLR-1858
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Lance Norskog
            Priority: Minor

It is impossible to do a Distributed Search across multiple cores in an EmbeddedSolr instance. Distributed Search only works for Solr HTTP-controlled shards, and EmbeddedSolr does not export an HTTP interface.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853053#action_12853053 ]

Lance Norskog commented on SOLR-1858:
--------------------------------------

This could be handled with one simple change: add a new URL protocol, 'solr'. Details: the SolrJ library would include a new method that creates a server from a URL, and the server factory would support 'solr://' and 'solr://core' as URLs. The meaning of solr:// changes depending on whether it is used within a client app, an EmbeddedSolr instance, or within the web app. In an Embedded Solr instance, it refers to the embedded instance itself. In a servlet container instance, it refers to that instance. 'solr://' would not be supported within a client app, because there is no Solr instance in the app. In short, the 'solr://' URL refers to the Solr instance available within the current JVM, _via the current classloader_.
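The scheme-based dispatch proposed here might look roughly like the following. This is a hypothetical sketch, not part of SolrJ; the class and method names are illustrative, and only the 'solr://' / 'solr://core' URL shapes come from the proposal:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of a server factory dispatching on the proposed
// 'solr://' scheme. Names are illustrative; this is not SolrJ code.
public class SolrUrlResolver {

    /** Returns "embedded" for solr:// URLs and "http" for http(s) URLs. */
    public static String transportFor(String url) {
        URI uri = toUri(url);
        String scheme = uri.getScheme();
        if ("solr".equals(scheme)) {
            return "embedded"; // resolve against the Solr instance in this JVM/classloader
        } else if ("http".equals(scheme) || "https".equals(scheme)) {
            return "http"; // fall back to the existing HTTP shard transport
        }
        throw new IllegalArgumentException("Unsupported scheme: " + scheme);
    }

    /** Extracts the core name from 'solr://core'; in that form the core token parses as the URI host. */
    public static String coreName(String url) {
        String host = toUri(url).getHost();
        return host == null ? "" : host; // "" meaning the default core
    }

    private static URI toUri(String url) {
        try {
            return new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad URL: " + url, e);
        }
    }
}
```

A factory like this would let the shards parameter mix 'solr://core' entries (routed in-process) with ordinary http:// entries (routed over the network), which is what the use-case comment below relies on.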
[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853054#action_12853054 ]

Lance Norskog commented on SOLR-1858:
--------------------------------------

Use cases this makes possible:
* An embedded instance can do a distributed search across multiple cores which are inside the instance or are remote.
* The SolrEntityProcessor is an (uncommitted) plugin for the DataImportHandler ([SOLR-1499]). It does a search against a Solr instance and supplies the resulting document, or document series, to the DIH processing chain. With the 'solr://' option, this tool can do queries against its own Solr instance with no HTTP overhead.