Solr nightly build failure

2010-04-02 Thread solr-dev

set-fsdir:

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/solr/build
[mkdir] Created dir: /tmp/apache-solr-nightly/solr/build/web

compile-lucene:
 [echo] Building analyzers...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/classes/java
[javac] Compiling 409 source files to 
/tmp/apache-solr-nightly/lucene/build/classes/java
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-test:
[mkdir] Created dir: /tmp/apache-solr-nightly/lucene/build/classes/test
[javac] Compiling 224 source files to 
/tmp/apache-solr-nightly/lucene/build/classes/test
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
 [copy] Copying 19 files to 
/tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java
[javac] Compiling 127 source files to 
/tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
 [copy] Copying 19 files to 
/tmp/apache-solr-nightly/lucene/build/contrib/analyzers/common/classes/java

compile:
 [echo] Building highlighter...
Trying to override old definition of task m2-deploy
Trying to override old definition of task invoke-javadoc

build-memory:
 [echo] Highlighter building dependency contrib/memory
 [echo] Building memory...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
/tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/lucene/build/contrib/memory/classes/java
[javac] Compiling 1 source file to 
/tmp/apache-solr-nightly/lucene/build/contrib/memory/classes/java
[javac] Note: 
/tmp/apache-solr-nightly/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java
 uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar-core:
  [jar] Building jar: 
/tmp/apache-solr-nightly/lucene/build/contrib/memory/lucene-memory-3.1-dev.jar

default:

build-queries:
 [echo] Highlighter building dependency contrib/queries
 [echo] Building queries...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:
Trying to override old definition of task contrib-crawl

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:

compile-test:
[javac] Compiling 1 source file to 
/tmp/apache-solr-nightly/lucene/build/classes/test

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

compile-core:
[mkdir] Created dir: 
/tmp/apache-solr-nightly/lucene/build/contrib/queries/classes/java
[javac] Compiling 17 source files to 
/tmp/apache-solr-nightly/lucene/build/contrib/queries/classes/java
[javac] Note: 
/tmp/apache-solr-nightly/lucene/contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.


HTTP Authentication for shards

2010-04-02 Thread Peter Sturge
Hi,

I've come across an interesting problem with regard to distributed searching,
and thought I'd share it here and see if anyone else has come across it
and/or comment on the proposed solution:

*Requirement:*
A requirement of my particular Solr environment is that queries are subject
to http authentication (I currently use Jetty basic realm auth, but any http
auth is affected).
i.e. If you don't have a username/password, you can't look at anything.
For most use cases, I'm guessing that queries aren't generally subject to
authentication, hence this post...

*Problem:*
Querying a single server is easy, because my client app creates/manages its
own HttpClient object.
When it comes to querying across shards, the default SearchHandler uses a
'plain-vanilla' http client for its CommonsHttpSolrServer instance that
makes the request to each shard (in HttpCommComponent.submit()).
There is no provision to pass it any credentials.

Perhaps document-level security might be a better way to handle access
control for searching in general, but that's a different can of worms... :-)

*Proposed Solution:*
A proposed solution for overall Solr access for searching across
http-authenticated shards is this:

1. Define parameter(s) syntax for shard credentials.

2. Modify (or subclass) SearchHandler, in particular the
HttpCommComponent.submit() method, to optionally look for shard-specific
credentials in its ModifiableSolrParams params.
If it finds credentials, it creates/reuses an HttpClient object with these
and passes this to the SolrServer instance for the search request.
Because the credentials parameter would be totally optional, it should be
fine to patch SearchHandler 'in-line' without subclassing, so that
patches/updates will work without having to modify solrconfig.xml.
(feel free to disagree with me on this!)

3. This also requires a modification to SearchHandler.handleRequestBody() to
extract the credentials parameter(s) and pass these on to the submit()
request (similar to what it does now for SHARDS_QT).

4. Clients would populate their sharded query request with the defined
parameter(s) for each shard (I'm using SolrJ, so there's app logic to do
this, but it should be fine for other client types; a rough SolrJ sketch
follows below).
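
For illustration only, here's a minimal client-side sketch (SolrJ/Solr 1.4-era
API) of step 4. It borrows the 'shardcredentials' parameter name and
host:port:username:password format proposed in SOLR-1861 below; the parameter
is hypothetical until SearchHandler is actually patched to read it.

{code}
// Sketch only: attach per-shard credentials to a distributed query.
// The "shardcredentials" parameter is the proposed one, not an existing API.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardAuthQueryExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://searchhost:8983/solr");

    SolrQuery query = new SolrQuery("*:*");
    query.set("shards", "shard1:8983/solr,shard2:8983/solr");
    // Hypothetical parameter; SearchHandler needs the change described above
    // to pick it up and build an authenticated HttpClient per shard.
    query.set("shardcredentials",
              "shard1:8983:user1:pass1,shard2:8983:user2:pass2");

    QueryResponse rsp = server.query(query);
    System.out.println("numFound=" + rsp.getResults().getNumFound());
  }
}
{code}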

I admit I'm not an expert on SearchHandler inner workings, so if there are
other code paths that would be affected by this, or any other potential
issues, any advice/insight is greatly appreciated!
If anyone thinks this is a barmy idea, or has come up with a better
solution, please say!

Many thanks,
Peter


[jira] Commented: (SOLR-1819) Upgrade to Tika 0.7

2010-04-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852801#action_12852801
 ] 

Grant Ingersoll commented on SOLR-1819:
---

Looks like Tika has an RC out.  I'll give it a try.

 Upgrade to Tika 0.7
 ---

 Key: SOLR-1819
 URL: https://issues.apache.org/jira/browse/SOLR-1819
 Project: Solr
  Issue Type: Improvement
Reporter: Tricia Williams
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


 See title.




[jira] Updated: (SOLR-1568) Implement Spatial Filter

2010-04-02 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1568:
--

Attachment: SOLR-1568.patch

Got some tests in, and did most of the rework from Yonik's comments.  Some of the tests 
explicitly fail due to bugs in the underlying tile stuff in Lucene.

Added support for handling the poles and the prime and 180th meridian to the 
LatLonType.  I think we're in pretty good shape now, assuming the underlying 
Lucene bits get fixed soon.

 Implement Spatial Filter
 

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: CartesianTierQParserPlugin.java, 
 SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch


 Given an index with spatial information (either as a geohash, 
 SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be 
 able to pass in a filter query that takes in the field name, lat, lon and 
 distance and produces an appropriate Filter (i.e. one that is aware of the 
 underlying field type) for use by Solr.
 The interface _could_ look like:
 {code}
 fq={!sfilt dist=20}location:49.32,-79.0
 {code}
 or it could be:
 {code}
 fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt p=49.32,-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
 {code}




[jira] Resolved: (SOLR-1586) Create Spatial Point FieldTypes

2010-04-02 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1586.
---

Resolution: Fixed

 Create Spatial Point FieldTypes
 ---

 Key: SOLR-1586
 URL: https://issues.apache.org/jira/browse/SOLR-1586
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: examplegeopointdoc.patch.txt, SOLR-1586-geohash.patch, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112209.geopointonly.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt, 
 SOLR-1586.Mattmann.120709.geohashonly.patch.txt, 
 SOLR-1586.Mattmann.121209.geohash.outarr.patch.txt, 
 SOLR-1586.Mattmann.121209.geohash.outstr.patch.txt, 
 SOLR-1586.Mattmann.122609.patch.txt, SOLR-1586.patch, SOLR-1586.patch


 Per SOLR-773, create field types that hide the details of creating tiers, 
 geohash and lat/lon fields.
 Fields should take in lat/lon points in a single form, as in:
 <field name="foo">lat lon</field>




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2010-04-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852808#action_12852808
 ] 

Grant Ingersoll commented on SOLR-773:
--

Dan, sfilt can take a units measurement, but internally it uses miles.

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2010-04-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852809#action_12852809
 ] 

Grant Ingersoll commented on SOLR-773:
--

Status update:

SOLR-1568, which is the last big piece, I think, is almost done.  I added a new 
LatLonType which should make it super easy to do pure LatLon stuff (Point is 
more for a Rectangular Coordinate System.  I guess maybe we should rename it?) 
and it should be easy to extend to use different distance methods.  I will try 
to document some more on the wiki.

There are some minor bugs related to sorting by function right now, but it 
should be usable for people just doing spatial stuff (SOLR-1297).  Probably the 
next most important piece to get in place is SOLR-1298 and its related item 
SOLR-705.  Help on those pieces would be most appreciated.

As always, people kicking the tires on the trunk is appreciated too.

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Commented: (SOLR-1622) Add aggregate Math capabilities to Solr above and beyond the StatsComponent

2010-04-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852810#action_12852810
 ] 

Grant Ingersoll commented on SOLR-1622:
---

Some notes from IRC:
- cool - if you go into the stats stuff, dump the silly string-based numerics 
(that we had to do in the past) and also make it per-segment
- we need one capability in the Lucene FieldCache, and we could dump the legacy 
SortedInt stuff for good
- that's simply the ability to tell if a document had a value or not

 Add aggregate Math capabilities to Solr above and beyond the StatsComponent
 ---

 Key: SOLR-1622
 URL: https://issues.apache.org/jira/browse/SOLR-1622
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Grant Ingersoll
Priority: Minor

 It would be really cool if we could have a QueryComponent that enabled doing 
 aggregating calculations on search results similar to what the StatsComponent 
 does, but in a more generic way.
 I also think it makes sense to reuse some of the function query capabilities 
 (like the parser, etc.).
 I imagine the interface might look like:
 {code}
 math=true&func=recip(sum(A))
 {code}
 This would calculate the reciprocal of the sum of the values in the field A.  
 Then, you could go across fields, too:
 {code}
 math=true&func=recip(sum(A, B, C))
 {code}
 which would sum the values across fields A, B, and C.
 It is important to make the functions pluggable and reusable.  It might also be 
 nice to see if we can share the core calculations between function queries 
 and this capability, such that if someone adds a new aggregating function, it 
 can also be used as a new Function query.
 Of course, we'd want plugin functions, too, so that people can plug in their 
 own functions.  After this is implemented, I think StatsComponent becomes a 
 derivative of the new MathComponent.




[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-04-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852811#action_12852811
 ] 

Robert Muir commented on SOLR-1852:
---

Committed the test to trunk: revision 930262.

 enablePositionIncrements=true can cause searches to fail when they are 
 parsed as phrase queries
 -

 Key: SOLR-1852
 URL: https://issues.apache.org/jira/browse/SOLR-1852
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Robert Muir
 Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch


 Symptom: searching for a string like a domain name containing a '.', the Solr 
 1.4 analyzer tells me that I will get a match, but when I enter the search 
 either in the client or directly in Solr, the search fails. 
 test string:  Identi.ca
 queries that fail:  IdentiCa, Identi.ca, Identi-ca
 query that matches: Identi ca
 schema in use is:
 http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
 Screen shots:
 analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
 dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
 dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
 standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
 Whether or not the bug appears is determined by the surrounding text. The text
 "would be great to have support for Identi.ca on the follow block"
 fails to match Identi.ca, but when the content is on its own or in another 
 sentence, such as
 "Support Identi.ca",
 the search matches.  Testing suggests the word "for" is the problem, and it 
 looks like the bug occurs when a stop word precedes a word that is split up 
 using the word delimiter filter.
 Setting enablePositionIncrements=false in the stop filter and reindexing 
 causes the searches to match.
 According to Mark Miller in #solr, this bug appears to be fixed already in 
 Solr trunk, either due to the upgraded Lucene or changes to the 
 WordDelimiterFilterFactory.




[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2010-04-02 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852813#action_12852813
 ] 

Uri Boness commented on SOLR-773:
-

Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge 
all the related issues (there are currently two open issues for the same 
purpose, with two different patches). But this should be done in a somewhat 
collaborative manner so everybody will be on the same page, also 
regarding the discussion about the different approaches (inline the pseudo 
fields or have them nested in a separate meta element). Is there some way to 
merge the issues, or perhaps mark one of them as a duplicate, so the discussion 
will be centralized?

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr

2010-04-02 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852813#action_12852813
 ] 

Uri Boness edited comment on SOLR-773 at 4/2/10 1:23 PM:
-

Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge 
all the related issues (there are currently two open issues for the same 
purpose, with two different patches). But this should be done in a somewhat 
collaborative manner so everybody will be on the same page, also 
regarding the discussion about the different approaches (inline the pseudo 
fields or have them nested in a separate meta element). Is there some way to 
merge the issues, or perhaps mark one of them as a duplicate, so the discussion 
will be centralized?

btw, the other duplicate issue is SOLR-1566

  was (Author: uboness):
Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow 
merge all the related issues (there are currently two  open issues for the same 
purpose with two different patches). But this should be done with somewhat 
collaborated manner so everybody will be on the same page here also 
regarding the discussion about the different approaches (inline the pseudo 
fields or have them nested in a separate meta element). Is there some way to 
merge the issues? or perhaps mark one of them as duplicate, so the discussion 
will be centralized.
  
 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, 
 lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, 
 solrGeoQuery.tar, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




[jira] Created: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)
HTTP Authentication for sharded queries
---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor


This issue came out of a requirement to have HTTP authentication for queries. 
Currently, HTTP authentication works for querying single servers, but it's not 
possible for distributed searches across multiple shards to receive 
authenticated http requests.

This patch adds the option for Solr clients to pass shard-specific http 
credentials to SearchHandler, which can then use these credentials when making 
http requests to shards.

Here's how the patch works:

A final constant String called {{shardcredentials}} acts as the SolrParams 
parameter key name.
The format for the value associated with this key is a comma-delimited list of 
colon-separated tokens:
{{   
shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
  }}
A client adds these parameters to their sharded request. 
In the absence of {{shardcredentials}} and/or matching credentials, the patch 
reverts to the existing behaviour of using a default http client (i.e. no 
credentials). This ensures backwards compatibility.

When SearchHandler receives the request, it passes the 'shardcredentials' 
parameter to the HttpCommComponent via the submit() method.
The HttpCommComponent parses the parameter string, and when it finds matching 
credentials for a given shard, it creates an HttpClient object with those 
credentials, and then sends the request using this.
Note: Because the match comparison is a string compare (as opposed to a DNS 
compare), the host/IP names used in the shardcredentials parameter must match 
those used in the shards parameter.
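
For illustration, here is a minimal sketch of the credential lookup described 
above, written against the Commons HttpClient 3.x API that Solr 1.4's 
CommonsHttpSolrServer uses. The class and method names are invented for this 
example and are not the ones in the attached SearchHandler.java.

{code}
// Illustrative only -- not the attached patch. Parses the proposed
// "host:port:user:pass,host:port:user:pass,..." format and returns an
// HttpClient pre-loaded with credentials for the given shard, falling back
// to the default (credential-less) client when nothing matches.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;

public class ShardCredentialsHelper {

  public static HttpClient clientForShard(String shard,
                                          String shardCredentials,
                                          HttpClient defaultClient) {
    if (shardCredentials == null) return defaultClient;
    for (String entry : shardCredentials.split(",")) {
      String[] tok = entry.split(":");
      if (tok.length != 4) continue;                 // host:port:user:pass
      // Plain string comparison against the shards parameter, as noted above.
      if (!shard.startsWith(tok[0] + ":" + tok[1])) continue;
      HttpClient client = new HttpClient();
      client.getParams().setAuthenticationPreemptive(true);
      client.getState().setCredentials(
          new AuthScope(tok[0], Integer.parseInt(tok[1])),
          new UsernamePasswordCredentials(tok[2], tok[3]));
      return client;
    }
    return defaultClient;
  }
}
{code}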

Impl Notes:
This patch is used and tested on the 1.4 release codebase. There weren't any 
significant diffs between the 1.4 release and the latest trunk for 
SearchHandler, so it should be fine on other trunks, but I've only tested with the 
1.4 release code base.





[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

Apologies that this is the source file and not a diff'ed patch file.

I've tried so many Win doze svn products, but I just can't get them to create a 
patch file (I'm sure this is more down to me not configuring them correctly, 
rather than rapidsvn, visualsvn, Tortoisesvn etc.).
If someone would like to create a patch file from this source, that would be 
extraordinarily kind of you!
In any case, the changes to this file are quite straightforward.


 HTTP Authentication for sharded queries
 ---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SearchHandler.java


 This issue came out of a requirement to have HTTP authentication for queries. 
 Currently, HTTP authentication works for querying single servers, but it's 
 not possible for distributed searches across multiple shards to receive 
 authenticated http requests.
 This patch adds the option for Solr clients to pass shard-specific http 
 credentials to SearchHandler, which can then use these credentials when 
 making http requests to shards.
 Here's how the patch works:
 A final constant String called {{shardcredentials}} acts as the name of the 
 SolrParams parameter key name.
 The format for the value associated with this key is a comma-delimited list 
 of colon-separated tokens:
 {{   
 shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
   }}
 A client adds these parameters to their sharded request. 
 In the absence of {{shardcredentials}} and/or matching credentials, the patch 
 reverts to the existing behaviour of using a default http client (i.e. no 
 credentials). This ensures b/w compatibility.
 When SearchHandler receives the request, it passes the 'shardcredentials' 
 parameter to the HttpCommComponent via the submit() method.
 The HttpCommComponent parses the parameter string, and when it finds matching 
 credentials for a given shard, it creates an HttpClient object with those 
 credentials, and then sends the request using this.
 Note: Because the match comparison is a string compare (a.o.t. dns compare), 
 the host/ip names used in the shardcredentials parameters must match those 
 used in the shards parameter.
 Impl Notes:
 This patch is used and tested on the 1.4 release codebase. There weren't any 
 significant diffs between the 1.4 release and the latest trunk for 
 SearchHandler, so should be fine on other trunks, but I've only tested with 
 the 1.4 release code base.




[jira] Commented: (SOLR-1740) ShingleFilterFactory improvements

2010-04-02 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852891#action_12852891
 ] 

Steven Rowe commented on SOLR-1740:
---

Thank you, Robert.

 ShingleFilterFactory improvements
 -

 Key: SOLR-1740
 URL: https://issues.apache.org/jira/browse/SOLR-1740
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.1
Reporter: Steven Rowe
Assignee: Robert Muir
Priority: Minor
 Fix For: 3.1

 Attachments: SOLR-1740.patch, SOLR-1740.patch


 ShingleFilterFactory should allow specification of minimum shingle size (in 
 addition to maximum shingle size), as well as the separator to use between 
 tokens.  These are implemented at LUCENE-2218.  The attached patch allows 
 ShingleFilterFactory to accept configuration of these items, and includes 
 tests against the new functionality in TestShingleFilterFactory.  
 Solr will have to upgrade to lucene-analyzers-3.1-dev.jar before the attached 
 patch will apply.




[jira] Created: (SOLR-1862) CLONE -java.io.IOException: read past EOF

2010-04-02 Thread Alexander S (JIRA)
CLONE -java.io.IOException: read past EOF
-

 Key: SOLR-1862
 URL: https://issues.apache.org/jira/browse/SOLR-1862
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Alexander S
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 1.5


A query with relevancy scores of all zeros produces an invalid doclist that 
includes the sentinel value 2147483647 and causes Solr to request that invalid 
docid from Lucene, which results in a java.io.IOException: read past EOF.

http://search.lucidimagination.com/search/document/2d5359c0e0d103be/java_io_ioexception_read_past_eof_after_solr_1_4_0




[jira] Commented: (SOLR-1860) improve stopwords list handling

2010-04-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852978#action_12852978
 ] 

Robert Muir commented on SOLR-1860:
---

A third idea from Hoss Man:

We should make it easy to edit these lists, like the English one.
So an idea is to create an intl/ folder (or similar) under the example, with 
stopwords_fr.txt, stopwords_de.txt, and so on.
Additionally we could have a schema-intl.xml with example types 'text_fr', 
'text_de', etc. set up for various languages.
I like this idea best.


 improve stopwords list handling
 ---

 Key: SOLR-1860
 URL: https://issues.apache.org/jira/browse/SOLR-1860
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor

 Currently Solr makes it easy to use english stopwords for StopFilter or 
 CommonGramsFilter.
 Recently in Lucene, we added stopword lists (mostly, but not all, from 
 snowball) to all the language analyzers.
 So it would be nice if a user could easily specify that they want to use a 
 French stopword list, and use it for StopFilter or CommonGrams.
 The ones from snowball are, however, formatted in a different manner than the 
 others (although in Lucene we have parsers to deal with this).
 Additionally, we abstract this from Lucene users by adding a static 
 getDefaultStopSet to all analyzers.
 There are two approaches; I think I prefer the first one, but I'm 
 not sure it matters as long as we have good examples (maybe a foreign 
 language example schema?)
 1. The user would specify something like:
 <filter class="solr.StopFilterFactory" 
 fromAnalyzer="org.apache.lucene.analysis.FrenchAnalyzer" .../>
 This would just grab the CharArraySet from the FrenchAnalyzer's 
 getDefaultStopSet method; who cares where it comes from or how it's loaded.
 2. We add support for snowball-formatted stopword lists, and the user could 
 do something like:
 <filter class="solr.StopFilterFactory" 
 words="org/apache/lucene/analysis/snowball/french_stop.txt" format="snowball" 
 ... />
 The disadvantage to this is that they have to know where the list is, what 
 format it's in, etc. For example: snowball doesn't provide Romanian or Turkish
 stopword lists to go along with their stemmers, so we had to add our own.
 Let me know what you guys think, and I will create a patch.
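
As a purely illustrative sketch of approach (1) in the description above (the 
fromAnalyzer attribute does not exist yet), a factory could resolve the stop 
set by reflecting on the named analyzer's static getDefaultStopSet() method, 
which the Lucene 3.x language analyzers provide:

{code}
// Sketch only: resolve a default stop set from an analyzer class name,
// as a hypothetical "fromAnalyzer" attribute on StopFilterFactory might do.
import java.lang.reflect.Method;
import java.util.Set;

public class DefaultStopSetLoader {

  /** e.g. loadDefaultStopSet("org.apache.lucene.analysis.fr.FrenchAnalyzer") */
  public static Set<?> loadDefaultStopSet(String analyzerClassName) {
    try {
      Class<?> clazz = Class.forName(analyzerClassName);
      // The Lucene 3.x analyzers expose a static getDefaultStopSet() method.
      Method m = clazz.getMethod("getDefaultStopSet");
      return (Set<?>) m.invoke(null);
    } catch (Exception e) {
      throw new RuntimeException(
          "Could not load default stop set from " + analyzerClassName, e);
    }
  }
}
{code}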




[jira] Updated: (SOLR-1861) HTTP Authentication for sharded queries

2010-04-02 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge updated SOLR-1861:
---

Attachment: SearchHandler.java

A small update to this patch to support distributed searches with multiple 
cores.


 HTTP Authentication for sharded queries
 ---

 Key: SOLR-1861
 URL: https://issues.apache.org/jira/browse/SOLR-1861
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Solr 1.4
Reporter: Peter Sturge
Priority: Minor
 Attachments: SearchHandler.java, SearchHandler.java


 This issue came out of a requirement to have HTTP authentication for queries. 
 Currently, HTTP authentication works for querying single servers, but it's 
 not possible for distributed searches across multiple shards to receive 
 authenticated http requests.
 This patch adds the option for Solr clients to pass shard-specific http 
 credentials to SearchHandler, which can then use these credentials when 
 making http requests to shards.
 Here's how the patch works:
 A final constant String called {{shardcredentials}} acts as the name of the 
 SolrParams parameter key name.
 The format for the value associated with this key is a comma-delimited list 
 of colon-separated tokens:
 {{   
 shard0:port0:username0:password0,shard1:port1:username1:password1,shardN:portN:usernameN:passwordN
   }}
 A client adds these parameters to their sharded request. 
 In the absence of {{shardcredentials}} and/or matching credentials, the patch 
 reverts to the existing behaviour of using a default http client (i.e. no 
 credentials). This ensures b/w compatibility.
 When SearchHandler receives the request, it passes the 'shardcredentials' 
 parameter to the HttpCommComponent via the submit() method.
 The HttpCommComponent parses the parameter string, and when it finds matching 
 credentials for a given shard, it creates an HttpClient object with those 
 credentials, and then sends the request using this.
 Note: Because the match comparison is a string compare (a.o.t. dns compare), 
 the host/ip names used in the shardcredentials parameters must match those 
 used in the shards parameter.
 Impl Notes:
 This patch is used and tested on the 1.4 release codebase. There weren't any 
 significant diffs between the 1.4 release and the latest trunk for 
 SearchHandler, so should be fine on other trunks, but I've only tested with 
 the 1.4 release code base.




Re: Document level security in Apache Solr

2010-04-02 Thread Ryan McKinley
Hi Anders-

see comments below...


 Two weeks ago I created a JIRA issue (
 https://issues.apache.org/jira/browse/SOLR-1834) involving document level
 security in Apache Solr and submitted a patch containing a search component
 that can be seen as a starting point for making Solr handle document level
 security. I believe that document security is an essential part of an
 enterprise search engine and I hope that this contribution can start a
 discussion about how this should be handled in Solr (possibly in conjunction
 with the Lucene Connector Framework).


Thanks for posting the code -- on a quick pass it looks good.  I agree
some coordination with Lucene Connectors will make sense.

On the patch, it looks good, but to get into the dist, it will
probably need some sort of tests.  I'm not sure how that would work
with Windows authentication (I don't know much about it, but it has
been on my long-term TODO list for a while!)  Perhaps we could have
tests that would run on systems that have something to test against,
but not fail when running on Linux (or something).


 As this contribution shows I would like to help to develop the security
 capabilities of Solr together with the community because I believe that it
 will improve Solr’s appeal to large enterprises. Moreover I think that most
 of us believe that a transparent security system will in the end give rise
 to the best security.


Agreed -- the more people poking holes, the better.


 I hope some of you can take the time to look at the patch, try it out and
 think about:

 1. Should this be a contrib module in Solr? (And if so, what needs
 to be done to contribute it?)


I think a contrib module makes sense.  For things to move forward, a
committer needs to step up to the plate.  I would love to, but don't
have much time soon.  To make it easier for people to feel comfortable
with it, tests and docs help a lot.


 2. Should document level security be a core feature in Solr? (And if
 so, what is the best way to integrate it into Solr?)

I'm not quite sure what you mean by 'core' -- I think it makes sense
for it to live as a contrib for a while and see how things develop.



 3. How can this integrate with connectors like the Lucene Connector
 Framework? I.e. how do you create a uniform way to talk about Access Control
 Lists (http://en.wikipedia.org/wiki/Access_control_list).


good question!  That would be really powerful.




 P.s (for the nerdy)

 I have some ideas about putting the security deeper into Solr, perhaps by
 creating a secure SolrIndexReader and a secure SolrIndexSearcher that are
 fed user credentials from a search component. What do you think about this?


What are you thinking here?  To me, it seems like the index would need
to contain all the data, and a SearchComponent would take user credentials
and augment the query (group:[a b c] or whatever).

The advantage of keeping the same IndexSearcher across all users is that
it can share a cache where appropriate.
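
To make that concrete, here is a hypothetical sketch against the 1.4-era
SearchComponent API. It is not the SOLR-1834 patch; the "usergroups" request
parameter and the indexed "group" field are assumptions made purely for
illustration.

{code}
// Hypothetical sketch -- not the SOLR-1834 patch. Restricts results by
// appending a filter query built from the caller's groups.
import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class GroupFilterComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // Assumed parameter, e.g. usergroups=a,b,c. In a real deployment the
    // groups would come from container authentication, not an untrusted param.
    String groups = rb.req.getParams().get("usergroups");
    if (groups == null) return;
    ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
    params.add("fq", "group:(" + groups.replace(",", " OR ") + ")");
    rb.req.setParams(params);
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // All the work happens in prepare(); QueryComponent does the rest.
  }

  @Override
  public String getDescription() { return "group-based document filter (sketch)"; }

  @Override
  public String getSource() { return "$URL$"; }

  @Override
  public String getSourceId() { return "$Id$"; }

  @Override
  public String getVersion() { return "1.0"; }
}
{code}

Registered as a first-component on the request handler, something like this
keeps the single shared IndexSearcher (and its caches) while still restricting
what each user can see.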


 As I understand it, currently it’s possible to declare your own
 SolrIndexReader but not your own SolrIndexSearcher.


not sure on this...


ryan


[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search

2010-04-02 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853047#action_12853047
 ] 

Lance Norskog commented on SOLR-1858:
-

Also, Distributed Search by one core among other cores in the same instance 
must use the HTTP transport rather than direct internal access.

 Embedded Solr does not support Distributed Search
 -

 Key: SOLR-1858
 URL: https://issues.apache.org/jira/browse/SOLR-1858
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Lance Norskog
Priority: Minor

 It is impossible to do a Distributed Search across multiple cores in an 
 EmbeddedSolr instance. Distributed Search only works for Solr HTTP-controlled 
 shards, and EmbeddedSolr does not export an HTTP interface.




[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search

2010-04-02 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853053#action_12853053
 ] 

Lance Norskog commented on SOLR-1858:
-

This could be handled with one simple change: add a new URL protocol 'solr'.

Details: the SolrJ library would include a new method that creates a server 
from a URL, and the server factory would support 'solr://' and 'solr://core' 
as URLs. The meaning of solr:// changes depending on whether it is used within a 
client app, an EmbeddedSolr instance, or within the web app.

 In an Embedded Solr instance, it refers to the embedded instance itself. In a 
servlet container instance, it refers to that instance. 'solr://' would not be 
supported within a client app, because there is no Solr instance in the app.

In short, the 'solr://' URL refers to the Solr instance available within the 
current JVM, _via the current classloader_. 
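
A rough sketch of what such a factory might look like; nothing like this
exists in SolrJ today, the method name is invented, and how the "current"
CoreContainer gets located is exactly the open (classloader) question:

{code}
// Illustrative sketch of the proposed 'solr://' factory. The caller supplies
// the CoreContainer here only because discovering it automatically is the
// unresolved part of the proposal.
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.core.CoreContainer;

public class SolrServerFactory {

  public static SolrServer createServer(String url, CoreContainer localCores)
      throws Exception {
    if (url.startsWith("solr://")) {
      // "solr://" or "solr://corename": resolve inside the current JVM,
      // with no HTTP overhead.
      String coreName = url.substring("solr://".length());
      return new EmbeddedSolrServer(localCores, coreName);
    }
    // Anything else is treated as an ordinary HTTP Solr URL.
    return new CommonsHttpSolrServer(url);
  }
}
{code}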

 Embedded Solr does not support Distributed Search
 -

 Key: SOLR-1858
 URL: https://issues.apache.org/jira/browse/SOLR-1858
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Lance Norskog
Priority: Minor

 It is impossible to do a Distributed Search across multiple cores in an 
 EmbeddedSolr instance. Distributed Search only works for Solr HTTP-controlled 
 shards, and EmbeddedSolr does not export an HTTP interface.




[jira] Commented: (SOLR-1858) Embedded Solr does not support Distributed Search

2010-04-02 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853054#action_12853054
 ] 

Lance Norskog commented on SOLR-1858:
-

Possible use cases:
* An embedded instance can do a distributed search across multiple cores which 
are inside the instance or are remote.
* The SolrEntityProcessor is an (uncommitted) plugin for the DataImportHandler 
([SOLR-1499]). It does a search against a Solr instance and supplies the 
resulting document, or document series, to the DIH processing chain. With the 
'solr://' option, this tool can do queries against its own Solr instance with 
no HTTP overhead.

 Embedded Solr does not support Distributed Search
 -

 Key: SOLR-1858
 URL: https://issues.apache.org/jira/browse/SOLR-1858
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Lance Norskog
Priority: Minor

 It is impossible to do a Distributed Search across multiple cores in an 
 EmbeddedSolr instance. Distributed Search only works for Solr HTTP-controlled 
 shards, and EmbeddedSolr does not export an HTTP interface.
