RE: Solr under tomcat - UTF-8 issue

2009-10-25 Thread markwaddle

I was originally using POST for the same reason, however I discovered that
Tomcat could easily be configured to accept any length URI. All it requires
is specifying the maxHttpHeaderSize attribute in your default Connector in
server.xml. I set my value to 1MB, which is certainly excessive, but it
ensures I will never hit the limit. As the other chap mentioned, I now have
the benefits of caching and most importantly, proper web logs!

I also have a similar situation where I constrain the search results based
on the user's role. I have only two roles to support, so my case is very
simple, but I could imagine having a multivalued "role" field that you could
perform facet queries on.
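
A minimal sketch of how such a role restriction might be sent along with a search (Python, purely illustrative). The multivalued field name "role" comes from the paragraph above; the helper name build_role_fq, the sample role values, and the choice of the fq parameter to carry the restriction are assumptions.

```python
from urllib.parse import urlencode


def build_role_fq(roles):
    """Build a filter clause matching documents that carry any of the given roles."""
    if not roles:
        raise ValueError("at least one role is required")
    quoted = []
    for role in roles:
        # Escape backslashes and double quotes so each role stays one phrase.
        escaped = role.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "role:(%s)" % " OR ".join(quoted)


# Hypothetical request parameters: the user's query plus the role filter.
params = urlencode({"q": "holiday", "fq": build_role_fq(["staff", "hr admin"])})
print(params)
```

Keeping the role clause in its own parameter, separate from q, means the user's search phrase and the access restriction can be built and logged independently.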

Mark


Glock, Thomas wrote:
> 
> Thanks -
> 
> I agree.  However my application requires results be trimmed to users
> based on roles.  The roles are repeating values on the documents.  Users
> have many different role combinations as do documents.
> I recognize this is going to hamper caching - but using a GET will tend to
> limit the size of search phrases when combined with the boolean role
> clause.  And I am concerned with hitting url limits.
> 
> At any rate I solved it thanks to Yonik's recommendation.  
> 
> My Flex client's HTTPService by default sets the Content-Type request
> header only to "application/x-www-form-urlencoded". What it needed to do
> for Tomcat is set the Content-Type request header to
> "application/x-www-form-urlencoded; charset=UTF-8".
> 
> If you have any suggestions regarding limiting results based on user and
> document role permutations - I'm all ears.  I've been to the Search Summit
> in NYC and no vendor could even seem to grasp the concept.  
> 
> The problem case statement is this  - I have users globally who need to
> search for content tailored to them.  Users searching for 'Holiday' don't
> get any value from 1 documents having the word holiday. What they need
> are documents authored for that population.  The documents have the
> associated role information as metadata and therefore users will get only
> the documents they have access to and are relevant to them.  That's the
> plan anyway!  
> 
> By chance I stumbled on Solr a month or so ago and I think it's awesome.  I
> got the book two days ago too - fantastic!
> 
> Thanks again,
> Tom
> 
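
The Content-Type fix described in the quoted message can be sketched as follows (Python, for illustration only; the original client was Flex). The endpoint URL, the sample query, and the helper name build_solr_post are assumptions. The request is only prepared, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_solr_post(url, params):
    """Prepare (but do not send) a form POST that declares UTF-8 explicitly."""
    # Percent-encode the parameters as UTF-8 bytes.
    body = urlencode(params, encoding="utf-8").encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical endpoint and query, used only for illustration.
request = build_solr_post("http://localhost:8080/solr/select", {"q": "caf\u00e9"})
print(request.get_header("Content-type"))
print(request.data)
```

Without the charset suffix the container has to guess how the percent-encoded bytes in the body map to characters, which is the situation the quoted message describes.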

-- 
View this message in context: 
http://www.nabble.com/Solr-under-tomcat---UTF-8-issue-tp26040052p26054942.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Shards param accepts spaces between commas?

2009-10-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sun, Oct 25, 2009 at 9:34 PM, Chris Hostetter wrote:
>
> : It seems like no, and should be an easy change.  I'm putting newlines
> : after the commas so the large shards list doesn't scroll off the
> : screen.
>
> Yeah ... for some odd reason QueryComponent is using
> StrUtils.splitSmart() ... SolrPluginUtils.split() seems like a saner
> choice.
>
> A better question is probably why the shards param isn't just multivalued.
Good question. I guess it should be.

>
> (Yonik?)
>
>
>
>
>
>
> -Hoss
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


weird behaviour while inserting records into solr

2009-10-25 Thread Rakhi Khatwani
Hi,
I was trying to insert one million records into Solr (keeping the id from
0 to 100). Things were fine until it inserted id = 523932. After that
it started inserting from 1 again (i.e. updating). I am not able to
understand this behaviour. Any pointers?
Regards,
Raakhi


"begins with" searches

2009-10-25 Thread Bernadette Houghton
We need to offer "begins with" type searches, e.g. a search for "surname, f" 
will retrieve "surname, firstname", "surname, f", "surname fm" etc.

Ideally, the user would be able to enter something like "surname f*".

However, wildcards don't work on phrase searches, nor do range searches.

Any suggestions as to how best to search for "begins with" phrases; or, how to 
best configure solr to support such searches?

TIA
Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email: 
bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)




Re: Dismax params, mm < explanation

2009-10-25 Thread ram_sj

After reading the explanation in the book, it's very clear now. Thank you
for citing it with the page number.

Ram


hossman wrote:
> 
> 
> What you are looking at is an XML escaped version of this string...
> 
>   2<-1 3<-2 6<100%
> 
> ...the syntax is documented here...
> 
> http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
> http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html
> 
> ...note that the string you have listed there actually makes very little 
> sense because of the 100% condition.  It says that for queries of more 
> than 6 clauses all of them are required (usually the mm param gets less 
> strict as the number of clauses increases)
> 
> (FYI: As the creator of the 'mm' param syntax, one of my favorite parts of 
> the new Solr 1.4 book is the explanation of mm options with multiple 
> clauses.  It describes it in a completely different way from anything I'd 
> ever thought of before (I was convinced it was a huge mistake the first 
> two times I read that section before the light bulb went off and I 
> realized how brilliant it was) and is probably a lot easier for many 
> people to understand -- if you have the book it's on p139)
> 
> : 2<-1 3<-2 6<100%
>   ...
> : I requested solr to match at least two fields, which i understood from the
> : documents. Can someone give me explanations for other params in it? 
> : 
> : "&lt;-1 3"
> : 
> : "&lt;-2 6"
> : 
> : "&lt;100%"
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26052492.html



Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)

2009-10-25 Thread Martijn v Groningen
I was able to reproduce the exact same stack trace you sent. The
exception occurred when I removed a document from a newly created index
(with a commit) and then did a search with field collapsing enabled. I
have attached a new patch to SOLR-236 that includes a fix for this
bug.

Martijn

2009/10/25 Martijn v Groningen :
> Hi Joe,
>
> Can you give a bit more context info? Like the exact search and the
> field types you are using for example. Also are you doing a lot of
> frequent updates to the index?
>
> Cheers,
>
> Martijn
>
> 2009/10/23 Joe Calderon :
>> seems to happen when sort on anything besides strictly score, even
>> score desc, num desc triggers it, using latest nightly and 10/14 patch
>>
>> Problem accessing /solr/core1/select. Reason:
>>
>>    4731592
>>
>> java.lang.ArrayIndexOutOfBoundsException: 4731592
>>        at 
>> org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660)
>>        at 
>> org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235)
>>        at 
>> org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173)
>>        at 
>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
>>        at 
>> org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95)
>>        at 
>> org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208)
>>        at 
>> org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
>>        at 
>> org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at 
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>        at 
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
>>        at 
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
>>        at 
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at 
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>        at 
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>>        at 
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>>        at 
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>        at 
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>        at 
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>        at org.mortbay.jetty.Server.handle(Server.java:326)
>>        at 
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>>        at 
>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539)
>>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>        at 
>> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>>        at 
>> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
>


Re: Dismax params, mm < explanation

2009-10-25 Thread Chris Hostetter

What you are looking at is an XML escaped version of this string...

2<-1 3<-2 6<100%

...the syntax is documented here...

http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

...note that the string you have listed there actually makes very little 
sense because of the 100% condition.  It says that for queries of more 
than 6 clauses all of them are required (usually the mm param gets less 
strict as the number of clauses increases)
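
To make the effect of that string concrete, here is a small Python sketch of the conditional mm syntax as described in the documentation linked above. It is an illustrative re-implementation, not Solr's code; the function name min_should_match is made up, and it assumes there are no spaces around each "<".

```python
def min_should_match(clause_count, spec):
    """Return how many of clause_count optional clauses must match for spec."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional form: "N<value" applies only when there are more than N clauses.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        # Percentage of the clauses, rounded toward zero; negative means "may be missing".
        calc = clause_count * int(spec[:-1]) / 100.0
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        # Plain integer; negative means "all but this many".
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


for n in range(1, 9):
    print(n, min_should_match(n, "2<-1 3<-2 6<100%"))
```

Under this reading, 1 or 2 clauses are all required, 3 clauses need 2, 4 to 6 clauses need all but 2, and anything above 6 clauses needs every clause again, which is the jump back to strictness pointed out above.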

(FYI: As the creator of the 'mm' param syntax, one of my favorite parts of 
the new Solr 1.4 book is the explanation of mm options with multiple 
clauses.  It describes it in a completely different way from anything I'd 
ever thought of before (I was convinced it was a huge mistake the first 
two times I read that section before the light bulb went off and I 
realized how brilliant it was) and is probably a lot easier for many 
people to understand -- if you have the book it's on p139)

: 2<-1 3<-2 6<100%
...
: I requested solr to match at least two fields, which i understood from the
: documents. Can someone give me explanations for other params in it? 
: 
: "&lt;-1 3"
: 
: "&lt;-2 6"
: 
: "&lt;100%"



-Hoss



Re: Constant Score Queries and Function Queries

2009-10-25 Thread Chris Hostetter

: Fair enough, I guess I was just kind of expecting a constant score query + a
: function query to result in a score of whatever the function query is.  This
: is a common trick to sort by a function, but it's easy enough to just ^0 the
: non function clause.

I think the root of the issue is that a ConstantScoreQuery has a constant 
score according to its boost, which defaults to "1", so all the usual 
queryNorm effects apply when using it in a BooleanQuery.

Random thought: one way to implement "sort by function" more naturally 
would be to add a "sortfunc" param that used the FunctionQParser, take the 
"q" query and move it to the Filter query list, then use the func query as 
your main query.  (All of this could be triggered by a new magic sort 
field "_func_ desc" which was equivalent to "score desc" after the 
query swapping ... things like "sort=inStock desc, _func_ desc" would 
still work as well.)



-Hoss



Re: Searching over all Dynamic Fields: different things tested, multiple issues experienced

2009-10-25 Thread Chris Hostetter

: When I test it, if I test it with "stored=true", it works as expected, if I
: test with with "stored=false" the resultset is empty.

Setting stored="false" has no impact on anything related to searching -- 
it only affects which values can be written out by the response writer. 
There's no way that changing only that attribute on a field could produce 
the behavior you're describing.

If you post your schema.xml, some sample data, and examples of the queries 
you are attempting, people could probably help you spot what may be 
causing your problem.

-Hoss



Re: question about merging indexes

2009-10-25 Thread Chris Hostetter

: I need some help about the mergeindex command. I have 2 cores A and B
: that I want to merge into a new index RES. A has 100 docs and B 10
: docs. All of B's docs are from A, except that one attribute is
: changed. The goal is to bring the updated attributes from B into A.

that's not how mergeindex works ... merging two indexes is essentially 
just adding all the docs from one index into the other (but w/o the 
reindexing step - it works by copying the raw term info)

There is no way to modify a doc once it's been indexed.

: When I issue the mergeindexes command  my RES core only has 10 docs. I
: expect RES to have 100  or even 110 docs, but 10 is very puzzling. Am
: I misunderstanding something about merging indexes?

what exactly was the command you used to do the merge?  you should have 
gotten 110 docs.
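
As a toy model of the point above (plain Python lists, not Solr or Lucene code), merging only appends documents and never updates one in place. The helper name merge_indexes and the document fields are made up for illustration.

```python
def merge_indexes(*indexes):
    """Toy model: a merged index holds every document from every source, unchanged."""
    merged = []
    for index in indexes:
        merged.extend(index)
    return merged


# Core A has 100 docs; core B has 10 docs that reuse ids from A with a changed attribute.
core_a = [{"id": i, "attr": "old"} for i in range(100)]
core_b = [{"id": i, "attr": "new"} for i in range(10)]

res = merge_indexes(core_a, core_b)
same_id = [doc for doc in res if doc["id"] == 3]
print(len(res))
print(same_id)
```

In this model the result has 110 entries and both versions of a shared id survive side by side, which is why a merge cannot be used to carry updated attributes from B into A.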



-Hoss



Dismax params, mm < explanation

2009-10-25 Thread ram_sj

Hi,
Consider these minimum match params in the dismax query handler:

  
2<-1 3<-2 6<100%
  

I requested Solr to match at least two fields, which I understood from the
documents. Can someone explain the other params in it? 

"&lt;-1 3"

"&lt;-2 6"

"&lt;100%"

How are they significant? 

thanks
ram
-- 
View this message in context: 
http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26049472.html



Re: Solrj Javabin and JSON

2009-10-25 Thread Patrick Jungermann
Hi Stefan,

You don't need to convert the Java objects built from the result
returned as Javabin. Instead, you can simply use the JSON
response format by setting "wt=json". See [0] for more information
about this.
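
A minimal sketch of such a request URL (Python, for illustration only; the thread itself concerns a Java servlet). The base URL, the sample query, and the helper name build_select_url are assumptions; only the "wt=json" parameter comes from the text above.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query):
    """Build a select URL that asks Solr for a JSON response via wt=json."""
    return base_url + "/select?" + urlencode({"q": query, "wt": "json"})


# Hypothetical Solr location and query.
url = build_select_url("http://localhost:8983/solr", "title:solr")
print(url)
```

A server-side component could fetch this URL itself and pass the response body on to the browser, so the browser never talks to Solr directly.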


Patrick


[0] http://wiki.apache.org/solr/SolJSON


SGE0 wrote:
> Hi Paul,
> 
> 
> fair enough. Is this included in the Solrj package ? Any examples how to do
> this ?
> 
> 
> Stefan
> 
> 
> 
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>> There is no point converting javabin to json. javabin is an
>> intermediate format; it is converted to Java objects as soon as it
>> arrives. You just need a means to convert the Java objects to json.
>>
>>
>>
>> On Sat, Oct 24, 2009 at 12:10 PM, SGE0  wrote:
>>> Hi,
>>>
>>> did anyone write a Javabin to JSON convertor and is willing to share this
>>> ?
>>>
>>> In our servlet we use a CommonsHttpSolrServer instance to execute a
>>> query.
>>>
>>> The problem is that it returns Javabin format and we need to send the
>>> result
>>> back to the browser using JSON format.
>>>
>>> And no, the browser is not allowed to directly query Lucene with the
>>> wt=json
>>> format.
>>>
>>> Regards,
>>>
>>> S.
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solrj-Javabin-and-JSON-tp26036551p26036551.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> -- 
>> -
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
> 



Re: Specify increment gap with PatternTokenizerFactory

2009-10-25 Thread Chris Hostetter

: Is there a way to specify an increment gap between tokens with the
: PatternTokenizerFactory or do I need to customise it? For instance if I
: split on commas in "*Books, Online Shopping, Book Store*"  I want to be able
: to put a 100 position gap between say "books" and "online shopping".

Terminology clarification: an "increment gap" is what you configure the 
analyzer to increment its internal position counter by when it's used to 
index multiple discrete values for a given field (using the 
positionIncrementGap in schema.xml).

What you are describing is just the position "increment" for a token after 
previous tokens produced by the same field value (ie: all one stream)

That said: No, PatternTokenizerFactory doesn't provide any means for 
changing the default increment (1)
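
The difference between the two terms can be shown with a toy position counter in Python. This is a simplified model, not Lucene's implementation; the function name assign_positions and the whitespace tokenization are assumptions made for the example.

```python
def assign_positions(values, gap):
    """Toy model: +1 per token inside a value, plus `gap` between separate values."""
    positions = []
    next_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            # The "increment gap" applies only between discrete field values.
            next_pos += gap
        for token in value.lower().split():
            # Inside one value each token uses the default increment of 1.
            positions.append((token, next_pos))
            next_pos += 1
    return positions


# Three separate values for one multivalued field, with a gap of 100.
print(assign_positions(["Books", "Online Shopping", "Book Store"], 100))
# The same text supplied as a single value: the gap never comes into play.
print(assign_positions(["Books Online Shopping Book Store"], 100))
```

In this model, getting a large gap between "books" and "online shopping" requires sending them as separate values of a multivalued field rather than as one comma-separated string.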



-Hoss



Re: Solr and bitwise comparaison

2009-10-25 Thread Chris Hostetter

: I am trying to make a request in Solr similar to "SELECT COUNT(*) FROM
: InscriptionNew WHERE choices & 17 > 0;" in MySQL.
: Is it possible? Do you have an idea?

bitmask operations in DB queries like that are usually a result of using a 
single physical column to store many logical boolean columns -- either out 
of space concerns or as a way to pre-allocate boolean fields for later use 
without needing to add columns.

In Solr: just use a BoolField for each bit.
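
A rough sketch of that translation in Python, assuming one boolean field per bit. The field prefix "choice_" and both helper names are hypothetical; only the mask value 17 comes from the question.

```python
def mask_to_fields(mask, bit_count, prefix="choice_"):
    """Index time: explode an integer bitmask into one boolean field per bit."""
    return {"%s%d" % (prefix, bit): bool(mask & (1 << bit)) for bit in range(bit_count)}


def mask_to_query(mask, prefix="choice_"):
    """Query time: 'choices & mask > 0' becomes an OR over the set bits."""
    bits = [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]
    return " OR ".join("%s%d:true" % (prefix, bit) for bit in bits)


print(mask_to_fields(17, 5))
print(mask_to_query(17))
```

Since 17 is binary 10001, the SQL condition corresponds to "bit 0 or bit 4 is set", so the query only needs the two matching boolean fields.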




-Hoss



Re: Shards param accepts spaces between commas?

2009-10-25 Thread Chris Hostetter

: It seems like no, and should be an easy change.  I'm putting newlines
: after the commas so the large shards list doesn't scroll off the
: screen.

Yeah ... for some odd reason QueryComponent is using 
StrUtils.splitSmart() ... SolrPluginUtils.split() seems like a saner 
choice.

A better question is probably why the shards param isn't just multivalued.

(Yonik?)
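
Until the parsing changes, a client-side workaround is to strip the whitespace before sending the parameter. A minimal Python sketch, with a made-up helper name normalize_shards and made-up host names:

```python
import re


def normalize_shards(raw):
    """Collapse a shards list written across several lines into one comma-separated value."""
    # Split on any run of commas and whitespace, then drop empty pieces.
    parts = [part for part in re.split(r"[\s,]+", raw.strip()) if part]
    return ",".join(parts)


readable = """host1:8983/solr,
              host2:8983/solr ,
              host3:8983/solr"""
print(normalize_shards(readable))
```

This keeps the long list readable in configuration or source while the value actually sent contains no spaces or newlines.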






-Hoss



Re: MoreLikeThis support Dismax parameters

2009-10-25 Thread Chris Hostetter

In the current code base the MLT Handler has generally been superseded 
by an MLT Component which may do what you want -- you can use any QParser 
you want to generate a DocList and the MLT Component then suggests similar 
docs for each doc in your list.

As I said: that may be what you're looking for (it's hard to tell based on 
your email) but the other possibility is that you want to be able to 
specify bq (and maybe bf) type params to influence the MLT portion of the 
request (ie: apply a bias so docs matching a particular query/func are 
more likely to be suggested) ... this is an area that hasn't really been 
very well explored as far as I can remember.

: From what I've read/found, MoreLikeThis doesn't support the dismax
: parameters that are available in the StandardRequestHandler (such as bq). Is
: it possible that we might get support for those parameters some time? What
: are the issues with MLT Handler inheriting from the StandardRequestHandler
: instead of RequestHandlerBase?


-Hoss



Re: Store tika extracted result as xhtml

2009-10-25 Thread Chris Hostetter

: My objective is to be able to store it as xhtml in the field and be 
: able to retrieve it as cached output. Since tika is already giving xhtml 
: output, I wonder why Solr saves it as plain text. (Maybe I missed 
: something in the configuration??)

I'm not very familiar with Tika or Solr CELL, but I think what you are 
seeing is that Solr only asks Tika for the *content* of the DOM Nodes 
matched by the xpath and/or capture params (ie: node.getTextContent()).

I suspect it wouldn't be too hard to add an option to allow the capture of 
the serialized DOM Nodes.



-Hoss



Re: Which query parser handles nested queries?

2009-10-25 Thread Chris Hostetter
: http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/).
: When working with Solr 1.3 stable, I'm able to use this syntax
: effectively using the default requestHandler, but when I am
: hand-rolling my own requestHandler, it doesn't recognize the _query_

The magic field _query_ is special syntax of the SolrQueryParser (aka the 
"lucene" QParserPlugin).  (FYI: if you know Java, which I assume you do if 
you're writing your own requestHandler, you can find this by grepping the 
code base for '"_query_"'.)

So if you use SolrQueryParser in your own request handler you should be 
fine ... if you're writing a custom request handler you'll have to add 
that same special handling.
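
For reference, the nested-query syntax discussed here can be assembled as a plain string. The Python sketch below only builds the query text; the helper name nested_query, the field names, and the sample terms are made up for the example.

```python
def nested_query(local_params, query_text):
    """Wrap a sub-query in the _query_ magic field, escaping embedded quotes."""
    escaped = query_text.replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"{!%s}%s"' % (local_params, escaped)


# Combine an ordinary clause with a dismax sub-query.
q = "inStock:true AND " + nested_query("dismax qf=title", "solr rocks")
print(q)
```

The string is only meaningful to a parser that understands _query_, which is why a request handler that does not go through SolrQueryParser will not recognize it.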


-Hoss



Re: field collapsing bug (java.lang.ArrayIndexOutOfBoundsException)

2009-10-25 Thread Martijn v Groningen
Hi Joe,

Can you give a bit more context info? Like the exact search and the
field types you are using for example. Also are you doing a lot of
frequent updates to the index?

Cheers,

Martijn

2009/10/23 Joe Calderon :
> seems to happen when sort on anything besides strictly score, even
> score desc, num desc triggers it, using latest nightly and 10/14 patch
>
> Problem accessing /solr/core1/select. Reason:
>
>    4731592
>
> java.lang.ArrayIndexOutOfBoundsException: 4731592
>        at 
> org.apache.lucene.search.FieldComparator$StringOrdValComparator.copy(FieldComparator.java:660)
>        at 
> org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:235)
>        at 
> org.apache.solr.search.NonAdjacentDocumentCollapser$DocumentPriorityQueue.lessThan(NonAdjacentDocumentCollapser.java:173)
>        at 
> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
>        at 
> org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:95)
>        at 
> org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208)
>        at 
> org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
>        at 
> org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
>        at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:326)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>        at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>        at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)