Re: MoreLikeThis: /solr/mlt NOT_FOUND

2010-06-01 Thread jlist9
That's it. Thank you!
I thought mlt was available by default. I was wrong.

On Tue, Jun 1, 2010 at 8:22 AM, Ahmet Arslan  wrote:
>> I have some experience using MLT with
>> the StandardRequestHandler with Python
>> but I can't figure out how to do it with solrj. It seems
>> that to do
>> MLT with solrj I have
>> to use MoreLikeThisRequestHandler and there seems no way to
>> use
>> StandardRequestHandler for MLT with solrj (please correct
>> me if I'm wrong.)
>>
>> So I try to test it by following this page:
>> http://wiki.apache.org/solr/MoreLikeThisHandler
>> but I get this error:
>>
>> HTTP ERROR: 404
>> NOT_FOUND
>> RequestURI=/solr/mlt
>>
>> Do I need to do something in the config file before I can
>> use MLT?
>
> Did you register /mlt in your solrconfig.xml?
>
> <requestHandler name="/mlt" class="org.apache.solr.handler.MoreLikeThisHandler">
>   <lst name="defaults">
>     <str name="mlt.fl">list</str>
>   </lst>
> </requestHandler>
>
> you can invoke it with SolrQuery.set("qt", "/mlt");
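
For reference, a minimal SolrJ sketch of that invocation (the server URL, seed
document id, and mlt.fl field names below are illustrative assumptions, not
from the original thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MltExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("id:1234"); // seed document
            query.set("qt", "/mlt");                    // route to the MLT handler
            query.set("mlt.fl", "name,description");    // similarity fields
            QueryResponse rsp = server.query(query);
            System.out.println(rsp.getResults());       // the similar documents
        }
    }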


RE: Query related question

2010-06-01 Thread Jonathan Rochkind
One way to do it would be to use the dismax request handler at query time, with a
pf parameter on the same field(s) as your qf parameter, but with a big boost on
the pf.  http://wiki.apache.org/solr/DisMaxRequestHandler

I'm not sure why you're getting matches for "tigers" and "woods" on "tiger
woods", though; your example has the EnglishPorterFilterFactory commented out.
If you actually had that in there, it would explain it, but as it is, I'm not
sure what does. Your synonyms file? That seems odd.

If you WERE using stemming, but wanted un-stemmed results to rank higher, one 
way to do it would be to actually use two different solr fields, one stemmed 
and one not stemmed. And then again use dismax, and boost the un-stemmed field 
higher, in either both qf and pf, or just pf. 
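
For example, a minimal sketch of that two-field setup (the field and type
names are illustrative, not from your schema):

    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="title_unstemmed" type="text_nostem" indexed="true" stored="false"/>
    <copyField source="title" dest="title_unstemmed"/>

and then in the dismax params, boost the un-stemmed field:

    qf=title title_unstemmed^5
    pf=title^2 title_unstemmed^10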

Jonathan

From: iboppana [indrani.bopp...@cmgdigital.com]
Sent: Tuesday, June 01, 2010 10:45 PM
To: solr-user@lucene.apache.org
Subject: Query related question

Hi All,

When I query for a word, say Tiger woods, and sort results by score, I
notice that the results are mixed up: the first 5 results match Tiger woods,
the next 2 match either tiger/tigers or wood/woods, and
the next 2 after that again match tiger woods.

How do I make sure that when searching for words like the above I get all the
results matching the whole search term first, followed by those matching
individual tokens like tiger and woods?

My text fieldtype is defined as follows:

[fieldType XML stripped by the mail archive; per the reply, the analyzers
included a synonyms filter and a commented-out EnglishPorterFilterFactory]
Thanks
 Indrani


Query related question

2010-06-01 Thread iboppana

Hi All,

When I query for a word, say Tiger woods, and sort results by score, I
notice that the results are mixed up: the first 5 results match Tiger woods,
the next 2 match either tiger/tigers or wood/woods, and
the next 2 after that again match tiger woods.

How do I make sure that when searching for words like the above I get all the
results matching the whole search term first, followed by those matching
individual tokens like tiger and woods?

My text fieldtype is defined as follows:

[fieldType XML stripped by the mail archive; per the reply, the analyzers
included a synonyms filter and a commented-out EnglishPorterFilterFactory]
Thanks
 Indrani


Re: Generic question on Query Analyzers

2010-06-01 Thread iboppana

Thanks a lot for the quick responses. I will try it out.


Importing large datasets

2010-06-01 Thread Blargy

We have around 5 million items in our index and each item has a description
located on a separate physical database. These item descriptions vary in
size and for the most part are quite large. Currently we are only indexing
items and not their corresponding description and a full import takes around
4 hours. Ideally we want to index both our items and their descriptions but
after some quick profiling I determined that a full import would take in
excess of 24 hours. 

- How would I profile the indexing process to determine whether the bottleneck is
Solr or our database?
- In either case, how would one speed up this process? Is there a way to run
parallel import processes and then merge them together at the end? Possibly
use some sort of distributed computing?

Any ideas? Thanks


Array of arguments in URL?

2010-06-01 Thread Lance Norskog
In the "/spell" declaration in the example solrconfig.xml, we find
these lines among the default parameters:

<arr name="last-components">
  <str>spellcheck</str>
</arr>

How does one supply such an array of strings in HTTP parameters? Does
Solr have a parsing option for this?

-- 
Lance Norskog
goks...@gmail.com


Re: CFP for Lucene Revolution Conference, Boston, MA October 7 & 8 2010

2010-06-01 Thread Grant Ingersoll
Sorry for the noise, but thought I would send out a reminder to get your talks 
in...

On May 17, 2010, at 8:43 AM, Grant Ingersoll wrote:

> Lucene Revolution Call For Participation - Boston, Massachusetts October 7 & 
> 8, 2010
>  
> The first US conference dedicated to Apache Lucene and Solr is coming to 
> Boston, October 7 & 8, 2010. The conference is sponsored by Lucid Imagination 
> with additional support from community and other commercial co‐sponsors. The 
> audience will include those experienced in Solr and Lucene application
> development, along with those experienced in other enterprise search
> technologies who are interested in becoming more familiar with Solr and
> Lucene technologies and the opportunities they present.
> 
> We are soliciting 45‐minute presentations for the conference.
> 
> Key Dates:
> May 12, 2010       Call For Participation Open
> June 23, 2010      Call For Participation Closes
> June 28, 2010      Speaker Acceptance/Rejection Notification
> October 5‐6, 2010  Lucene and Solr Pre‐conference Training Sessions
> October 7‐8, 2010  Conference Sessions
> 
> 
> Topics of interest include:
> Lucene and Solr in the Enterprise (case studies, implementation, return on 
> investment, etc.)
>  “How We Did It” Development Case Studies
> Spatial/Geo search
>  Lucene and Solr in the Cloud (Deployment cases as well as tutorials)
> Scalability and Performance Tuning
> Large Scale Search
> Real Time Search
> Data Integration/Data Management
> Lucene & Solr for Mobile Applications
> 
> All accepted speakers will qualify for discounted conference admission. 
> Financial assistance is available for speakers that qualify.
> 
> To submit a 45‐minute presentation proposal, please send an email to 
> c...@lucenerevolution.org with Subject containing: <speaker name>, Topic
> <proposed session title>, containing the following information in plain text.
> 
> If you have more than one topic proposed, send a separate email. Do not 
> attach Word or other text file documents.
> 
> Return all fields completed as follows:
> 1. Your full name, title, and organization
> 2. Contact information, including your address, email, phone number
> 3. The name of your proposed session (keep your title simple, interesting,
> and relevant to the topic)
> 4. A 75‐200 word overview of your presentation; in addition to the topic,
> describe whether your presentation is intended as a tutorial, a description
> of an implementation, a theoretical/academic discussion, etc.
> 5. A 100‐200‐word speaker bio that includes prior conference speaking or
> related experience
> To be considered, proposals must be received by 12 Midnight PDT Wednesday, 
> June 23, 2010.
> 
> Please email any general questions regarding the conference to 
> i...@lucenerevolution.org. To be added to the conference mailing list, please 
> email sig...@lucenerevolution.org. If your organization is interested in 
> sponsorship opportunities, email spon...@lucenerevolution.org.
> 
> We look forward to seeing you in Boston!

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Spatial Solr: problem with multiValued PointType

2010-06-01 Thread Darren Govoni
This seems to be a problem (from my limited understanding). I
encountered the same thing.

And the problem is that you can have results that independently match
the constraints for latitude and longitude, but the corresponding points
would not match (i.e. with separate fields there are no longer points!).

Thus, a correct implementation of spatial search cannot have 2
separate fields (one for lat and one for long) and expect to get
accurate results.

Not sure how the Solr team is accounting for this yet.

The Spatial example on the wiki does not work in the current trunk. It
always produces results, no matter what the stored point field has in
relation to the query.

On Tue, 2010-05-04 at 16:20 +0200, pointbreak+s...@ml1.net wrote:

> I want to link documents to multiple spatial points, and filter
> documents based on a bounding box. I was expecting that the
> solr.PointType would help me with that, but run into a problem. When I
> create a filter, it seems that Solr matches the latitude and longitude
> of the PointType separately. Could somebody please advise me if this is
> expected behavior, and if so how to handle this use case.
> 
> My setup is as follows:
> 
> in schema.xml:
> <fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
> <field name="location" type="location" indexed="true" multiValued="true"/>
> 
> I create a document with the following locations:
>52.3672174, 4.9126891  and:
>52.3624717, 4.9106624
> 
> This document will match with the filter:
>location:[52.362,4.911 TO 52.363,4.913]
> 
> I would have expected it not to match, since both locations are outside
> this bounding box (though the longitude of the second point combined with
> the latitude of the first point would, taken together as a point, be inside
> the bounding box).
>
> Thanks for any help.




Re: nested queries, and LocalParams syntax

2010-06-01 Thread Yonik Seeley
Hmmm, well, the lucene query parser does basic backslash escaping, and
so does local params within quoted strings.  You can also use
parameter dereferencing to avoid the need to escape values too.
Like you pointed out, using single quotes in some places can also
help.
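
For example, a minimal sketch of parameter dereferencing (the field names are
illustrative): the local params refer to another request parameter via v=$uq,
so the user input in uq never needs local-params escaping:

    q={!dismax qf='title^10 text' v=$uq}&uq=some "quoted" user input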

But instead of me trying to give you tons of examples that you
probably already understand, start from the assumption that things
will work, and if you come across something that doesn't make sense
(or doesn't work), I can help with that.   Or if you give a single
real example as a general pattern, perhaps we could help figure out
the simplest way to avoid most of the escaping.

-Yonik
http://www.lucidimagination.com



On Tue, Jun 1, 2010 at 6:21 PM, Jonathan Rochkind  wrote:
> I am just trying to figure it out mostly, the particular thing I am trying
> to do is a very general purpose mapper to complex dismax nested queries.  I
> could try to explain it, and we could go back and forth for a while, and
> maybe I could convince you it makes sense to do what I'm trying to do.  But
> mostly I'm just exploring at this point, so I can get a sense of what is
> possible.
>
> So it would be super helpful if someone can help me figure out escaping
> stuff and skip the other part, heh.
>
> But basically, it's a mapper from a "CQL" query (a structured language for
> search-engine-style queries) to Solr, where some of the "fields" searched
> aren't really Solr fields/indexes, but aggregated definitions of dismax
> query params including multiple solr fields, where exactly what solr fields
> and other dismax queries will not be hard-coded, but will be configurable.
>  Thus the use of nested queries. So since it ends up so general purpose and
> abstract, and many of the individual parameters are configurable, thus my
> interest in figuring out proper escaping.
>
> Jonathan
>
> Yonik Seeley wrote:
>>
>> It's not clear if you're just trying to figure it all out, or get
>> something specific to work.
>> If you can give a specific example, we might be able to suggest easier
>> ways to achieve it rather than going escape crazy :-)
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Tue, Jun 1, 2010 at 5:06 PM, Jonathan Rochkind 
>> wrote:
>>
>>>
>>> Thanks, the pointer to that documentation page (which somehow I had
>>> missed),
>>> as well as Chris's response is very helpful.
>>>
>>> The one thing I'm still not sure about, which I might be able to figure
>>> it
>>> out through trial-and-error reverse engineering, is escaping issues when
>>> you
>>> combine nested queries WITH local params. We potentially have a lot of
>>> levels of quotes:
>>>
>>> q= URIescape(    _local_="{!dismax qf=" value that itself contains a \"
>>> quote mark"} "phrase query"    "   )
>>>
>>> Whole bunch of quotes going on there. How do I give this to Solr so all
>>> my
>>> quotes will end up parsed appropriately? Obviously that above example
>>> isn't
>>> right.   We've got the quotes around the _local_ nested query, then we've
>>> got quotes around a LocalParam value, then we've got quotes that might be
>>> IN
>>> the actual literal value of the LocalParam, or quotes that might be in
>>> the
>>> actual literal value of the nested query.  Maybe using single quotes in
>>> some
>>> places but double quotes in others will help, for certain places that can
>>> take single or double quotes?
>>> Thanks very much for any advice, I get confused thinking about this.
>>>
>>> Jonathan
>>>
>>> Chris Hostetter wrote:
>>>

 In addition to yonik's point about the LocalParams wiki page (and please
 let us know if you aren't sure of the answers to any of your questions after
 reading it) I wanted to clear up one thing...

 : Let's start with that not-nested query example.   Can you in fact use it as
 : above, to force dismax handling of the 'q' even if the qt or request handler

 Quick side note: "qt" determines the RequestHandler -- if it's "dismax"
 then you get the DisMaxRequestHandler which in recent versions of solr is
 just a thin subclass of the SearchHandler subclass where the default value
 of "defType" (which is used to pick a QParser) is "dismax" instead of
 "lucene" ... I tried to explain this in a recent blog...

 http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

 ... the key thing to note is that "defType" is a param that is specific to
 SearchHandler -- if you use "qt" to pick some other third party
 RequestHandler, it's not necessarily going to do *anything* and the nested
 params syntax may not work at all.

 : default is something else?  The documentation is confusing: "In standard Solr
 : search handlers, the default type of the main query only may be specified via
 : the defType parameter. The default type of all other query parameters will
 : remain "lucene 

Re: Help me understand query syntax of subqueries

2010-06-01 Thread Chris Hostetter

: Any idea why this query returns 0 records:
: "sexual assault" AND (-obama)
: while this one returns 1400 ?
: "sexual assault" AND -(obama)

in the first one, the parens create a boolean query consisting of a single
negated clause -- but pure negative boolean queries (ie: boolean queries
where every clause is negated) by definition can't match anything -- and
then you say in your outermost query that that boolean query (that matches
nothing) must match (because it's part of an "AND").  this is explained in
your debugging info...

: "sexual assault" AND (-obama), translates to:  +text:"sexual assault"
: +(-text:obama), returns 0 records

In the second query, the negation applies to the entire boolean query 
-- since it contains one positive clause it's optimized away, also visible 
in your debugging info...

: "sexual assault" AND -(obama), translates to:  +text:"sexual assault"
: -text:obama, returns 1400 records


: (-obama), translates to: -text:obama, returns 716295 records
: -(obama), translates to: -text:obama, returns 716295 records

...this is because at the "top level" Solr does allow "pure negative" 
queries by inverting the search for you -- but it can't do that for sub 
queries.
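
(A commonly used workaround for this, not from the original thread, is to add
an explicit match-all clause inside the sub-query so it has a positive clause,
e.g.:

    "sexual assault" AND (*:* -obama)

which behaves like the second query above.)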


-Hoss



Re: Subclassing DIH

2010-06-01 Thread Blargy

I'll give the deletedEntity "trick" a try... igneous 


Re: Inserting shards in overridden SearchComponent prepare method yields null pointer

2010-06-01 Thread Chris Hostetter

Wild shot in the dark: if the list of shards is changed between the prepare
method and the process method of the QueryComponent, that could tickle some
code path that was never expected, and maybe trigger an NPE (ie: looking
up some IDs in a map keyed off of shard, where now the shard is something
that never had a value put in that map) ... so it really depends where
your component is registered in the component list.

Like I said: wild shot in the dark.  You haven't posted a lot of details
(for instance: I assume that stack trace is coming from the shard, not the
collator -- but again that's just a guess)

: Date: Tue, 1 Jun 2010 13:20:20 -0700
: From: Jason Rutherglen 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Inserting shards in overridden SearchComponent prepare method yields 
:  null pointer
: 
: The insert shards code is as follows:
: 
: ModifiableSolrParams modParams = new ModifiableSolrParams(params);
: modParams.set("shards", shards);
: rb.req.setParams(modParams);
: 
: Where shards is a valid single shard pseudo URL.
: 
: Stacktrace:
: 
: HTTP Status 500 - null java.lang.NullPointerException at
: 
org.apache.solr.handler.component.QueryComponent.createRetrieveDocs(QueryComponent.java:497)
: at 
org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:298)
: at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234)
: at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
: 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:342)
: at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:245)
: at
: 



-Hoss



Re: nested queries, and LocalParams syntax

2010-06-01 Thread Jonathan Rochkind
I am just trying to figure it out mostly, the particular thing I am 
trying to do is a very general purpose mapper to complex dismax nested 
queries.  I could try to explain it, and we could go back and forth for
a while, and maybe I could convince you it makes sense to do what I'm 
trying to do.  But mostly I'm just exploring at this point, so I can get 
a sense of what is possible.


So it would be super helpful if someone can help me figure out escaping 
stuff and skip the other part, heh.


But basically, it's a mapper from a "CQL" query (a structured language 
for search-engine-style queries) to Solr, where some of the "fields"
searched aren't really Solr fields/indexes, but aggregated definitions 
of dismax query params including multiple solr fields, where exactly 
what solr fields and other dismax queries will not be hard-coded, but
will be configurable.  Thus the use of nested queries. So since it ends
up so general purpose and abstract, and many of the individual 
parameters are configurable, thus my interest in figuring out proper 
escaping.


Jonathan

Yonik Seeley wrote:

It's not clear if you're just trying to figure it all out, or get
something specific to work.
If you can give a specific example, we might be able to suggest easier
ways to achieve it rather than going escape crazy :-)

-Yonik
http://www.lucidimagination.com



On Tue, Jun 1, 2010 at 5:06 PM, Jonathan Rochkind  wrote:
  

Thanks, the pointer to that documentation page (which somehow I had missed),
as well as Chris's response is very helpful.

The one thing I'm still not sure about, which I might be able to figure it
out through trial-and-error reverse engineering, is escaping issues when you
combine nested queries WITH local params. We potentially have a lot of
levels of quotes:

q= URIescape(_local_="{!dismax qf=" value that itself contains a \"
quote mark"} "phrase query""   )

Whole bunch of quotes going on there. How do I give this to Solr so all my
quotes will end up parsed appropriately? Obviously that above example isn't
right.   We've got the quotes around the _local_ nested query, then we've
got quotes around a LocalParam value, then we've got quotes that might be IN
the actual literal value of the LocalParam, or quotes that might be in the
actual literal value of the nested query.  Maybe using single quotes in some
places but double quotes in others will help, for certain places that can
take single or double quotes?
Thanks very much for any advice, I get confused thinking about this.

Jonathan

Chris Hostetter wrote:


In addition to yonik's point about the LocalParams wiki page (and please
let us know if you aren't sure of the answers to any of your questions after
reading it) I wanted to clear up one thing...

: Let's start with that not-nested query example.   Can you in fact use it as
: above, to force dismax handling of the 'q' even if the qt or request handler

Quick side note: "qt" determines the RequestHandler -- if it's "dismax"
then you get the DisMaxRequestHandler which in recent versions of solr is
just a thin subclass of the SearchHandler subclass where the default value
of "defType" (which is used to pick a QParser) is "dismax" instead of
"lucene" ... i tried to explain this in a recent blog...

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

... the key thing to note is that "defType" is a param that is specific to
SearchHandler -- if you use "qt" to pick some other third party
RequestHandler, it's not necessarily going to do *anything* and the nested
params syntax may not work at all.

: default is something else?  The documentation is confusing: "In standard Solr
: search handlers, the default type of the main query only may be specified via
: the defType parameter. The default type of all other query parameters will
: remain "lucene"."
:
: I _think_ it's trying to say that I _can_, even in a standard search handler,
: force dismax with {!dismax}, I just can't change the type of _other_ query
: parameters -- rather than saying that I _can't_ use {!dismax} to force dismax
: type of 'q' in a "standard search handler".  Yes?

You're right, it is confusing -- the point is that defType changes the
"default QParser type" for the "q" param -- but it doesn't change it for any
other param.  I've improved the wording, but the key to keep in mind is that
that is completely orthogonal to using the local params syntax that you
asked about.

What that documentation is trying to illustrate is that in this request...

  defType=XXX&q=AAA&fq=BBB

...the "XXX" QParser will be used to parse the value "AAA" -- but the
stock "lucene" QParser will be used to parse the "fq" param

Regardless of the value of defType, if you put the local params syntax
({!foo}) at the beginning of a query param, you can force that param to be
parsed the way you wish...

  defType=XXX&q={!foo}AAA&fq={!bar}BBB

...in that example, neither the XXX or "lucene" QParsers are ever used.

Re: wrong lucene package in solr trunk?

2010-06-01 Thread Chris Hostetter

: In order to use the current trunk version of solr, I built it running
: "ant package" in trunk/solr and added the resulting maven artifacts to
: my project.

the trunk is definitely in flux now with the way Lucene & Solr (and the
new "modules" directory) are all designed to be built as one monolithic
release.

Ultimately it should be possible to build the individual pieces separately
(and to a large extent you can already do that) but it doesn't surprise me
at all that the POMs don't make sense at the moment.

Bottom line: your best bet for right now, if you want to build from source,
is to check out the full Lucene trunk
(https://svn.apache.org/repos/asf/lucene/dev/trunk/) instead of just Solr,
and build at the top level -- using all the jars produced instead of
trusting that any of the POMs will be correct.

(The simple fact is even if the POMs were correct, because it's all one
trunk now the Solr POMs would refer to Lucene jars that don't exist in any
repository - because they are all compiled at once)





-Hoss



Re: newbie question on how to batch commit documents

2010-06-01 Thread Chris Hostetter

: CommonsHttpSolrServer.request() resulting in multiple searchers.  My first
: thought was to change the configs for autowarming.  But after looking at the
: autowarm params, I am not sure what can be changed or perhaps a different
: approach is recommened.

even with 0 autowarming (which is what you have) it can still take time to
close/open a searcher on every commit -- which is why a commit per doc is
not usually a good idea (and is *definitely* not a good idea when doing
batch indexing).

most people can get away with just doing one commit after all their docs
have been added (ie: at the end of the batch) but if you've got a lot of
distinct clients doing a lot of parallel indexing and you don't want to
coordinate who is responsible for sending the commit, you can configure
"autocommit" to happen on the solr server...

http://wiki.apache.org/solr/SolrConfigXml#Update_Handler_Section
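
For reference, a minimal autocommit sketch for solrconfig.xml (the thresholds
are illustrative, not a recommendation):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs> <!-- commit after this many added docs -->
        <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
      </autoCommit>
    </updateHandler>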

...but in general you should make sure that your clients sending docs can 
deal with the occasional long delays (or possibly even needing to retry) 
when an occasional commit might block add/delete operations because of an 
expensive segment merge.

-Hoss



Re: Subclassing DIH

2010-06-01 Thread Lukas Kahwe Smith

On 01.06.2010, at 23:35, Chris Hostetter wrote:

> 
> : 
> http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780
> 
> yeah, i remember that thread -- it really seems like a driver issue, but 
> understandable that "fixing the driver" is probably more out of scope than
> "working around it in solr"
> 
> : I never did find a "good" solution to that bug however I did come up with a
> : workaround. I noticed if I removed my deletedPkQuery then the delta-import
> : would work as expected. Obviously I still have the need to delete items out
> : of the index during indexing so I wanted to subclass the DataImportHandler
> : to first update all documents then I would delete all the documents that my
> : deletedPkQuery would have deleted.
> 
> i'm not a DIH expert, but have you considered the possibility of having 
> two 
> distinct "entities" declared in your config, that both refer to the same 
> logical entity -- one that you use for the delta importing, and one that
> you use for the deletedPkQuery?
> 
> I'm not sure if it would work, but based on another recent thread i saw, i 
> think it might...


to me the entire delta-query approach makes no sense, but I digress. Here is a
cut-down version of the config I use to do full imports, deletes and updates:

[dataConfig XML stripped by the mail archive; per the description below, it
parameterized the DSN and defined one query for deletes and another for both
the full import and updates]
As you can see I have parameterized the DSN information. Plus I have one query
defined for the deletes and another one for both the full import and updates.
If clear is set to anything but false, the WHERE condition evaluates to true and
the updated_at would be ignored by pretty much any decent RDBMS; if it's false,
then the updated_at is checked as per usual.
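
For illustration, a hedged reconstruction of the kind of config described
(the driver, table, and column names here are assumptions, not the actual
config):

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
          url="${dataimporter.request.url}"
          user="${dataimporter.request.user}"
          password="${dataimporter.request.password}"/>
      <document>
        <entity name="item" pk="id"
            query="SELECT * FROM item
                   WHERE ('${dataimporter.request.clear}' != 'false'
                          OR updated_at > '${dataimporter.last_index_time}')"
            deletedPkQuery="SELECT id FROM item WHERE deleted = 1"/>
      </document>
    </dataConfig>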

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: SolrException: No such core

2010-06-01 Thread Chris Hostetter

You have to give us more details than that if you expect anyone to have 
a clue what might be going wrong...

* what does your code for initializing solr look like?
* what does your solr home dir look like (ie: what files are in it)?
* what do all of your config files look like?
* what is the full stack trace of these exceptions, and what does your 
code look like around the lines where these stack traces indicate your 
code is interacting with solr?
* etc...

http://wiki.apache.org/solr/UsingMailingLists


: Date: Fri, 28 May 2010 11:19:49 +0200 (CEST)
: From: jfmn...@free.fr
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: SolrException: No such core
: 
: With embedded solr (1.3.0) sometime a SolrException happens. 
: I don't understand why : I have not been able to find a scenario. 
: 
: 
: org.apache.solr.common.SolrException: No such core: core0
: at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
: at 
org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
: at org.apache.solr.client.solrj.SolrServer.deleteById(SolrServer.java:97)
: 
: Regards
: 
: JF
: 



-Hoss



Re: Logs for Java Replication in Solr

2010-06-01 Thread Chris Hostetter
: 
: where can I find more information about a failure of a Java replication
: in Solr 1.4?
: (Dashboard does not seem to be the best place!?)

All the log message are written using the JDK Logging framework, so it 
really depends on your servlet container, and where it's configured to 
write the logs...

http://wiki.apache.org/solr/SolrLogging



-Hoss



Re: Subclassing DIH

2010-06-01 Thread Chris Hostetter

: 
http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

yeah, i remember that thread -- it really seems like a driver issue, but 
understandable that "fixing the driver" is probably more out of scope than
"working around it in solr"

: I never did find a "good" solution to that bug however I did come up with a
: workaround. I noticed if I removed my deletedPkQuery then the delta-import
: would work as expected. Obviously I still have the need to delete items out
: of the index during indexing so I wanted to subclass the DataImportHandler
: to first update all documents then I would delete all the documents that my
: deletedPkQuery would have deleted.

i'm not a DIH expert, but have you considered the possibility of having 
two 
distinct "entities" declared in your config, that both refer to the same 
logical entity -- one that you use for the delta importing, and one that
you use for the deletedPkQuery?

I'm not sure if it would work, but based on another recent thread i saw, i 
think it might...

http://lucene.472066.n3.nabble.com/deleteDocByID-td858903.html#a858951


...in any event, subclassing the entire DataImportHandler definitely seems
like overkill for what you are trying to achieve -- we just need to get
some of the DIH experts to chime in here.

-Hoss



Re: nested queries, and LocalParams syntax

2010-06-01 Thread Yonik Seeley
It's not clear if you're just trying to figure it all out, or get
something specific to work.
If you can give a specific example, we might be able to suggest easier
ways to achieve it rather than going escape crazy :-)

-Yonik
http://www.lucidimagination.com



On Tue, Jun 1, 2010 at 5:06 PM, Jonathan Rochkind  wrote:
> Thanks, the pointer to that documentation page (which somehow I had missed),
> as well as Chris's response is very helpful.
>
> The one thing I'm still not sure about, which I might be able to figure it
> out through trial-and-error reverse engineering, is escaping issues when you
> combine nested queries WITH local params. We potentially have a lot of
> levels of quotes:
>
> q= URIescape(    _local_="{!dismax qf=" value that itself contains a \"
> quote mark"} "phrase query"    "   )
>
> Whole bunch of quotes going on there. How do I give this to Solr so all my
> quotes will end up parsed appropriately? Obviously that above example isn't
> right.   We've got the quotes around the _local_ nested query, then we've
> got quotes around a LocalParam value, then we've got quotes that might be IN
> the actual literal value of the LocalParam, or quotes that might be in the
> actual literal value of the nested query.  Maybe using single quotes in some
> places but double quotes in others will help, for certain places that can
> take single or double quotes?
> Thanks very much for any advice, I get confused thinking about this.
>
> Jonathan
>
> Chris Hostetter wrote:
>>
>> In addition to yonik's point about the LocalParams wiki page (and please
>> let us know if you aren't sure of the answers to any of your questions after
>> reading it) I wanted to clear up one thing...
>>
>> : Let's start with that not-nested query example.   Can you in fact use it as
>> : above, to force dismax handling of the 'q' even if the qt or request handler
>>
>> Quick side note: "qt" determines the RequestHandler -- if it's "dismax"
>> then you get the DisMaxRequestHandler which in recent versions of solr is
>> just a thin subclass of the SearchHandler subclass where the default value
>> of "defType" (which is used to pick a QParser) is "dismax" instead of
>> "lucene" ... i tried to explain this in a recent blog...
>>
>> http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
>>
>> ... the key thing to note is that "defType" is a param that is specific to
>> SearchHandler -- if you use "qt" to pick some other third party
>> RequestHandler, it's not necessarily going to do *anything* and the nested
>> params syntax may not work at all.
>>
>> : default is something else?  The documentation is confusing: "In standard Solr
>> : search handlers, the default type of the main query only may be specified via
>> : the defType parameter. The default type of all other query parameters will
>> : remain "lucene"."
>> :
>> : I _think_ it's trying to say that I _can_, even in a standard search handler,
>> : force dismax with {!dismax}, I just can't change the type of _other_ query
>> : parameters -- rather than saying that I _can't_ use {!dismax} to force dismax
>> : type of 'q' in a "standard search handler".  Yes?
>>
>> You're right, it is confusing -- the point is that defType changes the
>> "default QParser type" for the "q" param -- but it doesn't change it for any
>> other param.  I've improved the wording, but the key to keep in mind is that
>> that is completely orthogonal to using the local params syntax that you
>> asked about.
>>
>> What that documentation is trying to illustrate is that in this request...
>>
>>   defType=XXX&q=AAA&fq=BBB
>>
>> ...the "XXX" QParser will be used to parse the value "AAA" -- but the
>> stock "lucene" QParser will be used to parse the "fq" param
>>
>> Regardless of the value of defType, if you put the local params syntax
>> ({!foo}) at the beginning of a query param, you can force that param to be
>> parsed the way you wish...
>>
>>   defType=XXX&q={!foo}AAA&fq={!bar}BBB
>>
>> ...in that example, neither the XXX or "lucene" QParsers are ever used.
>>
>>
>>
>> -Hoss
>>
>>
>>
>


Re: nested queries, and LocalParams syntax

2010-06-01 Thread Jonathan Rochkind
Thanks, the pointer to that documentation page (which somehow I had 
missed), as well as Chris's response is very helpful.


The one thing I'm still not sure about, which I might be able to figure 
it out through trial-and-error reverse engineering, is escaping issues 
when you combine nested queries WITH local params. We potentially have 
a lot of levels of quotes:


q= URIescape(_local_="{!dismax qf=" value that itself contains a \" 
quote mark"} "phrase query""   )


Whole bunch of quotes going on there. How do I give this to Solr so all 
my quotes will end up parsed appropriately? Obviously that above example 
isn't right.   We've got the quotes around the _local_ nested query, 
then we've got quotes around a LocalParam value, then we've got quotes 
that might be IN the actual literal value of the LocalParam, or quotes 
that might be in the actual literal value of the nested query.  Maybe 
using single quotes in some places but double quotes in others will 
help, for certain places that can take single or double quotes? 


Thanks very much for any advice, I get confused thinking about this.

Jonathan

Chris Hostetter wrote:
In addition to yonik's point about the LocalParams wiki page (and please 
let us know if you aren't sure of the answers to any of your questions 
after reading it) I wanted to clear up one thing...


: Let's start with that not-nested query example.   Can you in fact use it as
: above, to force dismax handling of the 'q' even if the qt or request handler

Quick side note: "qt" determines the RequestHandler -- if it's "dismax" 
then you get the DisMaxRequestHandler which in recent versions of solr is 
just a thin subclass of the SearchHandler subclass where the 
default value of "defType" (which is used to pick a QParser) is "dismax" 
instead of "lucene" ... i tried to explain this in a recent blog...


http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

... the key thing to note is that "defType" is a param that is specific to 
SearchHandler -- if you use "qt" to pick some other third party 
RequestHandler, it's not necessarily going to do *anything* and the 
nested params syntax may not work at all.


: default is something else?  The documentation is confusing: "In standard Solr
: search handlers, the default type of the main query only may be specified via
: the defType parameter. The default type of all other query parameters will
: remain "lucene"."
: 
: I _think_ it's trying to say that I _can_, even in a standard search handler,

: force dismax with {!dismax}, I just can't change the type of _other_ query
: parameters -- rather than saying that I _can't_ use {!dismax} to force dismax
: type of 'q' in a "standard search handler".  Yes?

You're right, it is confusing -- the point is that defType changes the 
"default QParser type" for the "q" param -- but it doesn't change it for 
any other param.  I've improved the wording, but the key to keep in mind 
is that that is completely orthogonal to using the local params syntax 
that you asked about.


What that documentation is trying to illustrate is that in this request...

   defType=XXX&q=AAA&fq=BBB

...the "XXX" QParser will be used to parse the value "AAA" -- but the 
stock "lucene" QParser will be used to parse the "fq" param


Regardless of the value of defType, if you put the local params 
syntax ({!foo}) at the beginning of a query param, you can force that param 
to be parsed the way you wish...


   defType=XXX&q={!foo}AAA&fq={!bar}BBB

...in that example, neither the XXX or "lucene" QParsers are ever used.



-Hoss


  


Inserting shards in overridden SearchComponent prepare method yields null pointer

2010-06-01 Thread Jason Rutherglen
The insert shards code is as follows:

ModifiableSolrParams modParams = new ModifiableSolrParams(params);
modParams.set("shards", shards);
rb.req.setParams(modParams);

Where shards is a valid single shard pseudo URL.

Stacktrace:

HTTP Status 500 - null java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.createRetrieveDocs(QueryComponent.java:497)
at 
org.apache.solr.handler.component.QueryComponent.distributedProcess(QueryComponent.java:298)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:234)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:342)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:245)
at


Re: Enhancing Solr relevance functions through predefined constants

2010-06-01 Thread Prasanna R
On Tue, Jun 1, 2010 at 11:57 AM, Chris Hostetter
wrote:

> :
> : I have a suggestion for improving relevance functions in Solr by way of
> : providing access to a set of pre-defined constants in Solr queries.
> : Specifically, the number of documents indexed, the number of unique terms
> in
> : a field, the total number of terms in a field, etc. are some of the
> : query-time constants that I believe can be made use of in function
> queries
> : as well as boosted queries to aid in the relevance calculations.
>
> I'm not sure if he was inspired by your email or not, but i did notice
> yonik just opened an issue that sounds very similar to this...
>
> https://issues.apache.org/jira/browse/SOLR-1932
>
>
This bug definitely addresses what I had in mind. Glad to see a patch out
for it. I feel this has the potential to become pretty big once we have some
real use cases for it.



> FWIW: number of unique terms in a field is really, really expensive to
> compute (although perhaps we could cache it somewhere)
>

The number of unique terms (and other similar metrics) is pretty much a
query-time constant and we can have it optionally computed and then cached
at the end of every major index build which will make it readily available
for consumption. This will be particularly suited for cases where we have
indexes being built on a node(s) that does not serve traffic and then is
replicated to the servers that handle the traffic.

Prasanna


Re: Enhancing Solr relevance functions through predefined constants

2010-06-01 Thread Chris Hostetter
: 
: I have a suggestion for improving relevance functions in Solr by way of
: providing access to a set of pre-defined constants in Solr queries.
: Specifically, the number of documents indexed, the number of unique terms in
: a field, the total number of terms in a field, etc. are some of the
: query-time constants that I believe can be made use of in function queries
: as well as boosted queries to aid in the relevance calculations.

I'm not sure if he was inspired by your email or not, but i did notice 
yonik just opened an issue that sounds very similar to this...

https://issues.apache.org/jira/browse/SOLR-1932

FWIW: number of unique terms in a field is really, really expensive to 
compute (although perhaps we could cache it somewhere)



-Hoss



Re: Luke browser does not show non-String Solr fields?

2010-06-01 Thread Chris Hostetter

: So it seems like Luke does not understand Solr's long type. This
: is not a native Lucene type?

No, Lucene has no concept of "types" ... there are utilities to help encode
some data in special ways (particularly numbers) but the underlying lucene
index doesn't keep track of when/how you do this -- so Luke has no way of
knowing what "type" the field is.

Schema information is specific to Solr.


-Hoss



RE: DIH, Full-Import, DB and Performance.

2010-06-01 Thread cbennett
Performance is dependent on your server/data and the batchSize. To reduce
the server load, experiment with different batchSize settings. The higher the
batch size, the faster the import and the higher your SQL Server load will
be. Try starting with a small batch size and then gradually increasing it.
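
For reference, a hedged sketch of where batchSize goes in the DIH data source
config (the driver class and URL are illustrative):

    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        url="jdbc:sqlserver://localhost;databaseName=db;selectMethod=cursor"
        user="solr" password="..." batchSize="500"/>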

Colin.

> -Original Message-
> From: stockii [mailto:st...@shopgate.com]
> Sent: Tuesday, June 01, 2010 12:31 PM
> To: solr-user@lucene.apache.org
> Subject: RE: DIH, Full-Import, DB and Performance.
> 
> 
> thx for the reply =)
> 
> 
> i tried out selectMethod="cursor", but the load of the server is getting
> bigger and bigger during an import =(
>
> selectMethod="cursor" only solves the problem with the locking, right?





RE: DIH, Full-Import, DB and Performance.

2010-06-01 Thread stockii

thx for the reply =)


i tried out selectMethod="cursor", but the load of the server is getting bigger
and bigger during an import =(

selectMethod="cursor" only solves the problem with the locking, right?


RE: DIH, Full-Import, DB and Performance.

2010-06-01 Thread cbennett
The settings and defaults will depend on which version of SQL Server you are
using and which version of the JDBC driver.

The default for responseBuffering was changed to adaptive after version 1.2,
so unless you are using 1.2 or earlier you don't need to set it to adaptive.

Also, if I remember correctly, batchSize will only take effect if you are
using cursors; the default is for all data to be sent to the client
(selectMethod is direct).

Using the default settings for the MS sqljdbc driver caused locking issues
in our database. As soon as the full import started shared locks would be
set on all rows and wouldn't be removed until all the data had been sent,
which for us would be around 30 minutes. During that time no updates could
get an exclusive lock which of course led to huge problems.

Setting selectMethod="cursor" solved the problem for us although it does
slow down the full import.

Another option that worked for us was to not set the selectMethod and set
readOnly="true", but be sure you understand the implications. This causes
all data to be sent to the client (which is the default), giving maximum
performance, and causes no locks to be set which resolves the other issues.
However, this sets transaction isolation to TRANSACTION_READ_UNCOMMITTED
which will cause the select statement to ignore any locks when getting data
so the consistency of the data cannot be guaranteed, which may or may not be
an issue depending on your particular situation.
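
For reference, a minimal sketch of the two dataSource variants described above
(the driver class, URL, and credentials are illustrative):

    <!-- cursor-based: avoids long-held shared locks, but slows the import -->
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        url="jdbc:sqlserver://host;databaseName=db;selectMethod=cursor"
        user="solr" password="..."/>

    <!-- read-only: full-speed streaming, no locks, but reads uncommitted data -->
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
        url="jdbc:sqlserver://host;databaseName=db"
        readOnly="true" user="solr" password="..."/>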


Colin.

> -Original Message-
> From: stockii [mailto:st...@shopgate.com]
> Sent: Tuesday, June 01, 2010 7:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH, Full-Import, DB and Performance.
> 
> 
> do you think that the option
> 
> responseBuffer="adaptive"
> 
> should solve my problem ?
> 
> 
> From DIH FAQ ...:
> 
> I'm using DataImportHandler with MS SQL Server database with sqljdbc
> driver.
> DataImportHandler is going out of memory. I tried adjustng the
> batchSize
> values but they don't seem to make any difference. How do I fix this?
> 
> There's a connection property called responseBuffering in the sqljdbc
> driver
> whose default value is "full" which causes the entire result set to be
> fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for
> more
> details. You can set this property to "adaptive" to keep the driver
> from
> getting everything into memory. Connection properties like this can be
> set
> as an attribute (responseBuffering="adaptive") in the dataSource
> configuration OR directly in the jdbc url specified in
> DataImportHandler's
> dataSource configuration.





how to use "q=string" in solrconfig.xml `?

2010-06-01 Thread stockii

hello.

this is my request to solr, and i cannot change it:
http://host/solr/select/?q=string

i cannot change this =( so i have a new termsComponent. i want to use
q=string as the default for terms.prefix=string.

can i do something like this:


<lst name="defaults">
  <str name="terms">true</str>
  <str name="terms.fl">suggest</str>
  <str name="terms.sort">index</str>
  <str name="terms.prefix">${???}</str>
</lst>

it's all working, but i don't know how it's possible to use q=string as
terms.prefix ?!


Re: MoreLikeThis: /solr/mlt NOT_FOUND

2010-06-01 Thread Ahmet Arslan
> I have some experience using MLT with
> the StandardRequestHandler with Python
> but I can't figure out how to do it with solrj. It seems
> that to do
> MLT with solrj I have
> to use MoreLikeThisRequestHandler and there seems no way to
> use
> StandardRequestHandler for MLT with solrj (please correct
> me if I'm wrong.)
> 
> So I try to test it by following this page:
> http://wiki.apache.org/solr/MoreLikeThisHandler
> but I get this error:
> 
> HTTP ERROR: 404
> NOT_FOUND
> RequestURI=/solr/mlt
> 
> Do I need to do something in the config file before I can
> use MLT?

Did you register /mlt in your solrconfig.xml?

<requestHandler name="/mlt" class="org.apache.solr.handler.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">list</str>
  </lst>
</requestHandler>
you can invoke it with SolrQuery.set("qt", "/mlt");


  


Default filter in solr config (+filter document by now for near time index feeling)

2010-06-01 Thread Charton, Andre
Hi,

I have this use case: I update the index every 10 min on a master solr (via
batch) and replicate it to slaves. The clients use the slaves. From the client
view it's ugly: it looks like we change our index only every 10 minutes. The
idea now is to index all documents with an index date, set this index date 10
min in the future, and create a filter "INDEX_DATE:[* TO NOW]".

Question 1: is it possible to set this as part of the solr config, so every
implementation against the server will respect it?

Question 2: from a caching point of view this sounds a little ugly -- has
anybody tried this?
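
For what it's worth, a minimal sketch of what question 1 might look like in
solrconfig.xml, using the appends section of a request handler so the filter
is added to every request (the handler name is illustrative):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="appends">
        <str name="fq">INDEX_DATE:[* TO NOW]</str>
      </lst>
    </requestHandler>

On question 2: NOW has millisecond granularity, so an unrounded filter like
this defeats the filter cache; rounding it, e.g. INDEX_DATE:[* TO NOW/MINUTE],
keeps it cacheable.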

Thanks,

André


Highlighting arbitrary text without really indexing it

2010-06-01 Thread Binesh Gummadi
Hi,

I have a use case where I have to highlight indexed field values in
arbitrary text without indexing arbitrary text.

Here is an example

*Indexed field values are*
Lucid
Platform
Solr

*Arbitrary text (not indexed)*
Lucid Imagination and Guardian News and Media today announced that the
Guardian's Open Platform, commercially launched today, is powered by Solr,
the Lucene Enterprise Search Server. During development of the Content API,
the Guardian tapped Lucid Imagination's deep expertise with Solr and related
technologies, unlocking the Guardian’s content for new online commercial
service models.

*Expected result*
*Lucid* Imagination and Guardian News and Media today announced that
the Guardian's Open *Platform*, commercially launched today, is
powered by *Solr*, the Lucene Enterprise Search Server. During
development of the Content API, the Guardian tapped Lucid Imagination's deep
expertise with *Solr* and related technologies, unlocking the
Guardian's content for new online commercial service models.


One approach is to use FieldAnalysisRequestHandler, which takes three inputs:
1) field name, 2) field value, 3) query. I can provide 1 and 3, but I would
have to query the index to get the field value(s). Querying the index for
values and feeding them to FieldAnalysisRequestHandler doesn't sound like a
good option in my opinion.
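
For reference, a minimal sketch of such a request, assuming the handler is
registered at /analysis/field as in the example solrconfig.xml (the parameter
values are illustrative):

    http://localhost:8983/solr/analysis/field
        ?analysis.fieldname=content
        &analysis.fieldvalue=Lucid+Imagination+and+Guardian+News+...
        &analysis.query=lucid+platform+solr
        &analysis.showmatch=true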

How can I achieve this? Any pointers would be helpful.

Thank You
Binesh Gummadi


wrong lucene package in solr trunk?

2010-06-01 Thread Hannes Korte
Hi,

In order to use the current trunk version of solr, I built it running
"ant package" in trunk/solr and added the resulting maven artifacts to
my project.

Unfortunately the generated pom.xml-files contain the dependency to
lucene-*-2.9.1, but are compiled with the contents of
trunk/solr/lucene-libs.

Running solr, this leads to a NoSuchMethodError in
org.apache.lucene.util.Version, because the 2.9.1 version does not
contain the valueOf-method, which is called from the current solr trunk
code.

Of course I could manually build maven artifacts from the jars contained
in lucene-libs, but this is a rather ugly solution. Does anybody know an
elegant way to handle this?

Best regards,
Hannes


Re: Logs for Java Replication in Solr

2010-06-01 Thread Peter Karich
Hi,

Now we are getting the following exception [1] under
admin/replication/index.jsp and I have no clue what the cause could be
and couldn't find further info about it...

And how can I configure the indices to log into different log files
under the multi-index setup for tomcat [2]?

Regards,
Peter.

> Hi,
>
> where can I find more information about a failure of a Java replication
> in Solr 1.4?
> (Dashboard does not seem to be the best place!?)
>
> Regards,
> Peter.
>
>   

[1]
HTTP Status 500 - org/apache/commons/httpclient/methods/PostMethod
org.apache.jasper.JasperException:
org/apache/commons/httpclient/methods/PostMethod at
org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:460)
at
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:355)
at
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) at
javax.servlet.http.HttpServlet.service(HttpServlet.java:729) at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at
org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:679)
at
org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:461)
at
org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:399)
at
org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:301)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
at


[2]
http://wiki.apache.org/solr/SolrTomcat


Re: HTML encode extracted docs - Problems with solr.HTMLStripCharFilter

2010-06-01 Thread Damian Bursztyn
Did anybody find a way to fix this other than removing the
HTMLStripCharFilter from the analyzer during indexing?

Thanks

On Sat, Mar 13, 2010 at 7:55 PM, Lance Norskog  wrote:

> HTMLStripCharFilter is only in the analyzer: it creates searchable
> terms from the HTML input. The raw HTML is stored and fetched.
>
> There are some bugs in term positions and highlighting. An
> EntityProcessor wrapping the HTMLStripCharFilter would be really
> useful.
>
> On Tue, Mar 9, 2010 at 5:31 AM, Mark Roberts 
> wrote:
> > Sounds like "solr.HTMLStripCharFilter" may work... except, I'm getting a
> > couple of problems:
> >
> > 1) HTML still seems to be getting into my content field
> >
> > All I did was add <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > to the index analyzer for my "text" fieldType.
> >
> >
> > 2) Somehow it seems to have broken my highlighting, I get this error:
> >
> > 'org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token
> wrong exceeds length of provided text sized 3862'
> >
> >
> >
> > Any ideas how I can fix this?
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Lance Norskog [mailto:goks...@gmail.com]
> > Sent: 09 March 2010 04:36
> > To: solr-user@lucene.apache.org
> > Subject: Re: HTML encode extracted docs
> >
> > A Tika integration with the DataImportHandler is in the Solr trunk.
> > With this, you can copy the raw HTML into different fields and process
> > one copy with Tika.
> >
> > If it's just straight HTML, would the HTMLStripCharFilter be good enough?
> >
> > http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
> >
> > On Mon, Mar 8, 2010 at 5:50 AM, Mark Roberts 
> wrote:
> >> I'm uploading .htm files to be extracted - some of these files are
> "include" files that have snippets of HTML rather than fully formed html
> documents.
> >>
> >> solr-cell stores the raw HTML for these items, rather than extracting
> the text. Is there any way I can get solr to encode this content prior to
> storing it?
> >>
> >> At the moment, I have the problem that when the highlighted snippets are
> >> retrieved via search, I need to parse the snippet and HTML encode the bits
> >> of HTML that were indexed, whilst *not* encoding the bits that were added
> >> by the highlighter, which is messy and time consuming.
> >>
> >> Thanks! Mark,
> >>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
"A person who never made a mistake never tried anything new."
Albert Einstein


Re: DIH, Full-Import, DB and Performance.

2010-06-01 Thread stockii

another question ...

we sometimes have a load of over 3.0 on our server, and only from
different tomcat instances. no import is running and not many requests are
sent to solr.

we have 4 cores running for our search.
2 cores have 4 million docs each and the other two cores have around
200,000 docs each.

why is the load so high?


Re: DIH, Full-Import, DB and Performance.

2010-06-01 Thread stockii

do you think that the option

responseBuffering="adaptive"

should solve my problem?


From the DIH FAQ ...:

I'm using DataImportHandler with MS SQL Server database with sqljdbc driver.
DataImportHandler is going out of memory. I tried adjusting the batchSize
values but they don't seem to make any difference. How do I fix this?

There's a connection property called responseBuffering in the sqljdbc driver
whose default value is "full" which causes the entire result set to be
fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more
details. You can set this property to "adaptive" to keep the driver from
getting everything into memory. Connection properties like this can be set
as an attribute (responseBuffering="adaptive") in the dataSource
configuration OR directly in the jdbc url specified in DataImportHandler's
dataSource configuration.
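
If I read that right, the sqljdbc case from the FAQ would then look something
like this (the driver class is the real sqljdbc one, but url/user/password
are just placeholders):

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://localhost;databaseName=shop"
            user="solr" password="secret"
            responseBuffering="adaptive"/>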
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861134.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Interleaving the results

2010-06-01 Thread Geert-Jan Brits
Indeed, it's just a matter of ordering the results on the client side, IFF I
infer correctly from your description that you are guaranteed to get results
from enough different customers from Solr in the first place to do the
interleaving that you describe. (In general this is a pretty big IF.)

So assuming that's the case, you just make sure to return the customerid as
part of the solr result (make sure the customerid is stored), or get the
customerid through other means, e.g. look it up in a db based on the id of
the doc returned.
Finally, simply code the interleaving: for example, throw the results into
something like a Map<CustomerId, List<Result>> and iterate over the map, so
you get the first element of each list, then the 2nd, etc.
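
A minimal round-robin sketch along those lines (the field name "customer_id"
and the use of SolrDocument are assumptions -- adapt to your schema):

import java.util.*;
import org.apache.solr.common.SolrDocument;

public static List<SolrDocument> interleave(List<SolrDocument> results) {
    // bucket the docs per customer, keeping the original (score) order
    Map<Object, List<SolrDocument>> byCustomer =
            new LinkedHashMap<Object, List<SolrDocument>>();
    for (SolrDocument doc : results) {
        Object cid = doc.getFieldValue("customer_id");
        List<SolrDocument> bucket = byCustomer.get(cid);
        if (bucket == null) {
            bucket = new LinkedList<SolrDocument>();
            byCustomer.put(cid, bucket);
        }
        bucket.add(doc);
    }
    // take one doc from each customer in turn until every bucket is empty
    List<SolrDocument> interleaved = new ArrayList<SolrDocument>(results.size());
    while (!byCustomer.isEmpty()) {
        Iterator<List<SolrDocument>> it = byCustomer.values().iterator();
        while (it.hasNext()) {
            List<SolrDocument> bucket = it.next();
            interleaved.add(bucket.remove(0));
            if (bucket.isEmpty()) it.remove();  // this customer is exhausted
        }
    }
    return interleaved;
}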



2010/6/1 NarasimhaRaju 

> Can somebody throw out some ideas on how to achieve interleaving from
> within the application, especially in a distributed setup?
>
>
>  “ There are only 10 types of people in this world:-
> Those who understand binary and those who don’t “
>
>
> Regards,
> P.N.Raju,
>
>
>
>
> 
> From: Lance Norskog 
> To: solr-user@lucene.apache.org
> Sent: Sat, May 29, 2010 3:04:46 AM
> Subject: Re: Interleaving the results
>
> There is no interleaving tool. There is a random number tool. You will
> have to achieve this in your application.
>
> On Fri, May 28, 2010 at 8:23 AM, NarasimhaRaju  wrote:
> > Hi,
> > how to achieve custom ordering of the documents when there is a general
> query?
> >
> > Usecase:
> > Interleave documents from different customers one after the other.
> >
> > Example:
> > Say I have 10 documents in the index belonging to 3 customers
> > (customer_id field in the index) and use the query *:*,
> > so all the documents in the results score the same.
> > But I want the results to be interleaved:
> > one document from each customer should appear before a document from
> > the same customer repeats.
> >
> > is there a way to achieve this ?
> >
> >
> > Thanks in advance
> >
> > R.
> >
> >
> >
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
>
>
>
>


Spatial Query with LatLonType

2010-06-01 Thread Darren Govoni
Hi,
  I read over the SpatialWiki. It wasn't clear how to query for
documents with LatLon fields that reside inside a specific bounding box
(as opposed to within some distance of a point). Simply put, I have a
Google map and want to construct a query for single LatLon fields that
fall inside the map view (between the lat/lon corners).
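
The kind of thing I was hoping for is a plain range over the combined field,
something like this (the field name "position" and the corner values are made
up, and I'm only guessing that LatLonType accepts such a range):

fq=position:[44.9,-93.5 TO 45.1,-93.0]

with the SW corner on the left and the NE corner on the right.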

A range filter over two separate lat/lon fields won't work, because lat and
lon are not separate fields in this case (and that approach doesn't produce
correct results for me anyway).

thanks for any tips.

Darren


DIH, Full-Import, DB and Performance.

2010-06-01 Thread stockii

Hello..

We have about 4 million products in our database and the import takes
about 1.5 hours. During this time the performance of the database is very bad
and our server sometimes crashes. It seems that DIH sends only ONE select
to the db?!? Is that right?

All other processes cannot connect to the db =(...

That's very bad... What is the best solution to make a full-import better,
so that we don't have such problems?!? An import with PHP takes too
long for us!

That's the query:

<entity query="select *
        FROM items_de.shop_items as i, shops as s
        WHERE s.id=i.shop_id AND s.is_active=1 AND s.is_testmode=0 AND parent_id IS
        NULL" >

AND the mappings for the categories (the XML snippet didn't survive the list
archive).
What do you think would make it better? Can the DIH use other options? Does
it make sense to use another batchSize, e.g. batchSize="-1"?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Full-Import-DB-and-Performance-tp861068p861068.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Architecture discussion

2010-06-01 Thread rabahb

Thinking twice about this architecture ...

I'm concerned about the way I'm going to automate the following steps:

A- The slaves would regularly poll Master-core1 for changes
B- A backup of the current index would be created
C- Re-Indexing will happen on Master-core2 
D- When Indexing is done, we'll trigger a swap between Master-core1 and
core2
E- Slaves will then poll and pickup the freshly updated index segments
F- and so on!

This seems to be simple when it's done manually. But I cannot just sit
there and press a button to send the events. To reach that goal, I
realized that one solution would be to have 2 cores on the master side, while
the slaves would only have one core (as previously discussed). We'll just
need to configure the slave polling period (A, E) and send the right HTTP
requests (B, C, D).
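
For the polling in A and E, I assume the standard slave section of the
ReplicationHandler on each slave is all that is needed; the masterUrl and the
interval below are just placeholders:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8983/solr/core1/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>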

Well ok, step A is automated "natively". Easy enough, using the internal
solr capabilities.
But how about B, C, and D? I could do them manually. Wait! I'm not sure my
boss will pay for that.

All right, so I imagine that I should implement a process that will automate
the phases that I would otherwise run manually. This would be an external
process, not based on solr mechanisms.

My questions are:

1/ Can I leverage some solr mechanisms (that is, by configuration only) in
order to reach that goal?
I haven't found how to do it!

2/ Is there any issue with replicating the master's "swapped" index files?
I've seen in the literature that there might be some issues.

3/ If a solr-configuration-based solution does not exist, my first attempt
would be to write a shell-based process that regularly triggers the
events and waits for the end of each phase, polling the current phase status
in order to trigger the next one. Does that sound good to you? Or is there a
better and more elegant way to do the trick when indexing and replication
have to beat at a high pace?
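
For instance, I imagine step D boils down to a single HTTP request against
the CoreAdmin handler, which the script could fire with curl or wget (host,
port and core names are placeholders):

http://master:8983/solr/admin/cores?action=SWAP&core=core1&other=core2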

Thank you.
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860942.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Architecture discussion

2010-06-01 Thread rabahb

Hi Chris,

Thanks for your insights. I totally understand your point about steps 4 and
5. I wanted to control the moment when the swap would happen on the slave
side, but as you say there is no use for that: it would only add complexity
on top of what internal solr mechanisms already provide.

For the replication aspect, I re-read the whole documentation and, with the
light you shed on that topic, I realize that the only problem here is the
huge amount of data that can be passed over the wire, depending on which
segments the indexing updates. As you say, optimizing can have a
devastating effect on the replication phase since, if I have understood
you correctly, it can potentially rewrite all the index segments.

OK! So if I rephrase it, the best strategy in my case is to limit the
optimization phases in order to prioritize replication performance, and to
optimize only when the replication activity is not so crucial, in order
to avoid degrading search performance.

Thank you very much. That helps a lot.






-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Architecture-discussion-tp825708p860767.html
Sent from the Solr - User mailing list archive at Nabble.com.


MoreLikeThis: /solr/mlt NOT_FOUND

2010-06-01 Thread jlist9
I have some experience using MLT with the StandardRequestHandler with Python,
but I can't figure out how to do it with solrj. It seems that to do MLT with
solrj I have to use the MoreLikeThisRequestHandler, and there seems to be no
way to use the StandardRequestHandler for MLT with solrj (please correct me
if I'm wrong).

So I tried to test it by following this page:
http://wiki.apache.org/solr/MoreLikeThisHandler
but I get this error:

HTTP ERROR: 404
NOT_FOUND
RequestURI=/solr/mlt

Do I need to do something in the config file before I can use MLT?

Thanks


Re: Interleaving the results

2010-06-01 Thread NarasimhaRaju
Can somebody throw out some ideas on how to achieve interleaving from within
the application, especially in a distributed setup?


 “ There are only 10 types of people in this world:-
Those who understand binary and those who don’t “ 


Regards, 
P.N.Raju,





From: Lance Norskog 
To: solr-user@lucene.apache.org
Sent: Sat, May 29, 2010 3:04:46 AM
Subject: Re: Interleaving the results

There is no interleaving tool. There is a random number tool. You will
have to achieve this in your application.

On Fri, May 28, 2010 at 8:23 AM, NarasimhaRaju  wrote:
> Hi,
> How can I achieve custom ordering of the documents when there is a general query?
>
> Usecase:
> Interleave documents from different customers one after the other.
>
> Example:
> Say i have 10 documents in the index belonging to 3 customers (customer_id 
> field in the index ) and using query *:*
> so all the documents in the results score the same.
> but i want the results to be interleaved
> one document from the each customer should appear before a document from the 
> same customer repeats ?
>
> is there a way to achieve this ?
>
>
> Thanks in advance
>
> R.
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com



  

Re: newbie question on how to batch commit documents

2010-06-01 Thread olivier sallou
I would additionally suggest using EmbeddedSolrServer for large uploads if
possible; performance is better.
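
A minimal sketch of the embedded setup (the solr home path is a placeholder,
and I assume a single default core):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
    public static void main(String[] args) throws Exception {
        // point solr.solr.home at the directory that contains conf/
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer server = new EmbeddedSolrServer(container, "");
        // then server.add(...) / server.commit() work exactly as with
        // CommonsHttpSolrServer, minus the HTTP round trips
        container.shutdown();
    }
}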

2010/5/31 Steve Kuo 

> I have a newbie question on what is the best way to batch add/commit a
> large collection of document data via solrj. My first attempt was to write
> a multi-threaded application that did the following.
>
> Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
> for (Widget w : widgets) {
>     SolrInputDocument doc = new SolrInputDocument();
>     doc.addField("id", w.getId());
>     doc.addField("name", w.getName());
>     doc.addField("price", w.getPrice());
>     doc.addField("category", w.getCat());
>     doc.addField("srcType", w.getSrcType());
>     docs.add(doc);
>
>     // commit docs to solr server
>     server.add(docs);
>     server.commit();
> }
>
> And I got this exception.
>
> org.apache.solr.common.SolrException:
>
> Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
>
>
> Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
>
>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>at
> org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
>
> The solrj wiki/documents seemed to indicate that the problem was multiple
> threads calling SolrServer.commit(), which in turn called
> CommonsHttpSolrServer.request(), resulting in multiple searchers. My first
> thought was to change the configs for autowarming. But after looking at the
> autowarm params, I am not sure what can be changed, or perhaps a different
> approach is recommended.
>
> <filterCache class="solr.FastLRUCache"
>              size="512"
>              initialSize="512"
>              autowarmCount="0"/>
>
> <queryResultCache class="solr.LRUCache"
>                   size="512"
>                   initialSize="512"
>                   autowarmCount="0"/>
>
> <documentCache class="solr.LRUCache"
>                size="512"
>                initialSize="512"
>                autowarmCount="0"/>
>
> Your help is much appreciated.
>
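
By the way, the error itself comes from calling server.commit() inside the
loop: every commit opens and warms a new searcher, and overlapping commits
from several threads quickly exceed the maxWarmingSearchers limit of 2. A
minimal sketch of the usual fix, reusing the names from your snippet (the
batch size of 1000 is arbitrary):

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (Widget w : widgets) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", w.getId());
    doc.addField("name", w.getName());
    doc.addField("price", w.getPrice());
    doc.addField("category", w.getCat());
    doc.addField("srcType", w.getSrcType());
    docs.add(doc);
    if (docs.size() >= 1000) {  // flush in batches to bound memory
        server.add(docs);
        docs.clear();
    }
}
if (!docs.isEmpty()) {
    server.add(docs);  // flush the remainder
}
server.commit();  // one commit, once all documents are in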