Solr request filter and indexing process

2011-07-30 Thread 于浩
Hello, dear friends,
I have run into a problem while developing with Solr.
My application has to send multiple queries to the Solr server after the
page is loaded. Some of these requests return only statusCode:0 and QTime:0:
Solr has accepted the request, but it does not return any result documents.
If I send the requests one by one manually, each returns its result. But if
I send them in quick succession, I get nothing back except statusCode:0 and
QTime:0.
I think this may be a deliberate strategy in Solr, but I can't find any
documentation or discussion about it on the internet.
I hope you can help me.   (edited on 2011-07-28)

I now have a second problem. I am developing in PHP, so I connect to Solr
through SolrPhpClient (an open-source project on Google Code). Adding many
documents is very slow: adding ten documents to a Solr index takes more
than 5 minutes (because of the commit process).
Can anybody help me?
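
If the slowness really does come from committing after every document, one possible mitigation is to send all documents in a single add message and commit once at the end. The following is a minimal sketch in Python rather than PHP. It only builds the XML update messages; the field names "id" and "title" and the helper names are assumptions for illustration, not part of SolrPhpClient.

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build one Solr XML <add> message containing every document in docs."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_update_messages(docs):
    """Return the messages to POST to the update handler: one <add>, then one <commit/>."""
    return [build_add_message(docs), "<commit/>"]

# Hypothetical documents; the field names are assumptions.
docs = [{"id": i, "title": f"document {i}"} for i in range(10)]
messages = build_update_messages(docs)
```

The point of the sketch is the shape of the traffic: ten documents produce two messages (one add, one commit) instead of ten add/commit pairs.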


Re: fragsize for highlighting

2011-07-30 Thread Frank Chiu
I ended up removing the EdgeNGramFilterFactory and the highlighting seems to
work okay.  Thanks for your help, echoParams is useful.

On Sat, Jul 30, 2011 at 2:07 PM, Ahmet Arslan  wrote:

>
> I suspected that you set fragsize twice, but from what you pasted that's not
> the case.  e.g. f.description_texts.hl.fragsize=100&hl.fragsize=10
>
> However, the response you pasted is not coming from that URL. It would be
> better to see a matching URL and response.
>
> echoParams=all displays all parameters used: both the defaults defined in
> solrconfig.xml and the ones in the URL.
>
> http://wiki.apache.org/solr/CoreQueryParameters#echoParams
>
>
> --- On Sat, 7/30/11, Frank Chiu  wrote:
>
> > From: Frank Chiu 
> > Subject: Re: fragsize for highlighting
> > To: "Ahmet Arslan" 
> > Cc: solr-user@lucene.apache.org
> > Date: Saturday, July 30, 2011, 9:35 PM
> > I'm a bit of a newbie- adding
> > echoParams=all to my querystring isn't
> > yielding additional info (does solr 1.4 support it?).
> > Here's a query (also
> > tried adding hl.fragsize=10):
> >
> >
> http://localhost:8982/solr/select/?fl=*+score&start=0&q=gofish&qf=description_texts&hl.simple.pre=@@@hl@@@&hl.simple.post=@@@endhl@@@&fq=type:(Task)&hl=on&defType=dismax&rows=30&echoParams=all
> >
> > 
> > 
> > 0
> > 3
> > 
> > 10
> > * score
> > 0
> > immanu
> > description_texts
> > @@@hl@@@
> > @@@endhl@@@
> > type:(Task)
> > on
> > dismax
> > 30
> > 
> > 
> >
> > 
> > ...
> > 
> > @@@hl@@@some s@@@endhl@@@uper long piece of text. long
> > interesting stuff and
> > text gofish found
> > 
> > 
> > ...
> > 
> >
> >
> >
> >
> >
> > On Sat, Jul 30, 2011 at 2:58 AM, Ahmet Arslan 
> > wrote:
> >
> > > > Hi, I'm setting hl.fragsize = 10 in
> > > > all my highlighting fragmenters but I'm
> > > > still getting snippets being returned with > 10
> > > > characters (I think I'm
> > > > getting the full text back).  I also tried specifying
> > > > hl.fragsize in the
> > > > querystring, but the same thing happens.  Any idea why
> > > > fragsize is not
> > > > getting picked up?
> > >
> > > Maybe you are setting it twice? What is the output of &echoParams=all?
> > >
> >
>


Re: fragsize for highlighting

2011-07-30 Thread Ahmet Arslan

I suspected that you set fragsize twice, but from what you pasted that's not the
case.  e.g. f.description_texts.hl.fragsize=100&hl.fragsize=10

However, the response you pasted is not coming from that URL. It would be better
to see a matching URL and response.

echoParams=all displays all parameters used: both the defaults defined in
solrconfig.xml and the ones in the URL.

http://wiki.apache.org/solr/CoreQueryParameters#echoParams
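
To make the "set twice" case concrete: a per-field parameter of the form f.<field>.hl.fragsize takes precedence over the global hl.fragsize for that field. The sketch below is only a small Python model of that precedence rule, not Solr code; the function name effective_fragsize is hypothetical, and the query values are taken from the example above.

```python
from urllib.parse import urlencode

def effective_fragsize(params, field, default=100):
    """Return the fragment size that applies to `field`.

    A per-field override (f.<field>.hl.fragsize) wins over the global
    hl.fragsize; `default` is used only when neither is set.
    """
    per_field = params.get(f"f.{field}.hl.fragsize")
    if per_field is not None:
        return int(per_field)
    return int(params.get("hl.fragsize", default))

params = {
    "q": "gofish",
    "hl": "on",
    "hl.fragsize": 10,
    "f.description_texts.hl.fragsize": 100,
    "echoParams": "all",  # ask Solr to echo every parameter it used
}
query_string = urlencode(params)
print(effective_fragsize(params, "description_texts"))  # prints 100
```

With both parameters present, the field description_texts is fragmented at 100 characters even though hl.fragsize=10 is also in the request, which is why the echoed parameter list is worth checking.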


--- On Sat, 7/30/11, Frank Chiu  wrote:

> From: Frank Chiu 
> Subject: Re: fragsize for highlighting
> To: "Ahmet Arslan" 
> Cc: solr-user@lucene.apache.org
> Date: Saturday, July 30, 2011, 9:35 PM
> I'm a bit of a newbie- adding
> echoParams=all to my querystring isn't
> yielding additional info (does solr 1.4 support it?). 
> Here's a query (also
> tried adding hl.fragsize=10):
> 
> http://localhost:8982/solr/select/?fl=*+score&start=0&q=gofish&qf=description_texts&hl.simple.pre=@@@hl@@@&hl.simple.post=@@@endhl@@@&fq=type:(Task)&hl=on&defType=dismax&rows=30&echoParams=all
> 
> 
> 
> 0
> 3
> 
> 10
> * score
> 0
> immanu
> description_texts
> @@@hl@@@
> @@@endhl@@@
> type:(Task)
> on
> dismax
> 30
> 
> 
> 
> 
> ...
> 
> @@@hl@@@some s@@@endhl@@@uper long piece of text. long
> interesting stuff and
> text gofish found
> 
> 
> ...
> 
> 
> 
> 
> 
> 
> On Sat, Jul 30, 2011 at 2:58 AM, Ahmet Arslan 
> wrote:
> 
> > > Hi, I'm setting hl.fragsize = 10 in
> > > all my highlighting fragmenters but I'm
> > > still getting snippets being returned with > 10
> > > characters (I think I'm
> > > getting the full text back).  I also tried specifying
> > > hl.fragsize in the
> > > querystring, but the same thing happens.  Any idea why
> > > fragsize is not
> > > getting picked up?
> >
> > Maybe you are setting it twice? What is the output of &echoParams=all?
> >
>


Re: Solr Incremental Indexing

2011-07-30 Thread Alexei Martchenko
I always have a field in my databases called datelastmodified, so whenever I
update a record, I set it to getdate() (an MSSQL function) and then fetch the
most recently modified records ordered by that field.
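
A minimal sketch of that idea, assuming an in-memory SQLite table in place of MSSQL. The table name records, the title column, and the helper fetch_changed_rows are hypothetical; only the datelastmodified column comes from the description above.

```python
import sqlite3

def fetch_changed_rows(conn, last_indexed):
    """Return rows modified after the last indexing run, oldest first."""
    cur = conn.execute(
        "SELECT id, title, datelastmodified FROM records "
        "WHERE datelastmodified > ? ORDER BY datelastmodified",
        (last_indexed,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, datelastmodified TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "old", "2011-07-28 10:00:00"),
        (2, "changed", "2011-07-30 09:00:00"),
        (3, "new", "2011-07-30 12:00:00"),
    ],
)

last_indexed = "2011-07-29 00:00:00"
changed = fetch_changed_rows(conn, last_indexed)

# Remember the newest timestamp seen, to use as the cutoff for the next run.
if changed:
    last_indexed = changed[-1][2]
```

Only the rows in changed would then be re-sent to Solr, so each run touches just the records modified since the previous one.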

2011/7/29 Mohammed Lateef Hussain 

> Hi
>
> Need some help with a Solr incremental indexing approach.
>
> I have built my Solr index using the SolrJ API and now want to update the
> index whenever any change is made in the
> database. My requirement is not to use DB triggers to call any update
> events.
>
> I want to update my index on the fly whenever my application updates any
> record in the database.
>
> Note: My indexing logic to get the required data from the DB is somewhat
> complex and involves many tables.
>
> Please suggest how I can proceed here.
>
> Thanks
> Lateef
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: fragsize for highlighting

2011-07-30 Thread Frank Chiu
I'm a bit of a newbie- adding echoParams=all to my querystring isn't
yielding additional info (does solr 1.4 support it?).  Here's a query (also
tried adding hl.fragsize=10):

http://localhost:8982/solr/select/?fl=*+score&start=0&q=gofish&qf=description_texts&hl.simple.pre=@@@hl@@@&hl.simple.post=@@@endhl@@@&fq=type:(Task)&hl=on&defType=dismax&rows=30&echoParams=all



0
3

10
* score
0
immanu
description_texts
@@@hl@@@
@@@endhl@@@
type:(Task)
on
dismax
30




...

@@@hl@@@some s@@@endhl@@@uper long piece of text. long interesting stuff and
text gofish found


...






On Sat, Jul 30, 2011 at 2:58 AM, Ahmet Arslan  wrote:

> > Hi, I'm setting hl.fragsize = 10 in
> > all my highlighting fragmenters but I'm
> > still getting snippets being returned with > 10
> > characters (I think I'm
> > getting the full text back).  I also tried specifying
> > hl.fragsize in the
> > querystring, but the same thing happens.  Any idea why
> > fragsize is not
> > getting picked up?
>
> Maybe you are setting it twice? What is the output of &echoParams=all?
>


Re: slow highlighting because of stemming

2011-07-30 Thread Michael Sokolov

On 7/30/2011 3:46 AM, Orosz György wrote:

> Hi,
>
> Thanks for the answer!
> I am doing some logging about stemming, and what I can see is that a lot of
> tokens are stemmed for the highlighting. That is the strange part, since I
> don't understand why any highlighter needs stemming again.

Consider that the highlighter needs to match terms from the query with
terms from the document, just like search. If the indexed document has
been stemmed, then the query also needs to be stemmed, or you won't see
matches.
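
A toy illustration of this point, in Python. The suffix stripper below is a deliberately crude stand-in for a real stemmer, and both function names are hypothetical; the sketch only shows that the text being highlighted has to go through the same analysis as the query for terms to line up.

```python
def toy_stem(token):
    """Very crude suffix stripper, for illustration only (not Solr's stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def highlight(text, query_terms, analyze):
    """Wrap each word of `text` whose analyzed form matches an analyzed query term."""
    wanted = {analyze(term.lower()) for term in query_terms}
    out = []
    for word in text.split():
        hit = analyze(word.lower()) in wanted
        out.append(f"<em>{word}</em>" if hit else word)
    return " ".join(out)

text = "searching highlights"
with_stemming = highlight(text, ["search"], toy_stem)
without_stemming = highlight(text, ["search"], lambda token: token)
```

With the stemmer applied to both sides, "searching" is marked as a match for the query "search"; with no analysis of the text, nothing is highlighted.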


-Mike


Re: Autocomplete with Solr 3.1

2011-07-30 Thread O. Klein
According to
http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/
it should be possible to set spellcheck.maxCollations to 5.

This doesn't work for me in 4.0, nor does it work with the regular
spellchecker, unless I set spellcheck.maxCollationTries to a value like 10.

Then I get a list of collations.

However adding these parameters to the suggester doesn't do anything.

Is this common behavior? Or is my Solr borked?
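
For reference, the parameter combination described above can be written out as a request URL. This is only a sketch in Python: the host, port, handler path and query text are assumptions, and it shows the request shape without saying anything about whether the suggester honours these parameters.

```python
from urllib.parse import urlencode

params = {
    "q": "some query",                    # hypothetical query text
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollations": 5,        # how many collations to return
    "spellcheck.maxCollationTries": 10,   # how many candidates to test against the index
}

# The base URL is an assumption; substitute the handler the spellchecker is attached to.
url = "http://localhost:8983/solr/select?" + urlencode(params)
```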

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3211775.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fragsize for highlighting

2011-07-30 Thread Ahmet Arslan
> Hi, I'm setting hl.fragsize = 10 in
> all my highlighting fragmenters but I'm
> still getting snippets being returned with > 10
> characters (I think I'm
> getting the full text back).  I also tried specifying
> hl.fragsize in the
> querystring, but the same thing happens.  Any idea why
> fragsize is not
> getting picked up?

Maybe you are setting it twice? What is the output of &echoParams=all?


Re: slow highlighting because of stemming

2011-07-30 Thread Ahmet Arslan
> I am doing some logging about stemming, and what I can see is that a lot of
> tokens are stemmed for the highlighting. That is the strange part, since I
> don't understand why any highlighter needs stemming again.

Highlighting does re-analyze the text being highlighted.

> Anyway, my documents are not really large, just a few kilobytes, but thanks
> for the suggestion.
>
> If you could help me with how to skip the stemming for highlighting, that
> would be great!

If you store term vectors, this re-analysis is skipped.
http://wiki.apache.org/solr/FieldOptionsByUseCase


fragsize for highlighting

2011-07-30 Thread Frank Chiu
Hi, I'm setting hl.fragsize = 10 in all my highlighting fragmenters but I'm
still getting snippets being returned with > 10 characters (I think I'm
getting the full text back).  I also tried specifying hl.fragsize in the
querystring, but the same thing happens.  Any idea why fragsize is not
getting picked up?
Thanks!


Re: slow highlighting because of stemming

2011-07-30 Thread Orosz György
Hi,

Thanks for the answer!
I am doing some logging about stemming, and what I can see is that a lot of
tokens are stemmed for the highlighting. That is the strange part, since I
don't understand why any highlighter needs stemming again.
Anyway, my documents are not really large, just a few kilobytes, but thanks
for the suggestion.

If you could help me with how to skip the stemming for highlighting, that
would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov 

> I'm not sure I would identify stemming as the culprit here.
>
> Do you have very large documents?  If so, there is a patch for FVH
> committed to limit the number of phrases it looks at; see hl.phraseLimit,
> but this won't be available until 3.4 is released.


> You can also limit the amount of each document that is analyzed by the
> regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to
> FVH? not sure)
>
> Using RegexFragmenter is also probably slower than something like
> SimpleFragmenter.
>
> There is work to implement faster highlighting for Solr/Lucene, but it
> depends on some basic changes to the search architecture so it might be a
> while before that becomes available.  See
> https://issues.apache.org/jira/browse/LUCENE-3318 if
> you're interested in following that development.
>
> -Mike
>
>
> On 07/29/2011 04:55 AM, Orosz György wrote:
>
>> Dear all,
>>
>> I am quite new about using Solr, but would like to ask your help.
>> I am developing an application which should be able to highlight the
>> results
>> of a query. For this I am using regex fragmenter:
>> 
>>> class="org.apache.solr.highlight.RegexFragmenter">
>> 
>>   500
>>   0.5
>>   <**/str>
>>  

Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-07-30 Thread Prasanna R
We use a dismax handler with mm 1 in our Solr installation. I have a
fieldType defined that creates shingles to handle space variations in the
input strings and user queries. This fieldType can successfully handle cases
where the query is 'thunderbolt' and the document contains the string
'thunder bolt' (the shingle results in the token 'thunderbolt' created
during indexing).  However, due to the pre-analysis whitespace tokenization
done by lucene query parser, the reverse is not handled well - document with
string 'thunderbolt' being matched to query 'thunder bolt'.
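
The asymmetry can be sketched with a simplified model in Python. The shingles function below is a hypothetical stand-in for the shingle fieldType (it joins adjacent tokens with no separator, as described above), and the whitespace split stands in for the query parser's pre-analysis tokenization.

```python
def shingles(tokens, size=2):
    """Return the original tokens plus concatenated shingles of `size` adjacent tokens."""
    joined = ["".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
    return set(tokens) | set(joined)

# Index time: the whole field value goes through the analyzer at once.
indexed_a = shingles("thunder bolt".split())  # document containing 'thunder bolt'
indexed_b = shingles("thunderbolt".split())   # document containing 'thunderbolt'

# Query time: the parser splits on whitespace first, so each word is
# analyzed on its own and no two-word shingle can ever be formed.
query_terms = set()
for word in "thunder bolt".split():
    query_terms |= shingles([word])
```

Here the query 'thunderbolt' finds the first document because 'thunderbolt' is in indexed_a, but the query 'thunder bolt' yields only the terms 'thunder' and 'bolt', neither of which is in indexed_b.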

I find that in our dismax handler the shingle field records a match and
scores on the 'pf' but the document is not returned as none of the fields in
'qf' record a match (mm is 1). I am looking for suggestions on how to handle
this scenario. Using a synonym will obviously work but it seems a rather
hackish solution. Is there a more elegant way of achieving a similar effect?


Alternatively, is there a way to get the 'mm' parameter to factor in matches
on 'pf' also?

Kindly help.

Regards,

Prasanna