RE: Document row in solr Result

2011-09-12 Thread Pierre GOSSE
Hi Eric,

If you want a query informing one customer of its product's row at any given 
time, the easiest way is to filter on submission dates greater than this 
customer's and return the result count. If you have 500 products with a 
later submission date, your row number is 501.
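
With the field names from Eric's example quoted below, that count can be 
fetched in a single request - a sketch (the date syntax assumes a Solr date 
field; the {} brackets are the exclusive range form):

q=category:iphone AND submissiondate:{2011-08-11T17:22:00Z TO *}
rows=0

The numFound of the response is the number of products with a later 
submission date; with the sort on submissiondate desc, the product's row is 
numFound + 1.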

Hope this helps,

Pierre


-----Original Message-----
From: Eric Grobler [mailto:impalah...@googlemail.com] 
Sent: Monday, September 12, 2011 11:00
To: solr-user@lucene.apache.org
Subject: Re: Document row in solr Result

Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that:
"your product is on the last page of the search result. However, click here
to put your product back on the first page..."


Here is an example:
I have a phone with productid 635001 in the iphone category.
When I sort this category by submissiondate this product will be near the
end of the result (on row 9863 in this example).
At the moment I have to scan nearly 10000 rows in the client to determine
the position of this product.
Is there a more efficient way to find the position of a specific document in
a resultset without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

 row  productid  submissiondate
   1  656569     2011-09-12 08:12
   2  656468     2011-09-12 08:03
   3  656201     2011-09-11 23:41
 ...
9863  635001     2011-08-11 17:22
 ...
9922  634423     2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna wrote:

> You might not be able to find the row index.
> Can you post your query in detail. The kind of inputs and outputs you are
> expecting.
>
> On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler wrote:
>
> > Hi Manish,
> >
> > Thanks for your reply - but how will that return me the row index of the
> > original query.
> >
> > Regards
> > Ericz
> >
> > > On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna wrote:
> >
> > > fq -> filter query parameter searches within the results.
> > >
> > > > On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <impalah...@googlemail.com> wrote:
> > >
> > > > Hi Solr experts,
> > > >
> > > > If you have a site with products sorted by submission date, the product of a
> > > > customer might be on page 1 on the first day, and then move down to page x
> > > > as other customers submit newer entries.
> > > >
> > > > To find the row of a product you can of course run the query and loop
> > > > through the result until you find the specific productid like:
> > > > q=category:myproducttype
> > > > fl=productid
> > > > sort=submissiondate desc
> > > > rows=10000
> > > >
> > > > But is there perhaps a more efficient way to do this? Maybe a special
> > > > syntax to search within the result.
> > > >
> > > > Thanks
> > > > Ericz
> > > >
> > >
> >
>


RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Merging does not happen often enough to keep deleted documents to a low enough 
count?

Maybe there's a need to have "partial" optimization available in Solr, meaning 
that segments with too many deleted documents could be copied to a new file 
without the unnecessary data. That way, cleaning deleted data could be compatible 
with having light replications.
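
For what it's worth, something close can already be requested at commit time - 
a sketch, assuming the expungeDeletes flag accepted on commit since Solr 1.4, 
which merges away segments holding deleted documents:

<commit expungeDeletes="true"/>

posted as an XML message to the /update handler.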

I'm worried by this idea of deleted documents influencing relevance scores; any 
pointer to how important this influence may be?

Pierre

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, July 22, 2011 16:42
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
> I've read that in a thread titled "Weird optimize performance degradation", 
> where Erick Erickson states that "Older versions of Lucene would search 
> faster on an optimized index, but this is no longer necessary.", and more 
> recently in a thread you initiated a month ago, "Question about optimization".
>
> I'll also be very interested if anyone has a more precise idea/data of the 
> benefits and tradeoffs of optimize vs merge ...

My most recent testing has been with Solr 3.2.0.  I have noticed some 
speedup after optimizing an index, but the gain is not 
earth-shattering.  My index consists of 7 shards.  One of them is small, 
and receives all new documents every two minutes.  The others are large, 
and aside from deletes, are mostly static.  Once a day, the oldest data 
is distributed from the small shard to its proper place in the other six 
shards.

The small shard is optimized once an hour, and usually takes less than a 
minute.  I optimize one large shard every day, so each one gets 
optimized once every six days.  That optimize takes 10-15 minutes.  The 
only reason that I optimize is to remove deleted documents; whatever 
speedup I get is just icing on the cake.  Deleted documents take up 
space and continue to influence the relevance scoring of queries, so I 
want to remove them.

Thanks,
Shawn



RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Hi Mark

I've read that in a thread titled "Weird optimize performance degradation", 
where Erick Erickson states that "Older versions of Lucene would search faster 
on an optimized index, but this is no longer necessary.", and more recently in 
a thread you initiated a month ago, "Question about optimization".

I'll also be very interested if anyone has a more precise idea/data of the 
benefits and tradeoffs of optimize vs merge ...

Pierre


-----Original Message-----
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com] 
Sent: Friday, July 22, 2011 15:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?
"I've read here that optimization is not always a requirement to have an
efficient index, due to some low level changes in lucene 3.xx"

Marc.

On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE wrote:

> Solr will respond to searches during optimization, but commits will have to
> wait until the end of the optimization process.
>
> During optimization a new index is generated on disk by merging every
> single file of the current index into one big file, so your server will be
> busy, especially regarding disk access. This may alter your response time
> and can have a very negative effect on the replication of the index if you have a
> master/slave architecture.
>
> I've read here that optimization is not always a requirement to have an
> efficient index, due to some low level changes in Lucene 3.x, so maybe you
> don't really need optimization. What version of Solr are you using? Maybe
> someone can point toward a relevant link about optimization other than the Solr
> wiki:
> http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
>
> Pierre
>
>
> -----Original Message-----
> From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
> Sent: Friday, July 22, 2011 12:45
> To: solr-user@lucene.apache.org
> Subject: Re: commit time and lock
>
> Thanks for the clarity.
>
> One more thing I want to know about optimization.
>
> Right now I am planning to optimize the server every 24 hours. Optimization
> also takes time (last time it took around 13 minutes), so I want to know:
>
> 1. When optimization is under process, will the Solr server respond or not?
> 2. If the server will not respond, how can optimization be done faster, or is
> there another way to do it so our users won't have to wait for the
> optimization process to finish?
>
> regards
> Jonty
>
>
>
> On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE wrote:
>
> > Solr still responds to search queries during a commit; only new indexing
> > requests will have to wait (until the end of the commit?). So I don't think your
> > users will experience increased response time during commits (unless your
> > server is much undersized).
> >
> > Pierre
> >
> > -----Original Message-----
> > From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
> > Sent: Thursday, July 21, 2011 20:27
> > To: solr-user@lucene.apache.org
> > Subject: Re: commit time and lock
> >
> > Actually I'm worried about the response time. I'm committing around 500
> > docs every 5 minutes. As I know (correct me if I'm wrong), at the
> > time of committing the Solr server stops responding. My concern is how to
> > minimize the response time so users don't need to wait, or whether some other
> > logic is required for my case. Please suggest.
> >
> > regards
> > jonty
> >
> > On Tuesday, June 21, 2011, Erick Erickson wrote:
> > > What is it you want help with? You haven't told us what the
> > > problem you're trying to solve is. Are you asking how to
> > > speed up indexing? What have you tried? Have you
> > > looked at: http://wiki.apache.org/solr/FAQ#Performance?
> > >
> > > Best
> > > Erick
> > >
> > > On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods wrote:
> > >> I am using SolrJ to index the data. I have around 5000 docs indexed. As
> > >> the server stops giving responses at commit time due to the lock, I was
> > >> calculating the commit time:
> > >>
> > >> double starttemp = System.currentTimeMillis();
> > >> server.add(docs);
> > >> server.commit();
> > >> System.out.println("total time in commit = " +
> > >> (System.currentTimeMillis() - starttemp)/1000);
> > >>
> > >> It is taking around 9 seconds to commit the 5000 docs with 15 fields. However,
> > >> I cannot confirm whether the index lock starts
> > >> at server.add(docs) time or only at server.commit() time.
> > >>
> > >> If I am changing from above to following
> > >>
> > >> server.add(docs);
> > >> double starttemp = System.currentTimeMillis();
> > >> server.commit();
> > >> System.out.println("total time in commit = " +
> > >> (System.currentTimeMillis() - starttemp)/1000);
> > >>
> > >> then the commit time becomes less than 1 second. I am not sure which one is
> > >> right.
> > >>
> > >> please help.
> > >>
> > >> regards
> > >> Jonty
> > >>
> > >
> >
>


RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr will respond to searches during optimization, but commits will have to 
wait until the end of the optimization process.

During optimization a new index is generated on disk by merging every single 
file of the current index into one big file, so your server will be busy, 
especially regarding disk access. This may alter your response time and can have a 
very negative effect on the replication of the index if you have a master/slave 
architecture.

I've read here that optimization is not always a requirement to have an 
efficient index, due to some low level changes in Lucene 3.x, so maybe you 
don't really need optimization. What version of Solr are you using? Maybe 
someone can point toward a relevant link about optimization other than the Solr 
wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
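
For reference, an optimize is itself just an update message - a sketch, posted 
to the /update handler (the waitSearcher attribute assumed; with false, the 
call returns before the new searcher is registered):

<optimize waitFlush="true" waitSearcher="false"/>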

Pierre


-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com] 
Sent: Friday, July 22, 2011 12:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Thanks for the clarity.

One more thing I want to know about optimization.

Right now I am planning to optimize the server every 24 hours. Optimization
also takes time (last time it took around 13 minutes), so I want to know:

1. When optimization is under process, will the Solr server respond or not?
2. If the server will not respond, how can optimization be done faster, or is
there another way to do it so our users won't have to wait for the
optimization process to finish?

regards
Jonty



On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE wrote:

> Solr still responds to search queries during a commit; only new indexing
> requests will have to wait (until the end of the commit?). So I don't think your
> users will experience increased response time during commits (unless your
> server is much undersized).
>
> Pierre
>
> -----Original Message-----
> From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
> Sent: Thursday, July 21, 2011 20:27
> To: solr-user@lucene.apache.org
> Subject: Re: commit time and lock
>
> Actually I'm worried about the response time. I'm committing around 500
> docs every 5 minutes. As I know (correct me if I'm wrong), at the
> time of committing the Solr server stops responding. My concern is how to
> minimize the response time so users don't need to wait, or whether some other
> logic is required for my case. Please suggest.
>
> regards
> jonty
>
> On Tuesday, June 21, 2011, Erick Erickson wrote:
> > What is it you want help with? You haven't told us what the
> > problem you're trying to solve is. Are you asking how to
> > speed up indexing? What have you tried? Have you
> > looked at: http://wiki.apache.org/solr/FAQ#Performance?
> >
> > Best
> > Erick
> >
> > On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods wrote:
> >> I am using SolrJ to index the data. I have around 5000 docs indexed. As
> >> the server stops giving responses at commit time due to the lock, I was
> >> calculating the commit time:
> >>
> >> double starttemp = System.currentTimeMillis();
> >> server.add(docs);
> >> server.commit();
> >> System.out.println("total time in commit = " +
> >> (System.currentTimeMillis() - starttemp)/1000);
> >>
> >> It is taking around 9 seconds to commit the 5000 docs with 15 fields. However,
> >> I cannot confirm whether the index lock starts
> >> at server.add(docs) time or only at server.commit() time.
> >>
> >> If I am changing from above to following
> >>
> >> server.add(docs);
> >> double starttemp = System.currentTimeMillis();
> >> server.commit();
> >> System.out.println("total time in commit = " +
> >> (System.currentTimeMillis() - starttemp)/1000);
> >>
> >> then the commit time becomes less than 1 second. I am not sure which one is
> >> right.
> >>
> >> please help.
> >>
> >> regards
> >> Jonty
> >>
> >
>


RE: commit time and lock

2011-07-22 Thread Pierre GOSSE
Solr still responds to search queries during a commit; only new indexing 
requests will have to wait (until the end of the commit?). So I don't think your users 
will experience increased response time during commits (unless your server is 
much undersized).

Pierre

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com] 
Sent: Thursday, July 21, 2011 20:27
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Actually I'm worried about the response time. I'm committing around 500
docs every 5 minutes. As I know (correct me if I'm wrong), at the
time of committing the Solr server stops responding. My concern is how to
minimize the response time so users don't need to wait, or whether some other
logic is required for my case. Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson wrote:
> What is it you want help with? You haven't told us what the
> problem you're trying to solve is. Are you asking how to
> speed up indexing? What have you tried? Have you
> looked at: http://wiki.apache.org/solr/FAQ#Performance?
>
> Best
> Erick
>
> On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods wrote:
>> I am using SolrJ to index the data. I have around 5000 docs indexed. As
>> the server stops giving responses at commit time due to the lock, I was
>> calculating the commit time:
>>
>> double starttemp = System.currentTimeMillis();
>> server.add(docs);
>> server.commit();
>> System.out.println("total time in commit = " + (System.currentTimeMillis() -
>> starttemp)/1000);
>>
>> It is taking around 9 seconds to commit the 5000 docs with 15 fields. However,
>> I cannot confirm whether the index lock starts
>> at server.add(docs) time or only at server.commit() time.
>>
>> If I am changing from above to following
>>
>> server.add(docs);
>> double starttemp = System.currentTimeMillis();
>> server.commit();
>> System.out.println("total time in commit = " + (System.currentTimeMillis() -
>> starttemp)/1000);
>>
>> then the commit time becomes less than 1 second. I am not sure which one is
>> right.
>>
>> please help.
>>
>> regards
>> Jonty
>>
>
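
For reference, a self-contained sketch that times the two steps separately 
(SolrJ 1.4 API assumed; the "id" field and the 5000-doc batch are illustrative). 
server.add() already performs the network transfer and analysis, which is one 
candidate for where the 9 seconds go:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitTiming {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 5000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i)); // "id" uniqueKey assumed
            docs.add(doc);
        }
        long t0 = System.currentTimeMillis();
        server.add(docs);   // network transfer + analysis happen here
        long t1 = System.currentTimeMillis();
        server.commit();    // flush to disk + open a new searcher happen here
        long t2 = System.currentTimeMillis();
        System.out.println("add: " + (t1 - t0) + " ms, commit: " + (t2 - t1) + " ms");
    }
}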


RE: What is the different?

2011-07-22 Thread Pierre GOSSE
Hi,

Have you checked the queries by using the debugQuery=true parameter? This could 
give some hints about what is searched in both cases.
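
For example, query (2) with debug output turned on:

q=(change management) AND domain_ids:(0^1.3 OR 1)
debugQuery=true

The parsedquery and explain sections of the response show how each clause was 
parsed and scored, which should make the difference between (1) and (2) visible.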

Pierre

-----Original Message-----
From: cnyee [mailto:yeec...@gmail.com] 
Sent: Friday, July 22, 2011 05:14
To: solr-user@lucene.apache.org
Subject: What is the different?

Hi,

I have two queries:

(1) q = (change management)
(2) q = (change management) AND domain_ids:(0^1.3 OR 1)

The purpose of (2) is to boost the records with domain_ids=0.
In my database all records have domain_ids = 0 or 1, so domain_ids:(0 OR 1)
will always return the full database.

Now my question is: query (2) returns 5000+ results, but query (1) returns
700+ results.

Can somebody enlighten me on the reasons behind such a vast
difference in the number of results?

Many thanks in advance.

Yee



RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
It is redundancy. You have to balance the cost of redundancy against the cost in 
performance of your web index being queried by your windows service. If your 
windows service is not too aggressive in its requests, go for shards.
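
A sketch of what the windows service's combined search could look like with 
shards (host and core names are assumptions):

http://host1:8983/solr/website/select?q=keyword&shards=host1:8983/solr/website,host2:8983/solr/rest

The website itself would query only its own core directly, so its response time 
would not depend on the second index.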

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 15:05
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

But in case the website docs contribute around 50% of the entire docs, why 
recreate the indexes? Don't you think it's redundancy?
Can two web apps (Solr instances) share a single index file to search on 
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for website docs would be the 
best. If you fear that requests from the windows service would cripple your web 
site performance, why not have a totally separate index on another server, 
and have your website documents indexed in both indexes?

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 13:14
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
The website will enable any user to search the document repository, 
and the set they search on is known as website-presentable.
2. Windows service 
The windows service will search all the documents in the repository 
for a fixed set of key words and store the found results in a database. This set 
is the universal set of documents in the doc repository, including the 
website-presentable ones.


The website is a high-priority app which should work smoothly without any 
interference, whereas the windows service should run all day long continuously 
without a break to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the windows 
service requests to SOLR to slow down website requests.

Suppose I am segregating the website-presentable docs index into a particular 
core and the rest of them into a different core; will it solve the problem?
I have also read about multiple ports for listening to requests from different 
apps; can this be used?



Regards,
JAME VAALET


-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with a subset of your index, 
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point to an adequate 
solution.

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 11:10
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any 
mechanism to limit this physically as well? Do we leverage the time factor by using 
the range query?

Regards,
JAME VAALET


-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet wrote:
> Hi,
> Let's say I have got 10^10 documents in an index, with the unique id being the document 
> id, which is assigned to each of them from 1 to 10^10.
> Now I want to search for a particular query string in a subset of these documents, 
> say (document id 100 to 1000).
>
> The question here is: will SOLR be able to search just in this set of documents 
> rather than the entire index? If yes, what should the query be to limit the search 
> to this subset?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
From what you tell us, I guess a separate index for website docs would be the 
best. If you fear that requests from the windows service would cripple your web 
site performance, why not have a totally separate index on another server, 
and have your website documents indexed in both indexes?

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 13:14
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
The website will enable any user to search the document repository, 
and the set they search on is known as website-presentable.
2. Windows service 
The windows service will search all the documents in the repository 
for a fixed set of key words and store the found results in a database. This set 
is the universal set of documents in the doc repository, including the 
website-presentable ones.


The website is a high-priority app which should work smoothly without any 
interference, whereas the windows service should run all day long continuously 
without a break to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the windows 
service requests to SOLR to slow down website requests.

Suppose I am segregating the website-presentable docs index into a particular 
core and the rest of them into a different core; will it solve the problem?
I have also read about multiple ports for listening to requests from different 
apps; can this be used?



Regards,
JAME VAALET


-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with a subset of your index, 
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point to an adequate 
solution.

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 11:10
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any 
mechanism to limit this physically as well? Do we leverage the time factor by using 
the range query?

Regards,
JAME VAALET


-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet wrote:
> Hi,
> Let's say I have got 10^10 documents in an index, with the unique id being the document 
> id, which is assigned to each of them from 1 to 10^10.
> Now I want to search for a particular query string in a subset of these documents, 
> say (document id 100 to 1000).
>
> The question here is: will SOLR be able to search just in this set of documents 
> rather than the entire index? If yes, what should the query be to limit the search 
> to this subset?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with a subset of your index, 
especially if you reuse the same filter for many queries, since there is a cache.
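
For example, for the document id subset from the original question - a sketch, 
assuming a numeric id field:

q=your query string
fq=id:[100 TO 1000]

The fq result set is computed once, cached, and intersected with any q you send 
afterwards.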

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point to an adequate 
solution.

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 5, 2011 11:10
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any 
mechanism to limit this physically as well? Do we leverage the time factor by using 
the range query?

Regards,
JAME VAALET


-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet wrote:
> Hi,
> Let's say I have got 10^10 documents in an index, with the unique id being the document 
> id, which is assigned to each of them from 1 to 10^10.
> Now I want to search for a particular query string in a subset of these documents, 
> say (document id 100 to 1000).
>
> The question here is: will SOLR be able to search just in this set of documents 
> rather than the entire index? If yes, what should the query be to limit the search 
> to this subset?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


RE: Multiple indexes

2011-06-17 Thread Pierre GOSSE
> I think there are reasons to use separate indexes for each document type
> but do combined searches on these indexes
> (for example if you need separate TFs for each document type).

I wonder if, in this precise case, it wouldn't be pertinent to have a single 
index with the various document types each having their own field set. 
Isn't TF calculated field by field?


RE: Document match with no highlight

2011-05-13 Thread Pierre GOSSE
In WordDelimiterFilter the parameters catenateNumbers, catenateWords, and 
catenateAll are set to 1. These parameters add overlapping tokens, which could 
explain why you hit the bug described in the Jira issue I mentioned.

As I understand WordDelimiterFilter:
"0176 R3 1.5 TO" should be tokenized with token "R3" overlapping with "R" and 
"3", and "15" overlapping with "1" and "5".

These parameters are set to 0 for the query analyzer, but having them set to 1 
should not correct your problem unless you search for "R3 1.5".

I think you have to either:
 - set these parameters to 0 in the index analyzer, but then your query won't match anymore,
 - wait for the correction to be released in a new Solr version, 
 - use Solr trunk, 
 - or backport the modifications to the lucene-highlighter version you use.

I did a backport for Solr 1.4.1 since I won't move to 3.0 for some time, so 
please ask if you have questions about how to do this.
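
For reference, the first option above would look like this in the index-time 
analyzer of schema.xml (a sketch; the other attributes are kept as in the stock 
Solr 1.4 "text" type):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>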

Pierre


-----Original Message-----
From: Phong Dais [mailto:phong.gd...@gmail.com] 
Sent: Thursday, May 12, 2011 20:06
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,

I read the link provided and I'll need some time to digest what it is
saying.

Here's my "text" fieldtype.

[fieldtype definition stripped by the mail archive]

Also, I figured out what value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO "

Searching for "3 1 15" returns a match with an "empty" highlight.
Searching for "3 1 15"~1 returns a match with a highlight.

Can anyone see anything that I'm missing?

Thanks,
P.


On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE wrote:

> > Since you're using the standard "text" field, this should NOT be your case.
>
> Sorry for the missing NOT in the previous phrase. Given what you said, you
> shouldn't have the same issue, but still, it sounds very similar.
>
> Are you sure your fieldtype "text" has nothing special? A tokenizer or
> filter that could add some tokens in your indexed text but not in your query,
> like for example a WordDelimiter present in the index analyzer and not the query analyzer?
>
> Pierre
>
> -----Original Message-----
> From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
> Sent: Thursday, May 12, 2011 18:21
> To: solr-user@lucene.apache.org
> Subject: RE: Document match with no highlight
>
> > In fact if I did "3 1 15"~1 I do get a snippet also.
>
> Strange, I had a very similar problem, but with overlapping tokens. Since
> you're using the standard "text" field, this should be your case.
>
> Maybe you could have a look at this issue, since it sounds very familiar to
> me:
> https://issues.apache.org/jira/browse/LUCENE-3087
>
> Pierre
>
> -----Original Message-----
> From: Phong Dais [mailto:phong.gd...@gmail.com]
> Sent: Thursday, May 12, 2011 17:26
> To: solr-user@lucene.apache.org
> Subject: Re: Document match with no highlight
>
> Hi,
>
> 
>
> The type "text" is the default one that came with the default Solr 1.4
> install without any modifications.
>
> If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I do
> get a snippet also.
>
> Hope that helps.
>
> P.
>
> On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan wrote:
>
> >  > URL:
> > >
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> > >
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > >
> > XML (response tags partly stripped by the archive; the recoverable header):
> >   status=0, QTime=19
> >   params: indent=on, hl.fl=DOC_TEXT, wt=standard, hl.maxAnalyzedChars=-1,
> >   hl=on, rows=10, version=2.2, debugQuery=on, fl=DOC_TEXT,score, start=0,
> >   q=DOC_TEXT:"3 1 15", qt=standard

RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> Since you're using the standard "text" field, this should NOT be your case.

Sorry for the missing NOT in the previous phrase. Given what you said, you 
shouldn't have the same issue, but still, it sounds very similar.

Are you sure your fieldtype "text" has nothing special? A tokenizer or filter 
that could add some tokens in your indexed text but not in your query, like for 
example a WordDelimiter present in the index analyzer and not the query analyzer?

Pierre

-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Thursday, May 12, 2011 18:21
To: solr-user@lucene.apache.org
Subject: RE: Document match with no highlight

> In fact if I did "3 1 15"~1 I do get a snippet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard "text" field, this should be your case. 

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-----Original Message-----
From: Phong Dais [mailto:phong.gd...@gmail.com] 
Sent: Thursday, May 12, 2011 17:26
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,



The type "text" is the default one that came with the default Solr 1.4
install without any modifications.

If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I do
get a snippet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan wrote:

>  > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML (response tags partly stripped by the archive; the recoverable header):
> >   status=0, QTime=19
> >   params: indent=on, hl.fl=DOC_TEXT, wt=standard, hl.maxAnalyzedChars=-1,
> >   hl=on, rows=10, version=2.2, debugQuery=on, fl=DOC_TEXT,score, start=0,
> >   q=DOC_TEXT:"3 1 15", qt=standard

RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> In fact if I did "3 1 15"~1 I do get a snippet also.

Strange, I had a very similar problem, but with overlapping tokens. Since 
you're using the standard "text" field, this should be your case. 

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-----Original Message-----
From: Phong Dais [mailto:phong.gd...@gmail.com] 
Sent: Thursday, May 12, 2011 17:26
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,



The type "text" is the default one that came with the default Solr 1.4
install without any modifications.

If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I do
get a snippet also.

Hope that helps.

P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan wrote:

>  > URL:
> >
> http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0
> >
> &rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> >
> > XML (response tags partly stripped by the archive; the recoverable header):
> >   status=0, QTime=19
> >   params: indent=on, hl.fl=DOC_TEXT, wt=standard, hl.maxAnalyzedChars=-1,
> >   hl=on, rows=10, version=2.2, debugQuery=on, fl=DOC_TEXT,score, start=0,
> >   q=DOC_TEXT:"3 1 15", qt=standard

RE: Allowing looser matches

2011-04-13 Thread Pierre GOSSE
For (a), I don't think anything exists today providing this mechanism. 
But (b) is a good description of the dismax handler with an mm parameter of 66%. 
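
For example - a sketch, with the qf fields assumed:

q=Blue Wool Rugs
defType=dismax
qf=name description
mm=66%

With mm, documents matching only some of the query terms are still returned, 
ranked below full matches.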

Pierre

-----Original Message-----
From: Mark Mandel [mailto:mark.man...@gmail.com] 
Sent: Wednesday, April 13, 2011 10:04
To: solr-user@lucene.apache.org
Subject: Allowing looser matches

Not sure if the title explains it all, or if what I want is even possible,
but figured I would ask.

Say, I have a series of products I'm selling, and a search of:

"Blue Wool Rugs"

Comes in.  This returns 0 results, as "Blue" and "Rugs" match terms that are
indexed; "Wool" does not.

Is there a way to configure my index/searchHandler, to either:

(a) if no documents are returned, look to partial matches of the search
(e.g. return results with 'Blue rugs', in this case)
(b) add results to the overall search, but at a lower score, that have only
*some* of the terms being searched in them (in this case, maybe 2/3)

Is that even possible?

Thanks,

Mark

-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com


RE: Highlighting Problem

2011-03-29 Thread Pierre GOSSE
Looks like special chars are filtered at index time and not replaced by spaces, 
which would keep the correct offsets of terms. Can you paste here the definition of 
the fieldtype in your schema.xml?
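
If the goal is to replace rather than remove those characters while keeping 
offsets aligned with the original text, one option is a char filter in front of 
the tokenizer - a sketch, assuming solr.MappingCharFilterFactory and a mapping 
file you would create:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specialchars.txt"/>

with mapping-specialchars.txt containing lines such as:

"(" => " "
")" => " "

Char filters track offset corrections, so highlight positions stay anchored to 
the original text.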


Pierre

-----Original Message-----
From: pottw...@freenet.de [mailto:pottw...@freenet.de] 
Sent: Monday, March 28, 2011 11:16
To: solr-user@lucene.apache.org
Subject: Highlighting Problem

dear solr specialists,

my data looks like this:

j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg 
asdlfj.

If I want to query for the first "word", the following queries must match:

j]s(dh)fjk
j]s(dhfjk
j]sdhfjk
jsdhfjk
dhf

So the matching should ignore some characters like ( ) [ ] and should match 
substrings.

So far I have the following field definition in the schema.xml:

[field type definition stripped by the mail archive]


With this definition the matching works as planned. But not for highlighting: 
there, the special characters seem to move the <em> tags to wrong positions. For 
example, searching for "jsdhfjk" misses the last 3 letters of the word (= the 3 
special characters removed by the PatternReplaceFilterFactory):

j]s(dh)fjk

Solr has so many bells and whistles - what must I do to get correctly working 
highlighting?

kind regards,
F.




RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread Pierre GOSSE
I do have the xml preamble <?xml version="1.0" encoding="utf-8"?> in my config 
file in conf/Catalina/localhost/ and Solr starts OK with Tomcat 7.0.8. Haven't 
tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it 
point to line 1 column 1? Do you have some blank lines at the start of your 
XML file, or some non-blank lines?
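
For comparison, the context fragment shown on the SolrTomcat wiki has no 
preamble at all - a sketch with placeholder paths:

<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/path/to/solr/home" override="true"/>
</Context>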

Pierre

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
Sent: Thursday, March 17, 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

Lewis

My update from Tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my 
context file and it does not have the xml preamble yours has, specifically: 
'<?xml version="1.0" encoding="utf-8"?>'.


Here is my context file:

[context file XML stripped by the mail archive]

---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

> Hello list,
> 
> Is anyone running Solr (in my case 1.4.1) on the above Tomcat dist? In the
> past I have been using guidance in accordance with
> http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
> but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
> E.g.
> 
> INFO: Deploying configuration descriptor wombra.xml < This is my context
> fragment
> from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
> 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
> SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
> target matching "[xX][mM][lL]" is not allowed.
> org.xml.sax.SAXParseException: The processing instruction target
> matching "[xX][mM][lL]" is not allowed.
> ...
> 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
> deployDescriptor
> SEVERE: Error deploying configuration descriptor wombra.xml
> org.xml.sax.SAXParseException: The processing instruction target
> matching "[xX][mM][lL]" is not allowed.
> ...
> some more
> ...
> 
> My configuration descriptor is as follows
> 
> <Context docBase="..." crossContext="true">
>   <Environment name="solr/home" type="java.lang.String" value="/home/lewis/Downloads/wombra" override="true"/>
> </Context>
> 
> Preferably I would upload a WAR file, but I have been working well with
> the configuration I have been using up until now therefore I didn't
> question change.
> I am unfamiliar with the above errors. Can anyone please point me in the
> right direction?
> 
> Thank you
> Lewis
> 



RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
Well, Jonathan's explanations are much more accurate than mine. :)

I took the word serialization as meaning a kind of isolation between commits, 
which is not very smart. Sorry to have introduced more confusion into this.

Pierre

-----Original Message-----
From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] 
Sent: Wednesday, February 9, 2011 17:04
To: solr-user@lucene.apache.org
Subject: Re: Concurrent updates/commits

Hello,

Thanks very much for your quick replies.

So, according to Pierre, all updates will be immediately posted to Solr, but
all commits will be serialised. But doesn't that contradict Jonathan's
example where you can end up with "FIVE 'new indexes' being warmed"? If
commits are serialised, then there can only ever be one Index Searcher being
auto-warmed at a time, or have I got this wrong?

The reason we are investigating commit serialisation is that we want to
know whether the commit requests will be blocked until the previous ones
finish.

Cheers,
- Savvas

On 9 February 2011 15:44, Pierre GOSSE wrote:

> > However, the Solr book, in the "Commit, Optimise, Rollback" section reads:
> > "if more than one Solr client were to submit modifications and commit them
> > at similar times, it is possible for part of one client's set of changes to
> > be committed before that client told Solr to commit"
> > which suggests that requests are *not* serialised.
>
> I read this as "If two clients submit modifications and commits every couple
> of minutes, it could happen that modifications of client1 get committed by
> client2's commit before client1 asks for a commit."
>
> As far as I understand Solr commits, they are serialized by design. And
> committing too often could lead you to trouble if you have many warm-up
> queries (?).
>
> Hope this helps,
>
> Pierre
> -----Original Message-----
> From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com]
> Sent: Wednesday, February 9, 2011 16:34
> To: solr-user@lucene.apache.org
> Subject: Concurrent updates/commits
>
> Hello,
>
> This topic has probably been covered before here, but we're still not very
> clear about how multiple commits work in Solr.
> We currently have a requirement to make our domain objects searchable
> immediately after they get updated in the database by some user action. This
> could potentially cause multiple updates/commits to be fired to Solr and we
> are trying to investigate how Solr handles those multiple requests.
>
> This thread:
>
> http://search-lucene.com/m/0cab31f10Mh/concurrent+commits&subj=commit+concurrency+full+text+search
>
> suggests that Solr will handle all of the lower level details and that
> "Before
> a *COMMIT* is done , lock is obtained and its released  after the
> operation"
> which in my understanding means that Solr will serialise all update/commit
> requests?
>
> However, the Solr book, in the "Commit, Optimise, Rollback" section reads:
> "if more than one Solr client were to submit modifications and commit them
> at similar times, it is possible for part of one client's set of changes to
> be committed before that client told Solr to commit"
> which suggests that requests are *not* serialised.
>
> Our questions are:
> - Does Solr handle concurrent requests or do we need to add synchronisation
> logic around our code?
> - If Solr *does* handle concurrent requests, does it serialise each request
> or has some other strategy for processing those?
>
>
> Thanks,
> - Savvas
>


RE: Concurrent updates/commits

2011-02-09 Thread Pierre GOSSE
> However, the Solr book, in the "Commit, Optimise, Rollback" section reads:
> "if more than one Solr client were to submit modifications and commit them 
> at similar times, it is possible for part of one client's set of changes to 
> be committed before that client told Solr to commit"
> which suggests that requests are *not* serialised.

I read this as "If two clients submit modifications and commits every couple of 
minutes, it could happen that modifications of client1 get committed by 
client2's commit before client1 asks for a commit."

As far as I understand Solr commits, they are serialized by design. And 
committing too often could lead you to trouble if you have many warm-up queries 
(?).
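
The related guard in solrconfig.xml is the warming-searcher limit - a sketch 
(when overlapping commits would exceed it, the extra commit fails instead of 
piling up warm-ups):

<maxWarmingSearchers>2</maxWarmingSearchers>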

Hope this helps,

Pierre
-----Original Message-----
From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] 
Sent: Wednesday, February 9, 2011 16:34
To: solr-user@lucene.apache.org
Subject: Concurrent updates/commits

Hello,

This topic has probably been covered before here, but we're still not very
clear about how multiple commits work in Solr.
We currently have a requirement to make our domain objects searchable
immediately after they get updated in the database by some user action. This
could potentially cause multiple updates/commits to be fired to Solr and we
are trying to investigate how Solr handles those multiple requests.

This thread:
http://search-lucene.com/m/0cab31f10Mh/concurrent+commits&subj=commit+concurrency+full+text+search

suggests that Solr will handle all of the lower level details and that "Before
a *COMMIT* is done , lock is obtained and its released  after the
operation"
which in my understanding means that Solr will serialise all update/commit
requests?

However, the Solr book, in the "Commit, Optimise, Rollback" section reads:
"if more than one Solr client were to submit modifications and commit them
at similar times, it is possible for part of one client's set of changes to
be committed before that client told Solr to commit"
which suggests that requests are *not* serialised.

Our questions are:
- Does Solr handle concurrent requests or do we need to add synchronisation
logic around our code?
- If Solr *does* handle concurrent requests, does it serialise each request
or has some other strategy for processing those?


Thanks,
- Savvas


RE: Problem in faceting

2011-02-04 Thread Pierre GOSSE
Yes, I see I didn't understand that facet.query parameter.

Have you considered submitting two queries? One for the results with q.op=OR, one 
for the faceting with q.op=AND?
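
A sketch of the pair, with the query and facet field assumed from the thread:

q=water treatment plant&q.op=OR&rows=10
q=water treatment plant&q.op=AND&rows=0&facet=true&facet.field=city

The first returns the documents; the second returns only the facet counts 
computed over the stricter AND match.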


-----Original Message-----
From: Grijesh [mailto:pintu.grij...@gmail.com] 
Sent: Friday, February 4, 2011 10:42
To: solr-user@lucene.apache.org
Subject: RE: Problem in faceting


facet.query=+water +treatment +plant will not return the city facet that is
needed by the poster.
That will only give the count matching the facet.query=+water +treatment
+plant query.

-
Thanx:
Grijesh
http://lucidimagination.com


RE: Problem in faceting

2011-02-04 Thread Pierre GOSSE
Using a facet query like

facet.query=+water +treatment +plant 

... should give a count of 0 for documents not having all three terms. This could 
do the trick, if I understand how this parameter works.