Re: Performance when indexing or cold cache
Hi Walter, did you find a way to sort out your issue? I would be very interested. Thanks a lot,

Walter Underwood wrote:
>
> We've had some performance problems while Solr is indexing and also when it
> starts with a cold cache. I'm still digging through our own logs, but I'd
> like to get more info about this, so any ideas or info are welcome.
>
> We have four Solr servers on dual CPU PowerPC machines, 2G of heap, about
> 100-300 queries/second, 250K docs, Tomcat 6.0.10, not fronted by Apache.
> We don't use facets, we sort by score. In general use, there are six
> different request handlers called to build a page. Here is one; they
> are all very similar.
>
> 0.01
> exact^8.0 exact_alt^6.0 exact_base^8.0 title^4.0 title_alt^3.0
> title_base^4.0 phonetic_hi^1.0
> exact^12.0 exact_alt^9.0 exact_base^12.0 title^6.0 title_alt^4.0
> title_base^6.0 phonetic_hi^1.5
> popularity^2.0
> id,type,movieid,personid,genreid,score
> 1
> 100
> (pushstatus:A AND (type:movie OR type:person))
>
> wunder

--
View this message in context: http://www.nabble.com/Performance-when-indexing-or-cold-cache-tp13348420p22984912.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any tips for indexing large amounts of data?
OK, but how do people handle frequent updates to a large database with a lot of queries on it? Do they turn off the slave during the warmup?

Noble Paul നോബിള് नोब्ळ् wrote:
>
> On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr wrote:
>>
>> Hi Otis,
>> How did you manage that? I've 8 core machine with 8GB of ram and 11GB index
>> for 14M docs and 5 update every 30mn but my replication kill everything.
>> My segments are merged too often sor full index replicate and cache lost and
>> I've no idea what can I do now?
>> Some help would be brilliant,
>> btw im using Solr 1.4.
>>
>
> sunnnyfr , whether the replication is full or delta , the caches are
> lost completely.
>
> you can think of partitioning the index into separate Solrs and
> updating one partition at a time and perform distributed search.
>
>> Thanks,
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slow-down, which appears as a pause and
>>> is due to large Lucene index segment merging. This should go away with
>>> newer versions of Lucene where this is happening in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core machine
>>> with 8 GB of RAM, resulting in nearly 20 GB index. The whole process took
>>> a little less than 10 hours - that's over 550 docs/second. The vanilla
>>> approach before some of our changes apparently required several days to
>>> index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> - Original Message
>>> From: Mike Klaas
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices as occasionally large
>>> segment merge operations must occur. However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor. Then, look for signs to what is
>>> happening when indexing slows. For instance, is Solr high in cpu, is
>>> the computer thrashing, etc?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some of the
>>> suggestions you mentioned, ie not committing until I've finished indexing.
>>> What I am seeing though, is as the index gets larger (around 1Gb),
>>> indexing is taking a lot longer. In fact it slows down to a crawl. Have
>>> you got any pointers as to what I might be doing wrong?
>>>
>>> Also, I was looking at using MultiCore solr. Could this help in some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>
>>> > : I would think you would see better performance by allowing auto commit
>>> > : to handle the commit size instead of reopening the connection all the
>>> > : time.
>>> >
>>> > if your goal is "fast" indexing, don't use autoCommit at all ... just
>>> > index everything, and don't commit until you are completely done.
>>> >
>>> > autoCommitting will slow your indexing down (the benefit being that more
>>> > results will be visible to searchers as you proceed)
>>> >
>>> > -Hoss
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> --Noble Paul

--
View this message in context: http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22986152.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: multiple tokenizers needed
The only thing that comes to mind in a short-term way is writing two TokenFilter implementations that wrap the second and third tokenizers.

On Apr 9, 2009, at 11:00 PM, Ashish P wrote:

> I want to analyze a text based on pattern ";" and separate on whitespace,
> and it is a Japanese text so use CJKAnalyzer + tokenizer also. In short I
> want to do:
>
> Can anyone please tell me how to achieve this?? Because the above syntax
> is not at all possible.
>
> --
> View this message in context: http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Question on Solr Distributed Search
On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote: > Just an update. I changed the schema to store the unique id field, but > I still get the connection reset exception. I did notice that if there > is no data in the core then it returns the 0 result (no exception), > but if there is data and you search using "shards" parameter I get the > connection reset exception. Can anyone provide some tip on where can I > look for this problem? > > Did you re-index after changing the field to stored? -- Regards, Shalin Shekhar Mangar.
QueryElevationComponent : hot update of elevate.xml
Hello !

Browsing the mailing-list's archives did not help me find the answer, hence the question asked directly here.

Some context first: integrating Solr with a CMS ( eZ Publish ), we chose to support Elevation. The idea is to be able to 'elevate' any object from the CMS. This can be achieved through eZ Publish's back office, with a dedicated Elevate administration GUI; the configuration is stored in the CMS temporarily, and then synchronized frequently and/or on demand onto Solr. This synchronisation is currently done as follows:

1. Generate the elevate.xml based on the stored configuration
2. Replace elevate.xml in Solr's dataDir
3. Commit.

It appears that when having elevate.xml in Solr's dataDir, and solely in this case, committing triggers a reload of elevate.xml. This does not happen when elevate.xml is stored in Solr's conf dir.

This method has one main issue though: eZ Publish needs to have access to the same filesystem as the one on which Solr's dataDir is stored. This is not always the case when the CMS is clustered for instance --> show stopper :(

Hence the following idea / RFC: how about extending the Query Elevation system with the possibility to push an updated elevate.xml file/XML through HTTP? This would update the file where it is actually located, and trigger a reload of the configuration.

Not being very knowledgeable about Solr's API ( yet ! ), I cannot figure out whether this would be possible, how this would be achievable ( which type of plugin for instance ) or even be valid?

Thanks a lot in advance for your thoughts,
--
Nicolas
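Step 1 of the synchronisation (generating elevate.xml from the stored configuration) is straightforward to script. A minimal sketch in Python, assuming the stored configuration can be reduced to a {query text: [doc ids]} mapping; the function name and the example data are hypothetical:

```python
import xml.etree.ElementTree as ET

def build_elevate_xml(config):
    """Build an elevate.xml document from a {query text: [doc ids]}
    mapping, following the standard elevate.xml layout:
    <elevate><query text="..."><doc id="..."/></query></elevate>"""
    root = ET.Element("elevate")
    for query_text, doc_ids in config.items():
        query = ET.SubElement(root, "query", text=query_text)
        for doc_id in doc_ids:
            ET.SubElement(query, "doc", id=str(doc_id))
    return ET.tostring(root, encoding="unicode")

# Hypothetical configuration pulled from the CMS back office
payload = build_elevate_xml({"ipod": ["MA147LL/A", "IW-02"]})
```

The generated string is what would then be written into dataDir (or, per the RFC above, POSTed to Solr over HTTP).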
Re: Any tips for indexing large amounts of data?
they don't usually turn off the slave, but it is not a bad idea if you can take it offline. It is a logistical headache. BTW, do you have a very good cache hit ratio? Then it makes sense to autowarm.
--Noble

On Fri, Apr 10, 2009 at 4:07 PM, sunnyfr wrote:
>
> ok but how people do for a frequent update for a large dabase and lot of
> query on it ?
> do they turn off the slave during the warmup ??

--
--Noble Paul
Re: multiple tokenizers needed
Or have the indexing client split the data at these delimiters and just use the CJKAnalyzer. Erik On Apr 10, 2009, at 7:30 AM, Grant Ingersoll wrote: The only thing that comes to mind in a short term way is writing two TokenFilter implementations that wrap the second and third tokenizers On Apr 9, 2009, at 11:00 PM, Ashish P wrote: I want to analyze a text based on pattern ";" and separate on whitespace and it is a Japanese text so use CJKAnalyzer + tokenizer also. in short I want to do: Can anyone please tell me how to achieve this?? Because the above syntax is not at all possible. -- View this message in context: http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
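Erik's suggestion (split the data client-side and let the CJKAnalyzer handle the rest) could be sketched like this, assuming the delimiters are ";" plus whitespace as in the original question; the function name is made up for illustration:

```python
import re

def pre_tokenize(text):
    """Split a raw field value on ';' and whitespace on the client side,
    so a single CJKAnalyzer-backed field type is enough in Solr."""
    return [piece for piece in re.split(r"[;\s]+", text) if piece]

# The pieces can then be indexed as a multi-valued field, or re-joined
# with spaces before posting the document.
pieces = pre_tokenize("東京;大阪 名古屋")
```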
Re: QueryElevationComponent : hot update of elevate.xml
On Apr 10, 2009, at 7:48 AM, Nicolas Pastorino wrote:

> Hence the following idea / RFC : How about extending the Query Elevation
> system with the possibility to push an updated elevate.xml file/XML
> through HTTP ? This would update the file where it is actually located,
> and trigger a reload of the configuration.

Perhaps look at implementing a custom RequestHandler: http://wiki.apache.org/solr/SolrRequestHandler

Maybe it could POST the new elevate.xml and then save it to the right place and call commit...

ryan
Re: Additive filter queries
That would work, but the other part of our problem comes in when we then try to facet on the resulting set. If we filter by size 1, for example, and then facet Width again, we get facet results that have no size 1's, because we have not taught Solr what 1_W means, etc. I think field collapsing might solve this for us, maybe.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Apr 9, 2009, at 5:23 PM, Chris Hostetter wrote:

: Right now a document looks like this:
:
: 1598548
: 12545
: Adidas
: 1, 2, 3, 4, 5, 6, 7
: AA, A, B, W, W,
: Brown
:
: If we went down a level, it could look like..
:
: 1598548
: 12545
: 654641654684
: Adidas
: 1
: AA
: Brown

If you want results at the "product" level then you don't have to have one *doc* per legal size+width pair ... you just need one *term* per valid size+width pair

1, 2, 3, 4, 5, 6, 7
AA, A, B, W, W,
1_W 2_W 3_B 3_W 4_AA 4_A 4_B 4_W 4_WW 5_W 5_ 6_ 7_

a search for size 4 clogs would look like...

q=clogs&fq=size:4&facet.field=opts&f.opts.facet.prefix=4_

...and the facet counts for "opts" would tell me what widths were available (and how many).

for completeness you typically want to index the pairs in both directions (1_W and W_1 ... typically in separate fields) so the user can filter by either option first ... for something like size+color this makes sense, but i'm guessing with shoes no one expects to narrow by "width" until they've narrowed by size first.

-Hoss
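Hoss's one-term-per-valid-pair scheme is easy to generate at indexing time. A rough sketch, with a hypothetical function name and made-up stock data:

```python
def build_option_terms(available_pairs):
    """Build composite option terms in both orderings (size_width and
    width_size) for the size+width pairs actually in stock, so the UI
    can narrow by either dimension first using facet.prefix."""
    size_width = ["%s_%s" % (s, w) for s, w in available_pairs]
    width_size = ["%s_%s" % (w, s) for s, w in available_pairs]
    return size_width, width_size

# Hypothetical in-stock pairs for one product
sw, ws = build_option_terms([("4", "AA"), ("4", "W"), ("5", "W")])
# sw goes into an 'opts' field; faceting with f.opts.facet.prefix=4_
# then yields only the widths available in size 4, which addresses the
# "no size 1's in the Width facet" problem described above.
```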
Index Version Number
Is it possible for a Solr client to determine if the index has changed since the last time it performed a query? For example, is it possible to query the current Lucene indexVersion? Thanks in advance for your help, Richard
Re: Question on Solr Distributed Search
yes - it's all new indexes. I can search them individually, but adding "shards" throws "Connection Reset" error. Is there any way I can debug this or any other pointers? -vivek On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar wrote: > On Fri, Apr 10, 2009 at 7:50 AM, vivek sar wrote: > >> Just an update. I changed the schema to store the unique id field, but >> I still get the connection reset exception. I did notice that if there >> is no data in the core then it returns the 0 result (no exception), >> but if there is data and you search using "shards" parameter I get the >> connection reset exception. Can anyone provide some tip on where can I >> look for this problem? >> >> > Did you re-index after changing the field to stored? > -- > Regards, > Shalin Shekhar Mangar. >
Re: Help with relevance failure in Solr 1.3
If you don't see the attachments, you can get them here: http://wunderwood.org/solr/

wunder

On 4/10/09 10:56 AM, "Walter Underwood" wrote:

> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
> I would appreciate any ideas.
>
> Occasionally, a server will start returning results with really poor
> relevance. Single term queries work fine, but multi-term queries are
> scored based on the most common term (lowest IDF).
>
> I don't see anything in the logs when this happens. We have a monitor
> doing a search for the 100 most popular movies once per minute to
> catch this, so we know when it was first detected.
>
> I'm attaching two explain outputs, one for the query "changeling" and
> one for "the changeling".
>
> We are running Solr 1.3 with Lucene 2.4.0, and have added a fuzzy query
> using JaroWinkler matching.
>
> I'd appreciate ideas about where to look, what debug output to try, etc.
>
> wunder
Help with relevance failure in Solr 1.3
We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and I would appreciate any ideas.

Occasionally, a server will start returning results with really poor relevance. Single term queries work fine, but multi-term queries are scored based on the most common term (lowest IDF).

I don't see anything in the logs when this happens. We have a monitor doing a search for the 100 most popular movies once per minute to catch this, so we know when it was first detected.

I'm attaching two explain outputs, one for the query "changeling" and one for "the changeling".

We are running Solr 1.3 with Lucene 2.4.0, and have added a fuzzy query using JaroWinkler matching.

I'd appreciate ideas about where to look, what debug output to try, etc.

wunder
Re: Help with relevance failure in Solr 1.3
On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:

> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
> I would appreciate any ideas.
>
> Occasionally, a server will start returning results with really poor
> relevance. Single term queries work fine, but multi-term queries are
> scored based on the most common term (lowest IDF).
>
> I don't see anything in the logs when this happens. We have a monitor
> doing a search for the 100 most popular movies once per minute to
> catch this, so we know when it was first detected.
>
> I'm attaching two explain outputs, one for the query "changeling" and
> one for "the changeling".

I'm not sure what exactly you are asking, so bear with me...

Are you saying that "the changeling" normally returns results just fine and then periodically it will "go bad", or are you saying you don't understand why "the changeling" scores differently from "changeling"? In looking at the explains, it is weird that in the "the changeling" case, the term changeling doesn't even show up as a term.

Can you share your dismax configuration? That will be easier to parse than trying to make sense of the debug query parsing.

-Grant
Re: Index Version Number
This info is available via the Luke request handler, I believe: http://localhost:8983/solr/admin/luke/

In there, I see version, current, and optimized information. See also http://wiki.apache.org/solr/LukeRequestHandler

HTH,
Grant

On Apr 10, 2009, at 11:58 AM, Richard Wiseman wrote:

> Is it possible for a Solr client to determine if the index has changed
> since the last time it performed a query? For example, is it possible to
> query the current Lucene indexVersion?
>
> Thanks in advance for your help,
> Richard

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
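For a client, the interesting part is pulling the version value out of the Luke handler's XML response. A sketch, assuming a response shaped like the fragment below (the sample is illustrative, not captured from a real server):

```python
import xml.etree.ElementTree as ET

# Illustrative fragment of a Luke handler response; a real response has
# many more fields, but the <lst name="index"> block is what matters.
SAMPLE = """<response>
  <lst name="index">
    <int name="numDocs">250000</int>
    <long name="version">1239414838000</long>
    <bool name="current">true</bool>
  </lst>
</response>"""

def index_version(luke_xml):
    """Pull the Lucene index version out of a Luke handler response.
    A client can cache this value and treat a change as 'the index
    has changed since my last query'."""
    root = ET.fromstring(luke_xml)
    node = root.find("./lst[@name='index']/long[@name='version']")
    return int(node.text) if node is not None else None
```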
Question on StreamingUpdateSolrServer
Hi, I was using CommonsHttpSolrServer for indexing, but having two threads writing (10K batches) at the same time was throwing, "ProtocolException: Unbuffered entity enclosing request can not be repeated. " I switched to StreamingUpdateSolrServer (using addBeans) and I don't see the problem anymore. The speed is very fast - getting around 25k/sec (single thread), but I'm facing another problem. When the indexer using StreamingUpdateSolrServer is running I'm not able to send any url request from browser to Solr web app. I just get blank page. I can't even get to the admin interface. I'm also not able to shutdown the Tomcat running the Solr webapp when the Indexer is running. I've to first stop the Indexer app and then stop the Tomcat. I don't have this problem when using CommonsHttpSolrServer. Here is how I'm creating it, server = new StreamingUpdateSolrServer(url, 1000,3); I simply call server.addBeans(...) on it. Is there anything else I need to do to make use of StreamingUpdateSolrServer? Why does Tomcat become unresponsive when Indexer using StreamingUpdateSolrServer is running (though, indexing happens fine)? Thanks, -vivek
Re: Help with relevance failure in Solr 1.3
Normally, both "changeling" and "the changeling" work fine. This one server is misbehaving like this for all multi-term queries. Yes, it is VERY weird that the term "changeling" does not show up in the explain. A server will occasionally "go bad" and stay in that state. In one case, two servers went bad and both gave the same wrong results. Here is the dismax config. "groups" means "movies". The title* fields are stemmed and stopped, the "exact*" fields are not. dismax none 0.01 exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0 title^3.0 title_alt^3.0 title_base^4.0 exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0 title^3.0 title_alt^4.0 title_base^6.0 search_popularity^100.0 1 100 id,type,movieid,personid,genreid type:group OR type:person wunder On 4/10/09 12:51 PM, "Grant Ingersoll" wrote: > > On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote: > >> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, >> and >> I would appreciate any ideas. >> >> Ocassionally, a server will start returning results with really poor >> relevance. Single term queries work fine, but multi-term queries are >> scored based on the most common term (lowest IDF). >> >> I don't see anything in the logs when this happens. We have a monitor >> doing a search for the 100 most popular movies once per minute to >> catch this, so we know when it was first detected. >> >> I'm attaching two explain outputs, one for the query "changeling" and >> one for "the changeling". > > > I'm not sure what exactly you are asking, so bear with me... > > Are you saying that "the changeling" normally returns results just > fine and then periodically it will "go bad" or are you saying you > don't understand why "the changeling" scores differently from > "changeling"? In looking at the explains, it is weird that in the > "the changeling" case, the term changeling doesn't even show up as a > term. > > Can you share your dismax configuration? 
That will be easier to parse > than trying to make sense of the debug query parsing. > > -Grant
Re: Question on StreamingUpdateSolrServer
I also noticed that the Solr app has over 6000 file handles open - "lsof | grep solr | wc -l" - shows 6455 I've 10 cores (using multi-core) managed by the same Solr instance. As soon as start up the Tomcat the open file count goes up to 6400. Few questions, 1) Why is Solr holding on to all the segments from all the cores - is it because of auto-warmer? 2) How can I reduce the open file count? 3) Is there a way to stop the auto-warmer? 4) Could this be related to "Tomcat returning blank page for every request"? Any ideas? Thanks, -vivek On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: > Hi, > > I was using CommonsHttpSolrServer for indexing, but having two > threads writing (10K batches) at the same time was throwing, > > "ProtocolException: Unbuffered entity enclosing request can not be repeated. > " > > I switched to StreamingUpdateSolrServer (using addBeans) and I don't > see the problem anymore. The speed is very fast - getting around > 25k/sec (single thread), but I'm facing another problem. When the > indexer using StreamingUpdateSolrServer is running I'm not able to > send any url request from browser to Solr web app. I just get blank > page. I can't even get to the admin interface. I'm also not able to > shutdown the Tomcat running the Solr webapp when the Indexer is > running. I've to first stop the Indexer app and then stop the Tomcat. > I don't have this problem when using CommonsHttpSolrServer. > > Here is how I'm creating it, > > server = new StreamingUpdateSolrServer(url, 1000,3); > > I simply call server.addBeans(...) on it. Is there anything else I > need to do to make use of StreamingUpdateSolrServer? Why does Tomcat > become unresponsive when Indexer using StreamingUpdateSolrServer is > running (though, indexing happens fine)? > > Thanks, > -vivek >
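One quick sanity check alongside the lsof count is the per-process open-file limit the server is running under; if the 6400 handles approach the soft limit, refused connections can look exactly like blank pages. A Unix-only sketch using Python's resource module (this reads the limits of the current process; for Tomcat itself you would inspect its own environment, e.g. /proc/<pid>/limits on Linux):

```python
import resource

def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits for this process.
    When open segment files plus sockets approach the soft limit, the
    container can start refusing new connections."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = nofile_limits()
```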
Re: logging
If you use the off-the-shelf .war, it *should* be the same. (If not, we need to fix it.)

If you are building your own .war, how SLF4J behaves depends on what implementation is in the runtime path. If you want to use log4j logging, put slf4j-log4j12.jar in your classpath and you should be all set.

On Apr 9, 2009, at 4:56 PM, Kevin Osborn wrote:

> We built our own webapp that used the Solr JARs. We used Apache
> Commons/log4j logging and just put log4j.properties in the Resin conf
> directory. The commons-logging and log4j jars were put in the Resin lib
> directory. Everything worked great and we got log files for our code only.
>
> So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it
> has something to do with Solr 1.4 using SLF4J instead of JDK logging, but
> it seems like my code would be independent of that. Any ideas?
Re: logging
Or for my quick and dirty method (this was just a test), I just removed the jcl-over-slf4j JAR, and it worked like normal.

From: Ryan McKinley
To: solr-user@lucene.apache.org
Sent: Friday, April 10, 2009 3:16:30 PM
Subject: Re: logging

If you use the off the shelf .war, it *should* be the same. (if not, we need to fix it) If you are building your own .war, how SLF4J behaves depends on what implementation is in the runtime path. If you want to use log4j logging, put in the slf4j-log4j.jar in your classpath and you should be all set.

On Apr 9, 2009, at 4:56 PM, Kevin Osborn wrote:

> We built our own webapp that used the Solr JARs. We used Apache
> Commons/log4j logging and just put log4j.properties in the Resin conf
> directory. The commons-logging and log4j jars were put in the Resin lib
> directory. Everything worked great and we got log files for our code only.
>
> So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it
> has something to do with Solr 1.4 using SLF4J instead of JDK logging, but
> it seems like my code would be independent of that. Any ideas?
maxCodeLength in PhoneticFilterFactory
i have this version of solr running:

Solr Implementation Version: 1.4-dev 747554M - bwhitman - 2009-02-24 16:37:49

and am trying to update a schema to support 8 code length metaphone instead of 4 via this (committed) issue: https://issues.apache.org/jira/browse/SOLR-813

So I changed the schema to this (knowing that I have to reindex). But when I do, queries fail with:

Error initializing DoubleMetaphone class org.apache.commons.codec.language.DoubleMetaphone
  at org.apache.solr.analysis.PhoneticFilterFactory.init(PhoneticFilterFactory.java:90)
  at org.apache.solr.schema.IndexSchema$6.init(IndexSchema.java:821)
  at org.apache.solr.schema.IndexSchema$6.init(IndexSchema.java:817)
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:149)
  at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:831)
  at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
  at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:425)
  at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:410)
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:452)
  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:501)
  at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:121)
PHP Remove From Index/Search By Fields
Hey, how could I write some code in PHP to place in a button that removes a returned item from the index? In turn, is it possible to copy all of the XML elements from said item and place them in a document somewhere locally once it's been removed? Finally, there is one default search field. How do you search on multiple different fields in PHP? If I wanted to search by all of the fields indexed, is that easy to code? What changes do I need to make in the XML schema? Thanks so much for any help!

--
View this message in context: http://www.nabble.com/PHP-Remove-From-Index-Search-By-Fields-tp22996701p22996701.html
Sent from the Solr - User mailing list archive at Nabble.com.
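For the delete part: Solr's update handler accepts a small XML body, so the button only needs to POST one of the payloads below to /solr/update (followed by a commit). Sketched here in Python for brevity; the same strings can be POSTed from PHP with curl. If you need a local copy of the document, fetch its stored fields with a query *before* deleting, since Solr does not return them on delete.

```python
from xml.sax.saxutils import escape

def delete_by_id(doc_id):
    """Build the XML body Solr's update handler expects for deleting a
    single document by its uniqueKey; POST it to /solr/update and then
    POST <commit/> to make the removal visible to searches."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))

def delete_by_query(query):
    """Same idea, but removing every document matching a query."""
    return "<delete><query>%s</query></delete>" % escape(query)
```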
special characters in Solr search query.
Hi,

There is a strange issue while querying the Solr indexes. If my query contains special characters like [ ] ! < > etc., it throws a query parse exception. From my application interface I am able to handle the special characters, but the issue is that when the document I am going to index contains any of these special characters, it throws a query parse exception. Can anyone give a pointer on this? Thanks in advance.

Regards,
Sagar Khetkade
Re: special characters in Solr search query.
On Sat, Apr 11, 2009 at 10:13 AM, Sagar Khetkade wrote: > > There is a strange issue while querying on the Solr indexes. If my query > contains the special characters like [ ] !<> etc. It is throwing the query > parse exception. From my application interface I am able to handle the > special characters but the issue is while the document which I am going to > index contains any of these special characters it is throwing query parse > exception. Can anyone give pointer over this? > Thanks in advance. > You need to escape those characters. Look at http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Escaping%20Special%20Characters If you are using Solrj, this should be done automatically. Solrj calls ClientUtils.escapeQueryChars under the hood. -- Regards, Shalin Shekhar Mangar.
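For clients that don't go through Solrj, the same escaping is easy to reproduce by hand. A sketch in Python; the character set below is modeled on Solrj's ClientUtils.escapeQueryChars and may differ slightly between versions:

```python
# Lucene query-syntax characters that need backslash escaping, per the
# query parser syntax docs linked above (set is an assumption modeled
# on ClientUtils.escapeQueryChars, not copied from a specific release).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;')

def escape_query_chars(s):
    """Backslash-escape query-syntax special characters in user input
    before embedding it in a Solr query string."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in s)
```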
Re: Question on StreamingUpdateSolrServer
On Sat, Apr 11, 2009 at 3:29 AM, vivek sar wrote:

> I also noticed that the Solr app has over 6000 file handles open -
>
> "lsof | grep solr | wc -l" - shows 6455
>
> I've 10 cores (using multi-core) managed by the same Solr instance. As
> soon as start up the Tomcat the open file count goes up to 6400. Few
> questions,
>
> 1) Why is Solr holding on to all the segments from all the cores - is
> it because of auto-warmer?

You have 10 cores, so Solr opens 10 indexes, each of which contains multiple files. That is one reason. Apart from that, Tomcat will keep some file handles for incoming connections.

> 2) How can I reduce the open file count?

Are they causing a problem? Tomcat will log messages when it cannot accept incoming connections if it runs out of available file handles. But if you are experiencing issues, you can increase the file handle limit or you can set useCompoundFile=true in solrconfig.xml.

> 3) Is there a way to stop the auto-warmer?
> 4) Could this be related to "Tomcat returning blank page for every
> request"?

It could be. Check the Tomcat and Solr logs.

--
Regards,
Shalin Shekhar Mangar.
Re: sorlj search
On Wed, Feb 6, 2008 at 10:51 AM, Tevfik Kiziloren wrote: > > Caused by: org.apache.solr.common.SolrException: parsing error >at > > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:138) >at > > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:99) >at > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:317) >at > > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:84) >... 29 more > Caused by: java.lang.RuntimeException: this must be known type! not: int >at > > org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:217) >at > > org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:235) >at > > org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:123) > Which version of Solr and Solrj client are you using? -- Regards, Shalin Shekhar Mangar.