Re: Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-17 Thread Shalin Shekhar Mangar
On Fri, Apr 17, 2009 at 4:46 AM, ristretto.rb wrote:

> 1.  I would need to get the source for 1.4 and build it, right?  No
> release yet, eh?


Nope.


>
> 2.  Any one using 1.4 in production without issue; is this wise?  Or
> should I wait?


Running a nightly is always a risky business. Test comprehensively first.


>
> 3.  Will I need to make changes to my schema.xml to support my current
> field set under 1.4?


No, it should be back-compatible.

Also look at the Upgrading from 1.3 section in CHANGES.txt


>
> 4.  Do I need to reindex all my data?
>

It should be able to read your existing index. However, since 1.4 uses a newer
version of Lucene, the index format will change as soon as you write anything,
and the index will no longer be readable by Solr 1.3.

So make sure you upgrade the slaves before the master.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Faceted Search

2009-04-17 Thread Alejandro Gonzalez
If you are querying via an HTTP request, you can add these two parameters:

facet=true
facet.field=field_for_faceting

and optionally this one to cap the number of facet values returned:

facet.limit=facet_limit

I don't know if it's what you need...
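For example, against the stock example server (the field name here is a
placeholder):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.limit=10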


On Fri, Apr 17, 2009 at 6:17 AM, Sajith Weerakoon wrote:

> Hi all,
>
> Can someone of you tell me how to implement a faceted search?
>
>
>
> Thanks,
>
> Regards,
>
> Sajith Vimukthi Weerakoon.
>
>
>
>


Re: Advice on moving from 1.3 to 1.4-dev or trunk?

2009-04-17 Thread Gene Campbell
Thanks for the feedback.  Will read up on upgrading.  I actually went
with the trunk, not a nightly.

When you say "test"... are you suggesting there is a test suite I
should run, or should I just do my own testing?

thanks
gene




On Fri, Apr 17, 2009 at 7:26 PM, Shalin Shekhar Mangar
 wrote:
> On Fri, Apr 17, 2009 at 4:46 AM, ristretto.rb wrote:
>
>> 1.  I would need to get the source for 1.4 and build it, right?  No
>> release yet, eh?
>
>
> Nope.
>
>
>>
>> 2.  Any one using 1.4 in production without issue; is this wise?  Or
>> should I wait?
>
>
> Running a nightly is always a risky business. Test comprehensively first.
>
>
>>
>> 3.  Will I need to make changes to my schema.xml to support my current
>> field set under 1.4?
>
>
> No, it should be back-compatible.
>
> Also look at the Upgrading from 1.3 section in CHANGES.txt
>
>
>>
>> 4.  Do I need to reindex all my data?
>>
>
> It should be able to read your existing index. Since it uses a newer version
> of Lucene, once you write anything, the index format will change and will no
> longer be readable by Solr 1.3
>
> So make sure you upgrade the slaves before the master.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Authentication Error

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is fixed in the trunk

On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah
 wrote:
> Thanks Noble.Regards,
> Allahbaksh
>
> 2009/4/16 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah
>>  wrote:
>> > Hi,I have followed the procedure given on this blog to setup the solr
>> >
>> > Below is my code. I am trying to index the data but I am not able to
>> connect
>> > to server and getting authentication error.
>> >
>> >
>> > HttpClient client=new HttpClient();
>> > client.getState().setCredentials(new AuthScope("localhost", 80,
>> > AuthScope.ANY_SCHEME),
>> >                new UsernamePasswordCredentials("admin", "admin"));
>> >
>> > Can you please let me know what may be the problem.
>> >
>> > The other problem which I am facing is using Load Banlancing
>> > SolrServer lbHttpSolrServer = new LBHttpSolrServer("
>> > http://localhost:8080/solr","http://localhost:8983/solr");
>> >
>> > Now the problem is the first server is down then I will get an error. If
>> I
>> > swap the server in constructor by giving port 8983 server as first and
>> 8080
>> > as second it works fine. The thing
>> >
>> > Problem is If only the last server which is set is active and the rest of
>> > other are down then Solr throws and exception and search is not
>> performed.
>> >
>> I shall write a testcase and let you know
>> > Regards,
>> > Allahbaksh
>> >
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
> --
> Allahbaksh Mohammedali Asadullah,
> Software Engineering & Technology Labs,
> Infosys Technolgies Limited, Electronic City,
> Hosur Road, Bangalore 560 100, India.
> (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
> Fax: 91-80-28520362 | Mobile: 91-9845505322.
>



-- 
--Noble Paul


Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Hello,

I am searching for a way to use the Lucene MultiFieldQueryParser in my
Solr installation.

Is there a chance to change the "solrQueryParser"?

In my old Lucene setup I used to combine many different types of
QueryParser in my query...

Or is there a chance to get MultiFieldQueryParser functions in Solr?

Greets -Ralf-


Re: Sorting performance + replication of index between cores

2009-04-17 Thread sunnyfr

Hi Christophe,

Did you find a way to fix your problem? Even with replication you will have
this problem: lots of updates mean cleared caches that have to be managed.
I have the same issue, and I am wondering whether I should turn off the
servers during updates.
How did you fix it?

Thanks,
sunny


christophe-2 wrote:
> 
> Hi,
> 
> After fully reloading my index, using another field than a Date does not
> help that much.
> Using a warmup query avoids having the first request slow, but:
>  - Frequent commits mean that the Searcher is reloaded frequently
> and, as the warmup takes time, the clients must wait.
>  - Having warmup slows down the index process (I guess this is
> because after a commit, the Searchers are recreated)
> 
> So I'm considering, as suggested,  to have two instances: one for 
> indexing and one for searching.
> I was wondering if there are simple ways to replicate the index in a 
> single Solr server running two cores ? Any such config already tested ? 
> I guess that the standard replication based on rsync can be simplified a 
> lot in this case as the two indexes are on the same server.
> 
> Thanks
> Christophe
> 
> Beniamin Janicki wrote:
>> :so you can send your updates anytime you want, and as long as you only 
>> :commit every 5 minutes (or commit on a master as often as you want, but 
>> :only run snappuller/snapinstaller on your slaves every 5 minutes) your 
>> :results will be at most 5minutes + warming time stale.
>>
>> This is what I do as well (commits are done once per 5 minutes). I've got a
>> master - slave configuration. The master has all caches turned off (commented
>> out in solrconfig.xml) and only 2 maxWarmingSearchers. The index size is 5GB,
>> Xmx=1GB, and committing takes around 10 secs (on the default configuration
>> with warming it took from 30 mins up to 2 hours).
>>
>> Slave caches are configured with autowarmCount="0" and
>> maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
>> done. I haven't noticed any huge delays while serving search requests.
>> Try to use those values - maybe they'll help in your case too.
>>
>> Ben Janicki
>>
>>
>> -Original Message-
>> From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
>> Sent: 22 October 2008 04:56
>> To: solr-user@lucene.apache.org
>> Subject: Re: Sorting performance
>>
>>
>> : The problem is that I will have hundreds of users doing queries, and a
>> : continuous flow of documents coming in.
>> : So a delay in warming up a cache "could" be acceptable if I do it a few
>> : times per day. But not on a too regular basis (right now, the first query
>> : that loads the cache takes 150s).
>> :
>> : However: I'm not sure why it looks not to be a good idea to update the
>> : caches
>>
>> you can refresh the caches automatically after updating; the "newSearcher"
>> event is fired whenever a searcher is opened (but before it's used by
>> clients), so you can configure warming queries for it -- it doesn't have to
>> be done manually (or by the first user to use that reader)
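In solrconfig.xml that warming hook looks roughly like this (a sketch; the
warming query itself is a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">your popular query</str><str name="rows">10</str></lst>
  </arr>
</listener>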
>>
>> so you can send your updates anytime you want, and as long as you only 
>> commit every 5 minutes (or commit on a master as often as you want, but 
>> only run snappuller/snapinstaller on your slaves every 5 minutes) your 
>> results will be at most 5minutes + warming time stale.
>>
>>
>> -Hoss
>>
>>   
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Sorting-performance-tp20037712p23094174.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

I think there's no search handler that uses MultiFieldQueryParser in Solr, but
check the DisMaxRequestHandler; it will probably do the job. You can specify
all the fields you want to search in, and it will build the query using
boolean queries. It also includes many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler
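For example (the field names and boosts are placeholders; qt=dismax assumes a
dismax handler is registered in solrconfig.xml, as in the example config):

http://localhost:8983/solr/select?qt=dismax&q=california&qf=title^2+body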



Kraus, Ralf | pixelhouse GmbH wrote:
> 
> Hello,
> 
> I am searching for a way to use the Lucene MultiFieldQueryParser in my 
> SOLR Installation.
> Is there a chance to change the "solrQueryParser" ?
> 
> In my old Lucene Setting I used to combine many different types of 
> QueryParser in my Querry...
> 
> Or is there a chance to get MultiFieldQueryParser  functions in SOLR ?
> 
> Greets -Ralf-
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Lucene-MultiFieldQueryParser-with-SOLR-tp23094412p23094692.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

Think there's no search handler that uses MultiFieldQueryParser in Solr. But
check DismaxRequestHandler, probably will do the job. Yo can specify all the
fields where you want to search in and it will build the query using boolean
queries. It includes also many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler
  

Is there a chance to combine request handlers?
I need to use some additional "normal" boolean and integer queries!

Greets -Ralf-


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

Think there's no search handler that uses MultiFieldQueryParser in Solr. But
check DismaxRequestHandler, probably will do the job. Yo can specify all the
fields where you want to search in and it will build the query using boolean
queries. It includes also many more features:
http://wiki.apache.org/solr/DisMaxRequestHandler

THX A LOT !

You really made my day !

Greets -Ralf-


Re: Customizing solr with my lucene

2009-04-17 Thread mirage1987

Hi,
Well, let me explain the scenario in detail.
I have my lucene.jar that I need to work with Solr. So I started by adding
the lucene.jar to the WEB-INF directory of solr.war, added my schema.xml in
the conf dir, and restarted the Solr server.
Now I run my program and add a doc into it. The doc is added successfully, as
shown by the stats. But when I query it through the browser, no results are
returned. I tried various terms with no luck; a *:* query does show the doc.
Am I missing something here? Is there a way I can view the posting list of
Solr (Lucene) and see whether my terms have been indexed or not?

So let's say my doc has an entry like
University of Southern California
and I search for "california": no results from Solr.
But when I do it on my Lucene without Solr, results are shown.

I also tried adding debugQuery to my queries as you suggested.
Here is what I get. This is very much the way I have modified my Lucene, so
query parsing seems correct:

status=0, QTime=0
params: debugQuery=true, indent=on, start=0, q=name:california, rows=10,
version=2.2

rawquerystring: name:california
querystring: name:california
parsedquery: BoostingTermQuery(value:california)
parsedquery_toString: value:california
QParser: OldLuceneQParser
(all timing entries 0.0; the surrounding XML markup was stripped by the
archive)

Erik Hatcher wrote:
> 
> What is the query parsed to?   Add &debugQuery=true to your Solr  
> request and let us know what the query parses to.
> 
> As for whether upgrading a Lucene library is sufficient... depends on  
> what Solr version you're starting with (payload support is already in  
> all recent versions of Solr's Lucene JARs) and what has changed in  
> Lucene since, and whether you're expecting an existing index to work  
> or rebuilding it from scratch.
> 
>   Erik
> 
> On Apr 14, 2009, at 7:51 AM, mirage1987 wrote:
> 
>>
>> hey,
>>  I am trying to modify the lucene code by adding payload  
>> functionality
>> into it.
>> Now if i want to use this lucene with solr what should i do.
>> I have added this to the lib folder of solr.war replacing the old  
>> lucene..Is
>> this enough??
>> Plus i am also using a different schema than the default shema.xml  
>> used by
>> solr.(Added some fields and removed some of the previous ones).
>> The problem i am facing is that now the solr is not returning  
>> results but
>> the lucene individually is for the same query.
>> Could you help me on this...ny ideas n suggestions??
>> -- 
>> View this message in context:
>> http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23038007.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23096895.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Authentication Error

2009-04-17 Thread Allahbaksh Asadullah
Hi Noble,
Thank you very much. I will download the latest Solr nightly build.
Please note this other problem, which I think is a bug.


I am trying out the load balancing feature in Solr 1.4 using LBHttpSolrServer.

Below is the setup:
I have three Solr servers: A, B and C.

Now the problem: if I take the first two servers, i.e. A and B, down (note I
have specified A, B, C in that order), then it throws an exception. It does
not try server C, even though server C is still active.

In short, if only the last server specified in the constructor is active, I
get an exception and the query does not get fired.

Is it a bug, or what may be the exact problem?
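A minimal sketch of the setup described (host names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;

public class LBFailoverRepro {
    public static void main(String[] args) throws Exception {
        // With A and B down and only C alive, this query throws
        // instead of failing over to C.
        LBHttpSolrServer lb = new LBHttpSolrServer(
                "http://hostA:8983/solr",   // down
                "http://hostB:8983/solr",   // down
                "http://hostC:8983/solr");  // alive
        lb.query(new SolrQuery("*:*"));
    }
}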

Regards,
Allahbaksh



2009/4/17 Noble Paul നോബിള്‍ नोब्ळ् 

> It is fixed in the trunk
>
> On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah
>  wrote:
> > Thanks Noble.Regards,
> > Allahbaksh
> >
> > 2009/4/16 Noble Paul നോബിള്‍ नोब्ळ् 
> >
> >> On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah
> >>  wrote:
> >> > Hi,I have followed the procedure given on this blog to setup the solr
> >> >
> >> > Below is my code. I am trying to index the data but I am not able to
> >> connect
> >> > to server and getting authentication error.
> >> >
> >> >
> >> > HttpClient client=new HttpClient();
> >> > client.getState().setCredentials(new AuthScope("localhost", 80,
> >> > AuthScope.ANY_SCHEME),
> >> >new UsernamePasswordCredentials("admin", "admin"));
> >> >
> >> > Can you please let me know what may be the problem.
> >> >
> >> > The other problem which I am facing is using Load Banlancing
> >> > SolrServer lbHttpSolrServer = new LBHttpSolrServer("
> >> > http://localhost:8080/solr","http://localhost:8983/solr");
> >> >
> >> > Now the problem is the first server is down then I will get an error.
> If
> >> I
> >> > swap the server in constructor by giving port 8983 server as first and
> >> 8080
> >> > as second it works fine. The thing
> >> >
> >> > Problem is If only the last server which is set is active and the rest
> of
> >> > other are down then Solr throws and exception and search is not
> >> performed.
> >> >
> >> I shall write a testcase and let you know
> >> > Regards,
> >> > Allahbaksh
> >> >
> >>
> >>
> >>
> >> --
> >> --Noble Paul
> >>
> >
> >
> >
> > --
> > Allahbaksh Mohammedali Asadullah,
> > Software Engineering & Technology Labs,
> > Infosys Technolgies Limited, Electronic City,
> > Hosur Road, Bangalore 560 100, India.
> > (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
> > Fax: 91-80-28520362 | Mobile: 91-9845505322.
> >
>
>
>
> --
> --Noble Paul
>



-- 
Allahbaksh Mohammedali Asadullah,
Software Engineering & Technology Labs,
Infosys Technolgies Limited, Electronic City,
Hosur Road, Bangalore 560 100, India.
(Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927.
Fax: 91-80-28520362 | Mobile: 91-9845505322.


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

Well, dismax has a q.alt parameter where you can specify a query in "lucene"
syntax. The query must be empty to use q.alt:
http://.../select?q=&q.alt=phone_number:1234567
This would search in the field phone_number independently of what fields you
have configured in the dismax.

Another way would be to configure various request handlers (one with dismax
and one standard for the fields that you want, for example). You can tell Solr
which one to use in the url request.

Don't know if this is what you need...
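For example (assuming a handler named "standard" is registered in
solrconfig.xml):

http://.../select?qt=standard&q=phone_number:1234567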


Kraus, Ralf | pixelhouse GmbH wrote:
> 
> Marc Sturlese schrieb:
>> Think there's no search handler that uses MultiFieldQueryParser in Solr.
>> But
>> check DismaxRequestHandler, probably will do the job. Yo can specify all
>> the
>> fields where you want to search in and it will build the query using
>> boolean
>> queries. It includes also many more features:
>> http://wiki.apache.org/solr/DisMaxRequestHandler
>>   
> Is there a chance to combine RequestHandler ?
> I need to use some additional "normal" boolean and integer querries !
> 
> Greets -Ralf-
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Lucene-MultiFieldQueryParser-with-SOLR-tp23094412p23097365.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

Well dismax has a q.alt parameter where you can specify a query in "lucene"
sintax. The query must be empty to use q.alt:
http://.../select?q=&q.alt=phone_number:1234567
This would search in the field phone_number independly of what fields you
have configured in teh dismax.
  
Now I use the "fq" parameter in combination with "q.alt" ... Runs fine
so far :-)

The "fq" parameter sets my additional query parameters :-)
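For example (the field names and values are placeholders):

http://.../select?q=&q.alt=*:*&fq=category:books&fq=price:[10 TO 100]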

Greets -Ralf-




Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Kraus, Ralf | pixelhouse GmbH

Marc Sturlese schrieb:

The only problem I found with q.alt is that it doesn't allow highlighting (or
at least it didn't show any for me). If you find out how to do it, let me
know.

I use highlighting only with the normal query!
My q.alt is "*:*"

But it's really sad that dismax doesn't support wildcards :-(

Greets -Ralf-


Re: Using Lucene MultiFieldQueryParser with SOLR

2009-04-17 Thread Marc Sturlese

The only problem I found with q.alt is that it doesn't allow highlighting (or
at least it didn't show any for me). If you find out how to do it, let me
know.
Thanks!

Kraus, Ralf | pixelhouse GmbH wrote:
> 
> Marc Sturlese schrieb:
>> Well dismax has a q.alt parameter where you can specify a query in
>> "lucene"
>> sintax. The query must be empty to use q.alt:
>> http://.../select?q=&q.alt=phone_number:1234567
>> This would search in the field phone_number independly of what fields you
>> have configured in teh dismax.
>>   
> Now I use the "fq" parameter in combination with "q.alt" ... Runs fine 
> yet :-)
> The "fq" parameter sets my additional query parameter :-)
> 
> Greets -Ralf-
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Lucene-MultiFieldQueryParser-with-SOLR-tp23094412p23097737.html
Sent from the Solr - User mailing list archive at Nabble.com.



EventListeners of DIM

2009-04-17 Thread Marc Sturlese

Hey there,
I have seen the new feature of EventListeners of DIH in trunk.
(The data-config.xml snippet was stripped by the mail archive.)

Are these events called at the beginning and end of the whole indexing
process, or at the beginning and end of indexing each document?
My idea is to update a field of a row in a MySQL table every time a doc is
indexed. Is this possible, or should I save all doc ids and update the
rows of the table using onImportEnd?

Thanks in advance!


-- 
View this message in context: 
http://www.nabble.com/EventListeners-of-DIM-tp23098357p23098357.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: EventListeners of DIM

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
These are for the beginning and end of the whole indexing process.
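A rough sketch of a listener, as I understand the trunk DIH API (the package
and class names here are made up; it would be wired up via the onImportEnd
attribute of the <document> element in data-config.xml):

package com.example.dih; // hypothetical

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class ImportEndListener implements EventListener {
    public void onEvent(Context ctx) {
        // Invoked once when the whole import ends (onImportEnd),
        // not once per indexed document.
    }
}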

On Fri, Apr 17, 2009 at 7:38 PM, Marc Sturlese  wrote:
>
> Hey there,
> I have seen the new feature of EventListeners of DIH in trunk.
>
> (The data-config.xml snippet was stripped by the mail archive.)
>
> Are these events called at the beginning and end of the whole indexing
> process, or at the beginning and end of indexing each document?
> My idea is to update a field of a row in a MySQL table every time a doc is
> indexed. Is this possible, or should I save all doc ids and update the
> rows of the table using onImportEnd?
>
> Thanks in advance!
>
>
> --
> View this message in context: 
> http://www.nabble.com/EventListeners-of-DIM-tp23098357p23098357.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: Customizing solr with my lucene

2009-04-17 Thread mirage1987

Hey Erik,
I also checked the index using Luke, and it shows that the terms are indexed
as they should have been. So that implies that something is wrong with the
querying only, and the results are not getting retrieved. (As I said earlier,
even the parsed query is the way it should be according to the changes I have
made to Lucene.)
Any ideas on why this could be happening?

One more thing... I tried to query the Solr index using Luke, but still no
results... maybe the index is not stored correctly... could it be changes
in the Lucene API? Should I revert to an older version of Solr?



-- 
View this message in context: 
http://www.nabble.com/Customizing-solr-with-my-lucene-tp23038007p23098700.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Garbage Collectors

2009-04-17 Thread Bill Au
I would also include the -XX:+HeapDumpOnOutOfMemoryError option to get
a heap dump when the JVM runs out of heap space.
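For example, appended to the startup options quoted below (the dump path is a
placeholder):

-server -Xmx4096m -Xms512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/solr-heap.hprof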



On Thu, Apr 16, 2009 at 9:43 PM, Bryan Talbot wrote:

> If you're using java 5 or 6 jmap is a useful tool in tracking down memory
> leaks.
>
> http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html
>
> jmap -histo:live <pid>
>
> will print a histogram of all live objects in the heap.  Start at the top
> and work your way down until you find something suspicious -- the trick is
> in knowing what is suspicious of course.
>
>
> -Bryan
>
>
>
>
>
> On Apr 16, 2009, at Apr 16, 3:40 PM, David Baker wrote:
>
>  Otis Gospodnetic wrote:
>>
>>> Personally, I'd start from scratch:
>>> -Xmx -Xms...
>>>
>>> -server is not even needed any more.
>>>
>>> If you are not using Java 1.6, I suggest you do.
>>>
>>> Next, I'd try to investigate why objects are not being cleaned up - this
>>> should not be happening in the first place.  Is Solr the only webapp
>>> running?
>>>
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
>>>
>>>  From: David Baker 
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 16, 2009 3:33:18 PM
 Subject: Garbage Collectors

 I have an issue with garbage collection on our solr servers.  We have an
 issue where the  old generation  never  gets cleaned up on one of our
 servers.  This server has a little over 2 million records which are updated
 every hour or so.  I have tried the parallel GC and the concurrent GC.  The
parallel seems more stable for us, but both end up running out of memory.  I
have increased the memory allocated to the servers, but this just seems to
delay the problem.  My question is, what are the suggested options for using
the parallel GC.  Currently we are using something of this nature:

 -server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy
 -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m
 -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to solr and GC tuning, so any advice is appreciated.

>> Thanks for the reply, yes, Solr is the only app running under this
>> tomcat server. I will remove -server and other options except the heap
>> allocation options and see how it performs. Any suggestions on how to go
>> about finding out why objects are not being cleaned up if these changes don't
>> work?
>>
>>
>


Re: CollapseFilter with the latest Solr in trunk

2009-04-17 Thread Jeff Newburn
We are currently trying to do the same thing.  With the patch unaltered we
can use fq as long as collapsing is turned on.  If we just send a normal
document level query with an fq parameter it blows up.

Additionally, it does not appear that the collapse.facet option works at
all.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


> From: climbingrose 
> Reply-To: 
> Date: Fri, 17 Apr 2009 16:53:00 +1000
> To: solr-user 
> Subject: CollapseFilter with the latest Solr in trunk
> 
> Hi all,
> 
> Has anyone tried to use CollapseFilter with the latest version of Solr in
> trunk? It looks like Solr 1.4 doesn't allow calling setFilterList()
> and setFilter() on one instance of the QueryCommand. I modified the code in
> QueryCommand to allow this:
> 
> public QueryCommand setFilterList(Query f) {
> //  if( filter != null ) {
> //throw new IllegalArgumentException( "Either filter or filterList
> may be set in the QueryCommand, but not both." );
> //  }
>   filterList = null;
>   if (f != null) {
> filterList = new ArrayList(2);
> filterList.add(f);
>   }
>   return this;
> }
> 
> However, I still have a problem which prevent query filters from working
> when used in conjunction with CollapseFilter. In other words, query filters
> doesn't seem to have any effects on the result set when CollapseFilter is
> used.
> 
> The other problem is related to OpenBitSet:
> 
> java.lang.ArrayIndexOutOfBoundsException: 2183
> at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
> at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
> at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161)
> at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:141)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
> at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
> at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
> at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
> at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
> at java.lang.Thread.run(Thread.java:619)
> 
> I think CollapseFilter is rather an important function in Solr that gets
> used quite frequently. Does anyone have a solution for this?
> 
> -- 
> Regards,
> 
> Cuong Hoang



dual of method - CommonsHttpSolrServer(url) to close and destroy underlying httpclient connection

2009-04-17 Thread Rakesh Sinha
When we instantiate a CommonsHttpSolrServer, we use the following:

CommonsHttpSolrServer server = new CommonsHttpSolrServer(this.endPoint);

How do we do a 'kill all' of all the underlying HttpClient connections?

server.getHttpClient() returns an HttpClient reference, but I am trying
to figure out the right method to close all currently active
HttpClient connections.


WordDelimiterFilterFactory removes words when options set to 0

2009-04-17 Thread Burton-West, Tom
In trying to understand the various options for WordDelimiterFilterFactory, I
tried setting all options to 0.
This seems to prevent a number of words from being output at all. In
particular, "can't" and "99dxl" don't get output, nor do any words containing
hyphens. Is this correct behavior?


Here is the Solr analyzer output:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
position:  1      2        3      4          5         6        7      8      9
text:      ca-55  99_3_a9  55-67  powerShot  ca999x15  foo-bar  can't  joe's  99dxl

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=0,
generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0,
catenateNumbers=0}
position:   1          5
text:       powerShot  joe
type:       word       word
start,end:  20,29      53,56

Here is the schema (the analyzer chain; the tags, restored from the options
listed above, with the field type name a guess):

  <fieldtype name="testWDF" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0"
              generateNumberParts="0" catenateWords="0" generateWordParts="0"
              catenateAll="0" catenateNumbers="0"/>
    </analyzer>
  </fieldtype>


Tom

python response handler treats "unschema'd" fields differently

2009-04-17 Thread Brian Whitman
I have a solr index where we removed a field from the schema but it still
had some documents with that field in it.
Queries using the standard response handler had no problem but the
&wt=python handler would break on any query (with fl="*" or asking for that
field directly) with:

SolrHTTPException: HTTP code=400, reason=undefined_field_oldfield

I "fixed" it by putting that field back in the schema.

One related weirdness is that fl=oldfield would cause the exception but not
fl=othernonschemafield -- that is, it would only break on field names that
were not in schema but were in the documents.

I know this is undefined-behavior territory, but it was still weird that the
standard response writer does not do this -- if you give a nonexistent field
name to fl on wt=standard, whether it is in the documents or not, it
happily performs the query, just skipping the ones that are not in the
schema.


RE: Dictionary lookup possibilities

2009-04-17 Thread Steven A Rowe
Hi Jaco,

On 4/9/2009 at 2:58 PM, Jaco wrote:
> I'm struggling with some ideas, maybe somebody can help me with past
> experiences or tips. I have loaded a dictionary into a Solr index,
> using stemming and some stopwords in analysis part of the schema.
> Each record holds a term from the dictionary, which can consist of
> multiple words. For some data analysis work, I want to send pieces
> of text (sentences actually) to Solr to retrieve all possible
> dictionary terms that could occur. Ideally, I want to construct a
> query that only returns those Solr records for which all individual
> words in that record are matched.
> 
> For instance, my dictionary holds the following terms:
> 1 - a b c d
> 2 - c d e
> 3 - a b
> 4 - a e f g h
> 
> If I put the sentence [a b c d f g h] in as a query, I want to recieve
> dictionary items 1 (matching all words a b c d) and 3 (matching words a
> b) as matches
> 
> I have been puzzling about how to do this. The only way I found so far
> was to construct an OR query with all words of the sentence in it. In
> this case, that would result in all dictionary items being returned.
> This would then require some code to go over the search results and
> analyse each of them (i.e. by using the highlight function) to kick
> out 'false' matches, but I am looking for a more efficient way.
> 
> Is there a way to do this with Solr functionality, or do I need to
> start looking into the Lucene API ..?

Your problem could be modeled as a set of standing queries, where your 
dictionary entries are the *queries* (with all words required, maybe using a 
PhraseQuery or a SpanNearQuery), and the sentence is the document.

Solr may not be usable in this context (extremely high query volume),
depending on your throughput requirements, but Lucene's MemoryIndex was
designed for exactly this kind of thing (see the MemoryIndex javadocs in
Lucene's contrib-memory module; the link was stripped by the mail archive).
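A minimal sketch of that approach (Lucene 2.4-era APIs; the field name and
sample entries come from the example above):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class DictionaryMatcher {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new WhitespaceAnalyzer();

        // The incoming sentence becomes a one-document, in-memory index.
        MemoryIndex index = new MemoryIndex();
        index.addField("sentence", "a b c d f g h", analyzer);

        // Each dictionary entry becomes a query with every word required;
        // this is entry 3 ("a b") from the example above.
        Query entry = new QueryParser("sentence", analyzer).parse("+a +b");
        System.out.println(index.search(entry) > 0.0f ? "match" : "no match");
    }
}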



Steve



Re: python response handler treats "unschema'd" fields differently

2009-04-17 Thread Yonik Seeley
Seems like we could handle this two ways... leave out the field if it's
not defined in the schema, or include it and write it out as a string.
I think either would probably be more useful than throwing an error
(which isn't really a request error but rather a schema/indexing
error).

Thoughts?

-Yonik
http://www.lucidimagination.com


On Fri, Apr 17, 2009 at 4:36 PM, Brian Whitman  wrote:
> I have a solr index where we removed a field from the schema but it still
> had some documents with that field in it.
> Queries using the standard response handler had no problem but the
> &wt=python handler would break on any query (with fl="*" or asking for that
> field directly) with:
>
> SolrHTTPException: HTTP code=400, reason=undefined_field_oldfield
>
> I "fixed" it by putting that field back in the schema.
>
> One related weirdness is that fl=oldfield would cause the exception but not
> fl=othernonschemafield -- that is, it would only break on field names that
> were not in schema but were in the documents.
>
> I know this is undefined behavior territory but it was still weird that the
> standard response writer does not do this-- if you give a nonexistent field
> name to fl on wt=standard, either one that is in documents or is not -- it
> happily performs the query just skipping the ones that are not in the
> schema.
>


Re: Hierarchal Faceting Field Type

2009-04-17 Thread Chris Hostetter

: level one#
: level one#level two#
: level one#level two#level three#
: 
: Trying to find the right combination of field type and query to get the
: desired results. Saw some previous posts about hierarchal facets which helped
: in the generating the right query but having an issue using the built in text
: field which ignores our delimiter and the string field which prevents us from
: doing a start with search. Does anyone have any insight into the field
: declaration?

Use TextField, with a PatternTokenizer
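A sketch of such a field type in schema.xml (the type name and delimiter
pattern are assumptions about your data):

  <fieldType name="hierarchy" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
    </analyzer>
  </fieldType>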

BTW: if this isn't a thread you've already seen, it's handy to know about...

http://www.nabble.com/Hierarchical-Faceting-to20090898.html#a20176326


-Hoss



Re: SNMP monitoring

2009-04-17 Thread Chris Hostetter

:  How would I set up SNMP monitoring of my Solr server? I've done some
: searching of the wiki and Google and have come up with a blank. Any
: pointers?

it depends on what you want to monitor.  if you just want to know whether the
JVM is running, this should be fairly easy...

if you want to be able to get Solr-specific stats/data, your best bet is
probably to look into ways to access JMX MBeans via SNMP (there seem to be
some tools out there to do things like this)

http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp
http://www.google.co.uk/search?hl=en&q=jmx+snmp



-Hoss



Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :)

Surely there must be dozens more of you guys out there... c'mon,
accelerate your knowledge! Join us in Seattle!



On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
 wrote:
> Greetings,
>
> Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
> with me in the Seattle area? I can donate some facilities, etc. -- I
> also always have topics to speak about :)
>
> Cheers,
> Bradford
>


Re: Garbage Collectors

2009-04-17 Thread Otis Gospodnetic

The only thing that comes to mind is running Solr under a profiler (e.g. 
YourKit) and figuring out which objects are not getting cleaned up and who's 
holding references to them.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: David Baker 
> To: solr-user@lucene.apache.org
> Sent: Thursday, April 16, 2009 6:40:31 PM
> Subject: Re: Garbage Collectors
> 
> Otis Gospodnetic wrote:
> > Personally, I'd start from scratch:
> > -Xmx -Xms...
> > 
> > -server is not even needed any more.
> > 
> > If you are not using Java 1.6, I suggest you do.
> > 
> > Next, I'd try to investigate why objects are not being cleaned up - this 
> should not be happening in the first place.  Is Solr the only webapp running?
> > 
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> >  
> >> From: David Baker 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, April 16, 2009 3:33:18 PM
> >> Subject: Garbage Collectors
> >> 
> >> I have an issue with garbage collection on our solr servers.  We have an
> >> issue where the old generation never gets cleaned up on one of our servers.
> >> This server has a little over 2 million records which are updated every hour
> >> or so.  I have tried the parallel GC and the concurrent GC.  The parallel seems
> >> more stable for us, but both end up running out of memory.  I have increased
> >> the memory allocated to the servers, but this just seems to delay the problem.
> >> My question is, what are the suggested options for using the parallel GC.
> >> Currently we are using something of this nature:
> >>
> >> -server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC
> >> -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2
> >> -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr
> >>
> >> I am new to solr and GC tuning, so any advice is appreciated.
>
> Thanks for the reply, yes, Solr is the only app running under this tomcat
> server. I will remove -server and other options except the heap allocation
> options and see how it performs. Any suggestions on how to go about finding
> out why objects are not being cleaned up if these changes don't work?



Re: dual of method - CommonsHttpSolrServer(url) to close and destroy underlying httpclient connection

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
httpClient.getHttpConnectionManager().closeIdleConnections(0L);
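(In Commons HttpClient 3.x, closeIdleConnections takes an idle timeout in
milliseconds, so 0 closes every idle connection.) For a complete teardown, a
sketch that keeps a handle on the connection manager (the URL is a
placeholder):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientShutdown {
    public static void main(String[] args) throws Exception {
        // Create the server with a connection manager we control,
        // so every underlying connection can be torn down later.
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr", new HttpClient(mgr));

        // ... issue requests through server ...

        mgr.shutdown(); // closes all connections owned by this manager
    }
}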

--Noble

On Sat, Apr 18, 2009 at 1:31 AM, Rakesh Sinha  wrote:
> When we instantiate a commonshttpsolrserver - we use the following method.
>
> CommonsHttpSolrServer    server = new CommonsHttpSolrServer(this.endPoint);
>
> how do we do we a 'kill all' of all the underlying httpclient connections  ?
>
> server.getHttpClient() returns a HttpClient reference, but I am trying
> to figure out the right method to close all currently active
> httpclient connections .
>



-- 
--Noble Paul


Create incremental snapshot

2009-04-17 Thread Koushik Mitra
Hi,

We want to create snapshot incrementally.

What we want is every time the snap shooter script runs, it should not create a 
snapshot with pre-existing (last snapshot indexes) + delta (newly created 
indexes), rather just create a snapshot with the delta (newly created indexes).

Any references here would be highly appreciated.

Regards,
Koushik



Re: Create incremental snapshot

2009-04-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
The snapshooter does not really copy any files. They are just hard links
(which do not consume extra disk space), so even a full copy is not very
expensive.
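Roughly what the snapshooter script does under the hood (paths are
placeholders; cp's -l flag makes hard links instead of copies):

cp -lr /var/solr/data/index /var/solr/data/snapshot.20090418064010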

On Sat, Apr 18, 2009 at 12:06 PM, Koushik Mitra
 wrote:
> Hi,
>
> We want to create snapshot incrementally.
>
> What we want is every time the snap shooter script runs, it should not create 
> a snapshot with pre-existing (last snapshot indexes) + delta (newly created 
> indexes), rather just create a snapshot with the delta (newly created 
> indexes).
>
> Any references here would be highly appreciated.
>
> Regards,
> Koushik
>
>



-- 
--Noble Paul


Re: Create incremental snapshot

2009-04-17 Thread Koushik Mitra
When we run the snapshooter script, it creates a snapshot folder e.g. 
snapshot.20090418064010 and this snapshot folder contains physical index files 
which take space on the file system (as shown below). Are we missing anything 
here?

-rw-r-  46 test  test    59 Apr 17 23:26 _i.tii
-rw-r-  46 test  test   507 Apr 17 23:26 _i.prx
-rw-r-  46 test  test    14 Apr 17 23:26 _i.nrm
-rw-r-  46 test  test   333 Apr 17 23:26 _i.frq
-rw-r-  46 test  test   135 Apr 17 23:26 _i.fnm
-rw-r-  46 test  test    12 Apr 17 23:26 _i.fdx
-rw-r-  46 test  test  1433 Apr 17 23:26 _i.fdt

Regards,
Koushik



On 18/04/09 12:17 PM, "Noble Paul നോബിള്‍  नोब्ळ्"  wrote:

The snapshooter does not really copy any files. They are just hard links
(which do not consume extra disk space), so even a full copy is not very
expensive.

On Sat, Apr 18, 2009 at 12:06 PM, Koushik Mitra
 wrote:
> Hi,
>
> We want to create snapshot incrementally.
>
> What we want is every time the snap shooter script runs, it should not create 
> a snapshot with pre-existing (last snapshot indexes) + delta (newly created 
> indexes), rather just create a snapshot with the delta (newly created 
> indexes).
>
> Any references here would be highly appreciated.
>
> Regards,
> Koushik
>
>



--
--Noble Paul