SolrCloud (ZooKeeper)

2011-02-10 Thread Stijn Vanhoorelbeke
Hi,

I've completed the quick & dirty tutorials for SolrCloud ( see
http://wiki.apache.org/solr/SolrCloud ).
The whole concept of SolrCloud with ZooKeeper looks very promising indeed.

I also found some info about a 'ZooKeeperComponent' - from this component it
should be possible to configure ZooKeeper directly from solrconfig.xml ( see
http://wiki.apache.org/solr/ZooKeeperIntegration ).

Is this functionality already implemented, or is it something for the future?
If it is, can you please point me to a good guide/tutorial, because I can't
find much on the regular web. Otherwise, could you share a real working
ZooKeeperComponent layout and how you configured ZooKeeper?

Thanks for helping me,


QTime Solr Query

2011-02-10 Thread Stijn Vanhoorelbeke
Hi,

I've done some stress testing on my Solr system ( running in the EC2 cloud ).
From what I've noticed during the tests, the QTime drops to just 1 or 2 ms
( on an index of ~2 million documents ).

My first thought pointed to the various Solr caches, so I disabled all of
them. Yet QTime stayed low.
Then the Lucene-internal FieldCache came into sight. This cache is hidden
deep inside Lucene and is not configurable through Solr.

To cope with this, I tried lowering the memory allocated to Solr, so that a
smaller cache would be forced. But QTime still stays low.

Can Solr really be fast enough to answer queries in just 1-2 ms - even if I
only allocate 100 MB to it?


Monitor the QTime.

2011-02-10 Thread Stijn Vanhoorelbeke
Hi,

Is it possible to monitor the QTime of queries?
I know I could enable logging - but then all of my requests get logged,
producing big & nasty logs.

I just want to log the QTime periodically, let's say once every minute.
Is this possible in Solr, or can it be set up in Tomcat somehow?
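One low-tech approach that fits the "once every minute" idea is a small cron-driven script that queries Solr and extracts QTime from the XML response header. The sketch below parses a canned response so the extraction logic is visible without a live server; the query URL in the comment is an assumption:

```shell
# Extract the QTime value from a Solr XML response header.
extract_qtime() {
  sed -n 's/.*name="QTime">\([0-9]*\)<.*/\1/p'
}

# Canned sample response (illustrative), so the parsing is testable:
sample='<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst></response>'
echo "$sample" | extract_qtime    # prints the QTime in ms

# Hypothetical real use, e.g. from a once-a-minute cron entry:
#   curl -s "http://localhost:8983/solr/select?q=*:*" | extract_qtime >> /var/log/solr-qtime.log
```

Note the drawback discussed later in this thread: firing the same query every minute will be answered from the caches, so a rotating list of queries is more representative.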


Re: Unable to build the SolrCloud branch - SOLR-1873

2011-02-10 Thread Stijn Vanhoorelbeke
Hi,

I've followed the guide and it worked perfectly for me.
( I had to execute 'ant compile', not 'ant example', but that was probably
not your problem. )

2011/1/2 siddharth 

>
> I seemed to have figured out the problem. I think it was an issue with the
> JAVA_HOME being set. The build was failing while compiling the module solrj
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-build-the-SolrCloud-branch-SOLR-1873-tp2180635p2180800.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: SolrCloud (ZooKeeper)

2011-02-10 Thread Stijn Vanhoorelbeke
So,

The only way we currently have to integrate ZooKeeper is by passing
'-DzkHost=host:port_of_ZooKeeper' when we start up a Solr instance?

Also, I've noticed that when a Solr instance goes down, the node becomes
inactive in ZooKeeper - but it is kept in the list of nodes. How can you
remove a Solr instance from the list of hosts?
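For reference, that -DzkHost startup looks roughly like this sketch (the ZooKeeper host/port, Solr home path, and Jetty-style start.jar are assumptions taken from the wiki examples):

```shell
# Build the startup command for a Solr instance that attaches to an
# existing ZooKeeper ensemble via -DzkHost (all values are illustrative).
ZK_HOST="localhost:9983"
SOLR_OPTS="-DzkHost=${ZK_HOST} -Dsolr.solr.home=/opt/solr/home"
CMD="java ${SOLR_OPTS} -jar start.jar"
echo "$CMD"    # the command you would actually run
```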


2011/2/10 Yonik Seeley 

> On Thu, Feb 10, 2011 at 5:00 PM, Stijn Vanhoorelbeke
>  wrote:
> > I've completed the quick&dirty tutorials of SolrCloud ( see
> > http://wiki.apache.org/solr/SolrCloud ).
> > The whole concept of SolrCloud and ZooKeeper look indeed very promising.
> >
> > I found also some info about a 'ZooKeeperComponent' - From this component it
> > should be possible to configure ZooKeeper directly from the solrconfig.xml (
> > see http://wiki.apache.org/solr/ZooKeeperIntegration ).
>
> That's not part of what has been committed to trunk.
> Also, if one wants to get solrconfig from zookeeper, having the
> zookeeper config in solrconfig is a bit chicken and eggish ;-)
>
> -Yonik
> http://lucidimagination.com
>


Re: Turn off caching

2011-02-10 Thread Stijn Vanhoorelbeke
Hi,

You can comment out all the sections in solrconfig.xml that point to a cache.
However, there is a cache deep inside Lucene - the FieldCache - that can't be
commented out. This cache will always come into the picture.

If I need to do such things, I restart the whole Tomcat 6 server to flush ALL
caches.
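For reference, the cache sections in question typically look like this in a stock solrconfig.xml (the element names are standard; the sizes shown are the usual example-config defaults, not something from this thread):

```xml
<!-- In the <query> section of solrconfig.xml, wrap the cache entries in a
     comment to disable them; Lucene's internal FieldCache remains active. -->
<query>
  <!--
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  -->
</query>
```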

2011/2/11 Li Li 

> do you mean queryResultCache? you can comment related paragraph in
> solrconfig.xml
> see http://wiki.apache.org/solr/SolrCaching
>
> 2011/2/8 Isan Fulia :
> > Hi,
> > My solrConfig file looks like:
> >
> > <config>
> >   <requestDispatcher handleSelect="true">
> >     <requestParsers multipartUploadLimitInKB="2048" />
> >   </requestDispatcher>
> >
> >   <requestHandler name="standard" class="solr.SearchHandler" default="true" />
> >   <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
> >
> >   <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter" />
> >
> >   <admin>
> >     <defaultQuery>*:*</defaultQuery>
> >   </admin>
> > </config>
> >
> >
> > Every time I fire the same query ( to compare the results for different
> > configurations ), the query response time keeps decreasing because of
> > caching.
> > So I want to turn off the caching, or clear the cache, before I fire the
> > same query again.
> > Does anyone know how to do this?
> >
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>


Re: Monitor the QTime.

2011-02-11 Thread Stijn Vanhoorelbeke
> QTime is, of course, specific to the query, but it is returned in the
> response XML, so one could run occasional queries to figure it out.
> Please see http://wiki.apache.org/solr/SearchHandler
>
> Regards,
> Gora
>

Yes, this could be a possibility. But then the Solr caches come back into
the picture.
I cannot simply query the system with the same query each minute - that way
the result would be served entirely from the internal caches. I could build
a list of heavy queries to work around this, but I'd love a more
straightforward method.


Re: Monitor the QTime.

2011-02-11 Thread Stijn Vanhoorelbeke
2011/2/11 Ryan McKinley 

> You may want to check the stats via JMX.  For example,
>
>
> http://localhost:8983/solr/core/admin/mbeans?stats=true&key=org.apache.solr.handler.StandardRequestHandler
>
> shows some basic stats info for the handler.
> ryan


Can you access this URL from a web browser? ( I tried, but it doesn't work. )
Or must it be used from jConsole / a custom-made Java program?

Could you please point me to a good guide on implementing this JMX stuff,
because I'm a JMX newbie.
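Note that if the /admin/mbeans handler is registered, the URL above is a plain HTTP endpoint, so no JMX client is needed to scrape it. A sketch of extracting one stat from such a response (the sample XML and the curl URL in the comment are illustrative):

```shell
# Pull one stat out of an /admin/mbeans-style XML response.
sample='<lst name="stats"><str name="avgTimePerRequest">1.82</str></lst>'
avg=$(echo "$sample" | sed -n 's/.*avgTimePerRequest">\([0-9.]*\)<.*/\1/p')
echo "avgTimePerRequest=$avg"

# Hypothetical real use:
#   curl -s "http://localhost:8983/solr/core/admin/mbeans?stats=true&key=org.apache.solr.handler.StandardRequestHandler"
```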


Re: SolrCloud (ZooKeeper)

2011-02-12 Thread Stijn Vanhoorelbeke
Hi,

Has anyone used ZooKeeper on a Tomcat 5 system?
Could someone point me to a good guide on implementing the ZooKeeper/Solr
combination? The SolrCloud wiki page doesn't give much info at all.

Thank you,


Re: SolrCloud - Example C not working

2011-02-16 Thread Stijn Vanhoorelbeke
2011/2/16 Yonik Seeley 

> On Wed, Feb 16, 2011 at 3:57 AM, Thorsten Scherler 
> wrote:
> > On Tue, 2011-02-15 at 09:59 -0500, Yonik Seeley wrote:
> >> On Mon, Feb 14, 2011 at 8:08 AM, Thorsten Scherler 
> wrote:
> >> > Hi all,
> >> >
> >> > I followed http://wiki.apache.org/solr/SolrCloud and everything
> worked
> >> > fine till I tried "Example C:".
> >>
> >> Verified.  I just tried and it failed for me too.
> >
> > Hi Yonik, thanks for verifying. :)
> >
> > Should I open an issue and move the thread to the dev list?
>
> Yeah, thanks!
>
> -Yonik
> http://lucidimagination.com
>

Hi,
For me, example C doesn't work either. I just tried it - examples A & B
worked like a charm.

Stijn Vanhoorelbeke


Re: Migration from Solr 1.2 to Solr 1.4

2011-02-17 Thread Stijn Vanhoorelbeke
Hi,

I recently ran across the same issues:
I'm upgrading my Solr 1.4 to the latest nightly build ( to get the ZooKeeper
functionality ).

I copied the solr_home dir - but with no success
( the config files were not accepted by the new build, due to a version
mismatch ).
Then I copied only the index data and used a fresh copy of conf/solrconfig.xml
( which I adapted to reflect my old solrconfig.xml settings ) - I could still
use the old schema.

That way I ended up with a working new version of Solr,
but the copied index gave some trouble: 'Format version is not supported'.

I guess I have to rebuild the index.

For your case:
maybe you could replicate the index using the ReplicationHandler configured
in solrconfig.xml.
You could set your old system up as master and your new one as slave, and
hope the slave gets updated.
( Note: I suspect this will not work - as the ReplicationHandler in
solrconfig.xml was only introduced in Solr 1.4 and is not present in Solr
1.3 )
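For reference, the Solr 1.4 ReplicationHandler setup suggested here looks roughly like this (the master URL and poll interval are illustrative):

```xml
<!-- solrconfig.xml on the old (master) system -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the new (slave) system -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://old-master:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```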

2011/2/17 Chris Hostetter 
>
> : > if you don't have any custom components, you can probably just use
> : > your entire solr home dir as is -- just change the solr.war.  (you
can't
> : > just copy the data dir though, you need to use the same configs)
> : >
> : > test it out, and note the "Upgrading" notes in the CHANGES.txt for the
> : > 1.3, 1.4, and 1.4.1 releases for "gotchas" that you might wnat to
watch
> : > out for.
>
> : Thank you for your reply, I've tried to copy the data and configuration
> : directory without success :
> : SEVERE: Could not start SOLR. Check solr/home property
> : java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException:
> : Unknown format version: -10
>
> Hmmm... ok, i'm not sure why that would happen.  According to the
> CHANGES.txt, Solr 1.2 used Lucene 2.1 and Solr 1.4.1 used 2.9.3 -- so
> Solr 1.4 should have been able to read an index created by Solr 1.2.
>
> You *could* try upgrading first from 1.2 to 1.3, run an optimize command,
> and then try upgrading from 1.3 to 1.4 -- but i can't make any assertions
> that that will work better, since going straight from 1.2 to 1.4 should
> have worked the same way.
>
> When in doubt: reindex.
>
>
> -Hoss


Re: My Plan to Scale Solr

2011-02-17 Thread Stijn Vanhoorelbeke
Hi,

I'm currently looking at SolrCloud; I've managed to set up a scalable
cluster with ZooKeeper.
( See the examples on http://wiki.apache.org/solr/SolrCloud for a quick
understanding. )
This way, all the different shards/replicas are registered in a centralised
configuration.

Moreover, ZooKeeper gives you out-of-the-box load balancing.
So, let's say you have 2 different shards, each with a couple of replicas.
Your ZooKeeper tree will look like this:

\config
 ...
 /live_nodes (v=6 children=4)
  IP_Port:7500_solr (ephemeral v=0)
  IP_Port:7574_solr (ephemeral v=0)
  IP_Port:8900_solr (ephemeral v=0)
  IP_Port:8983_solr (ephemeral v=0)
 /collections (v=20 children=1)
  collection1 (v=0 children=1) "configName=myconf"
   shards (v=0 children=2)
    shard1 (v=0 children=3)
     IP_Port:8983_solr_ (v=4) "node_name=IP_Port:8983_solr url=http://IP_Port:8983/solr/"
     IP_Port:7574_solr_ (v=1) "node_name=IP_Port:7574_solr url=http://IP_Port:7574/solr/"
     IP_Port:8900_solr_ (v=1) "node_name=IP_Port:8900_solr url=http://IP_Port:8900/solr/"
    shard2 (v=0 children=2)
     IP_Port:7500_solr_ (v=0) "node_name=IP_Port:7500_solr url=http://IP_Port:7500/solr/"
     IP_Port:7574_solr_ (v=1) "node_name=IP_Port:7574_solr url=http://IP_Port:7574/solr/"

--> This setup can be realised with one ZooKeeper node - the other Solr
machines just need to know the IP:port where ZooKeeper is active, and
that's it.
--> So almost no configuration/installation is needed to quickly set up a
scalable, load-balanced cluster.

Disclaimer:
ZooKeeper support is a relatively new feature - I'm not sure it will work
out in a real production environment with a tight SLA attached.
But definitely keep your eyes on this stuff - it will mature quickly!

Stijn Vanhoorelbeke


Re: SolrCloud new....

2011-02-18 Thread Stijn Vanhoorelbeke

Hi,

I'm busy doing the exact same thing.
I figured things out all by myself - the wiki page is a nice 'first view',
but doesn't go into depth...

Let's go ahead:
1) Should I copy the libraries from cloud to trunk???
2) Should I keep the cloud module in every system???

A: Yes, you should.
You should get yourself the latest dev trunk and compile it.

The steps I followed:
+ grab the latest trunk & build Solr
+ back up all Solr config files
+ in the tomcat6/webapps/ dir, remove the 'solr' dir
+ copy the new solr.war ( which you built in the first step ) to tomcat6/webapps
+ in your Solr_home/conf dir, solrconfig.xml needs to be replaced by a new one
( taken from the example dir of your build ) - some other config files ( like
schema.xml ) can be kept as they are
+ adapt the new files to reflect the old configuration
+ restart Tomcat and it will install the new version of Solr

It seems the index isn't compatible - so you need to flush your whole index
and re-index all data.
And finally you have your Solr system back, with ZooKeeper integrated in the
/admin zone :)


3) I am not using any cores in Solr. It is a single Solr in every
system. Can SolrCloud support that??

A: Actually, you are using one core - so that gives no problem.
But be sure to check that you have a solr.xml file in your solr_home dir.
This file just lists all cores - in your case, just one core.
( You can find examples of this file's layout easily on
http://wiki.apache.org/solr/CoreAdmin )
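For reference, a minimal single-core solr.xml looks like this (the core name and instanceDir are illustrative; see the CoreAdmin wiki page above for the full layout):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="." />
  </cores>
</solr>
```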

4) The example is given with Jetty. Is it done the same way in Tomcat???

A: Right now it is the same way.
You have to edit your /etc/init.d/tomcat6 startup script. In the start)
section you can specify all the JAVA_OPTS ( the ones the SolrCloud wiki
mentions ).

Be sure to set the following one:
export JAVA_OPTS="$JAVA_OPTS -DhostPort=8080" ( if Tomcat runs on port 8080 )

At first I didn't --> my ZooKeeper entry pointed to the standard 8983 port,
which gave errors.

The above gives you a quick peek at how to get the SolrCloud feature.
In the above, ZooKeeper is embedded in one of your Solr machines; if you
don't want this, you can place ZooKeeper on a different machine ( like I'm
doing right now ).

If you need more help, you can contact me.
Stijn Vanhoorelbeke




Re: question: havnig multiple solrCloud configuration on the same machine

2011-02-19 Thread Stijn Vanhoorelbeke

Hi,

I'm following your suggestions.

An extract from your last step:
>This would give you three different configurations - you would then edit
>the zookeeper info to point each collection (essentially a SolrCore at
>this point) to the right configuration files:
>
>collections/collection1
>  config=conf1
>
>collections/collection2
>  config=conf2
>
>collections/collection3
>  config=conf3

How do you manage to set this up?
Do you need to modify solr.xml to do so?

Stijn Vanhoorelbeke,


Re: SolrCloud new....

2011-02-20 Thread Stijn Vanhoorelbeke

Hi,

Can I edit the SolrCloud page?
I never thought about it - but since it's a wiki, everyone can edit, right?

For the moment I'll not write anything on it - but if you need some help, I
can share some of my ( little, but some ) experience.

2011/2/20 Otis Gospodnetic-2 [via Lucene] <
ml-node+2538747-751169843-301...@n3.nabble.com>

> Hi Stijn,
>
> Thank you for sharing this.
> Would it at all be possible for you to update the parts of SolrCloud page
> that
> are incorrect and that you figured out or add anything new that's not on
> that
> page yet?
>
> Thanks!
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>



Why is SolrDispatchFilter using 90% of the Time?

2011-03-03 Thread Stijn Vanhoorelbeke
Hi,

I'm working with a recent nightly build of Solr and I'm doing some serious
ZooKeeper testing.
I have NewRelic monitoring enabled on my Solr machines.

When I look at the distribution of the response time, I notice that
'SolrDispatchFilter.doFilter()' is taking up 90% of the time.
The other 10% is used by the SolrSearcher and the QueryComponent.

+ Can anyone explain to me why SolrDispatchFilter is consuming so much time?
++ Can I do something to lower this number?
( After all, SolrDispatchFilter must dispatch each request to the standard
searcher. )

Stijn Vanhoorelbeke


Explanation of the different caches.

2010-12-21 Thread Stijn Vanhoorelbeke
Hi,

I want to do some quick & dirty load testing - but all my results are cached.
I commented out all the Solr caches - but still everything is cached.

* Can the caching come from the 'Field Collapsing Cache'?
  -- although I don't see this element in my config file.
  ( The system jumps from 1 GB to 7 GB of RAM when I do a load test with
  lots of queries. )

* Can it be a Lucene cache?

I want to lower the caches so they cache only some 100 or 1000 documents.
( Right now, when I do 50,000 unique queries, Solr will use 7 GB of RAM and
everything fits in some cache! )

Any suggestions on how I could properly stress test my Solr with a small
number of queries ( some hundreds - not in the millions as some testers
have )?


Re: Explanation of the different caches.

2010-12-21 Thread Stijn Vanhoorelbeke
I am aware of the power of the caches.
I do not want to completely remove the caches - I want them to be small,
so I can run a stress test with a small amount of data.
( Some items may come from a cache, some need to be looked up <->
right now everything comes from the cache... )
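One way to get "small but present" caches is to shrink the sizes in solrconfig.xml rather than removing the entries. A sketch (the size of 100 is an example matching the goal of caching only ~100 documents; the attribute names are the standard ones):

```xml
<!-- Shrunken caches for stress testing, inside the <query> section of
     solrconfig.xml (sizes are illustrative). -->
<filterCache class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="100" initialSize="100" autowarmCount="0"/>
```

Setting autowarmCount to 0 also stops a new searcher from being pre-filled from the old one between commits.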

2010/12/21 Toke Eskildsen :
> Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote:
>> I want to do a quick&dirt load testing - but all my results are cached.
>> I commented out all the Solr caches - but still everything is cached.
>>
>> * Can the caching come from the 'Field Collapsing Cache'.
>  > -- although I don't see this element in my config file.
>> ( As the system now jumps from 1GB to 7 GB of RAM when I do a load
>> test with lots of queries ).
>
> If you allow the JVM to use a maximum of 7GB of heap, it is not that surprising
> that it allocates all of it when you hammer the searcher. Whether the heap is used
> for caching or just filled with dead objects waiting for garbage collection is
> hard to say at this point. Try lowering the maximum heap to 1 GB and do your
> testing again.
>
> Also note that Lucene/Solr performance on conventional harddisks benefits a 
> lot from disk caching: If you perform the same search more than one time, the 
> speed will increase significantly as relevant parts of the index will 
> (probably) be in RAM. Remember to flush your disk cache between tests.