DocValue field & commit

2020-03-30 Thread Sujatha Arun
A facet-heavy query that uses docValues fields for faceting and returns
about 5k results executes in anywhere from 10 ms to 5 secs, and the 5-sec
times seem to coincide with the period right after a hard commit.

Are the two related? Why the fluctuation in execution time?
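
If the slow times line up with hard commits, one usual suspect is a cold
searcher: the first facet request after a reopen has to reload per-field
structures and caches. A minimal solrconfig.xml sketch of a warming query
for this (the field name "category" is a placeholder for the actual
docValues facet fields):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>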

Thanks,
Revas


Solr 8.2 indexing issues

2019-11-12 Thread Sujatha Arun
We recently migrated from 6.6.2 to 8.2. We are seeing issues with indexing
where the leader and the replica document counts do not match. We get
different results every time we do a *:* search.

The only issue we see in the logs is the one described in JIRA issue SOLR-13293.

Has anybody seen similar issues?
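
A quick way to confirm the divergence is to query each replica directly
with distributed search turned off and compare numFound; a sketch, with
host, collection, and replica names invented for illustration:

curl 'http://host1:8983/solr/mycoll_shard1_replica_n1/select?q=*:*&rows=0&distrib=false'
curl 'http://host2:8983/solr/mycoll_shard1_replica_n2/select?q=*:*&rows=0&distrib=false'

If replicas of the same shard report different counts, the varying *:*
totals are expected, since each request may be routed to a different
replica.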

Thanks


Re: Solr 4.7 .0 Issues with Update and Delete

2014-10-01 Thread Sujatha Arun
Thanks Erick. No problem, will check.

On Thu, Oct 2, 2014 at 12:54 AM, Erick Erickson 
wrote:

> I'm clueless about the MongoDB connector. I suspect
> that's where the issue is unless you can reproduce
> this with a solr-only case.
>
> As you can tell, I don't recall seeing this problem come by
> the boards.
>
> Best,
> Erick
>
> On Wed, Oct 1, 2014 at 11:53 AM, Sujatha Arun  wrote:
> > Erick,
> >
> >  Actually I am syncing data between Solr and MongoDB using
> > mongo-connector. The details are below. I have submitted an issue in the
> > mongo-connector forum; I'm just trying the Solr forum too, in case
> > anybody has encountered the same :) or can explain why the log states "No
> > uncommitted changes. Skipping IW.commit".
> >
> >
> > *Solr 4.7.0 Mongo connector 1.3.dev0 *
> >
> >
> > When I update a document for the first time with query
> >
> >
> > db.products.update({BookId:1179416.0},{$set:{Status:"no sell"}})
> >
> >
> > Entry in solr Mongo connector
> > INFO - Finished 'http://search.bf.com:8080/solr/bf/update/?commit=true'
> > (post) with body 'u'
> >
> > Entry in solr logs
> > org.apache.solr.update.UpdateHandler – end_commit_flush 47824126
> > [http-bio-8080-exec-1] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor – [bf] webapp=/solr
> > path=/update/ params={commit=true} {add=[1179416.0
> > (1480746950999408640)],commit=} 0 36
> >
> > When I update same document again
> > db.products.update({BookId:1179416.0},{$set:{Status:"sell"}})
> >
> > Mongo Db Logs
> > INFO - Finished 'http://search.bf.com:8080/solr/bf/update/?commit=true'
> > (post) with body 'u'
> >
> > Solr Logs
> > org.apache.solr.update.UpdateHandler – No uncommitted changes. Skipping
> > IW.commit. 47904520 [http-bio-8080-exec-8] INFO
> > org.apache.solr.search.SolrIndexSearcher – Opening Searcher@7f771c0[bf]
> > main 47904520 [http-bio-8080-exec-8] INFO
> > org.apache.solr.update.UpdateHandler – end_commit_flush 47904521
> > [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore – [bf]
> > Registered new searcher Searcher@7f771c0[bf]
> > main{StandardDirectoryReader(segments_2zz:8617:nrt
> > _s1(4.7):C450500/3:delGen=3 _wj(4.7):C63500 _1kf(4.7):C396334/1:delGen=1
> > _26o(4.7):C360500 _2pl(4.7):C248000 _2in(4.7):C59000 _2ub(4.7):C77000
> > _2zv(4.7):C63500 _36a(4.7):C77000 _32x(4.7):C27500 _34v(4.7):C27500
> > _38s(4.7):C27500 _3ag(4.7):C41000 _3ba(4.7):C6860 _3b0(4.7):C5000
> > _3bb(4.7):C1 _3bc(4.7):C1 _3bg(4.7):C1 _3bh(4.7):C1 _3bi(4.7):C1)}
> 47904521
> > [http-bio-8080-exec-8] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor – [bf] webapp=/solr
> > path=/update/ params={commit=true} {commit=,commit=} 0 3
> > Each time mongo-connector sends a commit, but the second time nothing
> > happens. This seems to work only once.
> >
> > On Wed, Oct 1, 2014 at 9:47 PM, Erick Erickson 
> > wrote:
> >
> >> At this point details matter a lot.
> >>
> >> What exactly are you doing when you update?
> >>
> >> What happens if you issue an explicit update command? i.e.
> >> http://blahlbah/solr/collection/update?commit=true?
> >>
> >> Are you sure you aren't seeing, say, browser caching?
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Oct 1, 2014 at 9:04 AM, Sujatha Arun 
> wrote:
> >> > Thanks, BookId is the unique key. The issue is resolved with respect
> >> > to delete; it's the update that's causing the issue.
> >> >
> >> > On Wed, Oct 1, 2014 at 8:51 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> I'd add only one thing to Angel's comments:
> >> >> you're deleting by "id", but querying by "BookId". This
> >> >> _should_ work (on a quick glance at the code) iff
> >> >> your  is "BookId"...
> >> >>
> >> >> I took a quick glance at the code and "id" should delete by
> >> >> , so is your "BookId" the  in
> >> >> your schema?
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Oct 1, 2014 at 12:37 AM, Angel Tchorbadjiiski
> >> >>  wrote:
> >> >> > Hello Sujatha,
> >> >> >
> >> >> > have you tried to leave the quotes out? :-)

Re: Solr 4.7 .0 Issues with Update and Delete

2014-10-01 Thread Sujatha Arun
Erick,

Actually I am syncing data between Solr and MongoDB using
mongo-connector. The details are below. I have submitted an issue in the
mongo-connector forum; I'm just trying the Solr forum too, in case anybody
has encountered the same :) or can explain why the log states "No
uncommitted changes. Skipping IW.commit".


*Solr 4.7.0 Mongo connector 1.3.dev0 *


When I update a document for the first time with query


db.products.update({BookId:1179416.0},{$set:{Status:"no sell"}})


Entry in solr Mongo connector
INFO - Finished 'http://search.bf.com:8080/solr/bf/update/?commit=true'
(post) with body 'u'

When I update the same document again, the connector logs the same entry:
INFO - Finished 'http://search.bf.com:8080/solr/bf/update/?commit=true'
(post) with body 'u'
but the Solr logs state "No uncommitted changes. Skipping IW.commit."

On Wed, Oct 1, 2014 at 9:47 PM, Erick Erickson
wrote:

> At this point details matter a lot.
>
> What exactly are you doing when you update?
>
> What happens if you issue an explicit update command? i.e.
> http://blahlbah/solr/collection/update?commit=true?
>
> Are you sure you aren't seeing, say, browser caching?
>
> Best,
> Erick
>
> On Wed, Oct 1, 2014 at 9:04 AM, Sujatha Arun  wrote:
> > Thanks, BookId is the unique key. The issue is resolved with respect to
> > delete; it's the update that's causing the issue.
> >
> > On Wed, Oct 1, 2014 at 8:51 PM, Erick Erickson 
> > wrote:
> >
> >> I'd add only one thing to Angel's comments:
> >> you're deleting by "id", but querying by "BookId". This
> >> _should_ work (on a quick glance at the code) iff
> >> your <uniqueKey> is "BookId"...
> >>
> >> I took a quick glance at the code and "id" should delete by
> >> <uniqueKey>, so is your "BookId" the <uniqueKey> in
> >> your schema?
> >>
> >>
> >>
> >> On Wed, Oct 1, 2014 at 12:37 AM, Angel Tchorbadjiiski
> >>  wrote:
> >> > Hello Sujatha,
> >> >
> >> > have you tried to leave the quotes out? :-)
> >> >
> >> > Alternatively try using '<delete><query>id:1.0</query></delete>' to see
> >> > if the same error arises.
> >> >
> >> > A bit more information on the Update issue (the exact query sent and
> >> > all the corresponding log entries) would be needed to help you with
> >> > your problem.
> >> >
> >> > Cheers
> >> > Angel
> >> >
> >> >
> >> > On 01.10.2014 09:19, Sujatha Arun wrote:
> >> >>
> >> >> I am having the following issues with delete and update in Solr 4.7.0:
> >> >>
> >> >>
> >> >> *Delete Issue*
> >> >>
> >> >> I am using the following Curl command to delete a document from index
> >> >>
> >> >> curl http://localhost:8080/solr/bf/update?commit=true -H "Content-Type:
> >> >> text/xml" --data-binary '<delete><id>"1.0"</id></delete>'
> >> >>
> >> >> <response>
> >> >> <lst name="responseHeader">
> >> >> <int name="status">0</int><int name="QTime">7</int>
> >> >> </lst>
> >> >> </response>
> >> >>
> >> >> This is what I see in logs
> >> >>
> >> >>   INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [bf]
> >> >> webapp=/solr path=/update params={commit=true} {delete=["1.0"
> >> >> (-1480743948591824896)],commit=} 0 8
> >> >>
> >> >> But when I query, I am still seeing the document:
> >> >>
> >> >> INFO  org.apache.solr.core.SolrCore  – [bf] webapp=/solr path=/select
> >> >> params={indent=true&q=BookId:1.0&_=1412147431359&wt=json} hits=1
> >> status=0
> >> >> QTime=111
> >> >>
> >> >>
> >> >> =
> >> >>
> >> >> *Update Issue*
> >> >>
> >> >> When I update a document for the first time it works fine, but when I
> >> >> update the same document again, I see this message in the logs and the
> >> >> document is not updated:
> >> >>
> >> >>   INFO  org.apache.solr.update.UpdateHandler  – No uncommitted
> changes.
> >> >> Skipping IW.commit.
> >> >>
> >> >> Regards
> >> >>
> >> >
> >>
>


Re: Solr 4.7 .0 Issues with Update and Delete

2014-10-01 Thread Sujatha Arun
Thanks, BookId is the unique key. The issue is resolved with respect to
delete; it's the update that's causing the issue.

On Wed, Oct 1, 2014 at 8:51 PM, Erick Erickson 
wrote:

> I'd add only one thing to Angel's comments:
> you're deleting by "id", but querying by "BookId". This
> _should_ work (on a quick glance at the code) iff
> your <uniqueKey> is "BookId"...
>
> I took a quick glance at the code and "id" should delete by
> <uniqueKey>, so is your "BookId" the <uniqueKey> in
> your schema?
>
>
>
> On Wed, Oct 1, 2014 at 12:37 AM, Angel Tchorbadjiiski
>  wrote:
> > Hello Sujatha,
> >
> > have you tried to leave the quotes out? :-)
> >
> > Alternatively try using '<delete><query>id:1.0</query></delete>' to see
> > if the same error arises.
> >
> > A bit more information on the Update issue (the exact query sent and all
> > the corresponding log entries) would be needed to help you with your
> > problem.
> >
> > Cheers
> > Angel
> >
> >
> > On 01.10.2014 09:19, Sujatha Arun wrote:
> >>
> >> I am having the following issues with delete and update in Solr 4.7.0:
> >>
> >>
> >> *Delete Issue*
> >>
> >> I am using the following Curl command to delete a document from index
> >>
> >> curl http://localhost:8080/solr/bf/update?commit=true -H "Content-Type:
> >> text/xml" --data-binary '<delete><id>"1.0"</id></delete>'
> >>
> >> <response>
> >> <lst name="responseHeader">
> >> <int name="status">0</int><int name="QTime">7</int>
> >> </lst>
> >> </response>
> >>
> >> This is what I see in logs
> >>
> >>   INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [bf]
> >> webapp=/solr path=/update params={commit=true} {delete=["1.0"
> >> (-1480743948591824896)],commit=} 0 8
> >>
> >> But when I query, I am still seeing the document:
> >>
> >> INFO  org.apache.solr.core.SolrCore  – [bf] webapp=/solr path=/select
> >> params={indent=true&q=BookId:1.0&_=1412147431359&wt=json} hits=1
> status=0
> >> QTime=111
> >>
> >>
> >> =
> >>
> >> *Update Issue*
> >>
> >> When I update a document for the first time it works fine, but when I
> >> update the same document again, I see this message in the logs and the
> >> document is not updated:
> >>
> >>   INFO  org.apache.solr.update.UpdateHandler  – No uncommitted changes.
> >> Skipping IW.commit.
> >>
> >> Regards
> >>
> >
>


Solr 4.7 .0 Issues with Update and Delete

2014-10-01 Thread Sujatha Arun
I am having the following issues with delete and update in Solr 4.7.0:


*Delete Issue*

I am using the following Curl command to delete a document from index

curl http://localhost:8080/solr/bf/update?commit=true -H "Content-Type:
text/xml" --data-binary '<delete><id>"1.0"</id></delete>'

<response>
<lst name="responseHeader">
<int name="status">0</int><int name="QTime">7</int>
</lst>
</response>

This is what I see in logs

 INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [bf]
webapp=/solr path=/update params={commit=true} {delete=["1.0"
(-1480743948591824896)],commit=} 0 8

But when I query, I am still seeing the document:

INFO  org.apache.solr.core.SolrCore  – [bf] webapp=/solr path=/select
params={indent=true&q=BookId:1.0&_=1412147431359&wt=json} hits=1 status=0
QTime=111


=

*Update Issue*

When I update a document for the first time it works fine, but when I
update the same document again, I see this message in the logs and the
document is not updated:

 INFO  org.apache.solr.update.UpdateHandler  – No uncommitted changes.
Skipping IW.commit.
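
For what it's worth, one way to isolate the update path is to re-send the
add and the commit by hand; a sketch using the field names from this thread
(it assumes BookId is the uniqueKey, and note that a plain add replaces the
whole document, so fields not re-sent are lost):

curl 'http://localhost:8080/solr/bf/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<add><doc><field name="BookId">1179416.0</field><field name="Status">sell</field></doc></add>'

If a second hand-issued add with a changed value also logs only "No
uncommitted changes. Skipping IW.commit.", the problem is on the Solr side;
if it sticks, the client is not actually sending a changed document.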

Regards


PDF Indexing

2014-04-02 Thread Sujatha Arun
Hi,

I am able to use Tika and DIH to index a PDF as a single document. However,
I need each page to be a single document. Is there any built-in mechanism
to achieve this, or do I have to use PDFBox or some other tool?
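
For reference, a rough page-splitting sketch with PDFBox (2.x API, outside
Solr; the per-page id scheme is invented for illustration):

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfPageSplitter {
    public static void main(String[] args) throws Exception {
        // Load the PDF once, then extract one page at a time.
        try (PDDocument doc = PDDocument.load(new java.io.File(args[0]))) {
            PDFTextStripper stripper = new PDFTextStripper();
            for (int page = 1; page <= doc.getNumberOfPages(); page++) {
                stripper.setStartPage(page);
                stripper.setEndPage(page);
                String text = stripper.getText(doc);
                // Post 'text' to Solr as its own document,
                // e.g. with id = <filename>_p<page>.
                System.out.printf("%s_p%d: %d chars%n", args[0], page, text.length());
            }
        }
    }
}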

Regards


Re: solr 4.3.1 Installation

2013-07-16 Thread Sujatha Arun
Thanks Sandeep, that fixed it.

Regards,
Sujatha


On Tue, Jul 16, 2013 at 10:41 PM, Sandeep Gupta  wrote:

> This problem looks to me to be because of Solr logging ...
> see the detailed description below (taken from one of the mail threads)
>
>
> -
> Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file,
> so you need to put them in your classpath.  Jarfiles are included in the
> example, in example/lib/ext, and those jarfiles set up logging to use
> log4j, a much more flexible logging framework than JDK logging.
>
> JDK logging is typically set up with a file called logging.properties,
> which I think you must use a system property to configure.  You aren't
> using JDK logging, you are using log4j, which uses a file called
> log4j.properties.
>
>
> http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty
>
>
>
>
> On Tue, Jul 16, 2013 at 6:28 PM, Sujatha Arun  wrote:
>
> > Hi ,
> >
> > We have been using Solr 3.6.1. Recently we downloaded the Solr 4.3.1
> > version and installed it as a multicore setup, as follows:
> >
> > Folder structure:
> > solr.war
> > solr
> >   core0
> >     conf
> >   core1
> >     conf
> >   solr.xml
> >
> > Created the context fragment xml file in tomcat/conf/catalina/localhost
> > which refers to the solr.war file and the solr home folder
> >
> > copied the multicore conf folder without the zoo.cfg file
> >
> > I get the following error and admin page does not load
> > 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
> > SEVERE: Error filterStart
> > 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
> > SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
> > 16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig
> > checkResources
> > INFO: Undeploying context [/solr_4.3.1]
> > 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
> > SEVERE: Error filterStart
> > 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
> > SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
> >
> >
> > Please let me know what I am missing if I need to install this with the
> > default multicore setup without the cloud. Thanks.
> >
> > Regards
> > Sujatha
> >
>
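
In practice, the setup Sandeep points to boils down to putting the
SLF4J/log4j jars and a log4j.properties on Tomcat's classpath; a sketch,
assuming the stock 4.3.1 download layout and a standard Tomcat install:

cp example/lib/ext/*.jar $CATALINA_HOME/lib/
cp example/resources/log4j.properties $CATALINA_HOME/lib/

followed by a Tomcat restart.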


solr 4.3.1 Installation

2013-07-16 Thread Sujatha Arun
Hi ,

We have been using Solr 3.6.1. Recently we downloaded the Solr 4.3.1 version
and installed it as a multicore setup, as follows:

Folder structure:
solr.war
solr
  core0
    conf
  core1
    conf
  solr.xml

Created the context fragment xml file in tomcat/conf/catalina/localhost
which refers to the solr.war file and the solr home folder

copied the multicore conf folder without the zoo.cfg file

I get the following error and admin page does not load
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig
checkResources
INFO: Undeploying context [/solr_4.3.1]
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors


Please let me know what I am missing if I need to install this with the
default multicore setup without the cloud. Thanks.

Regards
Sujatha


Re: XInclude in data-config.xml

2013-04-12 Thread Sujatha Arun
Hi Andre,

In version 3.6.1, when we used entities in schema.xml for language
analyzers, it gave errors on server restart and the core would not load.

Regards,
Sujatha


2013/4/12 Andre Bois-Crettez 

> On 04/12/2013 09:31 AM, stockii wrote:
>
>> hello.
>>
>> is it possible to include some entities with XInclude in my
>> data-config.xml?
>>
>
> We first struggled with XInclude, and then switched to using custom
> entities, which worked much better for our needs (reusing common parts
> in several SearchHandlers).
> ex. in solrconfig.xml :
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <!DOCTYPE config [
> <!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml">
> ]>
> <config>
> ...
> <requestHandler name="/select" class="solr.SearchHandler">
> &solrconfigcommon;
> </requestHandler>
> ...
> </config>
>
> in solrconfig_common.xml :
>
> <lst name="defaults">
> <str name="echoParams">explicit</str>
> <str name="defType">edismax</str>
> <str name="qf">title^4 description^1</str>
> <str name="q.alt">*:*</str>
> <str name="fq">*:*</str>
> <int name="rows">20</int>
> <str name="q.op">AND</str>
> <str name="pf">title~2^2.0</str>
> </lst>
>
> HTH
> André
>
>
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended solely for
> their addressees. If you are not the intended recipient of this message,
> please delete it and notify the sender.
>


Latency Comparison between cloud hosting Vs Dedicated hosting

2013-04-09 Thread Sujatha Arun
Hi,

We are comparing search request latency between Amazon and dedicated
hosting [Rackspace]. For the comparison we used Solr 3.6.1 and an Amazon
small instance. The index size was less than 1 GB.

We see that latency is about 75-100% higher on Amazon. Has anybody who has
migrated from dedicated hosting to the cloud got any pointers for improving
latency?

Would a bigger instance improve latency?

Regards
Sujatha


Re: Master /Slave Set up on AWS - 3.6.1

2013-03-05 Thread Sujatha Arun
Thanks Otis. Yes, true, but considering that the indexing is via a queue,
there would actually be minimal load on the machine. And we are planning to
replicate this setup by adding more machines when the server reaches about
80% capacity for adding more cores.

Regards,
Sujatha

On Wed, Mar 6, 2013 at 9:26 AM, Otis Gospodnetic  wrote:

> Hello,
>
> This is not recommended because people typically don't want the load from
> indexing affect queries/user experience.  If your numbers are low, then
> this may not be a big deal. If you already need to create a core on 2
> machines, creating it on 3 doesn't seem a big deal.  There is a slight
> conflict in what you wrote, which is that cores will be created frequently,
> which implies you will quickly have lots of cores vs. desire to have
> minimal hardware.
>
> If it were up to me, I'd consider a weaker/cheaper master, and more slaves.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Mar 5, 2013 at 12:46 PM, Sujatha Arun  wrote:
>
> > Hi Otis,Michael,
> >
> > Thanks for your input and suggestions .
> >
> > Yes, we were considering the sticky session for pagination  and we are
> not
> > planning for having index on EBS
> >
> > I would like to understand  why its not the recommended approach,  can
> you
> > please explain?
> >
> > Till now we were having a single server for both Indexing and search
> > ,though this was on dedicated server and not on cloud. Indexing would
> > happen sequentially via  a queue due to which commits would happen for
> only
> > one or at most 2 cores simultaneously.
> >
> > With the Master/Slave approach I see that when a slave replicates from
> > the master, based on the poll time defined in the solr.xml file and
> > depending on the number of cores per webapp, the replication (and hence
> > the commit) is going to happen simultaneously for many cores, increasing
> > load on the slave server.
> >
> > By having 2 slaves and one master, whenever we create a core, which
> > happens quite frequently, we need to create this on 3 servers instead of
> > two, which has to be done manually by running a script on each server. We
> > have a requirement for adding cores for each customer.
> >
> > I do understand that hardware requirements for master can be quite
> > different [Lower memory /higher CPU/Cache setting in config /autowarming
> > etc  ] from slave.But given that we will be Indexing sequentially and
> > having the same configuration in terms of memory and CPU/cache  for both
> > master and slave,would this be a reasonable approach?
> >
> > Thanks&Regards,
> > Sujatha
> >
> >
> >
> > On Tue, Mar 5, 2013 at 9:10 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> > > If your index is on EBS, you'll see big iowait percentages when merges
> > > happen.
> > >
> > > I'm not sure what that's going to do to your master's ability to
> > > service requests. You should test.
> > >
> > > Alternatively, you might figure out the size of machine you need to
> > > index vs. the size of machine you need to service queries. They're
> > > very likely not the same, in which case, that may afford you the
> > > ability to have 2 slaves and 1 master in a similar budget.
> > >
> > > Also, once you've settled on an infrastructure, you should investigate
> > > buying reserved instances for a year. It will greatly reduce your
> > > costs.
> > >
> > > Michael Della Bitta
> > >
> > > 
> > > Appinions
> > > 18 East 41st Street, 2nd Floor
> > > New York, NY 10017-6271
> > >
> > > www.appinions.com
> > >
> > > Where Influence Isn’t a Game
> > >
> > >
> > > On Tue, Mar 5, 2013 at 8:59 AM, Sujatha Arun 
> > wrote:
> > > > Is there anything wrong with the setup?
> > > >
> > > > On Tue, Mar 5, 2013 at 5:43 PM, Sujatha Arun 
> > > wrote:
> > > >
> > > >> Hi Otis,
> > > >>
> > > >> Since currently we are planning for only one slave  due to cost
> > > >> considerations, can we have an ELB fronting the master and slave for
> > HA.
> > > >>
> > > >>1. All index requests will go to the master .
> > > >>2. Slave replicates  from master .
> > > >>

Re: Master /Slave Set up on AWS - 3.6.1

2013-03-05 Thread Sujatha Arun
Hi Otis,Michael,

Thanks for your input and suggestions .

Yes, we were considering the sticky session for pagination, and we are not
planning on having the index on EBS.

I would like to understand why it's not the recommended approach; can you
please explain?

Till now we have had a single server for both indexing and search, though
this was on a dedicated server and not in the cloud. Indexing would happen
sequentially via a queue, due to which commits would happen for only one or
at most 2 cores simultaneously.

With the Master/Slave approach, I see that when a slave replicates from the
master, based on the poll time defined in the solr.xml file and depending
on the number of cores per webapp, the replication (and hence the commit)
is going to happen simultaneously for many cores, increasing load on the
slave server.

By having 2 slaves and one master, whenever we create a core, which happens
quite frequently, we need to create this on 3 servers instead of two, which
has to be done manually by running a script on each server. We have a
requirement for adding cores for each customer.

I do understand that hardware requirements for the master can be quite
different [lower memory / higher CPU / cache settings in config /
autowarming etc.] from the slave. But given that we will be indexing
sequentially and having the same configuration in terms of memory and
CPU/cache for both master and slave, would this be a reasonable approach?

Thanks & Regards,
Sujatha



On Tue, Mar 5, 2013 at 9:10 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> If your index is on EBS, you'll see big iowait percentages when merges
> happen.
>
> I'm not sure what that's going to do to your master's ability to
> service requests. You should test.
>
> Alternatively, you might figure out the size of machine you need to
> index vs. the size of machine you need to service queries. They're
> very likely not the same, in which case, that may afford you the
> ability to have 2 slaves and 1 master in a similar budget.
>
> Also, once you've settled on an infrastructure, you should investigate
> buying reserved instances for a year. It will greatly reduce your
> costs.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Tue, Mar 5, 2013 at 8:59 AM, Sujatha Arun  wrote:
> > Is there anything wrong with the setup?
> >
> > On Tue, Mar 5, 2013 at 5:43 PM, Sujatha Arun 
> wrote:
> >
> >> Hi Otis,
> >>
> >> Since currently we are planning for only one slave  due to cost
> >> considerations, can we have an ELB fronting the master and slave for HA.
> >>
> >>1. All index requests will go to the master .
> >>2. Slave replicates  from master .
> >>3. Search request can go either to master /slave via ELB.
> >>
> >> Is that reasonable HA for search?
> >>
> >> Regards
> >> Sujatha
> >>
> >>
> >>
> >> On Tue, Mar 5, 2013 at 5:12 PM, Otis Gospodnetic <
> >> otis.gospodne...@gmail.com> wrote:
> >>
> >>> Hi Sujatha,
> >>>
> >>> If I understand correctly, you will have only 1 slave (and 1 master),
> so
> >>> that's not really a HA architecture.  You could manually turn master
> into
> >>> slave, but that's going to mean some down time...
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support
> >>> http://sematext.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Mar 5, 2013 at 3:05 AM, Sujatha Arun 
> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > We are planning to set up *2* *High-Memory Quadruple Extra Large
> >>> Instance
> >>> >  *as
> >>> > master and slave for our multicore solr setup  which has more than
> 200
> >>> > cores spread between a couple of webapps on a single JVM on *AWS*
> >>> >
> >>> > All indexing [via a queue will go to master ]  . One Slave  Server
> will
> >>> > replicate all the core level indexes from the master , slave
> >>> Configurations
> >>> > are defined in the solr.xml  at the webapp level  with a different
> poll
> >>> > interval for each webapp.
> >>> >
> >>> > We are planning to LB the search requests by fronting the master and
> >>> slave
> >>> > with an *AWS ELB *. The master configuration will not enable the
> slave
> >>> > properties as master is not replicating from any other machine. The
> >>> master
> >>> > and slave have similar hardware configurations [*High-Memory
> Quadruple
> >>> > Extra Large Instance] .*This is mainly for HA if the slave goes down.
> >>> > *
> >>> > *
> >>> > Any issue with the above set up ,please advice.
> >>> >
> >>> > Regards,
> >>> > Sujatha
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > *
> >>> > *
> >>> >
> >>>
> >>
> >>
>


Re: Master /Slave Set up on AWS - 3.6.1

2013-03-05 Thread Sujatha Arun
Is there anything wrong with the setup?

On Tue, Mar 5, 2013 at 5:43 PM, Sujatha Arun  wrote:

> Hi Otis,
>
> Since currently we are planning for only one slave  due to cost
> considerations, can we have an ELB fronting the master and slave for HA.
>
>1. All index requests will go to the master .
>2. Slave replicates  from master .
>3. Search request can go either to master /slave via ELB.
>
> Is that reasonable HA for search?
>
> Regards
> Sujatha
>
>
>
> On Tue, Mar 5, 2013 at 5:12 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Hi Sujatha,
>>
>> If I understand correctly, you will have only 1 slave (and 1 master), so
>> that's not really a HA architecture.  You could manually turn master into
>> slave, but that's going to mean some down time...
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Tue, Mar 5, 2013 at 3:05 AM, Sujatha Arun  wrote:
>>
>> > Hi,
>> >
>> > We are planning to set up *2* *High-Memory Quadruple Extra Large
>> Instance
>> >  *as
>> > master and slave for our multicore solr setup  which has more than 200
>> > cores spread between a couple of webapps on a single JVM on *AWS*
>> >
>> > All indexing [via a queue will go to master ]  . One Slave  Server will
>> > replicate all the core level indexes from the master , slave
>> Configurations
>> > are defined in the solr.xml  at the webapp level  with a different poll
>> > interval for each webapp.
>> >
>> > We are planning to LB the search requests by fronting the master and
>> slave
>> > with an *AWS ELB *. The master configuration will not enable the slave
>> > properties as master is not replicating from any other machine. The
>> master
>> > and slave have similar hardware configurations [*High-Memory Quadruple
>> > Extra Large Instance] .*This is mainly for HA if the slave goes down.
>> > *
>> > *
>> > Any issue with the above set up ,please advice.
>> >
>> > Regards,
>> > Sujatha
>> >
>> >
>> >
>> >
>> > *
>> > *
>> >
>>
>
>


Re: Master /Slave Set up on AWS - 3.6.1

2013-03-05 Thread Sujatha Arun
Hi Otis,

Since currently we are planning for only one slave  due to cost
considerations, can we have an ELB fronting the master and slave for HA.

   1. All index requests will go to the master .
   2. Slave replicates  from master .
   3. Search request can go either to master /slave via ELB.

Is that reasonable HA for search?

Regards
Sujatha



On Tue, Mar 5, 2013 at 5:12 PM, Otis Gospodnetic  wrote:

> Hi Sujatha,
>
> If I understand correctly, you will have only 1 slave (and 1 master), so
> that's not really a HA architecture.  You could manually turn master into
> slave, but that's going to mean some down time...
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Mar 5, 2013 at 3:05 AM, Sujatha Arun  wrote:
>
> > Hi,
> >
> > We are planning to set up *2* *High-Memory Quadruple Extra Large Instance
> >  *as
> > master and slave for our multicore solr setup  which has more than 200
> > cores spread between a couple of webapps on a single JVM on *AWS*
> >
> > All indexing [via a queue will go to master ]  . One Slave  Server will
> > replicate all the core level indexes from the master , slave
> Configurations
> > are defined in the solr.xml  at the webapp level  with a different poll
> > interval for each webapp.
> >
> > We are planning to LB the search requests by fronting the master and
> slave
> > with an *AWS ELB *. The master configuration will not enable the slave
> > properties as master is not replicating from any other machine. The
> master
> > and slave have similar hardware configurations [*High-Memory Quadruple
> > Extra Large Instance] .*This is mainly for HA if the slave goes down.
> > *
> > *
> > Any issue with the above set up ,please advice.
> >
> > Regards,
> > Sujatha
> >
> >
> >
> >
> > *
> > *
> >
>


Master /Slave Set up on AWS - 3.6.1

2013-03-05 Thread Sujatha Arun
Hi,

We are planning to set up *2* *High-Memory Quadruple Extra Large Instances* as
master and slave for our multicore Solr setup, which has more than 200
cores spread between a couple of webapps on a single JVM on *AWS*.

All indexing [via a queue] will go to the master. One slave server will
replicate all the core-level indexes from the master; slave configurations
are defined in solr.xml at the webapp level, with a different poll
interval for each webapp.
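
For reference, the per-core wiring being described usually looks like this
in solrconfig.xml (3.x syntax; the host, core name, and poll interval are
placeholders). On the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

and on the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8080/solr/corename/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>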

We are planning to LB the search requests by fronting the master and slave
with an *AWS ELB*. The master configuration will not enable the slave
properties, as the master is not replicating from any other machine. The
master and slave have similar hardware configurations [*High-Memory
Quadruple Extra Large Instance*]. This is mainly for HA if the slave goes
down.

If there is any issue with the above setup, please advise.

Regards,
Sujatha






Multicore Master - Slave - solr 3.6.1

2013-02-27 Thread Sujatha Arun
We have a multicore setup with more than 200 cores. Some of the cores have
different schemas based on the search type/language.

We are now trying to migrate to a master/slave setup.

I see that we can specify the master/slave properties in the
solrcore.properties file. However, does this have to be done at a core
level? What are the options for defining this globally across the cores, so
that when promoting a slave to master (or vice versa) I do not have to do
this for each core?

I tried the following:

1) Added the properties as name-value pairs in solr.xml - *but these values
are lost on server restart*.

2) Tried defining the properties in a single file and tried to reference
the same file for every core at core creation with the below command - *but
this is not picked up and not reflected in solr.xml*.

Command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&properties=path
to common properties file

3) Tried adding solrcore.properties at the level of the solr.xml file, but
this also does not work.

The last 2 methods, if they worked, would I guess involve a server restart
for any changes, as opposed to a core reload.


4) I do not want to share the same common instanceDir, as we have different
types of schema and this would add confusion for the OPS team when creating
the cores.


So, any pointers on how we can define a global solrcore.properties file?
Thanks.
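
One pattern from the SolrReplication wiki comes close to a global switch:
keep the enable flags as property substitutions in the shared
solrconfig.xml and set them once per JVM, so no per-core file is needed. A
sketch (3.x syntax; the defaults shown are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-host:8080/solr/${solr.core.name}/replication</str>
  </lst>
</requestHandler>

Starting the master JVM with -Denable.master=true and the slaves with
-Denable.slave=true then makes promotion a JVM flag change rather than a
per-core edit.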

Regards
Sujatha


Re: solr 3.6.1 Indexing and utf8 issue

2013-01-22 Thread Sujatha Arun
Thanks for the pointer, but given the same indexing code, why does this not
work in Solr 3.6.1 but works fine in Solr 1.3?

Any idea?

Regards
Sujatha

On Tue, Jan 22, 2013 at 9:33 PM, Markus Jelsma
wrote:

> Hi,
>
> You've likely got some non-character code points in your data and they
> need to be stripped.
>
> http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]
>
> See the patch for NUTCH-1016 for an example on how to strip them. It's
> easily ported to other languages.
> https://issues.apache.org/jira/browse/NUTCH-1016
>
> Cheers,
>
>
>
> -Original message-
> > From:Sujatha Arun 
> > Sent: Tue 22-Jan-2013 12:35
> > To: solr-user@lucene.apache.org
> > Subject: solr 3.6.1 Indexing and utf8 issue
> >
> > Hi,
> >
> > We are on solr 3.6.1 on  Tomcat 5.5.25 . The Indexing of polish content
> throws the following error  .
> >
> > Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte
> 0x77 (at char #166, byte #127)
> > at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
> > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
> > at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
> > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> > ... 20 more
> > Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte
> 0x77
> >
> >
> >
> > I have added a patch to enable utf-8 encoding in solrDispatchFilter.java
> file
> >
> > The same content file in 1.3 with utf8 patch works fine .Please find
> attached content file
> >
> > Please let me know what could be missing?
> >
> > Regards
> > Sujatga
> >
>


solr 3.6.1 Indexing and utf8 issue

2013-01-22 Thread Sujatha Arun
Hi,

We are on Solr 3.6.1 on Tomcat 5.5.25. Indexing Polish content throws the
following error:

*Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte
0x77 (at char #166, byte #127)*
at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:309)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
... 20 more
*Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x77*

I have added a patch to enable UTF-8 encoding in the SolrDispatchFilter.java
file.

The same content file works fine in 1.3 with the UTF-8 patch. Please find
the content file attached.

Please let me know what could be missing.
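
One quick sanity check is to re-encode the file and see what gets dropped;
a sketch, assuming GNU iconv is available (the file name is a placeholder):

iconv -f UTF-8 -t UTF-8 -c content.xml > content_clean.xml
diff content.xml content_clean.xml

The -c flag silently discards sequences that are not valid UTF-8, so any
diff output points at exactly the bytes Solr is rejecting.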

Regards
Sujatha
● Rozmieszczenie kamieni może być połączone z masażem całego ciała. 
AROMATERAPIA9 Olejki aromatyczne mogą zostać użyte w połączeniu z hydroterapią 
oraz z zabiegami masażu. Bezpieczeństwo przeprowadzanego zabiegu 
wykorzystującego olejki aromatyczne jest bardzo ważne. Do zabiegu masażu 
nieodzowne jest rozcieńczenie wykorzystywanego olejku. Zawsze należy używać 
olejków aromatycznych, które uznawane są za bezpieczne. Inhalacje ● Elektryczne 
rozpylacze są łatwe w użyciu i efektywnie rozpylają aromat w gabinecie, 
wykorzystuje się je do celów zarówno terapeutycznych, jak i estetycznych. 
Należy zawsze używać olejków w czystej postaci, nie można ich rozcieńczać w 
innych substancjach, gdyż może to doprowadzić do zatkania rozpylacza. ● Kolejną 
prostą i efektywną drogą umożliwiającą inhalację olejkami aromatycznymi jest 
umieszczenie kilku kropel na chusteczce higienicznej lub wełnianym ubraniu i 
wdychanie oparów przez cały dzień. Jest to świetny prezent dla pacjenta 
poddanego uprzednio zabiegowi masażu. Masaż ● Przygotowując oliwkę do masażu, 
należy użyć od 1 do 2,5% olejku aromatycznego w stosunku do substancji, w 
której rozpuszczamy olejek. Wynosi to w przybliżeniu od 5 do 15 kropli na każde 
30 ml rozcieńczacza. Można również pomieszać kilka różnych olejków 
aromatycznych i rozpuścić nie więcej niż 15 kropli w substancji 
rozcieńczającej. Przygotowując aromatyzowaną oliwkę do masażu, należy pamiętać, 
że dużo nie zawsze znaczy lepiej, gdyż można osiągnąć dobre rezultaty, 
rozcieńczając mniejsze ilości olejków aromatycznych.




Re: AutoComplete with FiterQuery for Full content

2013-01-21 Thread Sujatha Arun
Hi Jack,

I need to filter the suggestions based on some other fields, and the
below-mentioned method [Suggester] does not allow doing that.

Hence at present we have only two options for a suggest implementation with
filters:

1.Facets
2.N-Grams

as mentioned on this site:
http://www.searchworkings.org/blog/-/blogs/420845/maximized

What I have mentioned below is the n-grams approach.

Regards
Sujatha

On Tue, Jan 22, 2013 at 11:52 AM, Jack Krupansky wrote:

> It's not clear what your question or problem is. Try explaining it in
> simple English first. Autocomplete is fairly simple - no need for the
> complexity of an ngram filter.
>
> Here's an example of a suggester component and request handler based on a
> simple text field:
>
> <searchComponent class="solr.SpellCheckComponent" name="suggest">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>     <str name="field">name</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggest</str>
>     <str name="spellcheck.onlyMorePopular">true</str>
>     <str name="spellcheck.count">5</str>
>     <str name="spellcheck.collate">true</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Tuesday, January 22, 2013 12:59 AM
> To: solr-user@lucene.apache.org
> Subject: AutoComplete with FiterQuery for Full content
>
>
> Hi,
>
> I need suggestion  on solr Autocomplete for Full content with Filter query.
>
> I have currently implemented this as below
>
>
>   1. Solr version 3.6.1
>   2. solr.StandardTokenizerFactory
>   3. EdgeNGramFilterFactory with maxGramSize="25" minGramSize="1"
>   4. Stored the content field
>   5. Use the Fastvectorhighter and breakiterator on WORD to return results
>
>   based on  standard analyzer with a fragsize of 20 &using the fq param as
>   required
>
> This seems to provide snippets ,but they seem like junk at times and not
> really relevant as they are pieces of sentence with search term in them .It
> could be like
> the  and ...eg: on searching river  suggestion is  - the river and
> ...which does not really make sense as a suggestion...
>
> So other options of
>
>
>   - facets support fq but cannot be used for fullcontent tokenized text
>   due to performance issue
>
>
>   1. Can we use a tool that can just extract keywords/phrases from the
>
>Full content and that can either be indexed or updated to Db and same
> can
>   be used to serve the autocomplete?
>   2. Any other methods?
>   3. Are there any opensource tools for keyword extraction? Sematext has a
>
>   commercial tool for the same.
>   4. Which would be better for Autocomplete  - DB / Index in terms of
>
>   speed /performance?
>
> Any pointers?
>
> Regards,
> Sujatha
>


AutoComplete with FiterQuery for Full content

2013-01-21 Thread Sujatha Arun
Hi,

I need suggestions on Solr autocomplete for full content with a filter query.

I have currently implemented this as below (a field-type sketch for steps
2-3 follows the list):


   1. Solr version 3.6.1
   2. solr.StandardTokenizerFactory
   3. EdgeNGramFilterFactory with maxGramSize="25" minGramSize="1"
   4. Stored the content field
   5. Use the FastVectorHighlighter and a break iterator on WORD to return
   results based on the standard analyzer with a fragsize of 20, using the
   fq param as required
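
The field type behind steps 2-3 would look roughly like this (a sketch; the
type name is invented, the lowercase filters are an assumption, and the
query side deliberately skips the edge n-gram filter so a typed prefix
matches the indexed grams):

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>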

This seems to provide snippets, but they seem like junk at times and are not
really relevant, as they are pieces of a sentence with the search term in
them. E.g., on searching "river" the suggestion is "the river and", which
does not really make sense as a suggestion.

So, of the other options:

   - facets support fq, but cannot be used for full-content tokenized text
   due to performance issues

Remaining questions:

   1. Can we use a tool that just extracts keywords/phrases from the full
   content, which can either be indexed or written to a DB, and use that to
   serve the autocomplete?
   2. Any other methods?
   3. Are there any open source tools for keyword extraction? Sematext has a
   commercial tool for the same.
   4. Which would be better for autocomplete - DB or index - in terms of
   speed/performance?

Any pointers?

Regards,
Sujatha


Re: Master /Slave Architecture3.6.1

2013-01-10 Thread Sujatha Arun
Thanks, Otis.

But then what exactly is the advantage of a master/slave architecture for
multicore, when replication has the same effect as a commit, and if I am
going to get worse performance by moving to master/slave over a single
server with sequential indexing? Am I missing anything?

Would it make sense to have each server act as both master and slave and
LB the indexing and searching requests to both servers?

Regards,
Sujatha
On Thu, Jan 10, 2013 at 8:41 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> You are going in the right direction and your assumptions are correct. In
> short, if the performance hit is too big then you simply need more ec2
> instances (some have high cpu, some memory, some disk IO ... pick wisely).
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jan 10, 2013 4:44 AM, "Sujatha Arun"  wrote:
>
> > Hi,
> >
> > Our current architecture is as follows ,
> >
> >- Single server  [ On which we do both Indexing and Searching]
> >- Solr version 3.6.1  Multicores
> >- We have several small & big indexes as cores within a webapp
> >- Our Indexing to the individual cores happen via an index queue ,due
> to
> >which at any given time ,we are indexing only to one or at most 2
> cores
> >- Also we processing our pdf's and html files externally to text files
> >before feeding it to solr
> >
> >
> > We are planning to move to the AWS using 3.6.1 and  would want to
> >
> >-  Separate the  Indexing and  Searching to separate servers as master
> >/slave .This is mainly   so that the both the activities are not
> > competing
> >for resources
> >- Also to use  Tika to process pdf and also to process html files
> >directly via solr ,which might increase the CPU load.
> >-  But ,if I set up so that all Indexing request are going to one
> server
> >sequentially and each core in slave polls the master core for index
> > changes
> >,and then issues a commit to load a new index reader,then all this
> > activity
> >might happen in parallel which will actually spike the CPU activity on
> >slave and hence will degrade the search performance?
> >
> > Is this assumption correct?Then is there any advantage other
> > than availability to this architecture ,any advice on this?
> >
> > Regards
> > Sujatha
> >
>


Master /Slave Architecture3.6.1

2013-01-10 Thread Sujatha Arun
Hi,

Our current architecture is as follows:

   - Single server [on which we do both indexing and searching]
   - Solr version 3.6.1, multicore
   - We have several small & big indexes as cores within a webapp
   - Our indexing to the individual cores happens via an index queue, due to
   which, at any given time, we are indexing to only one or at most 2 cores
   - Also, we process our PDFs and HTML files externally into text files
   before feeding them to Solr


We are planning to move to AWS using 3.6.1 and would want to:

   - Separate the indexing and searching onto separate servers as
   master/slave. This is mainly so that the two activities are not competing
   for resources.
   - Also, to use Tika to process PDFs and to process HTML files
   directly via Solr, which might increase the CPU load.
   - But if I set it up so that all indexing requests go to one server
   sequentially, and each core on the slave polls the master core for index
   changes and then issues a commit to load a new index reader, then all this
   activity might happen in parallel, which will actually spike the CPU
   activity on the slave and hence degrade search performance?

Is this assumption correct? Then is there any advantage, other than
availability, to this architecture? Any advice on this?

Regards
Sujatha


Language schema Template which includes appropriate language analysis field type as a separate xml file

2013-01-08 Thread Sujatha Arun
Hi ,

Our requirement is to have a separate schema for every language, differing
in the field type definitions for language-based analysis. I have a
standard schema which differs only in the language analysis part, which can
be inserted by any of the 3 methods mentioned on this wiki page:
http://wiki.apache.org/solr/SolrConfigXml

Which would be the best one to go with? I am using Solr 3.6.1.

1) XInclude does not work with the 3.x schema file, and a patch exists only
for 4.x+ versions.

2) Includes via document entities - is any performance degradation expected
while indexing/searching due to the additional parsing?

3) System property substitution - can this be used to substitute field
types in the schema?

What other methods, if any, achieve the same?
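
For concreteness, option 2 would look roughly like this (a sketch; the file
and entity names are invented, with the included file holding just the
language-specific field types):

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
<!ENTITY languagefieldtypes SYSTEM "fieldtypes_polish.xml">
]>
<schema name="books_polish" version="1.4">
  <types>
    &languagefieldtypes;
    <!-- shared, language-independent types continue here -->
  </types>
  ...
</schema>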

Thanks
Sujatha


Re: solr autocomplete requirement

2012-11-19 Thread Sujatha Arun
Anyone with suggestions on this?


On Mon, Nov 19, 2012 at 10:13 PM, Sujatha Arun  wrote:

> Hi,
>
> Our requirement for auto complete is slightly complicated , We need two
> types of auto complete
>
> 1. Meta data Auto complete
> 2. Full text Content Auto complete
>
> In addition, the metadata fields are multi-valued, and we need to filter
> the results for both auto-complete types.
>
> After trying different approaches like
>
> 1) Suggester - we cannot filter results
> 2) Terms component - we cannot filter
> 3) Facets on full-text content with tokenized fields - expensive
> 4) Same core with n-gram indexing, storing the results and using the
> highlight component to fetch the snippet for autosuggest
>
> The last approach, which we are leaning towards, has 2 drawbacks:
>
> One - it returns duplicate data, as some metadata is the same across
> documents.
> Two - words are getting truncated mid-word when results are returned with
> highlighting.
>
>
> Mitigations for the above 2 issues could be: remove duplicates after
> obtaining results in the application (the cost being the additional time
> for this), and use the fast vector highlighter, which can help with
> full-word snippets (though it could be heavy on the index size).
>
> Has anybody got any suggestions, or had similar requirements with a
> successful implementation?
>
> Another question: what would be the impact of serving the suggestions out
> of the same core as the one we are searching, while using the highlight
> component for fetching snippets?
>
> For our full-text search requirements we do the highlighting outside Solr,
> in our application; we would be storing and using the highlight only for
> suggestions.
>
> Thanks
> Sujatha
>
>
>
>
>
>
>


Re: Search Suggest for full content with Filter option

2012-10-28 Thread Sujatha Arun
Any Suggestions on this?



On Sun, Oct 28, 2012 at 1:30 PM, Sujatha Arun  wrote:

> Hello,
>
> I want suggestions for the full content of several books, with a filter
> that restricts suggestions to a single book. However, the best options,
> the suggester and the terms component, do not support filters.
>
> That leaves facets and n-gram indexing. I indexed the entire content,
> split on whitespace, as the suggestions should work for any word in the
> index. But I find this query
>
> /select?q=tes&facet=true&facet.field=ph_su&facet.prefix=tes&facet.limit=5
> extremely time consuming. This could be because of the number of unique
> terms in the full index.
>
> For n-gram indexing, if I were to index the entire content tokenized into
> a field and store the same, then for any token of the document I would get
> the entire stored content as the suggestion. How can I get only the
> correct keyword as a suggestion using non-suggester-based n-gram indexing,
> such that it can be filtered?
>
>
> Regards
> Sujatha
>


solr -autosuggest

2012-10-25 Thread Sujatha Arun
Hi,

A  few question on Solr Auto suggest below

Q1) I tried using the index-based suggest functionality with Solr 3.6.1;
can I combine this with file-based boosting? Currently, when I specify the
index field and the sourceLocation, the file in the source location is not
considered. Is there any way both can be used?

Q2) I saw this line where it says "Currently implemented Lookups keep their
data in memory, so unlike spellchecker data, this data is discarded on core
reload and not available until you invoke the build command, either
explicitly or implicitly during a commit." I have used the WFST lookup with
index-based suggestions; I suppose this applies only to file-based
suggestions? Is this correct?


Q3) If spellcheck.onlyMorePopular=true is selected, weights are treated as
a "popularity" score. Does this mean it is based on the frequency of words,
or is it based on ranking [tf * idf, etc.]?



Regards,
Sujatha


Re: solr1.4 code Example

2012-10-09 Thread Sujatha Arun
Thanks, that worked.

Regards
Sujatha

On Tue, Oct 9, 2012 at 5:57 PM, Iwan Hanjoyo  wrote:

> you can download the code directly from here
>
> http://www.solrenterprisesearchserver.com/
>
> http://solrenterprisesearchserver.s3-website-us-east-1.amazonaws.com/downloads/5883-solr-enterprise-search2.zip
>
> regards,
>
>
> Hanjoyo
>


Re: solr1.4 code Example

2012-10-08 Thread Sujatha Arun
I did get some files by unpacking with jar, but could not get the ones I
wanted... thanks anyway!

On Mon, Oct 8, 2012 at 5:56 PM, Toke Eskildsen wrote:

> On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote:
> > I am unable to unzip the  5883_Code.zip file for solr 1.4 from paktpub
> site
> > .I get the error message
> >
> >   End-of-central-directory signature not found. [...]
>
> It is a corrupt ZIP-file. I'm guessing you got it from
> http://www.packtpub.com/files/code/5883_Code.zip
> I tried downloading the archive and it was indeed corrupt. You can read
> some of the files by using jar for unpacking: 'jar xvf 5883_Code.zip'.
>
> You'll need to contact packtpub to get them to fix it properly. A quick
> search indicates that they've had problems before:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/%
> 3c4bf66e8f.4070...@shoptimax.de%3E
>
>
>


solr1.4 code Example

2012-10-08 Thread Sujatha Arun
hi,

I am unable to unzip the 5883_Code.zip file for Solr 1.4 from the PacktPub
site. I get the error message:

  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.


any pointers?

Regards
Sujatha


Re: Merge Policy Recommendation for 3.6.1

2012-09-29 Thread Sujatha Arun
Thanks Shawn, that helps a lot. Our current OS limit is set to 300,000+, I
guess, which I heard is the maximum for the OS... not sure of the soft and
hard limits. Will check this.

Regards,
Sujatha



On Fri, Sep 28, 2012 at 8:14 PM, Shawn Heisey  wrote:

> On 9/28/2012 12:43 AM, Sujatha Arun wrote:
>
>> Hello,
>>
>> In the case where there are over 200+ cores on a single node , is it
>> recommended to go with Tiered MP with segment size of 4 ? Our Index size
>> vary from a few MB to 4 GB .
>>
>> Will there be any issue with "Too many open files " and the number of
>> indexes with respect to MP ?  At the moment we are thinking of going with
>> Tiered MP ..
>>
>> Os file limit has been set to maximum.
>>
>
> Whether or not to deviate from the standard TieredMergePolicy depends
> heavily on many factors which we do not know, but I can tell you that it's
> probably not a good idea.  That policy typically produces the best results
> in all scenarios.
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> On the subject of open files:  With its default configuration, a Solr 3.x
> index will have either 8 or 11 files per segment, depending on whether you
> are using termvectors.  I am completely unsure about 4.0, because I've
> never used it, but it is probably similar.  The following calculations are
> based on my experience with 3.x.
>
> With a segment limit of 4, you might expect to have only six segments
> around at any one time - the four that are being merged, the new merged
> segment, and a segment where new data is being written.  If your system
> indexes data slow enough for merges to complete before another new segment
> is created, this is indeed the most you will ever see.  If your system
> indexes data fast enough, you might actually have short-lived moments with
> 10 or 14 segments, and possibly more.
>
> Assuming some things, which lead to using the 13 segment figure:
> simultaneous indexing to multiple cores at once, with termvectors turned
> on.  With these assumptions, a 200 core Solr installation using 4 segments
> might potentially have nearly 37000 files open, but is more likely to have
> significantly less.  If you increase your merge policy segment limit, the
> numbers will go up from there.
>
> I have configured my Linux servers with a soft file limit of 49152 and a
> hard limit of 65536.  My segment limit is set to 35, and each server has a
> maximum of four active cores, which means that during heavy indexing, I can
> see over 8000 open files.
>
> What does "maximum" on the OS file limit actually mean?  Does your OS have
> a way to specify unlimited? My personal feeling is that it's a bad idea to
> run with no limits at all.  I would imagine that you need to go with a
> minimum soft limit of 65536.  Your segment limit of 4 is probably
> reasonable, unless you will be doing a lot of indexing in a very short
> amount of time.  If you are, you may want a larger limit, and a larger
> number of maximum open files.
>
> Thanks,
> Shawn
>
>


Merge Policy Recommendation for 3.6.1

2012-09-27 Thread Sujatha Arun
Hello,

In the case where there are over 200+ cores on a single node, is it
recommended to go with the Tiered MP with a segment size of 4? Our index
sizes vary from a few MB to 4 GB.

Will there be any issue with "Too many open files" given the number of
indexes, with respect to the MP? At the moment we are thinking of going
with the Tiered MP.

The OS file limit has been set to the maximum.
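
For reference, the "segment size of 4" under discussion is expressed in
solrconfig.xml roughly like this (3.6 syntax; a sketch of the two knobs
that matter):

<indexDefaults>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">4</int>
    <int name="segmentsPerTier">4</int>
  </mergePolicy>
</indexDefaults>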

Regards
Sujatha


Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi ,

Please comment on whether I should consider moving to the old LogByteSize
merge policy on moving to 3.6.1 from 1.3, as I see improvements in query
performance after optimization.
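
For reference, the optimize being tested can be issued per core like this
(the core name is a placeholder):

curl 'http://localhost:8080/solr/core0/update?optimize=true'

An optimized index collapses to a single segment, which is why query times
improve, at the cost of an expensive rewrite of the whole index.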

Just to mention, we have a lot of indexes in multiple cores as well as
multiple webapps, and that's the reason we went for CFS in 1.3: to avoid
the "too many open files" issue which we had encountered.

Regards
Sujatha

On Tue, Sep 25, 2012 at 9:55 AM, Sujatha Arun  wrote:

> Any comments on this?
>
>
>
> On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun wrote:
>
>> Thanks Jack.
>>
>> so QTime = sum of all prepare components + sum of all process components
>> - debug component prepare/process time
>>
>> In 3.6.1 the process part of the QueryComponent for the following query
>> seems to take 8 times more time. Is anything missing? For most queries the
>> process part of the QueryComponent seems to take more time in 3.6.1.
>>
>>
>> This is *3.6.1*:
>>
>>   responseHeader: status=0, QTime=33
>>   params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
>>   q: differential AND equations AND has AND one AND solution
>>
>> Debug Output
>>
>>   parser: LuceneQParser
>>   total time: 33.0 ms
>>   prepare: 3.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight, Stats, Debug 0.0)
>>   process: 30.0 ms (QueryComponent 26.0; Facet, MoreLikeThis, Highlight, Stats 0.0; DebugComponent 4.0)
>>
>> Same query in solr 1.3:
>>
>>   responseHeader: status=0, QTime=6
>>   params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
>>   q: differential AND equations AND has AND one AND solution
>>
>> Debug Info
>>
>>   total time: 6.0 ms
>>   prepare: 1.0 ms (QueryComponent 1.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
>>   process: 5.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight 0.0; DebugComponent 2.0)
>>
>>
>>
>> On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky wrote:
>>
>>> Run a query on both old and new with &debugQuery=true on your query
>>> request and look at the component timings for possible insight.
>>>
>>> -- Jack Krupansky
>>>
>>> From: Sujatha Arun
>>> Sent: Monday, September 24, 2012 7:26 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1
>>>
>>> Hi,
>>>
>>> On migrating from 1.3 to 3.6.1  , I see the query performance degrading
>>> by nearly 2 times for all types of query.  Indexing performance slight
>>> degradation over 1.3 For Indexing we use our custom scripts that post xml
>>> over HTTP.
>>>
>>> Any thing that I might have missed . I am thinking that this might be
>>> due to new Tiered MP over LogByteSize creating more segment files and hence
>>> more the Query latency .We are using Compound Files in 1.3  and I have set
>>> this to true even in 3.6.1 ,but results in more segement files
>>>
>>> On optimizing the query response time improved beyond 1.3  .So could it
>>> be the MP or am i missing something here . Do let me know
>>>
>>> Please find attached the solrconfig.xml
>>>
>>> Regards
>>> Sujatha
>>>
>>
>>
>


Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Any comments on this?



On Mon, Sep 24, 2012 at 10:28 PM, Sujatha Arun  wrote:

> Thanks Jack.
>
> so Qtime = Sum of all prepare components + sum of all process components -
> Debug comp process/prepare time
>
> In 3.6.1 the process part of Query component for the following query seems
> to take  8 times more time?  anything missing? For most queries the process
> part of the Querycomponent seem to take more time in 3.6.1
>
>
> This is *3.6.1*:
>
>   responseHeader: status=0, QTime=33
>   params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
>   q: differential AND equations AND has AND one AND solution
>
> Debug Output
>
>   parser: LuceneQParser
>   total time: 33.0 ms
>   prepare: 3.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight, Stats, Debug 0.0)
>   process: 30.0 ms (QueryComponent 26.0; Facet, MoreLikeThis, Highlight, Stats 0.0; DebugComponent 4.0)
>
> Same query in solr 1.3:
>
>   responseHeader: status=0, QTime=6
>   params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
>   q: differential AND equations AND has AND one AND solution
>
> Debug Info
>
>   total time: 6.0 ms
>   prepare: 1.0 ms (QueryComponent 1.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
>   process: 5.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight 0.0; DebugComponent 2.0)
>
>
>
> On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky wrote:
>
>
>> Run a query on both old and new with &debugQuery=true on your query
>> request and look at the component timings for possible insight.
>>
>> -- Jack Krupansky
>>
>> From: Sujatha Arun
>> Sent: Monday, September 24, 2012 7:26 AM
>> To: solr-user@lucene.apache.org
>> Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1
>>
>> Hi,
>>
>> On migrating from 1.3 to 3.6.1  , I see the query performance degrading
>> by nearly 2 times for all types of query.  Indexing performance slight
>> degradation over 1.3 For Indexing we use our custom scripts that post xml
>> over HTTP.
>>
>> Any thing that I might have missed . I am thinking that this might be due
>> to new Tiered MP over LogByteSize creating more segment files and hence
>> more the Query latency .We are using Compound Files in 1.3  and I have set
>> this to true even in 3.6.1 ,but results in more segement files
>>
>> On optimizing the query response time improved beyond 1.3  .So could it
>> be the MP or am i missing something here . Do let me know
>>
>> Please find attached the solrconfig.xml
>>
>> Regards
>> Sujatha
>>
>
>


Re: Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Thanks Jack.

so QTime = sum of all prepare components + sum of all process components -
the Debug component's prepare/process time

In 3.6.1, the process part of the QueryComponent for the following query
seems to take about 8 times longer. Am I missing anything? For most queries
the process part of the QueryComponent seems to take more time in 3.6.1.


This is *3.6.1*:

  responseHeader: status=0, QTime=33
  params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
  q: differential AND equations AND has AND one AND solution

Debug Output

  parser: LuceneQParser
  total time: 33.0 ms
  prepare: 3.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight, Stats, Debug 0.0)
  process: 30.0 ms (QueryComponent 26.0; Facet, MoreLikeThis, Highlight, Stats 0.0; DebugComponent 4.0)

Same query in solr 1.3:

  responseHeader: status=0, QTime=6
  params: indent=on, debugQuery=on, start=0, rows=10, version=2.2
  q: differential AND equations AND has AND one AND solution

Debug Info

  total time: 6.0 ms
  prepare: 1.0 ms (QueryComponent 1.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
  process: 5.0 ms (QueryComponent 3.0; Facet, MoreLikeThis, Highlight 0.0; DebugComponent 2.0)
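
(As a quick arithmetic check of the formula above: 3.0 + 30.0 = 33.0 ms =
QTime 33 in 3.6.1, and 1.0 + 5.0 = 6.0 ms = QTime 6 in 1.3. So in these two
responses QTime comes out as simply prepare + process, and the regression is
almost entirely the QueryComponent's process phase: 26.0 ms vs 3.0 ms.)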



On Mon, Sep 24, 2012 at 7:35 PM, Jack Krupansky wrote:

> Run a query on both old and new with &debugQuery=true on your query
> request and look at the component timings for possible insight.
>
> -- Jack Krupansky
>
> From: Sujatha Arun
> Sent: Monday, September 24, 2012 7:26 AM
> To: solr-user@lucene.apache.org
> Subject: Performance Degradation on Migrating from 1.3 to solr 3.6.1
>
> Hi,
>
> On migrating from 1.3 to 3.6.1  , I see the query performance degrading by
> nearly 2 times for all types of query.  Indexing performance slight
> degradation over 1.3 For Indexing we use our custom scripts that post xml
> over HTTP.
>
> Any thing that I might have missed . I am thinking that this might be due
> to new Tiered MP over LogByteSize creating more segment files and hence
> more the Query latency .We are using Compound Files in 1.3  and I have set
> this to true even in 3.6.1 ,but results in more segement files
>
> On optimizing the query response time improved beyond 1.3  .So could it be
> the MP or am i missing something here . Do let me know
>
> Please find attached the solrconfig.xml
>
> Regards
> Sujatha
>


Performance Degradation on Migrating from 1.3 to solr 3.6.1

2012-09-24 Thread Sujatha Arun
Hi,

On migrating from 1.3 to 3.6.1, I see query performance degrading by nearly
2 times for all types of query, and indexing performance shows a slight
degradation over 1.3. For indexing we use our custom scripts that post XML
over HTTP.

Is there anything I might have missed? I am thinking that this might be due
to the new Tiered MP (over LogByteSize) creating more segment files and hence
more query latency. We were using compound files in 1.3, and I have set this
to true even in 3.6.1, but it results in more segment files.

On optimizing, the query response time improved beyond 1.3. So could it be
the MP, or am I missing something here? Do let me know.

Please find attached the solrconfig.xml

Regards
Sujatha






[Attachment: solrconfig.xml - the XML markup was stripped by the archive.
The surviving values suggest: abortOnConfigurationError
${solr.abortOnConfigurationError:true}, luceneMatchVersion LUCENE_36,
useCompoundFile true, mergeFactor 4, lockType "single", LRU caches with
autowarming, useColdSearcher false, maxWarmingSearchers 2, echoParams
"explicit", a ping query "solrpingquery", and an admin default query "solr".]


Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-19 Thread Sujatha Arun
Thanks Jack. Yes, this seems to be so!

However, I would like to fix this at the code level by setting the noCFSRatio
to 1.0. But in Solr 3.6.1 I am not able to find the build.xml file. I suppose
the build process has been changed since 1.3; can you throw some light on how
I can build the source code after this change?

In 1.3, I used to change the code in the src files and compile and build from
the same directory as the build.xml file; however, everything seems to be
jarred now. Any pointers?

Regards
Sujatha

On Thu, Sep 20, 2012 at 5:36 AM, Jack Krupansky wrote:

> You may simply be encountering the situation where the merge size is
> greater than 10% of the index size, as per this comment in the code:
>
> /** If a merged segment will be more than this percentage
> *  of the total size of the index, leave the segment as
> *  non-compound file even if compound file is enabled.
> *  Set to 1.0 to always use CFS regardless of merge
> *  size.  Default is 0.1. */
> public void setNoCFSRatio(double noCFSRatio) {
>
> Unfortunately there currently is no way for you to set the ratio higher in
> Solr.
>
> LogMergePolicy has the same issue.
>
> There should be some wiki doc for this, but I couldn't find any.
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Tuesday, September 18, 2012 10:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Compond File Format Advice needed - On migration to 3.6.1
>
>
> anybody?
>
> On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun 
> wrote:
>
>  Hi ,
>>
>> The default Index file creation format in 3.6.1 [migrating from 1.3]
>>  in-spite of setting the usecompoundfile to true seems to be to create non
>> compound files due to Lucene 2790
>> <https://issues.apache.org/jira/browse/LUCENE-2790>.
>>
>> I have tried the following ,but everything seems to create non compound
>> files ..
>>
>>
>>- set  compound file format to true
>>- used the TieredMergePolicy, did not change maxMergeAtOnce and
>>segmentsPerTier.
>>- switched back to LogByteSizeMergePolicy but this also creates non
>>
>>compound files
>>
>> We are in a situation where we have several cores and hence several
>> indexes ,and do not want to run into too many open files error. What can
>> be
>> done to switch to compound file format from the beginning or will this
>> TiredMergepolicy lead us to too many open files eventually?
>>
>> Regards
>> Sujatha
>>
>>
>
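
For anyone wanting the code-level fix discussed above without rebuilding Solr
itself, a tiny merge-policy subclass should do it. A sketch only, assuming the
Lucene 3.6 TieredMergePolicy API (which exposes setNoCFSRatio and
setUseCompoundFile); the package name is hypothetical:

  package com.example.solr;

  import org.apache.lucene.index.TieredMergePolicy;

  // Forces compound (.cfs) segments regardless of merge size: the default
  // noCFSRatio of 0.1 leaves any merged segment larger than 10% of the
  // index as non-compound.
  public class AlwaysCompoundMergePolicy extends TieredMergePolicy {
      public AlwaysCompoundMergePolicy() {
          setUseCompoundFile(true);
          setNoCFSRatio(1.0);
      }
  }

Compiled against the 3.6.1 jars, dropped into the core's lib directory, and
referenced from the mergePolicy class attribute in solrconfig.xml, this
avoids patching Solr's build files entirely.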


Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
Thanks Lance.

I did try going back to LogByteSizeMergePolicy, which we were using with 1.3,
with useCompoundFile=true, but even then this leads to the non-compound index
file format.

There seems to be no config-only way to get back a truly compound index file
format.

Regards
Sujatha

On Wed, Sep 19, 2012 at 9:33 AM, Lance Norskog  wrote:

> 1) Use fewer segments.
> 2) Start the service with a higher limit on the number of open files. It
> used to be that the kernel allocated fixed resources for maximum number,
> but that is no longer true. This is not really an important limit.
> 3) That Lucene issue was closed in 3.1. This must be some other problem.
> 4) You can pick another merge policy if TieredMergePolicy does not work
> for you.
>
> - Original Message -----
> | From: "Sujatha Arun" 
> | To: solr-user@lucene.apache.org
> | Sent: Tuesday, September 18, 2012 10:12:13 AM
> | Subject: Compond File Format Advice needed - On migration to 3.6.1
> |
> | Hi ,
> |
> | The default Index file creation format in 3.6.1 [migrating from 1.3]
> |  in-spite of setting the usecompoundfile to true seems to be to
> |  create non
> | compound files due to Lucene
> | 2790<https://issues.apache.org/jira/browse/LUCENE-2790>
> | .
> |
> | I have tried the following ,but everything seems to create non
> | compound
> | files ..
> |
> |
> |- set  compound file format to true
> |- used the TieredMergePolicy, did not change maxMergeAtOnce and
> |segmentsPerTier.
> |- switched back to LogByteSizeMergePolicy but this also creates
> |non
> |compound files
> |
> | We are in a situation where we have several cores and hence several
> | indexes
> | ,and do not want to run into too many open files error. What can be
> | done to
> | switch to compound file format from the beginning or will this
> | TiredMergepolicy lead us to too many open files eventually?
> |
> | Regards
> | Sujatha
> |
>


Re: Compond File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
anybody?

On Tue, Sep 18, 2012 at 10:42 PM, Sujatha Arun  wrote:

> Hi ,
>
> The default Index file creation format in 3.6.1 [migrating from 1.3]
>  in-spite of setting the usecompoundfile to true seems to be to create non
> compound files due to Lucene 
> 2790<https://issues.apache.org/jira/browse/LUCENE-2790>
> .
>
> I have tried the following ,but everything seems to create non compound
> files ..
>
>
>- set  compound file format to true
>- used the TieredMergePolicy, did not change maxMergeAtOnce and
>segmentsPerTier.
>- switched back to LogByteSizeMergePolicy but this also creates non
>compound files
>
> We are in a situation where we have several cores and hence several
> indexes ,and do not want to run into too many open files error. What can be
> done to switch to compound file format from the beginning or will this
> TiredMergepolicy lead us to too many open files eventually?
>
> Regards
> Sujatha
>


Compond File Format Advice needed - On migration to 3.6.1

2012-09-18 Thread Sujatha Arun
Hi ,

In 3.6.1 [migrating from 1.3], in spite of setting useCompoundFile to true,
the default index file creation behavior seems to be to create non-compound
files, due to Lucene 2790
<https://issues.apache.org/jira/browse/LUCENE-2790>.

I have tried the following, but everything still creates non-compound files:


   - set the compound file format to true
   - used the TieredMergePolicy; did not change maxMergeAtOnce and
   segmentsPerTier
   - switched back to LogByteSizeMergePolicy, but this also creates
   non-compound files

We are in a situation where we have several cores and hence several indexes,
and do not want to run into the "too many open files" error. What can be done
to switch to the compound file format from the beginning, or will this
TieredMergePolicy lead us to too many open files eventually?

Regards
Sujatha


Re: 3.6.1 unable to create compound Index files?

2012-09-18 Thread Sujatha Arun
Hi,

Just discovered that this seems to depend on noCFSRatio [LUCENE-2790].

If I need to make the compound file format the default, where should I change
this? Can it be changed only at the code level, or is there any config setting
which allows me to specify that it should always be the compound format,
irrespective of noCFSRatio?
Regards
Sujatha

On Tue, Sep 18, 2012 at 11:30 AM, Sujatha Arun  wrote:

> Hi,
>
> I am unable to create compound Index format in 3.6.1 inspite of setting
>   as true. I do not see any .cfs file ,instead all the
> .fdx,frq etc are seen and I see the segments_8 even though the mergefactor
> is at 4 . Should I not see only 4 segment files at any time?
>
> Please find attached schema and let me know if I have missed anything.
>
> Regards,
> Sujatha
>


3.6.1 unable to create compound Index files?

2012-09-17 Thread Sujatha Arun
Hi,

I am unable to create the compound index format in 3.6.1, in spite of setting
useCompoundFile to true. I do not see any .cfs file; instead, all the .fdx,
.frq, etc. files are seen, and I see segments_8 even though the mergeFactor
is at 4. Should I not see only 4 segments at any time?

Please find attached schema and let me know if I have missed anything.

Regards,
Sujatha






[Attachment: solrconfig.xml - the XML markup was stripped by the archive.
The surviving values suggest: abortOnConfigurationError
${solr.abortOnConfigurationError:true}, luceneMatchVersion LUCENE_36,
useCompoundFile true, mergeFactor 4, lockType "single", LRU caches with
autowarming, useColdSearcher false, maxWarmingSearchers 2, echoParams
"explicit", a ping query "solrpingquery", and an admin default query "solr".]
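
(On the segment-count question above: under TieredMergePolicy the old
mergeFactor roughly maps onto maxMergeAtOnce and segmentsPerTier, and because
each size tier keeps its own budget, seeing more than 4 segments at a time is
normal. A hedged solrconfig.xml sketch; the element names follow the 3.x
example config, but exact support varies by release:

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">4</int>
    <double name="segmentsPerTier">4.0</double>
  </mergePolicy>
)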

Re: 1.3 to 3.6 migration

2012-09-17 Thread Sujatha Arun
Hi Jack,

Thanks.

Even though I have set the compound index option to true in the index config
section for the 3.6 version, it still seems to create normal index files.

Attached is the solrconfig.xml

Please let me know if anything wrong

Regards
Sujatha

On Sat, Sep 15, 2012 at 9:43 PM, Jack Krupansky wrote:

> Correcting myself, for #4, Solr doesn't "analyze" string fields such as
> the unique key field, but... a transformer or other logic, say in DIH, that
> constructs the document key values might behave differently between Solr
> 1.3 and 3.6. Maybe there was a bug in 1.3 that caused distinct keys to map
> to the same value (causing documents to be discarded), but now in 3.6 the
> mapping is correct and distinct (and more documents are correctly indexed.
>
> -- Jack Krupansky
>
> -Original Message- From: Jack Krupansky
> Sent: Saturday, September 15, 2012 10:34 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: 1.3 to 3.6 migration
>
> Try some queries in both the old and the new and identify some documents
> that appear in one and not the other. Then examine a couple of those docs
> in
> detail one field at a time and see if anything is suspicious. Take each
> field value and enter it into the Solr Admin Analysis page to see how Solr
> 3.6 analyzes the field value compared to 1.3.
>
> Four likely scenarios:
> 1. The additional docs were not present when you indexed with 1.3.
> 2. Your indexing tool (DIH, or whatever) may have discarded  the docs in
> 1.3
> due to some issue that has now been resolved.
> 3. Solr 1.3 got an error those documents but your indexing process
> continued
> despite the error, while Solr 3.6 may not have hit those errors, possibly
> because it is more flexible and has more features now.
> 4. Your key values analyze differently in Solr 3.6 so that the keys of the
> extra documents mapped to other existing keys in Solr 1.3, causing the
> "extra" documents to overwrite existing documents in Solr 1.3.
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Saturday, September 15, 2012 2:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: 1.3 to 3.6 migration
>
> Can you please elaborate?
>
> Regards
> Sujatha
>
> On Sat, Sep 15, 2012 at 1:34 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>  Hi,
>>
>> Maybe your indexer is different/modified/buggy?
>>
>> Otis
>> --
>> Search Analytics - http://sematext.com/search-analytics/index.html
>> Performance Monitoring - http://sematext.com/spm/index.html
>>
>>
>> On Fri, Sep 14, 2012 at 3:23 PM, Sujatha Arun 
>> wrote:
>> > Hi,
>> >
>> > Just migrated to 3.6.1 from 1.3 version with the following observation
>> >
>> > Indexed content using the same source
>> >
>> >                                     1.3        3.6.1
>> > Number of documents indexed      11505        13937
>> > Index Time - Full Index          170ms        171ms
>> > Index size                       23 MB        31 MB
>> > Query Time [first time] for *:*  44 ms       187 ms
>> >
>> > and *:* query is not cached in 3.6.1 in query result cache ,is this
>> > expected?
>> >
>> > some points:
>> >
>> > Even though I used the same data source ,the number of documents indexed
>> > seem to be more in 3.6.1 [ not sure why?]
>> > All the other params including index size and query time seem to be more
>> > instnead of less in 3.6.1 and  queries are not getting cached in 3.6.1
>> >
>> > Attached the schema's - any pointers?
>> >
>> > Regards
>> > Sujatha
>> >
>>
>>






[Attachment: solrconfig.xml - the XML markup was stripped by the archive.
The surviving values suggest: abortOnConfigurationError
${solr.abortOnConfigurationError:true}, luceneMatchVersion LUCENE_36,
useCompoundFile true, mergeFactor 4, maxFieldLength 2147483647, lockType
"single", LRU caches with autowarming, useColdSearcher false,
maxWarmingSearchers 2, echoParams "explicit", a ping query "solrpingquery",
and an admin default query "solr".]

Re: 1.3 to 3.6 migration

2012-09-14 Thread Sujatha Arun
Can you please elaborate?

Regards
Sujatha

On Sat, Sep 15, 2012 at 1:34 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Maybe your indexer is different/modified/buggy?
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Fri, Sep 14, 2012 at 3:23 PM, Sujatha Arun  wrote:
> > Hi,
> >
> > Just migrated to 3.6.1 from 1.3 version with the following observation
> >
> > Indexed content using the same source
> >
> >                                     1.3        3.6.1
> > Number of documents indexed      11505        13937
> > Index Time - Full Index          170ms        171ms
> > Index size                       23 MB        31 MB
> > Query Time [first time] for *:*  44 ms       187 ms
> >
> > and *:* query is not cached in 3.6.1 in query result cache ,is this
> > expected?
> >
> > some points:
> >
> > Even though I used the same data source ,the number of documents indexed
> > seem to be more in 3.6.1 [ not sure why?]
> > All the other params including index size and query time seem to be more
> > instnead of less in 3.6.1 and  queries are not getting cached in 3.6.1
> >
> > Attached the schema's - any pointers?
> >
> > Regards
> > Sujatha
> >
>


Re: 3.6.1 - Suggester and spellcheker Implementation

2012-09-14 Thread Sujatha Arun
Thanks . :(

Regards
Sujatha

On Thu, Sep 13, 2012 at 2:28 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi Sujatha,
>
> No, suggester and spellchecker are separate beasts.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Wed, Sep 12, 2012 at 3:18 PM, Sujatha Arun  wrote:
> > Hi ,
> >
> > If I am looking to implement Suggester Implementation with 3.6.1 ,I
> beleive
> > this creates it own index , now If I want to also use the spellcheck
>  also
> > ,would it be using the same index as suggester?
> >
> > Regards
> > Sujatha
>


1.3 to 3.6 migration

2012-09-14 Thread Sujatha Arun
Hi,

Just migrated to 3.6.1 from 1.3 version with the following observation

Indexed content using the same source

                                     1.3        3.6.1
  Number of documents indexed      11505        13937
  Index Time - Full Index          170ms        171ms
  Index size                       23 MB        31 MB
  Query Time [first time] for *:*  44 ms       187 ms

and the *:* query is not cached in the query result cache in 3.6.1; is this
expected?

Some points:

Even though I used the same data source, the number of documents indexed
seems to be higher in 3.6.1 [not sure why?].
All the other numbers, including index size and query time, are higher
instead of lower in 3.6.1, and queries are not getting cached in 3.6.1.

Attached are the schemas; any pointers?

Regards
Sujatha

[Attachments: schema.xml for 1.3 and for 3.6.1 - the XML markup was stripped
by the archive; little survives beyond the uniqueKey field (id) and the
defaultSearchField (content), which appear in both files.]

3.6.1 - Suggester and spellcheker Implementation

2012-09-12 Thread Sujatha Arun
Hi ,

If I am looking at implementing the Suggester with 3.6.1, I believe this
creates its own index. Now, if I also want to use the spellcheck, would it be
using the same index as the suggester?

Regards
Sujatha
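
(As Otis notes in the reply above, the two are separate: the 3.x Suggester
builds its own in-memory lookup structure from a field or file, distinct from
an index-based spellchecker. A sketch of a suggester component, from memory
of the 3.x wiki example; the field name and buildOnCommit setting are
assumptions:

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">content</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
)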


Re: Version Migration from solr 1.3

2012-09-07 Thread Sujatha Arun
I see that the 4.0 alpha was released after 3.6.1, so should I look at
3.5 as the most stable release currently?

Version Source :
https://issues.apache.org/jira/browse/SOLR?selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel

Regards
Sujatha

On Fri, Sep 7, 2012 at 11:17 PM, Sujatha Arun  wrote:

> Hi ,
>
> If we are migrating from 1.3 ,which is the current stable version that we
> should be looking at  3.6.1  or  3.6.2 ?
>
> Regards
> Sujatha
>


Solr Cache

2012-09-01 Thread Sujatha Arun
If we pre-assign the cache values in solrconfig.xml as follows for each core
[Solr 1.3], is the RAM pre-assigned per core, and how much would be used for
each? Would I be able to calculate the RAM reserved for each cache? Since we
have many such cores with the same cache values, we would like to forecast
the usage.
We have assigned a total of 19 GB RAM for 220 cores with the same config; the
following is an example of one core's usage.

documentCache (class: org.apache.solr.search.LRUCache, version: 1.0)
  description: LRU Cache(maxSize=16384, initialSize=16384)
lookups : 6668
hits : 2451
hitratio : 0.36
inserts : 4245
evictions : 0
size : 4245
warmupTime : 0
cumulative_lookups : 552526
cumulative_hits : 234707
cumulative_hitratio : 0.42
cumulative_inserts : 317819
cumulative_evictions : 29553

filterCache (class: org.apache.solr.search.LRUCache, version: 1.0)
  description: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=4096)
lookups : 75
hits : 68
hitratio : 0.90
inserts : 672
evictions : 0
size : 672
warmupTime : 24030
cumulative_lookups : 3459
cumulative_hits : 2730
cumulative_hitratio : 0.78
cumulative_inserts : 729
cumulative_evictions : 0
queryResultCache (class: org.apache.solr.search.LRUCache, version: 1.0)
  description: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=1024)
lookups : 96
hits : 23
hitratio : 0.23
inserts : 1099
evictions : 0
size : 1097
warmupTime : 1042045
cumulative_lookups : 5214
cumulative_hits : 2081
cumulative_hitratio : 0.39
cumulative_inserts : 3147
cumulative_evictions : 0

Regards
Sujatha
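
A rough way to bound the cache memory (back-of-envelope only; Solr 1.3 may
store small filters as hash sets rather than bitsets, and stored-document
sizes vary): a filterCache entry held as a bitset costs about maxDoc/8 bytes,
the documentCache costs roughly its size times the average stored-document
size, and queryResultCache entries are small doc-id lists. A sketch, with
illustrative numbers only:

  public final class CacheSizeEstimate {
      // A cached filter stored as a bitset: one bit per document.
      static long filterEntryBytes(int maxDoc) {
          return maxDoc / 8L;
      }

      public static void main(String[] args) {
          // e.g. a 1,000,000-doc core: ~122 KB per filterCache entry; at
          // maxSize=16384 the theoretical ceiling is ~2 GB for that core,
          // though the observed 672 entries above is far below that.
          System.out.println(filterEntryBytes(1000000) * 16384L);
      }
  }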


Re: Patch 2429 for solr1.3?

2012-08-30 Thread Sujatha Arun
Thanks Erick; local params are there from Solr 1.4 on.

Regards
Sujatha
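
(For reference, "local params" is the curly-brace prefix syntax introduced in
1.4, e.g. q={!dismax qf=content}differential equations, which switches the
query parser and its options for just that one parameter; per Erick above,
the SOLR-2429 patch depends on this mechanism.)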

On Thu, Aug 30, 2012 at 5:26 PM, Erick Erickson wrote:

> I very much doubt it, all I can say is "give it a try and see". But those
> code lines are so divergent that I would be extremely surprised if
> you could even apply the patch, much less get it to work.
>
> If you _do_ try it, there'll be no support, you're on your own.
>
> But I suspect that if you simply try to apply the patch it'll fail
> completely and you won't even be able to compile. 2429
> depends, for instance, on local params and I'm not
> even sure they exist in 1.3.
>
> Good luck
> Erick
>
> On Thu, Aug 30, 2012 at 12:30 AM, Sujatha Arun 
> wrote:
> > Can we use the patch 2429 in solr 1.3?
> >
> > Regards
> > Sujatha
>


Patch 2429 for solr1.3?

2012-08-29 Thread Sujatha Arun
Can we use the patch 2429 in solr 1.3?

Regards
Sujatha


Re: Muticore Sharding

2012-08-17 Thread Sujatha Arun
Hi,

This is the parsed query string which *returns 3331234 documents*

content:elena content:read content:a
content:rate content:r content:page content:per content:minut content:a
content:total content:m content:minut re

and  a Boolean "AND " query of the above query  takes 410 ms and *returns
just 1 result*

+content:elena +content:read +content:a
+content:rate +content:r +content:page +content:per +content:minut
+content:a +content:total +content:m +content:minut

So the main difference is the number of results returned. Is this an I/O
issue, or is this time expected given the number of documents returned?

Also, what is the difference between QTime and the timing shown as part of
debugQuery?

Regards
Sujatha

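
(For reference, the AND behavior above can also be made the default instead
of spelling out every operator: set <solrQueryParser defaultOperator="AND"/>
in schema.xml for the standard parser, or write the query as
q=content:(elena AND reads AND rate AND pages AND minute); the q.op=AND
request parameter, available in at least 1.4 and later, does the same per
request.)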



On Fri, Aug 17, 2012 at 8:35 PM, Jack Krupansky wrote:

> First, provide us with the additional info Erik requested in his last
> reply.
>
> Then, can you provide a snippet from your Solr log file that shows a
> couple of queries. Maybe there is something else going on or some
> exceptions, or something. Or at least to show us the quey times in context.
>
> Finally, has this been happening from the very beginning, or did it begin
> to grow gradually and steadily, or did it just suddenly start to happen
> even though it had been fine just a few minutes earlier? If the latter, had
> there been and system or JVM or Solr configuration changes between the time
> queries were fast and when they became slow?
>
> Thanks.
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Friday, August 17, 2012 10:50 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Muticore Sharding
>
>
> Sorry typo ,I meant upwards of 20s at any time. What should I be looking
> at?
>
> Regards
> Sujatha
>
> On Fri, Aug 17, 2012 at 8:09 PM, Sujatha Arun  wrote:
>
>  Erik,
>>
>> What could be the issue  Load / I/O ? It seems to shows upwards of 20 ms
>> at any time
>>
>> Regards
>> Sujatha
>>
>>
>> On Fri, Aug 17, 2012 at 6:18 PM, Erik Hatcher > >wrote:
>>
>>  Sujatha - that query debug output shows only 218ms, so it isn't
>>> representative of the issue you're reporting.
>>>
>>> Also, what's the query parse output?   I imagine you're doing a boolean
>>> OR query across all those terms (include "a"), yes?  Maybe you'd rather
>>> the
>>> operator be AND?
>>>
>>> Erik
>>>
>>>
>>> On Aug 17, 2012, at 08:01 , Sujatha Arun wrote:
>>>
>>> > No customization,its the default standard request handler.Solr Version
>>> is
>>> > 1.3
>>> >
>>> > "a" is not there in stop words
>>> >
>>> > Server Load ,i presume is not there , but not too sure ,not checked.
>>> >
>>> > RAM :
>>> >
>>> > TOTAL RAM :48GB
>>> > RAM to  JVM :18 GB ,Permgen =2GB
>>> > TOTAL INDEX SIZE of all the  multicore Instances =23GB
>>> >
>>> > Timing [Cut & paste ] , I have never looked at this before
>>> >
>>> > OldLuceneQParser: total 218.0 ms
>>> > prepare: 6.0 ms (QueryComponent 5.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
>>> > process: 211.0 ms (QueryComponent, Facet, MoreLikeThis, Highlight 0.0; DebugComponent 211.0)
>>> > [remainder of this quoted message is truncated in the archive]

Re: Muticore Sharding

2012-08-17 Thread Sujatha Arun
Sorry, typo: I meant upwards of 20s at any time. What should I be looking at?

Regards
Sujatha

On Fri, Aug 17, 2012 at 8:09 PM, Sujatha Arun  wrote:

> Erik,
>
> What could be the issue  Load / I/O ? It seems to shows upwards of 20 ms
> at any time
>
> Regards
> Sujatha
>
>
> On Fri, Aug 17, 2012 at 6:18 PM, Erik Hatcher wrote:
>
>> Sujatha - that query debug output shows only 218ms, so it isn't
>> representative of the issue you're reporting.
>>
>> Also, what's the query parse output?   I imagine you're doing a boolean
>> OR query across all those terms (include "a"), yes?  Maybe you'd rather the
>> operator be AND?
>>
>> Erik
>>
>>
>> On Aug 17, 2012, at 08:01 , Sujatha Arun wrote:
>>
>> > No customization,its the default standard request handler.Solr Version
>> is
>> > 1.3
>> >
>> > "a" is not there in stop words
>> >
>> > Server Load ,i presume is not there , but not too sure ,not checked.
>> >
>> > RAM :
>> >
>> > TOTAL RAM :48GB
>> > RAM to  JVM :18 GB ,Permgen =2GB
>> > TOTAL INDEX SIZE of all the  multicore Instances =23GB
>> >
>> > Timing [Cut & paste ] , I have never looked at this before
>> >
>> > OldLuceneQParser: total 218.0 ms
>> > prepare: 6.0 ms (QueryComponent 5.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
>> > process: 211.0 ms (QueryComponent, Facet, MoreLikeThis, Highlight 0.0; DebugComponent 211.0)
>> >
>> > Regards
>> > Sujatha
>> >
>> > On Fri, Aug 17, 2012 at 4:11 PM, Erik Hatcher > >wrote:
>> >
>> >> Just over 4M docs... no need to shard.
>> >>
>> >> Both your queries contain what is likely very common terms (content:a
>> in
>> >> the first one and content:1 in the second). Generally these are "stop
>> >> words" and removed either during indexing or querying, but I guess not
>> in
>> >> your case.
>> >>
>> >> What's your "standard" request handler look like?  Doing anything
>> custom
>> >> in there?
>> >>
>> >> What are the timings of the components in the debugQuery=true response?
>> >>
>> >> Is the server under load when you're issuing these queries?   What
>> about
>> >> RAM?
>> >>
>> >> Again, what version of Solr?
>> >>
>> >>Erik
>> >>
>> >>
>> >>
>> >>
>> >> On Aug 17, 2012, at 06:35 , Sujatha Arun wrote:
>> >>
>> >>> Hi Erick,
>> >>>
>> >>> The number of documents is : 4389048
>> >>>
>> >>> I have given 2 queries below with timing and the number of hits
>> >>>
>> >>> INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
>> >>>
>> >>
>> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:if+content:elena+content:reads+content:at+content:a+content:rate+content:of+content:r+content:pages+content:per+content:minute+content:for+content:a+content:total+content:of+content:m+content:minutes}
>> >>> hits=3331109 status=0 QTime=22677
>> >>>
>> >>> INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
>> >>>
>> >>
>> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:chapter+content:1}
>> >>> hits=1919445 status=0 QTime=6677
>> >>>
>> >>>
>> >>> Any time I execute the time taken to execute the first query is  >
>> than
>> >>> 22secs ,now it took 33 secs to execute.
>> >>>
>> >>> Regards
>> >>> Sujatha
>> >>>
>> >>>
>> >>> On Fri, Aug 17, 2012 at 3:30 PM, Erik Hatcher > >>> wrote:
>> >>>
>> >>>> How many documents do  you have?   What are the queries?  I'd guess
>> your
>> >>>> query complexity (or load) is to blame here, not index size.  What
>> >> version
>> >>>> of Solr?
>> >>>>
>> >>>> Until you know what is causing the slow queries, sharding is not
>> >> something
>> >>>> to consider I'd say.  But yes, you would want to reindex to
>> distribute
>> >> the
>> >>>> documents.
>> >>>>
>> >>>>   Erik
>> >>>>
>> >>>>
>> >>>> On Aug 17, 2012, at 02:21 , Sujatha Arun wrote:
>> >>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> One of the Index in a multicore set up has a 3GB+ index ,and it
>> seems
>> >> to
>> >>>>> take around 5000ms+ to return simple boolean queries . The Index is
>> >>>>> not optimized
>> >>>>> Would it make sense to shard the index as cores in the same server
>> to
>> >>>>> expect better response time?
>> >>>>>
>> >>>>> do I have to re index all over again to distribute the documents
>> >> between
>> >>>> 2
>> >>>>> shards ?
>> >>>>>
>> >>>>> Regards
>> >>>>> Sujatha
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>


Re: Muticore Sharding

2012-08-17 Thread Sujatha Arun
Erik,

What could be the issue: load, or I/O? It seems to show upwards of 20 ms at
any time.

Regards
Sujatha

On Fri, Aug 17, 2012 at 6:18 PM, Erik Hatcher wrote:

> Sujatha - that query debug output shows only 218ms, so it isn't
> representative of the issue you're reporting.
>
> Also, what's the query parse output?   I imagine you're doing a boolean OR
> query across all those terms (include "a"), yes?  Maybe you'd rather the
> operator be AND?
>
> Erik
>
>
> On Aug 17, 2012, at 08:01 , Sujatha Arun wrote:
>
> > No customization,its the default standard request handler.Solr Version is
> > 1.3
> >
> > "a" is not there in stop words
> >
> > Server Load ,i presume is not there , but not too sure ,not checked.
> >
> > RAM :
> >
> > TOTAL RAM :48GB
> > RAM to  JVM :18 GB ,Permgen =2GB
> > TOTAL INDEX SIZE of all the  multicore Instances =23GB
> >
> > Timing [Cut & paste ] , I have never looked at this before
> >
> > OldLuceneQParser: total 218.0 ms
> > prepare: 6.0 ms (QueryComponent 5.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
> > process: 211.0 ms (QueryComponent, Facet, MoreLikeThis, Highlight 0.0; DebugComponent 211.0)
> >
> > Regards
> > Sujatha
> >
> > On Fri, Aug 17, 2012 at 4:11 PM, Erik Hatcher  >wrote:
> >
> >> Just over 4M docs... no need to shard.
> >>
> >> Both your queries contain what is likely very common terms (content:a in
> >> the first one and content:1 in the second). Generally these are "stop
> >> words" and removed either during indexing or querying, but I guess not
> in
> >> your case.
> >>
> >> What's your "standard" request handler look like?  Doing anything custom
> >> in there?
> >>
> >> What are the timings of the components in the debugQuery=true response?
> >>
> >> Is the server under load when you're issuing these queries?   What about
> >> RAM?
> >>
> >> Again, what version of Solr?
> >>
> >>Erik
> >>
> >>
> >>
> >>
> >> On Aug 17, 2012, at 06:35 , Sujatha Arun wrote:
> >>
> >>> Hi Erick,
> >>>
> >>> The number of documents is : 4389048
> >>>
> >>> I have given 2 queries below with timing and the number of hits
> >>>
> >>> INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
> >>>
> >>
> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:if+content:elena+content:reads+content:at+content:a+content:rate+content:of+content:r+content:pages+content:per+content:minute+content:for+content:a+content:total+content:of+content:m+content:minutes}
> >>> hits=3331109 status=0 QTime=22677
> >>>
> >>> INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
> >>>
> >>
> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:chapter+content:1}
> >>> hits=1919445 status=0 QTime=6677
> >>>
> >>>
> >>> Any time I execute the time taken to execute the first query is  > than
> >>> 22secs ,now it took 33 secs to execute.
> >>>
> >>> Regards
> >>> Sujatha
> >>>
> >>>
> >>> On Fri, Aug 17, 2012 at 3:30 PM, Erik Hatcher  >>> wrote:
> >>>
> >>>> How many documents do  you have?   What are the queries?  I'd guess
> your
> >>>> query complexity (or load) is to blame here, not index size.  What
> >> version
> >>>> of Solr?
> >>>>
> >>>> Until you know what is causing the slow queries, sharding is not
> >> something
> >>>> to consider I'd say.  But yes, you would want to reindex to distribute
> >> the
> >>>> documents.
> >>>>
> >>>>   Erik
> >>>>
> >>>>
> >>>> On Aug 17, 2012, at 02:21 , Sujatha Arun wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> One of the Index in a multicore set up has a 3GB+ index ,and it seems
> >> to
> >>>>> take around 5000ms+ to return simple boolean queries . The Index is
> >>>>> not optimized
> >>>>> Would it make sense to shard the index as cores in the same server to
> >>>>> expect better response time?
> >>>>>
> >>>>> do I have to re index all over again to distribute the documents
> >> between
> >>>> 2
> >>>>> shards ?
> >>>>>
> >>>>> Regards
> >>>>> Sujatha
> >>>>
> >>>>
> >>
> >>
>
>


Re: Muticore Sharding

2012-08-17 Thread Sujatha Arun
No customization; it's the default standard request handler. Solr version is
1.3.

"a" is not there in the stop words.

Server load, I presume, is not an issue, but I'm not too sure; I haven't checked.

RAM :

TOTAL RAM :48GB
RAM to  JVM :18 GB ,Permgen =2GB
TOTAL INDEX SIZE of all the  multicore Instances =23GB

Timing [cut & paste]; I have never looked at this before:

OldLuceneQParser: total 218.0 ms
prepare: 6.0 ms (QueryComponent 5.0; Facet, MoreLikeThis, Highlight, Debug 0.0)
process: 211.0 ms (QueryComponent, Facet, MoreLikeThis, Highlight 0.0; DebugComponent 211.0)

Regards
Sujatha

On Fri, Aug 17, 2012 at 4:11 PM, Erik Hatcher wrote:

> Just over 4M docs... no need to shard.
>
> Both your queries contain what is likely very common terms (content:a in
> the first one and content:1 in the second). Generally these are "stop
> words" and removed either during indexing or querying, but I guess not in
> your case.
>
> What's your "standard" request handler look like?  Doing anything custom
> in there?
>
> What are the timings of the components in the debugQuery=true response?
>
> Is the server under load when you're issuing these queries?   What about
> RAM?
>
> Again, what version of Solr?
>
> Erik
>
>
>
>
> On Aug 17, 2012, at 06:35 , Sujatha Arun wrote:
>
> > Hi Erick,
> >
> > The number of documents is : 4389048
> >
> > I have given 2 queries below with timing and the number of hits
> >
> > INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
> >
> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:if+content:elena+content:reads+content:at+content:a+content:rate+content:of+content:r+content:pages+content:per+content:minute+content:for+content:a+content:total+content:of+content:m+content:minutes}
> > hits=3331109 status=0 QTime=22677
> >
> > INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
> >
> params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:chapter+content:1}
> > hits=1919445 status=0 QTime=6677
> >
> >
> > Any time I execute the time taken to execute the first query is  > than
> > 22secs ,now it took 33 secs to execute.
> >
> > Regards
> > Sujatha
> >
> >
> > On Fri, Aug 17, 2012 at 3:30 PM, Erik Hatcher  >wrote:
> >
> >> How many documents do  you have?   What are the queries?  I'd guess your
> >> query complexity (or load) is to blame here, not index size.  What
> version
> >> of Solr?
> >>
> >> Until you know what is causing the slow queries, sharding is not
> something
> >> to consider I'd say.  But yes, you would want to reindex to distribute
> the
> >> documents.
> >>
> >>Erik
> >>
> >>
> >> On Aug 17, 2012, at 02:21 , Sujatha Arun wrote:
> >>
> >>> Hello,
> >>>
> >>> One of the Index in a multicore set up has a 3GB+ index ,and it seems
> to
> >>> take around 5000ms+ to return simple boolean queries . The Index is
> >>> not optimized
> >>> Would it make sense to shard the index as cores in the same server to
> >>> expect better response time?
> >>>
> >>> do I have to re index all over again to distribute the documents
> between
> >> 2
> >>> shards ?
> >>>
> >>> Regards
> >>> Sujatha
> >>
> >>
>
>


Re: Muticore Sharding

2012-08-17 Thread Sujatha Arun
Hi Erick,

The number of documents is : 4389048

I have given 2 queries below with timing and the number of hits

INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:if+content:elena+content:reads+content:at+content:a+content:rate+content:of+content:r+content:pages+content:per+content:minute+content:for+content:a+content:total+content:of+content:m+content:minutes}
hits=3331109 status=0 QTime=22677

INFO: [ipc_widget_search] webapp=/multicore_5 path=/select/
params={version=2.1&fl=*+score&stylesheet=&qt=standard&fq=&rows=100&start=0&q=content:chapter+content:1}
hits=1919445 status=0 QTime=6677


Any time I execute it, the time taken for the first query is more than
22 secs; just now it took 33 secs to execute.

Regards
Sujatha


On Fri, Aug 17, 2012 at 3:30 PM, Erik Hatcher wrote:

> How many documents do  you have?   What are the queries?  I'd guess your
> query complexity (or load) is to blame here, not index size.  What version
> of Solr?
>
> Until you know what is causing the slow queries, sharding is not something
> to consider I'd say.  But yes, you would want to reindex to distribute the
> documents.
>
> Erik
>
>
> On Aug 17, 2012, at 02:21 , Sujatha Arun wrote:
>
> > Hello,
> >
> > One of the Index in a multicore set up has a 3GB+ index ,and it seems to
> > take around 5000ms+ to return simple boolean queries . The Index is
> > not optimized
> > Would it make sense to shard the index as cores in the same server to
> > expect better response time?
> >
> > do I have to re index all over again to distribute the documents between
> 2
> > shards ?
> >
> > Regards
> > Sujatha
>
>


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
What I would be doing is this:

create a custom class that refers to the needed org.apache.* classes (import
statements); the custom file's location is independent of the Solr core class files
compile this separately
package this as a jar
move this to the lib dir of each Solr core
refer to this in the lib directive of solrconfig.xml
reload the core (a sketch of such a class appears after the original
"Custom Plugins for solr" post further down).

I am assuming that I am not directly handling the Solr download source files
or the war files; is this correct? Do I have to be concerned with build
files, etc.? How then does the approach differ in the later versions?

Regards
Sujatha









On Mon, Aug 13, 2012 at 10:30 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> No, the jar would be exactly the same, with the caveat that you'd have
> to build against the newer Solr version of course.
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Mon, Aug 13, 2012 at 12:55 PM, Sujatha Arun 
> wrote:
> > Thanks ,I am going to try this on solr 1.3 version .Would the approach be
> > any different for the recent sorl versions?
> >
> > Regards
> > Sujatha
> >
> > On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> >> Then you're on the right track.
> >>
> >> 1. You'd either have to restart Tomcat or in the case of Multicore
> >> setups, reload the core.
> >> 2. If the jar has dependencies outside of the Solr provided classes,
> >> you'll have to include those as well. If it only depends on Solr stuff
> >> or things that are in the servlet container's classpath, you should be
> >> fine with just the one class.
> >>
> >> Michael Della Bitta
> >>
> >> 
> >> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> >> www.appinions.com
> >> Where Influence Isn’t a Game
> >>
> >>
> >> On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun 
> >> wrote:
> >> > Adding a new class
> >> >
> >> > Regards
> >> > Sujatha
> >> >
> >> > On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta <
> >> > michael.della.bi...@appinions.com> wrote:
> >> >
> >> >> Michael Della Bitta
> >> >> Hi Sujatha,
> >> >>
> >> >> Are you adding a new class, or modifying one of the provided Solr
> >> classes?
> >> >>
> >> >> Michael
> >> >>
> >> >>
> >> >> 
> >> >> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> >> >> www.appinions.com
> >> >> Where Influence Isn’t a Game
> >> >>
> >> >>
> >> >> On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun 
> >> wrote:
> >> >> > Hi ,
> >> >> >
> >> >> > I would like to write a custom component for solr  to address a
> >> >> particular
> >> >> > issue.
> >> >> >
> >> >> > This is what I have been doing ,write the custom code directly in
> the
> >> >> > downloaded code base and rebuild the war file and deploy the same.
> We
> >> >> > currently have multiple cores ,hence  I want to approach this in a
> >> core
> >> >> > specific way as opposed to affecting all the cores in the webapp .
> >> >> >
> >> >> > If I have to write a plugin and move it to the lib directory of
> each
> >> core
> >> >> > ,would I just need to add one single class file packed as a jar
>  and
> >> make
> >> >> > appropriate changes to the solrconfig file .When I reload the core
> ,
> >> I am
> >> >> > assuming that apart from the  classes in the war file ,this jar
> file
> >> in
> >> >> the
> >> >> > lib will be automatically referenced.
> >> >> >
> >> >> > Would I need to restart sevlet container?
> >> >> > Would I need to have other files to which this custom class is
> >> >> referencing
> >> >> > to in the custom jar file or will that be automatically taken care
> of?
> >> >> >
> >> >> > Regards
> >> >> > Sujatha
> >> >>
> >>
>


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Thanks ,I am going to try this on solr 1.3 version .Would the approach be
any different for the recent sorl versions?

Regards
Sujatha

On Mon, Aug 13, 2012 at 8:53 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Then you're on the right track.
>
> 1. You'd either have to restart Tomcat or in the case of Multicore
> setups, reload the core.
> 2. If the jar has dependencies outside of the Solr provided classes,
> you'll have to include those as well. If it only depends on Solr stuff
> or things that are in the servlet container's classpath, you should be
> fine with just the one class.
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Mon, Aug 13, 2012 at 10:36 AM, Sujatha Arun 
> wrote:
> > Adding a new class
> >
> > Regards
> > Sujatha
> >
> > On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> >> Michael Della Bitta
> >> Hi Sujatha,
> >>
> >> Are you adding a new class, or modifying one of the provided Solr
> classes?
> >>
> >> Michael
> >>
> >>
> >> ----
> >> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> >> www.appinions.com
> >> Where Influence Isn’t a Game
> >>
> >>
> >> On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun 
> wrote:
> >> > Hi ,
> >> >
> >> > I would like to write a custom component for solr  to address a
> >> particular
> >> > issue.
> >> >
> >> > This is what I have been doing ,write the custom code directly in the
> >> > downloaded code base and rebuild the war file and deploy the same. We
> >> > currently have multiple cores ,hence  I want to approach this in a
> core
> >> > specific way as opposed to affecting all the cores in the webapp .
> >> >
> >> > If I have to write a plugin and move it to the lib directory of each
> core
> >> > ,would I just need to add one single class file packed as a jar  and
> make
> >> > appropriate changes to the solrconfig file .When I reload the core ,
> I am
> >> > assuming that apart from the  classes in the war file ,this jar file
> in
> >> the
> >> > lib will be automatically referenced.
> >> >
> >> > Would I need to restart sevlet container?
> >> > Would I need to have other files to which this custom class is
> >> referencing
> >> > to in the custom jar file or will that be automatically taken care of?
> >> >
> >> > Regards
> >> > Sujatha
> >>
>


Re: Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Adding a new class

Regards
Sujatha

On Mon, Aug 13, 2012 at 5:54 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Michael Della Bitta
> Hi Sujatha,
>
> Are you adding a new class, or modifying one of the provided Solr classes?
>
> Michael
>
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Mon, Aug 13, 2012 at 7:18 AM, Sujatha Arun  wrote:
> > Hi ,
> >
> > I would like to write a custom component for solr  to address a
> particular
> > issue.
> >
> > This is what I have been doing ,write the custom code directly in the
> > downloaded code base and rebuild the war file and deploy the same. We
> > currently have multiple cores ,hence  I want to approach this in a core
> > specific way as opposed to affecting all the cores in the webapp .
> >
> > If I have to write a plugin and move it to the lib directory of each core
> > ,would I just need to add one single class file packed as a jar  and make
> > appropriate changes to the solrconfig file .When I reload the core , I am
> > assuming that apart from the  classes in the war file ,this jar file in
> the
> > lib will be automatically referenced.
> >
> > Would I need to restart sevlet container?
> > Would I need to have other files to which this custom class is
> referencing
> > to in the custom jar file or will that be automatically taken care of?
> >
> > Regards
> > Sujatha
>


Custom Plugins for solr

2012-08-13 Thread Sujatha Arun
Hi ,

I would like to write a custom component for solr  to address a particular
issue.

This is what I have been doing: write the custom code directly in the
downloaded code base, rebuild the war file, and deploy it. We currently have
multiple cores, hence I want to approach this in a core-specific way, as
opposed to affecting all the cores in the webapp.

If I have to write a plugin and move it to the lib directory of each core,
would I just need to add one single class file packed as a jar and make
appropriate changes to the solrconfig file? When I reload the core, I am
assuming that, apart from the classes in the war file, this jar file in the
lib will be automatically referenced.

Would I need to restart the servlet container?
Would I need to include other files to which this custom class refers in the
custom jar file, or will that be automatically taken care of?

Regards
Sujatha
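
A minimal sketch of such a pluggable class, assuming the 1.3-era
SearchComponent API (the package name and behavior are hypothetical, and the
set of abstract SolrInfoMBean methods varies slightly between versions):

  package com.example.solr;

  import java.io.IOException;

  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  // Trivial component: echoes the q parameter back into the response.
  public class EchoQueryComponent extends SearchComponent {

      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
          // nothing to do before the query runs
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
          rb.rsp.add("echoedQ", rb.req.getParams().get("q"));
      }

      @Override
      public String getDescription() { return "Echoes the q parameter"; }

      @Override
      public String getSource() { return ""; }

      @Override
      public String getSourceId() { return ""; }

      @Override
      public String getVersion() { return "1.0"; }
  }

Registered with a searchComponent element in solrconfig.xml pointing at the
class, added to a handler's last-components list, and with the jar in the
core's lib directory, a core reload should pick it up.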


Re: solr 1872

2012-07-31 Thread Sujatha Arun
Thanks, Peter. Will try this.

Regards
Sujatha

On Tue, Jul 31, 2012 at 3:55 PM, Peter Sturge wrote:

> Hi,
>
> The acl file usually goes in the conf folder, so if you specify different
> conf folders for each core, you could have a different one for each.
> The acl file can also be specified in solrconfig.xml, under the
> SolrACLSecurity section:
>   acl.xml
> If you use a different solrconfig.xml for each core, you could specify
> different files that way.
>
> Keep in mind that if you just need to control core access, you can use
> jetty realms or similar acl mechanism for your container.
> SolrACLSecurity is for controlling fine-grained access to data within a
> core.
>
> Thanks,
> Peter
>
>
>
> On Tue, Jul 31, 2012 at 5:50 AM, Sujatha Arun  wrote:
>
> > Peter,
> >
> > In a multicore environment , where should the acl file reside , under the
> > conf directory ,Can I use a acl file per core ?
> >
> > Regards
> > Sujatha
> >
> > On Tue, Jul 31, 2012 at 9:15 AM, Sujatha Arun 
> wrote:
> >
> > > Renamed to zip and worked fine,thanks
> > >
> > > Regards
> > > Sujatha
> > >
> > >
> > > On Tue, Jul 31, 2012 at 9:15 AM, Sujatha Arun 
> > wrote:
> > >
> > >> thanks ,was looking to the rar file for instructions on set up .
> > >>
> > >> Regards
> > >> Sujatha
> > >>
> > >>
> > >> On Tue, Jul 31, 2012 at 1:07 AM, Peter Sturge  > >wrote:
> > >>
> > >>> I can access the rar fine with WinRAR, so should be ok, but yes, it
> > might
> > >>> be in zip format.
> > >>> In any case, better to use the slightly later version -->
> > >>> SolrACLSecurity.java
> > >>> 26kb 12 Apr 2010 10:35
> > >>>
> > >>> Thanks,
> > >>> Peter
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Jul 30, 2012 at 7:50 PM, Sujatha Arun 
> > >>> wrote:
> > >>>
> > >>> > I am uable to use the rar file from the site
> > >>> > https://issues.apache.org/jira/browse/SOLR-1872.
> > >>> >
> > >>> > When I try to open it,I get the message 'SolrACLSecurity.rar is not
> > RAR
> > >>> > archive.
> > >>> >
> > >>> > Is the file there at this link?
> > >>> >
> > >>> > Regards
> > >>> > Sujatha
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>


Re: solr 1872

2012-07-30 Thread Sujatha Arun
Peter,

In a multicore environment, where should the acl file reside? Under the
conf directory? And can I use an acl file per core?

Regards
Sujatha

On Tue, Jul 31, 2012 at 9:15 AM, Sujatha Arun  wrote:

> Renamed it to zip and it worked fine, thanks.
>
> Regards
> Sujatha
>
>
> On Tue, Jul 31, 2012 at 9:15 AM, Sujatha Arun  wrote:
>
>> thanks ,was looking to the rar file for instructions on set up .
>>
>> Regards
>> Sujatha
>>
>>
>> On Tue, Jul 31, 2012 at 1:07 AM, Peter Sturge wrote:
>>
>>> I can access the rar fine with WinRAR, so should be ok, but yes, it might
>>> be in zip format.
>>> In any case, better to use the slightly later version -->
>>> SolrACLSecurity.java
>>> 26kb 12 Apr 2010 10:35
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>>>
>>> On Mon, Jul 30, 2012 at 7:50 PM, Sujatha Arun 
>>> wrote:
>>>
>>> > I am uable to use the rar file from the site
>>> > https://issues.apache.org/jira/browse/SOLR-1872.
>>> >
>>> > When I try to open it,I get the message 'SolrACLSecurity.rar is not RAR
>>> > archive.
>>> >
>>> > Is the file there at this link?
>>> >
>>> > Regards
>>> > Sujatha
>>> >
>>>
>>
>>
>


Re: solr 1872

2012-07-30 Thread Sujatha Arun
Renamed it to zip and it worked fine, thanks.

Regards
Sujatha

On Tue, Jul 31, 2012 at 9:15 AM, Sujatha Arun  wrote:

> Thanks, I was looking in the rar file for the setup instructions.
>
> Regards
> Sujatha
>
>
> On Tue, Jul 31, 2012 at 1:07 AM, Peter Sturge wrote:
>
>> I can access the rar fine with WinRAR, so should be ok, but yes, it might
>> be in zip format.
>> In any case, better to use the slightly later version -->
>> SolrACLSecurity.java
>> 26kb 12 Apr 2010 10:35
>>
>> Thanks,
>> Peter
>>
>>
>>
>> On Mon, Jul 30, 2012 at 7:50 PM, Sujatha Arun 
>> wrote:
>>
>> > I am uable to use the rar file from the site
>> > https://issues.apache.org/jira/browse/SOLR-1872.
>> >
>> > When I try to open it,I get the message 'SolrACLSecurity.rar is not RAR
>> > archive.
>> >
>> > Is the file there at this link?
>> >
>> > Regards
>> > Sujatha
>> >
>>
>
>


Re: solr 1872

2012-07-30 Thread Sujatha Arun
Thanks, I was looking in the rar file for the setup instructions.

Regards
Sujatha

On Tue, Jul 31, 2012 at 1:07 AM, Peter Sturge wrote:

> I can access the rar fine with WinRAR, so should be ok, but yes, it might
> be in zip format.
> In any case, better to use the slightly later version -->
> SolrACLSecurity.java
> 26kb 12 Apr 2010 10:35
>
> Thanks,
> Peter
>
>
>
> On Mon, Jul 30, 2012 at 7:50 PM, Sujatha Arun  wrote:
>
> > I am uable to use the rar file from the site
> > https://issues.apache.org/jira/browse/SOLR-1872.
> >
> > When I try to open it,I get the message 'SolrACLSecurity.rar is not RAR
> > archive.
> >
> > Is the file there at this link?
> >
> > Regards
> > Sujatha
> >
>


solr 1872

2012-07-30 Thread Sujatha Arun
I am unable to use the rar file from the site
https://issues.apache.org/jira/browse/SOLR-1872.

When I try to open it, I get the message 'SolrACLSecurity.rar is not RAR
archive'.

Is the file there at this link?

Regards
Sujatha


OS memory Pollution on rsync?

2012-06-03 Thread Sujatha Arun
Hello,

When we migrated the Solr webapps to a new server in production, we did an
rsync of all the indexes from our old server. On checking with JConsole,
the free OS RAM was already filled to twice the index size. There is no
other service on this server. The rsync was done twice.

Even though we have more RAM in the new server, some of the simple queries
are taking more than 2s to execute. If the index was cached, why is it
taking so long to execute?
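(For what it is worth, one workaround I am considering, not yet verified,
is to pre-warm the OS page cache after such a copy by reading the index
files through once before serving traffic, e.g.

cat /path/to/index/* > /dev/null

with the path adjusted to the actual index directory.)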


Any ideas?

Regards
Sujatha


Re: solr 1.3 Multicores and maxboolean clause

2012-05-30 Thread Sujatha Arun
Thanks, Jack.

In that case the template cores would be the ones initialized first, and we
need to take care of this in the template configs.

Also, I noticed that when we remove core0 and core1, create a new webapp
with no cores and an empty solr.xml, and then try to create a new core, we
get an error and the core is not created.

Regards
Sujatha

On Thu, May 31, 2012 at 12:40 AM, Jack Krupansky wrote:

> As per the source code, Solr only sets the BooleanQuery clause limit on
> the very first core load. It ignores the setting on subsequent core
> loads, including a reload of the initial core.
>
> SolrCore.java: "// only change the BooleanQuery maxClauseCount once for
> ALL cores..."
>
> The cores should get loaded in the order they appear in solr.xml, although
> I don't know if that is a written, contractual guarantee.
>
> As the CoreAdmin wiki page says, "Workaround, set maxBooleanClauses to the
> greatest value desired in *all* cores".
>
> See:
> http://wiki.apache.org/solr/CoreAdmin#Known_Issues
>
> The wiki is wrong when it says "Whichever Solr core initializes last will
> win the setting of the solrconfig.xml's maxBooleanClauses value." The first
> core to be loaded wins. Or, maybe the source code is wrong. Either way, a
> correction is needed.
>
> -- Jack Krupansky
>
> -Original Message- From: Sujatha Arun
> Sent: Wednesday, May 30, 2012 1:30 PM
> To: solr-user@lucene.apache.org
> Subject: solr 1.3 Multicores and maxboolean clause
>
>
> Hello,
>
> The solrcore wiki says that "Lucene's BooleanQuery maxClauseCount
> is a static variable, making it a single value across the
> entire JVM. Whichever Solr core initializes last will win the setting of
> the solrconfig.xml's maxBooleanClauses value. Workaround, set
> maxBooleanClauses to the greatest value desired in *all* cores."
>
> Now what I see is that if any one core has a smaller value for
> maxBooleanClauses, the smaller one takes effect, and not the last
> core which is created.
>
> *Some questions*
>
>
>  1. What is the order of initialization of the cores on a server
>  restart? I don't see this info in the logs.
>  2. When I change maxBooleanClauses in one core and reload the core, it
>  does not take effect. Does this require a Tomcat restart? Why?
>  3. The default cores core0 and core1 that come in the example multicore
>  setup do not have this value set, as they have a minimal configuration.
>  Does this affect the value in the other cores if I use them as the
>  default?
>
> Regards,
> Sujatha
>


solr 1.3 Multicores and maxboolean clause

2012-05-30 Thread Sujatha Arun
Hello,

The solrcore wiki says that "Lucene's BooleanQuery maxClauseCount is a
static variable, making it a single value across the entire JVM. Whichever
Solr core initializes last will win the setting of the solrconfig.xml's
maxBooleanClauses value. Workaround, set maxBooleanClauses to the greatest
value desired in *all* cores."
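
For reference, the element in question lives in each core's solrconfig.xml,
inside the <query> section; a sketch with an illustrative value:

<maxBooleanClauses>4096</maxBooleanClauses>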

Now what I see is that if any one core has a smaller value for
maxBooleanClauses, the smaller one takes effect, and not the last core
which is created.

*Some questions*


   1. What is the order of initialization of the cores on a server
   restart? I don't see this info in the logs.
   2. When I change maxBooleanClauses in one core and reload the core, it
   does not take effect. Does this require a Tomcat restart? Why?
   3. The default cores core0 and core1 that come in the example multicore
   setup do not have this value set, as they have a minimal configuration.
   Does this affect the value in the other cores if I use them as the
   default?

Regards,
Sujatha


Re: Multicore Issue - Server Restart

2012-05-30 Thread Sujatha Arun
solr 1.3

Regards
Sujatha

On Wed, May 30, 2012 at 8:26 PM, Siva Kommuri  wrote:

> Hi Sujatha,
>
> Which version of Solr are you using?
>
> Best Wishes,
> Siva
>
> On Wed, May 30, 2012 at 12:22 AM, Sujatha Arun 
> wrote:
>
> > Yes ,that is correct.
> >
> > Regards
> > Sujatha
> >
> > On Tue, May 29, 2012 at 7:23 PM, lboutros  wrote:
> >
> > > Hi Sujatha,
> > >
> > > each webapps has its own solr home ?
> > >
> > > Ludovic.
> > >
> > > -
> > > Jouve
> > > France.
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Multicore-Issue-Server-Restart-tp3986516p3986602.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>


Re: Multicore Issue - Server Restart

2012-05-30 Thread Sujatha Arun
Yes, that is correct.

Regards
Sujatha

On Tue, May 29, 2012 at 7:23 PM, lboutros  wrote:

> Hi Sujatha,
>
> each webapps has its own solr home ?
>
> Ludovic.
>
> -
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multicore-Issue-Server-Restart-tp3986516p3986602.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Multicore Issue - Server Restart

2012-05-28 Thread Sujatha Arun
Hello,

We have one multicore webapp for every 50 cores: currently 3 multicore
webapps, with 150 cores distributed across the 3 webapps.

When we restarted the server [Tomcat], we noticed that the solr.xml was
wiped out: we could not see any cores in webapp1 and webapp3, and only a
few cores in webapp2.

The solr.xml has persistent=true. Given this, what could have happened?
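
For reference, the shape of the file as we keep it (core names
illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>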

The solution was to add all the cores manually to solr.xml and restart the
server, but I am unsure what would have caused this, and there is no
indication of any issue in the logs either.

Regards
Sujatha


Re: Dynamic creation of cores for this use case.

2012-05-09 Thread Sujatha Arun
You will have to create the directory and configs yourself, and you need to
call the command only after you create the directory and give it
permissions; the URL below only creates the data folder and makes an entry
in solr.xml.

Refer: http://blog.dustinrue.com/archives/690
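
A rough sequence (paths illustrative):

mkdir -p /path/to/solr/coreX/conf
cp /path/to/solr/core0/conf/*.xml /path/to/solr/coreX/conf/

and only then:

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=/path/to/solr/coreX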

Regards
Sujatha

On Wed, May 9, 2012 at 12:02 PM, pprabhcisco123  wrote:

> Hi,
>
> I tried to create core by simply hitting the below url
>
>
> http://localhost:8983/solr/admin/cores?action=CREATE&name=core3&instanceDir=C://solr&config=solrconfig.xml&schema=schema.xml&dataDir=C://solr/data
>
> It made an entry in the solr.xml file, but the core directory was not
> created.
>
> Please let me know what might be the issue ?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Dynamic-creation-of-cores-for-this-use-case-tp3937696p3973300.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Webapps and JVM code cache

2012-05-08 Thread Sujatha Arun
Yes, 47 MB. Does CMS permgen sweeping take care of code cache cleanup?

Thanks Michael and Otis

Regards
Sujatha

On Wed, May 9, 2012 at 2:27 AM, Otis Gospodnetic  wrote:

> Hi,
>
> Did you really mean 47 *MB*?
> Yes, if any limits are reached and GC cannot reclaim enough space, you
> will get OOM.
> You can use a couple of JVM params to catch this and dump heap to a file
> if you want to analyze it and see what was using memory and how much.
>
> Otis
> 
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>
>
> >
> > From: Sujatha Arun 
> >To: solr-user@lucene.apache.org
> >Sent: Tuesday, May 8, 2012 7:24 AM
> >Subject: Solr Webapps and JVM code cache
> >
> >Hello ,
> >
> >I see that the code cache in the JVM is nearing its memory limits  47mb
> >/assigned in 50 MB .On deploying more  solr webapps to the server,Will i
> >get any out of memory exceptions ? and will JVM freeze?
> >
> >How should this be handled?
> >
> >Regards
> >Sujatha
> >
> >
> >
>


Re: Solr Webapps and JVM code cache

2012-05-08 Thread Sujatha Arun
For permgen space we have given 2GB, and the currently used permgen space
is 650MB. However, the code cache is 49MB by default, of which 47+ MB has
been used. I would like to know what happens when we deploy more webapps to
the container.


   - Does unused memory get removed from the code cache to free up space?
   - Does the JVM throw an OOM when the code cache memory limit is
   reached?
   - Does the code cache get GC'ed by CMS permgen sweeping? When this
   happens, what should we expect?
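
For reference, the HotSpot flag that sizes this region, with an
illustrative value (an assumption on my side, we have not set it yet):

-XX:ReservedCodeCacheSize=128m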


Regards
Sujatha



On Tue, May 8, 2012 at 7:44 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Hi Sujatha,
>
> You will likely have to increase the JVM permgen space for your
> container when you launch it. This is normal. How you do this depends on
> the container you're using and how you launch it.
>
> Michael
>
> On Tue, 2012-05-08 at 16:54 +0530, Sujatha Arun wrote:
> > Hello ,
> >
> > I see that the code cache in the JVM is nearing its memory limits  47mb
> > /assigned in 50 MB .On deploying more  solr webapps to the server,Will i
> > get any out of memory exceptions ? and will JVM freeze?
> >
> > How should this be handled?
> >
> > Regards
> > Sujatha
>
>
>


Solr Webapps and JVM code cache

2012-05-08 Thread Sujatha Arun
Hello ,

I see that the code cache in the JVM is nearing its memory limit: 47MB used
of the assigned 50MB. On deploying more Solr webapps to the server, will I
get any out-of-memory exceptions? And will the JVM freeze?

How should this be handled?

Regards
Sujatha


Re: Scaling Solr - Suggestions !!

2012-04-30 Thread Sujatha Arun
I was copying the indexes from webapps to cores when this happened. It
could have been an error on my end, but I am just worried that an issue
with one core would reflect on the whole webapp.

Regards
Sujatha

On Mon, Apr 30, 2012 at 7:20 PM, Erick Erickson wrote:

> I'd get to the root of why indexes are corrupt! This should
> be very unusual. If you're seeing this at all frequently,
> it indicates something is very wrong and starting bunches
> of JVMs up is a band-aid over a much more serious
> problem.
>
> Are you, by chance, doing a kill -9? or other hard-abort?
>
> Best
> Erick
>
> On Mon, Apr 30, 2012 at 12:22 AM, Sujatha Arun 
> wrote:
> > Now the reason ,I have used different webapps instead of a single one for
> > the cores is ,while prototyping ,I discovered that ,when one of the cores
> > index is corrupt ,the entire webapp does not start up and the same must
> be
> > true of  "too many open files" etc ,that is to say if there is an issue
> > withe any one core [Schema /index] ,the entire webapp does not start up.
> >
> > Thanks for your suggestion.
> >
> >
> > Regards
> > Sujatha
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Apr 28, 2012 at 6:49 PM, Michael Della Bitta <
> > michael.della.bi...@appinions.com> wrote:
> >
> >> Just my opinion, but I'm not sure I see the value in deploying the cores
> >> to different webapps in a single container on a single machine to avoid
> >> a single point of failure... You still have a single point of failure at
> >> the process level down to the hardware, which when you think about it,
> >> is mostly everything. But perhaps you're at least using more than one
> >> container.
> >>
> >> It sounds to me that the easiest route to scalability for you would be
> >> to add more machines. Unless your cores are particularly complex or your
> >> traffic is heavy, a 3GB core should be no match for a single machine.
> >> And the traffic problem can be solved by replication and load balancing.
> >>
> >> Michael
> >>
> >> On Sat, 2012-04-28 at 13:24 +0530, Sujatha Arun wrote:
> >> > Hello,
> >> >
> >> > *Background* :For each of our  customers, we create 3 solr webapps
> with
> >> > different search  schema's,serving different search requirements and
> we
> >> > have about 70 customers.So we have about 210 webapps curently .
> >> >
> >> > *Hardware*: Single Server , one JVM , Heap memory 19GB ,Total Ram
> :32GB ,
> >> > Permgen initally 1GB ,now increased to 2GB.
> >> >
> >> > *Solr Indexes* : Most are the order of a few MB ,about 2  big index of
> >> > about 3GB  each
> >> >
> >> > *Scaling Step 1 *:  We saw the permgen value go upto to nearly 850 mb
> >> ,when
> >> > we created so  many webapps ,hence now we are moving to solr cores
> and we
> >> > are going to have about 50 cores per webapp ,bringing the number of
> >> webapps
> >> > to about 5 . We want to distribute the cores with multiple webapps to
> >> avoid
> >> > a single point of failure.
> >> >
> >> >
> >> > *Requirement* :
> >> >
> >> >
> >> >-   We need to only scale the cores horizontally ,whose index sizes
> >> are
> >> >big.
> >> >-   We also require permission based search for each webapp ,would
> >> solr
> >> >NRT fit our needs ,where we can index the permission into the
> document
> >> >,which would mean   there would be frequent addition and deletion
> of
> >> >permissions to the documents across cores.
> >> >-   We also require  automatic fail over
> >> >
> >> > What technology would be ideal fit given Solr Cloud ,Katta , Solandra
> >> > ,Lily,Elastic Search etc [Preferably Open source] [ We would be
> required
> >> to
> >> > maintain many webapps with multicores ] and what about the commercial
> >> > offering given out use case
> >> >
> >> > Thanks.
> >> >
> >> > Regards,
> >> > Sujatha
> >>
> >>
> >>
>


Re: Scaling Solr - Suggestions !!

2012-04-29 Thread Sujatha Arun
Now, the reason I have used different webapps instead of a single one for
the cores is that, while prototyping, I discovered that when one core's
index is corrupt, the entire webapp does not start up, and the same must be
true of "too many open files" etc. That is to say, if there is an issue
with any one core [schema/index], the entire webapp does not start up.

Thanks for your suggestion.


Regards
Sujatha







On Sat, Apr 28, 2012 at 6:49 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Just my opinion, but I'm not sure I see the value in deploying the cores
> to different webapps in a single container on a single machine to avoid
> a single point of failure... You still have a single point of failure at
> the process level down to the hardware, which when you think about it,
> is mostly everything. But perhaps you're at least using more than one
> container.
>
> It sounds to me that the easiest route to scalability for you would be
> to add more machines. Unless your cores are particularly complex or your
> traffic is heavy, a 3GB core should be no match for a single machine.
> And the traffic problem can be solved by replication and load balancing.
>
> Michael
>
> On Sat, 2012-04-28 at 13:24 +0530, Sujatha Arun wrote:
> > Hello,
> >
> > *Background* :For each of our  customers, we create 3 solr webapps with
> > different search  schema's,serving different search requirements and we
> > have about 70 customers.So we have about 210 webapps curently .
> >
> > *Hardware*: Single Server , one JVM , Heap memory 19GB ,Total Ram :32GB ,
> > Permgen initally 1GB ,now increased to 2GB.
> >
> > *Solr Indexes* : Most are the order of a few MB ,about 2  big index of
> > about 3GB  each
> >
> > *Scaling Step 1 *:  We saw the permgen value go upto to nearly 850 mb
> ,when
> > we created so  many webapps ,hence now we are moving to solr cores and we
> > are going to have about 50 cores per webapp ,bringing the number of
> webapps
> > to about 5 . We want to distribute the cores with multiple webapps to
> avoid
> > a single point of failure.
> >
> >
> > *Requirement* :
> >
> >
> >-   We need to only scale the cores horizontally ,whose index sizes
> are
> >big.
> >-   We also require permission based search for each webapp ,would
> solr
> >NRT fit our needs ,where we can index the permission into the document
> >,which would mean   there would be frequent addition and deletion of
> >permissions to the documents across cores.
> >-   We also require  automatic fail over
> >
> > What technology would be ideal fit given Solr Cloud ,Katta , Solandra
> > ,Lily,Elastic Search etc [Preferably Open source] [ We would be required
> to
> > maintain many webapps with multicores ] and what about the commercial
> > offering given out use case
> >
> > Thanks.
> >
> > Regards,
> > Sujatha
>
>
>


Scaling Solr - Suggestions !!

2012-04-28 Thread Sujatha Arun
Hello,

*Background*: For each of our customers, we create 3 Solr webapps with
different search schemas, serving different search requirements, and we
have about 70 customers. So we currently have about 210 webapps.

*Hardware*: Single server, one JVM, heap memory 19GB, total RAM 32GB,
permgen initially 1GB, now increased to 2GB.

*Solr Indexes*: Most are on the order of a few MB, with about 2 big indexes
of about 3GB each.

*Scaling Step 1*: We saw the permgen value go up to nearly 850MB when we
created so many webapps, hence we are now moving to Solr cores. We are
going to have about 50 cores per webapp, bringing the number of webapps
down to about 5. We want to distribute the cores across multiple webapps to
avoid a single point of failure.


*Requirement* :


   -   We only need to scale horizontally the cores whose index sizes are
   big.
   -   We also require permission-based search for each webapp. Would Solr
   NRT fit our needs, where we index the permissions into the document?
   That would mean frequent addition and deletion of permissions on the
   documents across cores (see the example query after this list).
   -   We also require automatic failover.
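
As an example of what the permission-based search could look like at query
time (field name and values are made up), each document would carry a
multivalued permissions field and every request would be filtered on it:

q=laptop&fq=perms:(u123 OR g45)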

Which technology would be an ideal fit, given SolrCloud, Katta, Solandra,
Lily, ElasticSearch, etc. [preferably open source]? We would be required to
maintain many webapps with multiple cores. And what about the commercial
offerings, given our use case?

Thanks.

Regards,
Sujatha


Range Queries -sfloat

2012-03-28 Thread Sujatha Arun
Hello,

I am having an issue with a range query in Solr 1.3.

The query price:[1 TO 20] is returning values out of this range, like 23.00
and 55.00. The field type of the price field is sfloat.
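
(For reference, sfloat in the 1.3 example schema is the sortable float
type, defined as:

<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>

SortableFloatField keeps values in an internal sortable string encoding,
which would explain the unreadable characters in the parsed query below.)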

When I check this from the admin debug query, I am seeing junk instead of
the price.

example:

price:[ 1 TO 20 ]
price:[ 1 TO 20 ]
price:[1.0 TO 20.0]
price:[¿ࠀ#0; TO Á਀#0;]

Why is this happening?

Regards
Sujatha


Re: Solr cores issue

2012-03-28 Thread Sujatha Arun
Isn't it administratively easier with multiple cores instead of multiple
webapps?

Regards
Sujatha

On Tue, Mar 27, 2012 at 6:24 PM, Erick Erickson wrote:

> It might be administratively easier to have multiple webapps, but
> it shouldn't really matter as far as I know...
>
> Best
> Erick
>
> On Tue, Mar 27, 2012 at 12:22 AM, Sujatha Arun 
> wrote:
> > yes ,I must have mis-copied and yes, i do have the conf folder per core
> > with schema etc ...
> >
> > Because of this issue ,we have decided to have multiple webapps with
> about
> > 50 cores per webapp  ,instead of one singe webapp with all 200 cores
> ,would
> > this make better sense ?
> >
> > what would be your suggestion?
> >
> > Regards
> > Sujatha
> >
> > On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> >
> >> Shouldn't be. What do your log files say? You have to treat each
> >> core as a separate index. In other words, you need to have a core#/conf
> >> with the schema matching your core#/data/index directory etc.
> >>
> >> I suspect you've simply mis-copied something.
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun 
> wrote:
> >> > I was migrating to cores from webapp ,and I was copying a bunch of
> >> indexes
> >> > from webapps to respective cores ,when I restarted ,I had this issue
> >> where
> >> > the whole webapp with the cores would not startup and was getting
> index
> >> > corrupted message..
> >> >
> >> > In this scenario or in a scenario where there is an issue with schema
> >> > /config file for one core ,will the whole webapp with the cores not
> >> restart?
> >> >
> >> > Regards
> >> > Sujatha
> >> >
> >> > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >wrote:
> >> >
> >> >> Index corruption is very rare, can you provide more details how you
> >> >> got into that state?
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun 
> >> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > Suppose  I have several cores in a single webapp ,I have issue with
> >> Index
> >> >> > beong corrupted in one core  ,or schema /solrconfig of one core is
> not
> >> >> well
> >> >> > formed ,then entire webapp refused to load on server restart?
> >> >> >
> >> >> > Why does this happen?
> >> >> >
> >> >> > Regards
> >> >> > Sujatha
> >> >>
> >>
>


Re: Solr cores issue

2012-03-26 Thread Sujatha Arun
Yes, I must have mis-copied, and yes, I do have the conf folder per core
with schema etc.

Because of this issue, we have decided to have multiple webapps with about
50 cores per webapp, instead of one single webapp with all 200 cores.
Would this make better sense?

What would be your suggestion?

Regards
Sujatha

On Tue, Mar 27, 2012 at 12:07 AM, Erick Erickson wrote:

> Shouldn't be. What do your log files say? You have to treat each
> core as a separate index. In other words, you need to have a core#/conf
> with the schema matching your core#/data/index directory etc.
>
> I suspect you've simply mis-copied something.
>
> Best
> Erick
>
> On Mon, Mar 26, 2012 at 8:27 AM, Sujatha Arun  wrote:
> > I was migrating to cores from webapp ,and I was copying a bunch of
> indexes
> > from webapps to respective cores ,when I restarted ,I had this issue
> where
> > the whole webapp with the cores would not startup and was getting index
> > corrupted message..
> >
> > In this scenario or in a scenario where there is an issue with schema
> > /config file for one core ,will the whole webapp with the cores not
> restart?
> >
> > Regards
> > Sujatha
> >
> > On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson  >wrote:
> >
> >> Index corruption is very rare, can you provide more details how you
> >> got into that state?
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun 
> wrote:
> >> > Hello,
> >> >
> >> > Suppose  I have several cores in a single webapp ,I have issue with
> Index
> >> > beong corrupted in one core  ,or schema /solrconfig of one core is not
> >> well
> >> > formed ,then entire webapp refused to load on server restart?
> >> >
> >> > Why does this happen?
> >> >
> >> > Regards
> >> > Sujatha
> >>
>


Re: Solr cores issue

2012-03-26 Thread Sujatha Arun
I was migrating from webapps to cores, and I was copying a bunch of indexes
from webapps to their respective cores. When I restarted, I had this issue
where the whole webapp with the cores would not start up, and I was getting
an index-corrupted message.

In this scenario, or in a scenario where there is an issue with the
schema/config file of one core, will the whole webapp with all its cores
fail to restart?

Regards
Sujatha

On Mon, Mar 26, 2012 at 4:43 PM, Erick Erickson wrote:

> Index corruption is very rare, can you provide more details how you
> got into that state?
>
> Best
> Erick
>
> On Sun, Mar 25, 2012 at 1:22 PM, Sujatha Arun  wrote:
> > Hello,
> >
> > Suppose  I have several cores in a single webapp ,I have issue with Index
> > beong corrupted in one core  ,or schema /solrconfig of one core is not
> well
> > formed ,then entire webapp refused to load on server restart?
> >
> > Why does this happen?
> >
> > Regards
> > Sujatha
>


Solr cores issue

2012-03-25 Thread Sujatha Arun
Hello,

Suppose I have several cores in a single webapp, and the index of one core
is corrupted, or the schema/solrconfig of one core is not well formed; the
entire webapp then refuses to load on server restart?

Why does this happen?

Regards
Sujatha


Multicore -Create new Core request errors

2012-03-09 Thread Sujatha Arun
Hello,

When I issue this request to create a new Solr core, I get the error
message: HTTP Status 500 - Can't find resource 'solrconfig.xml' in
classpath or '/home/searchuser/searchinstances/multi_core_prototype/solr/conf/'

http://
/multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/home/searchuser/searchinstances/multi_core_prototype/solr/coreX

I believe that the schema and solrconfig parameters are optional.

I have the default cores, core0 and core1, in Solr 1.3. What should the
path of solrconfig be? Should it refer to the path of the schema in an
existing core, and can I expect to see a conf folder in the new core?
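
(For reference, my understanding is that when config and schema are
omitted, they default to instanceDir/conf/solrconfig.xml and
instanceDir/conf/schema.xml, and the conf folder itself is not created for
you. So a minimal request, with an illustrative host, would be

http://host:8080/multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/home/searchuser/searchinstances/multi_core_prototype/solr/coreX

after creating that coreX/conf directory by hand.)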

Regards
Sujatha


Moving from Multiple webapps to Multi Cores -Solr 1.3

2012-03-08 Thread Sujatha Arun
Hello All,

While prototyping the move from multiple Solr webapps to Solr multicore
[version 1.3 for both], I am running into the following issues and
questions.


1) We are primarily moving to multicore because we saw the permgen memory
increase each time we created a new Solr webapp. The assumption is that by
moving to multicore and sharing the same war file, we will not increase the
permgen memory when we create a new core. However, I do see about a 190KB
increase when a new core is created, as opposed to about 13MB per new
webapp. Does the permgen memory get consumed/increased per core creation,
with some benefit over webapp creation?


2) We have schemas for multiple languages, and I wanted to create a webapp
per language and create cores for each client with the same language
requirement, with a shared schema. Would that be affected if we want to add
some dynamic fields to some cores [of course the indexes are separate]?
Does this approach make sense, or can we just create n cores in a single
webapp with different schemas?


3) In terms of query time, when I query a particular core of a webapp,
should I expect the QTime to come down or remain the same?


4) On using the create command:
multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDir=/searchinstances/multi_core_prototype/solr/coreX&config=/searchinstances/multi_core_prototype/solr/coreX&schema=/searchinstances/multi_core_prototype/solr/core0/conf/schema.xml&dataDir/searchinstances/multi_core_prototype/solr/coreX/data

My directory structure is:

tomcat5.5
  searchinstances
    multi_core_prototype
      solr.war
      solr
        solr.xml
        core0
          conf
          data
        core1
          conf
          data

With the above command, the instance dir coreX is created under solr, and a
data directory is created under coreX; however, I don't see a conf
directory with a schema and solrconfig under coreX. I am assuming that with
the above command it copies them from the existing core0 conf folder.

Let me know if I am missing anything here.

Thanks,
 Sujatha


Re: Permgen Space - GC

2012-01-29 Thread Sujatha Arun
Thanks Lance.

I am still not clear on when the classes get unloaded with a 1.6 JVM and
the above start-up options:

Do the classes get unloaded on JVM restart? Or
do the classes get unloaded by the permgen GC?
Does the permgen GC work for the 1.6 version?
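
For reference, the start-up fragment combining both collector options
discussed below would look like this (a sketch; the rest of our options
unchanged):

-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled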

Regards
Sujatha

On Sun, Jan 29, 2012 at 6:08 AM, Lance Norskog  wrote:

> Correct. Each war file instance uses its own classloader, and in this
> case pulling in Solr and all of the dependent jars uses that much
> memory. This also occurs when you deploy/undeploy/redeploy the same
> war file. Doing that over and over fills up PermGen. Accd. to this,
> you should use both this and ClassUnload:
>
>
> http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled
>
> On Fri, Jan 27, 2012 at 10:59 PM, Sujatha Arun 
> wrote:
> > When loading multiple Solr instances in a JVM, we see the permgen space
> > going up by about 13MB per instance, but when we remove the instances
> > that are no longer needed, we do not see the memory being released.
> > These are our current JVM startup options:
> >
> > -Xms20g
> > -Xmx20g
> > -XX:NewSize=128m
> > -XX:MaxNewSize=128m
> > -XX:MaxPermSize=1024m
> > -XX:+UseConcMarkSweepGC
> > -XX:+CMSClassUnloadingEnabled
> > -XX:+UseTLAB
> > -XX:+UseParNewGC
> > -XX:MaxTenuringThreshold=0
> > -XX:SurvivorRatio=128
> >
> > Will enabling permgen GC help here (XX:+CMSPermGenSweepingEnabled)? Will
> > the classes not be unloaded unless we do a server restart?
> >
> > Regards
> > Sujatha
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Solr Cores

2012-01-24 Thread Sujatha Arun
Thanks Erick.

Regards
Sujatha

On Mon, Jan 23, 2012 at 11:16 PM, Erick Erickson wrote:

> You can have a large number of cores, some people have multiple
> hundreds. Having multiple cores is preferred over having
> multiple JVMs since it's more efficient at sharing system
> resources. If you're running a 32 bit JVM, you are limited in
> the amount of memory you can let the JVM use, so that's a
> consideration, but otherwise use multiple cores in one JVM
> and give that JVM say, half of the physical memory on the
> machine and tune from there.
>
> Best
> Erick
>
> On Sun, Jan 22, 2012 at 8:16 PM, Sujatha Arun  wrote:
> > Hello,
> >
> > We have in production a number of individual solr Instnaces on a single
> > JVM.As a result ,we see that the permgenSpace keeps increasing with each
> > additional instance added.
> >
> > I would Like to know ,if we can have solr cores , instead of individual
> > instances.
> >
> >
> >   - Is there any limit to the number of cores ,for a single instance?
> >   - Will this decrease the permgen space as the LIB is shared.?
> >   - Would there be any decrease in performance with number of cores
> added?
> >   - Any thing else that I should know before moving into cores?
> >
> >
> > Any help would be appreciated?
> >
> > Regards
> > Sujatha
>


Solr Cores

2012-01-22 Thread Sujatha Arun
Hello,

We have in production a number of individual Solr instances in a single
JVM. As a result, we see that the permgen space keeps increasing with each
additional instance added.

I would like to know if we can have Solr cores instead of individual
instances.


   - Is there any limit to the number of cores for a single instance?
   - Will this decrease the permgen space, as the lib is shared?
   - Would there be any decrease in performance as the number of cores
   grows?
   - Anything else that I should know before moving to cores?


Any help would be appreciated.

Regards
Sujatha


Re: Solr Cloud Indexing

2012-01-18 Thread Sujatha Arun
Thanks for the input. I conclude that it does not make sense to do it this
way.

Regards
Sujatha

On Wed, Jan 18, 2012 at 6:26 AM, Lance Norskog  wrote:

> Cloud upload bandwidth is free, but download bandwidth costs money. If
> you upload a lot of data but do not query it often, Amazon can make
> sense.  You can also rent much cheaper hardware in other hosting
> services where you pay by the month or even by the year. If you know
> you have a cap on how much resource you will need at once, the cheaper
> sites make more sense.
>
> On Tue, Jan 17, 2012 at 7:36 AM, Erick Erickson 
> wrote:
> > This only really makes sense if you don't have enough in-house resources
> > to do your indexing locally, but it certainly is possible.
> >
> > Amazon's EC2 has been used, but really any hosting service should do.
> >
> > Best
> > Erick
> >
> > On Tue, Jan 17, 2012 at 12:09 AM, Sujatha Arun 
> wrote:
> >> Would it make sense to  Index on the cloud and periodically [2-4 times
> >> /day] replicate the index at  our server for searching .Which service
> to go
> >> with for solr Cloud Indexing ?
> >>
> >> Any good and tried services?
> >>
> >> Regards
> >> Sujatha
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Solr Cloud Indexing

2012-01-16 Thread Sujatha Arun
Would it make sense to index on the cloud and periodically [2-4 times/day]
replicate the index to our own server for searching? Which service should
we go with for cloud Solr indexing?

Any good, tried-and-tested services?

Regards
Sujatha


Acceptable Response Time

2012-01-11 Thread Sujatha Arun
Hello,

I am looking into a trigger point for sharding indexes based on response
time, and would like to define an acceptable response time.

Given a 3GB index, when can I start thinking about sharding? The response
time varies with the query, from 100ms to 600ms. We are running Solr 1.3
with the collapse patch for this instance.

A simple query gets a quick response; however, a complex boolean query with
collapsing takes longer. Should we base the acceptable time on the most
complex queries, the simple ones, or somewhere in between?

Does sharding work with the collapse patch?

Any pointers on defining an acceptable response time, and on at what point
we can start thinking about sharding based on response time?
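
(For reference, the shape a sharded request would take, with illustrative
hosts; each shard is queried and the results are merged:

http://host1:8080/solr/select?q=...&shards=host1:8080/solr,host2:8080/solr

whether the collapse patch participates in that merge is exactly what I am
unsure about.)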


Regards
Sujatha


Re: [Profiling] How to profile/tune Solr server

2011-11-06 Thread Sujatha Arun
Hi,

I am planning to try Sematext monitoring. Is there anything to watch out
for?

Regards
Sujatha



On Fri, Nov 4, 2011 at 9:21 PM,  wrote:

> Hi Spark,
>
> 2009 there was a monitor from lucidimagination:
>
> http://www.lucidimagination.com/about/news/releases/lucid-imagination-releases-performance-monitoring-utility-open-source-apache-lucene
>
> A colleague of mine calls the sematext-monitor "trojan" because "SPM phone
> home":
> "Easy in, easy out - if you try SPM and don't like it, simply stop and
> remove the small client-side piece that sends us your data"
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
> Looks like other people using a "real profiler" like YourKit Java Profiler
> http://forums.yourkit.com/viewtopic.php?f=3&t=3850
>
> There is also an article about Zabbix
>
> http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/
>
> In your case any profiler would do, but if you find out a Profiler with
> solr-specific default-filter let me know.
>
>
>
> Best regrads
>  Karsten
>
> P.S. eMail in context
>
> http://lucene.472066.n3.nabble.com/Profiling-How-to-profile-tune-Solr-server-td3467027.html
>
>  Original-Nachricht 
> > Datum: Mon, 31 Oct 2011 18:35:32 +0800
> > Von: yu shen 
> > An: solr-user@lucene.apache.org
> > Betreff: Re: [Profiling] How to profile/tune Solr server
>
> > No idea so far, try to figure out.
> >
> > Spark
> >
> > 2011/10/31 Jan Høydahl 
> >
> > > Hi,
> > >
> > > There are no official tools other than looking at the built-in stats
> > pages
> > > and perhaps using JConsole or similar JVM monitoring tools. Note that
> > > Solr's JMX capabilities may let you hook your enterprise's existing
> > > monitoring dashboard up with Solr.
> > >
> > > Also check out the new monitoring service from Sematext which will give
> > > you graphs and all. So far it's free evaluation:
> > > http://sematext.com/spm/index.html
> > >
> > > Do you have a clue for why the indexing is slow?
> > >
> > > --
> > > Jan Høydahl, search solution architect
> > > Cominvent AS - www.cominvent.com
> > > Solr Training - www.solrtraining.com
> > >
> > > On 31. okt. 2011, at 04:59, yu shen wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am a solr newbie. I find solr documents easy to access and use,
> > which
> > > is
> > > > really good thing. While my problem is I did not find a solr home
> > grown
> > > > profiling/monitoring tool.
> > > >
> > > > I set up the server as a multi-core server, each core has
> > approximately
> > > 2GB
> > > > index. And I need to update solr and re-generate index in a real time
> > > > manner (In java code, using SolrJ). Sometimes the update operation is
> > > slow.
> > > > And it is expected that in a year, the index size may increase to
> 4GB.
> > > And
> > > > I need to do something to prevent performance downgrade.
> > > >
> > > > Is there any solr official monitoring & profiling tool for this?
> > > >
> > > > Spark
> > >
> > >
>


Re: Optimization /Commit memory

2011-10-27 Thread Sujatha Arun
Thanks, Simon and Jay. That was helpful.

So what we are looking at during optimize is 2 or 3 times the index size in
free disk space to recreate the index.
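
As a worked example based on Simon's breakdown below: optimizing a 3GB
index down to one segment can transiently need about 3GB for the source
segments plus about 3GB for the merged segment, and with the compound file
format roughly another 3GB while the files are packed, i.e. up to about 3
times the index size at peak.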

Regards
Sujatha



On Wed, Oct 26, 2011 at 12:26 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> RAM costs during optimize / merge is generally low. Optimize is
> basically a merge of all segments into one, however there are
> exceptions. Lucene streams existing segments from disk and serializes
> the new segment on the fly. When you optimize or in general when you
> merge segments you need disk space for the "source" segments and the
> "targed" (merged) segment.
>
> If you use CompoundFileSystem (CFS) you need additional space once
> the merge is done and your files are packed into the CFS, which is
> basically the size of the "target" (merged) segment. Once the merge is
> done lucene can free the diskspace unless you have an IndexReader open
> that references those segments (lucene keeps track of these files and
> frees diskspace once possible).
>
> That said, I think you should use optimize very, very rarely. Usually
> if your document collection is rarely changing, optimize is useful and
> reasonable once in a while. If your collection is constantly changing,
> you should rely on the merge policy to balance the number of segments
> for you in the background. Lucene 3.4 has a nice improved
> TieredMergePolicy that does a great job. (previous version are also
> good - just saying)
>
> A commit is basically flushing the segment you have in memory
> (IndexWriter memory) to disk. compression ratio can be up to 30% of
> the ram cost or even more depending on your data. The actual commit
> doesn't need a notable amount of memory.
>
> hope this helps
>
> simon
>
> On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT
>  wrote:
> > I have not spent a lot of time researching it, but one would expect that
> the OS RAM requirement for optimization of an index to be minimal.
> >
> > My understanding is that during optimization an essentially new index is
> built.  Once complete it switches out the indexes and will throw away the
> old one.  (In Windows it may not throw away the old one until the next
> Commit).
> >
> > JRJ
> >
> > -Original Message-
> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
> > Sent: Friday, October 21, 2011 12:10 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Optimization /Commit memory
> >
> > Just one more thing ,when we are talking about Optimization , we
> > are referring to  HD  free space for  replicating the index  (2 or 3
> times
> > the index size  ) .what is role of  RAM (OS) here?
> >
> > Regards
> > Suajtha
> >
> > On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun 
> wrote:
> >
> >> Thanks that helps.
> >>
> >> Regards
> >> Sujatha
> >>
> >>
> >> On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT <
> jay.jae...@dot.wi.gov>wrote:
> >>
> >>> Well, since the OS RAM includes the JVM RAM, that is part of your
> >>> requirement, yes?  Aside from the JVM and normal OS requirements, all
> you
> >>> need OS RAM for is file caching.  Thus, for updates, the OS RAM is not
> a
> >>> major factor.  For searches, you want sufficient OS RAM to cache enough
> of
> >>> the index to get the query performance you need, and to cache queries
> inside
> >>> the JVM if you get a lot of repeat queries (see solrconfig.xml for the
> >>> various caches: we have not played with them much).  So, the amount of
> RAM
> >>> necessary for that is very much dependent upon the size of your index,
> so I
> >>> cannot give you a simple number.
> >>>
> >>> You seem to believe that you have to have sufficient memory to have the
> >>> entire index in memory.  Except where extremely high performance is
> >>> required, I have not found that to be the case.
> >>>
> >>> This is just one of those "your mileage may vary" things.  There is not
> a
> >>> single answer or formula that fits every situation.
> >>>
> >>> JRJ
> >>>
> >>> -Original Message-
> >>> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> >>> Sent: Wednesday, October 19, 2011 11:58 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Optimization /Commit memory
> >>>
> >>> Thanks  Jay ,
> >>>
> >>> I was trying to compute the *OS RAM requirement*  *n

Re: Optimization /Commit memory

2011-10-20 Thread Sujatha Arun
Just one more thing: when we are talking about optimization, we are
referring to HD free space for recreating the index (2 or 3 times the
index size). What is the role of RAM (OS) here?

Regards
Sujatha

On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun  wrote:

> Thanks that helps.
>
> Regards
> Sujatha
>
>
> On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT 
> wrote:
>
>> Well, since the OS RAM includes the JVM RAM, that is part of your
>> requirement, yes?  Aside from the JVM and normal OS requirements, all you
>> need OS RAM for is file caching.  Thus, for updates, the OS RAM is not a
>> major factor.  For searches, you want sufficient OS RAM to cache enough of
>> the index to get the query performance you need, and to cache queries inside
>> the JVM if you get a lot of repeat queries (see solrconfig.xml for the
>> various caches: we have not played with them much).  So, the amount of RAM
>> necessary for that is very much dependent upon the size of your index, so I
>> cannot give you a simple number.
>>
>> You seem to believe that you have to have sufficient memory to have the
>> entire index in memory.  Except where extremely high performance is
>> required, I have not found that to be the case.
>>
>> This is just one of those "your mileage may vary" things.  There is not a
>> single answer or formula that fits every situation.
>>
>> JRJ
>>
>> -Original Message-
>> From: Sujatha Arun [mailto:suja.a...@gmail.com]
>> Sent: Wednesday, October 19, 2011 11:58 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Optimization /Commit memory
>>
>> Thanks  Jay ,
>>
>> I was trying to compute the *OS RAM requirement*  *not JVM RAM* for a 14
>> GB
>> Index [cumulative Index size of all Instances].And I put it thus -
>>
>> Requirement of Operating System RAM for an Index of  14GB is   - Index
>> Size
>> + 3 Times the  maximum Index Size of Individual Instance for Optimize .
>>
>> That is to say ,I have several Instances ,combined Index Size is 14GB
>> .Maximum Individual Index Size is 2.5GB .so My requirement for OS RAM is
>>  14GB +3 * 2.5 GB  ~ = 22GB.
>>
>> Correct?
>>
>> Regards
>> Sujatha
>>
>>
>>
>> On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT > >wrote:
>>
>> > Commit does not particularly spike disk or memory usage, unless you are
>> > adding a very large number of documents between commits.  A commit can
>> cause
>> > a need to merge indexes, which can increase disk space temporarily.  An
>> > optimize is *likely* to merge indexes, which will usually increase disk
>> > space temporarily.
>> >
>> > How much disk space depends very much upon how big your index is in the
>> > first place.  A 2 to 3 times factor of the sum of your peak index file
>> size
>> > seems safe, to me.
>> >
>> > Solr uses only modest amounts of memory for the JVM for this stuff.
>> >
>> > JRJ
>> >
>> > -Original Message-
>> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
>> > Sent: Wednesday, October 19, 2011 4:04 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Optimization /Commit memory
>> >
>> > Do we require  2 or 3 Times OS RAM memory or  Hard Disk Space while
>> > performing Commit or Optimize or Both?
>> >
>> > what is the requirement in terms of  size of RAM and HD for commit and
>> > Optimize
>> >
>> > Regards
>> > Sujatha
>> >
>>
>
>


Re: OS Cache - Solr

2011-10-20 Thread Sujatha Arun
Yes, it's the same; we have a base static schema, and wherever required we
use dynamic fields.

Regards,
Sujatha


On Thu, Oct 20, 2011 at 6:26 PM, Jaeger, Jay - DOT wrote:

> I wonder.  What if, instead of 200 instances, you had one instance, but
> built a uniqueKey up out of whatever you have now plus whatever information
> currently segregates the instances.  Then this would be much more
> manageable.
>
> In other words, what is different about each of the 200 instances?  Is the
> schema for each essentially the same, as I am guessing?
>
> JRJ
>
> -----Original Message-
> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> Sent: Thursday, October 20, 2011 12:21 AM
> To: solr-user@lucene.apache.org
> Cc: Otis Gospodnetic
> Subject: Re: OS Cache - Solr
>
> Yes 200 Individual Solr Instances not solr cores.
>
> We get an avg response time of below 1 sec.
>
> The number of documents is  not many most of the isntances ,some of the
> instnaces have about 5 lac documents on average.
>
> Regards
> Sujahta
>
> On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT  >wrote:
>
> > 200 instances of what?  The Solr application with lucene, etc. per usual?
> >  Solr cores? ???
> >
> > Either way, 200 seems to be very very very many: unusually so.  Why so
> > many?
> >
> > If you have 200 instances of Solr in a 20 GB JVM, that would only be
> 100MB
> > per Solr instance.
> >
> > If you have 200 instances of Solr all accessing the same physical disk,
> the
> > results are not likely to be satisfactory - the disk head will go nuts
> > trying to handle all of the requests.
> >
> > JRJ
> >
> > -Original Message-
> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
> > Sent: Wednesday, October 19, 2011 12:25 AM
> > To: solr-user@lucene.apache.org; Otis Gospodnetic
> > Subject: Re: OS Cache - Solr
> >
> > Thanks ,Otis,
> >
> > This is our Solr Cache  Allocation.We have the same Cache allocation for
> > all
> > our *200+ instances* in the single Server.Is this too high?
> >
> > *Query Result Cache*:LRU Cache(maxSize=16384, initialSize=4096,
> > autowarmCount=1024, )
> >
> > *Document Cache *:LRU Cache(maxSize=16384, initialSize=16384)
> >
> >
> > *Filter Cache* LRU Cache(maxSize=16384, initialSize=4096,
> > autowarmCount=4096, )
> >
> > Regards
> > Sujatha
> >
> > On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com> wrote:
> >
> > > Maybe your Solr Document cache is big and that's consuming a big part
> of
> > > that JVM heap?
> > > If you want to be able to run with a smaller heap, consider making your
> > > caches smaller.
> > >
> > > Otis
> > > 
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > >
> > >
> > > >
> > > >From: Sujatha Arun 
> > > >To: solr-user@lucene.apache.org
> > > >Sent: Tuesday, October 18, 2011 12:53 AM
> > > >Subject: Re: OS Cache - Solr
> > > >
> > > >Hello Jan,
> > > >
> > > >Thanks for your response and  clarification.
> > > >
> > > >We are monitoring the JVM cache utilization and we are currently using
> > > about
> > > >18 GB of the 20 GB assigned to JVM. Out total index size being abt
> 14GB
> > > >
> > > >Regards
> > > >Sujatha
> > > >
> > > >On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl 
> > > wrote:
> > > >
> > > >> Hi Sujatha,
> > > >>
> > > >> Are you sure you need 20Gb for Tomcat? Have you profiled using
> > JConsole
> > > or
> > > >> similar? Try with 15Gb and see how it goes. The reason why this is
> > > >> beneficial is that you WANT your OS to have available memory for
> disk
> > > >> caching. If you have 17Gb free after starting Solr, your OS will be
> > able
> > > to
> > > >> cache all index files in memory and you get very high search
> > > performance.
> > > >> With your current settings, there is only 12Gb free for both caching
> > the
> > > >> index and for your MySql activities.  Chances are that when you
> backup
> > > >> MySql, the cached part of your Solr index gets flushed from disk
> > caches
> > > and
> > > >> need to be re-c

Re: Optimization /Commit memory

2011-10-20 Thread Sujatha Arun
Thanks, that helps.

Regards
Sujatha

On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT wrote:

> Well, since the OS RAM includes the JVM RAM, that is part of your
> requirement, yes?  Aside from the JVM and normal OS requirements, all you
> need OS RAM for is file caching.  Thus, for updates, the OS RAM is not a
> major factor.  For searches, you want sufficient OS RAM to cache enough of
> the index to get the query performance you need, and to cache queries inside
> the JVM if you get a lot of repeat queries (see solrconfig.xml for the
> various caches: we have not played with them much).  So, the amount of RAM
> necessary for that is very much dependent upon the size of your index, so I
> cannot give you a simple number.
>
> You seem to believe that you have to have sufficient memory to have the
> entire index in memory.  Except where extremely high performance is
> required, I have not found that to be the case.
>
> This is just one of those "your mileage may vary" things.  There is not a
> single answer or formula that fits every situation.
>
> JRJ
>
> -Original Message-
> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> Sent: Wednesday, October 19, 2011 11:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Optimization /Commit memory
>
> Thanks  Jay ,
>
> I was trying to compute the *OS RAM requirement*  *not JVM RAM* for a 14 GB
> Index [cumulative Index size of all Instances].And I put it thus -
>
> Requirement of Operating System RAM for an Index of  14GB is   - Index Size
> + 3 Times the  maximum Index Size of Individual Instance for Optimize .
>
> That is to say ,I have several Instances ,combined Index Size is 14GB
> .Maximum Individual Index Size is 2.5GB .so My requirement for OS RAM is
>  14GB +3 * 2.5 GB  ~ = 22GB.
>
> Correct?
>
> Regards
> Sujatha
>
>
>
> On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT  >wrote:
>
> > Commit does not particularly spike disk or memory usage, unless you are
> > adding a very large number of documents between commits.  A commit can
> cause
> > a need to merge indexes, which can increase disk space temporarily.  An
> > optimize is *likely* to merge indexes, which will usually increase disk
> > space temporarily.
> >
> > How much disk space depends very much upon how big your index is in the
> > first place.  A 2 to 3 times factor of the sum of your peak index file
> size
> > seems safe, to me.
> >
> > Solr uses only modest amounts of memory for the JVM for this stuff.
> >
> > JRJ
> >
> > -Original Message-
> > From: Sujatha Arun [mailto:suja.a...@gmail.com]
> > Sent: Wednesday, October 19, 2011 4:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Optimization /Commit memory
> >
> > Do we require  2 or 3 Times OS RAM memory or  Hard Disk Space while
> > performing Commit or Optimize or Both?
> >
> > what is the requirement in terms of  size of RAM and HD for commit and
> > Optimize
> >
> > Regards
> > Sujatha
> >
>


Re: OS Cache - Solr

2011-10-19 Thread Sujatha Arun
Yes, 200 individual Solr instances, not Solr cores.

We get an average response time of below 1 sec.

The number of documents is not high in most of the instances; some of the
instances have about 5 lakh (500,000) documents on average.

Regards
Sujatha

On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT wrote:

> 200 instances of what?  The Solr application with lucene, etc. per usual?
>  Solr cores? ???
>
> Either way, 200 seems to be very very very many: unusually so.  Why so
> many?
>
> If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB
> per Solr instance.
>
> If you have 200 instances of Solr all accessing the same physical disk, the
> results are not likely to be satisfactory - the disk head will go nuts
> trying to handle all of the requests.
>
> JRJ
>
> -Original Message-
> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> Sent: Wednesday, October 19, 2011 12:25 AM
> To: solr-user@lucene.apache.org; Otis Gospodnetic
> Subject: Re: OS Cache - Solr
>
> Thanks ,Otis,
>
> This is our Solr Cache  Allocation.We have the same Cache allocation for
> all
> our *200+ instances* in the single Server.Is this too high?
>
> *Query Result Cache*:LRU Cache(maxSize=16384, initialSize=4096,
> autowarmCount=1024, )
>
> *Document Cache *:LRU Cache(maxSize=16384, initialSize=16384)
>
>
> *Filter Cache* LRU Cache(maxSize=16384, initialSize=4096,
> autowarmCount=4096, )
>
> Regards
> Sujatha
>
> On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
> > Maybe your Solr Document cache is big and that's consuming a big part of
> > that JVM heap?
> > If you want to be able to run with a smaller heap, consider making your
> > caches smaller.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> > >
> > >From: Sujatha Arun 
> > >To: solr-user@lucene.apache.org
> > >Sent: Tuesday, October 18, 2011 12:53 AM
> > >Subject: Re: OS Cache - Solr
> > >
> > >Hello Jan,
> > >
> > >Thanks for your response and  clarification.
> > >
> > >We are monitoring the JVM cache utilization and we are currently using
> > about
> > >18 GB of the 20 GB assigned to JVM. Out total index size being abt 14GB
> > >
> > >Regards
> > >Sujatha
> > >
> > >On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl wrote:
> > >
> > >> Hi Sujatha,
> > >>
> > >> Are you sure you need 20 GB for Tomcat? Have you profiled using JConsole
> > >> or similar? Try with 15 GB and see how it goes. The reason why this is
> > >> beneficial is that you WANT your OS to have available memory for disk
> > >> caching. If you have 17 GB free after starting Solr, your OS will be able
> > >> to cache all index files in memory and you get very high search
> > >> performance. With your current settings, there is only 12 GB free for
> > >> both caching the index and for your MySQL activities. Chances are that
> > >> when you back up MySQL, the cached part of your Solr index gets flushed
> > >> from the disk cache and needs to be re-cached later.
> > >>
> > >> How to interpret memory stats varies between OSes, and seeing 163 MB free
> > >> may simply mean that your OS has used most RAM for various caches and
> > >> paging, but will flush it once an application asks for more memory. Have
> > >> you seen http://wiki.apache.org/solr/SolrPerformanceFactors ?
> > >>
> > >> You should also slim down your index maximally by setting stored=false
> > >> and indexed=false wherever possible. I would also upgrade to a more
> > >> current Solr version.
> > >>
> > >> --
> > >> Jan Høydahl, search solution architect
> > >> Cominvent AS - www.cominvent.com
> > >> Solr Training - www.solrtraining.com
> > >>
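
As a sketch of the slimming Jan describes: per-field stored/indexed flags are
set in schema.xml. The field names below are hypothetical, purely for
illustration:

  <!-- Searched but never displayed: index it, don't store it -->
  <field name="body_text" type="text" indexed="true" stored="false"/>
  <!-- Displayed but never searched: store it, don't index it -->
  <field name="thumbnail_url" type="string" indexed="false" stored="true"/>

Dropping stored data shrinks the index on disk, which in turn lets the OS disk
cache hold more of it in RAM.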
> > >> On 17. okt. 2011, at 19:51, Sujatha Arun wrote:
> > >>
> > >> > Hello
> > >> >
> > >> > I am trying to understand the OS cache utilization of Solr. Our server
> > >> > has several Solr instances on it. The total combined index size of all
> > >> > instances is about 14 GB, and the size of the maximum single index is
> > >> > about 2.5 GB.

Re: Optimization /Commit memory

2011-10-19 Thread Sujatha Arun
Thanks, Jay.

I was trying to compute the *OS RAM requirement*, *not JVM RAM*, for a 14 GB
index [the cumulative index size of all instances]. And I put it thus:

The operating-system RAM requirement for a 14 GB index is the total index size
plus 3 times the maximum index size of an individual instance, to allow for
optimize.

That is to say, I have several instances with a combined index size of 14 GB,
and the maximum individual index size is 2.5 GB, so my requirement for OS RAM
is 14 GB + 3 * 2.5 GB ~= 22 GB.

Correct?

Regards
Sujatha



On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT wrote:

> Commit does not particularly spike disk or memory usage, unless you are
> adding a very large number of documents between commits.  A commit can cause
> a need to merge indexes, which can increase disk space temporarily.  An
> optimize is *likely* to merge indexes, which will usually increase disk
> space temporarily.
>
> How much disk space depends very much upon how big your index is in the
> first place.  A 2 to 3 times factor of the sum of your peak index file size
> seems safe, to me.
>
> Solr uses only modest amounts of memory for the JVM for this stuff.
>
> JRJ
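
For reference, both operations can be issued explicitly through Solr's XML
update message format, POSTed to the /update handler (these are the standard
update commands; the attribute values are just illustrative):

  <!-- commit: make recent adds/deletes visible to searchers -->
  <commit waitSearcher="true"/>
  <!-- optimize: merge segments down; expect a temporary spike in disk usage,
       potentially 2-3 times the size of the index being rewritten -->
  <optimize waitSearcher="false"/>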
>
> -----Original Message-
> From: Sujatha Arun [mailto:suja.a...@gmail.com]
> Sent: Wednesday, October 19, 2011 4:04 AM
> To: solr-user@lucene.apache.org
> Subject: Optimization /Commit memory
>
> Do we require 2 or 3 times the OS RAM or hard disk space while performing a
> commit or an optimize, or both?
>
> What is the requirement in terms of RAM and hard disk size for commit and
> optimize?
>
> Regards
> Sujatha
>


Optimization /Commit memory

2011-10-19 Thread Sujatha Arun
Do we require 2 or 3 times the OS RAM or hard disk space while performing a
commit or an optimize, or both?

What is the requirement in terms of RAM and hard disk size for commit and
optimize?

Regards
Sujatha


  1   2   >