How do I send multiple user version parameter values for a delete-by-id request with multiple IDs?

2020-01-31 Thread Mou
http://solr:port/collection/update?version_field=1234582.0 

works for the payload

{"delete":[{"id":"51"},{"id":"5"}]}

with multiple IDs; the version parameter is applied to both deletes.

Is it possible to send separate version numbers for the IDs in the parameter?
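
For reference, this is roughly how per-ID versions are attached when using
Solr's built-in _version_ field with optimistic concurrency via SolrJ (a
sketch only; the URL, collection and version numbers are made up, and whether
the custom version_field processor accepts per-document values the same way
is exactly what I am asking):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class DeleteWithVersions {
    public static void main(String[] args) throws Exception {
        // Sketch: per-ID versions with the built-in _version_ optimistic
        // concurrency. URL, collection and version numbers are examples only.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/collection").build()) {
            UpdateRequest req = new UpdateRequest();
            req.deleteById("51", 1234582L);   // one version per deleted id
            req.deleteById("5", 1234599L);
            req.process(client);
        }
    }
}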





Re: One complex wildcard query leads Solr to OOM

2016-01-24 Thread Jian Mou
Hi Jack,

Thanks! Do you know how to disable wildcards? What I want is: if the input
contains wildcard characters, just treat them as normal characters. In other
words, I just want to disable wildcard search.
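
On the client side, one way to get that effect is to escape the input before
building the query, so that * and ? are treated as literal characters, e.g.
with SolrJ's ClientUtils (a sketch; the field name is just an example):

import org.apache.solr.client.solrj.util.ClientUtils;

public class LiteralUserInput {
    public static void main(String[] args) {
        String userInput = "a*b?c";
        // Escapes query-syntax characters, so '*' and '?' are searched as
        // plain characters instead of acting as wildcards.
        String literal = ClientUtils.escapeQueryChars(userInput);
        String q = "title:" + literal;   // e.g. title:a\*b\?c
        System.out.println(q);
    }
}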

Thanks,
Jian

On Fri, Jan 22, 2016 at 1:55 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> The Lucene WildcardQuery class does have an additional constructor that has
> a maxDeterminizedStates parameter to limit the size of the FSM generated by
> a wildcard query, and the QueryParserBase class does have a method to set
> that parameter, setMaxDeterminizedStates, but there is no Solr support for
> invoking that method.
>
> It is probably worth a Jira to get such support. Even then, the question is
> how Solr should respond to the exception that gets thrown when that limit
> is reached.
>
> Even if Solr had an option to disable complex wildcards, the question is
> what you want to happen when a complex wildcard is used - should an
> exception be thrown, or... what?
>
> I suppose it might be simplest to have a Solr option to limit the number of
> wildcard characters used in a term, like to 4 or 8 or something like that.
> IOW, have Solr check the term before the WildcardQuery is generated.
>
> -- Jack Krupansky
>
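
For reference, the Lucene-level knob Jack describes looks roughly like this
(a sketch against the Lucene API only; as noted above, Solr does not expose
it, and the field, pattern and cap used here are arbitrary examples):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.WildcardQuery;

public class CappedWildcard {
    public static void main(String[] args) {
        // The extra constructor caps the size of the automaton built for the
        // pattern, so an overly complex pattern triggers an exception instead
        // of consuming unbounded CPU and memory in determinize().
        int maxDeterminizedStates = 10000;           // arbitrary example cap
        Term term = new Term("title", "?-???o*???"); // example pattern
        WildcardQuery query = new WildcardQuery(term, maxDeterminizedStates);
        System.out.println(query);
    }
}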


One complex wildcard query leads Solr to OOM

2016-01-21 Thread Jian Mou
We are using Solr as our search engine, and recently noticed that some
user-input wildcard queries can send Solr into what looks like an infinite
loop in

org.apache.lucene.util.automaton.Operations.determinize()

which also eats memory and eventually causes an OOM.

The wildcard query looks like **?-???o·???è??**.

Although we can validate the input parameter, I also wonder whether there is
any configuration that can disable complex wildcard queries like this, which
lead to severe performance problems.
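
The validation we can do on our side is a simple pre-check of the raw term
before it ever reaches Solr, along these lines (a sketch; the limit is an
arbitrary example):

public class WildcardGuard {
    // Sketch of a client-side pre-check: count '*' and '?' in the raw term
    // and reject (or escape) it when there are too many.
    static boolean tooManyWildcards(String term, int maxWildcards) {
        int count = 0;
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                count++;
            }
        }
        return count > maxWildcards;
    }

    public static void main(String[] args) {
        System.out.println(tooManyWildcards("?-???o*???", 4)); // true
    }
}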


Related stack trace: it was attached as an inline image and is not reproduced
in this archive.



Thanks,

Jian


ERROR CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2014-10-10 Thread Mou
I am using SolrCloud 4.10.0, and I have been seeing this for a while now.
Does anyone have a similar experience, or a clue about what is happening?

auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:607)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.IllegalArgumentException: maxValue must be non-negative (got: -3)
    at org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1141)
    at org.apache.lucene.codecs.lucene41.ForUtil.bitsRequired(ForUtil.java:253)
    at org.apache.lucene.codecs.lucene41.ForUtil.writeBlock(ForUtil.java:174)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsWriter.addPosition(Lucene41PostingsWriter.java:377)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:486)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:80)
    at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:114)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:441)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:510)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:621)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:414)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:292)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:277)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
    ... 11 more





Storing termVectors for PreAnalyzed type field

2014-01-17 Thread Mou
Can anyone please confirm whether this is supported in the current version?

I am trying to use a pre-analyzed field for MoreLikeThis, and when the MLT
query is built it does not pull anything from the index.

I think that even if I set termVectors=true in the PreAnalyzed field
definition, it is being ignored.
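
The MLT request I am building is roughly the following, and it comes back
empty against the pre-analyzed field (a sketch; the field name, id and
parameter values are examples):

import org.apache.solr.client.solrj.SolrQuery;

public class MltAgainstPreAnalyzed {
    public static void main(String[] args) {
        // Sketch of the MoreLikeThis request described above, seeded by one
        // document and pointed at the PreAnalyzed field. Field name, id and
        // parameter values are examples only.
        SolrQuery q = new SolrQuery("id:12345");
        q.set("mlt", true);
        q.set("mlt.fl", "content_pre");   // the PreAnalyzed field
        q.set("mlt.mintf", 1);
        q.set("mlt.mindf", 1);
        System.out.println(q);
    }
}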





Solr3.4 on tomcat 7.0.23 - hung with error threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed

2013-08-29 Thread Mou
We are getting the following error intermittently (twice in a two-week
interval). The load on the server seems to be normal. I see in the log that,
just before the failure (4-5 minutes), QTime was very high: queries that are
normally processed within 300 ms took more than 100 seconds, so many requests
timed out.

I would really appreciate any pointers toward finding the root cause of this
problem.

Aug 27, 2013 3:51:27 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [default] in context with path [/solrSearch] threw exception
java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
    at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:451)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:283)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)






Re: DocValues with docValuesFormat=Disk

2013-04-23 Thread Mou
Hi,

If you use a codec that is not the default, you need to download or build the
Lucene codec jars, put them in the solr_home/lib directory, and register the
codec factory in the Solr config file.

Detailed instructions are here:

http://wiki.apache.org/solr/SimpleTextCodecExample

Best,
Mou







Re: long QTime for big index

2013-02-14 Thread Mou
Just to close this discussion: we solved the problem by splitting the index.
It turned out that a distributed search across 12 cores is faster than
searching two cores.

All queries, the Tomcat configuration, and the JVM configuration remain the
same. Now queries are served in milliseconds.







Re: long QTime for big index

2013-02-14 Thread Mou
We have two boxes; they are really nice servers: 32-core CPUs, 192 GB of
memory, with both RAID arrays and Fusion-io cards. Each of them runs two
instances of Solr, one for indexing and the other for searching. The search
index is on the Fusion-io card.

Each instance has 11 cores, plus a small core that makes indexing almost
real-time.

We have around 300 million documents and 250 GB on disk. They are all
metadata. Search queries are very diverse and do not repeat very frequently,
at 40-60 qps. Before, we had two cores of 125 GB each on disk, and Solr was
taking a long time to get results from those two cores. CPU use was 90%.

We never had a problem with indexing. 50% of all our docs get updated every
day, so the indexing rate is very high.




On Thu, Feb 14, 2013 at 4:20 PM, alxsss [via Lucene]
ml-node+s472066n4040545...@n3.nabble.com wrote:
 Hi,

 It is curious to know how many Linux boxes you have and how many cores in
 each of them. It was my understanding that Solr puts in memory all the
 documents found for a keyword, not the whole index. So why must it be
 faster with more cores, when the number of selected documents from many
 separate cores is the same as from one core?

 Thanks.
 Alex.

long QTime for big index

2013-01-31 Thread Mou
I am running Solr 3.4 on Tomcat 7.

Our index is very big: two cores, each 120 GB. We are searching the slaves,
which are replicated every 30 minutes.
I am using the filterCache only, and we have more than 90% cache hits. We use
a lot of filter queries; queries are usually pretty big, with 10-20 fq
parameters. Not all filters are cached.

We are searching three shards, and the query looks like this:

shards=core1,core2,core3&q=*:*&fq=field1:some value&fq=-field2:some value&sort=date

But some queries are taking more than 30 seconds to return results, and the
behavior is intermittent. I cannot find a relation to replication. We are
using the Zing JVM, which reduced our GC pauses to milliseconds, so GC is not
a problem.
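
In SolrJ terms the request is roughly the following (a sketch; core and field
names are examples):

import org.apache.solr.client.solrj.SolrQuery;

public class ShardedFilterQuery {
    public static void main(String[] args) {
        // Sketch of the query shape described above: a distributed search
        // across three cores with filter queries and a date sort.
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "localhost:8983/solr/core1,"
                      + "localhost:8983/solr/core2,"
                      + "localhost:8983/solr/core3");
        q.addFilterQuery("field1:\"some value\"");
        q.addFilterQuery("-field2:\"some value\"");
        q.setSort("date", SolrQuery.ORDER.desc);
        System.out.println(q);   // prints the encoded parameter string
    }
}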

How can I improve the QTime? Is it at all possible to get a better QTime
given our index size?

Thank you for your suggestion.





Re: long QTime for big index

2013-01-31 Thread Mou
Thanks for your reply.

No, there is no eviction, yet.

The time is spent mostly in org.apache.solr.handler.component.QueryComponent
processing the request.

Again, the time varies widely for the same query.





Re: long QTime for big index

2013-01-31 Thread Mou
Thank you, Shawn, for reading all of my previous entries and for a detailed
answer.

To clarify, the third shard is used to store the recently added/updated data.
The two main big cores take very long to replicate (when a full replication
is required), so the third one helps us return the newly indexed documents
quickly. It gets deleted every hour, after we replicate the two other cores
with the last hour's new/changed data. This third core is very small.

As you said, with that big an index and distributed queries, searches were
too slow, so we tried to use the filterCache to speed up the queries. The
filterCache was big, as we have thousands of different filters. Other caches
were not very helpful, as queries are not repetitive and there are heavy
adds/updates to the index. So we had to use a bigger heap size. With that big
a heap, GC pauses were horrible, so we moved to the Zing JVM. Zing is now
using 134 GB of heap and does not have those big pauses, but it also does not
leave much memory for the OS.

I am now testing with a small heap, a small filterCache (just the basic
filters), and a lot of memory available for the OS disk cache. If that does
not work, I am thinking of breaking my index down into smaller pieces.






Re: long QTime for big index

2013-01-31 Thread Mou
Thank you again.

Unfortunately the index files will not fit in RAM. I have to try using the
document cache. I am also moving my index back to SSD; we took our index off
of it when the Fusion-io cards failed twice during indexing and the index was
corrupted. Now, with the BIOS upgrade and new driver, it is supposed to be
more reliable.

Also, I am going to look into the client app to verify that it is making
proper query requests.

Surprisingly, when I used a much lower value than the default for
defaultconnectionperhost and maxconnectionperhost in solrmeter, it performs
very well; the same queries return in less than one second. I am not sure
yet; I need to run solrmeter with different heap sizes, with cache and
without cache, etc.





Re: Solr Score threshold 'reasonably', independent of results returned

2012-08-22 Thread Mou
Hi,

I think this totally depends on your requirements and is thus applicable per
user scenario. Score does not have any absolute meaning; it is always
relative to the query. If you want to watch some particular queries and show
results with a score above a previously set threshold, you can use this.

If I always had that x% threshold in place, there would be many queries that
return nothing, and I certainly do not want that.
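
For the per-query case above (watching particular queries with a preset
cutoff), one way I have seen an absolute cutoff expressed is a function-range
filter over the score of the main query (a sketch; the query text and the
cutoff value are examples, and it is an absolute score, not an x% threshold):

import org.apache.solr.client.solrj.SolrQuery;

public class ScoreCutoff {
    public static void main(String[] args) {
        // Sketch: drop results whose score for the main query falls below an
        // absolute cutoff by filtering on a function-range query.
        SolrQuery q = new SolrQuery("title:solr");
        q.addFilterQuery("{!frange l=0.5}query($q)");
        System.out.println(q);
    }
}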





Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Hi Erick,

I totally agree; that's what I also figured out, ultimately. One thing I am
not clear on: replication is supposed to be incremental, but it looks like it
is trying to replicate the whole index. Maybe, because I am changing the
index so frequently, it is triggering an auto-merge and a full replication?
Am I thinking in the right direction?

I see that when I start the Solr search instance before I start feeding the
Solr index, my searches are fine, BUT it is using the old searcher, so I am
not seeing the updates in the results.

So now I am trying to change my architecture. I am going to have a core
dedicated to receiving the daily updates, which is going to be 5 million docs
with a size a little less than 5 GB; that is small, so replication will be
faster?

I will search both cores, i.e. the old data and the daily updates, and do a
field collapsing on my unique id so that I do not return duplicate results.
I haven't tried grouping results, so I am not sure about the performance.
Any suggestions?
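
The field collapsing I have in mind would be along these lines (a sketch; the
unique-id field and core names are examples):

import org.apache.solr.client.solrj.SolrQuery;

public class CollapseOnUniqueId {
    public static void main(String[] args) {
        // Sketch of grouping on the unique id so that a document found in
        // both the old-data core and the daily-update core comes back only
        // once. Field and core names are examples only.
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "localhost:8983/solr/olddata,localhost:8983/solr/daily");
        q.set("group", true);
        q.set("group.field", "uniqueId");
        q.set("group.limit", 1);   // keep only the top document per id
        System.out.println(q);
    }
}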

Eventually I will have to use Solr trunk, as you suggested.

Thank you for your help,

On Wed, Jul 18, 2012 at 10:28 AM, Erick Erickson [via Lucene] 
ml-node+s472066n3995754...@n3.nabble.com wrote:

 bq: This index is only used for searching and being replicated every 7 sec
 from the master.

 This is a red-flag. 7 second replication times are likely forcing your
 app to spend
 all its time opening new searchers. Your cached filter queries are
 likely rarely being re-used
 because they're being thrown away every 7 seconds. This assumes you're
 changing your master index frequently.

 If you need near real time, consider Solr trunk and SolrCloud, but
 trying to simulate
 NRT with very short replication intervals is usually a bad idea.

 A quick test would be to disable replication for a bit (or lengthen it
 to, say, 10 minutes)

 Best
 Erick

 On Tue, Jul 17, 2012 at 10:47 PM, Fuad Efendi [hidden email] wrote:

 
  FWIW, when asked at what point one would want to split JVMs and shard,
  on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
  GC cost reasons. You're way above that.
 
  - his index is 75G, and Grant mentioned RAM heap size; we can use
 terabytes
  of index with 16Gb memory.
 
 
 
 
 






Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-18 Thread Mou
Increasing the polling interval does help. But the requirement is to get a
document indexed and searchable instantly (it sounds like real-time search);
30 sec is acceptable. I need to look at Solr NRT and SolrCloud.

I created a new core to accept the daily updates and replicate every 10 sec.
The two other cores, with 234 million documents, are configured to replicate
only once a day.
I am feeding all three cores, but the two big cores are not replicating.
While searching, I am running a group.field on my unique id and taking the
most recently updated one. Right now it looks fine. Every day I am going to
delete the previous day's records from the daily-update core.

I am planning to use rsync for replication; it will be Fusion-io to
Fusion-io, so hopefully it will be very fast. What do you think?

We use a Windows service (written in .NET C#) to feed the data using REST
calls. That is really fast; we can feed more than 15 million documents a day
to two cores easily. I am using a Solr autocommit of 5 sec.
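
If we ever move the feeder to SolrJ, the commitWithin route Erick mentions in
the quoted reply below would look roughly like this instead of a 5-second
autocommit (a sketch; our real feeder is the .NET service, and the URL and
field values here are made up):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FeedWithCommitWithin {
    public static void main(String[] args) throws Exception {
        // Sketch: ask Solr to make each added document searchable within 30
        // seconds and let it batch the actual commits, rather than relying on
        // a very short autocommit interval.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/daily").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "12345");
            doc.addField("title", "example document");
            client.add(doc, 30000);   // commitWithin = 30,000 ms
        }
    }
}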

I could not figure out how I was able to achieve those numbers in my test
environment; all the configuration was the same except that I had a lot less
memory in test! I am trying to find out what I am missing in the other
configuration. My SLES kernel version is different in production (it is a
3.0.*, test was 2.6.*), but I do not think that can cause a problem.

Thank you again,
Mou

On Wed, Jul 18, 2012 at 6:26 PM, Erick Erickson [via Lucene] 
ml-node+s472066n3995861...@n3.nabble.com wrote:

 Replication will indeed be incremental. But if you commit too often (and
 committing too often is a common mistake) then the merging will
 eventually merge everything into new segments and the whole thing will
 be replicated.

 Additionally, optimizing (or forceMerge in 4.x) will make a single segment
 and force the entire index to replicate.

 You should emphatically _not_ have to have two cores. Solr is built to
 handle replication etc. I suspect you're committing too often or have some
 other mis-configuration and you're creating a problem for yourself.

 Here's what I'd do:
 1 increase the polling interval to, say, 10 minutes (or however long you
 can
 live with stale data) on the slave.

 2 decrease the commits you're  doing. This could involve the autocommit
 options
 you might have set in solrconfig.xml. It could be your client (don't
 know how you're
 indexing, solrJ?) and the commitWithin parameter. Could be you're
 optimizing (if you
 are, stop it!).

 Note that ramBufferSizeMB has no influence on how often things are
 _committed_.
 When this limit is exceeded, the accumulated indexing data is written
 to the currently-open
 segment. Multiple flushes can go to the _same_ segment. The write-once
 nature of
 segments means that after a segment is closed (through a commit), it
 is not changed. But
 a segment that is not closed may be written to multiple times until it's
 closed.

 HTH
 Erick


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-17 Thread Mou
Brian,

Thanks again.
swappiness is set to 60, and from vmstat I can see that no swapping is going
on. Also, I am using a Fusion-io SSD for storing my index.

I also used VisualVM, and it shows me that the thread is blocked on
lock=org.apache.lucene.index.SegmentCoreReaders@299172a7.

Any clue?


On Mon, Jul 16, 2012 at 10:38 PM, Bryan Loofbourrow [via Lucene] 
ml-node+s472066n3995452...@n3.nabble.com wrote:

 Another thing you may wish to ponder is this blog entry from Mike
 McCandless:
 http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html

 In it, he discusses the poor interaction between OS swapping, and
 long-neglected allocations in a JVM. You're on Linux, which has decent
 control over swapping decisions, so you may find that a tweak is in order,
 especially if you can discover evidence that the hard drive is being
 worked hard during GC. If the problem exists, it might be especially
 pronounced in your large JVM.

 I have no direct evidence of thrashing during GC (I am not sure how to go
 about gathering such evidence), but I have seen, on a Windows machine, a
 Tomcat running Solr refuse to shut down for many minutes, while a Resource
 Monitor session reports that that same Tomcat process is frantically
 reading from the page file the whole time. So there is something besides
 plausibility to the idea.

 -- Bryan


Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Hi,

Our index is divided into two shards, and each of them has 120M docs; the
total size is 75 GB per core.
The server is a pretty good one; the JVM is given 70 GB of memory and about
the same is left for the OS (SLES 11).

We use all dynamic fields except the unique id, and our queries are long, but
almost all of them are filter queries; each query may have 10-30 fq
parameters.

When I tested the index (same size) but with a max heap size of 40 GB,
queries were blazing fast. I used solrmeter to load test, and it was happily
serving 12000 queries or more per minute with an average 65 ms QTime. We had
an excellent filterCache hit ratio.

This index is only used for searching and is replicated every 7 sec from the
master.

But now, on the production server, it is horribly slow, taking 5 minutes
(QTime) to return a query (the same query).
What could go wrong?

I would really appreciate your suggestions on debugging this.





Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Thanks Brian, excellent suggestion.

I haven't used VisualVM before, but I am going to use it to see where the CPU
is going. I saw that the CPU is heavily used; I hadn't seen so much CPU use
in testing.
Although I think GC is not the problem, splitting the JVM per shard would be
a good idea.


On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] 
ml-node+s472066n3995446...@n3.nabble.com wrote:

 5 min is ridiculously long for a query that used to take 65ms. That ought
 to be a great clue. The only two things I've seen that could cause that
 are thrashing, or GC. Hard to see how it could be thrashing, given your
 hardware, so I'd initially suspect GC.

 Aim VisualVM at the JVM. It shows how much CPU goes to GC over time, in a
 nice blue line. And if it's not GC, try out its Sampler tab, and see where
 the CPU is spending its time.

 FWIW, when asked at what point one would want to split JVMs and shard, on
 the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC
 cost reasons. You're way above that. Maybe multiple JVMs and sharding,
 even on the same machine, would serve you better than a monster 70GB JVM.

 -- Bryan





Preferred file system for Solr

2012-02-23 Thread Mou
We are using a VeloDrive (SSD) to store and search our Solr index.
The system is running on SLES 11.

Right now we are using ext3, but I am wondering whether anyone has experience
using XFS or ext3 on SSD or Fusion-io for Solr.

Does Solr have any preference for the underlying file system?

Our index will be big (around 250M docs) to start with, adding 5M docs every
week; 50 to 60% of that will be updates.
