Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-25 Thread wwang525
Hi Erick,

Up to now, all the tests were based on randomly generated requests. 

In reality, many requests will get executed more than twice since this is to
support the advertising project. On the other hand, new queries could be
generated daily. So some of the filter queries will be used frequently for a
period of time, and will not be used afterwards. 

I will take your advice to analyze the real queries once the project is in
production.

Thank you very much!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-gain-with-setting-cache-false-in-the-query-for-complex-queries-tp4224931p4225147.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi Erick,

The earlier test was done with individual requests. My load test results are
even better:

(1) Load test (3 requests per second per core) immediately after restarting
Solr: average response time 122 ms
(2) Load test (5 requests per second per core) immediately after restarting
Solr: average response time 120 ms

(3) Warm-up (filter cache not populated, because of !cache=false) with a load
of 3 requests per second per core for 40 rounds, then a load test with 5
requests per second per core for 40 rounds: average response time 72 ms

It now seems clear that the previously slower response time (about 500 ms on
average) with the filter cache enabled for our demanding query was due to the
extra work of filling the cache for all the randomly generated requests.

This performance (< 100 ms) should be good enough in production for our
project. However, I would like to know whether this response time is typical
"Solr speed" for 15 M records.

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-gain-with-setting-cache-false-in-the-query-for-complex-queries-tp4224931p4224988.html


Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi All, 

I am working on improving the performance of queries against 15 M records.
All the queries have a list of about 6 filter queries, plus grouping and
faceting requirements.

So far, I found that the cache settings in solrconfig.xml help once the Solr
server is warmed up. After that, the average response time stayed at about
500 ms per request, whether under load or not. The query statistics
indicated that the performance "bottleneck" is in "query", not in "facet". I
was looking for ways to improve it further.

After I put !cache=false in front of the list of filter queries and tested
individual queries (not under load), performance seemed to improve quite a
lot. For example, when I tested the 10th query, I got a 130 ms response time
with !cache=false. Without !cache=false, I typically got a 500 ms response
time for the same query.

In all 10 queries I tested (one by one), the response times were
consistently much better with !cache=false.
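A minimal Python sketch of how such a request could be assembled. Solr's local-params syntax for skipping the filterCache is `{!cache=false}` at the start of each fq value; the field names and values below are borrowed from queries elsewhere in this archive, and the helper `build_filter_params` is hypothetical.

```python
from urllib.parse import urlencode


def build_filter_params(filters, cache=True):
    """Return one ("fq", clause) pair per filter clause.

    When cache is False, each clause is prefixed with Solr's
    {!cache=false} local param so that it bypasses the filterCache.
    """
    prefix = "" if cache else "{!cache=false}"
    return [("fq", prefix + clause) for clause in filters]


filters = [
    "GatewayCode:(YYZ OR BUF)",
    "DestCode:(FPO OR AUA)",
    "Duration:(2 OR 4)",
]
params = [("q", "*:*")] + build_filter_params(filters, cache=False)
# Percent-encoded query string to append to .../select?
print(urlencode(params))
```

Because cache=false is set per fq, it can be applied to some clauses and not others, which is one way to keep caching only the low-cardinality filters.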

Does it make sense?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-gain-with-setting-cache-false-in-the-query-for-complex-queries-tp4224931.html


Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira,

I happened to compose individual fq for each field, such as:
fq=Gatewaycode:(...)&fq=DestCode:(...)&fq=DateDep:(...)&fq=Duration:(...)

It is nice to know that I am not creating unnecessary cache entries, since
this approach results in minimal cardinality, as you pointed out.
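To illustrate the cardinality point, here is a rough Python model that counts how many distinct filter strings (a simplified proxy for filterCache entries, not Solr's actual cache-key logic) a set of requests would create when each field gets its own fq versus one combined fq. The codes are taken from the query log in this archive; `count_cache_keys` is a hypothetical helper.

```python
from itertools import product


def count_cache_keys(requests, one_fq_per_field):
    """Count distinct filter strings as a proxy for filterCache entries.

    requests: list of dicts mapping field name -> clause text.
    """
    keys = set()
    for req in requests:
        if one_fq_per_field:
            # Each field's clause is cached on its own and can be reused.
            keys.update(f"{field}:{clause}" for field, clause in req.items())
        else:
            # The whole combination becomes a single cache entry.
            keys.add(" AND ".join(f"{f}:{c}" for f, c in sorted(req.items())))
    return len(keys)


gateways = ["(YYZ)", "(BUF)", "(YVO)"]
dests = ["(FPO)", "(AUA)", "(VRA)"]
requests = [
    {"GatewayCode": g, "DestCode": d} for g, d in product(gateways, dests)
]
print(count_cache_keys(requests, one_fq_per_field=True))   # 3 + 3 = 6
print(count_cache_keys(requests, one_fq_per_field=False))  # 3 * 3 = 9
```

With separate fq clauses the entry count grows roughly with the sum of distinct values per field; a combined fq can grow with their product.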

Thanks





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699p4223988.html


Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Upayavira,

Thank you very much for pointing out the potential design issue.

The queries will be determined by business users through a configuration.
There will be a limited number of queries each day, and customers will
execute them repeatedly. However, business users will change the
configuration, so new (also limited) sets of queries will be generated. The
changes can be as frequent as daily or weekly. The project supports daily
promotions based on fresh index data.

Cumulatively, there can be a lot of different queries. If I still want to
take advantage of the filterCache, can I limit the size of the three caches
so that RAM usage stays under control?
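One way to reason about the RAM question: a filterCache entry can hold a bitset with one bit per document in the index (maxDoc), so the worst case is roughly maxDoc / 8 bytes per entry, and the cache's `size` setting caps the number of entries. This Python sketch applies that rule of thumb to 15 M documents; the sizes tried are illustrative, and since Solr may store small result sets more compactly, treat the figures as an upper bound.

```python
def filter_cache_bytes(max_doc, entries):
    """Worst-case filterCache memory: one bitset of max_doc bits per entry."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * entries


MAX_DOC = 15_000_000  # document count discussed in this thread

for size in (512, 4096, 16384):  # illustrative filterCache size values
    mib = filter_cache_bytes(MAX_DOC, size) / 2**20
    print(f"size={size:>5}: up to ~{mib:,.0f} MiB")
```

At 15 M documents, even 512 full-size entries come to about 916 MiB, so the filterCache size is the main lever for keeping RAM in check.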

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699p4223960.html


Re: Is it a good query performance with this data size ?

2015-08-19 Thread wwang525
Hi Erick,

All my queries are based on fq (filter queries). I have to send randomly
generated queries to warm up the low-level Lucene caches.

I took the more tedious route of warming up the low-level caches without
using the three Solr caches, by turning them off (setting their sizes to
zero). Then I sent 800 randomly generated requests to Solr. RAM usage jumped
from 500 MB to 2.5 GB and stayed there.
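The warm-up step above (sending a batch of randomly generated, filter-only requests) could be scripted along these lines. This Python sketch only builds the parameter lists; the value pools are copied from the query log in this archive, the number of values per clause is an arbitrary assumption, and sending the requests with an HTTP client is left out.

```python
import random

# Value pools copied from the query log in this archive.
GATEWAYS = ["YYZ", "BUF", "YVO", "YYT", "YBG", "YAM"]
DESTS = ["FPO", "AUA", "VRA", "CCC", "MBJ", "CMW", "CYO"]
DURATIONS = [2, 4, 6, 12, 13]


def or_clause(field, values):
    """Render field:(v1 OR v2 ...) as used in the fq parameters."""
    return f"{field}:(" + " OR ".join(str(v) for v in values) + ")"


def random_request(rng):
    """Build one filter-only request as a list of (param, value) pairs."""
    return [
        ("q", "*:*"),
        ("fq", or_clause("GatewayCode", rng.sample(GATEWAYS, rng.randint(1, 3)))),
        ("fq", or_clause("DestCode", rng.sample(DESTS, rng.randint(1, 3)))),
        ("fq", or_clause("Duration", sorted(rng.sample(DURATIONS, rng.randint(1, 2))))),
    ]


rng = random.Random(42)  # fixed seed so the warm-up set is reproducible
warmup = [random_request(rng) for _ in range(800)]
print(warmup[0])
```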

Then I tested individual queries against Solr. This time, the response
times were very close whether I sent a request the first, second, or third
time.

The results: 

(1) Average response time: 803 ms, with only one request taking more than
1 second (1042 ms)
(2) Most of the time was spent on the query, not on faceting
(730/803 ≈ 91%)

So the query is the bottleneck.

I also have an interesting finding: fq seems to work better on integer
types. I had defined two properties, DateDep and Duration, as strings because
docValues=true on an integer type did not work with faceted search. At one
point I accidentally used a filter query on the string-typed property, and
query performance degraded quite a lot.

Is it generally true that fq works better with integer types?

If so, I could create integer-typed properties for two other fq fields to
check whether that boosts performance.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699p4223920.html


Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick,

I just tested 10 different queries with and without faceting on two
properties: departure_date and hotel_code. Under a cold cache, the response
times were pretty much the same, and faceting took much less time than the
query itself. The "query" phase (under timing) is still the "bottleneck".

I understand that the low-level cache needs to be warmed up for a more
realistic test. However, I do not have a good, consistent way to warm up the
low-level cache without caching the filter queries at the same time. If I
load test some random queries before testing these 10 individual queries, I
see better response times in some cases, but that could also be due to the
filterCache.

To load the low-level Lucene caches without populating the filterCache,
documentCache, etc., can I turn off the three caches and send a lot of
queries to Solr before I test the performance of each individual query?

Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699p4223758.html


Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick,

Two facets are probably demanding:

departure_date has 365 distinct values, and hotel_code can have 800
distinct values.

The docValues setting definitely helped me a lot even when all the queries
had the above two facets. I will test a list of queries with or without the
two facets after indexing the data (to take advantage of cache warming).

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699p4223744.html


Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi All,

I am working on a search service based on Solr (v5.1.0). The data size is
15 M records, and the index files take 860 MB. The tests were performed on a
local machine with 8 cores, 32 GB of memory, and a 3.4 GHz CPU (Intel Core
i7-3770).

I found that setting docValues=true for faceting and grouping did boost
performance for first-time searches under a cold cache. For example, with
our requests that use all features (grouping, sorting, faceting), the
difference from faceting alone can be as much as 300 ms.

However, when the same request is executed a second time, the response time
seems to be about the same whether docValues is true or false. Still, I set
docValues=true for all the faceting properties.

The following are what I have observed:

(1) Test single request one-by-one (no load)

With a cold cache, I execute randomly generated queries one after another.
The first query routinely exceeds 1 second, but usually not 2 seconds. As I
continue to generate random requests and execute them one by one, the
response time normally stabilizes around 500 ms. It does not seem to improve
further as I keep executing randomly generated queries.

(2) Load test with randomly generated requests

In the load test (each core takes 4 requests per second, for 20 rounds), the
CPU usage jumps and the earlier requests usually get much longer response
times, sometimes exceeding 5 seconds. The CPU usage pattern then changes to a
sawtooth shape, the response time drops, and the requests get executed faster
and faster. I usually get an average response time of around 1 second.

If I run the load test again, the average response time continues to drop.
However, it levels off at about 500 ms per request under this load if I run
more tests.

These are the best results so far. 

I understand that the requests were all different, so these results cannot
be compared with executing the same query twice (which usually gives a
response time of around 150 ms).

In production, many requests may be very similar, so the filter queries
will execute faster. These tests, however, generate all-random requests,
which differs from the production environment.

In addition, cache warming may not be applicable to my test scenarios,
because all requests in every test are randomly generated.

I tried other search solutions, and the performance was not good. That is
why I turned to Solr. Now that I am using Solr, I would like to know, for a
typical Solr project:

(1) whether this is a good response time for this data size without relying
much on caching;
(2) whether it is possible to improve further without sharding, for
example to get an average response time below 200 ms.

Additional information to share:
(1) The tests were done when the Solr instance was not indexing. CPU was
dedicated to the test and RAM was enough.

(2) Most of the settings in solrconfig.xml are defaults; however, the cache
settings were modified.
Note: I think the autowarmCount setting may not help my tests much, because
the requests are randomly generated. Still, I got a > 50% hit ratio for
filter queries, because some filter fields have a limited number of values.





 
   

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-a-good-query-performance-with-this-data-size-tp4223699.html


Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi All,

It looks like a numeric field cannot be used for faceting if
docValues="true".

The following JIRA issue seems to describe this scenario:

https://issues.apache.org/jira/browse/SOLR-7495

"Unexpected docvalues type NUMERIC when grouping by a int facet"





--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744p4221133.html


Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi Upayavira,

A bit more explanation on DateDep.

In the database, this value is a varchar(8) in the format 20150803. I
previously mapped it to SortableIntField, and that worked with both the
filter query and faceted search.
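Because YYYYMMDD values sort numerically in calendar order, mapping DateDep to an integer keeps range filters like the one in the query log meaningful. A minimal Python sketch of that conversion, assuming the varchar values are always 8-digit dates; both helper names are hypothetical.

```python
from datetime import datetime


def datedep_to_int(value):
    """Convert a varchar(8) 'YYYYMMDD' value such as '20150803' to an int."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"expected 8 digits, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # rejects impossible dates
    return int(value)


def datedep_range_fq(start, end):
    """Build an inclusive range filter on DateDep."""
    return f"DateDep:[{datedep_to_int(start)} TO {datedep_to_int(end)}]"


print(datedep_range_fq("20150820", "20150920"))
```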

After I changed it to TrieIntField and re-indexed many times, I got the same
error message that I posted in my last message.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744p4220983.html


Re: TrieIntField not working in Solr 4.7 ?

2015-08-05 Thread wwang525
Hi Upayavira,

I edited the definition of tint to use precisionStep=0 for DateDep (i.e.,
departure date). This field is used in a filter query and also in faceted
search.

The following are the definitions:




   

The following is the log message after I executed a query:


INFO  - 2015-08-05 08:19:52.090; org.apache.solr.core.SolrCore; [db-mssql]
webapp=/solr path=/select
params={facet=true&group.ngroups=true&sort=Price+asc&facet.mincount=1&facet.limit=800&wt=jason&group.facet=true&rows=30&debugQuery=true&facet.sort=count&q=*:*&group.field=nSoftVoyageCode&facet.field=DateDep&facet.field=HotelCode&facet.field=Collection&facet.field=GatewayCode&facet.field=StarRating&facet.field=MealplanCode&group=true&fq=GatewayCode:(YYZ+OR+BUF+OR+YVO+OR+YYT+OR+YBG+OR+YAM)&fq=DestCode:(FPO+OR+AUA+OR+VRA+OR+CCC+OR+MBJ+OR+CMW+OR+CYO)&fq=DateDep:([20150820+TO+20150920]+OR+[20150720+TO+20150820])&fq=Duration:(2+OR+4+OR+6+OR+12+OR+13)}
hits=584 status=500 QTime=27 
ERROR - 2015-08-05 08:19:52.091; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Exception during facet.field:
DateDep
at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:563)
at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:549)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.solr.request.SimpleFacets$1.execute(SimpleFacets.java:503)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:573)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:260)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:84)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Type mismatch: DateDep was
indexed as NUMERIC
at
org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1161)
at
org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1145)
at
org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.setNextReader(TermGroupFacetCollector.java:130)
at org.apache.lucene.search.IndexSearcher.

Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Upayavira,

I physically deleted the files under the index directory and re-indexed,
but that did not fix the problem.

The following is an example of the field definition:



and the following is the definition of tint



For some reason, I keep getting the error message:


Caused by: java.lang.IllegalStateException: Type mismatch: DateDep was
indexed as NUMERIC

I am on Solr 4.7. I edited the out-of-the-box solrconfig.xml from the DIH
example to include the necessary libraries:

  
  
  


  


  


  




I am not sure whether something is missing.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744p4220840.html


Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Upayavira,

My queries have all the features: search, sorting, grouping, and faceting.
As I worked on the project, I noticed that query response times got longer
and longer as I added these features.

I was reading solr-ref-guide-4.7, and the following is from page 66. I
thought converting the field types to Trie* might improve performance:

"The standard way that Solr builds the index is with an inverted index. This
style builds a list of terms found in all the documents in the index and
next to each term is a list of documents that the term appears in (as well
as how many times the term appears in that document). This makes search very
fast - since users search by terms, having a ready list of term-to-document
values makes the query process faster.

For other features that we now commonly associate with search, such as
sorting, faceting, and highlighting, this approach is not very efficient.
The faceting engine, for example, must look up each term that appears in
each document that will make up the result set and pull the document IDs in
order to build the facet list. In Solr, this is maintained in memory, and
can be slow to load (depending on the number of documents, terms, etc.).

In Lucene 4.0, a new approach was introduced. DocValue fields are now
column-oriented fields with a document-to-value mapping built at index time.
This approach promises to relieve some of the memory requirements of the
fieldCache and make lookups for faceting, sorting, and grouping
much faster."
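The distinction in that passage can be shown with a toy Python model: an inverted index maps each term to the documents containing it (good for finding matches), while a column-oriented structure, the idea behind DocValues, maps each document to its value (good for counting facet values over a result set). The documents are made up, and this is a conceptual sketch, not Lucene's actual data layout.

```python
from collections import Counter, defaultdict

docs = {
    0: {"HotelCode": "H1", "DateDep": 20150820},
    1: {"HotelCode": "H2", "DateDep": 20150820},
    2: {"HotelCode": "H1", "DateDep": 20150901},
}

# Inverted index: (field, term) -> list of doc ids containing it.
inverted = defaultdict(list)
# Column-oriented "doc values": field -> {doc id -> value}.
doc_values = defaultdict(dict)
for doc_id, fields in docs.items():
    for field, value in fields.items():
        inverted[(field, value)].append(doc_id)
        doc_values[field][doc_id] = value


def facet_counts(field, matching_docs):
    """Count values of `field` over the matching docs by reading its column."""
    column = doc_values[field]
    return Counter(column[d] for d in matching_docs)


matches = inverted[("DateDep", 20150820)]  # filter step uses the inverted index
print(facet_counts("HotelCode", matches))  # facet step uses the column
```

The facet step never has to walk the term dictionary; it reads one value per matching document.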

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744p4220821.html


Re: TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi Alex,

I waited until the indexing process finished successfully.

I also set default values for these fields, and a simple query showed that
the data was fine. The error happened after I executed a faceted query.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744p4220762.html


TrieIntField not working in Solr 4.7 ?

2015-08-04 Thread wwang525
Hi All,

I was trying to switch the type definition of some fields from
SortableIntField to TrieIntField so that I might be able to boost the
performance of queries that use grouping, sorting, and faceting.

After I switched one field used for grouping, I got the following error:

java.lang.IllegalStateException: Type mismatch: nSoftVoyageCode was indexed
as NUMERIC

In the database this column is defined as int, and I had no issue with the
SortableIntField definition for this field. But apparently, it does not work
with TrieIntField.

I had the same issue with faceting columns defined as TrieIntField, but not
with SortableIntField. For now, the only way I can make it work is to
define the field as solr.StrField, even if it has a different data type in
the database.

Is it possible that I might have missed something?

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/TrieIntField-not-working-in-Solr-4-7-tp4220744.html


Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-13 Thread wwang525
Hi Erick,

I think this is a good solution, and it should work, although I have not
yet implemented it with the HTTP API, which I found at
https://wiki.apache.org/solr/SolrReplication.

On my local machine, a total of 800 MB of index files were "downloaded" to
another folder within a minute. However, transferring the index files across
the network could take longer.
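For reference, the forced-replication call described on the SolrReplication wiki page is `command=fetchindex` on the slave's replication handler, optionally with a `masterUrl` parameter for a one-time fetch. This Python sketch only builds that URL; the host names are hypothetical, and the core name `db-mssql` is borrowed from a log elsewhere in this archive.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_base, core, master_url=None):
    """URL asking a slave core to pull the index from its master now."""
    params = {"command": "fetchindex"}
    if master_url:
        # Optional one-time override of the masterUrl configured on the slave.
        params["masterUrl"] = master_url
    return f"{slave_base}/solr/{core}/replication?{urlencode(params)}"


print(fetchindex_url(
    "http://slave1:8983", "db-mssql",
    master_url="http://master:8983/solr/db-mssql/replication",
))
```

Triggering this once per slave at the end of each indexing run is one way to replace interval polling with an explicit step.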

I will test it with a two-machine setup.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Planning-Solr-migration-to-production-clean-and-autoSoftCommit-tp4216736p4217122.html


Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-13 Thread wwang525
Hi Erick,

That status request shows if the Solr instance is "busy" or "idle". I think
this is a doable option to check if the indexing process completed (idle) or
not (busy).
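A minimal Python sketch of using that status request as a completion check: poll the DataImportHandler with `command=status` until it reports `idle`. The `/dataimport` handler path, poll interval, and timeout are assumptions, and the demo uses canned, trimmed responses rather than real server output.

```python
import json
import time


def dih_is_idle(status_json):
    """True if a DataImportHandler status response reports "idle"."""
    return json.loads(status_json).get("status") == "idle"


def wait_until_idle(fetch_status, poll_seconds=30, max_polls=480,
                    sleep=time.sleep):
    """Poll until DIH reports idle; return False if max_polls is exhausted.

    fetch_status is any callable returning the body of
    .../dataimport?command=status&wt=json (how it is fetched is up to you).
    """
    for _ in range(max_polls):
        if dih_is_idle(fetch_status()):
            return True
        sleep(poll_seconds)
    return False


# Demo with canned, trimmed responses instead of a live server.
responses = iter(['{"status": "busy"}', '{"status": "busy"}', '{"status": "idle"}'])
print(wait_until_idle(lambda: next(responses), sleep=lambda s: None))
```

One caveat: the handler also reports idle before an import has started, so a script would typically trigger the import first (or also inspect the status messages) before treating idle as "done".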

Now I have some concerns about not using the default polling mechanism (the
slave instances polling the master instance).

The load test showed that the initial batches of requests got much longer
response times than later batches after the Solr server was started.
Gradually, performance got much better, presumably because the caches were
being warmed up.

I understand that the indexing process will commit the changes and also
autowarm queries from the existing caches. In that case, the indexing Solr
instance will be in good shape to serve requests once the indexing process
completes.

The question:

When the slave instances poll the indexing instance (master), do they also
autowarm queries from their existing caches? If they do, the polling
mechanism will also keep the slave instances ready (and more performant) to
serve requests at any time.

With the "forced replication" solution, are we pushing/overwriting all the
old index files with the new ones? Do we need to restart the Solr instance?
In addition, will the slave instances be warmed up in any way?

If there are too many issues with the "force replication", I might as well
work out the "incremental indexing" option. 

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Planning-Solr-migration-to-production-clean-and-autoSoftCommit-tp4216736p4217102.html


Re: Planning Solr migration to production: clean and autoSoftCommit

2015-07-10 Thread wwang525
Hi Erick,

It is Solr 4.7. For the time being, we are considering the old style
master/slave configuration.

Re-indexing will happen every 4 hours, or even every 2 hours, so it is not
rare. Managing replication manually is not an option. Is there any other
easy-to-manage option?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Planning-Solr-migration-to-production-clean-and-autoSoftCommit-tp4216736p4216744.html


Planning Solr migration to production: clean and autoSoftCommit

2015-07-10 Thread wwang525
Hi,

The following questions are about the basic configuration options in
production. 

We will have three machines: one indexing instance (master) and two Solr
instances (in different machines) for searching purpose. This way, we will
always have two Solr instances dedicated for executing search requests.

Right now, we are only considering rebuilding the full index periodically,
so there will be no incremental indexing.

I understand that the "clean" indexing parameter on the indexing instance
can be set to true or false. If I set it to true, the search index on the
indexing instance is cleaned up first, and whenever I check the index during
the job, it keeps growing.

The questions are:

(1) Will the slave instances (which execute requests) get in sync with the
master if we set "clean" to true? That is not what we want, since the search
index will be cleaned up and then grow. Customers would need to wait for some
time to search the entire data set, pending completion of the indexing job.

(2) "autoSoftCommit" is supposed to make updates visible to search. I also
configured "autoSoftCommit" in solrconfig.xml on the master. When I set
"clean" to true in the indexing job, what is the impact of this parameter on
search requests executed on the slave machines?

Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Planning-Solr-migration-to-production-clean-and-autoSoftCommit-tp4216736.html


Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi,

The real production requests will not be randomly generated, and a lot of
requests will be repeated. I think the performance will be better due to the
repeated requests. In addition, I am sure the configuration will need to be
adjusted once the application is in production.

For the time being, I can reduce the size of filterCache to 4096 or 2048,
since the stats page now shows only 1465 entries.

I forgot to mention that the size I saw in the stats page for documentCache
is already 16384 after the test, and this is the configured size in
solrconfig.xml. This is why I was asking if I need to raise the number in
the configuration.

Is there any issue, or will there be any performance improvement, if I
increase the size of documentCache?

Thanks






--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-determine-cache-setting-in-Solr-Search-Instance-tp4216562p4216591.html


How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi All,

I did a load test with a total of 800 requests (at 40 concurrent requests
per second) against a Solr index with 14 M records. Performance was good
(< 1 second), especially after the first part of the test. BTW, the second
round of the load test was even better.

The local machine had about 15 GB of free memory during the load test.

I observed the following from the stats page:

(1) documentCache reached the configured size for documentCache with a hit
ratio of 0.66
(2) filterCache has 2519 hits with a hit ratio of 0.63. The size is 1465
(less than the configured size of 16384)
(3) queryResultCache has a hit ratio of 0
(4) fieldValueCache has a hit ratio of 0
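Since the hit ratio is hits divided by lookups, the filterCache stats above also imply roughly how many lookups and misses the test produced. A quick Python check of that arithmetic, using only the numbers reported here:

```python
def implied_lookups(hits, hit_ratio):
    """Estimate total lookups from hits and the (rounded) hit ratio."""
    return round(hits / hit_ratio)


hits = 2519       # filterCache hits from the stats page
hit_ratio = 0.63  # reported hit ratio (rounded to two decimals)
lookups = implied_lookups(hits, hit_ratio)
misses = lookups - hits
print(f"lookups ~ {lookups}, misses ~ {misses}")
```

That gives about 3,998 lookups and 1,479 misses. The miss count is close to the 1,465 entries in the cache, which is consistent with most misses adding a new entry, although the rounded ratio makes this an approximation.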

The following is the cache configuration in solrconfig.xml:

 







It looks like I need to increase the size of documentCache. The zero hit
ratio for queryResultCache and fieldValueCache was surprising. Is it
possible that this is due to the randomly generated requests?

What are some guidelines for tuning the cache parameters?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-determine-cache-setting-in-Solr-Search-Instance-tp4216562.html


Re: How to do a Data sharding for data in a database table

2015-07-02 Thread wwang525
Hi,

I have worked with other search solutions before, and cache management is
important for performance. Apart from the caches populated by users'
requests, loading the search index into memory is the very first step after
the index is built. This ensures that search results are retrieved from
memory rather than through disk I/O.

My observation is that if the search index has not been accessed for a long
time, performance degrades greatly because the OS swaps the search index out
of memory to disk.

Does Solr automatically load the search index into memory after the index is
built? If not, is there any tool or command that can accomplish this task?
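One common approach, assuming the index is memory-mapped (the usual default for Lucene on 64-bit JVMs) so that the operating system's page cache does the caching, is to read every index file once after a build, similar to copying the files to /dev/null. A minimal Python sketch of that idea; the index path is a placeholder, and the OS may still evict pages later under memory pressure.

```python
import os


def preload_index(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once so the OS can cache its pages.

    Returns the total number of bytes read. This only warms the OS page
    cache; it does not load anything into Solr's own heap.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as fh:
                while chunk := fh.read(chunk_size):
                    total += len(chunk)
    return total


# Example (placeholder path):
# preload_index("/path/to/solr/core/data/index")
```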

Regards




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215398.html


Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi All,

I did many tests with very consistent test results. Each query was executed
after re-indexing, and only one request was sent to query the index. I
disabled filterCache and queryResultCache for this test based on Erick's
recommendation.

The test document was posted to this list earlier. Briefly, the query
without grouping and faceting took about 60 ms, and grouping on top of the
same query added about 15 ms. However, faceting added another 70 ms,
bringing the total to about 140 ms.

The index has only 1 M records. Ten times as many records (> 10 M) will
likely bring the total response time above 1 second for these two queries.
My goal is to make the query as performant as possible so that we can achieve
a < 1 second response time under load.
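The estimate above assumes response time grows roughly linearly with the number of records, which is a pessimistic simplification rather than a measured result. The arithmetic, using the per-phase numbers reported in this post (they add up to about 145 ms), looks like this in Python:

```python
def total_and_scaled(components_ms, scale):
    """Sum per-phase timings and apply a naive linear scaling factor."""
    total = sum(components_ms.values())
    return total, total * scale


breakdown = {"query": 60, "grouping": 15, "faceting": 70}  # ms, 1 M records
total, projected = total_and_scaled(breakdown, scale=10)
for phase, ms in breakdown.items():
    print(f"{phase:>9}: {ms:>3} ms ({ms / total:.0%})")
print(f"total ~{total} ms at 1 M records, ~{projected} ms if it scaled 10x")
```

Faceting is nearly half of the single-request time in this breakdown, which is why it is the phase asked about below.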

Is a 50 ms to 60 ms response time (single-request scenario) a bit too long
for 1 M records with Solr? Is faceting taking too long (70 ms) to process?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215019.html


Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Test_results_round_2.doc
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4215016.html


Re: How to do a Data sharding for data in a database table

2015-06-30 Thread wwang525
Hi,

I am currently investigating the queries against a much smaller index (1 M
records) to measure how much grouping and faceting degrade performance. This
will allow me to do a lot of tests in a short period of time.

However, the query appears to execute much faster the second time. This was
tested after re-indexing, and the query was not executed again immediately.
Could this be due to autowarming during or after re-indexing?

I would like to get the response profile (query, faceting, etc.) for the same
query in two separate requests without any caching or warming, so that I get
a good average number without much fluctuation. Which settings do I need to
disable (temporarily) just for the purpose of this investigation? In
solrconfig.xml, I can see filterCache, queryResultCache, documentCache, etc.
I am not sure what needs to be disabled to facilitate my work.

I understand that cache and warming settings will be very helpful in the load
test later on. However, if I can optimize the query in a single-request
scenario, performance should be in much better shape once all the cache and
warming settings are enabled during the load test.
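For an isolated single-request measurement, one option that does not require editing solrconfig.xml is Solr's `cache=false` local parameter on each `fq`, which keeps that filter out of the filterCache. It does not affect queryResultCache or documentCache, which are configured separately in solrconfig.xml. A minimal Python sketch that rewrites a list of filters this way follows; the helper name is hypothetical, and filters that already carry local params are deliberately left unchanged rather than merged.

```python
def uncached_filters(filters):
    """Prefix each filter query with Solr's {!cache=false} local param.

    Filters that already start with a local-params block ("{!...}") are
    returned unchanged, so they will still use the filterCache; merging
    cache=false into an existing block is left to the caller.
    """
    out = []
    for fq in filters:
        if fq.startswith("{!"):
            out.append(fq)
        else:
            out.append("{!cache=false}" + fq)
    return out
```

Each returned string is sent as its own `fq` parameter, exactly as the original filters would be.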

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4214968.html


Re: How to do a Data sharding for data in a database table

2015-06-25 Thread wwang525
schema.xml   
solrconfig.xml
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4213864.html


Re: How to do a Data sharding for data in a database table

2015-06-24 Thread wwang525
Hi All,

I built the Solr index with 14 M records.

I have more than 20 GB of RAM on my local machine, and the Solr instance was
started with -Xms1024m -Xmx8196m.

The following query:

http://localhost:8983/solr/db-mssql/select?q=*:*&fq=GatewayCode:(YYZ)&fq=DestCode:(CUN)&fq=Duration:(5 OR 6 OR 7 OR 8)&fq=DateDep:([20150610 TO 20150810])&facet=true&facet.field=DestCode&facet.field=DateDep&facet.field=GatewayCode&facet.field=HotelName&facet.sort=count&facet.limit=40&facet.mincount=1&rows=30&group=true&group.field=HotelCode&group.ngroups=true&group.facet=true&debugQuery=true
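The URL above contains raw spaces and brackets, which a browser usually encodes automatically but a script has to encode itself. A minimal Python sketch of the same request as a parameter list follows; repeated `fq` and `facet.field` keys are kept as separate tuples. The host, core name and parameters are copied from the query above, the function name is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Parameters copied from the query above; a list of tuples preserves the
# repeated fq and facet.field keys in order.
PARAMS = [
    ("q", "*:*"),
    ("fq", "GatewayCode:(YYZ)"),
    ("fq", "DestCode:(CUN)"),
    ("fq", "Duration:(5 OR 6 OR 7 OR 8)"),
    ("fq", "DateDep:([20150610 TO 20150810])"),
    ("facet", "true"),
    ("facet.field", "DestCode"),
    ("facet.field", "DateDep"),
    ("facet.field", "GatewayCode"),
    ("facet.field", "HotelName"),
    ("facet.sort", "count"),
    ("facet.limit", "40"),
    ("facet.mincount", "1"),
    ("rows", "30"),
    ("group", "true"),
    ("group.field", "HotelCode"),
    ("group.ngroups", "true"),
    ("group.facet", "true"),
    ("debugQuery", "true"),
]


def build_url(base="http://localhost:8983/solr/db-mssql/select", params=PARAMS):
    # urlencode percent-encodes ':', '[', ']', '(' and ')' and turns spaces into '+'.
    return base + "?" + urlencode(params)
```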

The response found 98105 matching base records in total. These records were
grouped at the HotelCode level, giving ngroups: 143. However, the query only
retrieves the first base record of each group, and only 30 groups were
retrieved.
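To make the grouping numbers concrete: with `group.field=HotelCode`, matches are collapsed by hotel, `group.ngroups=true` reports how many distinct groups exist (143 here), `rows=30` caps how many groups are returned, and `group.limit` (default 1) caps the documents returned per group. The Python below is only a simplified in-memory model of those semantics, not Solr's implementation; it keeps documents in the order given instead of sorting by score, and the function name is hypothetical.

```python
def group_matches(docs, field, rows):
    """Return (ngroups, groups) for a list of matching documents.

    docs  -- matching documents as dicts, already in result order
    field -- the grouping field (like group.field)
    rows  -- how many groups to return (like rows when group=true)
    Each returned group holds only its first document (like group.limit=1).
    """
    first_per_group = {}  # dicts keep insertion order (Python 3.7+)
    for doc in docs:
        key = doc.get(field)  # documents without the field share a None group
        if key not in first_per_group:
            first_per_group[key] = doc
    ngroups = len(first_per_group)
    groups = list(first_per_group.items())[:rows]
    return ngroups, groups
```

In this model, `ngroups` counts every distinct value among all matches, while `rows` only trims what is returned, which matches the 143 versus 30 described above.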

The performance statistics:

Total response time in solr.log: 1791 ms
From the query response page: the query took 764 ms, facet took 1007 ms, and
debug took 13 ms.
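The per-component figures correspond to the `timing` section that debugQuery adds to the response. A minimal Python sketch for pulling the `process` times out of a parsed JSON response (e.g. requested with `wt=json`) follows; it assumes the `debug -> timing -> process -> <component> -> time` layout, the helper name is hypothetical, and `SAMPLE` only restates the numbers reported above.

```python
def process_timings(response):
    """Return [(component, ms), ...] from debug.timing.process, slowest first.

    Assumes the response was requested with debugQuery=true and parsed
    from JSON into nested dicts.
    """
    process = response.get("debug", {}).get("timing", {}).get("process", {})
    rows = [
        (name, float(section.get("time", 0.0)))
        for name, section in process.items()
        # Skip the phase's own "time" entry (a number, not a component dict).
        if isinstance(section, dict)
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# The component times reported above, in the assumed JSON shape.
SAMPLE = {"debug": {"timing": {"process": {
    "query": {"time": 764.0},
    "facet": {"time": 1007.0},
    "debug": {"time": 13.0},
}}}}
```

Sorting slowest first makes it obvious that faceting dominates here. The component times (1784 ms together) need not add up exactly to the 1791 ms logged in solr.log.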

This is a typical query that the business needs. Previously, when I tested
with a data size of 6 M records and no faceted search, the typical response
time in a single-request scenario was around 200 ms.

Please let me know if additional information is needed.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4213648.html


Re: How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
The query without load still takes under 1 second. But under load, the
response time can be much longer because queries queue up.

We would like to shard the data into roughly 6 M records per shard, which
should still keep the response time below 1 second under load.

What are some best practices for sharding the data? For example, we could
shard the data by date range, but that is pretty dynamic; or we could shard
the data by some other property, but if the data is not evenly distributed,
we may not be able to shard it any further.
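Two pieces of this plan can be sketched in Python: the shard count implied by a target of about 6 M records per shard (for the 14 M index mentioned elsewhere in this thread, ceil(14 / 6) = 3 shards), and hash-based routing, which spreads documents roughly evenly regardless of how dates or other properties are distributed. SolrCloud's default `compositeId` router hashes the unique key with MurmurHash3; the sketch below substitutes `zlib.crc32` purely for illustration, so it will not reproduce Solr's actual shard assignment, and the function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shard_count(total_docs, docs_per_shard):
    # Round up so no shard exceeds the target size.
    return math.ceil(total_docs / docs_per_shard)


def shard_for(doc_id, num_shards):
    # Stand-in hash (CRC32), NOT Solr's MurmurHash3-based compositeId router.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribution(doc_ids, num_shards):
    """Count how many documents land on each shard."""
    return Counter(shard_for(d, num_shards) for d in doc_ids)
```

Because the shard depends only on the document ID, the assignment does not drift as date ranges move, which is the main advantage over range-based sharding here.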



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765p4212803.html


How to do a Data sharding for data in a database table

2015-06-18 Thread wwang525
Hi,

We would probably like to shard the data, since the response time for
demanding queries on more than 10 M records is exceeding 1 second in a
single-request scenario.

I have not done any data sharding before. What are some recommended ways to
shard data? For example, maybe by a criterion with a list of specific
values?
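Sharding by a criterion with a list of specific values amounts to an explicit value-to-shard map; in SolrCloud this roughly corresponds to the `implicit` router, where the application decides which shard each document goes to. The Python sketch below (hypothetical function name and shard names) reports the resulting shard sizes, which makes the uneven-distribution concern raised in this thread easy to check: if one value dominates, its shard grows no matter how the rest of the list is split.

```python
from collections import Counter


def route_by_value(docs, field, value_to_shard, default_shard):
    """Assign each document to a shard based on one field's value.

    value_to_shard maps specific field values to shard names; any value not
    in the map (or a missing field) goes to default_shard.
    Returns a Counter of shard sizes.
    """
    sizes = Counter()
    for doc in docs:
        shard = value_to_shard.get(doc.get(field), default_shard)
        sizes[shard] += 1
    return sizes
```

For example, mapping hypothetical gateway codes such as `{"YYZ": "shard1", "YVR": "shard2"}` with a catch-all `"shard3"` shows immediately whether a busy gateway like YYZ would overload its shard.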





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-do-a-Data-sharding-for-data-in-a-database-table-tp4212765.html