Hi Tom,

I believe Solr will automatically use DocValues for faceting if you've
defined them in the schema.
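
A minimal sketch of such a definition, for illustration only. The field
name topicStr and the fact that it is multi-valued come from the log output
quoted below; the "string" type name and the indexed/stored settings are
assumptions, not Tom's actual schema:

```xml
<!-- schema.xml fragment (illustrative; adjust to the real schema) -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- docValues="true" asks Solr to write a column-oriented structure at
     index time, so faceting should not need to un-invert the field on the heap -->
<field name="topicStr" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```

Documents indexed before the change have no DocValues, so a full re-index
is needed. After that, the existing facet.field=topicStr request should
work unchanged, though that is worth verifying on the Solr version in use.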

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 11:33 AM, Tom Burton-West <tburt...@umich.edu> wrote:
> Thanks Otis,
>
> I'm looking forward to the presentation videos.
>
> I'll look into using DocValues. Re-indexing 200 million docs will take a
> while though :).
> Will Solr automatically use DocValues for faceting if the field has
> DocValues, or is there some configuration or parameter that needs to be set?
>
> Tom
>
>
> On Sat, Nov 9, 2013 at 9:57 AM, Otis Gospodnetic
> <otis.gospodne...@gmail.com> wrote:
>>
>> Hi Tom,
>>
>> See the post at
>> http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/
>> It includes info about our experiment with DocValues, which clearly
>> shows lower heap usage, so you'll get further before hitting this
>> OOM. In our experiments we didn't sort, facet, or group. You are
>> faceting, so DocValues, which are more efficient than FieldCache,
>> should help you even more than they helped us.
>>
>> The graphs are from SPM, which you could use to monitor your Solr
>> cluster, at least while you are tuning it.
>>
>> Otis
>>
>>
>> On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West <tburt...@umich.edu>
>> wrote:
>> > Hi Yonik,
>> >
>> > I don't know enough about JVM tuning and monitoring to do this in a
>> > clean way, so I just tried setting the max heap at 8GB and then 6GB to
>> > force garbage collection. With it set to 6GB it goes into a long GC loop
>> > and then runs out of heap (see below). The stack trace says the issue is
>> > with DocTermOrds.uninvert:
>> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
>> >
>> > I'm guessing the actual peak is somewhere between 6 and 8 GB.
>> >
>> > BTW: is there some documentation somewhere that explains what the stats
>> > output to INFO mean?
>> >
>> > Tom
>> >
>> >
>> > java.lang.OutOfMemoryError: GC overhead limit exceeded</str>
>> > <str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
>> > at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
>> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
>> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>> > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
>> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>> > at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>> > at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>> > at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>> > at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>> > at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>> > at java.lang.Thread.run(Thread.java:724)
>> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
>> > at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
>> > at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)
>> > at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426)
>> > at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517)
>> > at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252)
>> > at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
>> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
>> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>> > ... 16 more
>> > </str>
>> >
>> > ---
>> > Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
>> > INFO: UnInverted multi-valued field {field=topicStr, memSize=1,768,101,824,
>> > tindexSize=86,028, time=45,854, phase1=41,039, nTerms=271,987, bigTerms=0,
>> > termInstances=569,429,716, uses=0}
>> > Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute
>> > INFO: [core] webapp=/dev-3 path=/select params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml} hits=138,605,690 status=0 QTime=49,797
>> >
>> >
>> >
>> > On Fri, Nov 8, 2013 at 2:01 PM, Yonik Seeley <yo...@heliosearch.com>
>> > wrote:
>> >>
>> >> On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West <tburt...@umich.edu>
>> >> wrote:
>> >> > When testing an index of about 200 million documents, the first
>> >> > time we facet on one field (query appended below), the memory use
>> >> > rises from about 2.5 GB to 13 GB. If I run GC after the query, the
>> >> > memory use goes down to about 3 GB, and subsequent queries don't
>> >> > significantly increase the memory use.
>> >>
>> >> Is there a way to tell what the real max memory usage is?  I assume
>> >> 13GB is just the peak heap usage, but that could include a lot of
>> >> garbage.
>> >>
>> >> -Yonik
>> >> http://heliosearch.com -- making solr shine
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>>
>

