Re: Solr Suggest Component and OOM

2018-07-01 Thread Ratnadeep Rakshit
Has anyone ever been successful in processing 150M records using the
Suggester Component? If any of the component's developers are reading, please comment.



Re: Solr Suggest Component and OOM

2018-06-25 Thread Ratnadeep Rakshit
The site_address field contains all the addresses in the United States. The
idea is to build something similar to Google Places autosuggest.

Here's an example query:
curl "http://localhost/solr/addressbook/suggest?suggest.q=1054%20club&wt=json"
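For reference, the same request can be assembled in code so that the query value is percent-encoded and the parameters are joined with `&`. A minimal Java (10+) sketch; the class and method names are hypothetical, and the host, core and handler path are simply the ones from the example query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a suggest request URL like the curl example. */
final class SuggestUrl {

    static String build(String baseUrl, String core, String query) {
        // URLEncoder performs form encoding: spaces become '+', '&' becomes %26, etc.
        // Swap '+' for %20 to match the curl example; a literal '+' in the input
        // was already encoded as %2B, so only spaces are affected.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUrl + "/solr/" + core + "/suggest?suggest.q=" + q + "&wt=json";
    }

    public static void main(String[] args) {
        // Prints http://localhost/solr/addressbook/suggest?suggest.q=1054%20club&wt=json
        System.out.println(build("http://localhost", "addressbook", "1054 club"));
    }
}
```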

Response:

{
  "responseHeader": {
    "status": 0,
    "QTime": 3125,
    "params": {
      "suggest.q": "1054 club",
      "wt": "json"
    }
  },
  "suggest": {
    "mySuggester2": {
      "1054 club": {
        "numFound": 3,
        "suggestions": [{
          "term": "1054 null N COUNTRY CLUB null BLVD null STOCKTON CA 95204 5008",
          "weight": 0,
          "payload": "0023865882|06077|37.970769,-121.310433"
        }, {
          "term": "1054 null E HERITAGE CLUB null CIR null DELRAY BEACH FL 33483 3482",
          "weight": 0,
          "payload": "0117190535|12099|26.445485,-80.069336"
        }, {
          "term": "1054 null null CORAL CLUB null DR 1054 CORAL SPRINGS FL 33071 5657",
          "weight": 0,
          "payload": "0111342342|12011|26.243918,-80.267577"
        }]
      }
    },
    "mySuggester1": {
      "1054 club": {
        "numFound": 0,
        "suggestions": []
      }
    }
  }
}
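The suggestions carry a `|`-separated payload and literal `null` tokens inside the term. A small Java sketch for consuming them on the client side; the meaning of the payload fields (a record id, a county FIPS code, then "latitude,longitude") is an assumption read off the values, not something stated in the thread, and the class is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical client-side view of one suggestion's payload. */
final class AddressPayload {
    final String recordId;   // ASSUMPTION: first field is a record id
    final String countyFips; // ASSUMPTION: second field is a county FIPS code
    final double lat;        // ASSUMPTION: third field is "latitude,longitude"
    final double lon;

    private AddressPayload(String recordId, String countyFips, double lat, double lon) {
        this.recordId = recordId;
        this.countyFips = countyFips;
        this.lat = lat;
        this.lon = lon;
    }

    static AddressPayload parse(String payload) {
        // '|' is a regex metacharacter, so it must be escaped for String.split.
        String[] parts = payload.split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected payload: " + payload);
        }
        String[] latLon = parts[2].split(",");
        if (latLon.length != 2) {
            throw new IllegalArgumentException("unexpected coordinates: " + parts[2]);
        }
        return new AddressPayload(parts[0], parts[1],
                Double.parseDouble(latLon[0]), Double.parseDouble(latLon[1]));
    }

    /** Removes literal "null" tokens from a suggestion term for display. */
    static String cleanTerm(String term) {
        return Arrays.stream(term.trim().split("\\s+"))
                .filter(token -> !token.equals("null"))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        AddressPayload p = parse("0023865882|06077|37.970769,-121.310433");
        System.out.println(p.recordId + " " + p.countyFips + " " + p.lat + "," + p.lon);
        System.out.println(cleanTerm("1054 null N COUNTRY CLUB null BLVD null STOCKTON CA 95204 5008"));
    }
}
```

If the `null` tokens come from empty address components being concatenated at index time (again an assumption), it is probably better to skip them when building site_address, since they are analysed and indexed along with the real tokens.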

When I build with 25M address records in the addressbook core, the process
runs smoothly, and heap utilization peaks at about 56% of the 20GB allotted to
Solr.
I am not very experienced in measuring Solr performance, but it looks like
the build fails once the core holds more than 25M records. Querying the
suggester still works.
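A rough way to read these numbers: if peak heap grew linearly with the number of records (an optimistic assumption; FST construction does not scale that smoothly), the full core would need several times the allotted heap. A minimal Java sketch of that arithmetic, using the 20GB / 56% / 25M figures above and the 153,242,074 numDocs from the original post; the class is hypothetical.

```java
/**
 * Back-of-envelope heap estimate for the suggester build.
 * ASSUMPTION: peak heap scales linearly with the number of documents
 * fed to the build; real FST growth is not strictly linear.
 */
final class HeapEstimate {

    /** Scales an observed peak heap (GB) from observedDocs up to targetDocs. */
    static double extrapolateGb(double heapGb, double peakFraction,
                                long observedDocs, long targetDocs) {
        double observedPeakGb = heapGb * peakFraction;
        return observedPeakGb * ((double) targetDocs / observedDocs);
    }

    public static void main(String[] args) {
        // 56% of a 20GB heap at 25M records; the full core has 153,242,074 docs.
        double estimate = extrapolateGb(20.0, 0.56, 25_000_000L, 153_242_074L);
        System.out.printf("Estimated peak heap: %.1f GB (heap: 20 GB)%n", estimate);
    }
}
```

Even under this optimistic model the estimate comes out at roughly 69GB, far above 20GB, which suggests that simply raising the heap is not a practical fix at this data size.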

Does that answer your questions?

On Tue, Jun 12, 2018 at 3:17 PM, Alessandro Benedetti 
wrote:

> Hi,
> first of all, the two suggesters you are using are based on
> different data structures (with different memory utilisation):
>
> - FuzzyLookupFactory -> FST (in memory, and stored in binary form on disk)
> - AnalyzingInfixLookupFactory -> auxiliary Lucene index
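To illustrate the difference in lookup style (not memory layout), here is a Java sketch that uses a sorted java.util.TreeMap as a stand-in for a prefix-based lookup. It is NOT an FST, which shares prefixes and suffixes in a compact automaton; it only shows that a prefix lookup answers from one contiguous run of sorted keys. The class is hypothetical, and keys are lowercased by hand in place of an analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical prefix-only suggester backed by a sorted map (a stand-in, not an FST). */
final class PrefixSuggest {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void add(String term, String payload) {
        entries.put(term, payload);
    }

    /** Returns up to limit terms that start with prefix, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // Keys are sorted, so all matches form one contiguous run starting at prefix.
        for (String term : entries.tailMap(prefix, true).keySet()) {
            if (out.size() == limit || !term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixSuggest s = new PrefixSuggest();
        s.add("1054 n country club blvd stockton ca", "0023865882");
        s.add("1054 e heritage club cir delray beach fl", "0117190535");
        s.add("1055 main st", "hypothetical"); // hypothetical extra entry
        System.out.println(s.suggest("1054", 10));      // both "1054 ..." entries
        System.out.println(s.suggest("1054 club", 10)); // [] because "club" is not a prefix
    }
}
```

This is probably also why, in the example response, the FuzzyLookupFactory-based mySuggester1 returns nothing for "1054 club" while the AnalyzingInfixLookupFactory-based mySuggester2 matches "club" in the middle of the address.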
>
> Both data structures should be very memory efficient (both when building
> and in storage).
> What is the cardinality of the fields you are building suggestions from
> (site_address and site_address_other)?
> What is the memory situation in Solr when you start building the
> suggester?
> You are allocating much more memory to the Solr JVM process than you
> leave to the OS, so in your situation the OS cannot cache the entire
> index, which would be the ideal scenario.
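The arithmetic behind this point, using the figures from the original post (32GB RAM, 20GB heap, 30.29GB index). A Java sketch; it ignores the JVM's non-heap memory and any other processes, so the real amount left for the OS page cache is lower still. The class is hypothetical.

```java
/** Hypothetical check of how much of the index the OS page cache could hold. */
final class PageCacheCheck {

    /** RAM left to the OS after the JVM heap (ignores non-heap JVM memory). */
    static double osCacheGb(double totalRamGb, double heapGb) {
        return totalRamGb - heapGb;
    }

    /** Upper bound on the fraction of the index that fits in the page cache. */
    static double cachedFraction(double totalRamGb, double heapGb, double indexGb) {
        return Math.min(1.0, osCacheGb(totalRamGb, heapGb) / indexGb);
    }

    public static void main(String[] args) {
        double total = 32.0, heap = 20.0, index = 30.29;
        System.out.printf("Left for the OS: %.1f GB; index: %.2f GB; at most %.0f%% cacheable%n",
                osCacheGb(total, heap), index, 100 * cachedFraction(total, heap, index));
    }
}
```

With these numbers about 12GB is left for the OS, so at most roughly 40% of the 30.29GB index can stay cached.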
>
> I would recommend putting some monitoring in place (there are plenty of
> open source tools to do that).
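As a starting point for monitoring, the heap figures come from the standard java.lang.management API. A minimal Java sketch that polls the heap of the JVM it runs in; to watch Solr itself, a JMX client such as jconsole or VisualVM can read the same java.lang:type=Memory MBean from the Solr JVM, assuming JMX access is enabled for it. The class is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical in-process heap poller using the platform MemoryMXBean. */
final class HeapMonitor {

    /** Used heap as a fraction of max heap, or -1 if the max is undefined. */
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when no maximum is defined
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll a few times; a real setup would export this to a monitoring system.
        for (int i = 0; i < 3; i++) {
            System.out.printf("heap used: %.1f%%%n", 100 * heapUsedFraction());
            Thread.sleep(1000);
        }
    }
}
```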
>
> Regards
>
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Solr Suggest Component and OOM

2018-06-14 Thread Ratnadeep Rakshit
Can anyone from the Solr team shed some more light on this?



Re: Solr Suggest Component and OOM

2018-06-12 Thread Ratnadeep Rakshit
I observed that the build works if the data size is below 25M records. As
soon as the record count goes beyond that, this OOM error shows up. Solr
itself shows 56% usage of the 20GB heap during the build. Are there some
settings I need to change to handle a larger data size?



Re: Solr Suggest Component and OOM

2018-06-12 Thread Ratnadeep Rakshit
Can anyone shed some light on this?

On Tue, Jun 12, 2018 at 12:32 AM, Ratnadeep Rakshit 
wrote:

> Here's the stack trace:
>
> ERROR - 2018-06-07 09:07:36.030; [   x:addressbook] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:607)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:475)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:499)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at org.apache.lucene.util.packed.Packed64.<init>(Packed64.java:73)
>     at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:1009)
>     at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:976)
>     at org.apache.lucene.util.packed.GrowableWriter.ensureCapacity(GrowableWriter.java:80)
>     at org.apache.lucene.util.packed.GrowableWriter.set(GrowableWriter.java:88)
>     at org.apache.lucene.util.packed.AbstractPagedMutable.set(AbstractPagedMutable.java:101)
>     at org.apache.lucene.util.fst.NodeHash.addNew(NodeHash.java:152)
>     at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:169)
>     at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:133)
>     at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:215)
>     at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:310)
>     at org.apache.lucene.util.fst.Builder.add(Builder.java:417)
>     at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:565)
>     at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)
>     at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:176)
>

Re: Solr Suggest Component and OOM

2018-06-11 Thread Ratnadeep Rakshit
)

    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)

WARN  - 2018-06-07 09:07:36.053; [   x:addressbook] org.eclipse.jetty.servlet.ServletHandler; Error for /solr/addressbook/suggest
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.packed.Packed64.<init>(Packed64.java:73)
    at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:1009)
    at org.apache.lucene.util.packed.PackedInts.getMutable(PackedInts.java:976)
    at org.apache.lucene.util.packed.GrowableWriter.ensureCapacity(GrowableWriter.java:80)
    at org.apache.lucene.util.packed.GrowableWriter.set(GrowableWriter.java:88)
    at org.apache.lucene.util.packed.AbstractPagedMutable.set(AbstractPagedMutable.java:101)
    at org.apache.lucene.util.fst.NodeHash.addNew(NodeHash.java:152)
    at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:169)
    at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:133)
    at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:215)
    at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:310)
    at org.apache.lucene.util.fst.Builder.add(Builder.java:417)
    at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:565)
    at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)
    at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:176)
    at org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:246)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
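Reading the innermost frames: the OOM is thrown while Lucene's FST Builder adds entries and its NodeHash rehashes, called from AnalyzingSuggester.build. FuzzyLookupFactory creates a FuzzySuggester, which extends AnalyzingSuggester, while AnalyzingInfixLookupFactory does not go through this path, so the failing build appears to be mySuggester1. As far as I can tell from the Lucene 5.x sources, NodeHash.rehash allocates a table twice the current size while the old one is still live. The Java sketch below models that generic doubling pattern with illustrative parameters (16 initial slots, a 50% load limit, a fixed byte cost per slot; none of these are NodeHash's real values) to show why the peak during a rehash exceeds the steady-state size. The class is hypothetical.

```java
/**
 * Models a hash table that doubles when more than half full. ILLUSTRATIVE ONLY:
 * the initial size, load limit and per-slot cost are assumptions, not Lucene's
 * NodeHash parameters.
 */
final class RehashPeak {

    /** Returns {finalTableBytes, peakBytes} after inserting n entries. */
    static long[] simulate(long n, long bytesPerSlot) {
        long slots = 16;
        long peak = slots * bytesPerSlot;
        // Keep doubling until the load factor n / slots is at most 0.5.
        while (n > slots / 2) {
            long newSlots = slots * 2;
            // During the rehash the old and the new table are both reachable.
            peak = Math.max(peak, (slots + newSlots) * bytesPerSlot);
            slots = newSlots;
        }
        return new long[] {slots * bytesPerSlot, peak};
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long[] r = simulate(100_000_000L, 8);
        // Prints: final table 2048 MiB, peak during rehash 3072 MiB
        System.out.println("final table " + r[0] / mib + " MiB, peak during rehash "
                + r[1] / mib + " MiB");
    }
}
```

In this model the peak is 1.5x the final table, which is one plausible reason heap usage can look comfortable right up to a sudden OutOfMemoryError.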




On Mon, Jun 11, 2018 at 11:34 PM, Christopher Schultz <
ch...@christopherschultz.net> wrote:

> Ratnadeep,

Solr Suggest Component and OOM

2018-06-11 Thread Ratnadeep Rakshit
I am using the Solr Suggester component in Solr 5.5 with a lot of address
data. I have allotted 20GB of RAM to Solr, and the machine has 32GB of RAM in
total.

I have an address book core with the following vitals:

"numDocs"=153242074
"segmentCount"=34
"size"=30.29 GB

My solrconfig.xml looks something like this -



  mySuggester1
  FuzzyLookupFactory
  suggester_fuzzy_dir

  

  DocumentDictionaryFactory
  site_address
  suggestType
  property_metadata
  false
  false


  mySuggester2
  AnalyzingInfixLookupFactory
  suggester_infix_dir

  DocumentDictionaryFactory
  site_address_other
  suggestType
  property_metadata
  false
  false



The handler is defined like so -



  true
  10
  mySuggester1
  mySuggester2
  false
  explicit


  suggest



*Problem Statement*

Every time I try to build the suggester using the suggest.build=true URL
parameter, I end up with an OutOfMemoryError. I have no idea how to make
this work with the current setup. Can anyone explain why this is happening,
and how I can fix it?
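Because the handler appears to list both mySuggester1 and mySuggester2 as default dictionaries, a plain suggest.build=true builds both in one request. One way to isolate the failing dictionary is to build them one at a time. A Java sketch that generates those request URLs; it assumes (from my reading of Solr's SuggestComponent) that suggest.build=true builds only the dictionaries named by suggest.dictionary, and that the dictionary names sit in the handler's defaults so a request value replaces them. The host is the one from the example query in this thread, and the class is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator of one suggest.build request per dictionary. */
final class SuggestBuildUrls {

    static List<String> buildUrls(String baseUrl, String core, List<String> dictionaries) {
        List<String> urls = new ArrayList<>();
        for (String dict : dictionaries) {
            // ASSUMPTION: suggest.dictionary in the request overrides the handler default,
            // so each request builds only the named suggester.
            urls.add(baseUrl + "/solr/" + core + "/suggest?suggest.build=true"
                    + "&suggest.dictionary=" + URLEncoder.encode(dict, StandardCharsets.UTF_8)
                    + "&wt=json");
        }
        return urls;
    }

    public static void main(String[] args) {
        buildUrls("http://localhost", "addressbook", List.of("mySuggester1", "mySuggester2"))
                .forEach(System.out::println);
    }
}
```

This only isolates the failure; if mySuggester1 alone still runs out of memory, the FST for site_address probably does not fit in the current heap at this data size.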
*StackOverflow:*
https://stackoverflow.com/questions/50802122/solr-suggest-component-and-outofmemory-error