Re: 8.0 upgrade issue
Hi Shawn,

The GC seems to be the issue; changing back to CMS worked, and I see the G1 docs state that it really doesn't work well for small heap sizes. We'll be moving to a better-resourced 64-bit VM with more memory later next year with the next Ubuntu LTS release, so it should cease to be a problem after that. Thanks for the help.

Scott.

> On 11 Jul 2019, at 12:20 pm, Shawn Heisey wrote:
>
> On 6/19/2019 7:15 PM, Scott Yeadon wrote:
>> I'm running Solr on Ubuntu 18.04 (32-bit) using OpenJDK 10.0.2. Up until now
>> I have had no problem with Solr (I've been running it since 4.x); however,
>> after upgrading from 7.x to 8.x I am getting serious memory issues.
>> I have a small repository of 30,000 documents, currently using Solr 7.1 for
>> the search function (for the last two years without issue). I attempted an
>> upgrade to 8.1.1 and tried to perform a full reindex; however, it manages
>> about 1000 documents and then dies from lack of memory (or so it says). I
>> tried 8.1.0 with the same result. I then tried 8.0.0, which did successfully
>> manage a full reindex but then, after performing a couple of search queries,
>> died from lack of memory. I then tried 7.7.2, which worked fine. I have now
>> gone back to my original 7.1 as I can't risk 8.x in my production system.
>> Has anyone else had these issues with 8.x?
>> Note that I did increase Xmx to 1024m (previously 512m) but that made no
>> difference. It may be some other resource than memory, but if it is, it
>> isn't saying so, and it's such a small repository it doesn't seem to make
>> sense to be running out of memory.
>
> Solr 8 has switched the garbage collector from CMS to G1, because CMS is
> deprecated in newer versions of Java and will be removed in the near future.
>
> G1 is a more efficient collector, but it does require somewhat more memory
> beyond the heap than CMS does. For most users this is not a problem, but
> with the small heap values and total system memory you're using, it might be
> enough to go over the threshold.
>
> You could try setting the old 7.x GC_TUNE settings in your include file,
> normally named solr.in.sh on non-Windows platforms:
>
> GC_TUNE=('-XX:NewRatio=3' \
>   '-XX:SurvivorRatio=4' \
>   '-XX:TargetSurvivorRatio=90' \
>   '-XX:MaxTenuringThreshold=8' \
>   '-XX:+UseConcMarkSweepGC' \
>   '-XX:ConcGCThreads=4' '-XX:ParallelGCThreads=4' \
>   '-XX:+CMSScavengeBeforeRemark' \
>   '-XX:PretenureSizeThreshold=64m' \
>   '-XX:+UseCMSInitiatingOccupancyOnly' \
>   '-XX:CMSInitiatingOccupancyFraction=50' \
>   '-XX:CMSMaxAbortablePrecleanTime=6000' \
>   '-XX:+CMSParallelRemarkEnabled' \
>   '-XX:+ParallelRefProcEnabled' \
>   '-XX:-OmitStackTraceInFastThrow')
>
> I would probably also use Java 8 rather than Java 10. Java 10 is not an LTS
> version, and the older version might require a little bit less memory, which
> is a premium resource on your setup. Upgrading to Java 11, the next LTS
> version, would likely require even more memory.
>
> Why are you running a 32-bit OS with such a small memory size? It's not
> possible to use heap sizes much larger than 1.5 GB on a 32-bit OS. There are
> also some known bugs with running Lucene-based software on 32-bit Java, and
> one of them is specifically related to the G1 collector.
>
> Thanks,
> Shawn
8.0 upgrade issue
Hi,

I'm running Solr on Ubuntu 18.04 (32-bit) using OpenJDK 10.0.2. Up until now I have had no problem with Solr (I've been running it since 4.x); however, after upgrading from 7.x to 8.x I am getting serious memory issues.

I have a small repository of 30,000 documents, currently using Solr 7.1 for the search function (for the last two years without issue). I attempted an upgrade to 8.1.1 and tried to perform a full reindex; however, it manages about 1000 documents and then dies from lack of memory (or so it says). I tried 8.1.0 with the same result. I then tried 8.0.0, which did successfully manage a full reindex but then, after performing a couple of search queries, died from lack of memory. I then tried 7.7.2, which worked fine. I have now gone back to my original 7.1 as I can't risk 8.x in my production system. Has anyone else had these issues with 8.x?

Note that I did increase Xmx to 1024m (previously 512m) but that made no difference. It may be some other resource than memory, but if it is, it isn't saying so, and it's such a small repository it doesn't seem to make sense to be running out of memory.

Scott.
Re: Query on multivalue field
Thanks, but just to confirm the way multiValued fields work. In a multiValued field, call it field1, if I have two values indexed to this field, say value 1 = "some text...termA...more text" and value 2 = "some text...termB...more text", and do a search such as field1:(termA termB) (where <solrQueryParser defaultOperator="AND"/>), I'm getting a hit returned even though both terms don't occur within a single value in the multiValued field. What I'm wondering is whether there is a way of applying the query against each value of the field rather than against the field in its entirety. The reason is that the number of values I want to store is variable, and I'd like to avoid the use of dynamic fields or restructuring the index if possible.

Scott.

On 2/03/11 12:35 AM, Steven A Rowe wrote:

Hi Scott,

Querying against a multi-valued field just works - no special incantation required.

Steve

-----Original Message-----
From: Scott Yeadon [mailto:scott.yea...@anu.edu.au]
Sent: Monday, February 28, 2011 11:50 PM
To: solr-user@lucene.apache.org
Subject: Query on multivalue field

Hi,

I have a variable number of text-based fields associated with each primary record, across which I want to apply a search. I wanted to avoid the use of dynamic fields if possible, or having to create a different document type in the index (as the app is based around the primary record, and different views mean a lot of work to revamp pagination etc). So, is there a way to apply a query to each value of a multivalued field, or is it always treated as a single field from a query perspective? Thanks.

Scott.
Re: Query on multivalue field
The only trick with this is ensuring the searches return the right results and don't go across value boundaries. If I set the gap to the largest text size we expect (approx 5000 chars), what impact does such a large value have (i.e. does Solr physically separate these fragments in the index, or just apply the figure as part of any query)?

Scott.

On 2/03/11 9:01 AM, Ahmet Arslan wrote:

> In a multiValued field, call it field1, if I have two values indexed to this
> field, say value 1 = "some text...termA...more text" and value 2 = "some
> text...termB...more text", and do a search such as field1:(termA termB)
> (where <solrQueryParser defaultOperator="AND"/>), I'm getting a hit returned
> even though both terms don't occur within a single value in the multiValued
> field. What I'm wondering is if there is a way of applying the query against
> each value of the field rather than against the field in its entirety. The
> reason is that the number of values I want to store is variable and I'd like
> to avoid the use of dynamic fields or restructuring the index if possible.

Your best bet may be to use positionIncrementGap and issue a phrase query (implicit AND) with an appropriate slop value. If you have positionIncrementGap=100, you can simulate this using q=field1:"termA termB"~100

http://search-lucene.com/m/Hbdvz1og7D71/
Re: Query on multivalue field
Tested it out and it seems to work well as long as I set the gap to a value much larger than the text; that appears to work fine for our current data. Thanks heaps for all the help guys!

Scott.

On 2/03/11 11:13 AM, Jonathan Rochkind wrote:

Each token has a position set on it. So if you index the value "alpha beta gamma", it winds up stored in Solr as (sort of, for the way we want to look at it):

document1: alpha: position 1, beta: position 2, gamma: position 3

If you set the position increment gap large, then after one value in a multi-valued field ends, the position increment gap will be added to the positions for the next value. Solr doesn't actually internally have much of any idea of a multi-valued field; ALL a multi-valued indexed field is, is a position increment gap separating tokens from different 'values'. So if you index in a multi-valued field, with position increment gap 10000, the values [alpha beta gamma, aleph bet], you get kind of like:

document1: alpha: 1, beta: 2, gamma: 3, aleph: 10004, bet: 10005

A large position increment gap, as far as I know and can tell (please someone correct me if I'm wrong, I am not a Solr developer), has no effect on the size or efficiency of your index on disk. I am not sure why positionIncrementGap doesn't just default to a very large number, to provide behavior that more closely matches what people expect from the idea of a multi-valued field. So maybe there is some flaw in my understanding that justifies some reason for it not to be this way? But I set my positionIncrementGap very large, and haven't seen any issues.

On 3/1/2011 5:46 PM, Scott Yeadon wrote:

> The only trick with this is ensuring the searches return the right results
> and don't go across value boundaries. If I set the gap to the largest text
> size we expect (approx 5000 chars), what impact does such a large value have
> (i.e. does Solr physically separate these fragments in the index, or just
> apply the figure as part of any query)?
>
> Scott.

On 2/03/11 9:01 AM, Ahmet Arslan wrote:

> In a multiValued field, call it field1, if I have two values indexed to this
> field, say value 1 = "some text...termA...more text" and value 2 = "some
> text...termB...more text", and do a search such as field1:(termA termB)
> (where <solrQueryParser defaultOperator="AND"/>), I'm getting a hit returned
> even though both terms don't occur within a single value in the multiValued
> field. What I'm wondering is if there is a way of applying the query against
> each value of the field rather than against the field in its entirety. The
> reason is that the number of values I want to store is variable and I'd like
> to avoid the use of dynamic fields or restructuring the index if possible.

Your best bet may be to use positionIncrementGap and issue a phrase query (implicit AND) with an appropriate slop value. If you have positionIncrementGap=100, you can simulate this using q=field1:"termA termB"~100

http://search-lucene.com/m/Hbdvz1og7D71/
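Jonathan's position-numbering explanation can be sketched in a few lines. This is an illustration only, not Solr code: the whitespace tokenizer and the slop test below are simplified stand-ins (real Lucene sloppy-phrase matching is more involved), and all names are made up.

```python
# Hypothetical sketch: how a positionIncrementGap keeps a sloppy phrase
# query from matching across value boundaries in a multi-valued field.

def index_positions(values, gap):
    """Assign token positions roughly the way Lucene does for a
    multi-valued field: each new value starts 'gap' positions on."""
    positions = {}
    pos = 0
    for value in values:
        for token in value.split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap  # the positionIncrementGap between values
    return positions

def sloppy_phrase_match(positions, term_a, term_b, slop):
    """True if the two terms occur within 'slop' positions of each other
    (a simplification of real phrase-slop scoring)."""
    return any(abs(pa - pb) <= slop
               for pa in positions.get(term_a, [])
               for pb in positions.get(term_b, []))

docs = ["some text termA more text", "some text termB more text"]

# With a tiny gap, a sloppy query can match across the two values:
small = index_positions(docs, gap=1)
print(sloppy_phrase_match(small, "termA", "termB", slop=100))   # True

# With a large gap, the terms are too far apart to match:
large = index_positions(docs, gap=10000)
print(sloppy_phrase_match(large, "termA", "termB", slop=100))   # False
```

This is why the gap needs to be larger than both the slop value and the longest single value: otherwise a query can still "see" across the boundary.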
Query on multivalue field
Hi,

I have a variable number of text-based fields associated with each primary record, across which I want to apply a search. I wanted to avoid the use of dynamic fields if possible, or having to create a different document type in the index (as the app is based around the primary record, and different views mean a lot of work to revamp pagination etc). So, is there a way to apply a query to each value of a multivalued field, or is it always treated as a single field from a query perspective? Thanks.

Scott.
relational db mapping for advanced search
Hi,

I was just after some advice on how to map some relational metadata to a Solr index. The web application I'm working on is based around people, and the searching is based around properties of these people. Several properties are more complex - for example, a person's occupations have place, from/to dates and other descriptive text; texts about a person have authors, sources and publication dates.

Despite the usefulness of facets and search-based navigation, an advanced search feature is a non-negotiable requirement of the application. An advanced search needs to be able to query a person on any set of attributes (e.g. gender, birth date, death date, place of birth), including the more complex search criteria described above (occupations, texts).

Taking occupation as an example: because occupation has its own metadata, and a person could have worked an arbitrary number of occupations throughout their lifetime, I was wondering how/if this information can be denormalised into a single person index document to support such a search. I can't use text concatenation in a multivalued field, as I need to be able to run date-based range queries (e.g. publication dates, occupation dates). And I'm not sure that resorting to multiple repeated fields based on the current limits (e.g. occ1, occ1startdate, occ1enddate, occ1place, occ2, etc) is a good approach (although that would work).

If there isn't a sensible way to denormalise this, what is the best approach? For example, should I have an occupation document type, a person document type and a text/source document type, each containing the relevant person id, and (in the advanced search context) run a query against each document type and then use the intersecting set of person ids as the result used by the application for its display/pagination?

And if so, how do I ensure I capture all records? For example, if there are 100,000 hits on someone having worked in Australia in 1956, is there any way to ensure all 100,000 are returned in a query (similar to facet.limit=-1), other than specifying an arbitrarily high number in the rows parameter and hoping a query doesn't hit more than that and thus exclude those above the limit from the intersect processing? Or is there a single-query solution?

Any advice/hints welcome.

Scott.
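The application-side half of the multiple-document-type idea is simple to sketch. This is illustrative only: the id sets below are made up, standing in for what the per-type Solr queries would return.

```python
# Hypothetical sketch of the approach described in the question:
# query each document type separately, then intersect the person ids
# in the application before display/pagination.

def intersect_person_ids(*result_sets):
    """Intersect the person-id sets returned by per-document-type queries."""
    ids = set(result_sets[0])
    for rs in result_sets[1:]:
        ids &= set(rs)
    return ids

# Stand-ins for an 'occupation' query result and a 'text' query result:
occupation_hits = {"p1", "p2", "p3", "p7"}
text_hits = {"p2", "p3", "p9"}

matches = intersect_person_ids(occupation_hits, text_hits)
print(sorted(matches))  # ['p2', 'p3']
```

On the "all rows" concern: at the time of this thread the usual workaround was indeed a very large rows value; later Solr releases added cursor-based deep paging (cursorMark) for walking arbitrarily large result sets safely.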
Re: relational db mapping for advanced search
Yes, I saw something in the dev stream about compound types as well, which would also be useful (so in my example an occupation field could comprise multiple fields of different types), but these are up-and-coming features. I suspect using multiple document types is probably the best way for now, but thanks for the heads up on the join - it looks like these issues will be better addressed in the future. An RDBMS in my context won't work well, as complex searches require lots of joins (and self-joins), and in the old system these tend to lock up the DB as the temp table size grows exponentially.

Scott.

On 9/02/11 8:57 AM, Jonathan Rochkind wrote:

I have no great answer for you; this is to me a generally unanswered question. It's hard to do Solr with this sort of thing, and I think you understand it properly. There ARE some interesting new features in trunk (not 1.4) that may be relevant, although from my perspective none of them provide magic-bullet solutions. But there is a 'join' feature which could be awfully useful with the setup you suggest of having different 'types' of documents all together in the same index.

https://issues.apache.org/jira/browse/SOLR-2272

From: Scott Yeadon [scott.yea...@anu.edu.au]
Sent: Tuesday, February 08, 2011 4:41 PM
To: solr-user@lucene.apache.org
Subject: relational db mapping for advanced search

Hi,

I was just after some advice on how to map some relational metadata to a Solr index. The web application I'm working on is based around people, and the searching is based around properties of these people. Several properties are more complex - for example, a person's occupations have place, from/to dates and other descriptive text; texts about a person have authors, sources and publication dates. Despite the usefulness of facets and search-based navigation, an advanced search feature is a non-negotiable requirement of the application. An advanced search needs to be able to query a person on any set of attributes (e.g. gender, birth date, death date, place of birth), including the more complex search criteria described above (occupations, texts).

Taking occupation as an example: because occupation has its own metadata, and a person could have worked an arbitrary number of occupations throughout their lifetime, I was wondering how/if this information can be denormalised into a single person index document to support such a search. I can't use text concatenation in a multivalued field, as I need to be able to run date-based range queries (e.g. publication dates, occupation dates). And I'm not sure that resorting to multiple repeated fields based on the current limits (e.g. occ1, occ1startdate, occ1enddate, occ1place, occ2, etc) is a good approach (although that would work). If there isn't a sensible way to denormalise this, what is the best approach? For example, should I have an occupation document type, a person document type and a text/source document type, each containing the relevant person id, and (in the advanced search context) run a query against each document type and then use the intersecting set of person ids as the result used by the application for its display/pagination?

And if so, how do I ensure I capture all records? For example, if there are 100,000 hits on someone having worked in Australia in 1956, is there any way to ensure all 100,000 are returned in a query (similar to facet.limit=-1), other than specifying an arbitrarily high number in the rows parameter and hoping a query doesn't hit more than that and thus exclude those above the limit from the intersect processing? Or is there a single-query solution?

Any advice/hints welcome.

Scott.
case insensitive sort and LowerCaseFilterFactory
Hi,

I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the sorting of results. According to this page http://web.archiveorange.com/archive/v/AAfXfzy5Tm1uDy5mYW3B I should be able to configure the LowerCaseFilterFactory to ensure results will be indexed and returned in a case-insensitive manner; however, this does not appear to be working for me. Is someone able to check my field config to confirm it is ok? (And if anyone has any advice on making this work it would be appreciated - my issue is the same as that in the provided link, that is, upper case and lower case are being ordered separately instead of being interspersed.) The sort field I'm using is of type "text" as defined below. The text field type is configured as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

When I sort on a primaryName field (which is a text field as defined above), for example, I get records listed out of order as in the following example:

- Withers, Alfred Robert (1863–1956)
- Young, Charles (1838–1916)
- de Little, Ernest (1868–1926)
- de Pledge, Thomas Frederick (1867–1954)
- von Bibra, William (1876–1926)

I imagine I'm missing something obvious; the obvious workaround is a namesort field, however from the above post it looks like this can be avoided.

Scott.
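For what it's worth, the interspersed ordering being sought is exactly what a lowercased-key comparison produces, as this sketch shows (plain Python, not Solr; the usual Solr approach is a dedicated sort field whose analyzer is just a KeywordTokenizer plus LowerCaseFilterFactory, so each whole name becomes one lowercased token):

```python
# Illustration of case-sensitive vs case-insensitive sort order using
# the example names from the post above.

names = [
    "Withers, Alfred Robert (1863–1956)",
    "Young, Charles (1838–1916)",
    "de Little, Ernest (1868–1926)",
    "de Pledge, Thomas Frederick (1867–1954)",
    "von Bibra, William (1876–1926)",
]

# Codepoint order sorts all uppercase-initial names before lowercase
# prefixes like 'de' and 'von', which is the problem being described:
print(sorted(names)[0])                  # 'Withers, Alfred Robert (1863–1956)'

# Comparing on the lowercased key intersperses them as expected:
print(sorted(names, key=str.lower)[0])   # 'de Little, Ernest (1868–1926)'
```

The tokenized, stemmed "text" type above can sort oddly for another reason too: sorting on a multi-token field is unreliable, which is why a separate single-token sort field is the common recommendation.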
Re: case insensitive sort and LowerCaseFilterFactory
Sorry, looks like it was a data-related issue; apologies for the noise (although if anyone spots anything dodgy in the config, feel free to let me know).

Scott.

On 18/11/10 2:21 PM, Scott Yeadon wrote:

Hi,

I'm running solr-tomcat 1.4.0 on Ubuntu and have an issue with the sorting of results. According to this page http://web.archiveorange.com/archive/v/AAfXfzy5Tm1uDy5mYW3B I should be able to configure the LowerCaseFilterFactory to ensure results will be indexed and returned in a case-insensitive manner; however, this does not appear to be working for me. Is someone able to check my field config to confirm it is ok? (And if anyone has any advice on making this work it would be appreciated - my issue is the same as that in the provided link, that is, upper case and lower case are being ordered separately instead of being interspersed.) The sort field I'm using is of type "text" as defined below. The text field type is configured as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

When I sort on a primaryName field (which is a text field as defined above), for example, I get records listed out of order as in the following example:

- Withers, Alfred Robert (1863–1956)
- Young, Charles (1838–1916)
- de Little, Ernest (1868–1926)
- de Pledge, Thomas Frederick (1867–1954)
- von Bibra, William (1876–1926)

I imagine I'm missing something obvious; the obvious workaround is a namesort field, however from the above post it looks like this can be avoided.

Scott.
Re: [PECL-DEV] Re: PHP Solr API
Thanks. Not sure what the value should be (I assume it is the servlet name, but is there a default servlet name for term vectors? The docs don't really say much, so any guidance would be useful). It also looks like using ModifiableParams returns only a single offset for each term, i.e. if tf > 1, there still appears to be only a single offset returned rather than details of all occurrences. I logged some output to see all the properties returned, and the snippet showing offsets is:

[01-Oct-2010 15:42:30] class is SolrObject name is sit
[01-Oct-2010 15:42:30] value=6 propname=tf
[01-Oct-2010 15:42:30] class is SolrObject name is offsets
[01-Oct-2010 15:42:30] value=1171 propname=start
[01-Oct-2010 15:42:30] value=1174 propname=end

So in the above example tf=6, but only a single start and end property is returned in the "sit" SolrObject. Note that I haven't looked any further into this at this stage and I may be missing something obvious.

Scott.

On 1/10/10 9:43 PM, Israel Ekpo wrote:

Scott,

You can also use the SolrClient::setServlet() method with SolrClient::TERMS_SERVLET_TYPE as the type:

http://www.php.net/manual/en/solrclient.setservlet.php

On Fri, Oct 1, 2010 at 12:57 AM, Scott Yeadon <scott.yea...@anu.edu.au> wrote:

Hi,

Sorry, scrap that; I just found that SolrQuery is a subclass of ModifiableParams, so I can do this via the add method, and it seems to work ok. Apologies for the noise.

Scott.

On 1/10/10 2:35 PM, Scott Yeadon wrote:

Hi,

Just wondering if there is a way of setting the qt parameter in the Solr PHP API. I want to use the Term Vector Component, but I'm not sure this is supported in the API? Thanks.

Scott.

--
PECL development discussion Mailing List (http://pecl.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

--
°O° "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
PHP Solr API
Hi,

I have inherited an application which uses Solr search and the PHP Solr API (http://pecl.php.net/package/solr). While the list of search results with appropriate highlighting is all good, when selecting a result that navigates to an individual article, the users want to have all the hits highlighted in the full text. The problem is that the article text is HTML, and Solr appears to strip the HTML by default. The highlight snippets contain no formatting, and neither does the stored version of the text. This means that using a large snippet size and using the returned text as the article text is not satisfactory, nor is using the stored version returned in the response. Obtaining offset information from the search and applying the highlighting myself within the webapp using the HTML version would be fine, but the offsets will be wrong due to the stripping of the tags. Does anyone have any advice on how I might get this to work? It doesn't seem to be a particularly unusual use case, yet I could not find information on how to achieve it. It's likely I'm overlooking something simple. Anyone have any advice? Thanks.

Scott.
Re: PHP Solr API
Thanks, but I still need to store the text at any rate in order to get the highlighted snippets for the search results list. This isn't a problem. The issue is how to obtain correct offsets, or some other mechanism for being able to display the original HTML text plus term highlighting when navigating to an individual search result.

Scott.

On 1/10/10 12:53 PM, Neil Lunn wrote:

On Fri, 2010-10-01 at 12:00 +1000, Scott Yeadon wrote:

> Hi, The problem is that the article text is HTML and Solr appears to strip
> the HTML by default.

I think what you need to look at is how the fields are defined by default in your schema. If data sent as HTML is being added to the standard html-text type and stored, then the HTML is stripped and the words are indexed by default. If you want to store the raw HTML, then maybe you should be doing that, and not storing the stripped version, just indexing it.
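One way to attack the offset problem described above is to do the tag stripping in the web app itself and record, for each character of the stripped text, its offset in the original HTML. This is a hypothetical sketch (a naive regex stripper, not Solr's HTMLStripCharFilter, and it ignores entities), just to show the mapping idea:

```python
import re

def strip_html_with_offsets(html):
    """Strip <...> tags, returning (stripped_text, offset_map) where
    offset_map[i] is the original-HTML offset of stripped_text[i]."""
    text_chars, offset_map = [], []
    pos = 0
    for match in re.finditer(r"<[^>]*>", html):
        for i in range(pos, match.start()):  # keep text before the tag
            text_chars.append(html[i])
            offset_map.append(i)
        pos = match.end()                    # skip over the tag itself
    for i in range(pos, len(html)):          # trailing text after last tag
        text_chars.append(html[i])
        offset_map.append(i)
    return "".join(text_chars), offset_map

html = "<p>The <b>quick</b> fox</p>"
text, offsets = strip_html_with_offsets(html)
print(text)                    # 'The quick fox'
# A term hit reported against the stripped text at positions 4..8
# ('quick') maps back to its place in the original HTML:
print(offsets[4], offsets[8])  # 10 14
```

With a map like this, offsets obtained against stripped text can be translated back so highlight markup can be injected into the raw HTML; the real work is keeping the app-side stripping identical to whatever the index side does.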
Re: Highlighting match term in bold rather than italic
Check out http://wiki.apache.org/solr/HighlightingParameters and the hl.simple.pre/hl.simple.post options.

You may also be able to control the display of the default <em> via CSS, but it will depend on your rendering context as to whether this is feasible.

Scott.

On 1/10/10 7:54 AM, efr...@gmail.com wrote:

Hi all -

Does anyone know how to produce Solr results where the match term is highlighted in bold rather than italic?

thanks in advance,
Brad
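The hl.simple.pre/hl.simple.post suggestion above amounts to something like this (an illustrative sketch; the parameter names are from the wiki page linked above, but the query and snippet are made up):

```python
# Request parameters that switch the highlight wrapper from the default
# <em>...</em> to <b>...</b> (illustrative query values):
params = {
    "q": "title:solr",
    "hl": "true",
    "hl.fl": "title",
    "hl.simple.pre": "<b>",    # default is <em>
    "hl.simple.post": "</b>",  # default is </em>
}

# What the change amounts to in a returned snippet:
snippet = "An introduction to <em>Solr</em> highlighting"
bolded = snippet.replace("<em>", params["hl.simple.pre"]) \
                .replace("</em>", params["hl.simple.post"])
print(bolded)  # 'An introduction to <b>Solr</b> highlighting'
```

Setting the parameters server-side is cleaner than post-processing snippets like this, but the string replacement shows exactly what the two parameters control.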