Re: Solr-Ajax client
On 3/11/2014 11:48 PM, Davis Marques wrote:
> Just a quick announcement and request for guidance: I've developed an open source, Javascript client for Apache Solr. It's very easy to implement and can be configured to provide faceted search to an existing Solr index in just a few minutes. The source is available online here: https://bitbucket.org/esrc/eaccpf-ajax
> I attempted to add a note about it into the Solr wiki, at https://wiki.apache.org/solr/IntegratingSolr, but was prevented by the system. Is there some protocol for posting information to the wiki?

Just give us your username on the wiki and someone will get you added. Note that it is case sensitive, at least when adding it to the permission group.

This is a nice bit of work that you've done, but I'm sure you know that it is inherently unsafe to use a Javascript Solr client on a website that is accessible to the Internet. Exposing a Solr server directly to the Internet is a bad idea. Do you offer any documentation telling potential users how to configure a proxy server to protect Solr? It looks like the Solr server in your online demo is protected by nginx. I'm sure that building its configuration was not a trivial task.

Thanks,
Shawn
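For reference, a minimal nginx reverse-proxy sketch along the lines Shawn describes would expose only the select handler and deny everything else; hostnames, ports, and paths here are assumptions, not taken from the demo's actual configuration:

    server {
        listen 80;
        # pass read-only search requests through to the Solr backend
        location /solr/collection1/select {
            proxy_pass http://127.0.0.1:8983;
        }
        # deny update, admin, and all other Solr handlers
        location /solr/ {
            deny all;
        }
    }

Because the longest matching location prefix wins, only /solr/collection1/select reaches Solr; a request to /solr/update or /solr/admin falls through to the deny rule.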
Re: Partial Counts in SOLR
As Hoss pointed out above, different projects have different requirements. Some want to sort by date of ingestion in reverse, which means that having posting lists organized in reverse order with early termination is the way to go (there is no such feature in Solr directly). Other projects want to collect all docs matching a query and then sort by rank, but you cannot guarantee that the most recently inserted document is the most relevant in terms of your ranking. Do your current searches take too long?

On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:
> It's a long video and I will definitely go through it, but it seems this is not possible with SOLR as it is? I just thought it would be quite a common issue; I mean generally for search engines it's more important to show the first-page results, rather than using timeAllowed, which might not even return a single result. Thanks!
> --
> Regards,
> Salman Akram

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
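For reference, timeAllowed is just a request parameter expressed in milliseconds; a query capped at 500 ms could look like the sketch below (host, collection, and field names are placeholders). When the limit is reached, Solr sets partialResults=true in the response header and returns whatever documents were collected so far:

    http://localhost:8983/solr/collection1/select?q=title:foo&timeAllowed=500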
Re: PHP Solr Client - spellchecker
Thank you very much Shawn. Now I'm trying to use a prefix for my suggestions instead. Best regards, Chun

--
View this message in context: http://lucene.472066.n3.nabble.com/PHP-Solr-Client-spellchecker-tp4122780p4123054.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: NOT SOLVED searches for single char tokens instead of from 3 upwards
I now have the following:

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>

The GUI analysis shows me that WDF doesn't cut the underscore anymore, but it still returns 0 results? Output:

  <lst name="debug">
    <str name="rawquerystring">yh_cug</str>
    <str name="querystring">yh_cug</str>
    <str name="parsedquery">(+DisjunctionMaxQuery((tags:yh_cug^10.0 | links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0)) ((expiration:[1394619501862 TO *] (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) FunctionQuery((div(int(clicks),max(int(displays),const(1))))^8.0))/no_coord</str>
    <str name="parsedquery_toString">+(tags:yh_cug^10.0 | links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0 | editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *] (+*:* -expiration:*))^6.0) (div(int(clicks),max(int(displays),const(1))))^8.0</str>
    <lst name="explain"/>
    <arr name="expandedSynonyms">
      <str>yh_cug</str>
    </arr>
    <lst name="reasonForNotExpandingSynonyms">
      <str name="name">DidntFindAnySynonyms</str>
      <str name="explanation">No synonyms found for this query. Check your synonyms file.</str>
    </lst>
    <lst name="mainQueryParser">
      <str name="QParser">ExtendedDismaxQParser</str>
      <null name="altquerystring"/>
      <arr name="boost_queries">
        <str>(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
      </arr>
      <arr name="parsed_boost_queries">
        <str>(expiration:[1394619501862 TO *] (+MatchAllDocsQuery(*:*) -expiration:*))^6.0</str>
      </arr>
      <arr name="boostfuncs">
        <str>div(clicks,max(displays,1))^8</str>
      </arr>
    </lst>
    <lst name="synonymQueryParser">
      <str name="QParser">ExtendedDismaxQParser</str>
      <null name="altquerystring"/>
      <arr name="boostfuncs">
        <str>div(clicks,max(displays,1))^8</str>
      </arr>
    </lst>
    <lst name="timing">

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Tuesday, 11 March 2014 14:25
To: solr-user@lucene.apache.org
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 upwards

The usual use of an ngram filter is at index time and not at query time. What exactly are you trying to achieve by using ngram filtering at query time as well as index time? Generally, it is inappropriate to combine the word delimiter filter with the standard tokenizer - the latter removes the punctuation that normally influences how WDF treats the parts of a token. Use the whitespace tokenizer if you intend to use WDF. Which query parser are you using? What fields are being queried? Please post the parsed query string from the debug output - it will show the precise generated query. I think what you are seeing is that the ngram filter is generating tokens like h_cugtest and then the WDF is removing the underscore and then h gets generated as a separate token.

-- Jack Krupansky

-----Original Message-----
From: Andreas Owen
Sent: Tuesday, March 11, 2014 5:09 AM
To: solr-user@lucene.apache.org
Subject: RE: NOT SOLVED searches for single char tokens instead of from 3 upwards

I got it right the first time and here is my request handler. The field plain_text is searched correctly and has the same fieldtype as title - text_de

  <queryParser name="synonym_edismax" class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
    <lst name="synonymAnalyzers">
      <lst name="myCoolAnalyzer">
        <lst name="tokenizer">
          <str name="class">standard</str>
        </lst>
        <lst name="filter">
          <str name="class">shingle</str>
          <str name="outputUnigramsIfNoShingles">true</str>
          <str name="outputUnigrams">true</str>
          <str name="minShingleSize">2</str>
          <str name="maxShingleSize">4</str>
        </lst>
        <lst name="filter">
          <str name="class">synonym</str>
          <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
          <str name="synonyms">synonyms.txt</str>
          <str name="expand">true</str>
          <str name="ignoreCase">true</str>
        </lst>
      </lst>
    </lst>
  </queryParser>

  <requestHandler name="/select2" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="defType">synonym_edismax</str>
      <str name="synonyms">true</str>
      <str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5</str>
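For readers wondering what the types="at-under-alpha.txt" file above does: WordDelimiterFilterFactory accepts a character-type mapping file, and keeping "_" and "@" from being split points implies a mapping roughly like the following (the actual file contents were not posted, so this is an assumed sketch):

  # treat these characters as ordinary letters so WDF does not split on them
  @ => ALPHA
  _ => ALPHA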
Solr read-only mode with same datadir: commits are not working.
Hey guys, I've been doing some tests sharing the same index between three Solr servers:

*SolrA*: is allowed to both read and index. The index is stored on NFS. It has its own configuration files.
*SolrB and SolrC*: they can only read from the shared index and each one has its own configuration files.

Solrconfig.xml has been changed with the following parameter:

  <lockType>single</lockType>

When all servers start up they all work perfectly executing search operations. The problem appears when SolrA indexes new documents (committing itself after that indexing operation). If I manually execute a commit or a softCommit on SolrB or SolrC, they are not able to see the new documents added, even though a new searcher is supposed to be opened when a commit occurs. I have noticed that a commit operation in SolrA shows different segments (the newest ones) compared with the logs that SolrB/SolrC show after a commit. In other words, SolrA shows newer segments and SolrB/SolrC appear to see just the old ones. Is that normal? Any idea or suggestion to solve this?

Thank you in advance, :-)

Best regards,
--
- Luis Cappa
Re: Solr read-only mode with same datadir: commits are not working.
I've seen that StandardDirectoryReader appears in the commit logs. Maybe this DirectoryReader type is somehow caching the old segments in SolrB and SolrC even if they have been committed previously. If that's true, does any other DirectoryReader type exist (I don't know, SimpleDirectoryReader or FSDirectoryReader) that always reads the current segments when a commit happens?

2014-03-12 11:35 GMT+01:00 Luis Cappa Banda luisca...@gmail.com:
> [...]

--
- Luis Cappa
Re: NOT SOLVED searches for single char tokens instead of from 3 upwards
You didn't show the new index analyzer - it's tricky to assure that index and query are compatible, but the Admin UI Analysis page can help. Generally, using pure defaults for WDF is not what you want, especially for query time. Usually there needs to be a slight asymmetry between index and query for WDF - index generates more terms than query.

-- Jack Krupansky

-----Original Message-----
From: Andreas Owen
Sent: Wednesday, March 12, 2014 6:20 AM
To: solr-user@lucene.apache.org
Subject: RE: NOT SOLVED searches for single char tokens instead of from 3 upwards

> [...]

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Tuesday, 11 March 2014 14:25
To: solr-user@lucene.apache.org
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 upwards

> [...]
Re: Solr use with Cloudera HDFS failed creating directory
Was anyone able to sort this one out? I'm hitting the same error. Is there a way to fix this by copying the right version of the jars? I tried copying an older version of the jar into the Solr lib but get the same error. Solr: 4.6.1 Hadoop: 2.0.0..CDH

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-use-with-Cloudera-HDFS-failed-creating-directory-tp4109143p4123082.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re[2]: NOT SOLVED searches for single char tokens instead of from 3 upwards
Yes, that is exactly what happened in the analyzer. The term I searched for was listed on both sides (index and query). Here's the rest:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

-----Original Message-----
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Date: 12/03/2014 13:25
Subject: Re: NOT SOLVED searches for single char tokens instead of from 3 upwards

> [...]
Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7
Hi, the Apache Lucene/Solr committers decided with a large majority on the vote to require Java 7 for the next minor release of Apache Lucene and Apache Solr (version 4.8)! Support for Java 6 by Oracle already ended more than a year ago, and Java 8 is coming out in a few days.

The next release will also contain some improvements for Java 7:
- Better file handling (especially on Windows) in the directory implementations. Files can now be deleted on Windows although the index is still open - like it was always possible on Unix environments (delete-on-last-close semantics).
- Speed improvements in sorting comparators: sorting now uses Java 7's own comparators for integer and long sorts, which are highly optimized by the Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your infrastructure to Java 7. Please be aware that you must use at least Java 7u1. The recommended version at the moment is Java 7u25. Later versions like 7u40, 7u45, ... have a bug causing index corruption. Ideally use the Java 7u60 prerelease, which has fixed this bug. Once 7u60 is out, this will be the recommended version.

In addition, there is no Oracle/BEA JRockit available for Java 7; use the official Oracle Java 7. JRockit was never working correctly with Lucene/Solr (causing index corruption), so this should not be an issue for you. Please also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease versions of Java 8, so you can also give it a try! Looking forward to the official Java 8 release next week - I will run my indexes with that version for sure!

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: Solr read-only mode with same datadir: commits are not working.
Hi again! I'm diving inside the DirectUpdateHandler2 code and it seems that the problem is that on a commit, when core.openNewSearcher(true, true) is called, it returns a RefCounted<SolrIndexSearcher> with a new searcher reference that points to an old (probably cached somehow) data dir. I've tried core.openNewSearcher(false, false) but it doesn't work. What I think I need is simple: after a commit, the SolrIndexSearcher must be reloaded with a recent index snapshot, not using any NRT caching method or whatever.

  (...)
  synchronized (solrCoreState.getUpdateLock()) {
    if (ulog != null) ulog.preSoftCommit(cmd);
    if (cmd.openSearcher) {
      core.getSearcher(true, false, waitSearcher);
    } else {
      // force open a new realtime searcher so realtime-get and versioning code can see the latest
      RefCounted<SolrIndexSearcher> searchHolder = core.openNewSearcher(true, true);
      searchHolder.decref();
    }
    if (ulog != null) ulog.postSoftCommit(cmd);
  }

It seems that executing this returns a new SolrIndexSearcher, but I don't know how to set that new SolrIndexSearcher on the SolrCore instance:

  SolrIndexSearcher searcher = core.newSearcher("Last update searcher");

Does anybody know if that's possible? Thanks in advance!

Best,

2014-03-12 12:10 GMT+01:00 Luis Cappa Banda luisca...@gmail.com:
> [...]

--
- Luis Cappa
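As a side note, openSearcher can also be requested explicitly as a commit parameter on the update handler, which is the HTTP-level equivalent of the cmd.openSearcher branch in the code above; a sketch against one of the read-only nodes (host and core names are placeholders):

  http://solrB:8983/solr/collection1/update?commit=true&openSearcher=true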
IDF maxDocs / numDocs
I am noticing that maxDocs between replicas is consistently different, and since it is used in the idf calculation, idf scores for the same query/doc differ between replicas. Obviously an optimize can normalize the maxDocs values, but that is only temporary. Is there a way to have idf use numDocs instead (as it should be consistent across replicas)? thanks, steve
RE: IDF maxDocs / numDocs
Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in idfExplain, but there's also a docCount(). We use docCount in all our custom similarities, also because it allows you to have multiple languages in one index where one is much larger than the other. The small language will have very high IDF scores using maxDoc, but they are proportional enough using docCount(). Using docCount() also fixes SolrCloud ranking problems, unless one of your replicas becomes inconsistent ;)

https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/CollectionStatistics.html#docCount%28%29

-----Original message-----
From: Steven Bower smb-apa...@alcyon.net
Sent: Wednesday 12th March 2014 16:08
To: solr-user solr-user@lucene.apache.org
Subject: IDF maxDocs / numDocs

> [...]
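A minimal sketch of what such a custom similarity could look like on Lucene 4.x, using the stock DefaultSimilarity as a base (the class name and fallback logic are illustrative, not taken from Markus's actual code):

  import org.apache.lucene.search.CollectionStatistics;
  import org.apache.lucene.search.Explanation;
  import org.apache.lucene.search.TermStatistics;
  import org.apache.lucene.search.similarities.DefaultSimilarity;

  public class DocCountSimilarity extends DefaultSimilarity {
    @Override
    public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) {
      final long df = termStats.docFreq();
      // docCount() is the number of docs with a value for this field;
      // it can return -1 when the statistic is unavailable, so fall back to maxDoc()
      long docCount = collectionStats.docCount();
      if (docCount == -1) {
        docCount = collectionStats.maxDoc();
      }
      final float idf = idf(df, docCount);
      return new Explanation(idf, "idf(docFreq=" + df + ", docCount=" + docCount + ")");
    }
  }

In Solr, a similarity with a no-arg constructor like this can be referenced from a <similarity class="..."/> element in schema.xml.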
More Maintenance Releases?
Hello Solr community, We have been using Solr to great effect at OpenSource Connections. Occasionally, though, we'll hit a bug in, say, 4.5.1 that gets fixed in 4.6.0. Unfortunately, as 4.6.0 is a release sporting several new features, there are invariably new bugs that get introduced. So while my bug in 4.5.1 is fixed, a new bug related to new features in 4.6.0 means 4.6.0 might be a showstopper.

This is more a question for the PMC than anything (with comments from others welcome). Would it be possible to do more minor bug-fix releases? I realize this could be a burden, so maybe it would be good to pick a version and decide "this will be a long term support release; we will backport bug fixes and do several additional bug-fix releases for 4-6 months". Then we'd pick another version to be a long term support release? This would help with the overall stability of Solr and help in the decision about how/when to upgrade Solr.

Cheers,
--
Doug Turnbull
Search Big Data Architect
OpenSource Connections
http://o19s.com
Re: More Maintenance Releases?
+1 to the idea, I love bug fix releases (which is why I volunteered to do the last couple). The main limiting factor is a volunteer to do it. Users requesting a specific bug fix release is probably a good way to prompt volunteers though.

-- Mark Miller about.me/markrmiller

On March 12, 2014 at 9:14:50 AM, Doug Turnbull (dturnb...@opensourceconnections.com) wrote:
> [...]
RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 upwards
Hi Jack, do you know how I can use local parameters in my solrconfig? The params are visible in the debugQuery output but Solr doesn't parse them.

  <lst name="invariants">
    <str name="fq">{!q.op=OR} (*:* -organisations:[ TO *] -roles:[ TO *]) (+organisations:($org) +roles:($r)) (-organisations:[ TO *] +roles:($r)) (+organisations:($org) -roles:[ TO *])</str>
  </lst>

-----Original Message-----
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Wednesday, 12 March 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 upwards

> [...]
Change replication factor
After a collection has been created in SolrCloud, is there a way to modify the Replication Factor? Say I start with a few nodes in the cluster, and have a replication factor of 2. Over time, the index grows and we add more nodes to the cluster, can I increase the replication factor to 3? Thanks! Mike
Re: Change replication factor
You can simply create a new SolrCore with the same collection and shard id as the collection and shard you want to add a replica to. There is also an addReplica command coming to the collections API. Or perhaps it's in 4.7, I don't know; this JIRA issue is a little confusing as it's still open, though it looks like stuff has been committed: https://issues.apache.org/jira/browse/SOLR-5130

-- Mark Miller about.me/markrmiller

On March 12, 2014 at 10:40:15 AM, Mike Hugo (m...@piragua.com) wrote:
> [...]
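To make Mark's first suggestion concrete, a replica can be added by creating a core on the target node through the Core Admin API; a sketch with host, core, collection, and shard names as placeholders:

  http://newnode:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica3&collection=mycollection&shard=shard1

The new core registers itself in ZooKeeper under the given collection/shard and recovers its index from the shard leader.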
Re: Change replication factor
Thanks Mark!

Mike

On Wed, Mar 12, 2014 at 12:43 PM, Mark Miller markrmil...@gmail.com wrote:
> [...]
Re: More Maintenance Releases?
Hi; I'm not a committer yet but I want to share my thoughts from the perspective of a user. I've been using SolrCloud since version 4.1.0. I've read nearly all the e-mails and I follow the mailing list too. The Solr project has a great development cycle and a frequent release cycle. In fact, if you compare it with some other Apache projects it has really nice commit rates. I've prepared a chart that explains the release cycle of Solr since 4.0 and attached it to this e-mail to make everything clear.

When you check the chart I prepared you will see that Solr has followed this release cycle (for 4.x releases): when needed, it has had bugfix releases. So except for 4.0, 4.1.0 and 4.4.0, every release had a bugfix release (I do not include 4.7). However, bugfix releases are applied only once per main release; I mean there is no 4.3.2 after 4.3.1 or 4.6.2 after 4.6.1.

When you use a project like Solr you should keep up with the current release or the current stable release (i.e. a bugfix release). I think the question should be this: if somebody finds a bug in a bugfix release, what will happen? Will there be a 4.x.2 release, or will it be resolved in 4.x+1? I also think the solution could be maintaining 4.x.1 and applying changes both to 4.x+1.0 and 4.x.2. So if anybody wants to use new features (of course with recent bug fixes) and accepts the risk of new features, they can use 4.x+1.0; otherwise, a more stable version: 4.x.2.

This raises a new question: what should the limit for *y* in 4.x.y be? From the perspective of a user who uses Solr and tests and checks all its versions, my thought is that 2 (or 3) may be enough. Long term support is a good idea (if you accept a *y* value of 2 or 3, it will cover 4-6 months). Solr is developing fast and it has features that users really need. If maintenance is not a problem, applying bug fixes to both a 4.x.2 release and 4.x+1.0 - that is, allowing a *y* value greater than 1 - may be a solution. If we just say "this release will be long term supported", I think people will eventually want to use newer releases because of the new features. On the other hand, if we release more than one bugfix release, people who do not need new features will have a more stable version of their current release and will be able to use it.

Thanks;
Furkan KAMACI

2014-03-12 18:34 GMT+02:00 Mark Miller markrmil...@gmail.com:
> [...]
Re: More Maintenance Releases?
Hi; I've attached the chart that I prepared, as mentioned in my e-mail. Thanks; Furkan KAMACI

2014-03-12 21:17 GMT+02:00 Furkan KAMACI furkankam...@gmail.com:
> [...]
Re: More Maintenance Releases?
Furkan, This list tends to eat attachments. Could you post it somewhere like imgur? Thanks, Greg

On Mar 12, 2014, at 2:19 PM, Furkan KAMACI furkankam...@gmail.com wrote:
> [...]
Re: IDF maxDocs / numDocs
My problem is that maxDoc() and docCount() both report documents that have been deleted in their values. Because of merging etc., those numbers can be different per replica (or at least that is what I'm seeing). I need a value that is consistent across replicas... I see the comment makes mention of not using IndexReader.numDocs(), but there doesn't seem to be a way to get hold of the IndexReader within a similarity implementation (as only TermStats and CollectionStats are passed in, and neither contains a ref to the reader). I am contemplating just using a static value for the number of docs, as this won't change dramatically often.

steve

On Wed, Mar 12, 2014 at 11:18 AM, Markus Jelsma markus.jel...@openindex.io wrote:
> [...]
Delta import throws java heap space exception
Hi, I have some problems when executing the delta import with 2 million rows from a MySQL database:

  java.lang.OutOfMemoryError: Java heap space
      at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
      at java.nio.CharBuffer.allocate(CharBuffer.java:331)
      at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
      at java.nio.charset.Charset.decode(Charset.java:810)
      at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
      at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
      at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
      at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5812)
      at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
      at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
      at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:315)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:254)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:294)
      at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:286)
      at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:117)
      at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:86)
      at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
      at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:781)
      at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:338)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:223)
      at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
  --
  java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@47a034e7 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
      at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
      at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3361)
      at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
      at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
      at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
      at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5204)
      at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5087)
      at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4690)
      at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1649)
      at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436)
      at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421)
      at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:288)
      at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:277)
      at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
      at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
      at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

Currently I have the batchSize parameter set to -1.

Configuration:
- SOLR 4.4
- CentOS 5.5
- 2GB RAM
- 1 Processor

Does someone have the same error? Could someone help me, please? Thank you, Richard
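For reference, batchSize lives on the DIH data source element; with the MySQL driver, batchSize="-1" makes DIH set the JDBC fetch size to Integer.MIN_VALUE, which switches the driver into row-streaming mode so the full result set is not buffered on the heap. A sketch, with the URL and credentials as placeholders:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="dbuser" password="dbpass"
              batchSize="-1"/>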
single node causing cluster-wide outage
Hi all! After upgrading to Solr 4.6.1 we encountered a situation where a cluster outage was traced to a single misbehaving node; after restarting the node, the cluster immediately returned to normal operation. The bad node had ~420 threads locked on FastLRUCache, and most httpShardExecutor threads were waiting on Apache Commons HTTP futures. Has anyone encountered such a situation? What can we do to prevent misbehaving nodes from bringing down the entire cluster? Cheers, Avishai
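For anyone needing to gather the same evidence from a suspect node, a JVM thread dump shows which threads are blocked and on which monitor; for example, with the Solr process id as a placeholder:

  jstack -l 12345 > /tmp/solr-threads.txt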
Re: Delta import throws java heap space exception
Hi Richard, How much RAM do you assign to the Java heap? Try increasing it to 1 GB, for example. Please see: https://wiki.apache.org/solr/ShawnHeisey Ahmet On Wednesday, March 12, 2014 10:53 PM, Richard Marquina Lopez richard.marqu...@gmail.com wrote: Hi, I have some problems when executing a delta import of 2 million rows from a MySQL database:

java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
  at java.nio.CharBuffer.allocate(CharBuffer.java:331)
  at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
  at java.nio.charset.Charset.decode(Charset.java:810)
  at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
  at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
  at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
  at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5812)
  at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
  at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
  at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:315)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:254)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:294)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.next(JdbcDataSource.java:286)
  at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:117)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:86)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:267)
  at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:781)
  at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:338)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:223)
  at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

--

java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@47a034e7 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
  at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3361)
  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
  at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5204)
  at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5087)
  at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4690)
  at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1649)
  at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436)
  at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421)
  at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:288)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:277)
  at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:440)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:478)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)

Currently I have the batchSize parameter set to -1. Configuration: - Solr 4.4 - CentOS 5.5 - 2 GB RAM - 1 processor. Does anyone have the same error? Could someone help me, please? Thank you, Richard
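For reference, a minimal data-config.xml sketch for this kind of MySQL delta import (table, column, and credential values here are hypothetical, not taken from the report). With the MySQL Connector/J driver, batchSize="-1" makes the DIH JdbcDataSource set the JDBC fetch size to Integer.MIN_VALUE, which is MySQL's switch for row-by-row streaming instead of buffering the whole result set on the heap:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr" password="secret"
              batchSize="-1"/>
  <document>
    <entity name="item"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item WHERE id = '${dih.delta.id}'">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>

Note that the OutOfMemoryError above is thrown while decoding a single column value, so even with streaming enabled, very wide rows or a small heap can still exhaust memory; the SQLException that follows is just the connection being torn down while the streaming result set is still open.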
Re: Solr-Ajax client
Shawn; My user name is davismarques on the wiki. Yes, I am aware that it's a bad idea to expose Solr directly to the Internet. As you've discovered, we filter all requests to the server so that only select requests make it through. I do not yet have documentation for the Javascript application, nor advice on configuring a proxy. However, documentation and setup instructions are on my to-do list, so I'll get to that soon. Davis On Wed, Mar 12, 2014 at 6:03 PM, Shawn Heisey s...@elyograg.org wrote: [quoted message elided; see above] -- Davis M. Marques t: 61 0418 450 194 e: dmarq@gmail.com w: http://www.davismarques.com/
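For readers wanting to do something similar, a hedged sketch of such a filtering proxy in nginx (the port, core name, and paths are assumptions for illustration, not the demo's actual configuration): forward only the read-only select handler and refuse everything else, including updates and the admin UI:

server {
    listen 80;

    # Forward read-only search requests to the local Solr instance.
    location /solr/collection1/select {
        proxy_pass http://127.0.0.1:8983;
    }

    # Refuse everything else: /update, /admin, other cores.
    location / {
        return 403;
    }
}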
Re: Migration issues - Solr 4.3.0 to Solr 4.4.0
We decided to go with the latest (it seems to have a lot more bug/performance fixes). The issue I mentioned was a red herring; I was able to successfully upgrade. On Tue, Mar 11, 2014 at 2:09 PM, Chris W chris1980@gmail.com wrote: Moving 4 versions ahead may need many additional tests on my side to ensure our cluster performance is good and within our SLA. Moving to 4.4 (just 1 month after 4.3.1 was released) gives me the most important bug fix, for reloading collections (which does not work now, forcing a rolling restart). I am also OK upgrading to 4.5, but do not want to go too far without more testing. Either way I think I will hit the issue mentioned above. On Tue, Mar 11, 2014 at 12:22 PM, Erick Erickson erickerick...@gmail.com wrote: First I have to ask why you're going to 4.4 rather than 4.7. I understand vetting requirements, but I thought I'd ask No use going through this twice if you can avoid it. On Tue, Mar 11, 2014 at 12:49 PM, Chris W chris1980@gmail.com wrote: I am running SolrCloud version 4.3.0 with 10 m1.xlarge nodes and using ZooKeeper to manage the state/data for collections and configs. I want to upgrade to version 4.4.0. When I deploy a 4.4 version of SolrCloud in my test environment, none of the collections/configs (created using the 4.3 version of Solr) that exist in ZooKeeper show up in the core admin. I should also mention that all of my collection configs have <luceneMatchVersion>LUCENE_43</luceneMatchVersion> in solrconfig.xml. Should I change the Lucene version to LUCENE_44 (matching the Solr version?) to get it working again? What is the best way to upgrade to a newer version of SolrCloud without deleting and recreating all configs and collections? Kindly advise -- Best -- C -- Best -- C -- Best -- C
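For reference, the element in question in solrconfig.xml. The usual guidance (hedged here, since every cluster differs) is to leave it at the old value until the collection has been reindexed under the new version, because it controls back-compatible analysis behavior rather than the index format:

<!-- solrconfig.xml: raise only after reindexing under the new version -->
<luceneMatchVersion>LUCENE_44</luceneMatchVersion>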
Re: More Maintenance Releases?
Hi; Here is the link: http://i740.photobucket.com/albums/xx43/kamaci/Solr_Releases_Furkan_KAMACI_zps8c0c196c.jpg Thanks; Furkan KAMACI 2014-03-12 21:21 GMT+02:00 Greg Walters greg.walt...@answers.com: Furkan, This list tends to eat attachments. Could you post it somewhere like imgur? Thanks, Greg On Mar 12, 2014, at 2:19 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I've attached the chart that I prepared, as mentioned in my e-mail. Thanks; Furkan KAMACI 2014-03-12 21:17 GMT+02:00 Furkan KAMACI furkankam...@gmail.com: Hi; I'm not a committer yet, but I want to share my thoughts from the perspective of a user. I've been using SolrCloud since version 4.1.0, I've read nearly all the e-mails, and I follow the mailing list too. The Solr project has a great development cycle and a frequent release cycle; in fact, compared with some other Apache projects, it has a really high commit rate. I've prepared a chart of Solr's release cycle since 4.0 and attached it to this e-mail to make everything clear. When you check the chart, you will see that Solr has followed this release cycle for 4.x releases: when needed, there have always been bug-fix releases. So every main release except 4.0, 4.1.0 and 4.4.0 has had a bug-fix release (I do not include 4.7). However, bug-fix releases happen only once per main release; there is no 4.3.2 after 4.3.1, and no 4.6.2 after 4.6.1. When you use a project like Solr, you should keep up with the current release or the current stable release (i.e. a bug-fix release). So I think the question should be: if somebody finds a bug in a bug-fix release, what happens? Will there be a 4.x.2 release, or will it only be resolved in the next main release? One solution could be to maintain 4.x.1 and apply fixes both to 4.x+1.0 and to a 4.x.2. Then anybody who wants new features (with recent bug fixes) and accepts the risk of new features can use 4.x+1.0; otherwise, they can use the more stable 4.x.2. This raises a new question: what should the limit for y be in 4.x.y? From the perspective of a user who tests and checks all of Solr's versions, my thought is that 2 (or 3) may be enough. Long-term support is a good idea (with a y value of 2 or 3, that works out to 4-6 months). Solr is developing fast and has features that users really need, so if we just say "this release will be long-term supported", I think people will still want to move to newer releases after a while because of the new features. On the other hand, if we release more than one bug-fix release, people who do not need new features will have a more stable version of their current release and will be able to keep using it. Thanks; Furkan KAMACI 2014-03-12 18:34 GMT+02:00 Mark Miller markrmil...@gmail.com: +1 to the idea, I love bug fix releases (which is why I volunteered to do the last couple). The main limiting factor is a volunteer to do it. Users requesting a specific bug fix release is probably a good way to prompt volunteers though. -- Mark Miller about.me/markrmiller On March 12, 2014 at 9:14:50 AM, Doug Turnbull (dturnb...@opensourceconnections.com) wrote: Hello Solr community, We have been using Solr to great effect at OpenSource Connections. Occasionally, though, we'll hit a bug in, say, 4.5.1 that gets fixed in 4.6.0. Unfortunately, as 4.6.0 is a release sporting several new features, there are invariably new bugs introduced as well. So while my bug from 4.5.1 is fixed, a new bug related to the new features in 4.6.0 can make 4.6.0 a showstopper. This is more a question for the PMC than anything (with comments from others welcome): would it be possible to do more minor bug-fix releases? I realize this could be a burden, so maybe it would be good to pick a version and decide that it will be a long-term support release: we backport bug fixes and do several additional bug-fix releases for it over 4-6 months, then pick another version to be the long-term support release. This would help with the overall stability of Solr and with decisions about how and when to upgrade. Cheers, -- Doug Turnbull Search & Big Data Architect OpenSource Connections http://o19s.com
Zookeeper latencies and pending requests - Solr 4.3
Hi, I have a 3-node ZooKeeper ensemble. I see very high latency for ZooKeeper responses and also a lot of outstanding requests (on the order of 30-40). I also see that the requests are not going to all ZooKeeper nodes equally; one node has more requests/connections than the others. CPU/memory and disk usage are all very normal (under 30% CPU, disk reads on the order of KB, JVM size is 2 GB but it hasn't even reached 30% usage). The size of the data in ZooKeeper is around 50 MB. I also see a few ZooKeeper timeouts for SolrCloud nodes, causing them to be shown as dead in the cloud view. I have increased the connection timeout to around 3 minutes and the same issue still seems to be happening. How do I make ZooKeeper respond faster to requests, and where does ZooKeeper usually spend time while dealing with incoming requests? Any pointers on how to move forward would be great. -- Best -- C
Re: Delta import throws java heap space exception
Hi Ahmet, Thank you for your response. Currently I have the following JVM configuration: -XX:+PrintGCDetails -XX:-UseParallelGC -XX:SurvivorRatio=8 -XX:NewRatio=2 -XX:+HeapDumpOnOutOfMemoryError -XX:PermSize=128m -XX:MaxPermSize=256m -Xms1024m -Xmx2048m I have 3.67 GB of physical RAM and 2 GB is assigned to the JVM (-Xmx2048m). 2014-03-12 17:32 GMT-04:00 Ahmet Arslan iori...@yahoo.com: Hi Richard, How much RAM do you assign to the Java heap? Try increasing it to 1 GB, for example. Please see: https://wiki.apache.org/solr/ShawnHeisey Ahmet On Wednesday, March 12, 2014 10:53 PM, Richard Marquina Lopez richard.marqu...@gmail.com wrote: [original report and stack traces quoted in full above]
Re: Zookeeper latencies and pending requests - Solr 4.3
Hi; The FAQ page says: *Q: I'm seeing lots of session timeout exceptions - what to do?* *A: Try raising the ZooKeeper session timeout by editing solr.xml - see the zkClientTimeout attribute. The minimum session timeout is 2 times your ZooKeeper-defined tickTime; the maximum is 20 times the tickTime. The default tickTime is 2 seconds. You should avoid raising this for no good reason, but it should be high enough that you don't see a lot of false session timeouts due to load, network lag, or garbage collection pauses. The default timeout is 15 seconds, but some environments might need to go as high as 30-60 seconds.* So, when this happens, what is the load on your network? Do you get those timeouts while heavy indexing is under way, or at an idle time? If not, there is probably a network problem. Could you check whether a problem exists among your ZooKeeper ensemble nodes? On the other hand, could you give some more information about your infrastructure and Solr logs? (PS: 50 MB of data *may* cause a problem for your architecture.) Thanks; Furkan KAMACI 2014-03-13 0:57 GMT+02:00 Chris W chris1980@gmail.com: [quoted message elided; see above]
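For reference, a minimal sketch of where these settings live (the values are illustrative and assume the legacy 4.x solr.xml format):

<!-- solr.xml: zkClientTimeout is in milliseconds and must land
     between 2x and 20x the ZooKeeper tickTime -->
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="30000"/>
</solr>

# zoo.cfg on each ZooKeeper node: with tickTime=2000, the valid
# session timeout range is 4000-40000 ms
tickTime=2000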
Re: Delta import throws java heap space exception
Hi; Could you send your data-config.xml? Thanks; Furkan KAMACI 2014-03-13 1:01 GMT+02:00 Richard Marquina Lopez richard.marqu...@gmail.com: [earlier messages quoted in full above]
Re: Zookeeper latencies and pending requests - Solr 4.3
Hi Furkan, Load on the network is very low when the read workload is on the cluster. During indexing, a few of my commits hang forever while the Solr nodes attempt to get a connection from ZooKeeper. The peer communication between the ZooKeeper nodes is very good and I haven't seen any issues there. The network transfer is around 15-20 MB/s when I restart a Solr node. *Infrastructure*: 10-node SolrCloud cluster with a 3-node ZooKeeper ensemble (m1.medium instances with 1 CPU core, 1.5 GB of heap out of a total of 3 GB of RAM). Solr logs are on the same mount as the Solr data and tlogs. ZooKeeper logs are also on the same mount as the ZooKeeper data. I have 80+ collections, which can grow to 150-200 easily. *Regarding ZK data*: Why does 50 MB pose a problem if none of the system parameters are in an alarming state? I have around 80+ collections in Solr, and every collection has the same schema but a different solrconfig.xml. Hence I am bundling each schema/config into a different ZooKeeper folder and pushing it as a separate config. Is there a way in Solr/ZooKeeper to use one config for common files (like the velocity templates and the schema) and push just the solrconfig.xml into another config directory? Of the 50 MB, I am sure at least 90% of the data is duplicated across configs. Kindly advise, and thanks for your response. On Wed, Mar 12, 2014 at 4:08 PM, Furkan KAMACI furkankam...@gmail.com wrote: [quoted message elided; see above] -- Best -- C
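There is no built-in way to split one collection's config between a shared folder and a per-collection folder, but collections that really are identical can share a single config set. A sketch using the zkcli.sh that ships with Solr 4.x (host names, paths, and the confname are illustrative):

# Upload the common config set to ZooKeeper once
./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd upconfig -confdir /path/to/common-conf -confname shared-conf

# Point each matching collection at it
./zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd linkconfig -collection orders -confname shared-conf

For the differing solrconfig.xml files, one avenue worth testing (an assumption, not verified here) is XInclude: Solr's XML config loader honors XInclude, so each config set's solrconfig.xml could stay small and pull its common parts from an included fragment, shrinking the duplicated bytes stored in ZooKeeper.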
Re: Solr-Ajax client
Hey Davis, I've added you to the Contributors Group :) -Stefan On Wednesday, March 12, 2014 at 11:49 PM, Davis Marques wrote: [quoted message elided; see above]
Re: More Maintenance Releases?
Wondering if 4.7 is a natural point to do this. See Uwe's announcement that as of Solr 4.8, Solr/Lucene will _require_ Java 1.7 rather than Java 1.6. I know some organizations will not be able to make this transition easily, so I suspect we'll see ongoing requests to "please back-port XXX to Solr 4.7, since we can't use Java 1.7". Hmmm, Solr 4.7, Java 1.7, coincidence? :) Does it make any sense to think of essentially freezing 4.7 except for bug fixes that we selectively back-port? Mostly random musings, but I thought I'd throw it out there. NOTE: I'm not volunteering to be the release manager for this; that'd take someone who's stuck on 1.6, IMO. Best, Erick On Wed, Mar 12, 2014 at 6:52 PM, Furkan KAMACI furkankam...@gmail.com wrote: [quoted thread elided; see the messages above]
Re: Migration issues - Solr 4.3.0 to Solr 4.4.0
Glad to hear it! Thanks for bringing closure... Erick On Wed, Mar 12, 2014 at 6:53 PM, Chris W chris1980@gmail.com wrote: [quoted thread elided; see the messages above] -- Best -- C
Re-index Parent-Child Schema
Hi, I've inherited a Solr application with a schema that contains a parent-child relationship. All child elements are maintained in multi-valued fields, so an Order with 3 order lines results in arrays of size 3 in Solr. This worked fine as long as clients queried only on Order fields, but with new requirements it is serving inaccurate results. Consider some orders, for example:

{ OrderId:123 BookingRecordId : [145, 987, *234*] OrderLineType : [11, 12, *13*] . }
{ OrderId:345 BookingRecordId : [945, 882, *234*] OrderLineType : [1, 12, *11*] . }
{ OrderId:678 BookingRecordId : [444] OrderLineType : [11] . }

If you look up an Order with BookingRecordId:234 AND OrderLineType:11, you will get two orders, 123 and 345, which is correct as far as Solr is concerned: both orders have arrays that satisfy each condition separately. However, for OrderId:123 the value at the 3rd index of the OrderLineType array is 13, not 11 (the 11 belongs to BookingRecordId:145), so it should be excluded. Per this blog: http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html I can't use span queries, as I have tons of child elements to query and I want to keep changes to client queries to a minimum. So is creating multiple indexes the only way? We have 3 physical boxes with SolrCloud, and at some point we would like to shard. Appreciate any inputs. Best, -Vijay
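One option worth evaluating, sketched here rather than prescribed (it assumes Solr 4.5+, a full reindex, and hypothetical field names such as doc_type): index each order line as a nested child document and query with the block join parent parser, which preserves the pairing between BookingRecordId and OrderLineType:

<!-- Index time: each order line is a child document of its order -->
<add>
  <doc>
    <field name="id">345</field>
    <field name="doc_type">order</field>
    <doc>
      <field name="id">345-3</field>
      <field name="BookingRecordId">234</field>
      <field name="OrderLineType">11</field>
    </doc>
  </doc>
</add>

<!-- Query time: both clauses must match within one child document -->
q={!parent which="doc_type:order"}+BookingRecordId:234 +OrderLineType:11

With the data above, this returns order 345 (it has a line pairing 234 with 11) but not order 123, where 234 pairs with 13.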
Re: Support for Numeric DocValues Updates in Solr?
I tried looking at the Solr code, but it seems to require a deeper understanding of Solr internals to add this support; maybe another Solr expert can provide some pointers. Has anyone else tried updating numeric DocValues in Solr but not committed the patch yet? It would be really nice if this feature were added, since our use case requires updating a few numeric fields at a much higher rate than normal document updates. On Wed, Nov 20, 2013 at 7:48 AM, Gopal Patwa gopalpa...@gmail.com wrote: +1 to add this support in Solr On Wed, Nov 20, 2013 at 7:16 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, The Numeric DocValues Updates functionality that came via https://issues.apache.org/jira/browse/LUCENE-5189 sounds very valuable while we wait for full/arbitrary field updates (https://issues.apache.org/jira/browse/LUCENE-4258). Would it make sense to add support for Numeric DocValues Updates to Solr? Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
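For anyone exploring a patch, the Lucene-level entry point that LUCENE-5189 added is what a Solr integration would ultimately call. A minimal sketch against the Lucene 4.6 API (the field and id values are hypothetical; Solr itself exposes no equivalent yet):

import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class NumericDocValuesUpdateSketch {
    // Rewrites the numeric doc-values field "popularity" for every
    // document matching the id term, without re-indexing the document.
    static void setPopularity(IndexWriter writer, String docId, long value)
            throws IOException {
        writer.updateNumericDocValue(new Term("id", docId), "popularity", value);
        writer.commit(); // make the update visible to newly opened readers
    }
}

The field must already exist as a numeric DocValues field; the call fails for fields that were not indexed with doc values.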
Re: Solr-Ajax client
Sweet. Thanks, Stefan. Davis On Thu, Mar 13, 2014 at 10:37 AM, Stefan Matheis matheis.ste...@gmail.com wrote: [quoted message elided; see above] -- Davis M. Marques t: 61 0418 450 194 e: dmarq@gmail.com w: http://www.davismarques.com/