[jira] [Commented] (SOLR-11611) Starting Solr using solr.cmd fails in Windows, when the path contains a parenthesis
[ https://issues.apache.org/jira/browse/SOLR-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290621#comment-16290621 ]

Romain MERESSE commented on SOLR-11611:
---------------------------------------

You should use delayed expansion in the batch file. This version is broken (KO), because %START_OPTS% is expanded when the whole IF block is parsed, so a ")" in its value (e.g. from "C:\Program Files (x86)") prematurely closes the block:

{code}
IF "%SOLR_SSL_ENABLED%"=="true" (
  set "SSL_PORT_PROP=-Dsolr.jetty.https.port=%SOLR_PORT%"
  set "START_OPTS=%START_OPTS% %SOLR_SSL_OPTS% !SSL_PORT_PROP!"
)
{code}

This version works (OK), because !START_OPTS! is expanded at execution time:

{code}
IF "%SOLR_SSL_ENABLED%"=="true" (
  set "SSL_PORT_PROP=-Dsolr.jetty.https.port=%SOLR_PORT%"
  set "START_OPTS=!START_OPTS! %SOLR_SSL_OPTS% !SSL_PORT_PROP!"
)
{code}

> Starting Solr using solr.cmd fails in Windows, when the path contains a
> parenthesis
> -----------------------------------------------------------------------
>
>                 Key: SOLR-11611
>                 URL: https://issues.apache.org/jira/browse/SOLR-11611
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>       Components: SolrCLI
> Affects Versions: 7.1
>      Environment: Microsoft Windows [Version 10.0.15063]
>                   java version "1.8.0_144"
>                   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>                   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>         Reporter: Jakob Furrer
>          Fix For: 7.2
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Starting Solr using solr.cmd fails on Windows when the path contains a parenthesis.
> Use the following example to reproduce the error (console output translated from German):
> {quote}C:\>c:
> C:\>cd "C:\Program Files (x86)\Company Name\ProductName Solr\bin"
> C:\Program Files (x86)\Company Name\ProductName Solr\bin>dir
>  Volume in drive C has no label.
>  Volume Serial Number is 8207-3B8B
>  Directory of C:\Program Files (x86)\Company Name\ProductName Solr\bin
> 06.11.2017  15:52    <DIR>          .
> 06.11.2017  15:52    <DIR>          ..
> 06.11.2017  15:39    <DIR>          init.d
> 03.11.2017  17:32             8 209 post
> 03.11.2017  17:32            75 963 solr
> 06.11.2017  14:24            69 407 solr.cmd
>                3 File(s)        153 579 bytes
>                3 Dir(s)  51 191 619 584 bytes free
> C:\Program Files (x86)\Company Name\ProductName Solr\bin>solr.cmd start
> *"\Company" cannot be processed syntactically at this point.*
> C:\Program Files (x86)\Company Name\ProductName Solr\bin>{quote}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
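The delayed-expansion fix above can be sketched as a minimal, self-contained batch file (variable values are illustrative, not the ones solr.cmd actually computes; note that `setlocal EnableDelayedExpansion` must be in effect for the `!VAR!` syntax to work at all):

```batch
@echo off
setlocal EnableDelayedExpansion

REM Illustrative values only; solr.cmd derives these itself.
set "START_OPTS=-Xmx512m"
set "SOLR_PORT=8983"
set "SOLR_SSL_ENABLED=true"

IF "%SOLR_SSL_ENABLED%"=="true" (
  set "SSL_PORT_PROP=-Dsolr.jetty.https.port=%SOLR_PORT%"
  REM !START_OPTS! and !SSL_PORT_PROP! are expanded at execution time,
  REM so a value containing ")" (e.g. a path under "C:\Program Files (x86)")
  REM cannot terminate the IF block the way a parse-time %VAR% can.
  set "START_OPTS=!START_OPTS! !SSL_PORT_PROP!"
)
echo !START_OPTS!
```

The same script with `%START_OPTS%` inside the parenthesized block would fail as soon as the value contains an unquoted ")".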
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037948#comment-15037948 ]

Romain MERESSE commented on SOLR-2649:
--------------------------------------

Very high criticality for me too. Why is this only a minor priority?

> MM ignored in edismax queries with operators
> --------------------------------------------
>
>                 Key: SOLR-2649
>                 URL: https://issues.apache.org/jira/browse/SOLR-2649
>             Project: Solr
>          Issue Type: Bug
>       Components: query parsers
>         Reporter: Magnus Bergmark
>         Assignee: Erick Erickson
>         Priority: Minor
>          Fix For: 4.9, Trunk
>
>      Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch,
>                   SOLR-2649.diff, SOLR-2649.patch
>
> Hypothetical scenario:
> 1. User searches for "stocks oil gold" with MM set to "50%"
> 2. User adds "-stockings" to the query: "stocks oil gold -stockings"
> 3. User gets no hits since MM was ignored and all terms were AND-ed together
> The behavior seems to be intentional, although the reason why is never explained:
> {code}
> // For correct lucene queries, turn off mm processing if there
> // were explicit operators (except for AND).
> boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
> {code}
> (lines 232-234 taken from
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the
> primary features of dismax.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
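The quoted check from ExtendedDismaxQParserPlugin can be restated as a tiny sketch (hypothetical helper name, mirroring the Java condition) to show why step 2 of the scenario disables mm: adding "-stockings" makes numMinuses non-zero, so min-match processing is silently switched off:

```python
def do_min_matched(num_or: int, num_not: int, num_pluses: int, num_minuses: int) -> bool:
    """Mirror of the quoted Java check: mm is applied only when the query
    contains no explicit operators (other than AND)."""
    return (num_or + num_not + num_pluses + num_minuses) == 0

# "stocks oil gold"            -> no operators, mm=50% is honored
print(do_min_matched(0, 0, 0, 0))   # True
# "stocks oil gold -stockings" -> one '-' operator, mm is ignored
print(do_min_matched(0, 0, 0, 1))   # False
```

This is what the reporter objects to: a single negated term flips the whole query into pure boolean AND semantics.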
[jira] [Commented] (SOLR-1672) RFE: facet reverse sort count
[ https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555654#comment-14555654 ]

Romain MERESSE commented on SOLR-1672:
--------------------------------------

I need this feature too. Resorting to client-side sorting is not a good solution. Any news?

> RFE: facet reverse sort count
> -----------------------------
>
>                 Key: SOLR-1672
>                 URL: https://issues.apache.org/jira/browse/SOLR-1672
>             Project: Solr
>          Issue Type: Improvement
>       Components: search
> Affects Versions: 1.4
>      Environment: Java, Solrj, http
>         Reporter: Peter Sturge
>         Priority: Minor
>      Attachments: SOLR-1672.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> As suggested by Chris Hostetter, I have added an optional Comparator to the BoundedTreeSetLong in the UnInvertedField class.
> This optional comparator is used when a new (and also optional) field facet parameter called 'facet.sortorder' is set to the string 'dsc' (e.g. f.facetname.facet.sortorder=dsc per field, or facet.sortorder=dsc for all facets). Note that this parameter has no effect if facet.method=enum.
> Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to its default behaviour.
> This change affects 2 source files:
> UnInvertedField.java
> [line 438] The getCounts() method signature is modified to add the 'facetSortOrder' parameter value to the end of the argument list.
> DIFF UnInvertedField.java:
> - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix) throws IOException {
> + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int offset, int limit, Integer mincount, boolean missing, String sort, String prefix, String facetSortOrder) throws IOException {
> [line 556] The getCounts() method is modified to create an overridden BoundedTreeSetLong(int, Comparator) if the 'facetSortOrder' parameter equals 'dsc'.
> DIFF UnInvertedField.java:
> - final BoundedTreeSetLong queue = new BoundedTreeSetLong(maxsize);
> + final BoundedTreeSetLong queue = (sort.equals("count") || sort.equals("true")) ?
> +     (facetSortOrder.equals("dsc") ?
> +         new BoundedTreeSetLong(maxsize, new Comparator() {
> +             @Override
> +             public int compare(Object o1, Object o2) {
> +                 if (o1 == null || o2 == null) return 0;
> +                 int result = ((Long) o1).compareTo((Long) o2);
> +                 return (result != 0 ? (result > 0 ? -1 : 1) : 0); // lowest number first sort
> +             }
> +         }) : new BoundedTreeSetLong(maxsize)) : null;
> SimpleFacets.java
> [line 221] A call to getFieldParam(field, "facet.sortorder", "asc") is added to retrieve the new parameter, if present, with 'asc' used as the default value.
> DIFF SimpleFacets.java:
> + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", "asc");
> [line 253] The call to uif.getCounts() in the getTermCounts() method is modified to pass the 'facetSortOrder' value string.
> DIFF SimpleFacets.java:
> - counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix);
> + counts = uif.getCounts(searcher, base, offset, limit, mincount, missing, sort, prefix, facetSortOrder);
> Implementation Notes:
> I have noted in testing that I was not able to retrieve any '0' counts as I had expected. I believe this could be because there appear to be some optimizations in SimpleFacets/count caching such that zero counts are not iterated (at least not by default) as a performance enhancement. I could be wrong about this, and zero counts may appear under some other as yet untested circumstances. Perhaps an expert familiar with this part of the code can clarify. In fact, this is not such a bad thing (at least for my requirements), as a whole bunch of zero counts is not necessarily useful (for my requirements, starting at '1' is just right). There may, however, be instances where someone *will* want zero counts - e.g. searching for zero product stock counts (e.g. 'what have we run out of').
I was envisioning the facet.mincount field being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 or possibly higher), but because of the caching/optimization, the behaviour is somewhat different than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
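For reference, the client-side fallback dismissed in the comment above amounts to re-sorting the facet pairs Solr returned by ascending count (illustrative data; it only reorders the one page of facets already fetched, which is why it is not equivalent to a true server-side 'facet.sortorder=dsc'):

```python
# Facet counts as returned by Solr (highest count first by default).
facets = [("red", 12), ("green", 7), ("blue", 3)]

# Client-side reverse sort: lowest count first. Values below facet.limit's
# cutoff were never returned, so they can never appear here.
ascending = sorted(facets, key=lambda pair: pair[1])
print(ascending)   # [('blue', 3), ('green', 7), ('red', 12)]
```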
[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613644#comment-13613644 ]

Romain MERESSE commented on SOLR-3245:
--------------------------------------

Any update on this issue? The problem is still present in 4.2.

> Poor performance of Hunspell with Polish Dictionary
> ---------------------------------------------------
>
>                 Key: SOLR-3245
>                 URL: https://issues.apache.org/jira/browse/SOLR-3245
>             Project: Solr
>          Issue Type: Bug
>       Components: Schema and Analysis
> Affects Versions: 4.0-ALPHA
>      Environment: Centos 6.2, kernel 2.6.32, 2 physical CPU Xeon 5606 (4 cores each), 32 GB RAM, 2 SSD disks in RAID 0, java version 1.6.0_26, java settings -server -Xms4096M -Xmx4096M
>         Reporter: Agnieszka
>           Labels: performance
>      Attachments: pl_PL.zip
>
> In Solr 4.0 the Hunspell stemmer with a Polish dictionary has poor performance, whereas the performance of hunspell from http://code.google.com/p/lucene-hunspell/ in Solr 3.4 is very good. Tests show:
> Solr 3.4, full import of 489017 documents:
> StempelPolishStemFilterFactory - 2908 seconds, 168 docs/sec
> HunspellStemFilterFactory - 3922 seconds, 125 docs/sec
> Solr 4.0, full import of 489017 documents:
> StempelPolishStemFilterFactory - 3016 seconds, 162 docs/sec
> HunspellStemFilterFactory - 44580 seconds (more than 12 hours), 11 docs/sec
> My schema is quite simple.
> For Hunspell I have one text field into which I copy 14 text fields:
> {code:xml}
> <field name="text" type="text_pl_hunspell" indexed="true" stored="false" multiValued="true"/>
> <copyField source="field1" dest="text"/>
> <copyField source="field14" dest="text"/>
> {code}
> The text_pl_hunspell configuration:
> {code:xml}
> <fieldType name="text_pl_hunspell" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
>     <!--filter class="solr.KeywordMarkerFilterFactory" protected="protwords_pl.txt"/-->
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.HunspellStemFilterFactory" dictionary="dict/pl_PL.dic" affix="dict/pl_PL.aff" ignoreCase="true"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
>   </analyzer>
> </fieldType>
> {code}
> I use a Polish dictionary - pl_PL.dic, pl_PL.aff (the files stopwords_pl.txt, protwords_pl.txt, synonyms_pl.txt are empty). These are the same files I used in version 3.4.
> For the Polish Stemmer the difference is only in the definition of the text field:
> {code}
> <field name="text" type="text_pl" indexed="true" stored="false" multiValued="true"/>
> <fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StempelPolishStemFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="dict/synonyms_pl.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="dict/stopwords_pl.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StempelPolishStemFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="dict/protwords_pl.txt"/>
>   </analyzer>
> </fieldType>
> {code}
> One document has 23 fields:
> - 14 text fields copied into the one text field above, which is only indexed
> - 8 other indexed fields (2 strings, 2 tdates, 3 tint, 1 tfloat)
> The size of one document is 3-4 kB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
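The docs/sec figures quoted in the report are consistent with the import times; a quick arithmetic check:

```python
docs = 489_017  # documents in the full import, from the report

# (stemmer, Solr version) -> full-import time in seconds, from the report
times = {
    ("Stempel", "3.4"): 2908,
    ("Hunspell", "3.4"): 3922,
    ("Stempel", "4.0"): 3016,
    ("Hunspell", "4.0"): 44580,
}
for (stemmer, version), t in times.items():
    print(f"{stemmer} {version}: {docs / t:.0f} docs/sec")
# Hunspell 4.0 works out to ~11 docs/sec, roughly an 11x slowdown
# versus the ~125 docs/sec it achieved in 3.4.
```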
[jira] [Commented] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277 ]

Romain MERESSE commented on SOLR-3245:
--------------------------------------

Same problem here, with a French dictionary in Solr 3.6:
With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s
Has anyone found a solution? ...

> Poor performance of Hunspell with Polish Dictionary
> ---------------------------------------------------
>
>                 Key: SOLR-3245
>                 URL: https://issues.apache.org/jira/browse/SOLR-3245
>             Project: Solr
>          Issue Type: Bug
>       Components: Schema and Analysis
> Affects Versions: 4.0-ALPHA
>         Reporter: Agnieszka
>           Labels: performance
>      Attachments: pl_PL.zip

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
[jira] [Comment Edited] (SOLR-3245) Poor performance of Hunspell with Polish Dictionary
[ https://issues.apache.org/jira/browse/SOLR-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481277#comment-13481277 ]

Romain MERESSE edited comment on SOLR-3245 at 10/22/12 9:51 AM:
----------------------------------------------------------------

Same problem here, with a French dictionary in Solr 3.6:
With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s
Has anyone found a solution? ...
Quite sad, as this is a very important feature (stemming is poor with Snowball).

was (Author: rohk):
Same problem here, with a French dictionary in Solr 3.6:
With Hunspell: ~5 documents/s
Without Hunspell: ~280 documents/s
Has anyone found a solution? ...

> Poor performance of Hunspell with Polish Dictionary
> ---------------------------------------------------
>
>                 Key: SOLR-3245
>                 URL: https://issues.apache.org/jira/browse/SOLR-3245
>             Project: Solr
>          Issue Type: Bug
>       Components: Schema and Analysis
> Affects Versions: 4.0-ALPHA
>         Reporter: Agnieszka
>           Labels: performance
>      Attachments: pl_PL.zip